Problem with the space after "ś", "ć": ERROR: UnicodeError: invalid character index


program...@gmail.com

Oct 20, 2016, 1:34:03 PM
to julia-users
What is wrong?
julia> o=open("string.txt","w")
IOStream(<file string.txt>)

julia> write(o,b)
12

julia> close(o)

julia> o=open("string.txt")
IOStream(<file string.txt>)

julia> temp=readline()

"\r\n"

julia> temp=readline(o)
"sa sdś aa,1"

julia>

julia> temp[6]
'ś'

julia> temp[7]
ERROR: UnicodeError: invalid character index
 in slow_utf8_next(::Array{UInt8,1}, ::UInt8, ::Int64) at .\strings\string.jl:67
 in next at .\strings\string.jl:92 [inlined]
 in getindex(::String, ::Int64) at .\strings\basic.jl:70

julia> temp
"sa sdś aa,1"

julia> eltype(temp)
Char

julia> typeof(temp)
String

julia> temp[3]
' '

Paul

program...@gmail.com

Oct 20, 2016, 1:35:41 PM
to julia-users
Julia version 0.5.0 (2016-09-19 18:14 UTC), 64-bit.
The file is UTF-8 without a BOM.

Kristoffer Carlsson

Oct 20, 2016, 1:43:22 PM
to julia-users

program...@gmail.com

Oct 20, 2016, 1:43:25 PM
to julia-users
string.txt

Kristoffer Carlsson

Oct 20, 2016, 1:43:47 PM
to julia-users
Specifically,

  • Conceptually, a string is a partial function from indices to characters: for some index values, no character value is returned, and instead an exception is thrown. This allows for efficient indexing into strings by the byte index of an encoded representation rather than by a character index, which cannot be implemented both efficiently and simply for variable-width encodings of Unicode strings.
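A minimal sketch of what this means for the string from the first post, assuming Julia 1.x (the thread uses 0.5, where the same mistake raises UnicodeError; 1.x raises StringIndexError instead). In UTF-8, 'ś' takes two bytes, so index 7 points into the middle of it. The functions isvalid, nextind, and eachindex only accept or produce valid byte indices.

```julia
s = "sa sdś aa,1"

# 'ś' (U+015B) is encoded as two bytes in UTF-8:
# the string has 11 characters but 12 bytes
println(length(s))       # number of characters
println(ncodeunits(s))   # number of bytes

# byte 6 starts 'ś'; byte 7 is its continuation byte, not a valid index
println(isvalid(s, 6), " ", isvalid(s, 7))

# indexing at an invalid byte index throws (StringIndexError in Julia 1.x)
try
    s[7]
catch e
    println(typeof(e))
end

# nextind steps to the next valid index, skipping continuation bytes
j = nextind(s, 6)
println(j, " ", repr(s[j]))

# eachindex yields only valid byte indices, so looping over it is always safe
println(collect(eachindex(s)))
```

So the character after 'ś' lives at byte index 8, not 7; asking for "the 7th character" is a different question, answered by position-based approaches such as graphemes.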

program...@gmail.com

Oct 20, 2016, 1:49:36 PM
to julia-users
Big thanks, but how do I work with this data?
Paul

Ismael Venegas Castelló

Oct 20, 2016, 2:06:26 PM
to julia-users
You can use collect(graphemes(s))[3], for example. See also this SO question and its answers:

http://stackoverflow.com/questions/39501900/truncate-string-in-julia/39505998#39505998
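A sketch of the grapheme approach in current Julia, assuming Julia 1.x, where graphemes lives in the Unicode standard library and needs `using Unicode`. If only code points are needed rather than user-perceived characters, collect(s) gives a Vector{Char} and is cheaper; the two differ only for combining sequences.

```julia
using Unicode   # graphemes lives in the Unicode standard library in Julia 1.x

s = "sa sdś aa,1"

# user-perceived characters, each a SubString of s
g = collect(graphemes(s))
println(repr(g[7]))      # 7th grapheme: the space after 'ś'

# code points, each a Char; cheaper when combining marks are not a concern
c = collect(s)
println(repr(c[7]))      # 7th Char: also the space

# where they differ: 'e' followed by U+0301 (combining acute accent)
# is two Chars but a single grapheme
t = "e\u0301"
println(length(collect(t)), " ", length(collect(graphemes(t))))
```

Both approaches build a new array per string, which is why they are slower than working with byte indices directly.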

program...@gmail.com

Oct 20, 2016, 2:28:28 PM
to julia-users
Thanks, it's slower but it works :) Next question: how do I find a Char in a collection?
julia> str="abc"
"abc"

julia> findin(str,"c")
1-element Array{Int64,1}:
 3

julia>

julia> temp=collect(graphemes(str))
3-element Array{SubString{String},1}:
 "a"
 "b"
 "c"

julia> findin(temp,"c")
0-element Array{Int64,1}

julia> findin(temp,'c')
0-element Array{Int64,1}

julia> find(temp,'c')
ERROR: MethodError: no method matching find(::Array{SubString{String},1}, ::Char)
Closest candidates are:
  find(::Function, ::Any) at array.jl:1082
  find(::Any) at array.jl:1116

programistawpf

Oct 20, 2016, 3:26:01 PM
to julia...@googlegroups.com
OK, I found it: findin(temp,[","])...

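for i=1:nol
    temp=(collect(graphemes(readline(o))))  # split the line into graphemes
    poz=(findin(temp,[","])[end])+1         # position just after the last comma
    write(io, temp[poz])                    # write the grapheme that follows it
    write(io, "\n")
    if mod(i,10^6)==0 println(i) end        # progress report every 10^6 lines
end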

But it is very slow :/ Is there something faster? I have 10^8 lines of text :/

Paul
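One way to speed up the loop above, as a sketch assuming Julia 1.x: skip graphemes entirely and search each line with byte indices, so no array is built per line. The function name write_after_last_comma is hypothetical. It writes the Char after the last comma, which matches the grapheme version unless the text contains combining marks.

```julia
# Hypothetical helper: for each input line, write the character that follows
# the last comma. Works on byte indices directly, so no per-line array of
# graphemes is allocated. Writes a Char (code point), not a grapheme cluster.
function write_after_last_comma(in_io::IO, out_io::IO)
    n = 0
    for line in eachline(in_io)          # eachline drops the line terminator by default
        n += 1
        k = findlast(==(','), line)      # byte index of the last comma, or nothing
        if k !== nothing && k < lastindex(line)
            println(out_io, line[nextind(line, k)])
        else
            println(out_io)              # no comma, or comma at the end: empty line
        end
        n % 10^6 == 0 && println(stderr, n)   # progress report, as in the original loop
    end
    return n                             # number of lines processed
end

# usage with in-memory streams
input = IOBuffer("sa sdś aa,1\nx,ź\nno comma here\n")
output = IOBuffer()
write_after_last_comma(input, output)
print(String(take!(output)))
```

For real data, pass streams from open instead of IOBuffers (one opened for reading, one opened with "w"); the file names are up to you.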


On 2016-10-20 20:28, program...@gmail.com wrote:

Kristoffer Carlsson

Oct 20, 2016, 3:45:58 PM
to julia-users
What do you actually want to do? Extract all characters after a comma?

programistawpf

Oct 20, 2016, 3:49:07 PM
to julia...@googlegroups.com
Between the n-th comma and the last comma.
Paul
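Reading "n-th comma" literally, here is a sketch of extracting the text between the n-th and the last comma, assuming Julia 1.x; between_nth_and_last_comma is a hypothetical name. It walks the commas with findnext and returns a SubString view, so it neither copies the text nor splits the line into graphemes.

```julia
# Hypothetical helper: the text strictly between the n-th comma and the last
# comma of `line`, returned as a SubString (a view into `line`, not a copy).
# Returns `nothing` if the line has fewer than n commas or if the n-th comma
# is already the last one.
function between_nth_and_last_comma(line::AbstractString, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    i = 0                                          # byte index of the latest comma found
    for _ in 1:n
        k = findnext(==(','), line, nextind(line, i))  # nextind(line, 0) == 1
        k === nothing && return nothing
        i = k
    end
    j = findlast(==(','), line)                    # not nothing: at least one comma exists
    j == i && return nothing
    # nextind/prevind keep both ends on character boundaries, even next to 'ś' or 'ć'
    return SubString(line, nextind(line, i), prevind(line, j))
end

println(between_nth_and_last_comma("a,b,ść,d,e", 2))
println(between_nth_and_last_comma("a,b,ść,d,e", 1))
```

It can be called on each line produced by eachline, writing the result (or skipping lines that return nothing).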
On 2016-10-20 21:45, Kristoffer Carlsson wrote:

Germán Aquino

Oct 21, 2016, 10:07:52 AM
to julia-users
Hello,
I'm not sure if this is what you need, but you could use:

str = "abc"
temp = collect(graphemes(str))
find(temp .== "c")
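For readers on Julia 1.x, where find and findin no longer exist, a sketch of the equivalents (assuming Julia 1.x and the Unicode standard library for graphemes). It also shows why findin(temp, "c") and findin(temp, 'c') came back empty: the second argument is iterated as Chars, and a string element never compares equal to a Char.

```julia
using Unicode   # for graphemes in Julia 1.x

str = "abc"
temp = collect(graphemes(str))        # Vector{SubString{String}}: "a", "b", "c"

# Julia 1.x spelling of find(temp .== "c"): findall on a Bool array
println(findall(temp .== "c"))

# the same without building the temporary Bool array
println(findall(==("c"), temp))

# replacement for findin(temp, [","]): indices of elements contained in the collection
println(findall(in([",", "c"]), temp))

# why findin(temp, "c") found nothing: iterating "c" yields the Char 'c',
# and a string never compares equal to a Char
println("c" == 'c')
println(isempty(findall(==('c'), temp)))
```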