UTF-8: how to process text data

program...@gmail.com

Oct 19, 2016, 7:46:38 AM
to julia-users
The data file is encoded in UTF-8, but I can't process this data in Julia. What is wrong?

julia> io=open("data.txt")

julia> temp=readline(io)
"3699778,13,2,gdbiehz jablej gupując szybgi Injehnej dg 26 paździehniga,1\n"

julia> temp[61:65]
"aźdz"

julia> findin(temp[61:65],"d")
ERROR: invalid UTF-8 character index
 in next at utf8.jl:64
 in findin at array.jl:1179

Paul

Stefan Karpinski

Oct 19, 2016, 8:51:46 AM
to Julia Users

Milan Bouchet-Valat

Oct 19, 2016, 8:56:15 AM
to julia...@googlegroups.com
On Wednesday, October 19, 2016 at 04:46 -0700, program...@gmail.com wrote:
You didn't say what version of Julia you're using. The bug seems to
happen on 0.4.7, but not on 0.5.0, so I'd encourage you to upgrade.

(Note that in general you shouldn't index into strings with arbitrary
integers: only values referring to the beginning of a Unicode code
point are valid.)
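
To illustrate the note above, here is a minimal sketch of how byte indices and characters relate in a UTF-8 string. It assumes a current Julia (1.x), not the 0.3 to 0.5 versions discussed in this thread, and reuses the "aźdz" slice from the question; the variable names are only for illustration.

```julia
# Assumes Julia 1.x. 'ź' occupies two bytes in UTF-8, so the string has
# more bytes than characters and not every integer is a valid index.
s = "aźdz"

n_bytes = ncodeunits(s)           # number of bytes (code units)
n_chars = length(s)               # number of characters

# Byte indices at which a character begins; these are the only valid indices.
starts = collect(eachindex(s))

# Index 3 points into the middle of 'ź', so it is not a valid index.
mid_ok = isvalid(s, 3)

# nextind steps from one valid index to the next instead of adding 1.
after_z = nextind(s, 2)
ch = s[after_z]

println((n_bytes, n_chars, starts, mid_ok, after_z, ch))
```

Stepping with nextind (or iterating with eachindex) avoids landing inside a multi-byte character, which is the situation the "invalid UTF-8 character index" error reports.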


Regards

program...@gmail.com

Oct 19, 2016, 9:02:42 AM
to julia-users
Version 0.3.12. Should I update to 0.5?

Milan Bouchet-Valat

Oct 19, 2016, 9:04:13 AM
to julia...@googlegroups.com
On Wednesday, October 19, 2016 at 06:02 -0700, program...@gmail.com wrote:
> Version 0.3.12. Should I update to 0.5?
Yes. The 0.3.x versions have been unsupported for some time now.


Regards

Krisztián Pintér

Oct 19, 2016, 10:03:31 AM
to julia-users

On Wednesday, October 19, 2016 at 1:46:38 PM UTC+2, program...@gmail.com wrote:

julia> temp[61:65]
"aźdz"

julia> findin(temp[61:65],"d")
ERROR: invalid UTF-8 character index

In addition to the other answers, there is the search function, which is string-specific and might work with older versions (I can't test it now).

programistawpf

Oct 20, 2016, 7:38:51 AM
to julia...@googlegroups.com
I need a list of the positions of every ",", e.g. [3,6,8] ..., not only the first one ("3").
Paul
On 2016-10-19 at 16:03, Krisztián Pintér wrote:
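
For the request above (every comma position, not just the first), one possible sketch follows. It assumes a current Julia (1.x), where findin and search no longer exist, and uses the line from the original question as sample data; the variable names are hypothetical.

```julia
# Assumes Julia 1.x. The sample line is the one shown in the first message.
line = "3699778,13,2,gdbiehz jablej gupując szybgi Injehnej dg 26 paździehniga,1\n"

# Byte indices of every comma. eachindex visits only valid character
# starts, so indexing with i never lands inside 'ą' or 'ź'.
comma_bytes = [i for i in eachindex(line) if line[i] == ',']

# Character positions (1st, 2nd, ... character) of every comma.
# These differ from byte indices once a multi-byte character has been passed.
comma_chars = [k for (k, c) in enumerate(line) if c == ',']

# If the goal is the fields themselves, splitting avoids index arithmetic.
fields = split(chomp(line), ',')

println(comma_bytes)
println(comma_chars)
println(fields)
```

The byte indices are the ones that can be passed back into line[i]; the character positions are only counts. For CSV-like data, working with the result of split is usually simpler than slicing by index.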

program...@gmail.com

Oct 20, 2016, 1:32:15 PM
to julia-users
Julia 0.5 works, but there is a new problem with the space after ś and ć. More in a new post...

Paul

Gregory Salvan

Oct 21, 2016, 4:37:00 AM
to julia...@googlegroups.com
Hi,
There is a library that lets you specify the encoding when opening files:

https://github.com/nalimilan/StringEncodings.jl