
Character encoding


Petr Laznovsky

May 3, 2013, 2:23:57 PM
I have a .csv file downloaded from the internet and need to convert its encoding to cp852.

ENCA says the file is UTF-8, but "iconv.exe -f utf-8 -t cp852 c:\work\platby\vypis.csv >c:\work\platby\vypis-conv.csv"
gives "iconv: c:\work\platby\vypis.csv: cannot convert"

win_iconv.exe -f utf-8 -t cp1250 c:\work\platby\vypis.csv
conversion error: Illegal byte sequence

Where could the problem be? The original file seems to be OK.

L.

JJ

May 3, 2013, 9:41:19 PM
iconv/win_iconv don't play nice if the source file contains BOM, which is
this byte sequence: EF BB BF. If the file has it (at start of file), delete
it using a hex editor, then convert it with iconv. If the file doesn't have
it, it's probably not in UTF-8 encoding, is corrupted, or no longer encoded
in UTF-8 (e.g. it got converted by the program you used to download it).
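A minimal Python sketch of the checks described above, not something from the thread (the function name `diagnose` and the sample line are made up for illustration). It looks for the EF BB BF prefix first, then tries a strict UTF-8 decode, so the situations mentioned above (BOM present, no BOM, not UTF-8 or corrupted) are reported separately. A clean decode only shows the bytes are well-formed UTF-8; plain ASCII passes too.

```python
import codecs


def diagnose(data: bytes) -> str:
    """Roughly classify raw file bytes: UTF-8 with BOM, UTF-8 without BOM, or not UTF-8."""
    if data.startswith(codecs.BOM_UTF8):  # the EF BB BF signature
        return "utf-8 with BOM"
    try:
        data.decode("utf-8")  # strict decoding raises on any ill-formed byte sequence
    except UnicodeDecodeError as err:
        return f"not valid utf-8 (bad byte at offset {err.start})"
    return "utf-8 without BOM"


if __name__ == "__main__":
    line = "Datum;\u010c\u00e1stka\n"  # "Datum;Částka", hypothetical CSV header
    samples = {
        "bom": codecs.BOM_UTF8 + line.encode("utf-8"),
        "plain": line.encode("utf-8"),
        "cp1250": line.encode("cp1250"),  # what a re-encoding downloader might produce
    }
    for name, raw in samples.items():
        print(name, "->", diagnose(raw))
```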

Liviu

May 3, 2013, 10:46:38 PM
"Petr Laznovsky" <nob...@nowhere.com> wrote...
>
> Have .csv file downloaded from internet and need to convert encoding
> into cp852.
>
> ENCA says the file is UTF-8 but "iconv.exe -f utf-8 -t cp852
> c:\work\platby\vypis.csv >c:\work\platby\vypis-conv.csv"
> give "iconv: c:\work\platby\vypis.csv: cannot convert"
>
> win_iconv.exe -f utf-8 -t cp1250 c:\work\platby\vypis.csv
> conversion error: Illegal byte sequence

Not familiar with iconv, but there should be some option for a more
verbose diagnostic pointing to the exact offending line and character.
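iconv's own diagnostic options aren't covered in this thread. As a rough stand-in, here is a hypothetical Python helper, `first_unconvertible`, that reports the line, column and character the target code page cannot represent. It assumes the text has already been decoded from UTF-8; with plain `utf-8` decoding a leading BOM survives as U+FEFF, which cp852 has no slot for.

```python
def first_unconvertible(text: str, target: str = "cp852"):
    """Return (line, column, char) for the first character `target` cannot encode, or None."""
    # splitlines() drops the line terminators, so columns count characters within a line
    for lineno, line in enumerate(text.splitlines(), start=1):
        try:
            line.encode(target)  # strict mode raises on the first unmappable character
        except UnicodeEncodeError as err:
            # err.start is the index of the offending character within `line`
            return lineno, err.start + 1, line[err.start]
    return None


if __name__ == "__main__":
    raw = b"\xef\xbb\xbfDatum;Castka\r\n1.5.2013;100\r\n"  # hypothetical file contents
    text = raw.decode("utf-8")  # plain "utf-8" keeps the BOM as U+FEFF
    print(first_unconvertible(text))  # (1, 1, '\ufeff')
```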

That said, it might be that the file is not UTF-8 after all, or was
otherwise corrupted. Can't really tell without seeing the file itself.

FWIW it's possible to convert UTF-8 to an 8-bit codepage-encoded
text file without 3rd party utilities (basically, convert to UTF-16LE
then downconvert to a given codepage - note that the latter is lossy)
e.g. http://www.dostips.com/forum/viewtopic.php?p=21364#p21364
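The same two-step idea, sketched in Python rather than with the batch technique at the link (the function name and the `cp852` default are assumptions for this example). Python strings are already Unicode, so the explicit UTF-16LE round trip only mirrors the description; the lossy part is the final encode, where `errors="replace"` turns anything the code page lacks into `?`.

```python
def utf8_to_codepage(data: bytes, codepage: str = "cp852") -> bytes:
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present
    text = data.decode("utf-8-sig")
    # Step 1: UTF-8 -> UTF-16LE. Lossless: every Unicode character has a UTF-16 form.
    utf16 = text.encode("utf-16-le")
    # Step 2: UTF-16LE -> 8-bit code page. Lossy: unmappable characters become "?".
    return utf16.decode("utf-16-le").encode(codepage, errors="replace")


if __name__ == "__main__":
    out = utf8_to_codepage("\u010c\u00e1stka \u2713\r\n".encode("utf-8"))
    print(out)  # the check mark, which cp852 lacks, is written as b"?"
```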

Liviu


Liviu

May 3, 2013, 10:47:15 PM
"JJ" <d...@nah.meh> wrote...
>
> iconv/win_iconv don't play nice if the source file contains BOM,
> which is this byte sequence: EF BB BF

Don't know iconv firsthand, but that would be surprising on the part
of a utility specifically meant for converting between encodings.
Perhaps it has command line switches to deal with UTF-8 BOMs?

> If the file doesn't have it, it's probably not in UTF-8 encoding,

With this, I disagree. UTF-8 is a byte-oriented encoding (its code unit
is a single byte, even though one character takes 1 to 4 bytes), and
therefore doesn't need a BOM to disambiguate byte order. Most UTF-8
encoded files don't carry the BOM; it's not required and generally not
recommended except in cases where it's used as a tag or signature.
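A small Python illustration of that point (the helper `utf8_widths` is hypothetical, just for the demo): UTF-8 needs a varying number of bytes per character, but because its unit is one byte the sequence is identical everywhere, whereas UTF-16 has two-byte units and so two possible byte orders.

```python
def utf8_widths(chars):
    """Map each character to the number of bytes UTF-8 needs for it."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}


if __name__ == "__main__":
    # UTF-8 is variable width: 1 to 4 bytes per character ("A", "Č", "€", an emoji)
    print(list(utf8_widths(["A", "\u010c", "\u20ac", "\U0001F600"]).values()))  # [1, 2, 3, 4]
    # ...but its unit is one byte, so "Č" (U+010C) has a single byte sequence
    print("\u010c".encode("utf-8"))      # b'\xc4\x8c'
    # UTF-16 units are two bytes, so the byte order has to be agreed on (or marked by a BOM)
    print("\u010c".encode("utf-16-le"))  # b'\x0c\x01'
    print("\u010c".encode("utf-16-be"))  # b'\x01\x0c'
    # In UTF-8 the BOM (U+FEFF) is just a three-byte signature
    print("\ufeff".encode("utf-8"))      # b'\xef\xbb\xbf'
```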

Liviu


Petr Laznovsky

May 4, 2013, 4:00:35 AM
On 4.5.2013 3:41, JJ wrote:
> On Fri, 03 May 2013 20:23:57 +0200, Petr Laznovsky wrote:
>> Have .csv file downloaded from internet and need to convert encoding into
>> cp852.
>>
>> ENCA says the file is UTF-8 but "iconv.exe -f utf-8 -t cp852
>> c:\work\platby\vypis.csv >c:\work\platby\vypis-conv.csv" give "iconv:
>> c:\work\platby\vypis.csv: cannot convert"
>>
>> win_iconv.exe -f utf-8 -t cp1250 c:\work\platby\vypis.csv conversion
>> error: Illegal byte sequence
>>
>> Where the problem can be? Original file seems to be OK.
>
> iconv/win_iconv don't play nice if the source file contains BOM, which is
> this byte sequence: EF BB BF.

That's the point!

> If the file has it (at start of file), delete
> it using a hex editor, then convert it with iconv.

No need for a hex editor; running this line before the iconv command works fine:

tail --bytes=+4 vypis.csv >vypis-a.csv
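One caveat on the `tail` trick: `--bytes=+4` starts output at the fourth byte, so it drops the first three bytes whether or not they are a BOM, and running it on a file without one would eat real data. Below is a minimal Python sketch that strips the BOM only when present and then converts to cp852 (the function names are hypothetical; file names come from the command line). Python's `utf-8-sig` codec would do the stripping on its own; the explicit check just mirrors the `tail` step.

```python
import codecs
import sys


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop the 3-byte UTF-8 BOM only if it is actually there."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def convert_csv(src: str, dst: str, target: str = "cp852") -> None:
    # Binary I/O keeps the original line endings (e.g. \r\n) untouched
    with open(src, "rb") as f:
        raw = f.read()
    text = strip_utf8_bom(raw).decode("utf-8")  # strict: raises on invalid UTF-8
    with open(dst, "wb") as f:
        f.write(text.encode(target))  # strict: raises on characters the code page lacks


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python convert.py vypis.csv vypis-conv.csv
    convert_csv(sys.argv[1], sys.argv[2])
```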

Thank you very much!

L.



