convert special characters - Non-ISO extended-ASCII text


tob...@astroboymail.com
Aug 11, 2008, 8:58:19 AM
Hello,

I downloaded a web page with some Czech characters using wget.

Firefox displays all the characters correctly, but vi does not.

After removing the HTML tags, the file command said: "Non-ISO extended-ASCII text".

When looking at the file in vi I get

modre hory - ka<9e>dý veèer

INSTEAD OF

modre hory - každý večer

I already checked ascii2uni, but since <9e> is not an escape sequence I could only use "ascii2uni -a R" (the convert-raw-hexadecimal option), which is unreliable.

Does anybody have an idea what the smartest way to handle this would be? Maybe I could just translate the needed characters manually (with tr)?
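For reference, the garbled bytes can be reproduced and inspected directly (a sketch; sample.txt is a made-up file standing in for the downloaded page):

```shell
# Recreate the raw bytes that vi displays as "ka<9e>dý veèer"
printf 'ka\x9ed\xfd ve\xe8er\n' > sample.txt

# file flags the stray 0x80-0x9f byte as non-ISO extended ASCII
file sample.txt

# hexdump shows the offending 0x9e byte itself
hexdump -C sample.txt
```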

Regards,
Toby

tob...@astroboymail.com
Aug 11, 2008, 9:33:09 AM

I already tried iconv, but it couldn't convert the relevant character <e9> to UTF-8.

iconv --from-code=ISO-8859-1 --to-code=UTF-8 INFILE > OUTFILE
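For what it's worth, that command doesn't actually fail: ISO-8859-1 assigns 0x9e to the C1 control character U+009E, so iconv converts it "successfully" into an equally unreadable control code. A quick check:

```shell
# Byte 0x9e read as ISO-8859-1 becomes U+009E (a control character),
# which UTF-8 encodes as the two bytes c2 9e, not as "ž"
printf '\x9e' | iconv -f ISO-8859-1 -t UTF-8 | hexdump -C
```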

Cheers,
Toby

tob...@astroboymail.com
Aug 11, 2008, 9:49:48 AM

OK :) I found out my web document is in ISO-8859-2 (or, more specifically, Czech text, which ISO-8859-2 covers) and not ISO-8859-1. So I guess I should just convert it to UTF-8 and then be able to process it further. I tried

iconv -f ISO-8859-2 -t UTF-8 INFILE > OUTFILE

Unfortunately the <e9> stays in. Is it maybe because I don't have this ISO locale installed? How can I install it on Ubuntu?
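Side note: iconv doesn't need any locale installed; its conversion tables are built in. Assuming the byte in question is 0x9e, the reason it survives is that ISO-8859-2, like ISO-8859-1, leaves 0x80-0x9f as C1 control codes, so it passes through as U+009E again:

```shell
# ISO-8859-2 also maps 0x9e to the control character U+009E,
# so converting from ISO-8859-2 cannot recover "ž" either
printf '\x9e' | iconv -f ISO-8859-2 -t UTF-8 | hexdump -C
```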

Regards,
Toby

dufka....@gmail.com
May 11, 2013, 3:49:05 AM
If it's Czech text, try:

iconv -f CP1250 -t UTF-8 INFILE > OUTFILE
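That encoding fits the bytes shown earlier: in Windows-1250 (CP1250), 0x9e is "ž", 0xfd is "ý", and 0xe8 is "č". A quick check on the sample string (the input bytes here are the ones vi displayed):

```shell
# Decode the raw bytes as CP1250; this time 0x9e becomes "ž"
printf 'ka\x9ed\xfd ve\xe8er\n' | iconv -f CP1250 -t UTF-8
# -> každý večer
```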