I downloaded a webpage containing some Czech characters with wget.
Firefox displays all the characters correctly, but vi does not.
After removing the HTML tags, the file command reports: "Non-ISO
extended-ASCII text".
When looking at the file in vi I get
modre hory - ka<9e>dý veèer
INSTEAD OF
modre hory - každý večer
I already looked at ascii2uni, but since <9e> is not an escape
sequence I could only use "ascii2uni -a R" (the "convert raw
hexadecimal numbers" option), and that is unreliable.
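Concretely, what I tried was something like this (ascii2uni reads
standard input):

ascii2uni -a R < INFILE > OUTFILE

As far as I can tell, with -a R it will interpret any run of hex
digits in ordinary text as a character code, which is why it is
unreliable here.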
Does anybody have an idea what the smartest way to do this would be?
Maybe I could just translate the needed characters manually (with tr)?
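If I go that route, something like this GNU sed sketch is what I have
in mind (byte values read off what vi shows, so no guarantee they are
right):

sed -e 's/\x9e/ž/g' -e 's/\xe8/č/g' INFILE > OUTFILE

tr itself only substitutes byte for byte, so it cannot produce the
multi-byte UTF-8 replacements anyway.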
Regards,
Toby
I already tried iconv, but it could not convert the relevant
character <9e> to UTF-8:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 INFILE > OUTFILE
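To make sure this is not just a display problem in vi, I also dumped
the raw bytes, e.g.:

hexdump -C INFILE | less

and the offending character really is the single byte 0x9e.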
Cheers,
Toby
OK :) I found out my web document is in ISO-8859-2 (or, more
specifically, Czech, whose characters are a subset of ISO-8859-2) and
not ISO-8859-1. So I guess I should just convert it to UTF-8 and then
be able to process it further. I tried
iconv -f ISO-8859-2 -t UTF-8 INFILE > OUTFILE
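I checked the output the same way, e.g. with

hexdump -C OUTFILE | grep '9e'

and the byte is still in there.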
Unfortunately the <9e> stays in the file. Could it be because I do
not have this ISO locale installed? How can I install it on Ubuntu?
Regards,
Toby