thanks,
hagai
> I have a Hebrew text file, which I want to read in Python.
> I don't know which encoding I need to use & how to do that.
As for the "how", look at the codecs module; but if you don't know
which codec the text file was written in, I know of no way to guess it
from here!-)
Alex
To open an encoded file for reading, use
import codecs
f = codecs.open(filename, 'r', encoding='...')
Now calls like f.readline() will return unicode strings.
Here's an example, using a UTF-8 file I have lying around:
>>> f = codecs.open("/users/jepler/txt/UTF-8-demo.txt", "r", "utf-8")
>>> for i in range(5): print repr(f.readline())
...
u'UTF-8 encoded sample plain-text file\n'
u'\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\n'
u'\n'
u'Markus Kuhn [\u02c8ma\u02b3k\u028as ku\u02d0n] <mk...@acm.org> \u2014 1999-08-20\n'
u'\n'
Jeff
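A minimal sketch of the same idea applied to Hebrew text, assuming the file was saved in cp1255 (Windows Hebrew). The file name and its contents are made up for the example, and the code is written for Python 3, where the built-in open() also accepts an encoding argument and is the more common choice:

```python
import codecs
import os
import tempfile

# Hypothetical sample text: "shalom" in Hebrew letters, followed by a newline.
sample = u"\u05e9\u05dc\u05d5\u05dd\n"

# Write it out as raw cp1255 bytes to a throwaway file (assumed encoding).
path = os.path.join(tempfile.mkdtemp(), "hebrew-sample.txt")
with open(path, "wb") as out:
    out.write(sample.encode("cp1255"))

# codecs.open decodes the bytes as they are read, so readline() returns text.
f = codecs.open(path, "r", encoding="cp1255")
line = f.readline()
f.close()

print(repr(line))
```

Reading the file with the wrong codec name would not necessarily raise an error; it could just return the wrong characters.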
> I have a Hebrew text file, which I want to read in Python.
> I don't know which encoding I need to use
That's not a good start. But maybe it's one of the ones listed here?
http://sites.huji.ac.il/tex/hebtex_fontsrep.html
> how to do that
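f = open(myfile, "rb")  # binary mode: read the raw, undecoded bytes
text = f.readline()
followed by one of:
text = text.decode("iso-8859-8")  # ISO 8859-8 (ISO Hebrew)
text = text.decode("cp1255")      # Windows Hebrew
text = text.decode("cp862")       # DOS Hebrew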
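If you can't tell which of these three applies, one rough approach (not from the thread) is to decode the same bytes with each candidate and look at the results. The byte string below is a hypothetical sample, and the code is written for Python 3. A codec can decode without error and still give the wrong characters, so the output still has to be checked by eye:

```python
# Hypothetical raw bytes, as read from a file opened in binary mode:
# the word "shalom" as a Windows (cp1255) editor would save it.
raw = b"\xf9\xec\xe5\xed"

candidates = ("iso-8859-8", "cp1255", "cp862")
results = {}
for name in candidates:
    try:
        results[name] = raw.decode(name)
    except UnicodeDecodeError:
        # Some byte is undefined in this codec, so it can be ruled out.
        results[name] = None
    print(name, repr(results[name]))
```

With this sample, iso-8859-8 and cp1255 give the same Hebrew word, since the two put the basic letters at the same positions (0xE0-0xFA), while cp862 keeps its letters at 0x80-0x9A and turns these bytes into unrelated symbols.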
Alternatively, use:
import codecs
f = codecs.open(myfile, "r", encoding)
to get a stream that decodes things on the fly, where encoding is one
of the codec names above.
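On the "decodes things on the fly" part: codecs.open works by wrapping the underlying file in the codec's stream reader. A minimal sketch of doing that wrapping by hand with codecs.getreader, using an in-memory byte stream as a stand-in for a file opened in binary mode (the cp862 sample lines are invented for the example):

```python
import codecs
import io

# Stand-in for a file opened with open(name, "rb"): two short Hebrew lines
# encoded with cp862, the DOS Hebrew code page (sample data is invented).
raw_stream = io.BytesIO(u"\u05d0\u05d1\n\u05d2\u05d3\n".encode("cp862"))

# getreader() returns a StreamReader class for the codec; an instance wraps
# the byte stream and decodes each chunk as it is read.
reader = codecs.getreader("cp862")(raw_stream)

# Iterating the reader yields one decoded line at a time.
lines = list(reader)
print(lines)
```

This is handy when you already have a byte stream (a socket file, a pipe) rather than a file name to pass to codecs.open.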
</F>