> Whenever I use a utf-8 encoded dict and aff file there is no problem.
> But if the encoding is something else, this gives a "utf-8 encoding
> error" when I initialize the enchant.dict object.
> 1) How can I use these non-Unicode dictionaries with pyenchant?
The underlying enchant library works exclusively in utf-8, so this is
all PyEnchant knows how to work with. The encoding of the dictionary
files is supposed to be handled transparently by the spellchecker
backend (from your description I'm guessing you use the MySpell
backend).
So the short answer is: you can't. If the dictionary files are not
readable by the spellchecker backend, there's nothing pyenchant can do
about it.
Of course, this could be caused by a bug in enchant. If you post some
additional details I'll try to look into it a little deeper. Please
include the full error traceback from PyEnchant, details of your python
platform, and if possible the dict and aff file that is giving you
trouble.
> 2) How can I get the current encoding of a .dic file using pyenchant?
There's no API for it, since it's supposed to be handled transparently.
However, the .aff file should contain an encoding marker near the top of
the file. On my machine they all contain the line "SET ISO8859-15". If
you really need to determine the encoding, you could try parsing it out
of that "SET" line in the .aff file.
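
As a rough illustration of that parsing idea, here is a minimal sketch. It assumes the declaration appears as a "SET <name>" line within the first few kilobytes of the .aff file; the function name read_aff_encoding and the max_bytes limit are made up for this example and are not part of PyEnchant.

```python
import re

# Matches a "SET <encoding>" line. The pattern works on bytes because the
# file's real encoding is unknown until this line has been read.
_SET_LINE = re.compile(rb"^[ \t]*SET[ \t]+(\S+)", re.MULTILINE)


def read_aff_encoding(aff_path, max_bytes=4096):
    """Return the encoding named on the SET line of a .aff file, or None."""
    with open(aff_path, "rb") as f:
        head = f.read(max_bytes)
    # Skip a UTF-8 byte order mark so a SET on the first line still matches.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    match = _SET_LINE.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii", errors="replace")
```

Note that this only reports what the file claims about itself; it says nothing about whether the bytes on disk really are in that encoding.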
You could also try guessing the encoding with a library such as chardet:
http://chardet.feedparser.org/
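
For example, something along these lines should work, assuming the chardet package is installed. The guess_encoding helper and its max_bytes limit are just names invented for this sketch, and the result is a statistical guess that can easily be wrong for short or mostly-ASCII files.

```python
try:
    import chardet  # third-party package, not part of the standard library
except ImportError:
    chardet = None


def guess_encoding(path, max_bytes=100_000):
    """Return (encoding, confidence) as guessed by chardet.

    Returns None if chardet is not installed.
    """
    if chardet is None:
        return None
    with open(path, "rb") as f:
        sample = f.read(max_bytes)
    # chardet.detect() returns a dict with "encoding" and "confidence" keys.
    result = chardet.detect(sample)
    return result["encoding"], result["confidence"]
```

Closely related single-byte encodings (the various ISO-8859 variants, say) are hard to tell apart this way, so treat the answer as a hint rather than a fact.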
Cheers,
Ryan
--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ry...@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details
You can email them direct to me at this address (ry...@rfk.id.au) rather
than going through google groups.
Ryan
> I am sending the zip file. Inside you will see two differently encoded
> versions of the same dictionary file. Both .aff files specify
> "SET utf-8" although the real encoding is not utf-8 for either of them.
> If you initialize the enchant.dict object with the iso encoded .dic
> file, you'll see lots of errors.
This is the expected behaviour. The file explicitly declares that it is
encoded in utf-8, so it is read as utf-8 data. If I change the encoding
declaration to "SET ISO-8859-2" so that it matches the on-disk encoding,
then the file loads and works fine.
It would be better if this raised an actual exception instead of just
printing errors to the console, but I'm not sure if I can hook into the
dict reading routines at such a low level...
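
In the meantime, one workaround is to check the files yourself before handing them to enchant. The sketch below is only that, a sketch: check_declared_encoding is a hypothetical helper, not anything in PyEnchant, and it assumes Python has a codec for whatever name the .aff file declares (it may not for some of the names spellchecker backends accept).

```python
import codecs
import re


def check_declared_encoding(aff_path, dic_path):
    """Raise ValueError if dic_path does not decode under the .aff SET encoding.

    Returns the declared encoding name when the check passes.
    """
    with open(aff_path, "rb") as f:
        head = f.read(4096)
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        head = head[3:]
    match = re.search(rb"^[ \t]*SET[ \t]+(\S+)", head, re.MULTILINE)
    if match is None:
        raise ValueError(f"no SET line found in {aff_path}")
    declared = match.group(1).decode("ascii", errors="replace")
    try:
        codec = codecs.lookup(declared)
    except LookupError:
        raise ValueError(f"Python has no codec named {declared!r}") from None
    with open(dic_path, "rb") as f:
        data = f.read()
    try:
        data.decode(codec.name)
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"{dic_path} is not valid {declared}: bad byte at offset {exc.start}"
        ) from exc
    return declared
```

This catches the case above, where the file says utf-8 but contains ISO-8859-2 bytes. It cannot catch the reverse: most single-byte encodings will decode any byte sequence without complaint, so a wrong single-byte declaration passes silently.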
Cheers,