Help needed for using non-Unicode .dic files


asheq hamid

Oct 8, 2009, 8:43:16 AM
to pyenchant users
Hi all,
I am using pyenchant and it is very helpful. Whenever I use UTF-8 encoded
.dic and .aff files there is no problem. But if the encoding is
something else, I get a "utf-8 encoding error" when I initialize the
enchant.Dict object.
1) How can I use these non-Unicode dictionaries with pyenchant?
2) How can I get the current encoding of a .dic file using pyenchant?

Thanking you,
Asheq

Ryan Kelly

Oct 8, 2009, 8:33:22 PM
to pyencha...@googlegroups.com

Hi Asheq,

> Whenever I use UTF-8 encoded .dic and .aff files there is no problem.
> But if the encoding is something else, I get a "utf-8 encoding
> error" when I initialize the enchant.Dict object.
> 1) How can I use these non-Unicode dictionaries with pyenchant?

The underlying enchant library works exclusively in utf-8, so this is
all PyEnchant knows how to work with. The encoding of the dictionary
files is supposed to be handled transparently by the spellchecker
backend (from your description I'm guessing you use the MySpell
backend).

So the short answer is: you can't. If the dictionary files are not
readable by the spellchecker backend, there's nothing pyenchant can do
about it.

Of course, this could be caused by a bug in enchant. If you post some
additional details I'll try to look into it a little deeper. Please
include the full error traceback from PyEnchant, details of your Python
platform, and if possible the .dic and .aff files that are giving you
trouble.

> 2) How can I get the current encoding of a .dic file using pyenchant?

There's no API for it, since it's supposed to be handled transparently.

However, the .aff file should contain an encoding marker near the top of
the file. On my machine they all contain the line "SET ISO8859-15". If
you really need to determine the encoding, you could try parsing it out
of the file in this format.
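
As an illustration of parsing the marker out of the .aff file, here is a minimal Python sketch. The function name `declared_encoding` is made up for this example and is not part of PyEnchant. It assumes the marker is a line of the form "SET <name>", as in the "SET ISO8859-15" line quoted above.

```python
import codecs
import re


def declared_encoding(aff_path):
    """Return the name on the first SET line of a .aff file, or None.

    Hypothetical helper for illustration; not a PyEnchant API.
    """
    with open(aff_path, "rb") as f:
        for raw in f:
            # Drop a UTF-8 byte order mark if one precedes the line.
            if raw.startswith(codecs.BOM_UTF8):
                raw = raw[len(codecs.BOM_UTF8):]
            # latin-1 accepts every byte value, so this decode never fails;
            # the SET directive itself is plain ASCII.
            line = raw.decode("latin-1").strip()
            match = re.match(r"SET\s+(\S+)", line)
            if match:
                return match.group(1)
    return None
```

Calling `declared_encoding("some_dictionary.aff")` (a placeholder file name) returns the declared name as a string. Note that this only reports what the file claims; it does not verify that the data on disk really uses that encoding.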

You could also try guessing the encoding with a library such as chardet:

http://chardet.feedparser.org/
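
A rough sketch of the chardet approach, assuming the third-party chardet package is installed. The wrapper name `guess_encoding` and the sample size are arbitrary choices for this example. The result is a statistical guess and can be wrong, especially for closely related single-byte encodings.

```python
try:
    import chardet  # third-party package, not part of the standard library
except ImportError:
    chardet = None


def guess_encoding(path, sample_size=65536):
    """Return (encoding, confidence) as guessed by chardet for a file.

    Hypothetical helper for illustration; sample_size is an arbitrary cap.
    """
    if chardet is None:
        raise RuntimeError("chardet is not installed")
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    # chardet.detect takes bytes and returns a dict that includes
    # the keys 'encoding' and 'confidence'.
    result = chardet.detect(sample)
    return result["encoding"], result["confidence"]
```

The encoding may come back as None when chardet cannot decide, so the caller should be prepared for that.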

Cheers,

Ryan


--
Ryan Kelly
http://www.rfk.id.au | This message is digitally signed. Please visit
ry...@rfk.id.au | http://www.rfk.id.au/ramblings/gpg/ for details


asheq hamid

Oct 11, 2009, 12:45:01 AM
to pyenchant users
Hi,
Thanks for your reply. I wanted to send you the .dic file and .aff
file. But I don't see any attachment option here. How can I send it to
you?
Thanks,
Asheq

Ryan Kelly

Oct 11, 2009, 3:06:52 AM
to pyencha...@googlegroups.com

> Thanks for your reply. I wanted to send you the .dic file and .aff
> file. But I don't see any attachment option here. How can I send it to
> you?

You can email them directly to me at this address (ry...@rfk.id.au) rather
than going through Google Groups.

Ryan



Ryan Kelly

Oct 11, 2009, 8:25:05 PM
to Asheq Hamid, pyencha...@googlegroups.com

Hi Asheq,

> I am sending the zip file. Inside you will see two differently encoded
> versions of the same dictionary file. Both .aff files specify
> "SET utf-8" although the real encoding is not utf-8 for either of them.
> When you initialize the enchant.Dict object with the ISO-encoded .dic
> file, you'll see lots of errors.

This is the expected behaviour. The file explicitly declares that it is
encoded in utf-8, so it is read as utf-8 data. If I change the encoding
declaration to "SET ISO-8859-2" so that it matches the on-disk encoding,
then the file loads and works fine.
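
For anyone who prefers to script that change rather than edit the .aff file by hand, a minimal Python sketch follows. The function name `set_declared_encoding` is invented for this example, and it assumes the declaration starts at the beginning of a line. It overwrites the file in place, so keep a backup copy.

```python
import re


def set_declared_encoding(aff_path, new_encoding):
    """Rewrite the first SET line of a .aff file; return True if one was found.

    Hypothetical helper for illustration; not a PyEnchant API.
    """
    with open(aff_path, "rb") as f:
        data = f.read()
    # Work on bytes so the rest of the file is left untouched,
    # whatever encoding it really uses.
    pattern = re.compile(rb"^SET[ \t]+\S+", re.MULTILINE)
    replacement = b"SET " + new_encoding.encode("ascii")
    new_data, count = pattern.subn(lambda m: replacement, data, count=1)
    if count:
        with open(aff_path, "wb") as f:
            f.write(new_data)
    return bool(count)
```

For the files discussed here the call would look like `set_declared_encoding("some_dictionary.aff", "ISO-8859-2")`, where the file name is a placeholder.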

It would be better if this raised an actual exception instead of just
printing errors to the console, but I'm not sure if I can hook into the
dict reading routines at such a low level...
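
Until something like that exists, one possible workaround is to check the files in plain Python before handing them to enchant, and raise an exception there. The sketch below is only an illustration: the function name is made up, it does not touch enchant at all, and it assumes the name on the SET line is one that Python's codecs recognise (which will not hold for every name used in .aff files).

```python
import re


def check_dic_matches_declaration(aff_path, dic_path):
    """Raise ValueError if the .dic file cannot be decoded with the
    encoding declared on the .aff file's SET line; else return that name.

    Hypothetical pre-flight check for illustration; not a PyEnchant API.
    """
    declared = None
    with open(aff_path, "rb") as f:
        for raw in f:
            # Allow an optional UTF-8 byte order mark before the directive.
            m = re.match(rb"(?:\xef\xbb\xbf)?\s*SET\s+(\S+)", raw)
            if m:
                declared = m.group(1).decode("ascii", errors="replace")
                break
    if declared is None:
        raise ValueError("no SET line found in %s" % aff_path)

    with open(dic_path, "rb") as f:
        data = f.read()
    try:
        data.decode(declared)
    except LookupError:
        raise ValueError("Python does not know the encoding %r" % declared) from None
    except UnicodeDecodeError as exc:
        raise ValueError(
            "%s is declared as %s but does not decode as such: %s"
            % (dic_path, declared, exc)
        ) from exc
    return declared
```

This catches the case in this thread, where the .aff file says utf-8 but the .dic file contains ISO-8859-2 bytes. It cannot catch the reverse situation reliably, because single-byte encodings such as ISO-8859-2 will decode almost any byte sequence without error.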


Cheers,
