Differences between dictionaries + defaults

Matt Bennett

Mar 3, 2015, 3:08:57 PM
to pyencha...@googlegroups.com
Hello,

I have an OS X machine with enchant 1.6.0 (from homebrew) and pyenchant 1.6.6. Homebrew also installed aspell, but enchant appears to be using the MySpell provider:

>>> import enchant
>>> enchant.list_dicts()
[('de_DE', <Enchant: Myspell Provider>), ('en_AU', <Enchant: Myspell Provider>), ('en_GB', <Enchant: Myspell Provider>), ('en_US', <Enchant: Myspell Provider>), ('fr_FR', <Enchant: Myspell Provider>)]

There are some surprising differences between the "en_US" and "en_GB" dictionaries, for example:

>>> import enchant
>>> us = enchant.Dict('en_US')
>>> gb = enchant.Dict('en_GB')
>>> us.check('timeout')
True
>>> gb.check('timeout')
False
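
To see how far the two dictionaries diverge beyond a single word, a word list can be run through both and the disagreements reported. This is only a minimal sketch: `find_disagreements` is a hypothetical helper, not part of pyenchant, and it assumes nothing more than that each dictionary object has a `check(word)` method returning a boolean, as `enchant.Dict` does in the session above.

```python
def find_disagreements(words, dict_a, dict_b):
    """Return (word, in_a, in_b) for every word the two dictionaries disagree on.

    dict_a and dict_b only need a check(word) method; enchant.Dict objects
    are the intended inputs, but that is an assumption of this sketch.
    """
    disagreements = []
    for word in words:
        in_a = bool(dict_a.check(word))
        in_b = bool(dict_b.check(word))
        if in_a != in_b:
            disagreements.append((word, in_a, in_b))
    return disagreements
```

With the `us` and `gb` objects from the session above, `find_disagreements(['timeout'], us, gb)` should report `timeout` as accepted by en_US and rejected by en_GB, matching the two `check` results shown.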


I found the dictionary files in <virtualenv>/lib/python2.7/site-packages/enchant/share/enchant/myspell/, and there are other oddities too. For example en_GB.dic contains "i.e." (which surely gets tokenised) but omits "i". So using i.e. in a sentence results in a spelling error.
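
Oddities like the missing "i" can also be looked for directly in the .dic files rather than through enchant. The following is a rough sketch, assuming the common MySpell/Hunspell layout (an optional first line with the entry count, then one `word` or `word/FLAGS` entry per line). The function names and the default encoding are my own assumptions; escaped slashes and morphological fields are not handled.

```python
import io


def read_dic_words(lines):
    """Collect the bare words from the lines of a MySpell/Hunspell .dic file."""
    words = set()
    for index, raw in enumerate(lines):
        entry = raw.strip()
        if not entry:
            continue
        if index == 0 and entry.isdigit():
            continue  # header line holding the (approximate) entry count
        # Drop any affix flags after the first "/".
        word = entry.split("/", 1)[0].strip()
        if word:
            words.add(word)
    return words


def load_dic(path, encoding="latin-1"):
    """Read a .dic file from disk.

    The encoding default is an assumption; the matching .aff file's SET
    line names the real one.
    """
    with io.open(path, encoding=encoding) as handle:
        return read_dic_words(handle)
```

For example, `sorted(load_dic(us_path) - load_dic(gb_path))` would list the entries present only in the first file, where `us_path` and `gb_path` stand for wherever en_US.dic and en_GB.dic live on your machine.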

Where do these dictionaries come from? I note they're not part of the source distribution of pyenchant.

Figuring my dictionaries might just be out of date, I attempted to grab the latest version from https://wiki.openoffice.org/wiki/Dictionaries as per the tutorial. Several dead links later I ended up at http://wordlist.aspell.net/dicts/, which appears (?) to be the canonical source of myspell/hunspell dictionaries.

I downloaded some and overwrote the previous files in <virtualenv>/lib/python2.7/site-packages/enchant/share/enchant/myspell/ and they seem to fix all of the above inconsistencies. Is there somewhere global I can update so that all new installations of pyenchant get a copy of these dictionaries?
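
For reference, the overwrite step can be scripted so the bundled files are kept as backups. This is a sketch of what I did by hand, not anything pyenchant provides: `install_dictionaries` and its `backup_suffix` parameter are hypothetical, and `target_dir` would be the myspell directory mentioned above.

```python
import os
import shutil


def install_dictionaries(source_dir, target_dir, backup_suffix=".bak"):
    """Copy *.dic / *.aff files from source_dir into target_dir.

    Any file about to be overwritten is first copied aside with
    backup_suffix appended, so the bundled version can be restored.
    Returns the list of file names that were installed.
    """
    installed = []
    for name in sorted(os.listdir(source_dir)):
        if not name.endswith((".dic", ".aff")):
            continue  # ignore READMEs, licences and other extras
        destination = os.path.join(target_dir, name)
        if os.path.exists(destination):
            shutil.copy2(destination, destination + backup_suffix)
        shutil.copy2(os.path.join(source_dir, name), destination)
        installed.append(name)
    return installed
```

Restoring the originals is then a matter of copying each `.bak` file back over its counterpart.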

Thanks,
Matt.




Ryan Kelly

Mar 4, 2015, 3:13:31 PM
to pyencha...@googlegroups.com
On 4/03/2015 02:19, Matt Bennett wrote:
> Hello,
>
> I have an OS X machine with enchant 1.6.0 (from homebrew) and pyenchant
> 1.6.6. Homebrew also installed aspell, but enchant appears to be using
> the MySpell provider:
> [..snip..]
> I found the dictionary files in
> <virtualenv>/lib/python2.7/site-packages/enchant/share/enchant/myspell/,
> and there are other oddities too. For example en_GB.dic contains "i.e."
> (which surely gets tokenised) but omits "i". So using i.e. in a sentence
> results in a spelling error.
>
> Where do these dictionaries come from? I note they're not part of the
> source distribution of pyenchant.

They're built into the binary distributions via logic here:


https://github.com/rfk/pyenchant/tree/master/tools/pyenchant-bdist-osx-sources

The MySpell dictionaries were the simplest ones to ship bundled, which
is why they're available by default. The bundled copies are likely many
years old by now.

> Figuring my dictionaries might just be out of date, I attempted to grab
> the latest version from https://wiki.openoffice.org/wiki/Dictionaries as
> per the tutorial. Several dead links later I ended up at
> http://wordlist.aspell.net/dicts/, which appears (?) to be the canonical
> source of myspell/hunspell dictionaries.
>
> I downloaded some and overwrote the previous files
> in <virtualenv>/lib/python2.7/site-packages/enchant/share/enchant/myspell/ and
> they seem to fix all of the above inconsistencies. Is there somewhere
> global I can update so that all new installations of pyenchant get a
> copy of these dictionaries?

A PR against the above location would be great! There's also a
pyenchant-bdist-win32-sources directory that could be updated in parallel.


Cheers,

Ryan