Names of available languages to install


fgra...@gmail.com

Oct 4, 2014, 5:40:22 PM
to pyhy...@googlegroups.com
Hi Dr. Leo:

First, congratulations on what seems to be a very well thought-out and well-managed package!

I have just installed it and am still getting to know it, so maybe I am missing something that solves my problem, but so far I have been unable to find it.

I just want to install a dictionary for hyphenating Latin. *OpenOffice* allows the installation of such a language, so I assume it also exists in *LibreOffice*, whose repository the module searches by default. But even in *OpenOffice* I have not found the exact name I should ask for: 'la', 'la_ANY', 'la_IT', or something else? All my attempts have been in vain. It would be useful if the module provided a function that reported which dictionaries *are available* in a given repository.

Could you help me with this?

Thanks in advance

Francisco

Dr. Leo

Oct 5, 2014, 5:30:32 AM
to pyhy...@googlegroups.com
Hi Francisco,

thanks for your feedback.

This would be a very nice feature indeed. But I could not find a clean way to implement it:

The install function in the dictools module reads dictionaries via HTTP from the LibreOffice git repo; see http://cgit.freedesktop.org/libreoffice/dictionaries/plain/. I am not aware of any LibreOffice API for obtaining a list of installable dictionaries, say, through an XML file in that repo. I may be wrong, so please let me know if I have overlooked anything.

To solve your problem, you have the following options:

1. Retrieve the dic file for Latin from your local OpenOffice or LibreOffice installation, or download it from somewhere, and copy it to pyhyphen's package root. Install it via the 'register' function in the dictools module. You should then be able to instantiate the Hyphenator by specifying the language. You may have to specify the directory keyword argument.

2. Retrieve the local path of the Latin dict and instantiate Hyphenator with the keyword arg directory = <path>.

3. If 1 and 2 don't work for you, read the dictools module to understand how dictionaries are installed, i.e. how their metadata, including the local path, is stored in a pickled file. You can then add the info on Latin manually.
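
The "retrieve the local path" step in options 1 and 2 could be sketched roughly as follows. This is only an illustration using the Python standard library: `find_hyphenation_dict` is a hypothetical helper, not part of pyhyphen, and it assumes the dictionary file follows the hyph_<language>.dic naming pattern.

```python
from pathlib import Path
from typing import Optional


def find_hyphenation_dict(root: str, language: str) -> Optional[Path]:
    """Return the first hyph_<language>.dic found below root, or None.

    Hypothetical helper for illustration; not part of pyhyphen.
    """
    pattern = f"hyph_{language}.dic"
    # rglob walks the tree recursively; sorted() makes the pick deterministic.
    matches = sorted(Path(root).rglob(pattern))
    return matches[0] if matches else None
```

The result (or its parent folder, depending on what your pyhyphen version expects) is what would then be passed as the directory keyword argument.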

I hope this helps.
 
Regards

Leo

fgra...@gmail.com

Oct 6, 2014, 6:44:12 PM
to pyhy...@googlegroups.com
Many thanks, Leo, for your quick and informative response.

Exploring the link you provided to the *LibreOffice* repository, it is now clear that Latin is not included in it. On my own I finally found out how things work in *OpenOffice*: each language is treated as an *extension*, and all the required information about it is bundled in an *.oxt* archive file. For Latin its name is *dict-la_2013-03-31.oxt*. In its subdirectory *hyph_la* one finally finds the *utf-8* encoded file that holds the patterns, called *hyph_la.dic*.
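
For anyone repeating this, pulling the pattern file out of the extension could look roughly like the sketch below. It assumes the *.oxt* file is an ordinary ZIP archive (as OpenOffice extensions normally are) and that pattern files are named hyph_*.dic; `extract_hyphenation_dicts` is a hypothetical helper, not something pyhyphen or OpenOffice provides.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import List


def extract_hyphenation_dicts(oxt_path: str, dest_dir: str) -> List[Path]:
    """Copy every hyph_*.dic member of an .oxt archive into dest_dir.

    Hypothetical helper; assumes the .oxt file is a plain ZIP archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(oxt_path) as archive:
        for member in archive.namelist():
            # Archive member names always use forward slashes.
            name = PurePosixPath(member).name
            if name.startswith("hyph_") and name.endswith(".dic"):
                # Keep only the base name, flattening the archive's folders.
                target = dest / name
                target.write_bytes(archive.read(member))
                extracted.append(target)
    return extracted
```

For the Latin extension this would be called with the path to *dict-la_2013-03-31.oxt* and a destination folder of your choice.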

Following your instructions I copied it to */lib/site-packages/hyphen* and tried to install it using your first method, but without success. Your second suggestion, on the other hand, has worked nicely:

                h=Hyphenator(directory = r'C:/Documents and Settings/HP_Administrator/Anaconda3/lib/site-packages/hyphen/hyph_la.dic')

I will try to make the two other methods work too, if I can.

One suggestion in this regard: perhaps you could, without much effort, split the *install* function into two: one that locates the source on the web and downloads the data, and another that handles the internal *registration* in *hyphen*.
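
To make the idea concrete, here is a rough sketch of such a split. It is not pyhyphen's actual code: both function names, the injected `fetch` callable, and the registry layout (a pickled dict mapping language code to local path) are assumptions made for illustration only.

```python
import pickle
from pathlib import Path
from typing import Callable, Dict


def download_dictionary(url: str, dest: Path, fetch: Callable[[str], bytes]) -> Path:
    """Step 1 (hypothetical): fetch the raw .dic bytes and store them locally."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch(url))
    return dest


def register_dictionary(registry_file: Path, language: str, dic_path: Path) -> Dict[str, str]:
    """Step 2 (hypothetical): record language -> local path in a pickled dict."""
    registry: Dict[str, str] = {}
    if registry_file.exists():
        # Load earlier registrations so they are not overwritten.
        with registry_file.open("rb") as fh:
            registry = pickle.load(fh)
    registry[language] = str(dic_path)
    with registry_file.open("wb") as fh:
        pickle.dump(registry, fh)
    return registry
```

With this shape, a file obtained by hand (from an *.oxt* archive, say) would skip the first step and go straight to registration. In real use `fetch` could be a small wrapper around `urllib.request.urlopen(url).read()`.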

I have also tried without success to find documentation for the *hnj.pyd* library. Could you point me somewhere?

Many thanks and regards

Francisco

Dr. Leo

Oct 7, 2014, 2:00:11 AM
to pyhy...@googlegroups.com
Hi Francisco,

> Following your instructions I copied it to */lib/site-packages/hyphen* and tried to install it using your first method, but without success. Your second suggestion, on the other hand, has worked nicely:

I haven't looked at pyhyphen for a long time. The first method probably requires that the metadata be stored in the pickle file.


>                 h=Hyphenator(directory = r'C:/Documents and Settings/HP_Administrator/Anaconda3/lib/site-packages/hyphen/hyph_la.dic')
>
> I will try to make the two other methods work too, if I can.
>
> One suggestion in this regard: perhaps you could, without much effort, split the *install* function into two: one that locates the source on the web and downloads the data, and another that handles the internal *registration* in *hyphen*.

I thought I had done so. At least that was my intention. Any patch or pull request would be much appreciated for the next release. I will, however, move the repo to Bitbucket or GitHub at some point.

> I have also tried without success to find documentation for the *hnj.pyd* library. Could you point me somewhere?

The link to hnj_hyphen is in README.txt, or on the PyPI page for that matter. The C library is part of hunspell. The Python extension module *.pyd is compiled from the C lib.