all lower case in python3.3


Matthias Kasemann

Nov 15, 2012, 10:19:38 AM
to pyhy...@googlegroups.com
Hi,

On OS X 10.8 with Python 3.3, all results from PyHyphen calls come back in lower case:

>>> h_en.pairs('London')
[['lon', 'don']]
>>> h_de.pairs('Hausmeister')
[['haus', 'meister'], ['hausmeis', 'ter']]
>>> h_de.syllables('Hausmeister')
['haus', 'meis', 'ter']

Any idea where to look or what to correct?

Regards
MK

Dr.Leo

unread,
Nov 16, 2012, 3:59:15 AM11/16/12
to pyhy...@googlegroups.com
Hi,

The bug also occurs on win32 with Python 3.3.

I suspect it is caused by backward-incompatible changes to the C API in Python 3.3, which changes the unicode and string representation.

The hyphen.Hyphenator.pairs and syllables methods always pass lowercased words to the C function in hnjmodule.c. The 'mode' flag tells the C function whether, after hyphenating, it has to uppercase the whole word, only the first letter, or nothing. Mixed-case words are discarded before the C function is called. All this is based on the assumption that the machinery in hyphen.c does not properly handle uppercase words. This was definitely so in previous versions, but I haven't checked later versions of libhyphen.
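
The scheme described above can be sketched in pure Python. This is only an illustration, not PyHyphen's actual code: the names case_mode, restore_case and syllables_sketch, the flag values, and the hyphenate_lower callback (standing in for the C function) are all assumptions, and in PyHyphen the case is restored inside the C function, not in Python.

```python
# Hypothetical sketch of the lowercase / mode-flag scheme; not PyHyphen's code.
MODE_NONE, MODE_FIRST, MODE_ALL = 0, 1, 2  # assumed flag values


def case_mode(word):
    """Return the mode flag for a word, or None for mixed-case words."""
    if word.islower():
        return MODE_NONE
    if word.isupper():
        return MODE_ALL
    if word[:1].isupper() and word[1:].islower():
        return MODE_FIRST
    return None  # e.g. 'McDonald': discarded before hyphenation


def restore_case(parts, mode):
    """Re-apply the original capitalization to the hyphenated parts."""
    if mode == MODE_ALL:
        return [p.upper() for p in parts]
    if mode == MODE_FIRST and parts:
        return [parts[0][:1].upper() + parts[0][1:]] + list(parts[1:])
    return list(parts)


def syllables_sketch(word, hyphenate_lower):
    """Lowercase the word, hyphenate it, then restore its case."""
    mode = case_mode(word)
    if mode is None:
        return []
    return restore_case(hyphenate_lower(word.lower()), mode)
```

If the restore step is skipped or has no effect, the caller sees exactly the all-lowercase results reported at the start of this thread.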

Maybe changes to Python's C API interfere with this lowering / uppering stuff at some point. I do not see where though, as 'London' is plain 8-bit ASCII, even in UTF-8.
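
The point about the encoding is easy to confirm. A minimal check in plain Python, independent of PyHyphen:

```python
word = "London"

# Pure-ASCII text encodes to the same bytes in ASCII and in UTF-8,
# one byte per character.
ascii_bytes = word.encode("ascii")
utf8_bytes = word.encode("utf-8")

print(ascii_bytes == utf8_bytes)
print(len(utf8_bytes) == len(word))
```

Both comparisons hold, so the encoded bytes of such a word give no obvious reason for the case to be lost.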

I fear I won't have the time to track down this issue. So any helping hand is much appreciated. One could also ask for help in some developer forum such as comp.lang.python.

This bug is really critical. Thanks for spotting it.

Leo

Dr.Leo

Nov 16, 2012, 2:43:47 PM
to pyhy...@googlegroups.com

Hi,

I have just pushed some new changesets to the hg repository which should solve the problem. I have removed the upper/lowercase handling from the pairs() and syllables() methods. As a result, fully uppercased words such as 'CAPITALIZED' are passed to the hyphenator as such and thus not hyphenated, i.e. libhyphen still does not handle UPPERCASED words. This limitation is now exposed by PyHyphen. Words such as 'London' are passed unchanged as well. I have tested it on Python 3.3 and it works. This is, of course, merely a workaround, not a bugfix. I still don't know why the C extension behaves differently under Python 3.3.
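
One simple way to check for the lowercase problem is to join the returned pieces and compare them with the input word. The sketch below uses the outputs reported earlier in this thread instead of a live PyHyphen call; the helper name case_preserved is hypothetical, the capitalized list in the last line is an assumed example of a correct result, and the check assumes that hyphenation does not alter the spelling of the word.

```python
def case_preserved(word, pieces):
    """True if the pieces, joined back together, reproduce the word exactly."""
    return "".join(pieces) == word


# Outputs reported for Python 3.3 at the top of this thread.
reported_pairs = [['haus', 'meister'], ['hausmeis', 'ter']]
reported_syllables = ['haus', 'meis', 'ter']

print(all(case_preserved('Hausmeister', p) for p in reported_pairs))  # False
print(case_preserved('Hausmeister', reported_syllables))              # False

# Assumed example of a result with the case intact.
print(case_preserved('Hausmeister', ['Haus', 'meis', 'ter']))         # True
```

With a real installation, the lists would come from the pairs() and syllables() calls shown in the original report.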

While doing all this I stumbled upon another problem and fixed it: the LibreOffice people changed their directory structure for dictionaries, so the dict_url in setup.py had to be changed.

Please check if all this works for you. I will publish v2.1 on pypi on Wednesday.

Leo



-------- Original Message --------
Subject: Re: [PyHyphen] all lower case in python3.3
Date: Fri, 16 Nov 2012 09:59:15 +0100
From: Dr.Leo <fhax...@gmail.com>
Reply-To: fhax...@gmail.com
To: pyhy...@googlegroups.com

Matthias Kasemann

Nov 19, 2012, 7:20:48 AM
to pyhy...@googlegroups.com, fhax...@gmail.com, fhax...@googlemail.com
Hi,

In PyHyphen 2.0.1 the 'lowercase' problem seems to be fixed!

Congratulations, and thank you for the quick response. For me it would have meant long hours of digging into it...

Regards Matthias