
Alphabetic characters with respect to a given locale


candide

Apr 1, 2011, 4:55:42 PM
How can I retrieve the list of all characters defined as alphabetic for the
current locale?

eryksun

Apr 1, 2011, 7:08:47 PM
On Friday, April 1, 2011 4:55:42 PM UTC-4, candide wrote:
>
> How can I retrieve the list of all characters defined as alphabetic for the
> current locale?

Give this a shot:

In [1]: import string

In [2]: print string.letters
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

In [3]: import locale

In [4]: locale.getlocale()
Out[4]: (None, None)

In [5]: locale.setlocale(locale.LC_ALL, 'English_Great Britain')
Out[5]: 'English_United Kingdom.1252'

In [6]: print string.letters
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
ƒŠŒŽšœžŸªµºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ

Locale names differ between POSIX and Windows systems (for example,
'fr_FR.UTF-8' on Linux versus 'French_France.1252' on Windows).
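Python 3 dropped the locale-dependent string.letters; only the fixed string.ascii_letters remains. A minimal sketch of a substitute, assuming the locale uses a single-byte code page such as Windows cp1252, decodes every byte value and keeps the characters that str.isalpha() accepts. The helper name single_byte_letters is hypothetical, and because isalpha() relies on Unicode categories rather than the C library's tables, the result may not match string.letters byte for byte.

```python
def single_byte_letters(encoding):
    """Return the characters of a single-byte encoding that str.isalpha()
    accepts, in byte-value order (hypothetical helper)."""
    letters = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            # Some code pages leave byte values undefined; cp1252 does so
            # for 0x81, 0x8D, 0x8F, 0x90 and 0x9D.
            continue
        if ch.isalpha():
            letters.append(ch)
    return "".join(letters)


# cp1252 is the code page behind the 'English_United Kingdom.1252' locale above.
letters = single_byte_letters("cp1252")
print(letters)
```

On Windows, passing locale.getpreferredencoding(False) instead of the literal "cp1252" should follow the active ANSI code page. Under a UTF-8 locale, bytes above 0x7F do not decode on their own, so only ASCII letters come back.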

http://docs.python.org/library/locale.html

Emile van Sebille

Apr 1, 2011, 7:16:33 PM
to pytho...@python.org
On 4/1/2011 1:55 PM candide said...

> How can I retrieve the list of all characters defined as alphabetic for the
> current locale?

I think this is supposed to work, but for whatever reason it doesn't for me
when I test it after changing my locale (though I suspect that's a CentOS
thing)...

import locale
locale.setlocale(locale.LC_ALL, '')  # switch to the user's default locale
import string
print string.lowercase               # Python 2: updated by setlocale()

I don't know of anywhere else in Python that provides this.

However, you can test whether a string is alphabetic:

>>> val = u'caf' u'\xE9'
>>> val.isalpha()
True
>>>

...and check a character's Unicode category:

>>> import unicodedata
>>> unicodedata.category(u'a')
'Ll' # Letter, lowercase
>>> unicodedata.category(u'A')
'Lu' # Letter, uppercase
>>> unicodedata.category(u'1')
'Nd' # Number, decimal digit
>>> unicodedata.category(u'\x01')
'Cc' # Other, control
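Building on the category check above, here is a minimal Python 3 sketch (the helper name is_letter is hypothetical). The Python documentation defines str.isalpha() in terms of the Unicode "Letter" categories Lu, Ll, Lt, Lm and Lo, so testing the category should give the same answer; the cross-check confirms that on whatever interpreter runs it. It also shows one pitfall: an accent stored as a separate combining mark (category Mn) makes isalpha() return False until the string is normalized to NFC.

```python
import unicodedata

# The general categories that str.isalpha() treats as alphabetic.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter(ch):
    """True if the single character ch is in a Unicode Letter category."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


# Cross-check against str.isalpha() over the Basic Multilingual Plane.
mismatches = [c for c in range(0x10000) if is_letter(chr(c)) != chr(c).isalpha()]
print("mismatches:", len(mismatches))

# A decomposed accent is its own code point, a mark rather than a letter.
decomposed = "cafe\u0301"                            # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # 'caf\xe9'
print(unicodedata.category("\u0301"))                # Mn
print(decomposed.isalpha(), composed.isalpha())      # False True
```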


HTH,

Emile

candide

Apr 2, 2011, 9:18:18 AM
On 01/04/2011 22:55, candide wrote:
> How can I retrieve the list of all characters defined as alphabetic for the
> current locale?


Thanks for the responses. Alas, neither solution works.

Under Ubuntu:

# ----------------------
import string
import locale

print locale.getdefaultlocale()
print locale.getpreferredencoding()

locale.setlocale(locale.LC_ALL, "")

print string.letters

letter_class = u"[" + u"".join(unichr(c) for c in range(0x10000)
                               if unichr(c).isalpha()) + u"]"

#print letter_class
# ----------------------

prints the following:


('fr_FR', 'UTF8')
UTF-8
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz


I commented out the printing of letter_class because it outputs a flood of
characters that don't belong to the usual French character set.
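A rough Python 3 port of the letter_class construction above, offered as a sketch rather than as candide's code: unichr() is spelled chr() in Python 3, the class is compiled so it can be tried, and re.escape() is added as a precaution. As the message says, the result covers letters from every script, not just French ones.

```python
import re

# Python 3 spelling: chr() replaces unichr(), and every str is Unicode.
# Collect every Basic Multilingual Plane code point that isalpha() accepts;
# this is the same multi-script "flood", not a French subset.
letters = "".join(chr(c) for c in range(0x10000) if chr(c).isalpha())

# re.escape() is only a precaution: letters never include ']', '\\', '^'
# or '-', so it does not change what the class matches.
letter_class = "[" + re.escape(letters) + "]"
letter_re = re.compile(letter_class + "+")

print(len(letters))
print(bool(letter_re.fullmatch("café")))   # True
print(bool(letter_re.fullmatch("café1")))  # False
```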


Windows has more or less the same problem: for instance, string.letters
lists the "latin capital letter eth" (Ð) as an alphabetic character, which
is wrong for French, since that letter never appears in genuine French
words.
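Neither the C library's locale tables nor Unicode records which letters a given language actually uses, so one possible workaround is to spell the alphabet out by hand. A minimal Python 3 sketch, assuming a hand-maintained list is acceptable: FRENCH_LOWER below is a hypothetical list (the 26 basic letters plus the accented letters and ligatures of French spelling), compared against the letters of the cp1252 code page.

```python
# Hypothetical, hand-maintained list: the 26 basic Latin letters plus the
# accented letters and ligatures used in French spelling.
FRENCH_LOWER = "abcdefghijklmnopqrstuvwxyz" + "àâæçéèêëîïôœùûüÿ"
FRENCH_LETTERS = frozenset(FRENCH_LOWER + FRENCH_LOWER.upper())


def cp1252_letters():
    """Characters of the Windows-1252 code page that str.isalpha() accepts."""
    result = set()
    for byte in range(256):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # byte value undefined in cp1252
        if ch.isalpha():
            result.add(ch)
    return result


# Letters the code page offers that French text does not need (Ð among them).
extras = sorted(cp1252_letters() - FRENCH_LETTERS)
print("".join(extras))
```

Every letter in the French list is available in cp1252 (including Œ and Ÿ), so the difference only removes characters; it never has to add any.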

pyt...@bdurham.com

Apr 2, 2011, 10:13:49 AM
to pytho...@python.org
Candide,

Perhaps the Python Babel project has something that might help out?
http://babel.edgewall.org/

If this works out for you, could you share what you learn with the rest of
us? :)

Thanks and good luck!

Malcolm
