Newsgroups: comp.lang.python
From: Christian Heimes <li...@cheimes.de>
Date: Fri, 14 Sep 2012 00:00:45 +0200
Local: Thurs, Sep 13 2012 6:00 pm
Subject: Re: Least-lossy string.encode to us-ascii?
On 13.09.2012 23:26, Tim Chase wrote:
> I've got a bunch of text in Portuguese and to transmit them, need to
> have them in us-ascii (7-bit). I'd like to keep as much information
> as possible, just stripping accents, cedillas, tildes, etc. So
> "serviço móvil" becomes "servico movil". Is there anything stock
> that I've missed? I can do mystring.encode('us-ascii', 'replace')
> but that doesn't keep as much information as I'd hope.

The unidecode [1] package contains a large mapping of unicode chars to
ASCII. It even supports cool stuff like Chinese to ASCII:

>>> import unidecode
>>> print u"\u5317\u4EB0"
北亰
>>> print unidecode.unidecode(u"\u5317\u4EB0")
Bei Jing

icu4c and pyicu [2] may contain more methods for conversion but they [...]

>>> import icu
>>> rbf = icu.RuleBasedNumberFormat(icu.URBNFRuleSetTag.SPELLOUT,
...                                 icu.Locale.getUS())
>>> rbf.format(23)
u'twenty-three'
>>> rbf.format(100000)
u'one hundred thousand'

Regards,
[1] http://pypi.python.org/pypi/Unidecode/0.04.9
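
For the Portuguese example in the quoted question, the standard library alone may be enough. The following is a minimal sketch, assuming the text contains only accented Latin letters; strip_accents is a hypothetical helper name used for illustration and is not part of unidecode or ICU. It decomposes each character with unicodedata.normalize, drops the combining marks, and replaces anything still outside ASCII with "?".

```python
import unicodedata


def strip_accents(text):
    """Hypothetical helper: reduce accented Latin text to plain ASCII."""
    # NFKD splits a character such as u"\u00e7" (c with cedilla) into the
    # base letter "c" followed by a combining cedilla (U+0327).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    stripped = u"".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Characters with no ASCII base letter (for example CJK) become "?".
    return stripped.encode("ascii", "replace").decode("ascii")


# u"servi\u00e7o m\u00f3vil" is the example string from the question.
result = strip_accents(u"servi\u00e7o m\u00f3vil")
```

Unlike unidecode, this approach does not transliterate scripts such as Chinese; it only removes diacritics, so it fits the accent-stripping case and little else.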