Converting from UTF-8 to ASCII for gecos field.

Prentice Bisbal

Oct 14, 2010, 2:47:42 PM
to perl...@perl.org
Greetings.

Can anyone suggest a good way of converting a string from UTF-8 to IA5
(ASCII) for the gecos attribute? For example, I have the hypothetical
user Éríç Cártmán, with a lot of accented characters in his name.
Converting his name to ASCII using this code:

my $gecos = encode('ascii', $cn);

Turns it into this ugly mess:

gecos: ?r?? C?rtm?n

Anyone know of any decent perl functions that could turn it into
something more readable, like "Eric Cartman"?

Trying to set up regular expressions would be a nightmare.


--
Prentice

Peter Karman

Oct 14, 2010, 3:07:38 PM
to perl...@perl.org
Prentice Bisbal wrote on 10/14/10 1:47 PM:

> Greetings.
>
> Can anyone suggest a good way of converting a string from UTF-8 to IA5
> (ASCII) for the gecos attribute? For example, I have the hypothetical
> user Éríç Cártmán, with a lot of accented characters in his name.
> Converting his name to ASCII using this code:
>
> my $gecos = encode('ascii', $cn);
>
> Turns it into this ugly mess:
>
> gecos: ?r?? C?rtm?n
>
> Anyone know of any decent perl functions that could turn it into
> something more readable, like "Eric Cartman"?
>

use Search::Tools::Transliterate;

my $ascifier = Search::Tools::Transliterate->new( ebit => 0 );

my $gecos = $ascifier->convert($cn);

--
Peter Karman . http://peknet.com/ . pe...@peknet.com

Prentice Bisbal

Oct 14, 2010, 3:52:54 PM
to perl...@perl.org
Graham Barr wrote:

> On Oct 14, 2010, at 13:47 , Prentice Bisbal wrote:
>> Greetings.
>>
>> Can anyone suggest a good way of converting a string from UTF-8 to IA5
>> (ASCII) for the gecos attribute? For example, I have the hypothetical
>> user Éríç Cártmán, with a lot of accented characters in his name.
>> Converting his name to ASCII using this code:
>>
>> my $gecos = encode('ascii', $cn);
>>
>> Turns it into this ugly mess:
>>
>> gecos: ?r?? C?rtm?n
>>
>> Anyone know of any decent perl functions that could turn it into
>> something more readable, like "Eric Cartman"?
>
> Here are a couple of modules on CPAN that might be of some help
>
> http://search.cpan.org/perldoc?Text::Undiacritic
> http://search.cpan.org/perldoc?Lingua::Translit
>
> Graham.
>
>

Text::Undiacritic was exactly what I needed.

Thanks.
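
For anyone finding this thread later, here is a minimal sketch of the
general idea, using only core modules. It is not how Text::Undiacritic
is implemented, just a rough approximation of the same result: decompose
each accented character into a base letter plus combining marks, drop
the marks, then encode to ASCII. The helper name to_gecos is made up for
illustration, and the input is assumed to be an already-decoded
character string.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from any module): takes a decoded character
# string and returns an ASCII-only byte string.
sub to_gecos {
    my ($cn) = @_;

    # Canonical decomposition: e.g. U+00E9 becomes "e" + U+0301.
    my $decomposed = NFD($cn);

    # Remove the combining marks left behind by the decomposition.
    $decomposed =~ s/\p{Mn}+//g;

    # Characters with no decomposition (such as U+00F8 or U+00DF)
    # are still non-ASCII here and will be replaced with "?".
    return encode('ascii', $decomposed);
}

# The sample name from this thread, written with escapes so the
# script does not depend on the file's encoding.
my $cn    = "\N{U+C9}r\N{U+ED}\N{U+E7} C\N{U+E1}rtm\N{U+E1}n";
my $gecos = to_gecos($cn);
print "gecos: $gecos\n";    # prints "gecos: Eric Cartman"
```

Letters that are not a plain letter plus an accent will still come out
as "?", so a dedicated transliteration module is probably the better
choice if those turn up in your directory.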


--
Prentice
