Making Montreal match Montréal using re
Newsgroups: comp.lang.python
From: Skip Montanaro <s...@automatrix.com>
Date: 1998/01/27
Subject: Making Montreal match Montréal using re

I'm using the default locale (whatever US ASCII is), but I'd like to match
some words that have accented characters using the re module.  For instance,
I'd like the Americanized "Montreal" to match the French "Montréal".  I
thought the way to do this changed with 1.5 and the re module, but a search
of the Python locator and the re module documentation didn't turn up
anything useful.  How do I do this?

Thx,

Skip Montanaro    | Musi-Cal: http://concerts.calendar.com/
s...@calendar.com | Python Support: http://www.pythonpros.com/
(518)372-5583     | XEmacs: http://www.automatrix.com/~skip/xemacs/tip.html


Newsgroups: comp.lang.python
From: Guido van Rossum <gu...@CNRI.Reston.Va.US>
Date: 1998/01/27
Subject: Re: Making Montreal match Montréal using re

> I'm using the default locale (whatever US ASCII is), but I'd like to match
> some words that have accented characters using the re module.  For instance,
> I'd like the Americanized "Montreal" to match the French "Montréal".  I
> thought the way to do this changed with 1.5 and the re module, but a search
> of the Python locator and the re module documentation didn't turn up
> anything useful.  How do I do this?

If you don't want to change the locale, you'll have to make an
explicit translation table (so you'll have to decide exactly which
accented characters you want to map to which other characters) and
translate the string using string.translate().
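
A minimal sketch of the translation-table approach, assuming Python 3, where `str.maketrans()` and `str.translate()` take the place of the `string` module functions named above. The table contents and the helper name `fold_accents` are illustrative choices, not something given in the thread.

```python
import re

# Hypothetical table: which accented characters fold to which plain ones
# is a decision the caller has to make.
ACCENT_TABLE = str.maketrans({
    "\351": "e",  # é
    "\311": "E",  # É
    "\350": "e",  # è
    "\340": "a",  # à
    "\347": "c",  # ç
})

def fold_accents(text):
    """Return a copy of text with the characters in ACCENT_TABLE replaced."""
    return text.translate(ACCENT_TABLE)

# Fold the subject string first, then match the unaccented spelling.
pattern = re.compile(r"Montreal")
match = pattern.search(fold_accents("Concert in Montr\351al"))
```

Here `match` is a match object rather than `None`, because the folded string contains the plain spelling.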

--Guido van Rossum (home page: http://www.python.org/~guido/)


Newsgroups: comp.lang.python
From: ne...@inf.puc-rio.br
Date: 1998/01/28
Subject: Re: Making Montreal match Montréal using re

In article <199801270355.WAA12...@eric.CNRI.Reston.Va.US>,
  Guido van Rossum <gu...@CNRI.Reston.Va.US> wrote:

> > I'm using the default locale (whatever US ASCII is), but I'd like to match
> > some words that have accented characters using the re module.

> If you don't want to change the locale, you'll have to make an
> explicit translation table (so you'll have to decide exactly which
> accented characters you want to map to which other characters) and
> translate the string using string.translate().

What about ignoring case in accented languages?
I'd like to match '\351' and '\311', the same accented letter just that one is
capitalized.

Is the only way using string.translate()? Isn't it too slow?

If it is so, has anyone already created this translation table?

regards,
Paulo


Newsgroups: comp.lang.python
From: Guido van Rossum <gu...@CNRI.Reston.Va.US>
Date: 1998/01/28
Subject: Re: Making Montreal match Montréal using re

[me]

> > If you don't want to change the locale, you'll have to make an
> > explicit translation table (so you'll have to decide exactly which
> > accented characters you want to map to which other characters) and
> > translate the string using string.translate().

[Paulo]

> What about ignoring case in accented languages?
> I'd like to match '\351' and '\311', the same accented letter just
> that one is capitalized.

> Is the only way using string.translate()? Isn't it too slow?

It's implemented in C so it should be FAST!

> If it is so, has anyone already created this translation table?

You can easily build a table yourself using string.maketrans():
e.g. string.maketrans("\351", "\311") returns a translation table that
maps \351 to \311 (the arguments are strings that are to be mapped
one-by-one).
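
As a rough illustration on Python 3 (an assumption; the post refers to the Python 1.5 `string` module), `bytes.maketrans()` is the closest analogue for 8-bit strings: it returns a 256-byte table in which each byte of the first argument maps to the byte at the same position in the second.

```python
# Byte-string analogue of string.maketrans("\351", "\311") from the post.
table = bytes.maketrans(b"\351", b"\311")

# Only \351 is remapped; every other byte maps to itself.
translated = b"Montr\351al".translate(table)

# After translation the lower- and upper-case forms compare equal.
same = b"\351".translate(table) == b"\311".translate(table)
```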

But you're posting from Brazil, I presume -- if you set up your locale
correctly, shouldn't you be able to use \w and re.IGNORECASE to get
the right effect?

--Guido van Rossum (home page: http://www.python.org/~guido/)


Discussion subject changed to "Making Montreal match Montréal? using re" by Andrew Kuchling
Newsgroups: comp.lang.python
From: Andrew Kuchling <a...@magnet.com>
Date: 1998/01/28
Subject: Re: Making Montreal match Montréal? using re

ne...@inf.puc-rio.br wrote:
>What about ignoring case in accented languages?
>I'd like to match '\351' and '\311', the same accented letter just that one is
>capitalized.

>Is the only way using string.translate()? Isn't it too slow?

        I'm afraid string.translate or string.lower are your only
options; at the moment, a fixed table is used to map between upper and
lower-case, because making it fully dynamic based upon the re.LOCALE
flag was really messy.

akuchl...@acm.org             http://starship.skyport.net/crew/amk/
Modern disillusion is unlikely to last forever, and nothing rings so
hollow as the angst of yesterday.
        -- Robertson Davies, "Reading"


Discussion subject changed to "Accents (Was: Re: Making Montreal match Montréal using re)" by Paulo Eduardo Neves
Newsgroups: comp.lang.python
From: Paulo Eduardo Neves <ne...@inf.puc-rio.br>
Date: 1998/01/28
Subject: Accents (Was: Re: Making Montreal match Montréal using re)

Guido van Rossum wrote:
> [Paulo]
> > What about ignoring case in accented languages?
> > I'd like to match '\351' and '\311', the same accented letter just
> > that one is capitalized.

> > Is the only way using string.translate()? Isn't it too slow?

> It's implemented in C so it should be FAST!

Sure. But translating the string and then doing a pattern match is probably
slower than a single pattern match that ignores case.

Another problem is that I'd lose my original text.
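
One way around losing the original text, sketched here for Python 3 as an assumption: a one-to-one translation keeps the string length unchanged, so the offsets of a match in the translated copy can be used to slice the untouched original. The names `FOLD` and `find_in_original` are hypothetical.

```python
import re

# One-to-one mapping, so the folded copy has the same length as the original.
FOLD = str.maketrans("\351\311", "eE")  # é -> e, É -> E

def find_in_original(pattern, text):
    """Match against a folded copy, but return the slice of the original."""
    folded = text.translate(FOLD)
    m = re.search(pattern, folded, re.IGNORECASE)
    if m is None:
        return None
    start, end = m.span()
    return text[start:end]

found = find_in_original("montreal", "Visit Montr\351al today")
```

With this input `found` is the original accented spelling, `"Montr\351al"`, not the folded one.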

> > If it is so, has anyone already created this translation table?

> You can easily build a table yourself using string.maketrans():
> e.g. string.maketrans("\351", "\311") returns a translation table that
> maps \351 to \311 (the arguments are strings that are to be mapped
> one-by-one).

No problem, I just thought someone would have already done that.

> But you're posting from Brazil, I presume --

Yes.

>if you set up your locale
> correctly, shouldn't you be able to use \w and re.IGNORECASE to get
> the right effect?

I've tried it on Windows and it doesn't work. It is an English version of
Windows, but the language is set to Brazilian Portuguese.

I've just tried it on Unix and it didn't work there either.
Here is my test. The chars 'é' and 'É' probably won't look good in your
email; they are the chars '\351' and '\311', respectively:

/home/neves> locale
LANG=pt_BR
LC_COLLATE="pt_BR"
LC_CTYPE="pt_BR"
LC_MONETARY="pt_BR"
LC_NUMERIC="pt_BR"
LC_TIME="pt_BR"
LC_MESSAGES="pt_BR"
LC_ALL=
[nazareth]/home/neves> python
Python 1.5 (#12, Jan 22 1998, 22:21:32) [C] on aix4
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam

>>> a = '\351\351\311' * 3
>>> print a
ééÉééÉééÉ
>>> from re import *
>>> p = compile(r'(é+)', I)
>>> m = p.search(a)
>>> print m.group(1)

éé

It should have matched the whole string, right?
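
For comparison, a sketch of the same test on Python 3 (an assumption; the session above is Python 1.5 with 8-bit strings). With `str` patterns, `re.IGNORECASE` applies Unicode case rules, so no locale setup is involved here.

```python
import re

a = "\351\351\311" * 3                      # ééÉééÉééÉ
p = re.compile("(\351+)", re.IGNORECASE)    # pattern is é+, case-insensitive
m = p.search(a)
whole_string_matched = m.group(1) == a
```

On Python 3 `whole_string_matched` is `True`: the match covers all nine characters.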

--
Paulo Eduardo Neves
mailto:ne...@inf.puc-rio.br    Rio de Janeiro - Brasil
Pager-> Central:(021)532-4499  Cod.:213 99 64


Newsgroups: comp.lang.python
From: Guido van Rossum <gu...@CNRI.Reston.Va.US>
Date: 1998/01/28
Subject: Re: Accents (Was: Re: Making Montreal match Montréal using re)

No, you should have added

        >>> import locale
        >>> locale.setlocale(locale.LC_ALL, "")

at the start of your session, and passed I+L (or IGNORECASE+LOCALE) as
the flags to compile().

Unfortunately I just heard from Andrew Kuchling that the re module
doesn't do this right yet.  Nevertheless, if it *did* do it right, you
would still have to do what I said here (i.e. the locale is not used
automatically; you must call setlocale() *and* pass the L flag to
compile).
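
The two steps can be sketched as follows on Python 3, under the assumption that `re.LOCALE` is used with a bytes pattern (Python 3 rejects it for `str` patterns). Whether \351 and \311 are treated as the same letter depends on the locale actually installed on the machine; the helper name `locale_aware_search` is hypothetical.

```python
import locale
import re

def locale_aware_search(data, locale_name=""):
    """Search bytes for a run of byte 0351, ignoring case per the locale.

    Returns the matched bytes, or None if the locale is unavailable or
    nothing matched.
    """
    saved = locale.setlocale(locale.LC_ALL)  # remember the current setting
    try:
        # Step 1: the locale is not used automatically; call setlocale().
        locale.setlocale(locale.LC_ALL, locale_name)
        # Step 2: pass LOCALE together with IGNORECASE to compile().
        pattern = re.compile(b"(\351+)", re.IGNORECASE | re.LOCALE)
        m = pattern.search(data)
        return m.group(1) if m else None
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore process-wide locale

result = locale_aware_search(b"\351\351\311" * 3, "C")
```

Passing `""` as `locale_name` picks up the user's environment, as in the post; `"C"` is used in the call above only because it is always available.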

--Guido van Rossum (home page: http://www.python.org/~guido/)


Newsgroups: comp.lang.python
From: Paulo Soares <psoa...@consiste.pt>
Date: 1998/01/28
Subject: RE: Accents (Was: Re: Making Montreal match Montréal using re)

On Wednesday, January 28, 1998 20:00, Paulo Eduardo

I have an application in Visual C++ (Win95/NT) where I call
'setlocale(LC_TIME, "portuguese")' to make sure that the result of
strftime is always a Portuguese string, regardless of the Windows version
it runs on. It works with both the US and the Portuguese versions of
Win95. Perhaps if you explicitly set the locale it will work.
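
A rough Python equivalent of setting the locale explicitly, as a sketch only. Locale names are platform-specific: "portuguese" is the name used in the C call above, "pt_BR" appears in the Unix session earlier in the thread, and which of them exists on a given machine is an assumption. The helper name `month_name_in` is hypothetical.

```python
import locale
import time

def month_name_in(locale_names, month=1):
    """Return strftime's month name under the first available locale."""
    saved = locale.setlocale(locale.LC_TIME)  # remember the current setting
    try:
        for name in locale_names:
            try:
                locale.setlocale(locale.LC_TIME, name)
            except locale.Error:
                continue  # this name is not installed here; try the next
            # Only the month field matters for the %B format.
            return time.strftime("%B", (2000, month, 1, 0, 0, 0, 5, 1, 0))
        return None
    finally:
        locale.setlocale(locale.LC_TIME, saved)

# Try the explicit names first and fall back to the always-present "C".
name = month_name_in(["portuguese", "pt_BR", "C"])
```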

Best Regards,
Paulo Soares
psoa...@consiste.pt

