
Why re.match()?


kj

Jul 1, 2009, 1:56:55 PM

For a recovering Perl-head like me it is difficult to understand
why Python's re module offers both match and search. Why not just
use search with a beginning-of-string anchor? I find it particularly
puzzling because I have this (possibly mistaken) idea that the
Python design philosophy tends towards minimalism, a sort of Occam's
razor, when it comes to language entities; i.e. having re.match
along with re.search seems to me like an "unnecessary multiplication
of entities". What am I missing?

TIA!

kj

Duncan Booth

Jul 1, 2009, 2:52:34 PM
kj <no.e...@please.post> wrote:

RTFM?

http://docs.python.org/library/re.html#matching-vs-searching says:

> Note that match may differ from search even when using a regular
> expression beginning with '^': '^' matches only at the start of the
> string, or in MULTILINE mode also immediately following a newline. The
> "match" operation succeeds only if the pattern matches at the start of
> the string regardless of mode, or at the starting position given by
> the optional pos argument regardless of whether a newline precedes
> it.

So, for example:

>>> re.compile("c").match("abcdef", 2)
<_sre.SRE_Match object at 0x0000000002C09B90>
>>> re.compile("^c").search("abcdef", 2)
>>>

The ^ anchors you to the start of the string (or start of line in multiline
mode) even if you specify a non-zero position. The match method just tells
you if the pattern matches at the specified starting position regardless of
whether it is the start of the string.
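Below is a minimal sketch of Duncan's example in current Python 3. Match objects now print as `<re.Match object; ...>` rather than `<_sre.SRE_Match ...>`, so the sketch prints positions instead. The variable names are illustrative only:

```python
import re

text = "abcdef"

# Pattern.match() tries the pattern only at pos, wherever pos happens to be.
anchored_at_pos = re.compile("c").match(text, 2)

# "^" means "start of the string" (or "just after a newline" with
# re.MULTILINE). A non-zero pos does not count as a start, so this finds nothing.
caret_from_pos = re.compile("^c").search(text, 2)

# With re.MULTILINE, "^" can also match right after a newline.
multiline = re.compile("^c", re.MULTILINE).search("ab\ncdef", 2)

print(anchored_at_pos.group(), anchored_at_pos.start())  # c 2
print(caret_from_pos)                                    # None
print(multiline.start())                                 # 3
```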

Carl Banks

Jul 1, 2009, 6:33:01 PM

It always seemed redundant to me also (notwithstanding Duncan Booth's
explanation of slight semantic differences). However, I find myself
using re.match much more often than re.search, so perhaps in this case
a "second obvious way" is justified.


Carl Banks

MRAB

Jul 1, 2009, 7:50:16 PM
to pytho...@python.org
re.match is anchored at a certain position, whereas re.search isn't. The
re module doesn't support Perl's \G anchor.
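Here is a small sketch of MRAB's point, using a made-up pattern and string. With a `pos` argument, `match()` stays pinned to that position. That roughly covers the common lexing use of Perl's `\G`, although this is an analogy, not an equivalence. `search()`, by contrast, is free to scan forward:

```python
import re

WORD = re.compile(r"[a-z]+")
text = "abc def"

# match() is pinned to pos: index 3 is a space, so it fails outright
# instead of skipping ahead to the next word.
print(WORD.match(text, 3))           # None

# search() moves forward from pos and finds "def" starting at index 4.
print(WORD.search(text, 3).group())  # def
```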

John Machin

Jul 1, 2009, 9:20:29 PM

re.search is obstinately determined when it comes to flogging dead
horses:

C:\junk>\python26\python -mtimeit -s"import re" "re.match('xxx', 'y'*100)"
100000 loops, best of 3: 3.37 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.search('^xxx', 'y'*100)"
100000 loops, best of 3: 7.01 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.match('xxx', 'y'*1000)"
100000 loops, best of 3: 3.85 usec per loop

C:\junk>\python26\python -mtimeit -s"import re" "re.search('^xxx', 'y'*1000)"
10000 loops, best of 3: 37.9 usec per loop

C:\junk>
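The sketch below reproduces John Machin's comparison portably, calling the `timeit` module from inside Python instead of the Windows command line. The string sizes and repetition count are arbitrary choices. Absolute numbers depend on the machine and the Python version. Current releases of the regex engine may not show the same gap that Python 2.6 did, so treat the output as something to measure, not a known result:

```python
import re
import timeit


def compare(n, number=2_000):
    """Time re.match('xxx', s) against re.search('^xxx', s) on s = 'y' * n.

    Neither call can succeed. The question is how much work each one does
    before giving up. Returns (match_usec, search_usec) per call.
    """
    subject = "y" * n
    t_match = timeit.timeit(lambda: re.match("xxx", subject), number=number)
    t_search = timeit.timeit(lambda: re.search("^xxx", subject), number=number)
    # timeit returns total seconds; convert to microseconds per call.
    return t_match / number * 1e6, t_search / number * 1e6


for n in (100, 1_000, 10_000):
    m, s = compare(n)
    print(f"n={n:>6}: match {m:.2f} usec, search('^...') {s:.2f} usec")
```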

kj

Jul 1, 2009, 11:49:57 PM
In <Xns9C3BCA27AB...@127.0.0.1> Duncan Booth <duncan...@invalid.invalid> writes:
>So, for example:

>>>> re.compile("c").match("abcdef", 2)
><_sre.SRE_Match object at 0x0000000002C09B90>
>>>> re.compile("^c").search("abcdef", 2)
>>>>

I find this unconvincing; with re.search alone one could simply
do:

>>> re.compile("^c").search("abcdef"[2:])
<_sre.SRE_Match object at 0x75918>

No need for re.match(), at least as far as your example shows.
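The slicing workaround has a practical cost beyond the copy itself. The slice is a new string, so the positions the match reports are relative to the slice, not to the original text, and a tokenizer would have to add the offset back by hand. A minimal sketch:

```python
import re

text = "abcdef"

# Slicing builds a new string, so offsets are relative to the slice...
sliced = re.compile("^c").search(text[2:])

# ...whereas passing pos keeps offsets relative to the original string.
pinned = re.compile("c").match(text, 2)

print(sliced.span())  # (0, 1)
print(pinned.span())  # (2, 3)
```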

Maybe there are times when re.match() is more "convenient" in some
way, but it is awfully Perlish to "multiply language elements" for
the sake of this trifling convenience.

kynn

Steven D'Aprano

Jul 2, 2009, 12:14:36 AM
On Thu, 02 Jul 2009 03:49:57 +0000, kj wrote:

> In <Xns9C3BCA27AB...@127.0.0.1> Duncan Booth
> <duncan...@invalid.invalid> writes:
>>So, for example:
>
>>>>> re.compile("c").match("abcdef", 2)
>><_sre.SRE_Match object at 0x0000000002C09B90>
>>>>> re.compile("^c").search("abcdef", 2)
>>>>>
>
> I find this unconvincing; with re.search alone one could simply do:
>
>>>> re.compile("^c").search("abcdef"[2:])
> <_sre.SRE_Match object at 0x75918>
>
> No need for re.match(), at least as far as your example shows.

Your source string "abcdef" is tiny. Consider the case where the source
string is 4GB of data. You want to duplicate the whole lot, minus two
characters. Not so easy now.


> Maybe there are times when re.match() is more "convenient" in some way,
> but it is awfully Perlish to "multiply language elements" for the sake
> of this trifling convenience.

No, it's absolutely Pythonic.


>>> import this
...
Although practicality beats purity.

--
Steven

kj

Jul 2, 2009, 7:19:40 AM

>On Thu, 02 Jul 2009 03:49:57 +0000, kj wrote:

>> In <Xns9C3BCA27AB...@127.0.0.1> Duncan Booth
>> <duncan...@invalid.invalid> writes:
>>>So, for example:
>>
>>>>>> re.compile("c").match("abcdef", 2)
>>><_sre.SRE_Match object at 0x0000000002C09B90>
>>>>>> re.compile("^c").search("abcdef", 2)
>>>>>>
>>
>> I find this unconvincing; with re.search alone one could simply do:
>>
>>>>> re.compile("^c").search("abcdef"[2:])
>> <_sre.SRE_Match object at 0x75918>
>>
>> No need for re.match(), at least as far as your example shows.

>Your source string "abcdef" is tiny. Consider the case where the source
>string is 4GB of data. You want to duplicate the whole lot, minus two
>characters. Not so easy now.

I'm sure that it is possible to find cases in which the *current*
implementation of re.search() would be inefficient, but that's
because this implementation is perverse, which, I guess, is ultimately
the point of my original post. Why privilege the special case of
a start-of-string anchor? What if you wanted to apply an end-anchored
pattern to some prefix of your 4GB string? Why not have a special
re method for that? And another for every possible special case?

If the concern is efficiency for such cases, then simply implement
optional offset and length parameters for re.search(), to specify
any arbitrary substring to apply the search to. To have a special-case
re.match() method in addition to a general re.search() method is
antithetical to language minimalism, and plain-old bizarre. Maybe
there's a really good reason for it, but it has not been mentioned
yet.
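Much of what kj suggests already exists, at least on compiled patterns. `Pattern.search()` and `Pattern.match()` accept optional `pos` and `endpos` arguments, which the module-level `re.search()` does not. The `re` docs say that with `endpos` it is as if the string were only `endpos` characters long. The sketch below uses an 18-character string as a stand-in for a large one. It shows an end-anchored search over a prefix and a windowed search, neither of which copies the data. It also shows the one thing `pos` does not do: make `^` match there.

```python
import re

data = "abcdef" * 3  # stands in for a very large string

# endpos makes the engine behave as if the string ended there, so "$"
# anchors at endpos: an end-anchored search over a prefix, with no slicing.
prefix_hit = re.compile("c$").search(data, 0, 3)

# pos and endpos together restrict search() to a window of the original string.
window_hit = re.compile("e").search(data, 6, 12)

# "^" is the exception: it does not treat pos as the start of the string.
caret_miss = re.compile("^d").search(data, 3)

print(prefix_hit.span())  # (2, 3)
print(window_hit.span())  # (10, 11)
print(caret_miss)         # None
```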

kj

Hrvoje Niksic

Jul 2, 2009, 7:55:39 AM
kj <no.e...@please.post> writes:

> For a recovering Perl-head like me it is difficult to understand
> why Python's re module offers both match and search. Why not just
> use search with a beginning-of-string anchor?

I need re.match when parsing the whole string. In that case I never
want to search through the string, but to process all of it with
some regular expression, as when tokenizing. For example:

    pos = 0
    while pos != len(s):
        match = TOKEN_RE.match(s, pos)
        if match:
            process_token(match)
            pos = match.end()
        else:
            raise ParseError('invalid syntax at position %d' % pos)
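Hrvoje's fragment leaves `TOKEN_RE`, `process_token` and `ParseError` undefined. Below is a self-contained sketch of the same loop. The token grammar is hypothetical (numbers, names, single-character operators and whitespace), and collecting tokens in a list stands in for `process_token`:

```python
import re

# Hypothetical token set. No alternative can match the empty string,
# so every successful match advances pos and the loop terminates.
TOKEN_RE = re.compile(
    r"(?P<NUM>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[-+*/()])|(?P<WS>\s+)"
)


class ParseError(Exception):
    """Raised when no token pattern matches at the current position."""


def tokenize(s):
    tokens = []
    pos = 0
    while pos != len(s):
        # match() only tries at pos, so unrecognised text is reported
        # instead of being silently skipped, as search() would do.
        match = TOKEN_RE.match(s, pos)
        if match is None:
            raise ParseError("invalid syntax at position %d" % pos)
        if match.lastgroup != "WS":  # drop whitespace tokens
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x1 + 42*(y - 7)"))
```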

Steven D'Aprano

Jul 3, 2009, 4:19:23 AM
On Thu, 02 Jul 2009 11:19:40 +0000, kj wrote:

> I'm sure that it is possible to find cases in which the *current*
> implementation of re.search() would be inefficient, but that's because
> this implementation is perverse, which, I guess, is ultimately the point
> of my original post. Why privilege the special case of a
> start-of-string anchor?

Because wanting to see if a string matches from the beginning is a very
important and common special case.


> What if you wanted to apply an end-anchored
> pattern to some prefix of your 4GB string? Why not have a special re
> method for that? And another for every possible special case?

Because they're not common special cases. They're rare and not special
enough to justify special code.


> If the concern is efficiency for such cases, then simply implement
> optional offset and length parameters for re.search(), to specify any
> arbitrary substring to apply the search to. To have a special-case
> re.match() method in addition to a general re.search() method is
> antithetical to language minimalism, and plain-old bizarre. Maybe
> there's a really good reason for it, but it has not been mentioned yet.

There is, and it has. You're welcome to keep your own opinion, but I
don't think you'll find many experienced Python coders who agree with it.


--
Steven

kj

Jul 3, 2009, 9:38:20 AM
In <025db0a6$0$20657$c3e...@news.astraweb.com> Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> writes:

>On Thu, 02 Jul 2009 11:19:40 +0000, kj wrote:

>> If the concern is efficiency for such cases, then simply implement
>> optional offset and length parameters for re.search(), to specify any
>> arbitrary substring to apply the search to. To have a special-case
>> re.match() method in addition to a general re.search() method is
>> antithetical to language minimalism, and plain-old bizarre. Maybe
>> there's a really good reason for it, but it has not been mentioned yet.

>There is, and it has.

I "misspoke" earlier. I should have written "I'm *sure* there's
a really good reason for it." And I think no one in this thread
(myself included, of course) has a clue of what it is. I miss the
days when Guido still posted to comp.lang.python. He'd know.

Regarding the "practicality beats purity" line, it's hard to think
of a better motto for *Perl*, with all its practicality-oriented
special doodads. (And yes, I know where the "practicality beats
purity" line comes from.) Even *Perl* does not have a special
syntax for the task that re.match is supposedly tailor-made for,
according to the replies I've received. Given that it is so trivial
to implement all of re.match's functionality with only one additional
optional parameter for re.search (i.e. pos), it is absurd to claim
that re.match is necessary for the sake of this special functionality.
The justification for re.match must be elsewhere.
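Here is kj's proposed reduction spelled out as a sketch. `match_via_search` is a hypothetical helper, not part of `re`. `search()` returns the leftmost match, so checking that the match starts exactly at `pos` gives the same answer as `match()`. The difference is on failure: `search()` may keep scanning the rest of the string before giving up, a cost similar to the one John Machin's timings show.

```python
import re


def match_via_search(pattern, string, pos=0):
    """Hypothetical helper: emulate pattern.match(string, pos) with search().

    search() tries positions from pos onward and returns the leftmost match,
    so if a match exists at exactly pos, that is the one it returns.
    """
    m = pattern.search(string, pos)
    # When there is no match at pos, search() may have scanned much of the
    # remaining string to find a later match (or none): wasted work here.
    return m if m is not None and m.start() == pos else None


rx = re.compile("c")
print(match_via_search(rx, "abcdef", 2).span())  # (2, 3)
print(match_via_search(rx, "abcdef", 1))         # None, like rx.match("abcdef", 1)
```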

But thanks for letting me know that I'm entitled to my opinion.
That's a huge relief.

kj

MRAB

Jul 3, 2009, 9:53:42 AM
to pytho...@python.org
As I wrote, re.match anchors the match whereas re.search doesn't. An
alternative would have been to implement Perl's \G anchor, but I believe
that that was invented after the re module was written.

Aahz

Jul 3, 2009, 10:31:36 AM
In article <h2l1kc$2tj$1...@reader1.panix.com>, kj <no.e...@please.post> wrote:
>In <025db0a6$0$20657$c3e...@news.astraweb.com> Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> writes:
>>On Thu, 02 Jul 2009 11:19:40 +0000, kj wrote:
>>>
>>> If the concern is efficiency for such cases, then simply implement
>>> optional offset and length parameters for re.search(), to specify any
>>> arbitrary substring to apply the search to. To have a special-case
>>> re.match() method in addition to a general re.search() method is
>>> antithetical to language minimalism, and plain-old bizarre. Maybe
>>> there's a really good reason for it, but it has not been mentioned yet.
>>
>>There is, and it has.
>
>I "misspoke" earlier. I should have written "I'm *sure* there's
>a really good reason for it." And I think no one in this thread
>(myself included, of course) has a clue of what it is. I miss the
>days when Guido still posted to comp.lang.python. He'd know.

You may find this enlightening:

http://www.python.org/doc/1.4/lib/node52.html
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha

Bruno Desthuilliers

Jul 3, 2009, 11:22:40 AM
kj wrote:
(snip)

> To have a special-case
> re.match() method in addition to a general re.search() method is
> antithetical to language minimalism,

FWIW, Python has no pretension to minimalism.

kj

Jul 3, 2009, 12:30:33 PM
In <h2l4o8$sn3$1...@panix3.panix.com> aa...@pythoncraft.com (Aahz) writes:

>In article <h2l1kc$2tj$1...@reader1.panix.com>, kj <no.e...@please.post> wrote:

>You may find this enlightening:

>http://www.python.org/doc/1.4/lib/node52.html

Indeed. Thank you.

kj

Lie Ryan

Jul 3, 2009, 12:34:49 PM
Steven D'Aprano wrote:
> On Thu, 02 Jul 2009 11:19:40 +0000, kj wrote:
>
>> I'm sure that it is possible to find cases in which the *current*
>> implementation of re.search() would be inefficient, but that's because
>> this implementation is perverse, which, I guess, is ultimately the point
>> of my original post. Why privilege the special case of a
>> start-of-string anchor?
>
> Because wanting to see if a string matches from the beginning is a very
> important and common special case.
>

The oddest thing I find about re.match is that it has an implicit
beginning anchor but no implicit end anchor. I would have thought it
much more common to check that a whole string matches a certain pattern
than just to match its beginning. But everyone's mileage varies.
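Later Python versions address this directly. Python 3.4 added `re.fullmatch()` and `Pattern.fullmatch()`, which require the entire string to match. A short sketch compares it with `match()` and with the older idiom of an explicit `\Z` end anchor:

```python
import re

pattern = re.compile(r"\d+")

# match(): anchored at the start only, so trailing text is ignored.
print(pattern.match("123abc").group())     # 123

# fullmatch() (Python 3.4+): the entire string must match.
print(pattern.fullmatch("123abc"))         # None
print(pattern.fullmatch("123").group())    # 123

# Older idiom: add an explicit end anchor. \Z is stricter than $,
# because $ also matches just before a trailing newline.
print(re.match(r"\d+\Z", "123abc"))        # None
print(re.match(r"\d+$", "123\n").group())  # 123 (trailing newline tolerated)
print(re.match(r"\d+\Z", "123\n"))         # None
```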

kj

Jul 6, 2009, 4:40:49 PM

Assuming that you mean by this that Python's authors have no such
pretensions:

"There is real value in having a small language."

Guido van Rossum, 2007.07.03
http://mail.python.org/pipermail/python-3000/2007-July/008663.html

So there.

BTW, that's just one example. I've seen similar sentiments expressed
by Guido over and over and over: any new proposed enhancement to
Python must be good enough in his mind to justify cluttering the
language. That attitude counts as minimalism in my book.

The best explanation I have found so far for re.match is that it
is an unfortunate bit of legacy, something that would not be there
if the design of Python did not have to be mindful of keeping old
code chugging along...

kj

Diez B. Roggisch

Jul 6, 2009, 5:10:28 PM
kj wrote:

language != libraries.

Diez

Rhodri James

Jul 6, 2009, 7:16:00 PM
to pytho...@python.org
On Mon, 06 Jul 2009 21:40:49 +0100, kj <no.e...@please.post> wrote:

> In <4a4e2227$0$7801$426a...@news.free.fr> Bruno Desthuilliers
> <bruno.42.de...@websiteburo.invalid> writes:
>

>> kj wrote:
>> (snip)
>>> To have a special-case
>>> re.match() method in addition to a general re.search() method is
>>> antithetical to language minimalism,
>
>> FWIW, Python has no pretension to minimalism.
>
> Assuming that you mean by this that Python's authors have no such
> pretensions:
>
> "There is real value in having a small language."
>
> Guido van Rossum, 2007.07.03
> http://mail.python.org/pipermail/python-3000/2007-July/008663.html

re.match() is part of the library, not the language. The standard
library is in no sense of the word small. It has a mild tendency
to avoid repeating itself, but presumably the stonkingly obvious
optimisation possibilities of re.match() over re.search() are
considered worth the (small) increase in size.

--
Rhodri James *-* Wildebeest Herder to the Masses

Terry Reedy

Jul 6, 2009, 11:50:08 PM
to pytho...@python.org
kj wrote:

> "There is real value in having a small language."
>
> Guido van Rossum, 2007.07.03
> http://mail.python.org/pipermail/python-3000/2007-July/008663.html
>
> So there.

small != minimal

>
> BTW, that's just one example. I've seen similar sentiments expressed
> by Guido over and over and over: any new proposed enhancement to
> Python must be good enough in his mind to justify cluttering the
> language. That attitude counts as minimalism in my book.
>
> The best explanation I have found so far for re.match is that it
> is an unfortunate bit of legacy, something that would not be there
> if the design of Python did not have to be mindful of keeping old
> code chugging along...

It is possible that someone proposed removing re.match for 3.0, but I do
not remember any such discussion. Some things were dropped when that
constraint was (temporarily) dropped.

tjr

Bruno Desthuilliers

Jul 7, 2009, 7:51:11 AM
kj wrote:

> In <4a4e2227$0$7801$426a...@news.free.fr> Bruno Desthuilliers <bruno.42.de...@websiteburo.invalid> writes:
>
>> kj wrote:
>> (snip)
>>> To have a special-case
>>> re.match() method in addition to a general re.search() method is
>>> antithetical to language minimalism,
>
>> FWIW, Python has no pretension to minimalism.
>
> Assuming that you mean by this that Python's authors have no such
> pretensions:
>
> "There is real value in having a small language."
>
> Guido van Rossum, 2007.07.03
> http://mail.python.org/pipermail/python-3000/2007-July/008663.html

There are some differences between "small" and "minimal"...

> So there.
>
> BTW, that's just one example. I've seen similar sentiments expressed
> by Guido over and over and over: any new proposed enhancement to
> Python must be good enough in his mind to justify cluttering the
> language. That attitude counts as minimalism in my book.

And in mine, it counts as "keeping the language's evolution under
control" - which is still not the same thing as being "minimalist". If
Python really was on the "minimalist" side, you wouldn't even have
"class" or "def" statements - both being mostly syntactic sugar. And
let's not even talk about @decorator syntax...

Paul Rudin

Jul 7, 2009, 8:03:54 AM
Bruno Desthuilliers <bruno.42.de...@websiteburo.invalid> writes:

> kj wrote:
>> In <4a4e2227$0$7801$426a...@news.free.fr> Bruno Desthuilliers <bruno.42.de...@websiteburo.invalid> writes:
>>
>>> kj wrote:
>>> (snip)
>>>> To have a special-case
>>>> re.match() method in addition to a general re.search() method is
>>>> antithetical to language minimalism,
>>
>>> FWIW, Python has no pretension to minimalism.
>>
>> Assuming that you mean by this that Python's authors have no such
>> pretensions:
>>
>> "There is real value in having a small language."
>>
>> Guido van Rossum, 2007.07.03
>> http://mail.python.org/pipermail/python-3000/2007-July/008663.html
>
> There are some differences between "small" and "minimal"...
>

There's also a difference between the language and its standard
library.

Bruno Desthuilliers

Jul 7, 2009, 9:23:49 AM
Paul Rudin wrote:

Indeed.

pdpi

Jul 7, 2009, 9:37:02 AM
On Jul 2, 4:49 am, kj <no.em...@please.post> wrote:

> In <Xns9C3BCA27ABC36duncanbo...@127.0.0.1> Duncan Booth <duncan.bo...@invalid.invalid> writes:
>
> >So, for example:
> >>>> re.compile("c").match("abcdef", 2)
> ><_sre.SRE_Match object at 0x0000000002C09B90>
> >>>> re.compile("^c").search("abcdef", 2)
>
> I find this unconvincing; with re.search alone one could simply
> do:
>
> >>> re.compile("^c").search("abcdef"[2:])

given large enough values of "abcdef", you just allocated several megs
for no good reason, when re.compile("c").match("abcdef", 2) would
process "abcdef" in-place.
