
Regular Expression bug?


jose isaias cabrera

Mar 2, 2023, 2:23:12 PM
Greetings.

For the RegExp Gurus, consider the following python3 code:
<code>
import re
s = "pn=align upgrade sd=2023-02-"
ro = re.compile(r"pn=(.+) ")
r0=ro.match(s)
>>> print(r0.group(1))
align upgrade
</code>

This is wrong. It should be 'align' because the group only goes up-to
the space. Thoughts? Thanks.

josé

--

What if eternity is real? Where will you spend it? Hmmmm...

Chris Angelico

Mar 2, 2023, 2:28:53 PM
On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jic...@gmail.com> wrote:
>
> Greetings.
>
> For the RegExp Gurus, consider the following python3 code:
> <code>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
> </code>
>
> This is wrong. It should be 'align' because the group only goes up-to
> the space. Thoughts? Thanks.
>

Not a bug. Find the longest possible match that fits this; as long as
you can find a space immediately after it, everything in between goes
into the .+ part.

If you want to exclude spaces, either use [^ ]+ or .+?.
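
A minimal sketch of the three variants side by side, using the string from the original post. The names `greedy`, `lazy`, and `negated` are only illustrative and do not appear in the thread.

```python
import re

s = "pn=align upgrade sd=2023-02-"

# Greedy: .+ takes as much as it can, then backs off to the LAST space.
greedy = re.match(r"pn=(.+) ", s)

# Non-greedy: .+? takes as little as it can, so it stops at the FIRST space.
lazy = re.match(r"pn=(.+?) ", s)

# Negated class: [^ ]+ cannot cross a space at all.
negated = re.match(r"pn=([^ ]+) ", s)

print(greedy.group(1))   # align upgrade
print(lazy.group(1))     # align
print(negated.group(1))  # align
```

On this input the last two give the same result. They differ in how they get there: `.+?` may still expand across spaces if the rest of the pattern requires it, while `[^ ]+` never will.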

ChrisA

2QdxY4Rz...@potatochowder.com

Mar 2, 2023, 2:30:03 PM
On 2023-03-02 at 14:22:41 -0500,
jose isaias cabrera <jic...@gmail.com> wrote:

> For the RegExp Gurus, consider the following python3 code:
> <code>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
> </code>
>
> This is wrong. It should be 'align' because the group only goes up-to
> the space. Thoughts? Thanks.

The bug is in your regular expression; the plus modifier is greedy.

If you want to match up to the first space, then you'll need something
like [^ ] (i.e., everything that isn't a space) instead of that dot.

Mats Wichmann

Mar 2, 2023, 2:38:04 PM
On 3/2/23 12:28, Chris Angelico wrote:
> On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jic...@gmail.com> wrote:
>>
>> Greetings.
>>
>> For the RegExp Gurus, consider the following python3 code:
>> <code>
>> import re
>> s = "pn=align upgrade sd=2023-02-"
>> ro = re.compile(r"pn=(.+) ")
>> r0=ro.match(s)
>> >>> print(r0.group(1))
>> align upgrade
>> </code>
>>
>> This is wrong. It should be 'align' because the group only goes up-to
>> the space. Thoughts? Thanks.
>>
>
> Not a bug. Find the longest possible match that fits this; as long as
> you can find a space immediately after it, everything in between goes
> into the .+ part.
>
> If you want to exclude spaces, either use [^ ]+ or .+?.


https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

jose isaias cabrera

Mar 2, 2023, 2:38:59 PM
On Thu, Mar 2, 2023 at 2:32 PM <2QdxY4Rz...@potatochowder.com> wrote:
>
> On 2023-03-02 at 14:22:41 -0500,
> jose isaias cabrera <jic...@gmail.com> wrote:
>
> > For the RegExp Gurus, consider the following python3 code:
> > <code>
> > import re
> > s = "pn=align upgrade sd=2023-02-"
> > ro = re.compile(r"pn=(.+) ")
> > r0=ro.match(s)
> > >>> print(r0.group(1))
> > align upgrade
> > </code>
> >
> > This is wrong. It should be 'align' because the group only goes up-to
> > the space. Thoughts? Thanks.
>
> The bug is in your regular expression; the plus modifier is greedy.
>
> If you want to match up to the first space, then you'll need something
> like [^ ] (i.e., everything that isn't a space) instead of that dot.

Thanks. I appreciate your wisdom.

avi.e...@gmail.com

Mar 2, 2023, 5:41:36 PM
José,

Matching can be greedy. Did it match to the last space?

What you want is a pattern that matches anything except a space (or whitespace), followed by matching a space or something similar.

Or use a construct that makes matching non-greedy.
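
A small sketch of why "a space" versus "whitespace" can matter. It assumes a hypothetical variant of the input in which a tab, not a space, follows the first word. The names `s_tab`, `not_space`, and `not_whitespace` are made up for illustration.

```python
import re

# Hypothetical input: a tab separates "align" and "upgrade".
s_tab = "pn=align\tupgrade sd=2023-02-"

# [^ ]+ excludes only the space character, so it runs through the tab.
not_space = re.match(r"pn=([^ ]+)", s_tab)

# \S+ excludes every whitespace character, so it stops at the tab.
not_whitespace = re.match(r"pn=(\S+)", s_tab)

print(repr(not_space.group(1)))       # 'align\tupgrade'
print(repr(not_whitespace.group(1)))  # 'align'
```

Which one is right depends on what the data can contain. If the fields are only ever separated by single spaces, the two behave the same.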

Avi


jose isaias cabrera

Mar 2, 2023, 8:07:35 PM
On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <ma...@wichmann.us> wrote:
>
> On 3/2/23 12:28, Chris Angelico wrote:
> > On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jic...@gmail.com> wrote:
> >>
> >> Greetings.
> >>
> >> For the RegExp Gurus, consider the following python3 code:
> >> <code>
> >> import re
> >> s = "pn=align upgrade sd=2023-02-"
> >> ro = re.compile(r"pn=(.+) ")
> >> r0=ro.match(s)
> >> >>> print(r0.group(1))
> >> align upgrade
> >> </code>
> >>
> >> This is wrong. It should be 'align' because the group only goes up-to
> >> the space. Thoughts? Thanks.
> >>
> >
> > Not a bug. Find the longest possible match that fits this; as long as
> > you can find a space immediately after it, everything in between goes
> > into the .+ part.
> >
> > If you want to exclude spaces, either use [^ ]+ or .+?.
>
> https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

This re is a bit different from the one I am used to. So, I am trying to match
everything after 'pn=':

import re
s = "pm=jose pn=2017"
m0 = r"pn=(.+)"
r0 = re.compile(m0)
s0 = r0.match(s)
>>> print(s0)
None

Any help is appreciated.

avi.e...@gmail.com

Mar 2, 2023, 8:35:39 PM
It is a well-known fact, Jose, that GIGO.

The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.


>>> s = "pn=jose pn=2017"
...
>>> s0 = r0.match(s)
>>> s0
<re.Match object; span=(0, 15), match='pn=jose pn=2017'>




Alan Bawden

Mar 2, 2023, 8:49:10 PM
Assuming that you were expecting to match "pn=2017", then you probably
don't want the 'match' method. Read its documentation. Then read the
documentation for the _other_ methods that a Pattern supports. Then you
will be enlightened.

- Alan

Cameron Simpson

Mar 2, 2023, 8:55:39 PM
On 02Mar2023 20:06, jose isaias cabrera <jic...@gmail.com> wrote:
>This re is a bit different than the one I am used. So, I am trying to match
>everything after 'pn=':
>
>import re
>s = "pm=jose pn=2017"
>m0 = r"pn=(.+)"
>r0 = re.compile(m0)
>s0 = r0.match(s)

`match()` matches at the start of the string. You want r0.search(s).
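
A minimal sketch of the difference on the string from the post. `s` and `r0` come from the original code, while `at_start` and `anywhere` are illustrative names.

```python
import re

s = "pm=jose pn=2017"
r0 = re.compile(r"pn=(.+)")

# match() only tries at position 0, where the string has "pm=", so it fails.
at_start = r0.match(s)

# search() scans forward and finds "pn=2017" later in the string.
anywhere = r0.search(s)

print(at_start)           # None
print(anywhere.group(1))  # 2017
print(anywhere.span())    # (8, 15)
```

Both methods return None when nothing matches, so real code would check the result before calling `.group()`.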
- Cameron Simpson <c...@cskk.id.au>

jose isaias cabrera

Mar 2, 2023, 10:27:47 PM
Thanks. Darn it! I knew it was something simple.

jose isaias cabrera

Mar 2, 2023, 10:35:46 PM
On Thu, Mar 2, 2023 at 8:35 PM <avi.e...@gmail.com> wrote:
>
> It is a well-known fact, Jose, that GIGO.
>
> The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.

It is not GIGO. pm=project manager. pn=project name. I needed search()
rather than match().


jose isaias cabrera

Mar 2, 2023, 10:37:25 PM
Yes. I need search. Thanks.