Regular Expression bug?

jose isaias cabrera

Mar 2, 2023, 2:23:12 PM

Greetings.

For the RegExp Gurus, consider the following python3 code:
<code>
import re
s = "pn=align upgrade sd=2023-02-"
ro = re.compile(r"pn=(.+) ")
r0 = ro.match(s)
print(r0.group(1))  # prints: align upgrade
</code>

This is wrong. It should be 'align' because the group only goes up to
the space. Thoughts? Thanks.

josé

--

What if eternity is real? Where will you spend it? Hmmmm...

Chris Angelico

Mar 2, 2023, 2:28:53 PM

On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jic...@gmail.com> wrote:
>
> Greetings.
>
> For the RegExp Gurus, consider the following python3 code:
> <code>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
> </code>
>
> This is wrong. It should be 'align' because the group only goes up-to
> the space. Thoughts? Thanks.
>

Not a bug. The regex engine finds the longest possible match that fits
the pattern; as long as there is a space immediately after it,
everything in between goes into the .+ part.

If you want to exclude spaces, either use [^ ]+ or .+?.
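
A minimal sketch of the three variants, run against the string from the original post (the names greedy, lazy and no_space are only for illustration):

```python
import re

s = "pn=align upgrade sd=2023-02-"

# Greedy: .+ takes as much as it can, then backs off to the last space.
greedy = re.match(r"pn=(.+) ", s)
# Non-greedy: .+? takes as little as it can, so it stops at the first space.
lazy = re.match(r"pn=(.+?) ", s)
# Negated class: [^ ]+ cannot cross a space at all.
no_space = re.match(r"pn=([^ ]+) ", s)

print(greedy.group(1))    # align upgrade
print(lazy.group(1))      # align
print(no_space.group(1))  # align
```

Both alternatives give 'align' here. They can differ on other inputs, since .+? may still expand past a space if the rest of the pattern needs it to, while [^ ]+ never will.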

ChrisA

2QdxY4Rz...@potatochowder.com

Mar 2, 2023, 2:30:03 PM

On 2023-03-02 at 14:22:41 -0500, jose isaias cabrera <jic...@gmail.com> wrote:

> For the RegExp Gurus, consider the following python3 code:
> <code>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
> </code>
>
> This is wrong. It should be 'align' because the group only goes up-to
> the space. Thoughts? Thanks.

The bug is in your regular expression; the plus quantifier is greedy.

If you want to match up to the first space, then you'll need something
like [^ ] (i.e., any character that isn't a space) instead of that dot.
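
One way to see the greediness is to look at where the match ends. This is only a sketch using the string from the original post; it also shows that [^ ] by itself matches a single character, so it still needs the + after it.

```python
import re

s = "pn=align upgrade sd=2023-02-"

m = re.match(r"pn=(.+) ", s)

# The greedy match ends just past the LAST space in the string...
print(m.end(), s.rindex(" ") + 1)   # 17 17
# ...not just past the first one.
print(s.index(" ") + 1)             # 9

# [^ ] alone is one character; [^ ]+ is one or more.
one_char = re.match(r"pn=([^ ])", s).group(1)
whole_word = re.match(r"pn=([^ ]+)", s).group(1)
print(one_char)    # a
print(whole_word)  # align
```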

Mats Wichmann

Mar 2, 2023, 2:38:04 PM

On 3/2/23 12:28, Chris Angelico wrote:
> On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jic...@gmail.com> wrote:
>>
>> Greetings.

>>
>> For the RegExp Gurus, consider the following python3 code:
>> <code>
>> import re
>> s = "pn=align upgrade sd=2023-02-"
>> ro = re.compile(r"pn=(.+) ")
>> r0=ro.match(s)
>>>>> print(r0.group(1))
>> align upgrade
>> </code>
>>
>> This is wrong. It should be 'align' because the group only goes up-to
>> the space. Thoughts? Thanks.
>>
>

> Not a bug. Find the longest possible match that fits this; as long as
> you can find a space immediately after it, everything in between goes
> into the .+ part.
>
> If you want to exclude spaces, either use [^ ]+ or .+?.

https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

jose isaias cabrera

Mar 2, 2023, 2:38:59 PM

On Thu, Mar 2, 2023 at 2:32 PM <2QdxY4Rz...@potatochowder.com> wrote:
>
> On 2023-03-02 at 14:22:41 -0500,

> jose isaias cabrera <jic...@gmail.com> wrote:
>
> > For the RegExp Gurus, consider the following python3 code:
> > <code>
> > import re
> > s = "pn=align upgrade sd=2023-02-"
> > ro = re.compile(r"pn=(.+) ")
> > r0=ro.match(s)
> > >>> print(r0.group(1))
> > align upgrade
> > </code>
> >
> > This is wrong. It should be 'align' because the group only goes up-to
> > the space. Thoughts? Thanks.
>

> The bug is in your regular expression; the plus modifier is greedy.
>
> If you want to match up to the first space, then you'll need something
> like [^ ] (i.e., everything that isn't a space) instead of that dot.

Thanks. I appreciate your wisdom.

avi.e...@gmail.com

Mar 2, 2023, 5:41:36 PM

José,

Matching can be greedy. Did it match to the last space?

What you want is a pattern that matches anything except a space (or whitespace), followed by a match for a space or something similar.

Or use a construct that makes matching non-greedy.
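
On the "space (or whitespace)" distinction: [^ ] excludes only the space character, while \S excludes any whitespace. A small sketch, assuming a made-up input where a tab follows the first word (that sample string is hypothetical, not from the original post):

```python
import re

# Hypothetical input: a tab, not a space, follows "align".
s = "pn=align\tupgrade sd=2023-02-"

only_space = re.match(r"pn=([^ ]+)", s)  # stops at a space, but not at a tab
any_space = re.match(r"pn=(\S+)", s)     # stops at any whitespace character

print(repr(only_space.group(1)))  # 'align\tupgrade'
print(repr(any_space.group(1)))   # 'align'
```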

Avi

--
https://mail.python.org/mailman/listinfo/python-list

jose isaias cabrera

Mar 2, 2023, 8:07:35 PM

On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <ma...@wichmann.us> wrote:
>
> On 3/2/23 12:28, Chris Angelico wrote:

> > On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jic...@gmail.com> wrote:
> >>

> >> Greetings.
> >>
> >> For the RegExp Gurus, consider the following python3 code:
> >> <code>
> >> import re
> >> s = "pn=align upgrade sd=2023-02-"
> >> ro = re.compile(r"pn=(.+) ")
> >> r0=ro.match(s)
> >>>>> print(r0.group(1))
> >> align upgrade
> >> </code>
> >>
> >> This is wrong. It should be 'align' because the group only goes up-to
> >> the space. Thoughts? Thanks.
> >>
> >

> > Not a bug. Find the longest possible match that fits this; as long as
> > you can find a space immediately after it, everything in between goes
> > into the .+ part.
> >
> > If you want to exclude spaces, either use [^ ]+ or .+?.
>
> https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

This re is a bit different from the one I am used to. So, I am trying to match
everything after 'pn=':

import re
s = "pm=jose pn=2017"
m0 = r"pn=(.+)"
r0 = re.compile(m0)
s0 = r0.match(s)
print(s0)  # prints: None

Any help is appreciated.

avi.e...@gmail.com

Mar 2, 2023, 8:35:39 PM

It is a well-known fact, Jose, that GIGO.

The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.

>>> s = "pn=jose pn=2017"
...
>>> s0 = r0.match(s)
>>> s0
<re.Match object; span=(0, 15), match='pn=jose pn=2017'>


Alan Bawden

Mar 2, 2023, 8:49:10 PM

Assuming that you were expecting to match "pn=2017", then you probably
don't want the 'match' method. Read its documentation. Then read the
documentation for the _other_ methods that a Pattern supports. Then you
will be enlightened.

- Alan

Cameron Simpson

Mar 2, 2023, 8:55:39 PM

On 02Mar2023 20:06, jose isaias cabrera <jic...@gmail.com> wrote:
>This re is a bit different than the one I am used. So, I am trying to
>match
>everything after 'pn=':
>
>import re
>s = "pm=jose pn=2017"
>m0 = r"pn=(.+)"
>r0 = re.compile(m0)
>s0 = r0.match(s)

`match()` only matches at the start of the string. You want r0.search(s).
- Cameron Simpson <c...@cskk.id.au>
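
A short sketch of the difference, using the string and pattern from the post above (the name m is only for illustration):

```python
import re

s = "pm=jose pn=2017"
r0 = re.compile(r"pn=(.+)")

# match() anchors at position 0, where the string has "pm=", so it fails.
print(r0.match(s))   # None

# search() scans forward and finds the first place the pattern fits.
m = r0.search(s)
print(m.group(1))    # 2017
print(m.span())      # (8, 15)
```

Pattern objects also have fullmatch(), which requires the whole string to match, and findall() / finditer(), which return every non-overlapping match.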

jose isaias cabrera

Mar 2, 2023, 10:27:47 PM

Thanks. Darn it! I knew it was something simple.

jose isaias cabrera

Mar 2, 2023, 10:35:46 PM

On Thu, Mar 2, 2023 at 8:35 PM <avi.e...@gmail.com> wrote:
>
> It is a well-known fact, Jose, that GIGO.
>
> The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.

It is not GIGO. pm=project manager. pn=project name. I needed search()
rather than match().


jose isaias cabrera

Mar 2, 2023, 10:37:25 PM

Yes. I need search. Thanks.