re module: Nothing to repeat, but no sre_constants.error: nothing to repeat ?

Devin Jeanpierre

Feb 13, 2012, 11:38:03 PM
to pytho...@python.org
Hey Pythonistas,

Consider the regular expression "$*". Compilation fails with the
exception, "sre_constants.error: nothing to repeat".

Consider the regular expression "(?=$)*". As far as I know it is
equivalent. It does not fail to compile.
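
A minimal sketch for reproducing the difference on whatever interpreter is at hand. The helper name `try_compile` is made up for illustration, and the two extra group patterns are included only for comparison; in Python 3 the exception is exposed as `re.error`, the same class that older tracebacks show as `sre_constants.error`.

```python
import re

def try_compile(pattern):
    """Return None if the pattern compiles, else the error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# The bare anchor, the lookahead, and two grouped variants for comparison.
for pattern in ("$*", "(?=$)*", "(?:$)*", "($)*"):
    outcome = try_compile(pattern)
    print(repr(pattern), "->", "compiles" if outcome is None else outcome)
```

The exact wording of the message (for example, whether a position is appended) may vary between Python versions.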

Why the inconsistency? What's going on here?

-- Devin

Vinay Sajip

Feb 14, 2012, 8:20:48 AM
to
$ is a meta character for regular expressions. Use '\$*', which does
compile.

Regards,

Vinay Sajip

Devin Jeanpierre

Feb 14, 2012, 8:33:12 AM
to Vinay Sajip, pytho...@python.org
On Tue, Feb 14, 2012 at 8:20 AM, Vinay Sajip <vinay...@yahoo.co.uk> wrote:
> $ is a meta character for regular expressions. Use '\$*', which does
> compile.

I mean for it to be a meta-character.

I'm wondering why it's OK to repeat a zero-width match if it is a
zero-width assertion.

-- Devin

Vlastimil Brom

Feb 14, 2012, 10:05:51 AM
to Devin Jeanpierre, pytho...@python.org
2012/2/14 Devin Jeanpierre <jeanpi...@gmail.com>:
> Hey Pythonistas,
>
> Consider the regular expression "$*". Compilation fails with the
> exception, "sre_constants.error: nothing to repeat".
>
> Consider the regular expression "(?=$)*". As far as I know it is
> equivalent. It does not fail to compile.
>
> Why the inconsistency? What's going on here?
>
> -- Devin

Hi,
I don't know the reason for the observed differences either (I can
think of some optimisation issues etc.), but I just wanted to mention
some other patterns similar to your lookahead.
It seems that groups (capturing or non-capturing) also work OK:

>>> import re
>>> re.findall("($)*", "abc")
['', '', '', '']
>>> re.findall("(?:$)*", "abc")
['', '', '', '']

However, is there any realistic use case for repeated zero-width anchors?

regards,
vbr

Devin Jeanpierre

Feb 14, 2012, 10:53:38 AM
to Vlastimil Brom, pytho...@python.org
On Tue, Feb 14, 2012 at 10:05 AM, Vlastimil Brom
<vlastim...@gmail.com> wrote:
> However, is there any realistic use case for repeated zero-width anchors?

Maybe. A repeated zero-width anchor is used in the Python re
test suite, which is what made me notice this. I assume that came from
some actual use-case. (see:
http://hg.python.org/cpython/file/096e856a01aa/Lib/test/test_re.py#l599
)

And yeah, even something as crazy as ()* works, but as soon as it
becomes (a*)* it doesn't work. Weird.
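
A small sketch for checking these patterns locally. The behaviour described here was observed in 2012 and may depend on the Python version, so the results are printed rather than assumed. `compile_status` is a made-up helper name, and `(?:a*)*` is an extra pattern added only for comparison.

```python
import re

def compile_status(pattern):
    """Describe whether `pattern` compiles on the running interpreter."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return "error: " + str(exc)
    return "compiles"

# Patterns mentioned in this thread; results are printed, not assumed.
report = {p: compile_status(p) for p in ("()*", "(a*)*", "(?:a*)*")}
for pattern, status in report.items():
    print(repr(pattern), "->", status)
```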

-- Devin

MRAB

Feb 14, 2012, 1:05:59 PM
to pytho...@python.org
I think it's a combination of warning the user about something that's
pointless, as in the case of "$*", and producing a pattern which could
cause the internal regex engine to get stuck in an infinite loop.

It is inconsistent in that it warns about "$*" but not "(?=$)*" even
though they are basically equivalent.

Devin Jeanpierre

Feb 14, 2012, 8:43:05 PM
to pytho...@python.org
On Tue, Feb 14, 2012 at 1:05 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
>> And yeah, even something as crazy as ()* works, but as soon as it
>> becomes (a*)* it doesn't work. Weird.
>>
> I think it's a combination of warning the user about something that's
> pointless, as in the case of "$*", and producing a pattern which could
> cause the internal regex engine to get stuck in an infinite loop.

Considering that ()* works fine, I can't imagine it ever gets stuck in
infinite loops. But I admit I am too lazy to check against the
interpreter.

Also, complete failure is an exceptionally (heh) poor way of warning
people about stuff. I hope that's not really it.

-- Devin

MRAB

Feb 14, 2012, 9:08:57 PM
to pytho...@python.org
There is one place in the re engine where it tries to avoid getting
stuck in an infinite loop because of a zero-width match, but the fix
inadvertently causes another bug. It's described in issue #1647489.

Devin Jeanpierre

Feb 15, 2012, 8:43:37 AM
to pytho...@python.org
On Tue, Feb 14, 2012 at 9:08 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> There is one place in the re engine where it tries to avoid getting
> stuck in an infinite loop because of a zero-width match, but the fix
> inadvertently causes another bug. It's described in issue #1647489.

Just read the issue. Interesting, didn't know that was a bug rather
than deliberate behavior. The other behavior (only match empty space
once) makes more sense though. Thanks for linking.

Still, that's for avoiding infinite loops in finditer/findall, not
match/search :S
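
To make the distinction concrete, here is a rough sketch using `x*` as a stand-in pattern (my choice for illustration, not a pattern from the issue). In "abc" it can only ever match the empty string, so `search()` returns once while `finditer()` has to step past each empty match to make progress.

```python
import re

text = "abc"
pattern = re.compile("x*")  # can only match the empty string in "abc"

# search() reports just the first (empty) match.
first = pattern.search(text).span()

# finditer() has to move forward after each empty match, or it would
# report position 0 forever.
spans = [m.span() for m in pattern.finditer(text)]

print(first)
print(spans)
```

This only illustrates the scanning behaviour of `finditer`; it does not reproduce the bug described in the issue.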

-- Devin