As I plan to use the re module to parse long strings with long patterns,
does anyone know what the built-in (or practical) limitations of
this module are?
Namely, is there a limit on pattern length, processed string length,
number of groups, or level of group nesting?
Thanks,
Laurent
There are no limits (other than those imposed by the size of virtual
memory :-) on the pattern or string length. You can only have 99
groups because there's no way to backreference groups with longer
numbers; I'm not sure if you can have more groups when you use named
groups or unreferenceable groups. (Andrew?)
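For example, a backreference is written with the group's number, so the
numbering scheme itself is what caps the count; very roughly:

    import re

    # \1 refers back to whatever capturing group 1 matched.
    m = re.search(r"(\w+) \1", "say hello hello world")
    print(m.group(0))   # 'hello hello'
    print(m.group(1))   # 'hello'

(Something like \100 would be taken as an octal character escape rather
than a reference to group 100, which is roughly where the two-digit
limit comes from.)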
--Guido van Rossum (home page: http://www.python.org/~guido/)
If you want to be able to determine the extent of a group's
matched text, you have to use capturing groups, denoted by ( and ).
You can only have 99 of those; if you try to use more, you get an
exception. Named groups count as capturing groups, too.
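You can see the failure for yourself by compiling a pattern with too
many groups; something like this sketch will show it (the exact error
text varies, and whether very recent interpreters still enforce the
limit is a separate question):

    import re

    # Well past the 99 capturing groups mentioned above.
    pattern = "(a)" * 200
    try:
        re.compile(pattern)
        print("compiled fine -- this interpreter allows more groups")
    except re.error as exc:
        print("compile failed:", exc)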
If you don't care about the extent of the matched text, you
can use a non-capturing group, written (?:...). You can have any number
of these.
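Concretely, the difference looks like this:

    import re

    m = re.match(r"(\d+)-(?:\d+)-(?P<day>\d+)", "2024-01-15")
    print(m.group(1))      # '2024' -- extent of the first capturing group
    print(m.group("day"))  # '15'   -- named groups are capturing groups too
    print(m.groups())      # ('2024', '15') -- the (?:...) part leaves no group behind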
In answer to the original poster's question, no, you shouldn't
run into any unreasonable limits on pattern size, string size, or
nesting of groups. (The code uses C's signed int a lot, so it'll
probably break if you were to run it on strings larger than 2
gigabytes. That limit seems high enough, and I have no way of working
with strings that large, anyway.) The only limit is the one of 99
capturing groups mentioned above.
Note that you'll have to be more careful with the patterns you
write. If you write something like '.*foo' (instead of just 'foo')
when trying to find 'foo' in a string, it's slower; you may not
notice it with 80-byte strings, but it will cause severe speed problems
with very large strings. So you'll have to time your patterns and try
to optimize them as best you can.
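A quick way to see this is to time a failing search on a large string;
very roughly (the sizes are picked arbitrarily, and your numbers will
differ):

    import re
    import time

    # A haystack that does not contain 'foo', so both searches fail.
    haystack = "x" * 20000

    for pat in ("foo", ".*foo"):
        t0 = time.perf_counter()
        re.search(pat, haystack)
        print("%-6s %.4f seconds" % (pat, time.perf_counter() - t0))

The plain 'foo' search walks the string once, while '.*foo' has to run
the greedy '.*' and back it off at every possible starting position, so
the failing search grows roughly quadratically with the string length.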
--
A.M. Kuchling http://starship.skyport.net/crew/amk/
It was like a bad TV show. "He's a reincarnated serial killer -- his partner's
a bird. They're cops."
-- Matthew the raven, in SANDMAN #65: "The Kindly Ones:9"