As I plan to use the re module to parse long strings with long patterns,
does anyone know what the built-in (or practical) limitations of
this module are?
Namely, is there a limit on pattern length, processed string length,
number of groups, or level of group nesting?
Thanks,
Laurent
There are no limits (other than those imposed by the size of virtual
memory :-) on the pattern or string length. You can only have 99
groups because there's no way to backreference groups with longer
numbers; I'm not sure if you can have more groups when you use named
groups or unreferenceable groups. (Andrew?)
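For example, a backreference is written with the group's number, so the
numbering scheme itself is what caps the count; very roughly:

    import re

    # \1 refers back to whatever capturing group 1 matched.
    m = re.search(r"(\w+) \1", "say hello hello world")
    print(m.group(0))   # 'hello hello'
    print(m.group(1))   # 'hello'

(Something like \100 would be taken as an octal character escape rather
than a reference to group 100, which is roughly where the two-digit
limit comes from.)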
--Guido van Rossum (home page: http://www.python.org/~guido/)
If you want to be able to determine the extent of a group's
matched text, you have to use capturing groups, denoted by ( and ).
You can only have 99 of those; if you try to use more, you get an
exception. Named groups count as capturing groups, too.
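You can see the failure for yourself by compiling a pattern with too
many groups; something like this sketch will show it (the exact error
text varies, and whether very recent interpreters still enforce the
limit is a separate question):

    import re

    # Well past the 99 capturing groups mentioned above.
    pattern = "(a)" * 200
    try:
        re.compile(pattern)
        print("compiled fine -- this interpreter allows more groups")
    except re.error as exc:
        print("compile failed:", exc)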
If you don't care about the extent of the matched text, you
can use a non-capturing group, written (?:...). You can have any number
of these.
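Concretely, the difference looks like this:

    import re

    m = re.match(r"(\d+)-(?:\d+)-(?P<day>\d+)", "2024-01-15")
    print(m.group(1))      # '2024' -- extent of the first capturing group
    print(m.group("day"))  # '15'   -- named groups are capturing groups too
    print(m.groups())      # ('2024', '15') -- the (?:...) part leaves no group behind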
In answer to the original poster's question, no, you shouldn't
run into any unreasonable limits on pattern size, string size, or
nesting of groups. (The code uses C's signed int a lot, so it'll
probably break if you were to run it on strings larger than 2
gigabytes. That limit seems high enough, and I have no way of working
with strings that large, anyway.) The only limit is the one of 99
capturing groups mentioned above.
Note that you'll have to be more careful with the patterns you
write. If you write something like '.*foo' (instead of just 'foo')
when trying to find 'foo' in a string, it's slower; you may not
notice it with 80-byte strings, but it will cause severe speed problems
with very large strings. So you'll have to time your patterns and try
to optimize them as best you can.
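A quick way to see this is to time a failing search on a large string;
very roughly (the sizes are picked arbitrarily, and your numbers will
differ):

    import re
    import time

    # A haystack that does not contain 'foo', so both searches fail.
    haystack = "x" * 20000

    for pat in ("foo", ".*foo"):
        t0 = time.perf_counter()
        re.search(pat, haystack)
        print("%-6s %.4f seconds" % (pat, time.perf_counter() - t0))

The plain 'foo' search walks the string once, while '.*foo' has to run
the greedy '.*' and back it off at every possible starting position, so
the failing search grows roughly quadratically with the string length.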
--
A.M. Kuchling http://starship.skyport.net/crew/amk/
It was like a bad TV show. "He's a reincarnated serial killer -- his partner's
a bird. They're cops."
-- Matthew the raven, in SANDMAN #65: "The Kindly Ones:9"