
regexp match


Tcl Bliss

Sep 26, 2010, 7:42:39 PM
I have hit a limit of 250 on the repetition count when using the {m,n}
construct of regexp.
When using the following regexp:

regexp {(\A.{1000}?)(.+\Z)} $string

I get this error: couldn't compile regular expression pattern: invalid
repetition count(s)
I can go up to 250 before encountering the error.
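
The limit is easy to probe from a script. What follows is a minimal sketch, assuming a Tcl 8.5 or 8.6 interpreter; the helper name bound_compiles is made up for this illustration and is not part of Tcl. For reference, the re_syntax manual page gives the permitted range for m and n as 0 to 255 inclusive, which fits the observation above.

```tcl
# Hypothetical helper: return 1 if a pattern using the repetition
# count n compiles, 0 if regexp raises a compile error.
proc bound_compiles {n} {
    # Build the pattern a{n}; $n is substituted, the braces stay literal.
    set pattern "a{$n}"
    # catch returns 0 when regexp runs without error, non-zero otherwise.
    return [expr {![catch {regexp -- $pattern ""}]}]
}

puts [bound_compiles 250]
puts [bound_compiles 1000]
```

The pattern is compiled before any matching happens, so an empty subject string is enough to trigger the error.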

Is this normal?

Thanks

Alexandre Ferrieux

Sep 27, 2010, 3:25:31 AM

Though disappointing, this is a consequence of the Tcl RE engine
compiling to deterministic automata, with some hard-wired limits
inside. In most cases, the limit is a welcome safeguard because of the
intrinsic explosion of automaton size as soon as a bit of non-
determinism creeps in.

In your case, the next interesting thing to do is to remove that non-
determinism. Unfortunately, even the hand-writable automaton for

{^a{1000}b$}

still hits the hard-wired limit. Sigh.
(it would seem smarter to put a limit on total automaton size, rather
than on quantifiers, but I'm not the maintainer of the RE engine ;-)
Maybe file a feature request ?

-Alex

Tcl Bliss

Sep 27, 2010, 10:52:16 AM
On Sep 27, 12:25 am, Alexandre Ferrieux <alexandre.ferri...@gmail.com>
wrote:

Ok, so it is hard-wired and it is normal. There is probably a
performance hit then for anything larger than 250. I can still use
[string range] to do the same job. Regexp was just much more compact.
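
For the record, the [string range] version could look roughly like this. It is only a sketch: the proc name split_at is invented here, and it assumes Tcl 8.5 or later for lassign. It mirrors the regexp in that it reports no match unless at least one character is left after the first n.

```tcl
# Hypothetical helper: split str after its first n characters.
# Returns a two-element list {head tail}, or an empty list when str
# has n characters or fewer (the regexp fails there too, because
# .+ needs at least one character).
proc split_at {str n} {
    if {[string length $str] <= $n} {
        return {}
    }
    set head [string range $str 0 [expr {$n - 1}]]
    set tail [string range $str $n end]
    return [list $head $tail]
}

# Example: 1000 filler characters followed by "yz".
set string [string repeat "x" 1000]yz
lassign [split_at $string 1000] head tail
puts "[string length $head] $tail"
```

Since no pattern is compiled, the repetition limit does not come into play and n can be as large as the string allows.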

> Maybe file a feature request ?

I would if I understood half of what you said :) Can I just copy your
words?
