Why the limit on regex repeat is 1000?

565 views
Skip to first unread message

M Hasbini

unread,
Jun 6, 2021, 2:12:41 AM6/6/21
to golang-nuts
Playground: https://play.golang.org/p/opVpDD5Ts8S

Here's an example regex that fails to compile: `[a-zA-Z0-9]{1001,}`


Other languages regex engine behavior: The regex is valid on all languages in https://regex101.com/r/JzGrYG/1 except Go.

Is there a reason for this limit or is this a bug?

Rob Pike

unread,
Jun 6, 2021, 4:15:24 AM6/6/21
to M Hasbini, golang-nuts
It is to protect the regexp engine from overly expensive computations, as the repetition can introduce quadratic behavior in the compiler. The Go engine is concerned about pathological execution - not all engines have this property (see https://swtch.com/~rsc/regexp/regexp1.html) - and is being careful.

The 1000 is arbitrary but - speaking just for myself - a repeat count as high as a thousand tells me that a regular expression is the wrong mechanism for this problem. A simple loop would be vastly more efficient (see https://commandcenter.blogspot.com/2011/08/regular-expressions-in-lexing-and.html). You are using a steamroller to press a shirt.

-rob


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/f22fb807-7611-4185-bc2b-e7a8f5d8856an%40googlegroups.com.

Alberto Donizetti

unread,
Jun 6, 2021, 4:18:49 AM6/6/21
to golang-nuts
Probably not a bug, since it's documented:


> Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum
> or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.

I don't know the reason for such restriction. If I had to guess, I'd say that's because the DFA approach
Go uses for regexp requires counting repetitions to be expanded into states (they're not implemented
using a loop), so some kind of limit is needed to ensure the resulting automata is not too big.

Alberto

Rob Pike

unread,
Jun 6, 2021, 4:22:06 AM6/6/21
to Alberto Donizetti, golang-nuts
The reason is that the only way to implement repetition in an NFA or DFA is literally to repeat the expression, so asking for `a{6}` means to generate the machine for `aaaaaa`. It is a notational convenience only, a macro mechanism if you will. Other "regexp" packages, which do not adhere to the clear, theoretically strong properties of regular languages, are free to use other techniques, at the cost of potentially catastrophic execution times.

-rob


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

Dan Kortschak

unread,
Jun 6, 2021, 4:30:11 AM6/6/21
to golan...@googlegroups.com
On Sun, 2021-06-06 at 18:14 +1000, Rob Pike wrote:
> You are using a steamroller to press a shirt.

Tomi Ungerer has already published this approach.


https://kotonoha-books.ocnk.net/data/kotonoha-books/product/20160619_70ddf5.JPG


Reply all
Reply to author
Forward
0 new messages