Rationale for RE2 vs PCRE?

epe...@google.com

Sep 21, 2016, 5:13:12 PM
to re2-dev
Someone asked me why they would want to use RE2 rather than PCRE. I couldn't find any rationale section in the RE2 README or on the GitHub Wiki. The best I found was on Wikipedia here: https://en.wikipedia.org/wiki/RE2_(software)#Comparison_to_PCRE

Would it be OK to include a link to this page in the README, or perhaps make a new GitHub Wiki page about this?

I'm asking here per https://github.com/google/re2/wiki/Contribute because it's unclear if GitHub Wiki pages fall into this process.

Paul Wankadia

Sep 22, 2016, 12:29:59 AM
to Etienne Perot, re2...@googlegroups.com
On Thu, Sep 22, 2016 at 7:13 AM, eperot via re2-dev <re2...@googlegroups.com> wrote:

Someone asked me why they would want to use RE2 rather than PCRE. I couldn't find any rationale section in the RE2 README or on the GitHub Wiki. The best I found was on Wikipedia here: https://en.wikipedia.org/wiki/RE2_(software)#Comparison_to_PCRE

I'm surprised that the Wikipedia page doesn't link to Russ' papers ([1], [2], [3]) on regular expression theory and praxis.

Would it be OK to include a link to this page in the README, or perhaps make a new GitHub Wiki page about this?

I'm asking here per https://github.com/google/re2/wiki/Contribute because it's unclear if GitHub Wiki pages fall into this process.

Thanks for asking. A wiki page for this sounds like a good idea. Please go ahead and create one if you like or else I can do it. :)

Etienne Perot

Sep 22, 2016, 2:31:24 PM
to Paul Wankadia, re2...@googlegroups.com
A wiki page for this sounds like a good idea. Please go ahead and create one if you like or else I can do it. :)

Please do; I'm not familiar with either library well enough to write such a page (which is why I was looking for an authoritative answer to the "why would I want to use RE2?" question, rather than answering it myself).

Thanks!

Paul Wankadia

Sep 23, 2016, 3:48:56 AM
to Etienne Perot, re2...@googlegroups.com, Russ Cox
On Fri, Sep 23, 2016 at 4:30 AM, Etienne Perot <epe...@google.com> wrote:

Please do; I'm not familiar with either library well enough to write such a page (which is why I was looking for an authoritative answer to the "why would I want to use RE2?" question, rather than answering it myself).



Safety is RE2's raison d'être.

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.
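
To make the linear-time guarantee concrete, here is a minimal sketch. It is an illustration added to this thread, not RE2's C++ API: it uses Go's standard `regexp` package, which accepts RE2 syntax and documents the same linear-time guarantee. The helper name `matchPathological` and the input sizes are assumptions chosen for the example. The pattern `^(a+)+$` applied to a run of `a` characters followed by `b` is a classic worst case for backtracking engines. The sketch does not demonstrate the configurable memory budget.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchPathological (hypothetical helper) matches ^(a+)+$ against n 'a'
// characters followed by a single 'b'. A backtracking engine may explore
// exponentially many ways to split the run of 'a's before giving up;
// an automaton-based engine scans the input in time linear in its length.
func matchPathological(n int) bool {
	re := regexp.MustCompile(`^(a+)+$`)
	input := strings.Repeat("a", n) + "b"
	return re.MatchString(input)
}

func main() {
	// Growing the input by 10x should grow the work roughly 10x, not 2^n.
	for _, n := range []int{1000, 10000, 100000} {
		fmt.Printf("n=%d matched=%v\n", n, matchPathological(n))
	}
}
```

Every call reports no match, because the trailing `b` prevents `$` from matching, and it does so without the running time exploding as `n` grows.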

It is not a goal to be faster than all other engines under all circumstances. Although RE2 guarantees linear-time performance, the linear-time constant varies depending on the overhead entailed by safe handling of the regular expression. In a sense, RE2 behaves pessimistically whereas backtracking engines behave optimistically, so it can be outperformed in various situations.

It is also not a goal to implement all of the features offered by Perl, PCRE and other engines. As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported.
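
As a rough illustration of what "not supported" means in practice, the sketch below again uses Go's standard `regexp` package as a stand-in for RE2, on the assumption that the shared RE2 syntax is what matters here. The helper name `supported` and the sample patterns are hypothetical. The point is that unsupported constructs are rejected when the pattern is compiled, rather than failing or misbehaving at match time.

```go
package main

import (
	"fmt"
	"regexp"
)

// supported (hypothetical helper) reports whether a pattern compiles
// under the RE2 syntax accepted by Go's regexp package.
func supported(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(\w+)@(\w+)\.com`, // plain capture groups: accepted
		`(\w+) \1`,         // backreference: rejected
		`foo(?=bar)`,       // look-ahead assertion: rejected
		`(?<!foo)bar`,      // look-behind assertion: rejected
	}
	for _, p := range patterns {
		fmt.Printf("%q supported=%v\n", p, supported(p))
	}
}
```

A caller that accepts patterns from untrusted users can therefore check the compile error once and refuse the pattern up front.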

For more information, please refer to Russ Cox's articles on regular expression theory and praxis:

Russ Cox

Sep 23, 2016, 8:06:06 AM
to Paul Wankadia, Etienne Perot, re2...@googlegroups.com
LGTM

Etienne Perot

Sep 23, 2016, 2:05:08 PM
to Paul Wankadia, re2...@googlegroups.com
Thanks!