Someone asked me why they would want to use RE2 rather than PCRE. I couldn't find any rationale section in the RE2 README or on the GitHub Wiki. The best I found was on Wikipedia here: https://en.wikipedia.org/wiki/RE2_(software)#Comparison_to_PCRE
Would it be OK to include a link to this page in the README, or perhaps make a new GitHub Wiki page about this? I'm asking here per https://github.com/google/re2/wiki/Contribute because it's unclear whether GitHub Wiki pages fall under this process.
Please do; I'm not familiar enough with either library to write such a page (which is why I was looking for an authoritative answer to the "why would I want to use RE2?" question, rather than answering it myself).
Safety is RE2's raison d'être.
RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.
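To make the linear-time guarantee concrete, here is a minimal sketch. It uses Go's standard `regexp` package rather than RE2 itself, on the assumption that it is a fair stand-in: that package accepts RE2 syntax and documents the same linear-time guarantee. The helper `matchLongInput` and the input sizes are made up for illustration. The pattern `^(a+)+$` is a classic case that can trigger exponential backtracking in backtracking engines when the match fails.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// nestedQuantifier is a pattern whose nested repetition is a classic source
// of catastrophic backtracking in backtracking engines on failing input.
var nestedQuantifier = regexp.MustCompile(`^(a+)+$`)

// matchLongInput (hypothetical helper) builds n 'a' characters followed by
// a '!' so that the overall match must fail, then runs the match.
func matchLongInput(n int) bool {
	input := strings.Repeat("a", n) + "!"
	return nestedQuantifier.MatchString(input)
}

func main() {
	// Elapsed time should grow roughly in proportion to n,
	// rather than exploding as the input gets longer.
	for _, n := range []int{1000, 10000, 100000} {
		start := time.Now()
		matched := matchLongInput(n)
		fmt.Printf("n=%d matched=%v elapsed=%v\n", n, matched, time.Since(start))
	}
}
```

The exact timings depend on the machine, so none are shown here; the point is only that the failing match returns promptly even for long inputs.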
It is not a goal to be faster than all other engines under all circumstances. Although RE2 guarantees linear-time performance, the linear-time constant varies depending on the overhead entailed by safe handling of the regular expression. In a sense, RE2 behaves pessimistically whereas backtracking engines behave optimistically, so it can be outperformed in various situations.
It is also not a goal to implement all of the features offered by Perl, PCRE and other engines. As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported.
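As a rough illustration of what "not supported" means in practice, the sketch below checks a few patterns against Go's standard `regexp` package, which accepts RE2 syntax; RE2's own C++ API is different and is not shown. The helper `isSupported` is hypothetical. Unsupported constructs are rejected when the pattern is compiled, not when it is matched.

```go
package main

import (
	"fmt"
	"regexp"
)

// isSupported (hypothetical helper) reports whether Go's regexp package,
// which uses RE2-style syntax, accepts the given pattern.
func isSupported(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(\w+) \1`,    // backreference: rejected
		`foo(?=bar)`,  // look-ahead assertion: rejected
		`(?<!foo)bar`, // negative look-behind assertion: rejected
		`(\w+) (\w+)`, // plain capture groups: accepted
	}
	for _, p := range patterns {
		fmt.Printf("%-16q supported=%v\n", p, isSupported(p))
	}
}
```

Because the failure surfaces at compile time, a caller accepting patterns from users can report the problem up front instead of discovering it during matching.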
For more information, please refer to Russ Cox's articles on regular expression theory and praxis: