Real-life examples of very large PCRE2 patterns?


Alex Dowad

Aug 6, 2024, 5:12:32 PM
to PCRE2 discussion list
Hi, everyone,

I am currently implementing a new performance optimization in PCRE2 which rewrites certain regexes after parsing but before compilation. (See https://github.com/PCRE2Project/pcre2/issues/411.)

Please let me know: Do you know of any real-life, 'production' uses of PCRE2 where the regex is very large? Such patterns would be helpful as test cases to assess the performance cost of the new rewriting pass. ("Very large" might mean thousands of bytes, tens of thousands of bytes, or more.)

Any responses will be much appreciated. Good day, all... Alex Dowad

David Wahlstedt

Aug 10, 2024, 7:02:16 PM
to PCRE2 discussion list
I would also appreciate examples! I have written a PCRE2 parser (parsing only, no matching) in Haskell, and I want more examples to try it on.
There are some in https://github.com/antlr/grammars-v4/tree/master/pcre/examples. They were written for PCRE1, but most of them work for PCRE2 as well. The misc.txt file is quite big.
I also made a "dummy" example covering all the \p{script:name} properties and the "binary" properties (\p{ascii} etc.): I generated the names with pcre2test -LS and -LP, edited the output in Emacs, and concatenated all the \p{ } properties into one pattern, which gets quite large. Of course, it's not a "real life" example...
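
For anyone who wants to reproduce the concatenation step without hand-editing, here is a minimal sketch in Python. The function name and the sample property names are only illustrative assumptions; the real list would come from the pcre2test -LS and -LP output, which this sketch does not parse.

```python
def build_property_pattern(names):
    """Concatenate one Unicode-property item per name into a single pattern."""
    # Each name becomes \p{name}; the items are simply joined back to back.
    return "".join(r"\p{" + name + "}" for name in names)


# Illustrative names only (assumption): substitute the names listed by
# pcre2test -LS (scripts) and pcre2test -LP (properties).
sample_names = ["ascii", "script:Latin", "script:Greek"]

pattern = build_property_pattern(sample_names)
print(len(pattern), pattern)
```

With the full list of names, the resulting pattern easily reaches thousands of bytes, although it only exercises property items and nothing else.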

Best regards,

David