trailing match all optimization

21 views

Skip to first unread message

Alkis Evlogimenos

unread,

Mar 14, 2026, 7:21:45 AMMar 14

to re2-dev

I want to implement an optimization for trailing match all expressions. An example is global replace operations like say host extraction from a url: ^https?://(?:www\.)?([^/]+)/.*$' —> ‘\1’. The engine will match the group, reach the match all at the end and spend linear time walking it even when it doesn’t have to.

I attempted to implement this optimization here https://github.com/google/re2/pull/607. I tried to keep it simple:
- after parsing the regexp, find if the optimization applies: trailing greedy match all, dot matches newline, oneline mode
- rebuild a stripped regexp without the match all and flag this in Prog. If there was a capture group, note its index
- run matching on the stripped regexp and use the unmatched portion of the input string to assign to the group if any

The approach is simple and verified by tests and benchmarks that it works. It has two drawbacks though:
1. after stripping the trailing match all from the compiled regex there is a round trip to string and compilation again
2. each RE2 object holds a second RE2 object if the optimization applies

Questions:
1. Is this optimization desirable?
2. If yes, is this implementation approach acceptable? Is there a better way?

Cheers,

Reply all

Reply to author

Forward

0 new messages