We made a couple of changes to re2j that improved performance for us considerably. The first was in the machine cache, where there was a tremendous amount of thread contention (we had ~120 threads running at a time). Fixing this gave re2j a several-fold speedup. Another change, which gave about a 10% speedup, was replacing an array list with a plain array. We're still waiting for corporate approval to contribute to the project, but once we have it we'll submit pull requests with these changes.
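For anyone curious what reducing contention in a machine cache can look like, here is a minimal sketch of one common approach. It is not the patch we'll be submitting. The idea is that threads borrow and return objects through a lock-free queue instead of all synchronizing on one shared monitor. `MachinePool`, `get`, `put` and `size` are hypothetical names, and the sketch assumes pooled objects are interchangeable and used by one thread at a time while borrowed.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical low-contention pool (illustrative names, not re2j's API).
// Uses a lock-free ConcurrentLinkedQueue so threads don't block on a
// single shared lock when borrowing or returning objects.
final class MachinePool<T> {
    private final ConcurrentLinkedQueue<T> free = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    MachinePool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Borrow a pooled instance, or create a fresh one if none is free.
    T get() {
        T m = free.poll();
        return (m != null) ? m : factory.get();
    }

    // Return an instance so any thread can reuse it.
    void put(T m) {
        free.offer(m);
    }

    // Number of idle instances currently in the pool.
    int size() {
        return free.size();
    }
}
```

A caller would wrap each match in `get()` / `put()`, ideally with `put()` in a `finally` block so an exception doesn't leak the instance. The trade-off is that under a burst of demand the pool may create extra instances rather than making threads wait for one.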
We are also looking at porting the other regular expression matching engines from the C++ version of RE2. The C++ version has four linear-time regular expression engines, each suited to its own set of cases. The NFA engine, which is the one implemented in RE2J, is the slowest but most versatile of them. With the other engines in place, we should see a considerable speedup in RE2J for most regular expressions.
Alan, if you're serious about wanting to give up the project, our group at Teradata would be happy to take it over.
Rebecca