Thanks! Going through the code, this is my understanding of how to use
it (playing fast and loose with syntax):
vector<string> atoms;
FilteredRE2 f;
// add the specified patterns to f
for_each(patterns.begin(), patterns.end(), bind(addPatternToFilter,
&f, _1));
// compile f; get the fixed-string factors back out
f.Compile(&atoms);
vector<int> responsiveFactors;
// my own Aho-Corasick, Commentz-Walter, etc
multiStringSearch(text, atoms, responsiveFactors);
// responsiveFactors contains indices of strings found in atoms
if (!responsiveFactors.empty()) {
vector<int> matchPatterns;
// pass in responsiveFactors, right?
f.AllMatches(text, responsiveFactors, &matchPatterns);
}
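For concreteness, here is a minimal sketch of the contract I'm assuming for multiStringSearch: it appends the index of every atom that occurs in the text. It is a naive std::string::find loop standing in for Aho-Corasick or Commentz-Walter, and the name and signature are my own, not part of RE2. The comment on Compile() in filtered_re2.h says the returned strings are lowercased, so the sketch lowercases the text before searching.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of RE2): appends to `found` the index of
// every atom that occurs somewhere in `text`. A naive O(atoms * text) scan
// standing in for a real multi-string matcher.
void multiStringSearch(const std::string& text,
                       const std::vector<std::string>& atoms,
                       std::vector<int>& found) {
  // Lowercase a copy of the text, assuming the atoms are already lowercase.
  std::string lowered(text);
  std::transform(lowered.begin(), lowered.end(), lowered.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  for (std::size_t i = 0; i < atoms.size(); ++i) {
    if (lowered.find(atoms[i]) != std::string::npos) {
      found.push_back(static_cast<int>(i));
    }
  }
}
```

The indices it produces are positions in atoms, which is what I understand AllMatches to expect as its second argument.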
I see a couple of issues with this.
The first is that I don't see a good way of getting at the matches
themselves (i.e. pairs of position & length). Could I do that by
adding another search function to FilteredRE2 that uses
RE2::FindAndConsume, with an Arg array passed in?
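To make "pairs of position & length" concrete, here is roughly the output I'm after. std::regex stands in for RE2 purely so the snippet builds with the standard library alone, and Span and findAllSpans are names made up for this sketch. With RE2 proper I'd guess the same information could come from RE2::Match, by passing a StringPiece for submatch 0 and taking its data() offset from the start of the text, but that is an assumption on my part.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type: where a match starts and how long it is.
struct Span {
  std::size_t pos;
  std::size_t len;
};

// Hypothetical helper: collects every non-overlapping match of `pattern`
// in `text`, scanning left to right. std::regex is a stand-in for RE2 here.
std::vector<Span> findAllSpans(const std::string& text,
                               const std::string& pattern) {
  std::vector<Span> spans;
  const std::regex re(pattern);
  for (std::sregex_iterator it(text.begin(), text.end(), re), end;
       it != end; ++it) {
    spans.push_back({static_cast<std::size_t>(it->position(0)),
                     static_cast<std::size_t>(it->length(0))});
  }
  return spans;
}
```

Calling this once per pattern that AllMatches reports would give the positions, at the cost of yet another pass per pattern.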
Second, it looks like this runs only those patterns whose factors had
hits (good), but runs the associated regexps separately, in multiple
passes (less good). There
also doesn't seem to be a way to pass in information about where the
factors were found. Would there be a good way to combine the
implicated regexps into a single sub-matching NFA and then make a
single pass? If you were to take a S?WAG, would that be better or
worse than taking multiple passes with the different patterns?
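As a rough illustration of the single-pass idea, the sketch below wraps each implicated pattern in its own capture group, joins the groups with '|', and scans the text once, using the group that participated to recover which pattern matched. Again std::regex is only a stand-in (it is a backtracking matcher, not RE2's automata), the names Hit and singlePassSearch are invented, and it assumes the patterns contain no capture groups of their own. Alternation reports only one pattern per match position, so overlapping matches from different patterns would be lost.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type: which pattern matched, and where.
struct Hit {
  std::size_t pattern;  // index into the `patterns` argument
  std::size_t pos;
  std::size_t len;
};

// Hypothetical helper: one left-to-right pass over `text` with all patterns
// combined as (p0)|(p1)|... Assumes the patterns have no capture groups of
// their own, so group i+1 corresponds to patterns[i].
std::vector<Hit> singlePassSearch(const std::string& text,
                                  const std::vector<std::string>& patterns) {
  std::vector<Hit> hits;
  if (patterns.empty()) return hits;

  std::string combined;
  for (std::size_t i = 0; i < patterns.size(); ++i) {
    if (i > 0) combined += '|';
    combined += '(' + patterns[i] + ')';
  }
  const std::regex re(combined);
  for (std::sregex_iterator it(text.begin(), text.end(), re), end;
       it != end; ++it) {
    // Find which alternative's capture group took part in this match.
    for (std::size_t i = 0; i < patterns.size(); ++i) {
      if ((*it)[i + 1].matched) {
        hits.push_back({i, static_cast<std::size_t>(it->position(0)),
                        static_cast<std::size_t>(it->length(0))});
        break;
      }
    }
  }
  return hits;
}
```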
Many, many thanks,
Jon