One thing to note is that RE2's DFA has a maximum alphabet size of
257, not 256 as you might expect. (In practice, the alphabet size is
much smaller because bytes are combined into equivalence classes. But
the max is 257.) The reason is that at the end of a search, if you've
reached the end of the haystack, you need to run the DFA one more time
on a special sentinel symbol that represents the end of the input. In
RE2's DFA, that's called 'kByteEndText'[1], and it is used as the
final transition here[2].

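To make the 257 concrete, here's a toy Python sketch. It is not RE2's code, and all of the names are made up for illustration. It assumes the regex compiler gives us a set of "boundary" bytes where a new equivalence class starts. Each of the 256 bytes is mapped to a class, and then one extra symbol is added for the end-of-text sentinel. The worst case, where every byte is its own class, gives 257.

```python
# Toy sketch, not RE2's code: the names below are hypothetical.

def byte_class_map(boundaries):
    """Map every byte 0..255 to an equivalence-class id.

    `boundaries` holds the byte values that start a new class, i.e. b is in
    it when the regex can treat bytes b-1 and b differently.
    """
    classes = []
    current = 0
    for b in range(256):
        if b > 0 and b in boundaries:
            current += 1
        classes.append(current)
    return classes


def dfa_alphabet_size(boundaries):
    """Number of distinct symbols a DFA transition row needs."""
    num_byte_classes = byte_class_map(boundaries)[-1] + 1
    # +1 for the end-of-text sentinel (what RE2 calls kByteEndText).
    return num_byte_classes + 1


def word_dollar_boundaries():
    """Hand-computed class boundaries for something like \\w$ with a
    multi-line '$': '\\n', 0-9, A-Z, '_' and a-z each form their own class."""
    edges = set()
    for lo, hi in [(0x0A, 0x0A), (0x30, 0x39), (0x41, 0x5A),
                   (0x5F, 0x5F), (0x61, 0x7A)]:
        edges.add(lo)
        edges.add(hi + 1)
    return edges
```

With no boundaries at all you get 2 symbols (one class for every byte, plus the sentinel). If every byte is a boundary you get 257. The '\w$' boundaries are hand-computed under the assumption of a multi-line '$', so treat that case as illustrative. It comes out to 12.
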
That kByteEndText transition is necessary to make $ and \b work, since
both of those assertions can be satisfied at the end of the input.
But as the second link above shows, when a search completes, it
doesn't always run on the special kByteEndText sentinel. It can also
run on the byte immediately following the end of the searched
text[3]. Recall that RE2's DFA searches within a substring of the
input[4]. So, for example, say you have the string c='foobar\r\n' and
the caller wants to search for matches of '\w$' within the substring
c[0:6]. That should report a match (the 'r' at c[5] is a word
character, and it is followed by a line ending). If you've modified
RE2 to have '$' recognize either '\r\n' or '\n' as a line ending, then
the only way for this to work is if your search reads two bytes past
the end of the haystack and into the surrounding context. Seeing the
'\r' alone doesn't tell you whether a '\n' follows it.

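Here's a minimal Python sketch of why one byte isn't enough. It assumes '$' means "before '\n', before '\r\n', or at the end of the input". The names (END_TEXT, final_symbol, dollar_crlf) are hypothetical. final_symbol only mimics the idea behind [2] and [3], picking either the byte after the searched text or the sentinel.

```python
END_TEXT = 256  # hypothetical stand-in for kByteEndText; can't collide with a byte


def final_symbol(context: bytes, end: int) -> int:
    """Symbol for the DFA's last transition after searching context[start:end].

    Mimics the idea (not the code) of RE2's forward search: use the real
    byte after the searched text if the context continues, else the sentinel.
    """
    return END_TEXT if end == len(context) else context[end]


def dollar_crlf(context: bytes, pos: int) -> bool:
    """Hypothetical multi-line '$': '\\n' and '\\r\\n' end a line, a bare '\\r' doesn't."""
    if pos == len(context):
        return True                          # end of the whole input
    if context[pos] == 0x0A:                 # '\n'
        return True
    # A '\r' only counts if the byte right after it is a '\n'.
    return context[pos:pos + 2] == b"\r\n"


crlf = b"foobar\r\n"
bare = b"foobar\rx"
print(final_symbol(crlf, 6), final_symbol(bare, 6))  # 13 13
print(dollar_crlf(crlf, 6), dollar_crlf(bare, 6))    # True False
```

For the window c[0:6], both haystacks hand the DFA the same final symbol ('\r'). Yet '$' should match at position 6 in the first and not in the second. Only the second byte past the end tells them apart.
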
With that said, re-reading your initial post, it sounds like you're
okay with a bare '\r' being treated as a line ending on its own. In
that case, maybe you _can_ get away with not scanning two bytes past
the end of the search text, since a single byte of lookahead would
then be enough to tell whether '$' matches. (Although, IIRC, only very
old Mac systems use a bare '\r' as a line ending.)

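If a bare '\r' also ends a line, the check collapses to looking at a single byte. Here's a hypothetical sketch (again, not RE2 code, and the function name is made up):

```python
def dollar_one_byte(context: bytes, pos: int) -> bool:
    """Hypothetical multi-line '$' where '\\r' and '\\n' each end a line
    (so '\\r\\n' does too). Reads at most one byte: context[pos]."""
    if pos == len(context):
        return True                          # end of the whole input
    return context[pos] in (0x0D, 0x0A)      # '\r' or '\n'
```

It never reads past context[pos], which is the one byte that [3] already feeds the DFA. One thing to keep in mind is that with this definition, '$' also matches between the '\r' and the '\n' of a '\r\n' pair.
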
I still have a nagging feeling that something isn't quite right,
though. My advice is to think this through carefully and write a lot
of tests for it. In particular, write tests that search a substring of
a larger string, as permitted by the DFA's API.

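One way to structure those tests is a brute-force differential test. Here's a Python sketch. It assumes '$' means "before '\r\n', before '\n', or at the end of the input". That rule is spelled out with a lookahead, since Python's re has no such mode. The oracle scans the whole string and keeps only the matches inside the window. search_window is a hypothetical stand-in for the code under test: it can only read `lookahead` bytes past the end of the window. Every (start, end) window of a few small haystacks gets checked.

```python
import re

# '\w$' with the hypothetical '$' described above.
PATTERN = re.compile(rb"\w(?=\r\n|\n|\Z)")


def oracle(context: bytes, start: int, end: int) -> bool:
    """Reference answer: scan the whole context, keep matches inside the window."""
    return any(start <= m.start() and m.end() <= end
               for m in PATTERN.finditer(context))


def search_window(context: bytes, start: int, end: int, lookahead: int) -> bool:
    """Stand-in for the implementation under test: it may only read
    `lookahead` bytes of context past the end of the window."""
    visible = context[:end + lookahead]
    return any(start <= m.start() and m.end() <= end
               for m in PATTERN.finditer(visible))


def failing_windows(haystacks, lookahead):
    """Compare every (start, end) window of every haystack against the oracle."""
    failures = []
    for hay in haystacks:
        for start in range(len(hay) + 1):
            for end in range(start, len(hay) + 1):
                got = search_window(hay, start, end, lookahead)
                if got != oracle(hay, start, end):
                    failures.append((hay, start, end))
    return failures


HAYSTACKS = [b"foobar\r\n", b"foobar\rx", b"ab\ncd", b"x\r\ny\r"]
```

With two bytes of lookahead nothing fails. With one byte, the c[0:6] window of 'foobar\r\n' from above is among the failures. In a real test you'd replace search_window with a call into your modified RE2.
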
Good luck!

[1] https://github.com/google/re2/blob/f8e389f3acdc2517562924239e2a188037393683/re2/dfa.cc#L131
[2] https://github.com/google/re2/blob/f8e389f3acdc2517562924239e2a188037393683/re2/dfa.cc#L1485-L1498
[3] https://github.com/google/re2/blob/f8e389f3acdc2517562924239e2a188037393683/re2/dfa.cc#L1490
[4] https://github.com/google/re2/blob/13ebb377c6ad763ca61d12dd6f88b1126bd0b911/re2/dfa.cc#L1822-L1824

> --
> You received this message because you are subscribed to the Google Groups "re2-dev" group.