Hi,
I found a bounds violation in DFA::WorkqToCachedState that affects RE2::Set.
The buffer allocated at dfa.cc:618 is too small to hold everything written
into it when mq != NULL.
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/dfa.cc#L618
The allocation is PODArray<int> inst(q->size()). For kManyMatch, Workq::size()
returns prog_->size() exactly -- nmark is zero for this mode (marks are only
allocated for kLongestMatch), so there is no slack. The main loop is correctly
bounded: the DCHECK_LE at dfa.cc:665 confirms n <= q->size() after it exits.
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/dfa.cc#L665
The problem is what comes after. At dfa.cc:726-732, when mq != NULL, the code
appends a MatchSep sentinel and one match_id per kInstMatch instruction in mq,
with no bounds check on either write:
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/dfa.cc#L726
inst[n++] = MatchSep;
for (Workq::iterator i = mq->begin(); i != mq->end(); ++i) {
Prog::Inst* ip = prog_->inst(*i);
if (ip->opcode() == kInstMatch)
inst[n++] = ip->match_id(); // n may already equal q->size()
}
Total writes are n_after_main_loop + 1 + match_id_count, which can exceed
q->size(). The overwritten bytes are then passed to CachedState at dfa.cc:739
and copied into a heap-allocated state via memmove(s->inst_, inst, n*sizeof(int))
at dfa.cc:791.
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/dfa.cc#L739
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/dfa.cc#L791
mq != NULL only ever occurs at one call site, dfa.cc:1103-1104, exclusively in
kManyMatch mode when a match fires during RunStateOnByte. This path is only
reachable via RE2::Set.
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/dfa.cc#L1103
Why it does not crash in practice today: the RE2::Set DotStar prefix adds
roughly 7-10 instructions to prog_->size() that never become list heads,
giving n an accidental margin below q->size() after the main loop. This is not
an invariant; it is a side-effect of the current CompileSet instruction layout.
Fix: the mq block appends at most 1 + prog_->inst_count(kInstMatch) elements.
inst_count_ is populated during Flatten() at prog.cc:617 and exposed via
prog_->inst_count(kInstMatch) at prog.h:235.
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/prog.cc#L617
https://github.com/google/re2/blob/972a15cedd008d846f1a39b2e88ce48d7f166cbd/re2/prog.h#L235
Proposed change at the allocation site (dfa.cc:618):
- PODArray<int> inst(q->size());
+ int inst_cap = q->size();
+ if (mq != NULL)
+ inst_cap += 1 + prog_->inst_count(kInstMatch);
+ PODArray<int> inst(inst_cap);
mq is a parameter to WorkqToCachedState so it is known at the point of
allocation. Using prog_->inst_count(kInstMatch) keeps the bound tight: for an
N-pattern RE2::Set that value equals N, whereas mq->size() equals prog_->size()
which is considerably larger.
Happy to send a patch.
--
Ali Raza (@locus-x64)