[regexp] Better operand disassembly [v8/v8 : main]


Jakob Linke (Gerrit)

6:49 AM
to Patrick Thier, V8 LUCI CQ, jgrube...@chromium.org, pthier...@chromium.org, v8-re...@googlegroups.com
Attention needed from Patrick Thier

Jakob Linke added 1 comment

Patchset-level comments

Attention is currently required from:
  • Patrick Thier
Submit Requirements:
  • Code-Owners: satisfied
  • Code-Review: not satisfied
  • Review-Enforcement: not satisfied
Gerrit-MessageType: comment
Gerrit-Project: v8/v8
Gerrit-Branch: main
Gerrit-Change-Id: Iefc667e662bffe8b8d5e14ecdbbac8d0ecdf8066
Gerrit-Change-Number: 7462504
Gerrit-PatchSet: 2
Gerrit-Owner: Jakob Linke <jgr...@chromium.org>
Gerrit-Reviewer: Jakob Linke <jgr...@chromium.org>
Gerrit-Reviewer: Patrick Thier <pth...@chromium.org>
Gerrit-Attention: Patrick Thier <pth...@chromium.org>
Gerrit-Comment-Date: Tue, 13 Jan 2026 11:49:23 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No

Patrick Thier (Gerrit)

7:34 AM
to Jakob Linke, V8 LUCI CQ, jgrube...@chromium.org, pthier...@chromium.org, v8-re...@googlegroups.com
Attention needed from Jakob Linke

Patrick Thier added 13 comments

Patchset-level comments
Patrick Thier . resolved

Nice! I just have a couple of suggestions and nits.

File src/regexp/regexp-bytecodes-inl.h
Line 99, Patchset 2 (Latest): if (comma == std::string_view::npos) break;
Patrick Thier . unresolved

This shouldn't ever be relevant, unless the function is called with a wrong `N`. Asserting that N actually matches what we observed would be better IMO. WDYT?

Line 94, Patchset 2 (Latest): while (!part.empty() && part.front() == ' ') part.remove_prefix(1);
while (!part.empty() && part.back() == ' ') part.remove_suffix(1);
Patrick Thier . unresolved

nit: I don't think we need these loops.
You could use `find_first_not_of`/`find_last_not_of` together with a single `substr`.
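
A minimal sketch of the loop-free trimming suggested here. The helper name `TrimSpaces` is hypothetical and not part of the change; it only illustrates `find_first_not_of`/`find_last_not_of` combined with a single `substr`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not from the CL): strips leading and trailing spaces
// with two searches and one substr() instead of two character loops.
constexpr std::string_view TrimSpaces(std::string_view part) {
  const std::size_t first = part.find_first_not_of(' ');
  // npos means the input is empty or consists only of spaces.
  if (first == std::string_view::npos) return {};
  const std::size_t last = part.find_last_not_of(' ');
  return part.substr(first, last - first + 1);
}

// Usable in constant expressions, which matters for a constexpr caller.
static_assert(TrimSpaces("  reg ") == "reg");
static_assert(TrimSpaces("   ").empty());
```

If empty names are never expected, the early return could become an assertion instead.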

Line 89, Patchset 2 (Latest): size_t end = (comma == std::string_view::npos) ? names.size() : comma;
Patrick Thier . unresolved

nit: With my suggestions above, you could just use `names.size() - 1` here (if back == ')' is asserted as suggested).

Line 82, Patchset 2 (Latest): names.remove_prefix(1);
names.remove_suffix(1);
Patrick Thier . unresolved

nit: `substr()` would be more efficient here.
Or even better: just set `start = 1`.

Line 81, Patchset 2 (Latest): if (names.size() >= 2 && names.front() == '(' && names.back() == ')') {
Patrick Thier . unresolved

nit: This is the expected input format, so could we assert that `front == '(' and back == ')'` instead?

Line 77, Patchset 2 (Latest): constexpr std::array<std::string_view, N> SplitNames(const char* raw_names) {
Patrick Thier . unresolved

nit: `consteval`. Or is there something that prevents it that I didn't see?
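
Taken together, the suggestions for this file could look roughly like the sketch below. It is an illustration under assumptions, not the code in the patchset: the name `SplitNamesSketch`, the `std::string_view` parameter, and the `SKETCH_CONSTEVAL` macro (which falls back to `constexpr` on toolchains without `consteval`) are all made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

#if defined(__cpp_consteval)
#define SKETCH_CONSTEVAL consteval
#else
#define SKETCH_CONSTEVAL constexpr  // Fallback for pre-C++20 toolchains.
#endif

// Hypothetical rewrite: splits "(a, b, c)" into exactly N trimmed names.
template <std::size_t N>
SKETCH_CONSTEVAL std::array<std::string_view, N> SplitNamesSketch(
    std::string_view names) {
  // The parenthesized form is the expected input, so assert it.
  assert(names.size() >= 2 && names.front() == '(' && names.back() == ')');
  std::array<std::string_view, N> result{};
  std::size_t start = 1;                       // Skip the leading '('.
  const std::size_t limit = names.size() - 1;  // Stop before the ')'.
  std::size_t count = 0;
  while (start < limit) {
    const std::size_t comma = names.find(',', start);
    const std::size_t end = (comma == std::string_view::npos) ? limit : comma;
    const std::string_view part = names.substr(start, end - start);
    // Trim with two searches and one substr(); empty names are not expected.
    const std::size_t first = part.find_first_not_of(' ');
    assert(first != std::string_view::npos);
    const std::size_t last = part.find_last_not_of(' ');
    assert(count < N);
    result[count++] = part.substr(first, last - first + 1);
    start = end + 1;
  }
  // N must match the number of names actually observed.
  assert(count == N);
  return result;
}

static_assert(SplitNamesSketch<2>("(from, to)")[1] == "to");
```

During constant evaluation a failing `assert` cannot be evaluated, so a wrong `N` or a malformed list would surface as a compile error rather than a silent `break`.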

File src/regexp/regexp-bytecodes.cc
Line 22, Patchset 2 (Parent): for (int i = 0; i < RegExpBytecodes::Size(bytecode); i++) {
PrintF(", %02x", pc[i]);
}
Patrick Thier . unresolved

In case something is completely broken with the bytecode, it might be useful to have the raw output. WDYT about keeping/adding the raw output behind a separate flag (e.g. `--print-regexp-bytecode-raw`)?
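
One way to keep the raw dump available is sketched below. The `print_raw` parameter stands in for the proposed flag, and `RawBytes`/`DisassembleLine` are hypothetical names; how the real flag would be declared and read in V8 is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Formats `length` bytes starting at `pc` as ", xx" pairs, mirroring the
// loop quoted from the parent patchset.
std::string RawBytes(const uint8_t* pc, int length) {
  std::string out;
  char buf[8];
  for (int i = 0; i < length; i++) {
    std::snprintf(buf, sizeof(buf), ", %02x", static_cast<unsigned>(pc[i]));
    out += buf;
  }
  return out;
}

// Hypothetical wrapper: `decoded` is the new operand-aware text, and
// `print_raw` stands in for a flag such as --print-regexp-bytecode-raw.
std::string DisassembleLine(std::string_view decoded, const uint8_t* pc,
                            int length, bool print_raw) {
  std::string line(decoded);
  if (print_raw) line += RawBytes(pc, length);
  return line;
}
```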

Line 29, Patchset 2 (Latest): PrintF("[bit table]");
Patrick Thier . unresolved

I think having the values of the bit-table is also often interesting. Can we keep it?

Line 32, Patchset 2 (Latest): pc, DisallowGarbageCollection{});
Patrick Thier . unresolved

This is a weird pattern. Please move the DisallowGC scope to the top of the method.

Line 39, Patchset 2 (Latest): PrintF("%x", val);
Patrick Thier . unresolved

nit: It would be nice to prefix the value with `0x` to avoid confusion.

Line 35, Patchset 2 (Latest): } else if constexpr (type == RegExpBytecodeOperandType::kChar) {
if (std::isprint(val)) {
PrintF("'%c'", val);
} else {
PrintF("%x", val);
Patrick Thier . unresolved

nit: While you change this, we could also switch to the more modern ostream approach. This would allow us to use `AsUC32` here.
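
A rough sketch of the stream-based variant with a `0x` prefix, using only the standard library. V8's `AsUC32` helper is not reproduced here; `PrintCharOperand` and `CharOperandToString` are hypothetical names, and the operand is assumed to fit in a `uint32_t`. The range check is there because `std::isprint` is only defined for values representable as `unsigned char` (or `EOF`).

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical printer for a character operand: printable ASCII is quoted,
// everything else is written as 0x-prefixed hex.
void PrintCharOperand(std::ostream& os, uint32_t val) {
  if (val < 0x80 && std::isprint(static_cast<int>(val))) {
    os << '\'' << static_cast<char>(val) << '\'';
  } else {
    // std::hex is sticky, so restore the caller's formatting afterwards.
    const std::ios_base::fmtflags saved = os.flags();
    os << "0x" << std::hex << val;
    os.flags(saved);
  }
}

// Convenience wrapper for callers that want a string.
std::string CharOperandToString(uint32_t val) {
  std::ostringstream os;
  PrintCharOperand(os, val);
  return os.str();
}
```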

Line 42, Patchset 2 (Latest): PrintF("%x", static_cast<uint32_t>(val));
Patrick Thier . unresolved

nit: Why static_cast all the remaining operands to uint32_t instead of using the "correct" type? I guess it is just because of the 2 enum values? Can we specialize for these types instead?
Also: With the correct types now in place, maybe we don't want to just use `%x` for every value? Maybe it makes sense to add the format specifier to the operand type list?
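
The idea of attaching a format specifier to the operand type list might look like the sketch below. The list entries, names, and formats are invented for illustration and do not match V8's actual operand types.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical operand list: V(Name, C++ type, printf-style format).
#define SKETCH_OPERAND_TYPE_LIST(V) \
  V(Int16, int16_t, "%d")           \
  V(Uint32, uint32_t, "0x%x")       \
  V(Register, uint16_t, "r%u")

enum class SketchOperandType {
#define DECLARE_ENUM(Name, CType, Format) k##Name,
  SKETCH_OPERAND_TYPE_LIST(DECLARE_ENUM)
#undef DECLARE_ENUM
};

// Looks up the format for an operand type, so the printer no longer needs
// to cast every value to uint32_t and print it with "%x".
constexpr std::string_view FormatFor(SketchOperandType type) {
  switch (type) {
#define CASE(Name, CType, Format)  \
  case SketchOperandType::k##Name: \
    return Format;
    SKETCH_OPERAND_TYPE_LIST(CASE)
#undef CASE
  }
  return "%x";  // Unreachable for valid enum values.
}

static_assert(FormatFor(SketchOperandType::kInt16) == "%d");
```

Enum-valued operands could get their own list entries with a name-printing path instead of a numeric format.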


Attention is currently required from:
  • Jakob Linke
Submit Requirements:
  • Code-Owners: satisfied
  • Code-Review: not satisfied
  • No-Unresolved-Comments: not satisfied
  • Review-Enforcement: not satisfied
Gerrit-Attention: Jakob Linke <jgr...@chromium.org>
Gerrit-Comment-Date: Tue, 13 Jan 2026 12:34:36 +0000