extern macro LoadTrustedPointerFromObject(Moved here from regexp-match.tq
const kRegExpDataIndirectPointerTag:Moved to js-regexp.tq
extern macro LoadTrustedPointerFromObject(Moved to base.tq
const kJSRegExpRegExpDataOffset:Moved to js-regexp.tq
bool IsLineTerminator(int c) {Moved without modifications to regexp.cc
const kJSRegExpRegExpDataOffset:Moved from regexp.tq
const kRegExpDataIndirectPointerTag:Moved from regexp-match.tq
bool IsLineTerminator(int c) {Moved from js-regexp.cc (without change).
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thank you, this must've been tedious to fix! Comments below are mostly about confusion between unmodified/escaped source vars. My two main requests are to 1) clearly distinguish between the two in RegExpData, and 2) pass RegExpData (or a small string container struct) in regexp.cc methods so each local callsite can pick the appropriate variant to use.
extern macro LoadTrustedPointerFromObject(Jakob LinkeMoved here from regexp-match.tq
Acknowledged
extern macro LoadTrustedPointerFromObject(Jakob LinkeMoved here from regexp-match.tq
Acknowledged
LoadObjectField<String>(data, RegExpData::kSourceOffset);Should this be escaped_source?
LoadObjectField<String>(data, RegExpData::kSourceOffset);Same here.
const kRegExpDataIndirectPointerTag:Jakob LinkeMoved to js-regexp.tq
Acknowledged
extern macro LoadTrustedPointerFromObject(Jakob LinkeMoved to base.tq
Acknowledged
const kJSRegExpRegExpDataOffset:Jakob LinkeMoved to js-regexp.tq
Acknowledged
const kJSRegExpRegExpDataOffset:Jakob LinkeMoved to js-regexp.tq
Acknowledged
inline Tagged<String> source(IsolateForSandbox isolate) const;Please remind us here that this is the escaped version.
bool IsLineTerminator(int c) {Moved without modifications to regexp.cc
Acknowledged
const kJSRegExpRegExpDataOffset:Jakob LinkeMoved from regexp.tq
Acknowledged
const kRegExpDataIndirectPointerTag:Jakob LinkeMoved from regexp-match.tq
Acknowledged
return data.escaped_source;It's confusing that RegExpData.escaped_source is "renamed" to source on JSRegExp, especially since RegExpData has another var named `source`. I think it would help a lot to make the names in RegExpData very explicit.
Wdyt about eg `unmodified_source`, `escaped_source`?
bool IsLineTerminator(int c) {Moved from js-regexp.cc (without change).
Acknowledged
Compile(isolate, &zone, &compile_data, flags, escaped_pattern,This now passes the escaped version which ends up flowing into https://source.chromium.org/chromium/chromium/src/+/main:v8/src/regexp/regexp.cc;l=1206;drc=10b6082ae57b34c1d623275ed00dae987fb2a590
Was this on purpose? Not a huge change, but potentially confusing.
DirectHandle<String> pattern(re_data->escaped_source(), isolate);We now pass `source` when the callee uses the string for parsing, and `escaped_source` for logging etc. This is totally implicit though in the signatures, and not clear which is appropriate (especially since the parameters for some callees are still called just `source`).
Wdyt about passing re_data instead and using the appropriate string directly at the use site? Some callees may even want to use both.
DCHECK(pattern->IsFlat());This must hold for both variants, right? If so, we could dcheck that in the RegExpData getters.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks for the quick review. PTAL again
LoadObjectField<String>(data, RegExpData::kSourceOffset);Should this be escaped_source?
Ah sorry forgot to leave a comment on the CL for this change.
It was intentional.
The behaviour is the same (We don't escape already escaped characters for the source and the parser handles both escaped/unescaped), but now the code matches the spec text (https://tc39.es/ecma262/#sec-regexp-pattern-flags `4.a Let P be pattern.[[OriginalSource]].`)
This also makes the intended behaviour more obvious in my opinion (I had to triple check that the previous version was actually working as intended).
LoadObjectField<String>(data, RegExpData::kSourceOffset);Same here.
Same reasoning as above, matches the spec text (https://tc39.es/ecma262/#sec-regexp.prototype.compile `3.b Let P be pattern.[[OriginalSource]].`)
inline Tagged<String> source(IsolateForSandbox isolate) const;Please remind us here that this is the escaped version.
It's confusing that RegExpData.escaped_source is "renamed" to source on JSRegExp, especially since RegExpData has another var named `source`. I think it would help a lot to make the names in RegExpData very explicit.
Wdyt about eg `unmodified_source`, `escaped_source`?
This just matches the previous behaviour.
I would propose:
`JSRegExp.source` to match `RegExp.prototype.source`, `RegExpData.original_source` and `RegExpData.escaped_source` to distinguish them in `RegExpData`.
Since this is another pretty big change (I think) I would like to do it in a follow-up.
Compile(isolate, &zone, &compile_data, flags, escaped_pattern,This now passes the escaped version which ends up flowing into https://source.chromium.org/chromium/chromium/src/+/main:v8/src/regexp/regexp.cc;l=1206;drc=10b6082ae57b34c1d623275ed00dae987fb2a590
Was this on purpose? Not a huge change, but potentially confusing.
Yes. That matches the previous behaviour (before the introduction of `RegExpData`).
DirectHandle<String> pattern(re_data->escaped_source(), isolate);We now pass `source` when the callee uses the string for parsing, and `escaped_source` for logging etc. This is totally implicit though in the signatures, and not clear which is appropriate (especially since the parameters for some callees are still called just `source`).
Wdyt about passing re_data instead and using the appropriate string directly at the use site? Some callees may even want to use both.
Yeah at least the names should be explicit (going to change this in the follow-up where I am going to rename `RegExpData.source`.
Will also evaluate where it makes sense to just pass `RegExpData` instead.
This must hold for both variants, right? If so, we could dcheck that in the RegExpData getters.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
LoadObjectField<String>(data, RegExpData::kSourceOffset);Patrick ThierShould this be escaped_source?
Ah sorry forgot to leave a comment on the CL for this change.
It was intentional.
The behaviour is the same (We don't escape already escaped characters for the source and the parser handles both escaped/unescaped), but now the code matches the spec text (https://tc39.es/ecma262/#sec-regexp-pattern-flags `4.a Let P be pattern.[[OriginalSource]].`)
This also makes the intended behaviour more obvious in my opinion (I had to triple check that the previous version was actually working as intended).
Acknowledged
LoadObjectField<String>(data, RegExpData::kSourceOffset);Patrick ThierSame here.
Same reasoning as above, matches the spec text (https://tc39.es/ecma262/#sec-regexp.prototype.compile `3.b Let P be pattern.[[OriginalSource]].`)
Acknowledged
return data.escaped_source;Patrick ThierIt's confusing that RegExpData.escaped_source is "renamed" to source on JSRegExp, especially since RegExpData has another var named `source`. I think it would help a lot to make the names in RegExpData very explicit.
Wdyt about eg `unmodified_source`, `escaped_source`?
This just matches the previous behaviour.
I would propose:
`JSRegExp.source` to match `RegExp.prototype.source`, `RegExpData.original_source` and `RegExpData.escaped_source` to distinguish them in `RegExpData`.
Since this is another pretty big change (I think) I would like to do it in a follow-up.
Right, it matches previous behavior but what I meant is that it's especially confusing now that JSRegExp and RegExpData both have `source`, but one is escaped and the other not.
Addressing in a followup is okay by me, although one (perhaps easy) alternative would be to swap the RegExpData fields in this commit:
Perhaps that's just a matter of a few find/replace? Anyway, up to you.
Compile(isolate, &zone, &compile_data, flags, escaped_pattern,Patrick ThierThis now passes the escaped version which ends up flowing into https://source.chromium.org/chromium/chromium/src/+/main:v8/src/regexp/regexp.cc;l=1206;drc=10b6082ae57b34c1d623275ed00dae987fb2a590
Was this on purpose? Not a huge change, but potentially confusing.
Yes. That matches the previous behaviour (before the introduction of `RegExpData`).
Are you saying that the state before this patch (which passes `re_data->source()`) was wrong and we originally passed the escaped source here? Do you perhaps have a pointer to a cl that introduced the change?
(If so, it's a bit weird to use the escaped version for TooMuchRegExpCode since it penalizes any pattern that happens to need a lot of escaping.)
DirectHandle<String> pattern(re_data->escaped_source(), isolate);Patrick ThierWe now pass `source` when the callee uses the string for parsing, and `escaped_source` for logging etc. This is totally implicit though in the signatures, and not clear which is appropriate (especially since the parameters for some callees are still called just `source`).
Wdyt about passing re_data instead and using the appropriate string directly at the use site? Some callees may even want to use both.
Yeah at least the names should be explicit (going to change this in the follow-up where I am going to rename `RegExpData.source`.
Will also evaluate where it makes sense to just pass `RegExpData` instead.
Ack. Fwiw I *think* I prefer the latter just to make the uses less confusing.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
return data.escaped_source;Patrick ThierIt's confusing that RegExpData.escaped_source is "renamed" to source on JSRegExp, especially since RegExpData has another var named `source`. I think it would help a lot to make the names in RegExpData very explicit.
Wdyt about eg `unmodified_source`, `escaped_source`?
Jakob LinkeThis just matches the previous behaviour.
I would propose:
`JSRegExp.source` to match `RegExp.prototype.source`, `RegExpData.original_source` and `RegExpData.escaped_source` to distinguish them in `RegExpData`.
Since this is another pretty big change (I think) I would like to do it in a follow-up.
Right, it matches previous behavior but what I meant is that it's especially confusing now that JSRegExp and RegExpData both have `source`, but one is escaped and the other not.
Addressing in a followup is okay by me, although one (perhaps easy) alternative would be to swap the RegExpData fields in this commit:
- source stores the escaped version and corresponds to JSRegExp.source.
- original_source the unescaped version.
Perhaps that's just a matter of a few find/replace? Anyway, up to you.
Yeah I agree that this is a bit confusing. Follow-up is already prepared. I prefer the explicit version. Otherwise I think we have the same problem where `source()` suddenly is escaped where we didn't expect it.
Compile(isolate, &zone, &compile_data, flags, escaped_pattern,Patrick ThierThis now passes the escaped version which ends up flowing into https://source.chromium.org/chromium/chromium/src/+/main:v8/src/regexp/regexp.cc;l=1206;drc=10b6082ae57b34c1d623275ed00dae987fb2a590
Was this on purpose? Not a huge change, but potentially confusing.
Jakob LinkeYes. That matches the previous behaviour (before the introduction of `RegExpData`).
Are you saying that the state before this patch (which passes `re_data->source()`) was wrong and we originally passed the escaped source here? Do you perhaps have a pointer to a cl that introduced the change?
(If so, it's a bit weird to use the escaped version for TooMuchRegExpCode since it penalizes any pattern that happens to need a lot of escaping.)
Previously (before the CL that accidently changed the behaviour; linked below) we just passed the escaped (`JSRegExp::source()`).
This is the CL: https://crrev.com/c/5713168
I don't think either of them is "wrong" since `TooMuchRegExpCode` is just a very rough heuristic (and diffs between `original_source()` and `escaped_source()` are probably very rare anyways).
Anyways in the follow-up I already changed it again to `original_source()`, since that makes more sense.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
+Dominik PTAL as owner of `src/profiler/heap-snapshot-generator.cc`
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
return data.escaped_source;Patrick ThierIt's confusing that RegExpData.escaped_source is "renamed" to source on JSRegExp, especially since RegExpData has another var named `source`. I think it would help a lot to make the names in RegExpData very explicit.
Wdyt about eg `unmodified_source`, `escaped_source`?
Jakob LinkeThis just matches the previous behaviour.
I would propose:
`JSRegExp.source` to match `RegExp.prototype.source`, `RegExpData.original_source` and `RegExpData.escaped_source` to distinguish them in `RegExpData`.
Since this is another pretty big change (I think) I would like to do it in a follow-up.
Patrick ThierRight, it matches previous behavior but what I meant is that it's especially confusing now that JSRegExp and RegExpData both have `source`, but one is escaped and the other not.
Addressing in a followup is okay by me, although one (perhaps easy) alternative would be to swap the RegExpData fields in this commit:
- source stores the escaped version and corresponds to JSRegExp.source.
- original_source the unescaped version.
Perhaps that's just a matter of a few find/replace? Anyway, up to you.
Yeah I agree that this is a bit confusing. Follow-up is already prepared. I prefer the explicit version. Otherwise I think we have the same problem where `source()` suddenly is escaped where we didn't expect it.
Yeah, I meant as an incremental step :) sgtm.
Compile(isolate, &zone, &compile_data, flags, escaped_pattern,Patrick ThierThis now passes the escaped version which ends up flowing into https://source.chromium.org/chromium/chromium/src/+/main:v8/src/regexp/regexp.cc;l=1206;drc=10b6082ae57b34c1d623275ed00dae987fb2a590
Was this on purpose? Not a huge change, but potentially confusing.
Jakob LinkeYes. That matches the previous behaviour (before the introduction of `RegExpData`).
Patrick ThierAre you saying that the state before this patch (which passes `re_data->source()`) was wrong and we originally passed the escaped source here? Do you perhaps have a pointer to a cl that introduced the change?
(If so, it's a bit weird to use the escaped version for TooMuchRegExpCode since it penalizes any pattern that happens to need a lot of escaping.)
Previously (before the CL that accidently changed the behaviour; linked below) we just passed the escaped (`JSRegExp::source()`).
This is the CL: https://crrev.com/c/5713168
I don't think either of them is "wrong" since `TooMuchRegExpCode` is just a very rough heuristic (and diffs between `original_source()` and `escaped_source()` are probably very rare anyways).
Anyways in the follow-up I already changed it again to `original_source()`, since that makes more sense.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
[regexp] Move JSRegExp::source() to RegExpData::escaped_source()
- Move JSRegExp::source() to RegExpData::escaped_source().
- Remove RegExpBoilerplateDescription::source.
- Pass RegExpData::escaped_source() instead of RegExpData::source()
through the compile pipeline (used by loggers).
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |