This should make anchored RegExps a little smoother.
It might make `regexpCaptureCount` more expensive since it now creates a RegExp on each invocation. It's a RegExp that matches the empty string quicker than the old one.
May be time to start caching the capture count. WDYT?
get _nativeGlobalVersion {
(I expect that we can just drop this RegExp and use the global one instead. But that's for another CL.)
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
There is essentially the same code (with minor differences) in js_dev_runtime/private/
r'new RegExp("(?:)|" + #, #)',
Having the JavaScript runtime recompile this RegExp might be quite expensive.
perhaps
```
late final bool hasCaptures = regExpCaptureCount(this);
```
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
r'new RegExp("(?:)|" + #, #)',
Having the JavaScript runtime recompile this RegExp might be quite expensive.
What it did before was to use the `MatchAsPrefix` regexp which would compile the regexp `#|()` instead, which is probably even more expensive with the extra capture. It would only be helped by it *maybe* being compiled before if someone had used `matchAsPrefix` with the same RegExp that they're passing to `split`. Which most likely never happened. People don't reuse RegExps, they create a new one for each purpose.
At least a smart RegExp compiler might see that `(?:)|...` can ignore the `...` part of the pattern.
Caching the result is probably worth doing. It's cheap and paus for itself the moment someone uses `split` with the same RegExp twice, which is more likely than using the RegExp for something else.
The other possible approach that doesn't require compiling a RegExp is to try to parse the regexp pattern, but that's going to be fragile.
Hmm, if we don't need to know the number of capture, just whether it has one, we might be able to just scan the pattern. If seeing `\`, ignore next character, if seeing `[` skip until `]` while still ignoring character after `\`, and if seeing `(` not followed by `?`, or followed by `?<` and then not `=` or `!`, it has a capture. I think that's correct for current JS regexps, and is unlikely to change with future changes. Let me try that.
Or just not using JS `String.prototype.split` with RegExps to begin with. That avoids having to know if it has captures. Just do the looping in Dart.
perhaps
```
late final bool hasCaptures = regExpCaptureCount(this);
```
That's fine. (Personally I don't like `late`, but if you know it's efficient here, it's probably fine.)
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Would it break anything if RegExp was a factory constructor that could return an existing RegExp?
bool _regExpHasCaptures(String pattern) {
I'm not sure this carries its weight.
1. Is it correct? It makes me nervous to have multiple places that parse RegExps. I think this needs to be tested.
2. It is more lines of code to maintain and keep in sync with the DDC version.
3. It won't be all that fast until it is JIT-optimized, which might be never.
4. It is might be more bytes of .js download.
Perhaps a hybrid aproach is the sweet spot:
JavaScript RegExps are still evolving.
There are other `(?` groups, luckily non-capturing for now)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Modifier
Lasse Nielsenperhaps
```
late final bool hasCaptures = regExpCaptureCount(this);
```
That's fine. (Personally I don't like `late`, but if you know it's efficient here, it's probably fine.)
If you don't like `late`, you can use something like
```
bool? __hasCaptures;
bool get _hasCaptures => __hasCaptures ??= _regExpHasCaptures(pattern);
```
return s;
I'd revert this to avoid churning files without changes.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
bool _regExpHasCaptures(String pattern) {
I'm not sure this carries its weight.
1. Is it correct? It makes me nervous to have multiple places that parse RegExps. I think this needs to be tested.
2. It is more lines of code to maintain and keep in sync with the DDC version.
3. It won't be all that fast until it is JIT-optimized, which might be never.
4. It is might be more bytes of .js download.
Perhaps a hybrid aproach is the sweet spot:
- cache the result
- use the `exec` trick...
- ... but guard it with a simple test like `JS<int>('', '#.indexOf("(")', pattern) < 0`, that catches the majority of split RegExps.
JavaScript RegExps are still evolving.
There are other `(?` groups, luckily non-capturing for now)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Modifier
Agree. Let's use the `exec` hack.
(I'd still consider just not using JS `split` with a RegExp, but it's possible that some RegExp with no captures gets used for a alot of splits (most split RegExps won't have captures if they're taken from JS) on long texts, and then it would pay off to use the native version when possible.)
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Would it break anything if RegExp was a factory constructor that could return an existing RegExp?
Probably not. I don't think anyone extends RegExp. And if they do, they should stop.
(It's technically breaking to change a generative constructor to factory for a non-final/interface class. The class should have been final. There was probably someone implementing it for mocking somewhere.)
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Code-Review | +1 |
Thanks for the changes.
There is essentially the same code in lib/_internal/js_dev_runtime/private/regexp_helper.dart
Please keep the DDC version aligned with the dart2js version.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
The hacked RegExp was also used by `regexpCaptureCount`,
The text could be cleaned up or deleted.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Thanks for the changes.
There is essentially the same code in lib/_internal/js_dev_runtime/private/regexp_helper.dart
Please keep the DDC version aligned with the dart2js version.
That is the plan. Doing this one first, to get it right and reviewed, so the others (DDC *and* Wasm) should be slam-dunks.
Fixed the typo in the wasm file. Lost the +1. 😞
The hacked RegExp was also used by `regexpCaptureCount`,
The text could be cleaned up or deleted.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
r'new RegExp("(?:)|" + #, #)',
Having the JavaScript runtime recompile this RegExp might be quite expensive.
What it did before was to use the `MatchAsPrefix` regexp which would compile the regexp `#|()` instead, which is probably even more expensive with the extra capture. It would only be helped by it *maybe* being compiled before if someone had used `matchAsPrefix` with the same RegExp that they're passing to `split`. Which most likely never happened. People don't reuse RegExps, they create a new one for each purpose.
At least a smart RegExp compiler might see that `(?:)|...` can ignore the `...` part of the pattern.Caching the result is probably worth doing. It's cheap and paus for itself the moment someone uses `split` with the same RegExp twice, which is more likely than using the RegExp for something else.
The other possible approach that doesn't require compiling a RegExp is to try to parse the regexp pattern, but that's going to be fragile.
Hmm, if we don't need to know the number of capture, just whether it has one, we might be able to just scan the pattern. If seeing `\`, ignore next character, if seeing `[` skip until `]` while still ignoring character after `\`, and if seeing `(` not followed by `?`, or followed by `?<` and then not `=` or `!`, it has a capture. I think that's correct for current JS regexps, and is unlikely to change with future changes. Let me try that.
Or just not using JS `String.prototype.split` with RegExps to begin with. That avoids having to know if it has captures. Just do the looping in Dart.
Done
bool _regExpHasCaptures(String pattern) {
Lasse NielsenI'm not sure this carries its weight.
1. Is it correct? It makes me nervous to have multiple places that parse RegExps. I think this needs to be tested.
2. It is more lines of code to maintain and keep in sync with the DDC version.
3. It won't be all that fast until it is JIT-optimized, which might be never.
4. It is might be more bytes of .js download.
Perhaps a hybrid aproach is the sweet spot:
- cache the result
- use the `exec` trick...
- ... but guard it with a simple test like `JS<int>('', '#.indexOf("(")', pattern) < 0`, that catches the majority of split RegExps.
JavaScript RegExps are still evolving.
There are other `(?` groups, luckily non-capturing for now)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Modifier
Agree. Let's use the `exec` hack.
(I'd still consider just not using JS `split` with a RegExp, but it's possible that some RegExp with no captures gets used for a alot of splits (most split RegExps won't have captures if they're taken from JS) on long texts, and then it would pay off to use the native version when possible.)
Done
Lasse Nielsenperhaps
```
late final bool hasCaptures = regExpCaptureCount(this);
```
Stephen AdamsThat's fine. (Personally I don't like `late`, but if you know it's efficient here, it's probably fine.)
If you don't like `late`, you can use something like
```
bool? __hasCaptures;
bool get _hasCaptures => __hasCaptures ??= _regExpHasCaptures(pattern);
```
Done
return s;
Lasse NielsenI'd revert this to avoid churning files without changes.
Done, but lost the +1 in the process.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Make Dart2JS use the `y` RegExp flag for `matchAsPrefix`.
The implementation for `matchAsPrefix` predates the sticky `y` flag
for JS regexps, so it had a hack that modified the RegExp pattern to ensure that it only matched at a particular position.
This hack is no longer necessary, so the anchored RegExp can use the
same RegExp source as the plain and global RegExp, which may let the JS engine better reuse RegExp compilation.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |