Replacing YARR

Jan de Mooij

Jan 2, 2014, 9:46:04 AM
to JS Internals list
Back in 2010, we imported the YARR regular expression engine from JSC [0].
It has served us well over the years, but with all the optimizations to the
rest of the engine, regular expression performance is becoming a bottleneck
again. When YARR is able to JIT a regular expression, performance is mostly
on par with V8. However, when we can't compile a regexp, we're stuck in the
interpreter and become very slow.

Unfortunately, YARR is unable to JIT some regular expressions used in
popular JS libraries like jQuery [1]. The main problem is that YARR can't
compile regexps with nested parenthesized groups. As I understand it, this
is a pretty fundamental issue that requires a major refactoring. The
upstream WebKit bug has had no activity for over 3 years [2].
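
To make "nested parenthesized groups" concrete, here is a minimal sketch that measures how deeply groups nest in a pattern. `maxGroupDepth` is a hypothetical helper, not part of YARR or SpiderMonkey; it only inspects the pattern source and does not claim to predict which patterns YARR can actually JIT.

```javascript
// Hypothetical helper: reports how deeply parenthesized groups nest in a
// regexp source. It skips escaped characters and character classes.
function maxGroupDepth(re) {
  const src = re.source;
  let depth = 0;
  let max = 0;
  let inClass = false;
  for (let i = 0; i < src.length; i++) {
    const c = src[i];
    if (c === "\\") { i++; continue; }  // skip the escaped character
    if (inClass) {
      if (c === "]") inClass = false;   // end of character class
      continue;
    }
    if (c === "[") { inClass = true; continue; }
    if (c === "(") {
      depth++;
      if (depth > max) max = depth;
    } else if (c === ")") {
      depth--;
    }
  }
  return max;
}
```

A pattern such as /^ba/ has depth 0, while /((a)b)+/ has depth 2, which is the kind of nesting described above.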

There's also a problem with "quantity 1 subpatterns that are copies" that
affects a Peacekeeper email validation regular expression [3] and is the
only reason for us being slower than Chrome on the Peacekeeper
stringValidateForm test [4].

To address these issues, we have the following options:
(1) Fix YARR ourselves, either upstream or locally.
(2) Switch from YARR to V8's irregexp engine.
(3) Write something ourselves, probably based on V8's irregexp.

(1) will be hard; I don't think we have somebody familiar enough with YARR
to do a refactoring of this size. It could be an option though.

For (2), we'd have to write a layer mapping V8's macro assembler calls to
our own macro assembler. Unfortunately, unlike SM and JSC, V8 has more
platform-specific code and we'd have to do this work for different
platforms. I'm not sure what other dependencies there are on other parts of
the V8 engine.

Personally, I like (3): it's not a small task, but it'd finally give us a
regexp engine that integrates well with the rest of the engine. This also
means we can dump JSC's macro-assembler (JM used it as well, but is also
gone) and use the one we wrote for the baseline/Ion JITs. It'd also
integrate much better than Yarr in terms of code style and data structures.
If we base it on irregexp, we should be able to avoid most pitfalls or
design problems.

What do people think?

Jan

[0] https://bugzilla.mozilla.org/show_bug.cgi?id=564953
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=929507
[2] https://bugs.webkit.org/show_bug.cgi?id=42264
[3] https://bugs.webkit.org/show_bug.cgi?id=122891
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=692009

Andreas Gal

Jan 2, 2014, 10:01:07 AM
to Jan de Mooij, JS Internals list

I am not confident that regexp performance is enough of a key investment area for us to justify (3). (2) sounds like a viable option to me, though we will have to investigate the platform bindings as you said. I remember Lars bragging that their regexp engine compiles all possible cases of regexps, so at least this will be the last time we have to do this. Plus, we will automatically always be performance competitive with Chrome. That's an important strategic approach here.

Thanks,

Andreas
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-en...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals

Nicolas B. Pierron

Jan 2, 2014, 10:31:56 AM
to
On 01/02/2014 06:46 AM, Jan de Mooij wrote:
> (3) Write something ourselves, probably based on V8's irregexp.
>
> […]
>
> Personally, I like (3): it's not a small task, but it'd finally give us a
> regexp engine that integrates well with the rest of the engine. This also
> means we can dump JSC's macro-assembler (JM used it as well, but is also
> gone) and use the one we wrote for the baseline/Ion JITs. It'd also
> integrate much better than Yarr in terms of code style and data structures.
> If we base it on irregexp, we should be able to avoid most pitfalls or
> design problems.
>
> What do people think?

I do not know much about YARR, but I saw a few sec-critical patches related
to it before. At the moment, on octane, we still have to go from JITed code
-> C++ -> YARR's code.

I think we can do better by implementing a regexp compiler which *targets*
JavaScript instead of assembly. By generating asm.js-like code for regexp,
we can have good generated code and we can easily apply all the rules we
already have for calling / *inlining* them in Ion code.

This approach has obvious disadvantages:
- we serialize the code in JavaScript.

On the other hand:
- we reduce the surface of attacks.
- we can test it on other JS engines.
- we do not have to add extra logic for inlining regex.

In addition, one might even suggest that we self-host the regex compiler. If
this is not a performance issue, I think this would be a good idea as the
first persons who are willing to get features in the JavaScript Regex engine
are JavaScript users.
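
As a rough illustration of a regexp compiler that targets JavaScript, the sketch below turns a tiny pattern subset into JS source and then into a function. Everything here is an assumption made for illustration: `compileLiteral` is a hypothetical name, it only handles an optional leading "^" followed by literal characters, and the generated code is plain JS rather than validated asm.js.

```javascript
// Hypothetical sketch: compile a tiny regexp subset (an optional leading "^"
// followed by literal characters) into JavaScript source, then into a function
// that returns the match index or -1.
function compileLiteral(pattern) {
  const anchored = pattern.startsWith("^");
  const lit = anchored ? pattern.slice(1) : pattern;
  if (/[\\^$.*+?()[\]{}|]/.test(lit)) {
    throw new Error("unsupported pattern: " + pattern);
  }
  // One charCodeAt comparison per literal code unit, unrolled.
  const checks = lit.split("").map((ch, k) =>
    "s.charCodeAt(i + " + k + ") === " + ch.charCodeAt(0));
  const cond = checks.length ? checks.join(" && ") : "true";
  // An anchored pattern is only tried at position 0.
  const last = anchored ? "0" : "s.length - " + lit.length;
  const body =
    "for (var i = 0; i <= " + last + "; i++) {\n" +
    "  if (" + cond + ") return i;\n" +
    "}\n" +
    "return -1;";
  return new Function("s", body);
}

const matchBa = compileLiteral("^ba");
```

Because the result is an ordinary JS function, the engine's existing rules for calling and inlining apply to it without regexp-specific logic, which is the point made above.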

--
Nicolas B. Pierron

Tom Schuster

Jan 2, 2014, 10:32:15 AM
to Andreas Gal, Jan de Mooij, JS Internals list
I looked into option (2) for a few hours, so probably less than Jan
did. Something I noticed is that keeping the fork up to date would
probably be very painful. They implement some objects based on their
different class-based JS object mechanism. They have a Handle<> that
behaves differently. Overall I find their code fits much less well into ours
compared to yarr/jsc. The work for porting v8 might actually not be much
less compared to implementing something very similar to how they do it.
> > (3) Write something ourselves, probably based on V8's irregexp.
> >
> > (1) will be hard; I don't think we have somebody familiar enough with
> YARR
> > to do a refactoring of this size. It could be an option though.
> >
> > For (2), we'd have to write a layer mapping V8's macro assembler calls to
> > our own macro assembler. Unfortunately, unlike SM and JSC, V8 has more
> > platform-specific code and we'd have to do this work for different
> > platforms. I'm not sure what other dependencies there are on other parts
> of
> > the V8 engine.
> >
> > Personally, I like (3): it's not a small task, but it'd finally give us a
> > regexp engine that integrates well with the rest of the engine. This also
> > means we can dump JSC's macro-assembler (JM used it as well, but is also
> > gone) and use the one we wrote for the baseline/Ion JITs. It'd also
> > integrate much better than Yarr in terms of code style and data
> structures.
> > If we base it on irregexp, we should be able to avoid most pitfalls or
> > design problems.
> >
> > What do people think?
> >

smaug

Jan 2, 2014, 10:39:28 AM
to
On 01/02/2014 05:01 PM, Andreas Gal wrote:
>
> I am not confident that regexp performance is enough of a key investment area for us to justify (3). (2) sounds like a viable option to me, though
> we will have to investigate the platform bindings as you said. I remember Lars bragging that their regexp engine compiles all possible cases of
> regexps, so at least this will be the last time we have to do this. Plus, we will automatically always be performance competitive with Chrome.
> That's an important strategic approach here.

With (2) it becomes hard to beat v8. Also, if (2) requires significant work to make it integrate well with SM, doesn't
it pretty much become (3)? (Though, I'm not a SpiderMonkey hacker.)

But anyhow, I agree something needs to be done. regexp slowness shows up in too many profiles these days.


-Olli

Brendan Eich

Jan 2, 2014, 11:14:11 AM
to dev-tech-js-en...@lists.mozilla.org
smaug wrote:
> With (2) it becomes hard to beat v8.

Neutralizing V8 is probably both sufficient to compete, and the best we
can do with our hackers on hand to work on such projects. Winning at
regexp perf or breaking some speed-of-sound barrier not yet breached by
any engine does not seem like a priority, although it would be
interesting research. (E.g., using SIMD instructions for regex
execution, something Intel has researched.)

> Also, if (2) requires significant work to make it integrate well with
> SM, doesn't
> it pretty much become (3)? (Though, I'm not a SpiderMonkey hacker.)

This is a key point. V8 is more intertwingled than JSC. Pulling on the
yarn may pull big tangles out of the big ball.

> But anyhow, I agree something needs to be done. regexp slowness shows
> up in too many profiles these days.

What about contacting Apple principals who maintain JSC and checking
whether they want to fix YARR? Just because their bug has been idle for
years does not mean we should assume they aren't thinking of, planning,
or even about to start working on JITting all the regexps.

/be

Till Schneidereit

Jan 2, 2014, 11:37:14 AM
to Brendan Eich, JS Internals list
On Thu, Jan 2, 2014 at 5:14 PM, Brendan Eich <bre...@mozilla.com> wrote:

>
> What about contacting Apple principals who maintain JSC and checking
> whether they want to fix YARR? Just because their bug has been idle for
> years does not mean we should assume they aren't thinking of, planning, or
> even about to start working on JITting all the regexps.
>

There hasn't been any meaningful amount of activity in Yarr for a long time
now. We reported various issues, sometimes with pointers to easy fixes we
applied, without ever getting even an acknowledgement.

For me, another key point is that we have various high-frequency crashes
and security issues in Yarr that we don't know how to fix. Even if Apple
were to fix the most important performance issues, this situation wouldn't
change meaningfully.


On Thu, Jan 2, 2014 at 4:31 PM, Nicolas B. Pierron <
nicolas....@mozilla.com> wrote:
> In addition, one might even suggest that we self-host the regex compiler.
> If this is not a performance issue, I think this would be a good idea as
> the first persons who are willing to get features in the JavaScript Regex
> engine are JavaScript users.

It's probably no surprise that I like this option. One thing we could do to
evaluate its potential is to replace the regexps in some benchmarks with
straight-forward, not-too-clever JS implementations and see what the
resulting performance and memory characteristics are like.
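
A minimal sketch of such an evaluation could look like the following. The `compare` harness is hypothetical and deliberately crude: it checks that both implementations agree, then times them with `Date.now()`. It measures nothing about memory, and real benchmark numbers would need many more inputs and runs.

```javascript
// Hypothetical harness: times two implementations of the same test over the
// same inputs and checks that they agree before trusting the timings.
function compare(inputs, viaRegExp, viaPlainJS, rounds) {
  for (const s of inputs) {
    if (viaRegExp(s) !== viaPlainJS(s)) {
      throw new Error("implementations disagree on " + JSON.stringify(s));
    }
  }
  function time(fn) {
    const start = Date.now();
    let hits = 0;
    for (let r = 0; r < rounds; r++) {
      for (const s of inputs) {
        if (fn(s)) hits++;
      }
    }
    return { ms: Date.now() - start, hits: hits };
  }
  return { regexp: time(viaRegExp), plain: time(viaPlainJS) };
}

const inputs = ["banana", "abba", "", "ba", "b"];
const result = compare(
  inputs,
  s => /^ba/.test(s),                                      // native regexp
  s => s.charCodeAt(0) === 98 && s.charCodeAt(1) === 97,   // "b", "a" by hand
  1000);
```

The `hits` counter is returned so that the loop has an observable result and so that both variants can be checked for doing the same work.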

Steve Fink

Jan 2, 2014, 11:39:46 AM
to Nicolas B. Pierron, dev-tech-js-en...@lists.mozilla.org
On Thu 02 Jan 2014 07:31:56 AM PST, Nicolas B. Pierron wrote:
> I think we can do better by implementing a regexp compiler which
> *targets* JavaScript instead of assembly. By generating asm.js-like
> code for regexp, we can have good generated code and we can easily
> apply all the rules we already have for calling / *inlining* them in
> Ion code.

I had a similar thought, but a little different -- could we take a good
C-based regex engine and compile it via emscripten to produce a
self-hosted implementation? I'm thinking of something like pcre.

There would probably be some perf rough edges, especially since this
would be walking over strings not simple byte arrays. I don't know how
hard it would be to smooth those out. (Nor have I looked at the pcre
code.) String encoding might be ugly.
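
To illustrate the "strings, not simple byte arrays" concern: an Emscripten-compiled matcher would typically read from a flat typed-array buffer, so the input string has to be copied into one first. The helpers below are hypothetical and only sketch that copy for UTF-16 code units; they are not taken from PCRE or Emscripten.

```javascript
// Hypothetical helper: copy a JS string's UTF-16 code units into a typed
// array, the kind of flat buffer a compiled-to-JS matcher would read from.
function toCodeUnits(str) {
  const units = new Uint16Array(str.length);
  for (let i = 0; i < str.length; i++) {
    units[i] = str.charCodeAt(i);
  }
  return units;
}

// Hand-written check for the prefix "ba" over the flat buffer.
function startsWithBa(units) {
  return units.length >= 2 && units[0] === 98 && units[1] === 97;
}
```

The copy is linear in the input length for every match attempt, which is one of the rough edges mentioned above.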

Niko Matsakis

Jan 2, 2014, 11:56:31 AM
to Steve Fink, Nicolas B. Pierron, dev-tech-js-en...@lists.mozilla.org
I have been thinking along similar lines (either self-hosting, or
targeting JS), but with a different incentive: I would like to find a
way to make regular expression matching work during parallel
execution. Self-hosting is the easiest way to do that.



Niko

On Thu, Jan 02, 2014 at 08:39:46AM -0800, Steve Fink wrote:
> On Thu 02 Jan 2014 07:31:56 AM PST, Nicolas B. Pierron wrote:
> > I think we can do better by implementing a regexp compiler which
> > *targets* JavaScript instead of assembly. By generating asm.js-like
> > code for regexp, we can have good generated code and we can easily
> > apply all the rules we already have for calling / *inlining* them in
> > Ion code.
>
> I had a similar thought, but a little different -- could we take a good
> C-based regex engine and compile it via emscripten to produce a
> self-hosted implementation? I'm thinking of something like pcre.
>
> There would probably be some perf rough edges, especially since this
> would be walking over strings not simple byte arrays. I don't know how
> hard it would be to smooth those out. (Nor have I looked at the pcre
> code.) String encoding might be ugly.

Brendan Eich

Jan 2, 2014, 12:05:57 PM
to Till Schneidereit, JS Internals list
Till Schneidereit wrote:
> On Thu, Jan 2, 2014 at 5:14 PM, Brendan Eich <bre...@mozilla.com
> <mailto:bre...@mozilla.com>> wrote:
>
>
> What about contacting Apple principals who maintain JSC and
> checking whether they want to fix YARR? Just because their bug has
> been idle for years does not mean we should assume they aren't
> thinking of, planning, or even about to start working on JITting
> all the regexps.
>
>
> There hasn't been any meaningful amount of activity in Yarr for a long
> time now. We reported various issues, sometimes with pointers to easy
> fixes we applied, without ever getting even an acknowledgement.

Yes, not great, but see below.

> For me, another key point is that we have various high-frequency
> crashes and security issues in Yarr that we don't know how to fix.

Do we report these upstream?

> Even if Apple were to fix the most important performance issues, this
> situation wouldn't change meaningfully.

I meant by "contacting Apple" something other than reporting some bugs,
namely mailing the principals and raising the issues of common interest.
One time attempt. I can help.

/be

Brendan Eich

Jan 2, 2014, 12:07:55 PM
to Steve Fink, Nicolas B. Pierron, dev-tech-js-en...@lists.mozilla.org
Steve Fink wrote:
> I'm thinking of something like pcre.

PCRE is way off from ECMA-262, and not a perf winner against V8 AFAIK.
Anyone know more?

We are likely better off not reinventing wheels, as Andreas noted.

Self-hosting for parallelism is a good longer-term project (breaking the sound
barrier, I called it), but it is research. To fix our immediate problems
(Peacekeeper score, bugs) we need something less radical, sooner. Or
some surprisingly fast work on the self-hosted path, but I doubt that
has Emscriptened PCRE in it.

/be

Till Schneidereit

Jan 2, 2014, 12:30:26 PM
to Brendan Eich, JS Internals list
On Thu, Jan 2, 2014 at 6:05 PM, Brendan Eich <bre...@mozilla.com> wrote:

> For me, another key point is that we have various high-frequency crashes
>> and security issues in Yarr that we don't know how to fix.
>>
>
> Do we report these upstream?


We do, if we actually know that they're upstream bugs. That is not always
easy to determine, because in many cases the bug might also be in the
adapter layer we use to make Yarr work within SpiderMonkey. Or below that
in other parts of SpiderMonkey itself, even.

>
>
> Even if Apple were to fix the most important performance issues, this
>> situation wouldn't change meaningfully.
>>
>
> I meant by "contacting Apple" something other than reporting some bugs,
> namely mailing the principals and raising the issues of common interest.
> One time attempt. I can help.
>

Understood. The point I was trying to make is that Yarr seems to be largely
unowned upstream, and it's pretty much completely unowned (and hard to own)
on our end.

However, I agree that we should contact JSC people to find out if there are
short- or medium-term plans for work on Yarr.

Nicolas B. Pierron

Jan 2, 2014, 12:47:58 PM
to
On 01/02/2014 07:31 AM, Nicolas B. Pierron wrote:
> In addition, one might even suggest that we self-host the regex compiler. If
> this is not a performance issue, I think this would be a good idea as the
> first persons who are willing to get features in the JavaScript Regex engine
> are JavaScript users.

I should have written that in the past tense …

https://github.com/jviereck/regexp.js

Just by looking at the slides he made for JSConf.eu, I can see an image
which seems to correspond to IonGraph output (a way to graphically spew the
SSA representation of the generated assembly).

I have not yet tested the performance on octane, but knowing that the state
of the JIT is not complete, I guess this is unlikely to compete out of the box.

--
Nicolas B. Pierron

Brendan Eich

Jan 2, 2014, 1:50:01 PM
to Nicolas B. Pierron, julian....@gmail.com, dev-tech-js-en...@lists.mozilla.org
I should have remembered this (I missed Julian's talk at JSConf.eu,
alas). CC'ing.

/be

> Nicolas B. Pierron <mailto:nicolas....@mozilla.com>
> January 2, 2014 12:47 PM
>
>
> I should have written that in the past tense …
>
> https://github.com/jviereck/regexp.js
>
> Just by looking at the slides he made for JSConf.eu, I can see an
> image which seems to correspond to IonGraph output (a way to
> graphically spew the SSA representation of the generated assembly).
>
> I have not yet tested the performance on octane, but knowing that the
> state of the JIT is not complete, I guess this is unlikely to compete
> out of the box.
>
> Nicolas B. Pierron <mailto:nicolas....@mozilla.com>
> January 2, 2014 10:31 AM
>
>
> I do not know much about YARR, but I saw a few sec-critical patches
> related to it before. At the moment, on octane, we still have to go
> from JITed code -> C++ -> YARR's code.
>
> I think we can do better by implementing a regexp compiler which
> *targets* JavaScript instead of assembly. By generating asm.js-like
> code for regexp, we can have good generated code and we can easily
> apply all the rules we already have for calling / *inlining* them in
> Ion code.
>
> This approach has obvious disadvantages:
> - we serialize the code in JavaScript.
>
> On the other hand:
> - we reduce the surface of attacks.
> - we can test it on other JS engines.
> - we do not have to add extra logic for inlining regex.
>

Luke Wagner

Jan 2, 2014, 3:28:36 PM
to Jan de Mooij, JS Internals list
I don't think a pure (2) approach is our cheapest option. Even with Yarr, it took Chris a whole bunch of work to import and it also took Dave/Dave a long time each time they pulled a new version. It sounds like irregexp would be much worse. Furthermore, having a whole hunk of code you can't just change means everybody goes to lengths to avoid touching it and it becomes a big sad sinkhole.

Perhaps we could use a modified (2) approach: fork irregexp. In particular, we'd:
- significantly refactor the code to use SM rooting, assembler, Vector, LifoAlloc, etc APIs
- declare open season on stylistic refactorings to make irregexp match SM

The obvious concern is that we'd miss updates/fixes in V8. However, looking at the V8 svn repo, the irregexp files change infrequently (almost nothing in the last 6 months) so we could just as well, every month or so, just look at all the changes to the 9 *regexp* files and manually apply the diffs.

One thing, though, is we'd really need an owner for this code who took the time to fully understand irregexp so they could fix what may come as it came and review patches.

Cheers,
Luke

Andreas Gal

Jan 2, 2014, 3:35:45 PM
to Luke Wagner, Jan de Mooij, JS Internals list

Sounds like a solid plan. It combines the best of both worlds (we don't have to reinvent the wheel but we minimize how much code we import). The fact that the code is pretty stable definitely supports this approach.

Andreas

Brendan Eich

Jan 2, 2014, 4:11:10 PM
to Andreas Gal, Luke Wagner, Jan de Mooij, JS Internals list
Need to talk to Apple people first, one Ping only.

/be

Chris Peterson

Jan 3, 2014, 10:52:16 PM
to
On 1/2/14, 1:11 PM, Brendan Eich wrote:
> Need to talk to Apple people first, one Ping only.

Does anyone here have a JSC/Yarr contact at Apple?


On 1/2/14, 12:28 PM, Luke Wagner wrote:
> One thing, though, is we'd really need an owner for this code who
> took the time to fully understand irregexp so they could fix what may
> come as it came and review patches.

If we don't get an affirmative response from Apple, who would be a
good owner for porting irregexp to SM?


chris

Brendan Eich

Jan 3, 2014, 11:52:36 PM
to Chris Peterson, dev-tech-js-en...@lists.mozilla.org
On Jan 3, 2014, at 10:52 PM, Chris Peterson <cpet...@mozilla.com> wrote:
>
> Does anyone here have a JSC/Yarr contact at Apple?

Already in contact, with jandem in tandem -- will follow up when more to report.

/be

julian....@googlemail.com

Jan 5, 2014, 5:31:41 AM
to
On Thursday, January 2, 2014 6:47:58 PM UTC+1, Nicolas Pierron wrote:
> On 01/02/2014 07:31 AM, Nicolas B. Pierron wrote:
>
> I should have written that in the past tense …
>
> https://github.com/jviereck/regexp.js

So far I hadn't done any performance numbers for RegExp.JS. I looked into this and thanks to the help of Till I got the Octane benchmark running in the JS shell [1].

Before converting the entire Octane RegExp benchmark to run using RegExp.JS, I thought I would just try the first RegExp tested in the benchmark. In terms of code changes, this means:

diff --git a/regexp.js b/regexp.js
- var re0 = /^ba/;
+ var re0 = new RegExpJS(/^ba/);

Just changing this one RegExp caused the score to drop from ~1480 to 77 (!!!) on my machine using the RegExp.JS library (& my.mood = :( ).

Okay, so maybe RegExp.JS is doing something completely wrong, which is why I tried another dumb approach and defined:

function RegExpJS(reg) { }

RegExpJS.prototype.exec = function(str) {
    if (str.startsWith('ba')) {
        return ['ba'];
    } else {
        return null;
    }
}

This RegExpJS object ONLY works HARDCODED with the first regexp of the octane benchmark (/^ba/) - cheating, I know, but let's see where this gets us in terms of performance. Running the regexp.js benchmark with this RegExpJS definition and the modification |var re0 = new RegExpJS(/^ba/);| resulted in a score of ~1340. Better than 77, but still a huge drop compared to 1480 by only changing one RegExp in the benchmark!

(If you wonder if replacing the |if(str.startsWith('ba'))| call with |if (str[0] == 'b' && str[1] == 'a') {| --- no, that doesn't make any difference in terms of performance :/).

---

Without knowing anything about the SpiderMonkey JS internals, this very small benchmark raises the following questions for me:

1) Is the YARR implementation so much faster than anything written in plain JS (even if the JS is highly optimized for the RegExp and matches the string in the most optimal way)?
2) Is there a performance bug in SpiderMonkey that makes even the plain RegExpJS running only /^ba/ so slow?



Cheers,

- Julian




[1] Using the js shell provided at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/ dated on the 04-Jan-2014 11:50.



hv1989

Jan 5, 2014, 6:13:02 AM
to julian....@googlemail.com, dev-tech-js-en...@lists.mozilla.org
Hi Julian,

I'm not sure what you have tried. But I tried your hardcoded version
(i.e. defining RegExpJS ourselves, with the ^ba hack).

- octane1.0-regexp:
before: 4510
after: 4658

- octane2.0-regexp:
before: 2585
after: 2390

So in octane1.0 that is indeed an improvement. For octane2.0 it is not, and
there is a reason for that. In octane2.0 all calls to "exec()" have a wrapper,
"Exec()", that does some extra testing to make sure the result is
correct. Using TypeInformation we can find out that this is only called
with "RegExp" as first parameter, so we can optimize that. Now with
"new RegExpJS(/^ba/);" we see 2 signatures in "Exec". So it is less
specialized (not much, just an extra if to distinguish the paths at
the "exec" call). I'm sure that if all regexps were transformed to
"RegExpJS" we would get that back. It would only see 1 signature
again.

Now about RegExp.JS bringing such a big loss: that is possible. Yarr
isn't bad, and in octane-regexp we are only stuck in the interpreter
for 3%, and even in that case the interpreter isn't that slow. We
wouldn't win much on octane-regexp if we could JIT everything (which is
the problem for the other benchmarks like jQuery and Peacekeeper).
It would bring at most a 4% gain for octane-regexp. Though I would
suggest running the numbers again, since they differ so
much from mine.

Best Hannes

Julian Viereck

Jan 5, 2014, 7:08:38 AM
to hv1989, dev-tech-js-en...@lists.mozilla.org
Hi Hannes,

thanks a lot for your reply :)

> I'm not sure what you have tried. But I tried your hardcoded version.

I tried to make my testing more transparent and uploaded my code on a
GitHub repo:

https://github.com/jviereck/regexp.js-octane

> Though I would suggest to try to run the numbers again, since the
> numbers differ so much from mine.

Looking at the numbers, I think they are fine if we assume you
have a more powerful PC that results in a score roughly 2x my value
by default. Your score values before and after differ by ~200 points,
while mine do by ~100 - so there is the 2x speed difference.

> we see 2 signatures in "Exec". So it is less specialized (not much,
> just an extra if to distinguish the paths at the "exec" call). I'm sure
> if all regexps would be transformed to "RegExpJS" we would get that
> back. It would only see 1 signature again.

Thanks a lot for this hint! Based on this input, I have created a new
"Exec2" function, which is an exact copy of the "Exec" function, but the
"Exec2" function is only used for executing the re0 regular expression
[1]. Using the hard coded RegExpJS function for re0 [2] resulted in
these numbers:

before: 1582.7
(https://github.com/jviereck/regexp.js-octane/tree/e925606d0850b5c94d1622f7cfdcd2ab2c08e767)
after: 1632.7
(https://github.com/jviereck/regexp.js-octane/tree/0630eec8e656f3df5effc27114ba80ffe970d53e)

These numbers are the average of 10 runs. There seems to be a speedup
using the hardcoded JS version.
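
The idea behind the "Exec2" copy can be sketched as follows. The function bodies are hypothetical stand-ins (Octane's real wrapper performs extra result checking that is not reproduced here); the sketch only shows why duplicating the wrapper keeps each call site limited to a single receiver type.

```javascript
// Hypothetical stand-in for the hardcoded /^ba/ object.
function RegExpJS() {}
RegExpJS.prototype.exec = function (str) {
  return str.startsWith("ba") ? ["ba"] : null;
};

// Shared wrapper: once it is called with both RegExp and RegExpJS objects,
// the re.exec call site sees two receiver types.
function Exec(re, str) {
  return re.exec(str);
}

// Exact copy used only for RegExpJS, so each call site sees one type.
function Exec2(re, str) {
  return re.exec(str);
}

const viaNative = Exec(/^ba/, "banana");        // only ever sees RegExp
const viaJS = Exec2(new RegExpJS(), "banana");  // only ever sees RegExpJS
```

Both wrappers return the same match here; the difference is purely in what type information the JIT can collect per function.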

These results look more promising. However, they should be treated with
care, as getting /^ba/ to work is quite simple and the implementation
makes very good use of JS functions (e.g. String.prototype.startsWith),
while a more complicated example including backtracking might yield
different results.

Do you think it is worth implementing a hard-coded version of the second
regular expression tested in Octane:

var re1 =
/(((\w+):\/\/)([^\/:]*)(:(\d+))?)?([^#?]*)(\?([^#]*))?(#(.*))?/;

to see how good the performance can get?

Best,

- Julian


[1]:
https://github.com/jviereck/regexp.js-octane/commit/0d6e01d36a7d5dc24c385e3437e6b740dbd9da78#diff-0

[2]:
https://github.com/jviereck/regexp.js-octane/commit/0630eec8e656f3df5effc27114ba80ffe970d53e

--

- Julian

hv1989

Jan 5, 2014, 7:28:17 AM
to Julian Viereck, dev-tech-js-engine-internals
Hi Julian,

don't forget that it is a bit unfair to compare like that, for 2 reasons:
1) Exec needs to set re.lastIndex accordingly, i.e. set it to 0 at the
start and correctly upon match.
2) The result array has two extra properties, index and input, that you
don't set.

There are possibly more extra things that need to happen that are not
caused by Yarr at all...
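
A slightly fairer hardcoded version, as a sketch only, would fill in those properties. This is still a hypothetical stand-in for /^ba/ and not spec-complete: for a non-global pattern it simply keeps `lastIndex` at 0, which is a simplifying assumption, and it skips argument coercion entirely.

```javascript
// Hypothetical stand-in for /^ba/ only; not RegExp.JS and not spec-complete.
function RegExpJS() {
  this.lastIndex = 0;
}

RegExpJS.prototype.exec = function (str) {
  // Simplification: a non-global pattern always starts matching at 0.
  this.lastIndex = 0;
  if (!str.startsWith("ba")) {
    return null;
  }
  const result = ["ba"];
  result.index = 0;    // position of the match in the input
  result.input = str;  // the string that was searched
  return result;
};

const m = new RegExpJS().exec("banana");
```

With these additions the result for "banana" has the same matched text, `index`, and `input` as the native `/^ba/.exec("banana")`, so the comparison does a little more of the work the built-in has to do.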

Now one of the important improvements your version got is that it
doesn't need to flatten the input string, I think,
while that happens by default for Yarr. That's one of the benefits
of using JS: it has more information.
Another one is that we don't need to jump out of JS to C++ to Yarr JIT
code again.

Best Hannes

Till Schneidereit

Jan 5, 2014, 2:03:24 PM
to hv1989, Julian Viereck, dev-tech-js-engine-internals
I just did another test: I stripped out all split() and replace() calls and
all regexps except for re0 from the test, and changed the exec to set
index and input on the result. Yes, this is *really* microbenchmark-y, I
know.

With these changes applied, our score increases to ~38.000. With the exec
function commented out, it goes down to ~22.000. v8 for some reason doesn't
like the exec function, so I can't test that, but without it, it reaches
~38.800.

So even in the ideal case, a simple, unsafe and not-spec-compliant
hand-rolled JS version for one of the simplest-imaginable regexps doesn't
beat v8. For all I know, this might look very different in more complex
cases, of course.

Nicolas B. Pierron

Jan 6, 2014, 5:09:25 AM
to julian....@googlemail.com
Hi,

On 01/05/2014 02:31 AM, julian....@googlemail.com wrote:
> Before converting the entire Octane RegExp benchmark to run using
> RegExp.JS, I thought I would just try the first RegExp tested in the benchmark.
> In terms of code changes, this means:
>
> diff --git a/regexp.js b/regexp.js
> - var re0 = /^ba/;
> + var re0 = new RegExpJS(/^ba/);

Any reasons why you are using the deconstructing RegExpJS function, instead
of giving a string as argument?

var re0 = new RegExpJS("^ba");

--
Nicolas B. Pierron

Chris Peterson

Jan 10, 2014, 8:17:02 PM
to
Any news from Apple about Yarr?


chris

Brendan Eich

Jan 10, 2014, 8:43:25 PM
to Chris Peterson, Jan de Mooij, dev-tech-js-en...@lists.mozilla.org
Yes, but I will let Jan summarize.

/be

> Chris Peterson <mailto:cpet...@mozilla.com>
> January 10, 2014 5:17 PM
>
>
> Any news from Apple about Yarr?
>
>
> chris
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-en...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
> Chris Peterson <mailto:cpet...@mozilla.com>
> January 3, 2014 7:52 PM
>
>
> Does anyone here have a JSC/Yarr contact at Apple?
>
>
> On 1/2/14, 12:28 PM, Luke Wagner wrote:
> > One thing, though, is we'd really need an owner for this code who
> > took the time to fully understand irregexp so they could fix what may
> > come as it came and review patches.
>
> If we we don't get an affirmative response from Apple, who would be a
> good owner for porting irregexp to SM?
>
>
> chris
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-en...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
> Andreas Gal <mailto:andre...@gmail.com>
> January 2, 2014 12:35 PM
> Sounds like a solid plan. It combines the best of both worlds (we
> don't have to reinvent the wheel but we minimize how much code we
> import). The fact that the code is pretty stable definitely supports
> this approach.
>
> Andreas
>
>
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-en...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
> Luke Wagner <mailto:lu...@mozilla.com>
> January 2, 2014 12:28 PM
> I don't think a pure (2) approach is our cheapest option. Even with
> Yarr, it took Chris a whole bunch of work to import and it also took
> Dave/Dave a long time each time they pulled a new version. It sounds
> like irregexp would be much worse. Furthermore, having a whole hunk of
> code you can't just change means everybody goes to lengths to avoid
> touching it and it becomes a big sad sinkhole.
>
> Perhaps we could use a modified (2) approach: fork irregexp. In
> particular, we'd:
> - significantly refactor the code to use SM rooting, assembler,
> Vector, LifoAlloc, etc APIs
> - declare open season on stylistic refactorings to make irregexp match SM
>
> The obvious concern is that we'd miss updates/fixes in V8. However,
> looking at the V8 svn repo, the irregexp files change infrequently
> (almost nothing in the last 6 months) so we could just as well, every
> month or so, just look at all the changes to the 9 *regexp* files and
> manually apply the diffs.
>
> One thing, though, is we'd really need an owner for this code who took
> the time to fully understand irregexp so they could fix what may come
> as it came and review patches.
>
> Cheers,
> Luke
>

Jan de Mooij

Jan 13, 2014, 7:02:58 AM
to Brendan Eich, Chris Peterson, JS Internals list
The JSC guys we talked to are interested in collaborating. They said they'd
be happy to coordinate plans, import changes from us, move YARR into a
separate project/library or meet with us to discuss architectural changes.

They agree that the nested-parentheses problem etc. should be fixed, but they
have no plans to do this in the near future, because it's not a problem for
the benchmarks they track (SunSpider regexp-dna and V8-regexp).

Jan


On Sat, Jan 11, 2014 at 2:43 AM, Brendan Eich <bre...@mozilla.com> wrote:

> Yes, but I will let Jan summarize.
>
> /be
>
> Chris Peterson <mailto:cpet...@mozilla.com>
>> January 10, 2014 5:17 PM
>>
>> Any news from Apple about Yarr?
>>
>> chris

Chris Peterson

Jan 13, 2014, 2:33:24 PM
On 1/13/14, 4:02 AM, Jan de Mooij wrote:
> The JSC guys we talked to are interested in collaborating. They said they'd
> be happy to coordinate plans, import changes from us, move YARR into a
> separate project/library or meet with us to discuss architectural changes.

- Can you please send their contact info to me?
- Do they actually have developers working on YARR?
- Do we have local changes we would want to push upstream to them?


> They agree the nested-parentheses problem etc should be fixed, but they
> have no plans to do this in the near future, because it's not a problem for
> the benchmarks they track (sunspider regexp-dna and V8-regexp).

- Which benchmarks that we care about are affected by YARR perf?


chris

Chris Peterson

Feb 14, 2014, 9:36:40 PM
On 1/13/14, 4:02 AM, Jan de Mooij wrote:
> The JSC guys we talked to are interested in collaborating. They said they'd
> be happy to coordinate plans, import changes from us, move YARR into a
> separate project/library or meet with us to discuss architectural changes.

Jan, any more news from the JSC developers about collaborating on Yarr?

In your original mail starting this thread, you said that fixing Yarr
ourselves or upstream would be hard without help from someone very
familiar with Yarr. Are the JSC developers actively maintaining Yarr
now? Do you feel they will commit new resources to fixing Yarr
performance and crashes?

I don't want us to delay alternatives, like V8's irregexp or self-hosted
RegExpJS, if we are not going to get timely help with Yarr.


chris

Jan de Mooij

Feb 17, 2014, 8:25:33 AM
to Chris Peterson, JS Internals list
On Sat, Feb 15, 2014 at 3:36 AM, Chris Peterson <cpet...@mozilla.com> wrote:
> Jan, any more news from the JSC developers about collaborating on Yarr?
>
> In your original mail starting this thread, you said that fixing Yarr
> ourselves or upstream would be hard without help from someone very familiar
> with Yarr. Are the JSC developers actively maintaining Yarr now? Do you feel
> they will commit new resources to fixing Yarr performance and crashes?

AFAIK, this is not a priority for them. When we talked to them, they
were willing to discuss changes with us, but they had no plans to
improve Yarr themselves. Sorry, I forgot to reply to your previous
mail; I'll send you their contact info.

Jan

Chris Peterson

Feb 21, 2014, 9:47:39 PM
Hi Jan, sorry for my late reply!

Reading the other YARR emails you forwarded, the JSC developers sound
upbeat but won't commit any engineering help themselves.

In your original email about replacing YARR, you pointed out that no one
at Mozilla is (currently) familiar enough with YARR to implement major
features like nested parenthesized groups. If the JSC developers are not
going to help, what do you think we should do?

Do you want to resurrect your YARR thread on the js-engine.internals
newsgroup? The discussion ended with Luke suggesting:

- fork irregexp
- significantly refactor the code to use SM rooting, assembler,
Vector, LifoAlloc, etc APIs
- declare open season on stylistic refactorings to make irregexp match SM

Andreas and Brendan seemed to support the plan, but Brendan suggested we
contact Apple first ("one ping only"). Which brings us to the present
day. :)


chris

Nicholas Nethercote

Feb 22, 2014, 12:48:13 AM
to Chris Peterson, JS Internals list
On Sat, Feb 22, 2014 at 1:47 PM, Chris Peterson <cpet...@mozilla.com> wrote:
>
> The discussion ended with Luke suggesting:
>
> - fork irregexp
> - significantly refactor the code to use SM rooting, assembler, Vector,
> LifoAlloc, etc APIs
> - declare open season on stylistic refactorings to make irregexp match SM
>
> Andreas and Brendan seemed to support the plan, but Brendan suggested we
> contact Apple first ("one ping only"). Which brings us to the present day.
> :)

That sounds good to me.

Nick