regexp enhancements for ECMAScript


Andy Chu

Jan 24, 2010, 4:35:43 PM
to narw...@googlegroups.com
I sent this to es-discuss but I think it hasn't gone through
moderation. Since I mentioned this first on this mailing list related
to my work on Narcissus and statically analyzing require()s, I'd like
to solicit comments here:

http://andychu.net/ecmascript/RegExp-Enhancements.html

It's pretty short and simple.

(Side note: does anyone know whether es-discuss is moderated for new
threads but not replies? That's what it seems like to me, as I sent a
new message twice without success, but a reply shows up instantly.)

thanks,
Andy

Kris Kowal

Jan 24, 2010, 4:44:55 PM
to narw...@googlegroups.com, Steve Levithan
[cc Steve Levithan]

On Sun, Jan 24, 2010 at 1:35 PM, Andy Chu <an...@chubot.org> wrote:
> I sent this to es-discuss but I think it hasn't gone through
> moderation.  Since I mentioned this first on this mailing list related
> to my work on Narcissus and statically analyzing require()s, I'd like
> to solicit comments here:
>
> http://andychu.net/ecmascript/RegExp-Enhancements.html
>
> It's pretty short and simple.

Steve, with regard to the linked blog post, do you have any ideas?

Kris Kowal

Zachary Carter

Jan 24, 2010, 6:23:23 PM
to narw...@googlegroups.com

This proposal sounds like an API around "sticky" mode which was implemented in JS 1.8[1] and which is currently part of the RegExp extensions proposal for Harmony[2]. I'd love for other engines to implement this mode as well. An abstraction like the API Andy proposes may allow a library to degrade to string slicing if the engine doesn't support sticky mode. This could be done at the library level.

[1] https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Objects/RegExp
[2] http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps&s=regexp#y_flag
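
A minimal sketch of the library-level fallback described above. The helper names (supportsSticky, matchAtSticky, matchAtSliced, matchAt) are hypothetical and not part of Andy's proposal or of any engine. It assumes that an engine without /y throws when the flag is passed to the RegExp constructor.

```javascript
// Feature-detect the sticky flag; engines without it are assumed to
// throw a SyntaxError for the unknown flag.
function supportsSticky() {
  try {
    new RegExp("", "y");
    return true;
  } catch (e) {
    return false;
  }
}

// Match `source` at exactly `pos` using /y and lastIndex.
function matchAtSticky(source, input, pos) {
  var re = new RegExp(source, "y");
  re.lastIndex = pos;
  return re.exec(input);
}

// Fallback: slice the input and anchor the pattern at the start of the slice.
function matchAtSliced(source, input, pos) {
  var re = new RegExp("^(?:" + source + ")");
  var m = re.exec(input.slice(pos));
  if (m) {
    m.index += pos;   // report the position relative to the full input
    m.input = input;
  }
  return m;
}

var matchAt = supportsSticky() ? matchAtSticky : matchAtSliced;
```

The sliced version is only an approximation: the pattern no longer sees the text before pos, so \b at the start of the slice and any ^ inside the pattern can behave differently, and the slice itself may copy the rest of the string.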


--
Zach Carter
http://zach.carter.name

Ash Berlin

Jan 24, 2010, 6:37:58 PM
to narw...@googlegroups.com

On 24 Jan 2010, at 23:23, Zachary Carter wrote:


> This proposal sounds like an API around "sticky" mode which was implemented in JS 1.8[1] and which is currently part of the RegExp extensions proposal for Harmony[2]. I'd love for other engines to implement this mode as well. An abstraction like the API Andy proposes may allow a library to degrade to string slicing if the engine doesn't support sticky mode. This could be done at the library level.
>
> [1] https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Objects/RegExp
> [2] http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps&s=regexp#y_flag

I think the [2] you link to above is the old ES4 proposal. New ones for ES-next live under the strawman or harmony namespaces on the wiki.

Steven Levithan

Jan 24, 2010, 5:54:56 PM
to Narwhal and Jack
Unless I'm misunderstanding something, a solution for this problem was
provided in JavaScript 1.8 (Firefox 3) based on an ECMAScript 4
proposal that unfortunately did not make it into ECMAScript 5: The /y
(sticky) flag (in combination with manually setting regexp.lastIndex).

See:
- http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps#y_flag
- https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp#Parameters

As noted in Andy's article, Python solves this problem with the
re.match/re.search distinction and the pos argument. Other languages
and libraries tend to provide similar capabilities in various ways--
e.g., Perl, .NET, and some other regex flavors provide a magic \G
token that works similarly to JS1.8's /y flag. Personally, I think
that although the /y flag reinvents the wheel compared to how other
regex libraries have provided similar functionality, the details of /y
fit elegantly with JavaScript and its existing regexp.lastIndex
property. I'd like to see it added to future ECMAScript specs.

Unlike the API additions proposed in the article, /y also works with
String methods that accept a RegExp. That is, it provides a useful
twist on what can be done with string.replace, string.split, etc.,
although of course using regexp.lastIndex as the search start position
only works with regexp.exec and regexp.test (and only when the /g flag
is used), with or without /y.
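
A small illustration of /y combined with regexp.lastIndex, assuming an engine that implements the flag the way it was later standardized in ES2015 (Firefox 3's original behavior may differ in details). In such engines, exec honors lastIndex when either /g or /y is set. The variable names are only for illustration.

```javascript
var input = "foo bar";

// Sticky: the match must start exactly at lastIndex.
var sticky = /bar/y;
sticky.lastIndex = 0;
var atZero = sticky.exec(input);   // null, "bar" does not start at index 0

sticky.lastIndex = 4;
var atFour = sticky.exec(input);   // matches "bar" starting at index 4

// Global without sticky: the search starts at lastIndex and scans forward.
var scanning = /bar/g;
scanning.lastIndex = 0;
var scanned = scanning.exec(input); // finds "bar" at index 4

// With a String method: the sticky regexp only matches at its lastIndex (0 here).
var unchanged = "foo bar".replace(/bar/y, "X");
var replaced = "bar foo".replace(/bar/y, "X");
```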

--Steven Levithan
http://blog.stevenlevithan.com


On Jan 25, 12:44 am, Kris Kowal <cowbertvon...@gmail.com> wrote:
> [cc Steve Levithan]
>

Zachary Carter

Jan 24, 2010, 6:51:32 PM
to narw...@googlegroups.com
Right you are, those were the old proposals. 

Andy Chu

Jan 24, 2010, 8:06:41 PM
to narw...@googlegroups.com
> This proposal sounds like an API around "sticky" mode which was implemented
> in JS 1.8[1] and which is currently part of the RegExp extensions proposal
> for Harmony[2]. I'd love for other engines to implement this mode as well.
> An abstraction like the API Andy proposes may allow a library to degrade to
> string slicing if the engine doesn't support sticky mode. This could be done
> at the library level.
>
> [1]
> https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Objects/RegExp
> [2]
> http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps&s=regexp#y_flag

Uh wow, yes this is the exact same problem:

"This flag will make it easier to write simple and efficient lexical
analyzers for embedded languages using ECMAScript regular expressions.
The current language has quadratic complexity because each match may
potentially search to the end of the input for a match. (That can be
worked around in a couple of ways but it’s cumbersome.) "

I'll update my article to reflect this. Apparently it's implemented
in Firefox 3.
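
For illustration, a rough sketch of the kind of lexer the quoted passage has in mind, assuming /y support. The tokenize function, its rule table, and the token shape are made up for this example. Each rule is tried only at the current position, so a failed attempt never scans toward the end of the input.

```javascript
// Hypothetical tokenizer for a tiny expression language.
function tokenize(input) {
  var rules = [
    { type: "number", re: /\d+/y },
    { type: "ident",  re: /[A-Za-z_]\w*/y },
    { type: "space",  re: /\s+/y },
    { type: "punct",  re: /[-+*\/=()]/y }
  ];
  var tokens = [];
  var pos = 0;
  while (pos < input.length) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      var re = rules[i].re;
      re.lastIndex = pos;          // anchor this attempt at the current position
      var m = re.exec(input);
      if (m) {
        if (rules[i].type !== "space") {
          tokens.push({ type: rules[i].type, text: m[0], pos: pos });
        }
        pos = re.lastIndex;        // continue right after the token
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error("Unexpected character at position " + pos);
    }
  }
  return tokens;
}
```

Without /y, the usual workarounds are slicing the remaining input before each match or using /g and checking that the match index equals the current position, which is where the extra scanning comes from.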

My first reaction is that /y will definitely work. But, I think it
continues a conflation of the compiled regexp and an *in progress
match* in the JS RegExp API.

That is, .lastIndex does not belong on the RegExp object itself, and
sticky mode doesn't need to be a modifier on the RegExp either.
They're both properties of the match taking place. So Python rightly
keeps 'pos' as an argument passed into .search() and .match().

Say you have an HTTP server and you're matching incoming requests
against a regex. With Python's API, you can use the same RegExp
object, and keep a separate match state (pos) for each request. With
the JS API, you need to create a compiled RegExp per request, so that
.lastIndex doesn't get stomped on. And conceivably you may want /y in
one request but not another.
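
To make the point concrete, here is a sketch of how a library could keep the position outside the RegExp, roughly in the spirit of Python's pos argument. It assumes /y support; execAt, nextWord, and the request objects are hypothetical names used only for this example.

```javascript
// One shared compiled regexp for all requests.
var WORD = /\w+/y;

// Run `re` at `pos` without leaving any match state behind on `re`.
function execAt(re, input, pos) {
  var saved = re.lastIndex;
  re.lastIndex = pos;
  var m = re.exec(input);
  re.lastIndex = saved;   // restore, so callers never see each other's state
  return m;
}

// Each request carries its own position instead of relying on WORD.lastIndex.
var reqA = { text: "GET /index", pos: 0 };
var reqB = { text: "POST /form", pos: 0 };

function nextWord(req) {
  var m = execAt(WORD, req.text, req.pos);
  if (m) {
    req.pos += m[0].length;
  }
  return m ? m[0] : null;
}

var wordA = nextWord(reqA);
var wordB = nextWord(reqB);
```

The two requests can be interleaved freely because the only mutable state is the pos field on each request object.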

But perhaps the ship has already sailed on this. It's a bit annoying,
but a detail.

Thanks for the links!

Andy

Andy Chu

Jan 28, 2010, 2:30:52 AM
to narw...@googlegroups.com
> My first reaction is that /y will definitely work.  But, I think it
> continues a conflation of the compiled regexp and an *in progress
> match* in the JS RegExp API.

FYI here is the updated proposal that addresses that problem:

http://andychu.net/ecmascript/RegExp-Enhancements-2.html

It went through to es-discuss finally (spam problem) so that's
probably the best place to discuss it.

https://mail.mozilla.org/pipermail/es-discuss/2010-January/thread.html

thanks,
Andy
