http://andychu.net/ecmascript/RegExp-Enhancements.html
It's pretty short and simple.
(Side note: does anyone know whether es-discuss is moderated for new
threads but not replies? That's what it seems like to me, as I sent a
new message twice without success, but a reply shows up instantly.)
thanks,
Andy
On Sun, Jan 24, 2010 at 1:35 PM, Andy Chu <an...@chubot.org> wrote:
> I sent this to es-discuss but I think it hasn't gone through
> moderation. Since I mentioned this first on this mailing list related
> to my work on Narcissus and statically analyzing require()s, I'd like
> to solicit comments here:
>
> http://andychu.net/ecmascript/RegExp-Enhancements.html
>
> It's pretty short and simple.
Steve, with regard to the linked blog post, do you have any ideas?
Kris Kowal
--
You received this message because you are subscribed to the Google Groups "Narwhal and Jack" group.
This proposal sounds like an API around "sticky" mode, which was implemented in JS 1.8[1] and is currently part of the RegExp extensions proposal for Harmony[2]. I'd love for other engines to implement this mode as well. An abstraction like the API Andy proposes may allow a library to degrade to string slicing if the engine doesn't support sticky mode. This could be done at the library level.
[1] https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Objects/RegExp
[2] http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps&s=regexp#y_flag
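A rough sketch of what such a library-level fallback might look like. The helper names (supportsSticky, matchAtSticky, matchAtSliced, matchAt) are made up for illustration and are not taken from Andy's proposal or from any existing library. It assumes that an engine without sticky mode throws on the unknown "y" flag.

```javascript
// Feature-detect the sticky flag (assumption: engines that lack it
// throw a SyntaxError when given an unknown flag).
var supportsSticky = (function () {
  try {
    new RegExp('a', 'y');
    return true;
  } catch (e) {
    return false;
  }
})();

// Match `source` (a pattern string) exactly at `pos` using sticky mode.
function matchAtSticky(source, str, pos) {
  var re = new RegExp(source, 'y');
  re.lastIndex = pos;
  var m = re.exec(str);
  return m ? m[0] : null;
}

// Fallback: anchor the pattern and run it against the tail of the string.
// The slice copies the tail, and assertions such as \b can behave
// differently at the cut, so this is only an approximation.
function matchAtSliced(source, str, pos) {
  var re = new RegExp('^(?:' + source + ')');
  var m = re.exec(str.slice(pos));
  return m ? m[0] : null;
}

// Returns the matched text, or null if nothing matches at `pos`.
function matchAt(source, str, pos) {
  return supportsSticky
    ? matchAtSticky(source, str, pos)
    : matchAtSliced(source, str, pos);
}
```

For example, matchAt('\\d+', 'abc123def', 3) gives "123", while the same call with pos 0 gives null. A real library would probably cache the compiled RegExp objects rather than rebuild them on every call.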
See:
- http://wiki.ecmascript.org/doku.php?id=proposals:extend_regexps#y_flag
- https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp#Parameters
As noted in Andy's article, Python solves this problem with the
re.match/re.search distinction and the pos argument. Other languages
and libraries tend to provide similar capabilities in various ways--
e.g., Perl, .NET, and some other regex flavors provide a magic \G
token that works similarly to JS1.8's /y flag. Personally, I think
that although the /y flag reinvents the wheel compared to how other
regex libraries have provided similar functionality, the details of /y
fit elegantly with JavaScript and its existing regexp.lastIndex
property. I'd like to see it added to future ECMAScript specs.
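To make the interaction with regexp.lastIndex concrete, here is a small illustration. It assumes an engine that implements /y the way it was later standardized in ECMAScript 2015; the input string and variable names are invented for the example.

```javascript
var input = 'let x = 42';

// With /g, exec starts at lastIndex but is free to scan forward.
var scanning = /\d+/g;
scanning.lastIndex = 0;
var found = scanning.exec(input);   // matches "42" at index 8

// With /y, exec must match exactly at lastIndex or fail.
var anchored = /\d+/y;
anchored.lastIndex = 0;
var miss = anchored.exec(input);    // null, and lastIndex is reset to 0

anchored.lastIndex = 8;
var hit = anchored.exec(input);     // matches "42"; lastIndex becomes 10
```

The /g regexp behaves like Python's re.search with a pos argument, and the /y regexp like re.match with a pos argument.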
Unlike the API additions proposed in the article, /y also works with
String methods that accept a RegExp. That is, it provides a useful
twist on what can be done with string.replace, string.split, etc.,
although of course using regexp.lastIndex as the search start position
only works with regexp.exec and regexp.test (and only when the /g flag
is used), with or without /y.
--Steven Levithan
http://blog.stevenlevithan.com
On Jan 25, 12:44 am, Kris Kowal <cowbertvon...@gmail.com> wrote:
> [cc Steve Levithan]
>
Uh wow, yes this is the exact same problem:
"This flag will make it easier to write simple and efficient lexical
analyzers for embedded languages using ECMAScript regular expressions.
The current language has quadratic complexity because each match may
potentially search to the end of the input for a match. (That can be
worked around in a couple of ways but it’s cumbersome.)"
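As an illustration of the kind of lexer that passage has in mind, here is a minimal sketch using /y. The token rules and the tokenize name are invented for the example, and it assumes an engine with sticky support. Because each rule can only match at the current position, a failed rule never scans ahead to the end of the input.

```javascript
// Hypothetical toy lexer: each rule is tried exactly at `pos`.
function tokenize(input) {
  var rules = [
    ['space', /\s+/y],
    ['number', /\d+/y],
    ['name', /[A-Za-z_]\w*/y],
    ['punct', /[=+\-*\/();]/y]
  ];
  var tokens = [];
  var pos = 0;
  while (pos < input.length) {
    var matched = false;
    for (var i = 0; i < rules.length; i++) {
      var re = rules[i][1];
      re.lastIndex = pos;            // where this attempt must start
      var m = re.exec(input);
      if (m) {
        // Whitespace is consumed but not reported as a token.
        if (rules[i][0] !== 'space') tokens.push([rules[i][0], m[0]]);
        pos = re.lastIndex;          // just past the matched text
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError('Unexpected character at ' + pos);
  }
  return tokens;
}
```

Calling tokenize('x = 42') yields the tokens name "x", punct "=", number "42".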
I'll update my article to reflect this. Apparently it's implemented
in Firefox 3.
My first reaction is that /y will definitely work. But, I think it
continues a conflation of the compiled regexp and an *in progress
match* in the JS RegExp API.
That is, .lastIndex does not belong on the RegExp object itself, and
sticky mode doesn't need to be a modifier on the RegExp either.
They're both properties of the match taking place. So Python rightly
keeps 'pos' as an argument passed into .search() and .match().
Say you have an HTTP server and you're matching incoming requests
against a regex. With Python's API, you can use the same RegExp
object, and keep a separate match state (pos) for each request. With
the JS API, you need to create a compiled RegExp per request, so that
.lastIndex doesn't get stomped on. And conceivably you may want /y in
one request but not another.
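A small sketch of the stomping problem, with made-up request strings. The execAt helper is hypothetical; it just shows one way to keep the position outside the RegExp object, in the spirit of Python's pos argument. It expects a regexp with the /g (or /y) flag, since lastIndex is ignored otherwise.

```javascript
// One compiled regexp shared by two in-progress scans.
var word = /\w+/g;
var requestA = 'GET /index.html';
var requestB = 'POST /form';

var a1 = word.exec(requestA);   // "GET"; word.lastIndex is now 3

// A second scan borrows the same object and runs to the end of its input.
word.lastIndex = 0;
var b1 = word.exec(requestB);   // "POST"
var b2 = word.exec(requestB);   // "form"; word.lastIndex is now 10

// The first scan resumes, but from the position left by the second one.
var a2 = word.exec(requestA);   // "html", so "index" was skipped

// Hypothetical helper: the caller owns the position, not the regexp.
function execAt(re, str, pos) {
  re.lastIndex = pos;
  var m = re.exec(str);
  return m ? { text: m[0], end: re.lastIndex } : null;
}

var stepA = execAt(word, requestA, 0);          // "GET", end 3
var stepB = execAt(word, requestB, 0);          // "POST", end 4
var nextA = execAt(word, requestA, stepA.end);  // "index", end 10
```

With a wrapper like this, one compiled RegExp can serve several scans as long as each caller remembers its own end position. It does nothing about wanting /y for one request and not another, which would still need two RegExp objects.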
But perhaps the ship has already sailed on this. It's a bit annoying,
but a detail.
Thanks for the links!
Andy
FYI here is the updated proposal that addresses that problem:
http://andychu.net/ecmascript/RegExp-Enhancements-2.html
It went through to es-discuss finally (spam problem) so that's
probably the best place to discuss it.
https://mail.mozilla.org/pipermail/es-discuss/2010-January/thread.html
thanks,
Andy