
Irregularities in Regular Expressions


Brantley Harris

Aug 20, 2009, 2:23:51 AM
Type this into your javascript console, you'll notice it prints out
false.
r = new RegExp("\\\\", "g"); r.test('\\') == r.test('\\');

What is going on here?

Without the global flag it will print true, as expected:
r = new RegExp("\\\\"); r.test('\\') == r.test('\\');

But with the flag it will alternate:
r = new RegExp("\\\\", "g");
r.test('\\') --> true
r.test('\\') --> false
r.test('\\') --> true

I have tried this in both Firefox and Safari. Am I going insane? Can
anyone explain this?

Jonathan Fine

Aug 20, 2009, 2:57:00 AM

Your question is a good one, and a sign of sanity. According to:
http://stackoverflow.com/questions/61088/hidden-features-of-javascript
"Regular expression "constants" can maintain state (like the last thing
they matched)."

Here's a variant of your example:

js> r = /a/g
/a/g
js> r.test('aaa')
true
js> r.test('aaa')
true
js> r.test('aaa')
true
js> r.test('aaa')
false
js> r.test('aaa')
true
js> r.test('aaa')
true
js> r.test('aaa')
true
js> r.test('aaa')
false

I don't recall exactly where the state is stored - maybe I've given you
enough to solve the problem yourself - try the Mozilla Developer Center.
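
For what it is worth, here is a minimal sketch of where that state seems to
live, assuming it is the regex object's lastIndex property. The trace()
helper is a hypothetical name used only for this illustration.

```javascript
// Record the result of test() and the regex's lastIndex after each call.
// trace() is a hypothetical helper, not part of any library.
function trace(re, s, times) {
  const rows = [];
  for (let i = 0; i < times; i++) {
    const result = re.test(s);
    rows.push([result, re.lastIndex]);
  }
  return rows;
}

// Same regex object and same string as in the session above.
const rows = trace(/a/g, 'aaa', 4);
console.log(rows);
```

Each successful test moves lastIndex just past the match, and the failed
fourth call puts it back at 0, which would account for the repeating
true, true, true, false pattern.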

--
Jonathan

Jonathan Fine

Aug 20, 2009, 3:02:56 AM
Jonathan Fine wrote:

> Your question is a good one, and a sign of sanity. According to:
> http://stackoverflow.com/questions/61088/hidden-features-of-javascript
> "Regular expression "constants" can maintain state (like the last thing
> they matched)."

There's actually a wart in the language here, according to Nick Zakas,
that is fixed in ES5.

http://google-caja.googlecode.com/svn/trunk/doc/html/es5-talk/es5-talk.html
http://google-caja.googlecode.com/svn/trunk/doc/html/es5-talk/img29.html

--
Jonathan

Malcolm Dew-Jones

Aug 20, 2009, 3:05:13 AM
Brantley Harris (deadw...@gmail.com) wrote:
: Type this into your javascript console, you'll notice it prints out
: false.
: r = new RegExp("\\\\", "g"); r.test('\\') == r.test('\\');

: What is going on here?

This part is easy. What you show below is that each /g test alternates
true and false, so "r.test('\\') == r.test('\\');" is true==false, which
is certainly false.

So the only question is why r.test('\\') alternates true and false.

: Without the global flag it will print true, as expected:

Thomas 'PointedEars' Lahn

Aug 20, 2009, 3:43:20 AM
Brantley Harris wrote:
> Type this into your javascript console,

The term "javascript console" would normally need to be defined by an OP as
there is not only one ECMAScript implementation named "javascript", and
behavior may and is known to vary between implementations; especially with
RegExp objects.

> you'll notice it prints out false.
> r = new RegExp("\\\\", "g"); r.test('\\') == r.test('\\');
>
> What is going on here?
>
> Without the global flag it will print true, as expected:
> r = new RegExp("\\\\"); r.test('\\') == r.test('\\');
>
> But with the flag it will alternate:
> r = new RegExp("\\\\", "g");
> r.test('\\') --> true
> r.test('\\') --> false
> r.test('\\') --> true
>
> I have tried this in both Firefox and Safari. Am I going insane?

Hopefully not.

> Can anyone explain this?

Yes.

The Regular Expression encapsulated by the RegExp object that `r' refers to
matches one literal backslash. Each string to be tested (the first and only
argument of each test() call) contains only one backslash, and you are
reusing the same RegExp object, which has its `global' property set to
`true' per the constructor call.

As RegExp.prototype.test() calls RegExp.prototype.exec() and that must
return `null' if there are no more matches and the `global' property has the
value `true' (step 6 of its specified algorithm)¹, the first call to
r.test() returns `true', but the second one returns `false'².

See the ECMAScript Language Specification, Edition 3 Final:

| 15.10.6.3 RegExp.prototype.test(string)
|
| Equivalent to the expression RegExp.prototype.exec(string) != null.

And `(true == false)' results in `false' (section 11.9.1).
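
A minimal sketch of both points, assuming a conforming implementation: the
first part spells out test() as exec() compared against null, the second is
the loop from the first footnote. The names viaExec, allMatches and found
are hypothetical and serve only this illustration.

```javascript
// Part 1: test(s) behaves like exec(s) !== null on the same RegExp object.
const r = new RegExp("\\\\", "g");
const viaExec = [r.exec('\\') !== null, r.exec('\\') !== null];

// Part 2: collect every match of a global regex. The loop ends because
// exec() returns null once no further match is found.
// allMatches() is a hypothetical helper, not part of any library.
function allMatches(re, s) {
  const found = [];
  let m;
  while ((m = re.exec(s)) !== null) {
    found.push([m[0], m.index]);
  }
  return found;
}

const found = allMatches(/x/g, 'axbxc');
console.log(viaExec, found);
```

Note that the loop relies on the `global' property being `true'; without it
every exec() call would start at index 0 again and the loop would not end.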


PointedEars
___________
¹ If that were not so, the useful loop `while ((m = /x/g.exec(s)))' would
never terminate in implementations of said ECMAScript Edition.
² Consequently, the next call would return `true' again as the lastIndex
value would be reset to 0 and there would be a match at index 0.
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300...@news.demon.co.uk>

Richard Cornford

Aug 20, 2009, 3:56:00 AM
Malcolm Dew-Jones wrote:

> Brantley Harris wrote:
> : Type this into your javascript console, you'll notice it
> : prints out
> : false.
> : r = new RegExp("\\\\", "g"); r.test('\\') == r.test('\\');
>
> : What is going on here?
>
> This part is easy. What you show below is that each /g test
> alternates true and false, so "r.test('\\') == r.test('\\');
> " is true==false, which is certainly false.
>
> So the only question is why r.test('\\') alternates true and false.
<snip>

Regular expression objects have a - lastIndex - property that receives
different handling depending on whether the regular expression has its -
global - flag set to true or false. With the - global - flag set to
true the regular expression's - exec - method (which is employed by
its - test - method) starts matching at the position given by -
lastIndex - rather than at zero, sets - lastIndex - to the position just
after a successful match, and re-sets it to zero only when no match is
found (or when it is already greater than the length of the string being
tested). Thus - exec - starts testing the second time after the point at
which the match was found in the first test. That second test fails,
which re-sets - lastIndex - to zero, so a third test starts again from
the beginning of the string. The result is the oscillating behaviour
described.
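
As a rough illustration of - lastIndex - acting as the starting point, the
following sketch sets it by hand before calling - exec -. The string and
the variable names are assumptions chosen for the example only.

```javascript
const re = /a/g;
const s = 'abcabc';

re.lastIndex = 2;               // start searching at index 2, skipping the 'a' at 0
const m = re.exec(s);
const matchIndex = m.index;     // position of the 'a' that was found
const afterMatch = re.lastIndex; // just past that match

const miss = re.exec(s);        // no 'a' remains from that position onwards
const afterMiss = re.lastIndex; // value left behind by the failed search

console.log(matchIndex, afterMatch, miss, afterMiss);
```

The first - exec - call finds the second 'a' rather than the first, and the
failed call leaves - lastIndex - at zero again.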

This characteristic allows for constructs such as applying a global
regular expression in the condition of a - while - loop and handling
each match in the body of the loop, and so can be useful. It becomes
problematic when the - global - flag is employed inappropriately, though
it is trivial to manually set/re-set the regular expression's -
lastIndex - property to zero prior to using it.
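
A minimal sketch of that re-set, applied to the expression from the
original post. The function name testFromStart is hypothetical.

```javascript
const r = new RegExp("\\\\", "g");

// Re-set lastIndex before every use so each test starts at index 0.
// testFromStart() is a hypothetical helper, not part of any library.
function testFromStart(re, s) {
  re.lastIndex = 0;
  return re.test(s);
}

const first = testFromStart(r, '\\');
const second = testFromStart(r, '\\');
const stable = first === second;
console.log(first, second, stable);
```

Where only a yes/no answer is wanted, leaving out the - global - flag, as
in the original poster's second example, avoids the issue altogether.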

Richard.
