
regexp test function behavior


HopfZ

Oct 29, 2006, 7:02:38 AM
I couldn't understand some behavior of the RegExp.test function.

Example html code:
----------------
<html><head></head><body><script type="text/javascript">
var r = /^https?:\/\//g;
document.write( [
    r.test('http://a'),
    r.test('http://b'),
    r.test('http://c'),
    r.test('http://d')
]);
</script></body></html>
---------------------

The page displays true, false, true, false. (in Opera, Firefox and IE)
This is strange because I expected it would display true, true, true,
true. There must be something I didn't know about the function
RegExp.test.

Michael Winter

Oct 29, 2006, 7:56:06 AM
HopfZ wrote:

[snip]

> var r = /^https?:\/\//g;
> document.write( [
> r.test('http://a'),
> r.test('http://b'),
> r.test('http://c'),
> r.test('http://d')
> ]);

[snip]

> The page displays true, false, true, false. (in Opera, Firefox and IE)
> This is strange because I expected it would display true, true, true,
> true. There must be something I didn't know about the function
> RegExp.test.

The global flag is the cause of your confusion. It doesn't even make
sense for it to be included: you're using an expression with an
input-start assertion[1] (^) and that could only ever match once.

The RegExp.prototype.test method is equivalent to the expression,

re.exec(str) != null

and the global flag is significant when the RegExp.prototype.exec method
is used. After a match, the lastIndex property of the regular expression
object is modified to point just beyond the end of the previously
matched sub-string. On the next invocation of the exec method, this
position is used to begin the next search.

At the end of the first call, the lastIndex property will point beyond
the end of the match (to the character, 'a'). Whilst attempting to match
the input-start assertion (^) in the second call, the assertion will
fail (the match is attempted after the start of the string). These
attempts will continue until the end of the string is reached, at which
point the lastIndex property is reset to zero and null is returned. With
the lastIndex property reset, the third call can proceed normally like
the first. The fourth call will be a repeat of the second.
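
The sequence described above can be traced with a minimal sketch. It assumes a current JavaScript engine and collects results in an array (the name `trace` is illustrative) instead of calling document.write:

```javascript
// One shared regular expression object with the global flag set.
var r = /^https?:\/\//g;
var urls = ['http://a', 'http://b', 'http://c', 'http://d'];
var trace = [];

for (var i = 0; i < urls.length; i++) {
  // test() starts searching at r.lastIndex because of the g flag.
  var matched = r.test(urls[i]);
  // Record the result together with the lastIndex left behind by the call.
  trace.push({ matched: matched, lastIndex: r.lastIndex });
}
```

After a successful call, lastIndex is 7 (just past 'http://'); after a failed call it is back at 0, which is why the results alternate.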

Mike


[1] With the multi-line flag, it also acts as a line-start
assertion, but that doesn't apply here.

Evertjan.

Oct 29, 2006, 10:38:50 AM
Michael Winter wrote on 29 okt 2006 in comp.lang.javascript:

> The global flag is the cause of your confusion. It doesn't even make
> sense for it to be included: you're using an expression with an
> input-start assertion[1] (^) and that could only ever match once.
>

Even more so, setting the global flag in a test() never makes any sense.

> At the end of the first call, the lastIndex property will point beyond
> the end of the match

A good explanation.

Even so it is a bug!!!!

The global flag should either lead to an error,
or be disregarded in test().

===================================

Testing:

<script type='text/javascript'>

// IE7 tested

var r = /x/g;
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write('<br>');
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true

document.write('<br>');
r = /x/;
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // true


</script>

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)

Michael Winter

Oct 30, 2006, 8:37:36 AM
Evertjan. wrote:

[snip]

> Even more so, setting the global flag in a test() never makes any
> sense.

Never? I don't know about that. Rare, certainly.

For example, one way to count the number of occurrences of a pattern
within a string is to use the String.prototype.match method[1]:

var result = string.match(regExp),
    count = result ? result.length : 0;

where regExp is a regular expression object with the global flag set.
However, one could also do the same with the RegExp.prototype.test method:

function countMatches(string, pattern) {
    var count = 0,
        index = pattern.lastIndex = 0;

    while (pattern.test(string)) {
        ++count;
        if (pattern.lastIndex == index) {++pattern.lastIndex;}
    }
    return count;
}

var count = countMatches(string, regExp);

Marginally easier to use, and slightly more efficient in Fx and Op (the
test was simple: a case-sensitive, single-character search). Even though
it's slower in MSIE, performance is still better than in either Fx or Op.

[snip]

> Even so it is a bug!!!!
>
> The global flag should either lead to an error,
> or be disregarded in test().

Not at all. The blame would fall on the developer who used a global flag
where it didn't belong, or failed to reset the lastIndex property after
a previous invocation.

[snip]

Mike


[1] Browsers return null from the String.prototype.match method
when both the global flag is set for the regular expression
object and no matches are found. It seems to me that
15.5.4.10, ECMA-262 3rd Ed. would call for an empty array.
Not that big a deal, but it would make the example above a
bit simpler.
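
As a rough sketch of the null check this footnote refers to, the match-based count can be wrapped in a small helper. The name `countViaMatch` is hypothetical and is not taken from the thread:

```javascript
// Hypothetical helper: counts matches using String.prototype.match.
// The pattern is assumed to have the global flag set.
function countViaMatch(string, pattern) {
  var result = string.match(pattern);
  // match() returns null, not an empty array, when nothing matches,
  // so the result has to be checked before reading its length.
  return result ? result.length : 0;
}
```

Without the conditional, reading `.length` on a null result would throw whenever the pattern is absent from the string.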

Evertjan.

Oct 30, 2006, 10:50:48 AM
Michael Winter wrote on 30 okt 2006 in comp.lang.javascript:

> Evertjan. wrote:
>
> [snip]
>
>> Even more so, setting the global flag in a test() never makes any
>> sense.
>
> Never? I don't know about that. Rare, certainly.

Never!

> For example, one way to count the number of occurrences of a pattern
> within a string is to use the String.prototype.match method[1]:
>
> var result = string.match(regExp),
> count = result ? result.length : 0;

Michael, I said: in "test()"

"match()" is not "test()"

[snap]

>
> [snip]
>
>> Even so it is a bug!!!!
>>
>> The global flag should either lead to an error,
>> or be disregarded in test().
>
> Not at all.

But it is.

The subject of this thread is:
"regexp test function behavior"
not:
"regexp match function behavior"

"match()" is not "test()"

[snap]

> [1] Browsers return null from the String.prototype.match method

[snip]

"match()" is not "test()"

Michael Winter

Oct 30, 2006, 11:59:32 AM
Evertjan. wrote:

> Michael Winter wrote on 30 okt 2006 in comp.lang.javascript:
>
>> Evertjan. wrote:
>>
>> [snip]
>>
>>> Even more so, setting the global flag in a test() never makes any
>>> sense.
>>
>> Never? I don't know about that. Rare, certainly.
>
> Never!

Care to state a reason?

The information that can be gleaned from using the RegExp.prototype.test
method in this way is limited[1], which is why such usage would be rare.
However, that is a far cry from claiming that it makes no sense. Indeed,
my previous post demonstrated a reasonable use.

The point of the global flag is to allow repetitive processing, where
the lastIndex property indicates the position from which the next
invocation starts. This would allow the test method to assert that there
is more than one match, or even that one begins after a certain point
should the lastIndex property be set explicitly. If that's all that's
required, then there's no need to use a method that would return more
information (and be wasteful, in the process).
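
The two uses mentioned in this paragraph might look something like the sketch below. Both helper names are hypothetical, both assume the pattern carries the global flag, and `hasAtLeastTwo` additionally assumes the pattern cannot match an empty string:

```javascript
// Hypothetical helper: is there a match that begins at or after `start`?
// Without the g flag, test() would ignore lastIndex and search from 0.
function matchesFrom(pattern, string, start) {
  pattern.lastIndex = start;
  var found = pattern.test(string);
  pattern.lastIndex = 0; // leave the object in a known state
  return found;
}

// Hypothetical helper: does the pattern match more than once?
// Assumes non-empty matches, so the second call cannot find the same match.
function hasAtLeastTwo(pattern, string) {
  pattern.lastIndex = 0;
  var result = pattern.test(string) && pattern.test(string);
  pattern.lastIndex = 0;
  return result;
}
```

Neither helper builds a result array, which is the saving the paragraph points at.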

>> For example, one way to count the number of occurrences of a pattern
>> within a string is to use the String.prototype.match method[1]:
>>
>> var result = string.match(regExp),
>> count = result ? result.length : 0;
>
> Michael, I said: in "test()"

I read what you wrote.

> "match()" is not "test()"

If you read past the part that you quoted, you would notice that I go on
to present an equivalent using the test method and a regular expression
with the global flag set. Mentioning the String.prototype.match method
was merely a comparison.

>>> Even so it is a bug!!!!
>>>
>>> The global flag should either lead to an error,
>>> or be disregarded in test().
>>
>> Not at all.

I wrote a little more than that.

> But it is.

Again, would you like to actually provide an explanation?

The test method is /defined/ in terms of the exec method; it is the
behavioural equivalent of:

regExp.exec(string) == null

/including/ all of the side effects that the exec method introduces. The
method should be used with that in mind, and if it's not, then it's the
fault of the developer and nobody else.

Note that an implementation doesn't have to use that exact expression.
Instead, it might copy the algorithm of the exec method (see 15.10.6.2),
except returning false instead of null in step 6, and returning true
instead of performing steps 12 and 13. This would save some time whilst
providing the same behaviour, though it is the identical behaviour that
matters most.

> The subject of this thread is:
> "regexp test function behavior"
> not:
> "regexp match function behavior"

I know. I answered the OP's question, did I not? Even so, threads drift.

> "match()" is not "test()"

I hope you're going to feel a little silly now after banging on about
that so irrationally.

>> [1] Browsers return null from the String.prototype.match method
>
> [snip]
>
> "match()" is not "test()"

That comment was an aside, which was why I presented it as an endnote.

Mike


[1] As far as I can see, only three facts can be obtained:

1. Whether the string matched the pattern (the return value
of the method itself),
2. The location of first character to follow the match just
obtained (the value of the lastIndex property), and
3. Whether the pattern matched a zero-length string (the
lastIndex property will not have changed).

Evertjan.

Oct 30, 2006, 2:54:13 PM
Michael Winter wrote on 30 okt 2006 in comp.lang.javascript:

> I wrote a little more than that.
>
>> But it is.
>
> Again, would you like to actually provide an explanation?
>
> The test method is /defined/ in terms of the exec method; it is the
> behavioural equivalent of:
>
> regExp.exec(string) == null

That is neither here nor there. A method is not defined as a behavioural
equivalent; its behaviour is described. Its implementation could be
defined as a behavioural equivalent, but that could make the method
buggy, as it does in this case.

Having a global flag in a test makes no sense, since the result is
stable at the first match, and further searching should be aborted.

The possible "defining" of test() in the sense of having a search
starting point left over by an earlier test(), only if the regex string
variable is not refreshed, is so strange, we can only call that a bug.

> I know. I answered the OP's question, did I not? Even so, threads
> drift.

You specifically said that my assertion was wrong, by citing unrelated
code, not using test() but match().



>> "match()" is not "test()"
>
> I hope you're going to feel a little silly now after banging on about
> that so irrationally.

Shall we keep on subject, Michael, or do you feel attacked in person?

Michael Winter

Oct 30, 2006, 9:43:23 PM
Evertjan. wrote:

> Michael Winter wrote on 30 okt 2006 in comp.lang.javascript:

[snip]

>> The test method is /defined/ in terms of the exec method; it is the
>> behavioural equivalent of:
>>
>> regExp.exec(string) == null

A typo on my part: the comparison operator should be not-equal (!=), of
course.
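
With the not-equal comparison, the equivalence can be checked side by side. This is only a sketch: `testViaExec` is a hypothetical name, and two separate regular expression objects are used so that neither call disturbs the other's lastIndex:

```javascript
// Hypothetical helper: mirrors test() by way of exec().
function testViaExec(regExp, string) {
  return regExp.exec(string) != null;
}

var a = /x/g; // driven through test()
var b = /x/g; // driven through exec()

var viaTest = [a.test('x'), a.test('x'), a.test('x')];
var viaExec = [testViaExec(b, 'x'), testViaExec(b, 'x'), testViaExec(b, 'x')];
```

Both arrays alternate in the same way, and both objects end up with the same lastIndex, since the side effects come from exec in either case.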

> That is neither here nor there.

How so? It is a very succinct description of the behaviour of the method.

> A method is not defined as a behavioural equivalent, its behaviour
> is described.

And it is: if the exec method were to return null or undefined, the test
method should return false. By examining the algorithm for the former,
one can ascertain precisely what is returned, where, and for what
reason, and how to modify the process to return booleans instead.

> Its implementation could be defined as a behavioural equivalent, but
> that could make the method buggy, as it does in this case.

I fail to see how.

> Having a global flag in a test makes no sense, since the result is
> stable at the first match, and further searching should be aborted.

That depends on what the test method is meant to do. Clearly, you have
decided upon a very limited definition. That does not make the
language faulty; it means that your expectations are. The global flag
changes the behaviour of several methods related to regular expressions,
so it should only be used where that behaviour is desired.

> The possible "defining" of test() in the sense of having a search
> starting point left over by an earlier test(), only if the regex
                                                 ^^^^^^^^^^^^^^^^^
> string variable is not refreshed, is so strange, we can only call
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> that a bug.

What?

[snip]

> You specifically said that my assertion was wrong, by stating an
> unrelated code, not using test() but match().

If you really believe that, you didn't read my second post properly. In
fact, that would seem to indicate that you didn't read the previous one
properly, either, where I wrote (emphasis added):

    ... I go on to present an equivalent *using the test method*
    and a regular expression with the global flag set. Mentioning
    the String.prototype.match method was merely a comparison.

[snip]

Mike

Evertjan.

Oct 31, 2006, 3:33:50 AM
Michael Winter wrote on 31 okt 2006 in comp.lang.javascript:

>> The possible "defining" of test() in the sense of having a search
>> starting point left over by an earlier test(), only if the regex
>                                                 ^^^^^^^^^^^^^^^^^
>> string variable is not refreshed, is so strange, we can only call
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> that a bug.
>
> What?
>

As I wrote about before in this thread:

var r = /x/g;
// r, the regex string variable, will not be refreshed here:
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write(r.test('x')+'<br>'); // true
document.write(r.test('x')+'<br>'); // false
document.write('<br>');
r = /x/g;
// r will NOW be refreshed every time:
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true
r = /x/g;
document.write(r.test('x')+'<br>'); // true

// if a literal regex string is used, that works like refreshed:
document.write(/x/g.test('x')+'<br>'); // true
document.write(/x/g.test('x')+'<br>'); // true
document.write(/x/g.test('x')+'<br>'); // true

[..]


> ... I go on to present an equivalent *using the test method*
> and a regular expression with the global flag set. Mentioning
> the String.prototype.match method was merely a comparison.

You could perhaps be correctly explaining the behaviour of test(), but
I still fail to see why an explanation of the behaviour of a js-method
prevents that behaviour from being a bug.

It certainly helps in understanding a bug, so that we can programme
"around" it.

However, the above-mentioned difference in behaviour when the regex
string variable is refreshed is not explained, methinks.

Either way, I am still convinced we should call this a bug.

Michael Winter

Nov 1, 2006, 8:10:05 AM
Evertjan. wrote:

[snip]

> var r = /x/g;
> // r, the regex string variable. will not be refreshed here:
> document.write(r.test('x')+'<br>'); // true
> document.write(r.test('x')+'<br>'); // false
> document.write(r.test('x')+'<br>'); // true
> document.write(r.test('x')+'<br>'); // false
> document.write('<br>');
> r = /x/g;
> // r, will NOW be refreshed every time:
> document.write(r.test('x')+'<br>'); // true
> r = /x/g;

[snip]

Ah, I see. If you had written "regex object", that would have been more
obvious.

Creating a new regular expression object is hardly necessary. Just set
the lastIndex property to zero:

var re = /x/g;

document.write(re.test('x') + '<br>'); // true
document.write(re.test('x') + '<br>'); // false
document.write(re.test('x') + '<br>'); // true
re.lastIndex = 0;
document.write(re.test('x') + '<br>'); // true

[snip]

> You could perhaps be correctly explaining the behaviour of test(), I
> still fail to see why an explanation of a behaviour of a js-method
> does prevent that behaviour to be a bug.

It doesn't, not automatically. Specifications can be badly thought out,
but, in my opinion, that doesn't apply in this case.

[snip]

> However, the above mentioned refreshing of the regex string variable
> behaviour difference is not explained, methinks.

Each literal evaluates to an object reference (the object itself is
created before execution begins as the literal is scanned), and each of
those objects is completely different - they do not compare as equal
even if the literal is exactly the same. The test method will alter the
lastIndex property of the referenced object, but that object will
eventually be discarded and replaced by a new one.
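
A small sketch of the point about distinct objects follows. It assumes an engine implementing ECMAScript 5 or later, where a regular expression literal yields a new object each time it is evaluated. That differs from the scan-time creation described above for the 3rd Edition, so the function-based part of this example would behave differently under the older rules. The function name `makePattern` is hypothetical:

```javascript
// Hypothetical factory: under ES5+ each call evaluates the literal afresh.
function makePattern() {
  return /x/g;
}

var first = makePattern();
var second = makePattern();

// Only the object that test() was called on has its lastIndex advanced.
first.test('x');

var sameObject = (first === second);       // distinct objects
var literalsEqual = (/x/g === /x/g);       // two literals, two objects
```

Because `second` is untouched, a later `second.test('x')` starts from index 0 regardless of what was done with `first`.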

[snip]

Mike

Evertjan.

Nov 1, 2006, 10:41:41 AM
Michael Winter wrote on 01 nov 2006 in comp.lang.javascript:
[..]

>> However, the above mentioned refreshing of the regex string variable
>> behaviour difference is not explained, methinks.
>
> Each literal evaluates to an object reference (the object itself is
> created before execution begins as the literal is scanned), and each of
> those objects is completely different - they do not compare as equal
> even if the literal is exactly the same. The test method will alter the
> lastIndex property of the referenced object, but that object will
> eventually be discarded and replaced by a new one.

I begin to see.

However, I think this construction, while being useful in match() and
exec(), is a bad one in test(). I would never have allowed test() to change
any property of the regex object, even its lastIndex property.
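
For anyone who wants that behaviour today, one way to programme around it is a wrapper that saves and restores lastIndex. This is a sketch only, and `pureTest` is a hypothetical name, not an existing method:

```javascript
// Hypothetical wrapper: test() from the start of the string without
// leaving any change behind on the regular expression object.
function pureTest(pattern, string) {
  var saved = pattern.lastIndex;
  pattern.lastIndex = 0;             // always search from the beginning
  var found = pattern.test(string);
  pattern.lastIndex = saved;         // undo the side effect of test()
  return found;
}

// Usage with the pattern from the original question.
var r = /^https?:\/\//g;
var results = [
  pureTest(r, 'http://a'),
  pureTest(r, 'http://b'),
  pureTest(r, 'http://c'),
  pureTest(r, 'http://d')
];
```

With the wrapper, all four calls agree, at the cost of giving up the repeated-search behaviour that the global flag exists to provide.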

Michael Winter

Nov 3, 2006, 9:22:56 AM
Michael Winter wrote:

[snip]

> function countMatches(string, pattern) {
>     var count = 0,
>         index = pattern.lastIndex = 0;
>
>     while (pattern.test(string)) {
>         ++count;
>         if (pattern.lastIndex == index) {++pattern.lastIndex;}
          index = pattern.lastIndex;
>     }
>     return count;
> }

Forgot to update the index variable.
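
Put together with the correction, the function reads as in the sketch below. It is only exercised here with patterns that consume at least one character. Zero-length matches are a harder case for a loop built on test() alone, because test() does not report where a match started:

```javascript
// The thread's countMatches with the missing index update applied.
// The pattern is assumed to have the global flag set.
function countMatches(string, pattern) {
  var count = 0,
      index = pattern.lastIndex = 0;

  while (pattern.test(string)) {
    ++count;
    // If lastIndex did not move, step past the current position
    // so that the loop cannot stall.
    if (pattern.lastIndex == index) { ++pattern.lastIndex; }
    index = pattern.lastIndex;
  }
  // A failed test() has already reset pattern.lastIndex to 0.
  return count;
}
```

Called as `countMatches('banana', /a/g)`, the loop runs once per occurrence and stops when test() finally fails.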

[snip]

Mike
