
Regular Expression Question


Robert

Nov 19, 2009, 8:31:01 AM
I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
where the alpha chars are entirely random.

When I use the pattern [0-9]..[0-9] I get a match count of 1. If I use
[0-9]... I get 2 as a match count.

I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?

Can someone tell me the correct pattern to use? (without counting on the
surrounding characters).
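
A note on why those two counts come out as 1 and 2: a global regular-expression search resumes after the end of the previous match, so matches never overlap. The JavaScript sketch below reproduces both counts on the sample string. The variable names are illustrative only, and JavaScript is used simply because it also appears later in this thread; it is not a statement about how the original script was written.

```javascript
// Global matching is non-overlapping: each search resumes after the
// end of the previous match, so "2345" is never considered once
// "1234" has been consumed.
var subject = "sfsdf1234567sdfsdf";

var a = subject.match(/[0-9]..[0-9]/g); // ["1234"]
var b = subject.match(/[0-9].../g);     // ["1234", "567s"]

console.log(a.length, b.length);        // 1 2
```

The second pattern picks up "567s" because the dot also matches letters, which is where the count of 2 comes from.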

Tom Lavedas

Nov 19, 2009, 9:54:19 AM

The best I could do was to find a run of four or more contiguous digits
and then post-process it ...

sMatch = RegExpFind("\d{4,}", "sfsdf1234567sdfsdf")
if sMatch <> "" then
  ' A run of n digits holds n - 3 overlapping 4-digit windows,
  ' so the highest array index is n - 4.
  Redim aMatches(Len(sMatch) - 4)

  For i = 0 to Len(sMatch) - 4
    aMatches(i) = Mid(sMatch, i + 1, 4)
  Next
  wsh.echo Join(aMatches, vbnewline)
else
  wsh.echo "No match found"
end if

' Returns the first match only, or "" when there is none
Function RegExpFind(patrn, strng)
  Dim regEx, Matches                  ' Define variables.
  Set regEx = New RegExp              ' Create a regular expression.
  regEx.Pattern = patrn               ' Set pattern.
  regEx.IgnoreCase = False            ' Match case-sensitively.
  regEx.Global = False                ' Stop after the first match.
  Set Matches = regEx.Execute(strng)  ' Execute search.
  RegExpFind = ""
  If Matches.Count > 0 Then RegExpFind = Matches(0).Value
End Function

It gets the answer you were looking for in this case. It doesn't try to
find disjointed groups of four or more digits. In that case, the
function would need to return an array and each element would need to
be processed, but that's certainly possible.
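
The array-returning variant described above could look roughly like the following JavaScript sketch. The helper name digitWindows and its parameters are hypothetical, not from this thread, and it collects every digit run (not only runs of four or more) and lets the inner loop skip the short ones.

```javascript
// Hypothetical helper: collect every overlapping window of `width`
// digits from every run of digits in `text`.
function digitWindows(text, width) {
  var runs = text.match(/\d+/g) || [];   // all maximal digit runs
  var out = [];
  for (var r = 0; r < runs.length; r++) {
    // A run shorter than `width` never enters this loop.
    for (var i = 0; i + width <= runs[r].length; i++) {
      out.push(runs[r].substring(i, i + width));
    }
  }
  return out;
}

console.log(digitWindows("sfsdf1234567sdfsdf", 4).join("\n"));
```

For the sample string this prints 1234, 2345, 3456 and 4567 on separate lines.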
_____________________
Tom Lavedas

Csaba Gabor

Nov 19, 2009, 2:02:11 PM
On Nov 19, 2:31 pm, Robert <Rob...@discussions.microsoft.com> wrote:
> I want to get all the 4 digit numbers from the following sfsdf1234567sdfsdf
> where the alpha chars are entirely random.
...

> I want my regexp to match against 1234, 2345, 3456, 4567 and get a count of 4?

This was a highly interesting problem because I think it
exposes a bug in the VBScript Regular Expression object. In
particular, the regular expression which finds the number
of such matches is:

(?=\d{4})

That is
subject = "sfsdf1234567sdfsdf"
Set regExM = New RegExp
regExM.Pattern = "(?=\d{4})"
regExM.Global = True
Set Matches = regExM.Execute(subject)
MsgBox Matches.count

This, however, does not return the matches. To do
that, we should actually capture the 4 lookahead digits.
We can do so as follows: regExM.Pattern = "(?=(\d{4}))"

HOWEVER, if at this point you try to iterate through all
the matches:

For Each Match in Matches
MsgBox Match.SubMatches(0)
Next

your VBScript will get really whacked out (the bug).
The supposedly captured subpatterns are actually
not available. There is some major nastiness in there.
One's first reaction might be to think, well you shouldn't
be trying to make captures within a lookahead. I mean
it doesn't say anywhere explicitly that that should work,
right?

But actually, capturing within forward lookahead does
work in both JScript and VBScript (as the below examples
demonstrate) if you approach it in an alternate fashion.
Here is the VBScript code which will show the number of
matches along with the actual matches:

subject = "sfsdf1234567sdfsdf"
Set regExR = New RegExp
regExR.Pattern = "(\d)(?=(\d{3}))"
regExR.Global = True

Set regExM = New RegExp
regExM.Pattern = "\d{4}"
regExM.Global = True

Set Matches = regExM.Execute(regExR.replace(subject, "$1$2 "))
res = Matches.count & " matches:"
For Each Match in Matches
res = res & vbCrLf & Match.Value
Next
MsgBox res


Finally, here is a slightly more compact javascript
way to do the same thing (assuming it's in a script
element in a web page):

var subject = "abcd1234567wqe";
var sub2 = subject.replace(/(\d)(?=(\d{3}))/g, "$1$2 ").match(/\d{4}/g) || [];
alert(sub2.length + " matches:\n" + sub2.join("\n"));


Csaba Gabor from Vienna
Replace alert with WScript.Echo if you want to place the above
three JScript lines into a .js file to run from the command line.
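
For comparison, in a JavaScript engine that provides String.prototype.matchAll (ES2020; this is an assumption about the runtime and was not available in the 2009 hosts discussed here), the capture-inside-lookahead pattern can be iterated directly. A minimal sketch, with illustrative variable names:

```javascript
var subject = "sfsdf1234567sdfsdf";

// Each match is zero-length; the four digits live in capture group 1.
// matchAll steps past empty matches on its own, so no manual
// lastIndex bookkeeping is needed.
var found = Array.from(subject.matchAll(/(?=(\d{4}))/g), function (m) {
  return m[1];
});

console.log(found.length + " matches:\n" + found.join("\n"));
```

This says nothing about how the VBScript RegExp object behaves; it only shows the same pattern under a different engine.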

PS. I would be curious to know whether the .NET version of the
RegExp engine exhibits the same flaw.

Paul Randall

Nov 19, 2009, 10:11:34 PM

"Csaba Gabor" <dan...@gmail.com> wrote in message
news:d1dd3386-f6c9-454d...@v30g2000yqm.googlegroups.com...

(?=\d{4})

-------------------------------------------

Hi, Csaba
Download and install Regular Expression Workbench:
http://code.msdn.microsoft.com/RegexWorkbench/Release/ProjectReleases.aspx?ReleaseId=406
It uses the dot net engine.
I find it especially handy for parsing a regular expression into chunks that
I can understand, and for testing a regular expression I'm trying to build
for a VBScript application. I have to be especially careful to remember
that just because it works in RE Workbench (dot net) does not mean that it
works in VBScript. I don't know of any valid VBScript regular expressions
that it does not correctly parse.

-Paul Randall


Dr J R Stockton

Nov 20, 2009, 4:43:50 PM
In microsoft.public.scripting.vbscript message <A3A5975E-7677-4B93-948B-
C61DC3...@microsoft.com>, Thu, 19 Nov 2009 05:31:01, Robert
<Rob...@discussions.microsoft.com> posted:

This works in JavaScript (Firefox 3.0.15); perhaps it can be translated:

St = "aaa1234567b45454bb" ; T = []
RE = /(\d\d\d\d)/gi
RE.lastIndex = 0 // Seems needed in FF & Op, to repeat
while (true) {
A = RE.exec(St) ; RE.lastIndex -= 3
if (!A) break
T.push(A[1])
}

Result is in T.
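
The same rewinding idea can be sketched without hard-coding the width. The helper below is hypothetical (its name and parameters are not from this thread) and assumes an engine that supports RegExp.prototype.flags (ES2015). Instead of subtracting 3 from lastIndex, it resumes one character after the start of the previous match, which is the same position for a four-character match.

```javascript
// Hypothetical helper: return all matches of `re` in `text`,
// including overlapping ones.
function overlappingMatches(re, text) {
  // Work on a global copy so the caller's regex state is untouched.
  var flags = re.flags.indexOf("g") === -1 ? re.flags + "g" : re.flags;
  var g = new RegExp(re.source, flags);
  var out = [], m;
  while ((m = g.exec(text)) !== null) {
    out.push(m[0]);
    g.lastIndex = m.index + 1;  // resume one character after the match start
  }
  return out;
}

console.log(overlappingMatches(/\d{4}/, "aaa1234567b45454bb").join(","));
// 1234,2345,3456,4567,4545,5454
```

Because lastIndex always moves past the previous match start, the loop terminates even for patterns that can match the empty string.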

--
(c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQqish topics, acronyms & links;
Astro stuff via astron-1.htm, gravity0.htm ; quotings.htm, pascal.htm, etc.
No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.

Evertjan.

Nov 21, 2009, 5:04:47 AM
Dr J R Stockton wrote on 20 nov 2009 in
microsoft.public.scripting.vbscript:

> In microsoft.public.scripting.vbscript message
> <A3A5975E-7677-4B93-948B- C61DC3...@microsoft.com>, Thu, 19 Nov
> 2009 05:31:01, Robert <Rob...@discussions.microsoft.com> posted:
>>I want to get all the 4 digit numbers from the following
>>sfsdf1234567sdfsdf where the alpha chars are entirely random.
>>
>>When I use the pattern [0-9]..[0-9] I get a match count of 1. If I use
>>[0-9]... I get 2 as a match count.
>>
>>I want my regexp to match against 1234, 2345, 3456, 4567 and get a
>>count of 4?
>>
>>Can someone tell me the correct pattern to use? (without counting on
>>the surrounding characters).
>
> This works in JavaScript (Firefox 3.0.15); perhaps it can be
> translated:
>
> St = "aaa1234567b45454bb" ; T = []
> RE = /(\d\d\d\d)/gi

why the i ?

> RE.lastIndex = 0 // Seems needed in FF & Op, to repeat
> while (true) {
> A = RE.exec(St) ; RE.lastIndex -= 3
> if (!A) break
> T.push(A[1])
> }
>
> Result is in T.

This can be done with match():

<script type='text/javascript'>
var s = "aaa1234567b45454bb";
var r = s.match(/\d{4}/g);

document.write(r); // 1234,4545
</script>

translated into vbs with regEx.Execute():

<script type='text/vbscript'>
s = "aaa1234567b45454bb"

Set regEx = New RegExp
regEx.Pattern = "\d{4}"
regEx.Global = True
Set Matches = regEx.Execute(s)

r = ""
For Each Match in Matches
  if r <> "" Then r = r & ","
  r = r & Match.Value
Next
document.write r
</script>

[making quite a point of the ease of Javascript]


--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)

Tom Lavedas

Nov 21, 2009, 6:57:22 AM
On Nov 21, 5:04 am, "Evertjan." <exjxw.hannivo...@interxnl.net> wrote:
> Dr J R Stockton wrote on 20 nov 2009 in
> microsoft.public.scripting.vbscript:
>
>
>
> > In microsoft.public.scripting.vbscript message
> > <A3A5975E-7677-4B93-948B- C61DC30FE...@microsoft.com>, Thu, 19 Nov

Go all the way back to Robert's OP. The formulation you post does not
meet the original requirements - and that's why the rest of the
discussion ensued.
_____________________
Tom Lavedas

Evertjan.

Nov 21, 2009, 7:56:45 AM
Tom Lavedas wrote on 21 nov 2009 in microsoft.public.scripting.vbscript:
> On Nov 21, 5:04 am, "Evertjan." <exjxw.hannivo...@interxnl.net> wrote:
>> Dr J R Stockton wrote on 20 nov 2009 in

[please do not quote signatures on usenet]

> Go all the way back to Robert's OP. The formulation you post does not
> meet the original requirements - and that's why the rest of the
> discussion ensued.

I do not think so, Tom, the OP is included.

And even if it were, that is usenet,
discussion drifting is part of the fun.

Tom Lavedas

Nov 21, 2009, 9:03:59 AM

Sure, drift is what happens. I agree, it's part of the experience.
However, my point was that the pattern you posted does not return all
the four-digit sequences the OP requested. Your own commented code
says as much:

document.write(r); // 1234,4545

The request, as I (and others) read it, was for output like this ...

1234, 2345, 3456, 4567, 4545, 5454
_____________________
Tom Lavedas

Evertjan.

Nov 21, 2009, 11:10:03 AM

And again:


[please do not quote signatures on usenet]

>

> Sure, drift is what happens. I agree, it's part of the experience.
> However, my point was that the pattern you posted does not return all
> the four-digit sequences the OP requested. Your own commented code
> says as much:
>
> document.write(r); // 1234,4545
>
> The request, as I (and others) read it, was for output like this ...
>
> 1234, 2345, 3456, 4567, 4545, 5454

I see, well I did not read it that way.

In Javascript this is simple too:

<script type='text/javascript'>
var a = 'aaa1234567b45454bb'.split(/\D+/);
for (var i=0;i<a.length;i++)
for (var j=0;j<a[i].length-3;j++)
document.write(a[i].substr(j,4)+',');
</script>

writes: 1234,2345,3456,4567,4545,5454,

Csaba Gabor

Nov 21, 2009, 3:29:08 PM
As does:
<script type='text/javascript'>
document.write("aaa1234567b45454bb".
replace(/.*?(\d)(?=(\d{3}))|.+$/g, "$1$2,").replace(/,+$/,""))
</script>

The second replace merely gets rid of trailing commas.

The first replace, doing the heavy lifting, says: eat chars
until there's a digit (that's the (\d), which we'll call $1)
followed by three other digits (which we'll call $2). When
that happens, replace $1 (and the prior characters) with:
$1 followed by $2 followed by ","

Characters now continue to be eaten starting from the char
following the $1 (in the original string). The eating does
not start from after the $2 because $2 was embedded within
a forward lookahead, namely the (?=...), so that the $2
chars did not get eaten in the prior part. Finally,
that |.+$ is there because once we get to the last set of 4
contiguous digits, we want to replace the remainder of the
string with the empty string (or more precisely, a comma),
so those final characters have to get eaten somehow (and
the plus ensures there is at least one character to eat).
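
To make the "eating" visible, here is a small sketch that runs the same first replace with a callback in place of the replacement string and records what each match consumed. The variable names are illustrative; the callback returns the same text that "$1$2," would produce.

```javascript
var input = "aaa1234567b45454bb";
var consumed = [];

var joined = input.replace(/.*?(\d)(?=(\d{3}))|.+$/g, function (whole, d, rest) {
  consumed.push(whole);
  // When the |.+$ branch matches, both groups are undefined,
  // so only the comma is emitted.
  return (d || "") + (rest || "") + ",";
});

console.log(consumed.join(" | ")); // aaa1 | 2 | 3 | 4 | 567b4 | 5 | 454bb
console.log(joined);               // 1234,2345,3456,4567,4545,5454,,
```

The last consumed chunk, 454bb, is the remainder swallowed by the second alternative, which is why the intermediate string ends in two commas before the second replace trims them.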

This is a slight adaptation from the last part
of my Nov. 19 post in this same thread:
http://groups.google.com/group/microsoft.public.scripting.vbscript/browse_frm/thread/82238d66a934a547

Csaba Gabor from Vienna

Dr J R Stockton

Nov 22, 2009, 1:42:19 PM
In microsoft.public.scripting.vbscript message <Xns9CCA70B5C810Feejj99@1
94.109.133.242>, Sat, 21 Nov 2009 10:04:47, Evertjan. <exjxw.hannivoort@
interxnl.net> posted:

>Dr J R Stockton wrote on 20 nov 2009 in
>microsoft.public.scripting.vbscript:
>
>> In microsoft.public.scripting.vbscript message
>> <A3A5975E-7677-4B93-948B- C61DC3...@microsoft.com>, Thu, 19 Nov
>> 2009 05:31:01, Robert <Rob...@discussions.microsoft.com> posted:
>>>I want to get all the 4 digit numbers from the following
>>>sfsdf1234567sdfsdf where the alpha chars are entirely random.
>>>
>>>When I use the pattern [0-9]..[0-9] I get a match count of 1. If I use
>>>[0-9]... I get 2 as a match count.
>>>
>>>I want my regexp to match against 1234, 2345, 3456, 4567 and get a
>>>count of 4?
>>>
>>>Can someone tell me the correct pattern to use? (without counting on
>>>the surrounding characters).
>>
>> This works in JavaScript (Firefox 3.0.15); perhaps it can be
>> translated:
>>
>> St = "aaa1234567b45454bb" ; T = []
>> RE = /(\d\d\d\d)/gi
>
>why the i ?

Superfluously inherited from the code - now function DayCheck3 in
<linxchek.htm> - from which the code for this was derived.


>This can be done with match():
>
><script type='text/javascript'>
> var s = "aaa1234567b45454bb";
> var r = s.match(/\d{4}/g);
>
> document.write(r); // 1234,4545
></script>

Only if one does not mind getting a different result.

>translated into vbs with regEx.Execute():
>
><script type='text/vbscript'>
> s = "aaa1234567b45454bb"
> Set regEx = New RegExp
> regEx.Pattern = "\d{4}"
> regEx.Global = True
> Set Matches = regEx.Execute(s)
>
> r = ""
> For Each Match in Matches
> if r <> "" Then r = r & ","
> r = r & Match.Value
> Next
> document.write r
></script>
>
>[making quite a point of the ease of Javascript]

The OP did not want a string, AFAICS.

Rather than using two nested loops, one could use the local equivalent
of (after proper testing)

for (J=0, T = [] ; J<St.length-3 ; J++)
if ( !/\D/.test(S = St.substr(J, 4) ) ) T.push(S)

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
