Difference of * and + in regular expression

Peng Yu

Jun 21, 2008, 10:04:04 PM
Hi,

If I use the uncommented if statement, I get no match. If I use the
commented-out if statement instead, I get the following string as the
output. I'm wondering why the regular expression with * does not match
anything.

namespace a { namespace b { namespace c {

Thanks,
Peng

$string="a namespace a { namespace b { namespace c { ";

#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}

Gunnar Hjalmarsson

Jun 21, 2008, 10:39:09 PM
Peng Yu wrote:
> If I used the uncommented if-statement, I would get no match.

Not true. $1 is defined, so the regex does match.

> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> print "$1\$\n";
> }

With the * quantifier, the regex seems to behave non-greedy, though.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

John W. Krahn

Jun 21, 2008, 10:40:07 PM
Peng Yu wrote:
> Hi,
>
> If I use the uncommented if statement, I get no match. If I use the
> commented-out if statement instead, I get the following string as the
> output. I'm wondering why the regular expression with * does not match
> anything.

It does match, it just doesn't match what you expected it to match.

> namespace a { namespace b { namespace c {
>
> Thanks,
> Peng
>
> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> print "$1\$\n";
> }

$ perl -e'
use re qw/ debug /;

my $string = "a namespace a { namespace b { namespace c { ";

if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}

'
Compiling REx `\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)'
size 40 Got 324 bytes for offset annotations.
first at 1
1: STAR(3)
2: SPACE(0)
3: OPEN1(5)
5: CURLYX[0] {0,32767}(37)
7: OPEN2(9)
9: EXACT <namespace>(13)
13: PLUS(15)
14: SPACE(0)
15: ALNUM(16)
16: CURLYM[3] {0,32767}(28)
20: BRANCH(22)
21: ALNUM(26)
22: BRANCH(24)
23: DIGIT(26)
26: SUCCEED(0)
27: NOTHING(28)
28: STAR(30)
29: SPACE(0)
30: EXACT <{>(32)
32: STAR(34)
33: SPACE(0)
34: CLOSE2(36)
36: WHILEM[1/2](0)
37: NOTHING(38)
38: CLOSE1(40)
40: END(0)
minlen 0
Offsets: [40]
3[1] 1[2] 4[1] 0[0] 37[1] 0[0] 5[1] 0[0] 6[9] 0[0] 0[0] 0[0]
17[1] 15[2] 18[2] 27[1] 0[0] 20[1] 0[0] 20[1] 21[2] 23[1] 24[2] 26[1]
0[0] 27[0] 27[0] 30[1] 28[2] 31[2] 0[0] 35[1] 33[2] 36[1] 0[0] 37[0]
37[0] 38[1] 0[0] 39[0]
Matching REx "\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)" against "a namespace a { namespace b { namespace c { "

Setting an EVAL scope, savestack=5
0 <> <a namespace > | 1: STAR
SPACE can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=5
0 <> <a namespace > | 3: OPEN1
0 <> <a namespace > | 5: CURLYX[0] {0,32767}
0 <> <a namespace > | 36: WHILEM[1/2]
0 out of 0..32767 cc=bfa0d330
Setting an EVAL scope, savestack=15
0 <> <a namespace > | 7: OPEN2
0 <> <a namespace > | 9: EXACT <namespace>
failed...
restoring \1 to -1(0)..-1(no)
restoring \1..\3 to undef
failed, try continuation...
0 <> <a namespace > | 37: NOTHING
0 <> <a namespace > | 38: CLOSE1
0 <> <a namespace > | 40: END
Match successful!
$
Freeing REx: `"\\s*((namespace\\s+\\w(\\w|\\d)*\\s*\\{\\s*)*)"'


Note the line that says "Match successful!". The match succeeded at
position 0 because the expression (namespace\s+\w(\w|\d)*\s*\{\s*)*
matched zero times there, so $1 is the empty string.

Also, the expression \w(\w|\d)* could be simplified to \w+.
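
The simplification works because \d is a subset of \w, so the alternation adds nothing. A minimal sketch that compares the two forms follows; the sample strings are hypothetical and chosen only for illustration, and the inner group is written as non-capturing to keep the comparison simple.

```perl
use strict;
use warnings;

# \d is a subset of \w, so both patterns should accept the same strings.
my $long  = qr/\A\w(?:\w|\d)*\z/;
my $short = qr/\A\w+\z/;

# Hypothetical sample strings, not taken from the original post.
my @samples = ('a', 'ns1', 'foo_bar2', '', 'a b', '9lives', '{');

my $disagreements = 0;
for my $s (@samples) {
    my $l = $s =~ $long  ? 1 : 0;
    my $r = $s =~ $short ? 1 : 0;
    $disagreements++ if $l != $r;
    printf "%-12s long=%d short=%d\n", "'$s'", $l, $r;
}
print "disagreements: $disagreements\n";
```

A handful of samples is not a proof, but it is a quick sanity check before swapping one form for the other.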


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Peng Yu

Jun 22, 2008, 12:21:54 AM
On Jun 21, 9:39 pm, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> Peng Yu wrote:
> > If I used the uncommented if-statement, I would get no match.
>
> Not true. $1 is defined, so the regex does match.
>
> > $string="a namespace a { namespace b { namespace c { ";
>
> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> > print "$1\$\n";
> > }
>
> With the * quantifier, the regex seems to behave non-greedy, though.

According to the manual, *? is non-greedy.
Why is * also non-greedy?

Thanks,
Peng

Gunnar Hjalmarsson

Jun 22, 2008, 12:46:49 AM

I don't know, sorry. Maybe the answer can be derived from John's more
extensive explanation.

Ben Morrow

Jun 21, 2008, 11:04:57 PM

Quoth Peng Yu <Peng...@gmail.com>:

>
> If I use the uncommented if statement, I get no match. If I use the
> commented-out if statement instead, I get the following string as the
> output. I'm wondering why the regular expression with * does not match
> anything.
>
> namespace a { namespace b { namespace c {
>
> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {

'Match earlier in the string' beats 'match longest', even with greedy
matching, and since your regex will match the empty string the first
match is right before the first 'a'.
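
One way to see this is to print where each match starts. The sketch below is only an illustration: it uses the string from the original post, writes the identifier as \w+ and the inner group as non-capturing, and reads the match offsets from the built-in @- and @+ arrays.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# The two patterns from the original post, lightly simplified.
my $star = qr/\s*((?:namespace\s+\w+\s*\{\s*)*)/;
my $plus = qr/\s*((?:namespace\s+\w+\s*\{\s*)+)/;

my @report;
for my $case (['*', $star], ['+', $plus]) {
    my ($name, $re) = @$case;
    if ($string =~ $re) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        my ($start, $end, $capture) = ($-[0], $+[0], $1);
        push @report, [$name, $start, $end - $start, $capture];
        printf "%s: start=%d length=%d capture='%s'\n",
            $name, $start, $end - $start, $capture;
    }
}
```

With *, the match starts at offset 0 and is empty. With +, offset 0 cannot match, so the engine moves on and the match starts at the space before the first "namespace".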

Ben

--
You poor take courage, you rich take care:
The Earth was made a common treasury for everyone to share
All things in common, all people one.
'We come in peace'---the order came to cut them down. [b...@morrow.me.uk]

Tad J McClellan

Jun 22, 2008, 11:00:28 AM


Greediness is not involved here.

(Greedy vs. non-greedy never changes whether a match will succeed or fail.
It is simply a "tie breaker" used when the regex engine can match more
than one way at the current pos()ition.)

There are 2 primary issues with this OP's problem: writing a pattern
where everything is optional, and that regexes match as early as possible
from left to right.

If you write a pattern where everything is optional, then it will match
the empty string, which in turn means that it would match *every* string
you can think of.

The left-to-right evaluation of the pattern seems to be buried
a bit in perlre.pod:

The above recipes describe the ordering of matches I<at a given position>.
One more rule is needed to understand how a match is determined for the
whole regular expression: a match at an earlier position is always better
than a match at a later position.
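
A small sketch of the "everything is optional" problem, under the assumption that the goal is to detect namespace openers: the all-optional pattern succeeds on every input, so the success of the match says nothing by itself. The input strings here are hypothetical, and checking the length of $1 is just one possible way to get a meaningful answer.

```perl
use strict;
use warnings;

# Every piece of this pattern may match nothing at all.
my $optional = qr/\s*((?:namespace\s+\w+\s*\{\s*)*)/;

# Hypothetical inputs, most of them unrelated to namespaces.
my @inputs = ('', 'hello world', '12345', 'namespace a { ');

# The pattern succeeds on all of them, since it can match the empty string.
my @matched = grep { $_ =~ $optional } @inputs;
printf "%d of %d inputs matched\n", scalar @matched, scalar @inputs;

# Checking that the capture is non-empty separates the real hits.
my @found;
for my $in (@inputs) {
    my $hit = ($in =~ $optional && length $1) ? 1 : 0;
    push @found, $hit;
    printf "%-18s non-empty capture: %s\n", "'$in'", $hit ? 'yes' : 'no';
}
```

Using + instead of * on the outer group has a similar effect, because it forces at least one "namespace ... {" to be present before the match can succeed.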


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

comp.llang.perl.moderated

Jun 22, 2008, 11:41:02 PM
On Jun 22, 8:00 am, Tad J McClellan <ta...@seesig.invalid> wrote:

I still prefer to think of this as another aspect of greediness: * can
be greedy but only as greedy as needed to get the earliest match. Thus,
even greed embraces the cardinal Perl virtue of laziness....

--
Charles DeRykus

Ted Zlatanov

Jun 23, 2008, 12:26:47 PM
On Sun, 22 Jun 2008 20:41:02 -0700 (PDT) "comp.llang.perl.moderated" <c...@blv-sam-01.ca.boeing.com> wrote:

clpm> I still prefer to think of this as another aspect of greediness: *
clpm> can be greedy but only as greedy as needed to get the earliest
clpm> match. Thus, even greed embraces the cardinal Perl virtue of
clpm> laziness....

I'd call that opportunism, not laziness.

"The two cardinal virtues of Perl are TMTOWTDI and laziness and
opportunism... No, no. The THREE cardinal virtues of Perl are TMTOWTDI
and laziness and opportunism and DWIM... DAMN IT... The FOUR cardinal
virtues of Perl are... etc."

Ted

xho...@gmail.com

Jun 23, 2008, 3:18:22 PM

It depends on what you mean. "Greedy" in CS generally means you make
locally optimal decisions, rather than looking for globally optimal ones.
But what is considered "optimal" in the local matching of a regex?

In this sense, it is greedy either way, in that it still optimizes locally
rather than globally. It is just that what we consider optimal changes
with the addition of ?.

At this point, perhaps they revert from a CS meaning to a moral/political
meaning--greedy no longer means local vs. global, now it means as much as
possible vs. as little as possible.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

MSwanberg

Jun 25, 2008, 3:49:31 PM

I changed it to

if ($string =~ /\s*(namespace\s+\w(\w|\d)*\s*\{\s*)/) {
print "$1\$\n";
}

and it seems to work okay.

What exactly are you trying to do?

-Mike
