If I use the uncommented if-statement, I get no match. If I use the
commented-out if-statement instead, I get the following string as the
output. I'm wondering why the regular expression with * does not
match anything?
namespace a { namespace b { namespace c {
Thanks,
Peng
$string="a namespace a { namespace b { namespace c { ";
#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}
Not true. $1 is defined, so the regex does match.
> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> print "$1\$\n";
> }
With the * quantifier, the regex seems to behave non-greedily, though.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
It does match, it just doesn't match what you expected it to match.
> namespace a { namespace b { namespace c {
>
> Thanks,
> Peng
>
> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> print "$1\$\n";
> }
$ perl -e'
use re qw/ debug /;
my $string = "a namespace a { namespace b { namespace c { ";
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}
'
Compiling REx `\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)'
size 40 Got 324 bytes for offset annotations.
first at 1
1: STAR(3)
2: SPACE(0)
3: OPEN1(5)
5: CURLYX[0] {0,32767}(37)
7: OPEN2(9)
9: EXACT <namespace>(13)
13: PLUS(15)
14: SPACE(0)
15: ALNUM(16)
16: CURLYM[3] {0,32767}(28)
20: BRANCH(22)
21: ALNUM(26)
22: BRANCH(24)
23: DIGIT(26)
26: SUCCEED(0)
27: NOTHING(28)
28: STAR(30)
29: SPACE(0)
30: EXACT <{>(32)
32: STAR(34)
33: SPACE(0)
34: CLOSE2(36)
36: WHILEM[1/2](0)
37: NOTHING(38)
38: CLOSE1(40)
40: END(0)
minlen 0
Offsets: [40]
3[1] 1[2] 4[1] 0[0] 37[1] 0[0] 5[1] 0[0] 6[9] 0[0] 0[0] 0[0]
17[1] 15[2] 18[2] 27[1] 0[0] 20[1] 0[0] 20[1] 21[2] 23[1] 24[2] 26[1]
0[0] 27[0] 27[0] 30[1] 28[2] 31[2] 0[0] 35[1] 33[2] 36[1] 0[0] 37[0]
37[0] 38[1] 0[0] 39[0]
Matching REx "\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)" against "a namespace a { namespace b { namespace c { "
Setting an EVAL scope, savestack=5
0 <> <a namespace > | 1: STAR
SPACE can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=5
0 <> <a namespace > | 3: OPEN1
0 <> <a namespace > | 5: CURLYX[0] {0,32767}
0 <> <a namespace > | 36: WHILEM[1/2]
0 out of 0..32767 cc=bfa0d330
Setting an EVAL scope, savestack=15
0 <> <a namespace > | 7: OPEN2
0 <> <a namespace > | 9: EXACT <namespace>
failed...
restoring \1 to -1(0)..-1(no)
restoring \1..\3 to undef
failed, try continuation...
0 <> <a namespace > | 37: NOTHING
0 <> <a namespace > | 38: CLOSE1
0 <> <a namespace > | 40: END
Match successful!
$
Freeing REx: `"\\s*((namespace\\s+\\w(\\w|\\d)*\\s*\\{\\s*)*)"'
See where it says "Match successful!"? That means the match succeeded
with the expression (namespace\s+\w(\w|\d)*\s*\{\s*)* matching zero times.
Also, the expression \w(\w|\d)* could be simplified to \w+.
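
To see the zero-length match without reading the whole debug trace, a
minimal sketch like the one below may help. It assumes the same test
string as the original post, uses the \w+ simplification suggested above,
and reads the match offsets from the built-in @- and @+ arrays. The
variable names are only for illustration.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# Same pattern as in the original post, with \w(\w|\d)* shortened to \w+.
my ($start, $length, $captured);
if ($string =~ /\s*((namespace\s+\w+\s*\{\s*)*)/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    ($start, $length, $captured) = ($-[0], $+[0] - $-[0], $1);
    print "matched at offset $start, length $length, \$1 = '$captured'\n";
}
# prints: matched at offset 0, length 0, $1 = ''
```

So $1 is defined but empty: the match succeeded at offset 0 without
consuming anything.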
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
According to the manual, *? is non-greedy.
Why is * also non-greedy?
Thanks,
Peng
I don't know, sorry. Maybe the answer can be derived from John's more
extensive explanation.
'Match earlier in the string' beats 'match longest', even with greedy
matching, and since your regex will match the empty string the first
match is right before the first 'a'.
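
A small sketch of that rule, using a made-up string rather than the
namespace example: a* is allowed to match nothing, so it succeeds at
offset 0, while a+ cannot succeed until the engine reaches the first 'a'.

```perl
use strict;
use warnings;

my $text = "xxaaa";

# a* may match zero characters, so it succeeds at offset 0 with an empty match.
$text =~ /(a*)/ or die "unexpected: no match";
my ($star_pos, $star_match) = ($-[0], $1);

# a+ needs at least one 'a', so the engine has to move on to offset 2.
$text =~ /(a+)/ or die "unexpected: no match";
my ($plus_pos, $plus_match) = ($-[0], $1);

print "a* : offset $star_pos, matched '$star_match'\n";   # offset 0, ''
print "a+ : offset $plus_pos, matched '$plus_match'\n";   # offset 2, 'aaa'
```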
Ben
--
You poor take courage, you rich take care:
The Earth was made a common treasury for everyone to share
All things in common, all people one.
'We come in peace'---the order came to cut them down. [b...@morrow.me.uk]
Greediness is not involved here.
(Greedy vs. non-greedy never changes whether a match will succeed or fail.
It is simply a "tie breaker" used when the regex engine can match more
than one way at the current pos()ition.)
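
As a rough illustration of the "tie breaker" idea (a toy string, not the
OP's data): both patterns below are anchored at the same position and
both succeed. The only difference is how much each one takes.

```perl
use strict;
use warnings;

my $text = "aaa";

# In list context a successful match returns its captures.
# Both patterns match at offset 0; only the amount consumed differs.
my ($greedy) = $text =~ /^(a*)/;    # takes as much as possible
my ($lazy)   = $text =~ /^(a*?)/;   # takes as little as possible

print "greedy: '$greedy'\n";   # greedy: 'aaa'
print "lazy:   '$lazy'\n";     # lazy:   ''
```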
There are two primary issues behind the OP's problem: the pattern makes
everything optional, and regexes match as early as possible, scanning
from left to right.
If you write a pattern where everything is optional, then it will match
the empty string, which in turn means that it will match *every* string
you can think of.
The left-to-right evaluation of the pattern seems to be buried
a bit in perlre.pod:
The above recipes describe the ordering of matches I<at a given position>.
One more rule is needed to understand how a match is determined for the
whole regular expression: a match at an earlier position is always better
than a match at a later position.
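
One way to act on both points, sketched under my own assumptions about
what the OP wants (the whole run of namespace openers in $1): require at
least one repetition with +, drop the leading \s* since it adds nothing
when unanchored, and use a non-capturing (?:...) group for the repeated
part. This is an illustration, not the only possible fix.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

my ($start, $captured);
# + demands at least one "namespace NAME {", so the empty match at
# offset 0 is no longer possible and the engine moves right.
if ($string =~ /((?:namespace\s+\w+\s*\{\s*)+)/) {
    ($start, $captured) = ($-[1], $1);
    print "offset $start, \$1 = '$captured'\n";
}
# prints: offset 2, $1 = 'namespace a { namespace b { namespace c { '
```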
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
I still prefer to think of this as another
aspect of greediness: * can be greedy
but only as greedy as needed to get the
earliest match. Thus, even greed embraces the cardinal Perl virtue of
laziness....
--
Charles DeRykus
clpm> I still prefer to think of this as another aspect of greediness: *
clpm> can be greedy but only as greedy as needed to get the earliest
clpm> match. Thus, even greed embraces the cardinal Perl virtue of
clpm> laziness....
I'd call that opportunism, not laziness.
"The two cardinal virtues of Perl are TMTOWTDI and laziness and
opportunism... No, no. The THREE cardinal virtues of Perl are TMTOWTDI
and laziness and opportunism and DWIM... DAMN IT... The FOUR cardinal
virtues of Perl are... etc."
Ted
It depends on what you mean. "Greedy" in CS generally means you make
locally optimal decisions, rather than looking for globally optimal ones.
But what is considered "optimal" in the local matching of a regex?
In this sense, it is greedy either way, in that it still optimizes locally
rather than globally. It is just that what we consider optimal changes
with the addition of ?.
At this point, perhaps they revert from a CS meaning to a moral/political
meaning--greedy no longer means local vs. global, now it means as much as
possible vs. as little as possible.
Xho
I changed it to
if ($string =~ /\s*(namespace\s+\w(\w|\d)*\s*\{\s*)/) {
print "$1\$\n";
}
and it seems to work okay.
What exactly are you trying to do?
-Mike