
Re: [perl #84294] Is this a bug in the simple Perl regex?


demerphq

Feb 17, 2011, 3:29:53 AM
to perl5-...@perl.org, Serge, bugs-bi...@rt.perl.org
Looks like a bug to me.

Can you provide the perl -v output, and also the output when you run the
pattern under use re 'debug', please?

On 17 February 2011 09:07, Serge <perlbug-...@perl.org> wrote:
> # New Ticket Created by  Serge
> # Please include the string:  [perl #84294]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=84294 >
>
>
> Hello Perl gurus,
>
>  this regex prints $2 as undefined:
> ===
> #!perl -w
> use strict;
> #use re qw(debug);
>
> print "Match: \$1=$1 \$2=$2" if 'ab' =~
> /^((\w+)
>    (?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})
>  ){2}$
> /x;
> ===
>  Output:
> ===
> $2=ab
> $2 not defined
> $2=b
> Match: $1=b $2=b
> ===
>
>  J. Friedl and other regexp gurus wrote that after the 2nd capturing group closes,
> the special variables $2, $+ and $^N must exist with defined values, and the
> backreference \2 should be set as well. But in this example all of these are
> undefined. Is it a bug?
>  Sorry for my English.
>
> --
> Regards,
>  Serge
>
>

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Serge

Feb 17, 2011, 3:07:42 AM
to bugs-bi...@rt.perl.org

Eric Brine via RT

Feb 17, 2011, 12:35:19 PM
to perl5-...@perl.org
Replicated using v5.12.3 built for i686-linux by perlbrew

Requested re debug output:

Compiling REx "^((\w+)%n (?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not d"...
synthetic stclass "ANYOF[0-9A-Z_a-z][{unicode_all}]".
Final program:
1: BOL (2)
2: CURLYX[0] {2,2} (17)
4: OPEN1 (6)
6: OPEN2 (8)
8: PLUS (10)
9: ALNUM (0)
10: CLOSE2 (12)
12: EVAL (14)
14: CLOSE1 (16)
16: WHILEM (0)
17: NOTHING (18)
18: EOL (19)
19: END (0)
floating ""$ at 2..2147483647 (checking floating) stclass ANYOF[0-9A-Z_a-z][{unicode_all}] anchored(BOL) minlen 2 with eval
Guessing start of match in sv for REx "^((\w+)%n (?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not d"... against "ab"
Found floating substr ""$ at offset 2...
start_shift: 2 check_at: 2 s: 0 endpos: 1
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx "^((\w+)%n (?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not d"... against "ab"
0 <> <ab> | 1:BOL(2)
0 <> <ab> | 2:CURLYX[0] {2,2}(17)
0 <> <ab> | 16: WHILEM(0)
whilem: matched 0 out of 2..2
0 <> <ab> | 4: OPEN1(6)
0 <> <ab> | 6: OPEN2(8)
0 <> <ab> | 8: PLUS(10)
ALNUM can match 2 times out of 2147483647...
2 <ab> <> | 10: CLOSE2(12)
2 <ab> <> | 12: EVAL(14)
$2=ab
2 <ab> <> | 14: CLOSE1(16)
2 <ab> <> | 16: WHILEM(0)
whilem: matched 1 out of 2..2
2 <ab> <> | 4: OPEN1(6)
2 <ab> <> | 6: OPEN2(8)
2 <ab> <> | 8: PLUS(10)
ALNUM can match 0 times out of 2147483647...
failed...
failed...
1 <a> <b> | 10: CLOSE2(12)
1 <a> <b> | 12: EVAL(14)
$2 not defined
1 <a> <b> | 14: CLOSE1(16)
1 <a> <b> | 16: WHILEM(0)
whilem: matched 1 out of 2..2
1 <a> <b> | 4: OPEN1(6)
1 <a> <b> | 6: OPEN2(8)
1 <a> <b> | 8: PLUS(10)
ALNUM can match 1 times out of 2147483647...
2 <ab> <> | 10: CLOSE2(12)
2 <ab> <> | 12: EVAL(14)
$2=b
2 <ab> <> | 14: CLOSE1(16)
2 <ab> <> | 16: WHILEM(0)
whilem: matched 2 out of 2..2
2 <ab> <> | 17: NOTHING(18)
2 <ab> <> | 18: EOL(19)
2 <ab> <> | 19: END(0)
Match successful!
Match: $1=b $2=b
Freeing REx: "^((\w+)%n (?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not d"...

Serge

Feb 17, 2011, 1:26:04 PM
to Eric Brine via RT
Hello Eric Brine,

At this point I think that the output "$2 not defined" is a bug.
Here is a minimal example that reproduces the error:

'ab' =~ /((\w+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;

Output:

$2=ab
$2 not defined
$2=b

I think the output should be:

$2=ab
$2=a
$2=b

Probably the mistake comes from the localization of $2 inside the regex
($2 and the other special regex variables behave like a stack, which is
the wrong model to use inside a regex).
Immediately before the regex prints "$2 not defined", we have just closed the
second capturing parentheses, which must have captured at least one \w character.
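A self-checking variant of the minimal example may make the discrepancy easier to spot. This is a sketch, not part of the original report: instead of printing from inside (?{ ... }), it records $2 each time the code block runs, and the array name @seen and the 'undef' placeholder are illustrative choices. On a perl affected by the bug (the thread replicates it with v5.12.3) the recorded sequence should be ab,undef,b; on a perl that includes the fix discussed later in this thread it is expected to be ab,a,b.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Record $2 every time the embedded code block runs (including runs that
# happen after backtracking) instead of printing it directly.
my @seen;
'ab' =~ /((\w+)(?{ push @seen, defined $2 ? $2 : 'undef' })){2}/;

my $got      = join ',', @seen;
my $expected = 'ab,a,b';    # the sequence argued for above

print "got:      $got\n";
print "expected: $expected\n";
print $got eq $expected ? "ok\n" : "MISMATCH\n";
```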

--
Regards,
Serge


Serge

Feb 17, 2011, 3:31:20 AM
to bugs-bi...@rt.perl.org
# New Ticket Created by Serge
# Please include the string: [perl #84296]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=84296 >


Sorry, the regexp engine works correctly in this example...
===========================================================

Hello Perl gurus,

this regex prints $2 as undefined:
===
#!perl -w
use strict;
#use re qw(debug);

print "Match: \$1=$1 \$2=$2" if 'ab' =~
/^((\w+)
(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})
){2}$
/x;
===
Output:
===

$2=ab
$2 not defined
$2=b

Serge

Feb 17, 2011, 3:47:01 AM
to perlbug-...@perl.org
Sorry, the regexp engine works correctly in this example.
Please close this ticket.
===========================================================


Eric Brine via RT

Feb 18, 2011, 3:23:03 PM
to perl5-...@perl.org
If there's any doubt as to the presence of a bug, this should clear it up:

$ perl -e'"ab" =~ /((\w+)(?{print defined $^N ? "\$^N=$^N\n" : "\$^N not defined\n"})){2}/;'
$^N=ab
$^N not defined
$^N=b
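The same observation can be made side by side: the code block sits right after the closing parenthesis of group 2, so $^N (the most recently closed group) and $2 ought to agree on every run. A sketch assuming only core Perl; the @pairs array is illustrative. On a perl with the fix, both columns are expected to read ab, a, b.

```perl
use strict;
use warnings;

# At the (?{ ... }) the most recently closed group is group 2, so $^N and $2
# should be identical each time the block runs.
my @pairs;
'ab' =~ /((\w+)(?{ push @pairs, [ $^N, $2 ] })){2}/;

for my $p (@pairs) {
    my ($n, $two) = map { defined $_ ? $_ : 'undef' } @$p;
    print "\$^N=$n  \$2=$two\n";
}
```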

Serge

Feb 19, 2011, 1:07:03 AM
to Eric Brine via RT
Sorry, your example just shows the suspected bug once again...
Let's unroll the regex

'ab' =~ /((\w+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;

as

((\w+)(?{print...}))((\w+)(?{print...}))

(just as \w{2} is equivalent to \w\w), except that we assume the second copy of
the regex also sets $1 and $2 (not $3 and $4). The current position in the regex
is marked with |.

1. The first (\w+) captures the whole string:
((\w+) | (?{print...}))((\w+)(?{print...}))
$2 receives the value 'ab', and the eval prints $2=ab.

2. Then we enter the second copy of (\w+):
((\w+)(?{print...}))(( | \w+)(?{print...}))
$2 (and also $+, $^N, \2) becomes undefined.

3. \w does not match, so we backtrack:
((\w+ | )(?{print...}))((\w+)(?{print...}))
We re-enter the first copy of (\w+) from right to left, and $2 again becomes undefined.

4. (\w+) captures the letter a:
((\w+) | (?{print...}))((\w+)(?{print...}))
$2 should receive the value a, but in the current version of Perl $2 is
undefined... Why? Perhaps the two undefined values are stored for $2 as in a stack,
then the last one is popped, and $2 is still undefined?
Here the eval should print $2=a.

5. The second copy of (\w+) captures the letter b:
((\w+)(?{print...}))((\w+) | (?{print...}))
The eval prints $2=b. Match successful.

Do you see any mistake in this reasoning?
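The walkthrough can be compared with what the engine actually does by logging, on each run of the code block, both $2 and the current match position. perlre documents that inside (?{ ... }) $_ refers to the string being matched and pos() returns the current position within it. A sketch; the @trace array is illustrative. If the engine follows steps 1, 4 and 5 above, the positions should be 2, 1 and 2, with $2 set to ab, a and b respectively.

```perl
use strict;
use warnings;

my $str = 'ab';
my @trace;    # one [ $2, current position ] pair per code-block run

# pos() with no argument works on $_, which the engine aliases to $str while
# the code block runs, so it reports how far the match has got.
$str =~ /((\w+)(?{ push @trace, [ $2, pos() ] })){2}/;

for my $t (@trace) {
    my ($cap, $pos) = @$t;
    printf "pos=%d  \$2=%s\n", $pos, defined $cap ? $cap : 'undef';
}
```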

--
Regards,
Serge


Serge

Feb 19, 2011, 4:08:35 AM
to Eric Brine via RT
Sorry, it seems I was mistaken. Here is a correction to my previous reasoning.

===

Let's unroll the regex

'ab' =~ /((\w+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;

as

((\w+)(?{print...}))((\w+)(?{print...}))

(just as \w{2} is equivalent to \w\w), except that we assume the second copy of
the regex also sets $1 and $2 (not $3 and $4). The current position in the regex
is marked with |.

1. The first (\w+) captures the whole string:
((\w+) | (?{print...}))((\w+)(?{print...}))
$2 receives the value 'ab', and the eval prints $2=ab.

2. Then we enter the second copy of (\w+):
((\w+)(?{print...}))(( | \w+)(?{print...}))
$2 (and also $+, $^N, \2) becomes undefined.

3. \w does not match, so we backtrack:
((\w+ | )(?{print...}))((\w+)(?{print...}))
We re-enter the first copy of (\w+) from right to left, and $2 again becomes undefined.

4. \w+ gives back the letter b (but $2 stays undefined, because we have not moved left of the opening parenthesis for $2):
(( | \w+)(?{print...}))((\w+)(?{print...}))
$2 remains undefined.

5. (\w+) captures nothing new, because we have not moved left of the opening parenthesis for $2:
((\w+) | (?{print...}))((\w+)(?{print...}))
$2 remains undefined. The eval prints "$2 not defined".

6. The second copy of (\w+) captures the letter b:
((\w+)(?{print...}))((\w+) | (?{print...}))
The eval prints $2=b. Match successful.

===

Serge

Feb 19, 2011, 11:12:00 AM
to Eric Brine via RT
Sorry for my poor English.
After my previous email I thought about it again, and now I think that, intuitively,
$2=undefined is incorrect and $2=a is correct.
Then I received an email from the regex guru Jeffrey Friedl (www.regex.info):

===
From: jfr...@regex.info (Jeffrey Friedl)
Reply to: Jeffrey Friedl <jfr...@yahoo.com>

Serge <su...@cronc.com> wrote:
|> The regex

|> 'ab' =~ /((\w+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;

|> outputs:

|> $2=ab
|> $2 not defined
|> $2=b

|> Why is $2 not defined? I think the regex here should print $2=a. Is it a bug?

Hi Serge,
I've been thinking about this for a while, and as far as I can tell it does seem
to be a bug. By definition, $2 must be defined before the (?{...}) can run.

It's probably a problem with how it backtracks. I'd suggest filing a bug report.

Jeffrey
__________________________________________________________________________
Jeffrey Friedl Kyoto, Japan http://regex.info/blog/
===

Splitting the regex as

((\w+)(?{print...}))((\w+)(?{print...}))

is wrong; the regex is not really split.
After (\w+) captures the whole string:
(\w+)) | {2}
we see that the second repetition of \w does not match. We backtrack and enter
the second parentheses going from right to left:
(\w | )+
At this point the regex engine (as I understand it) sets $2=undefined, but why?
Intuitively, $2 should only be set to undefined once we pass back out through
the second opening parenthesis, going from right to left.

--
Regards,
Serge

demerphq

Feb 26, 2011, 2:49:57 PM
to Serge, Eric Brine via RT

Well, I didn't read it carefully, but if you are arguing that we should
do something other than what we do, then there is no debate.

This *is* a bug, probably an optimization bug in the PLUS regop. We
can see this by enabling debug and then using a construct that does
not result in a PLUS regop being generated. The first of the following
two dumps is from the original code; the second is from changing
\w+ into (?:\w|foo)+, which, because it is of variable length, will not
result in a PLUS regop, and thus produces the expected output.


demerphq@gemini:blead:~/old_git_tree/perl$ ./perl -Ilib -Mre=debug -le'"ab" =~ /((\w+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;'
Compiling REx "((\w+)(?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not defined"...
Final program:
1: CURLYX[0] {2,2} (16)
3: OPEN1 (5)
5: OPEN2 (7)
7: PLUS (9)
8: ALNUM (0)
9: CLOSE2 (11)
11: EVAL (13)
13: CLOSE1 (15)
15: WHILEM (0)
16: NOTHING (17)
17: END (0)
stclass ALNUM minlen 2 with eval
Matching REx "((\w+)(?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not defined"... against "ab"
Matching stclass ALNUM against "a" (1 bytes)
0 <> <ab> | 1:CURLYX[0] {2,2}(16)
0 <> <ab> | 15: WHILEM(0)
whilem: matched 0 out of 2..2
0 <> <ab> | 3: OPEN1(5)
0 <> <ab> | 5: OPEN2(7)
0 <> <ab> | 7: PLUS(9)
ALNUM can match 2 times out of 2147483647...
2 <ab> <> | 9: CLOSE2(11)
2 <ab> <> | 11: EVAL(13)
$2=ab

2 <ab> <> | 13: CLOSE1(15)
2 <ab> <> | 15: WHILEM(0)
whilem: matched 1 out of 2..2
2 <ab> <> | 3: OPEN1(5)
2 <ab> <> | 5: OPEN2(7)
2 <ab> <> | 7: PLUS(9)
ALNUM can match 0 times out of 2147483647...
failed...
failed...
1 <a> <b> | 9: CLOSE2(11)
1 <a> <b> | 11: EVAL(13)
$2 not defined

1 <a> <b> | 13: CLOSE1(15)
1 <a> <b> | 15: WHILEM(0)
whilem: matched 1 out of 2..2
1 <a> <b> | 3: OPEN1(5)
1 <a> <b> | 5: OPEN2(7)
1 <a> <b> | 7: PLUS(9)
ALNUM can match 1 times out of 2147483647...
2 <ab> <> | 9: CLOSE2(11)
2 <ab> <> | 11: EVAL(13)
$2=b

2 <ab> <> | 13: CLOSE1(15)
2 <ab> <> | 15: WHILEM(0)
whilem: matched 2 out of 2..2
2 <ab> <> | 16: NOTHING(17)
2 <ab> <> | 17: END(0)
Match successful!
Freeing REx: "((\w+)(?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not defined"...
demerphq@gemini:blead:~/old_git_tree/perl$ ./perl -Ilib -Mre=debug -le'"ab" =~ /(((?:\w|foo)+)(?{print defined $2 ? "\$2=$2\n" : "\$2 not defined\n"})){2}/;'
Compiling REx "(((?:\w|foo)+)(?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not"...
Final program:
1: CURLYX[0] {2,2} (24)
3: OPEN1 (5)
5: OPEN2 (7)
7: CURLYX[0] {1,32767} (16)
9: BRANCH (11)
10: ALNUM (15)
11: BRANCH (FAIL)
12: EXACT <foo> (15)
14: TAIL (15)
15: WHILEM (0)
16: NOTHING (17)
17: CLOSE2 (19)
19: EVAL (21)
21: CLOSE1 (23)
23: WHILEM (0)
24: NOTHING (25)
25: END (0)
minlen 2 with eval
Matching REx "(((?:\w|foo)+)(?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not"... against "ab"
0 <> <ab> | 1:CURLYX[0] {2,2}(24)
0 <> <ab> | 23: WHILEM(0)
whilem: matched 0 out of 2..2
0 <> <ab> | 3: OPEN1(5)
0 <> <ab> | 5: OPEN2(7)
0 <> <ab> | 7: CURLYX[0] {1,32767}(16)
0 <> <ab> | 15: WHILEM(0)
whilem: matched 0 out of 1..32767
0 <> <ab> | 9: BRANCH(11)
0 <> <ab> | 10: ALNUM(15)
1 <a> <b> | 15: WHILEM(0)
whilem: matched 1 out of 1..32767
1 <a> <b> | 9: BRANCH(11)
1 <a> <b> | 10: ALNUM(15)
2 <ab> <> | 15: WHILEM(0)
whilem: matched 2 out of 1..32767
2 <ab> <> | 9: BRANCH(11)
2 <ab> <> | 10: ALNUM(15)
failed...
2 <ab> <> | 11: BRANCH(14)
2 <ab> <> | 12: EXACT <foo>(15)
failed...
BRANCH failed...
whilem: failed, trying continuation...
2 <ab> <> | 16: NOTHING(17)
2 <ab> <> | 17: CLOSE2(19)
2 <ab> <> | 19: EVAL(21)
$2=ab

2 <ab> <> | 21: CLOSE1(23)
2 <ab> <> | 23: WHILEM(0)
whilem: matched 1 out of 2..2
2 <ab> <> | 3: OPEN1(5)
2 <ab> <> | 5: OPEN2(7)
2 <ab> <> | 7: CURLYX[0] {1,32767}(16)
2 <ab> <> | 15: WHILEM(0)
whilem: matched 0 out of 1..32767
2 <ab> <> | 9: BRANCH(11)
2 <ab> <> | 10: ALNUM(15)
failed...
2 <ab> <> | 11: BRANCH(14)
2 <ab> <> | 12: EXACT <foo>(15)
failed...
BRANCH failed...
failed...
failed...
failed...
failed...
1 <a> <b> | 11: BRANCH(14)
1 <a> <b> | 12: EXACT <foo>(15)
failed...
BRANCH failed...
whilem: failed, trying continuation...
1 <a> <b> | 16: NOTHING(17)
1 <a> <b> | 17: CLOSE2(19)
1 <a> <b> | 19: EVAL(21)
$2=a

1 <a> <b> | 21: CLOSE1(23)
1 <a> <b> | 23: WHILEM(0)
whilem: matched 1 out of 2..2
1 <a> <b> | 3: OPEN1(5)
1 <a> <b> | 5: OPEN2(7)
1 <a> <b> | 7: CURLYX[0] {1,32767}(16)
1 <a> <b> | 15: WHILEM(0)
whilem: matched 0 out of 1..32767
1 <a> <b> | 9: BRANCH(11)
1 <a> <b> | 10: ALNUM(15)
2 <ab> <> | 15: WHILEM(0)
whilem: matched 1 out of 1..32767
2 <ab> <> | 9: BRANCH(11)
2 <ab> <> | 10: ALNUM(15)
failed...
2 <ab> <> | 11: BRANCH(14)
2 <ab> <> | 12: EXACT <foo>(15)
failed...
BRANCH failed...
whilem: failed, trying continuation...
2 <ab> <> | 16: NOTHING(17)
2 <ab> <> | 17: CLOSE2(19)
2 <ab> <> | 19: EVAL(21)
$2=b

2 <ab> <> | 21: CLOSE1(23)
2 <ab> <> | 23: WHILEM(0)
whilem: matched 2 out of 2..2
2 <ab> <> | 24: NOTHING(25)
2 <ab> <> | 25: END(0)
Match successful!
Freeing REx: "(((?:\w|foo)+)(?{print defined $2 ? %"\$2=$2\n%" : %"\$2 not"...
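The comparison made with these two dumps can be reduced to a short script that runs both spellings of the loop body and prints what each code block saw. A sketch assuming only core Perl; the array names @plus and @curlyx are illustrative, and the comments about which regops get generated describe the dumps above rather than any particular perl version. On an affected perl the \w+ line is expected to show undef in the middle while the (?:\w|foo)+ line shows ab a b; with the fix in place both lines should match.

```perl
use strict;
use warnings;

my (@plus, @curlyx);

# \w+ was compiled to a PLUS regop in the first dump ...
'ab' =~ /((\w+)(?{ push @plus, defined $2 ? $2 : 'undef' })){2}/;

# ... while (?:\w|foo)+ became a general CURLYX/WHILEM loop in the second.
'ab' =~ /(((?:\w|foo)+)(?{ push @curlyx, defined $2 ? $2 : 'undef' })){2}/;

print "\\w+         : @plus\n";
print "(?:\\w|foo)+ : @curlyx\n";
print "@plus" eq "@curlyx" ? "same\n" : "different\n";
```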

demerphq

Mar 1, 2011, 6:58:19 AM
to Serge, Eric Brine via RT
FWIW I did some debugging on this over the weekend.

It definitely is a bug, possibly something in the state logic, but I
haven't quite worked it out.

Essentially what happens is that when we enter the CLOSE regop after
having advanced the pointer we end up with the start of the capture
buffer being _after_ the end of the capture buffer. IOW, when we
rewind and restart the match at position 1 the buffer still thinks the
start point is position 2 which is where the plus was attempted on the
second pass through the outer loop.
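The capture-buffer boundaries described here can also be watched from Perl code rather than from gdb, by recording $-[2] and $+[2] (the start and end offsets of group 2) inside the code block. A sketch, assuming that @- and @+ reflect the in-progress match inside (?{ ... }) the same way $2 does; the @spans array is illustrative. On a fixed perl the spans are expected to be 0..2, 0..1 and 1..2. On an affected perl the middle run is where the start-after-end state would appear, but exactly what @- and @+ report for that broken state is not something this sketch relies on.

```perl
use strict;
use warnings;

my @spans;    # one [ start, end ] pair for group 2 per code-block run

# $-[2] and $+[2] are the start and end offsets of capture group 2.
'ab' =~ /((\w+)(?{ push @spans, [ $-[2], $+[2] ] })){2}/;

for my $s (@spans) {
    my ($start, $end) = map { defined $_ ? $_ : 'undef' } @$s;
    print "group 2: start=$start end=$end\n";
}
```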

I haven't worked out a fix, as I spent more time figuring out gdb than
I did on debugging the actual problem. No doubt there is some easier-to-learn
GUI wrapper around gdb that makes things easier, but right
now I'll say that the MS VC debugger is about a gazillion times more
useful than gdb in terms of getting stuff done with minimal-to-no
training. Talk about vim/emacs-style learning curves. :-(

cheers,
Yves

yves orton via RT

Mar 12, 2011, 12:00:23 PM
to perl5-...@perl.org, da...@iabyn.com
I just pushed to blead the following commit, which adds a TODO test for this:

d774cd11ba563c66e3199abfc3061bdc88e980e0 Add tests for RT #84294 /((\w+)(?{print $2})){2,2}/ problem

And then another to fix it:

92e82afa16f5f1aa1b3e163f6d4656d14c44a4d2 Fix RT #84294 /((\w+)(?{print $2})){2,2}/ problem

Dave can you please review the fix? The regex state machine is your
baby, and I'm not entirely confident I'm not creating a memory leak or
something equivalently strange.

Dave Mitchell

Mar 13, 2011, 10:14:28 AM
to yves orton via RT, perl5-...@perl.org

Yeah it looks good. It doesn't leak memory, although it does use more
memory now. Previously, the pattern /A{100,120}/, after doing 120 matches,
would have pushed 100 WHILEM_A_pre's and 20 WHILEM_B_max's onto the
backtrack stack, but only 20 sets of parentheses onto the savestack; now
it pushes 120 sets. So basically you've removed an optimisation that
was logically incorrect.

--
Spock (or Data) is fired from his high-ranking position for not being able
to understand the most basic nuances of about one in three sentences that
anyone says to him.
-- Things That Never Happen in "Star Trek" #19
