
Perl RE bug with keys(%+)


Clint O

Nov 23, 2009, 4:51:33 PM
Maybe this is a bug, maybe not. I am using the named capture buffers
to reduce bugs as I change grouping of my regular expressions over
time. In a lexical analysis application, I'm using them across a series
of alternations.

my $re = qr/ (?<ALT1>pattern) | (?<ALT2>pattern) | ...

One of the alternations happens to be nested:

my $foo = qr{
    (?<CODEBEGIN>
        \{
        (?<CODE>
            (?:
                (?> [^{}\n]+ )   # Non-parens without backtracking
              |
                (?&CODEBEGIN)    # Recurse to start of pattern
            )*
        )
        \}
    )
}x;

However, when I ask for the keys of %+, I only get back CODEBEGIN yet
the CODE capture is there when I ask for it. My hope was to use the
keys to determine what I matched so I didn't have to do a series of
tests on %+, but apparently I will have to continue doing this since
this method won't work.

This is Perl 5.10.0.
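
For a flat alternation, the approach described above (asking keys(%+) which name matched) can be sketched as follows. This is a minimal illustration, not the original lexer: the token names NUM, WORD and PUNCT and the helper token_type are hypothetical. It relies on the documented behavior that %+ only lists names whose groups actually captured.

```perl
use strict;
use warnings;

# Hypothetical token names; the real lexer's alternatives are not shown in the post.
my $token = qr/ (?<NUM> \d+ ) | (?<WORD> [A-Za-z_]\w* ) | (?<PUNCT> [;,] ) /x;

# Return the name of the alternative that matched, or undef when nothing matched.
sub token_type {
    my ($text) = @_;
    return unless $text =~ $token;
    # Exactly one named group can capture in a flat alternation,
    # so %+ holds a single key here.
    my ($name) = keys %+;
    return $name;
}

print token_type('42'),  "\n";
print token_type('foo'), "\n";
print token_type(';'),   "\n";
```

With nested or recursive groups, as in the pattern above, more than one name can be set at once, so taking the first key is no longer enough to identify the alternative.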

Thanks,

-Clint

s...@netherlands.com

Nov 25, 2009, 4:27:10 PM
On Mon, 23 Nov 2009 13:51:33 -0800 (PST), Clint O <clint...@gmail.com> wrote:

>Maybe this is a bug, maybe not. I am using the named capture buffers
>to reduce bugs as I change grouping of my regular expressions over
>time. In a lexical analysis application, I'm using it over a series
>of alternations.
>
>my $re = qr/ (?<ALT1>pattern) | (?<ALT2>pattern) | ...
>
>One of the alternations happens to be nested:
>
>my $foo = qr{
> (?<CODEBEGIN>
> \{
> (?<CODE>
> (?:
> (?> [^{}\n]+ ) # Non-parens without

The "\n" in this character class is a problem: a newline inside the
block is never consumed, so the most likely result is a non-match.
The group can also be written more efficiently as [^{}]++

>backtracking
> |
> (?&CODEBEGIN) # Recurse to start of
>pattern
> )*
> )
> \}
> )
> }x;
>
>However, when I ask for the keys of %+, I only get back CODEBEGIN yet
>the CODE capture is there when I ask for it. My hope was to use the
>keys to determine what I matched so I didn't have to do a series of
>tests on %+, but apparently I will have to continue doing this since
>this method won't work.
>
>This is Perl 5.10.0.
>
>Thanks,
>
>-Clint

You are right, it probably is a bug. However, %+ seems to be private
within recursion the way you have it, because according to the docs
CODEBEGIN can't know about CODE and vice versa.

That $+{CODE} can be tested and contains a value outside of CODEBEGIN
is a mystery and worrisome. You can of course maintain your own private
hash to store results.

The script below shows this behavior in more detail. Let me know if you
find a satisfactory answer to this.

-sln
---------
use strict;
use warnings;
use Devel::Peek;
use Data::Dumper;

my %CodeAll = ();
my $container = '';

my $string = " func { subfunc { some {code }; more code } {last block}";

my $foo = qr/
    (?<CODEBEGIN>
        \{
        (?<CODE>
            (?:
                [^{}]++          # Non-parens without backtracking
              |
                (?&CODEBEGIN)    # Recurse to start of pattern
            )*
        )
        (?{ print " * ",Dumper(\%+);
            $container = $+{CODE};
        })
        \}
    )
    (?{ print ">>* ",Dumper(\%+);
        $CodeAll{CODEBEGIN} = $+{CODEBEGIN};
        $CodeAll{CODE} = $+{CODE};
    })
/x;

print "______________________\n\n";

while ($string =~ /$foo/g)
{
    print "\n\n====================\n";
    Dump \%+;
    print "\n( \%+ )\n",Dumper(\%+);
    print "( \%CodeAll )\n",Dumper(\%CodeAll),"\n";
    print "______________________\n\n";
}
__END__
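
The private-hash suggestion above can also be sketched without code blocks inside the regex. This is only an illustration under assumptions: the helper captures_for and the hand-written @names list are hypothetical, and each name is probed with defined $+{...} so the result does not depend on what keys(%+) returns.

```perl
use strict;
use warnings;

my $block = qr/
    (?<CODEBEGIN>
        \{
        (?<CODE>
            (?:
                [^{}]++          # run of non-brace characters, no backtracking
              |
                (?&CODEBEGIN)    # recurse into a nested block
            )*
        )
        \}
    )
/x;

# Names are listed by hand (an assumption of this sketch) instead of
# being read back from keys(%+).
my @names = qw(CODEBEGIN CODE);

# Return a hash ref of the named captures that are defined, or nothing on no match.
sub captures_for {
    my ($text) = @_;
    return unless $text =~ $block;
    my %found;
    for my $name (@names) {
        $found{$name} = $+{$name} if defined $+{$name};
    }
    return \%found;
}

my $found = captures_for('func { a { b } c }');
print "$_ => [$found->{$_}]\n" for sort keys %$found;
```

The cost is that the list of names has to be kept in sync with the pattern by hand.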

Clint O

Nov 30, 2009, 2:08:29 PM
On Nov 25, 1:27 pm, s...@netherlands.com wrote:
> This is not good here, "\n" is never consumed and most likely
> the result is a non-match.
> This can also be written more effectively as   [^{}]++

Yes, I ended up simplifying my life and using this before I saw your
post:

my $code = qr{
    (?<CODEBEGIN>
        \{
        (?<CODE>
            (?:
                (?> [^{}]+ )     # Non-curly without backtracking
              |
                (?&CODEBEGIN)    # Recurse to start of pattern
            )*
        )
        \}
    )
}x;

Then I go back and split the token on '\\\n' to weed out the escaped
newlines. My hope was to avoid re-scanning any string, but the RE and
concatenation rules just became unmanageable at some point and I
decided to cut my losses. I'm not familiar with '++', but I will
look it up as an alternative to using (?> ). So far you are the
only person that has responded to this post, so I'm not hopeful that
I'll get a satisfactory answer from anyone as to what's happening
here.

Thanks,

-Clint
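
For reference, a small sketch comparing the spellings discussed in this thread. It assumes Perl 5.10 or later, and the variable names and the helper first_match are made up for illustration. It shows the atomic group (?> [^{}]+ ) and the possessive [^{}]++ matching the same text, and the variant that excludes "\n" failing on a block that spans two lines.

```perl
use strict;
use warnings;

my $atomic     = qr/ \{ (?> [^{}]+ ) \} /x;   # atomic group
my $possessive = qr/ \{ [^{}]++ \} /x;        # possessive quantifier
my $no_newline = qr/ \{ [^{}\n]++ \} /x;      # also excludes newlines

# Return the text matched by $re in $text, or undef when there is no match.
sub first_match {
    my ($re, $text) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

my $two_line = "{ x = 1;\n  y = 2; }";

for my $case ( [ atomic => $atomic ], [ possessive => $possessive ], [ no_newline => $no_newline ] ) {
    my ($label, $re) = @$case;
    my $got = first_match($re, $two_line);
    print "$label: ", ( defined $got ? "matched" : "no match" ), "\n";
}
```

The third pattern fails because the newline is matched neither by the character class nor by the closing brace, and the possessive quantifier does not give anything back.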
