Basically I'm writing a sub that wants to take a regular
expression as a parameter. It then blindly operates on data,
matching and possibly substituting.
Apparently qr// only works on the matching side, something like this:
# works
$rx = qr/\Q$sometext\E/s;
$data =~ /$rx/;
# or $data =~ $rx;
But this:
# does not work, no way no how
$rx = qr{s/\Q$sometext\E/junk/g};
$data =~ $rx;
Even though qr{s/\Q$sometext\E/junk/g} compiles without warnings or errors,
and even though the replacement is constant (i.e., no runtime $1, $2, etc.),
it never matches.
I could see a failure scenario when using $1 and friends on the replacement side,
because they are not yet defined at that point, but if it's given a constant it should work IMO.
And if it does compile, as the above does, it should work.
The fallback is to use an eval "" where something like this is possible:
$rx = "s/\\Q$sometext\\E(.*?)/junk\$1/g";
$expression = "\$res = \$data =~ $rx";
eval $expression;
if ($res) {
...
}
But eval is 2 to 4 times slower.
The only thing "dynamic" about the regular expression above is the
substitution of $1 and friends. Surely this could be taken into account when, say, using
the qr// construct, couldn't it? Is it really breaking the rules, or would it
factor down to an eval anyway in that case? But the constant substitution,
I don't see why that can't work.
Is there any way the substitution side will work?
TIA,
sln
The matching is done by the // operator. Not because you happened to use
qr// a bit earlier.
> But this:
>
> # does not work, no way no how
> $rx = qr{s/\Q$sometext\E/junk/g};
> $data =~ $rx;
A bare regex is simply not going to work on the right hand side of a =~
operator. It's the operator on the right hand side that does the
matching, not the =~ operator itself. That only binds an expression
instead of $_ to that matching operator.
More detail:
From perlop:
Binary "=~" binds a scalar expression to a pattern match. Certain
operations search or modify the string $_ by default. This operator
makes that kind of operation work on some other string. The right
argument is a search pattern, substitution, or transliteration.
Note that 'pattern' or 'regular expression' are not part of the allowed
right arguments.
Further down in the same document, under "Quote and Quote-like
Operators":
Customary  Generic   Meaning          Interpolates
''         q{}       Literal          no
""         qq{}      Literal          yes
``         qx{}      Command          yes*
           qw{}      Word list        no
//         m{}       Pattern match    yes*
           qr{}      Pattern          yes*
           s{}{}     Substitution     yes*
           tr{}{}    Transliteration  no (but see below)
<<EOF                here-doc         yes*
And a little further down again:
Regexp Quote-Like Operators
Here are the quote-like operators that apply to pattern matching and
related activities.
[snip]
Martien
--
|
Martien Verbruggen | Computers in the future may weigh no more
| than 1.5 tons. -- Popular Mechanics, 1949
|
> Basically I'm writing a sub that wants to take a regular
> expression as a parameter. It then blindly operates on data,
> matching, and posible substitution.
>
> Apparently qr// will only function on the matching side, something like this:
"qr" stands for "quote regular expression" and the so called
"matching side" of s/// is the part that is a regular expression.
qr will work fine there.
(the other "side" is the "replacement string", ie. it is not
a regular expression at all.)
> # does not work, no way no how
Of course not. You are trying to quote something that is not
a regular expression.
> $rx = qr{s/\Q$sometext\E/junk/g};
That regular expression will match if the string contains:
an "s" character followed by
a "/" character followed by
the literal contents of $sometext followed by
a "/" character followed by
a "j" character followed by
a "u" character followed by
...
So that will match if:
my $data = "s/$sometext/junk/g";
> $data =~ $rx;
my $rx = qr/\Q$sometext\E/; # quote only the regex part
$data =~ s/$rx/junk/g; # works fine
> And if it does compile, like the above does, it should work.
It does work (but only if $data actually contains the characters listed above).
> Is there anyway possible the substitution side will work?
Yes. See above.
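
To make the point concrete, here is a minimal sketch. The sample values
($sometext, $data, $code_as_text) are made up for illustration; only the
two patterns come from the discussion above.

```perl
use strict;
use warnings;

my $sometext = 'a.b';    # hypothetical sample value

# qr{} quotes a pattern only, so "s/" and "/junk/g" are literal characters here.
my $literal = qr{s/\Q$sometext\E/junk/g};

my $code_as_text = "s/a.b/junk/g";     # the *text* of a substitution
my $data         = "xx a.b yy a.b";    # ordinary data

print "literal pattern matches the text of the substitution\n"
    if $code_as_text =~ $literal;
print "literal pattern does not match ordinary data\n"
    unless $data =~ $literal;

# Quote only the regex part and keep s///g in the code itself.
my $rx = qr/\Q$sometext\E/;
(my $changed = $data) =~ s/$rx/junk/g;
print "$changed\n";    # xx junk yy junk
```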
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
> # does not work, no way no how
> $rx = qr{s/\Q$sometext\E/junk/g};
> $data =~ $rx;
Looks like you're unintentionally trying to run a regex within the
regex, where the regex within is actually just trying to match a string
(not a functional regex).
--
Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
Industry's most experienced staff! -- Web Hosting With Muscle!
>> # does not work, no way no how
>> $rx = qr{s/\Q$sometext\E/junk/g};
>> $data =~ $rx;
>
>Looks like you're unintentionally trying to run a regex within the
>regex, where the regex within is actually just trying to match a string
>(not a functional regex).
(S)he's just trying to "save" a substitution as a first-order object,
and (s)he blindly tried some "random" syntax that of course is not going
to work.
Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
>I'm probably going to use some wrong terms here but I
>hope to give enough detail that I can get a definative
>resolution to this, once and for all.
>
>Basically I'm writing a sub that wants to take a regular
>expression as a parameter. It then blindly operates on data,
>matching, and posible substitution.
[cut]
># does not work, no way no how
>$rx = qr{s/\Q$sometext\E/junk/g};
Actually, this comes up oh so often! Others duly explained to you
what's going on. Bottom line is, you *can't* "save" a substitution as
a first-order object of the language. The replacement part of a
substitution, though, is "simply" a string: well, either that or code,
if the /e modifier is supplied. In both cases you can *think* of it,
possibly at the expense of a tiny wrapper layer, as a sub. Thus a
solution to your problem, albeit not quite as "slim" as you may have
hoped for, may be given in terms of a pair consisting of a regex and
a sub. Sounds reasonable?
>> # does not work, no way no how
>> $rx = qr{s/\Q$sometext\E/junk/g};
>> $data =~ $rx;
>
>A bare regex is simply not going to work on the right hand side of a =~
>operator. It's the operator on the right hand side that does the
>matching, not the =~ operator itself. That only binds an expression
>instead of $_ to that matching operator.
This is simply not true:
$ perl -E '$r=qr/\w+\s(\w+)\s\w+/;
"foo bar baz" =~ $r and say $1'
bar
In fact...
>More detail:
>
>From perlop:
>
> Binary "=~" binds a scalar expression to a pattern match. Certain
> operations search or modify the string $_ by default. This operator
> makes that kind of operation work on some other string. The right
> argument is a search pattern, substitution, or transliteration.
                ^^^^^^^^^^^^^^
It's simply *ad hoc* in Perl 5.
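
A small sketch of what that looks like in practice, as far as I understand
perlop: when the right-hand side of =~ is an expression rather than an
explicit m//, s/// or tr///, it is interpreted as a search pattern at run
time. The variable names and sample string below are made up for the
example.

```perl
use strict;
use warnings;

my $str = "foo bar baz";

my $rx        = qr/(\w+)\s(\w+)/;
my $as_string = '(\w+)$';    # a plain string, not a qr// object

# A qr// object on the right of =~ is used as the pattern.
print "qr object: $2\n" if $str =~ $rx;           # bar

# A plain string is interpreted as a pattern at run time as well.
print "string:    $1\n" if $str =~ $as_string;    # baz

# Only an explicit s/// performs a substitution.
(my $copy = $str) =~ s/$rx/$2 $1/;
print "s///:      $copy\n";                       # bar foo baz
```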
That's clear, no surprises then.
Thanks!
sln
>On Mon, 03 Nov 2008 00:24:30 GMT, s...@netherlands.com wrote:
>
>>I'm probably going to use some wrong terms here but I
>>hope to give enough detail that I can get a definative
>>resolution to this, once and for all.
>>
>>Basically I'm writing a sub that wants to take a regular
>>expression as a parameter. It then blindly operates on data,
>>matching, and posible substitution.
>[cut]
>># does not work, no way no how
>>$rx = qr{s/\Q$sometext\E/junk/g};
>
>Actually, this comes out oh so often! Others duly explained to you
>what's going on. Bottom line is, you *can't* "save" a substitution as
>a first order object of the language. The substitution part of a
>substitution, though, is "simply" a string: well, either that or code
>- if the /e modifier is supplied. In both cases you can *think* of it,
>possibly at the expense of a tiny wrapper layer, as a sub. Thus a
>solution to your problem, albeit not just as "slim" as you may have
>hoped for, may be given in terms of a couple consisting of a regex and
>a sub. Sounds reasonable?
>
>
>Michele
No matter how I look at it, the replacement is still a string,
constructed in the scope of the block that invokes the regexp engine.
So s/.../$somereplacement$1$2$3/ can be valid.
Or s/.../somesub($1,$2,$3)/e can be valid.
And only qr//, i.e. the regular expression itself, can be compiled ahead of =~ if it is constant.
In that case the s of s///, the /g and, I take it, the /e have no meaning,
because they are not part of the regular expression, whereas some modifiers such as /i are,
because they act on the regular expression.
To me, then, it is a misnomer to call 's/$regx/$txt/g' a regular expression, since
it can't be known before the block that invokes it, but qr// can be.
In my opinion, s///g should be allowed in qr{}, using the scope of the block it was created
in, and later applied correctly (as s///g) in the context of the block that invokes the engine.
This may violate the 'first-order object' rules of the language. But then why are code extensions
such as qr/(?{ code })/ allowed, and what is the scoping for them? To me this looks like a parsing issue and,
if allowed, would internally result in dynamic code evaluation like eval.
I don't know that this 'code' extension isn't treated as a literal anyway.
I don't know if invoking a 'sub' (/e) is going to be any better than having to
parse through a passed-in argument list for the proper form. In all cases, it looks
like the replacement text cannot include special variables unless an eval is used
at runtime.
Can you give an example of your regex-and-sub solution?
Thanks.
sln
>In my opinion, s///g should be allowed by qr{} using the scoping block it was created
>in, and later correctly used (s///g) within the context of a block that invokes the engine.
>
>This may violate 'first-order object' of the language. But then why are code extensions allowed?
>qr/(?{ code })/ and what is the scoping for them? To me this looks like parsing issues and
>if allowed would would internally result in a dynamic code issue like eval.
>I don't that this 'code' extension isn't treated as a literal anyway.
Do not misunderstand me, I'm all with you: would you write a Perl
extension that allows substitutions to be treated as first-order objects of
the language? I would cherish that... Unfortunately I *for one*
haven't the slightest idea of where one could begin!
In the meantime we must be happy with a clumsier solution, like...
>I don't know if invoking a 'sub' (/e) is going to be any better than having to
>parse through a passed in argument list for the proper form. In all cases, it looks
>like the replacement text cannot include special var's unles an eval is used
>at runtime.
>
>Can you give an example of your regex and a sub solution?
... sure:
my %subst = ( regex => qr/.../, code => sub { ... } );
And then you use that to perform the substitution. You may even make
that the core data of a class, thus allowing objects like $subst with
a suitable ->apply($string) method.
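
Just to give the idea some flesh, a minimal sketch of such a class. The
package name Subst, the sample pattern and the choice to return a modified
copy are all assumptions made for the example, not an existing module.

```perl
use strict;
use warnings;

# Hypothetical class name; not part of Perl or of any CPAN module.
package Subst;

sub new {
    my ($class, %args) = @_;
    return bless { regex => $args{regex}, code => $args{code} }, $class;
}

# Applies the substitution globally to a copy and returns the new string.
sub apply {
    my ($self, $string) = @_;
    my ($rx, $code) = @{$self}{qw(regex code)};
    $string =~ s/$rx/$code->()/ge;
    return $string;
}

package main;

my $subst = Subst->new(
    regex => qr/(\w+)=(\w+)/,
    code  => sub { "$2=$1" },    # $1 and $2 are read when the sub runs
);

print $subst->apply('a=b, c=d'), "\n";    # b=a, d=c
```

The code ref is called from inside the replacement, so it sees the capture
variables of the current match without any string eval.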
>On Mon, 03 Nov 2008 23:01:35 GMT, s...@netherlands.com wrote:
>
>>In my opinion, s///g should be allowed by qr{} using the scoping block it was created
>>in, and later correctly used (s///g) within the context of a block that invokes the engine.
>>
>>This may violate 'first-order object' of the language. But then why are code extensions allowed?
>>qr/(?{ code })/ and what is the scoping for them? To me this looks like parsing issues and
>>if allowed would would internally result in a dynamic code issue like eval.
>>I don't that this 'code' extension isn't treated as a literal anyway.
>
>Do not misunderstand me, I'm all with you: would you write a Perl
>extension that allows to treat substitutions as first order objects of
>the language? I would cherish that... Unfortunately I *for one*
>haven't the slightest idea of where one could begin!
>
>In the meanwhile we must be happy with a clumsier solution, like...
>
>>I don't know if invoking a 'sub' (/e) is going to be any better than having to
>>parse through a passed in argument list for the proper form. In all cases, it looks
>>like the replacement text cannot include special var's unles an eval is used
>>at runtime.
>>
>>Can you give an example of your regex and a sub solution?
>
>... sure:
>
> my %subst = ( regex => qr/.../, code => sub { ... } );
>
>And then you use that to perform the substitution. You may even make
>that the core data of a class, thus allowing objects like $subst with
>a suitable ->apply($string) method.
>
>
>Michele
I'm in your debt. There is virtually no overhead in calling that
sub for the substitution, and it executes in context. There is no
comparison with eval, this is the way to go for me.
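
For anyone who wants to check this on their own machine, a rough sketch
using the core Benchmark module. The sample text, pattern and sub names are
made up for illustration, and the numbers will vary by machine and Perl
version, so no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $text = 'this(1) and this(2) and this(3)';    # made-up sample data
my $rx   = qr/(\whis)\((\d)\)/;
my $code = sub { "that[$2]" };

# Approach 1: precompiled regex plus a code ref called through /e.
sub with_sub {
    my $data = $text;
    $data =~ s/$rx/$code->()/ge;
    return $data;
}

# Approach 2: the whole substitution kept as a string and eval'ed each time.
my $stmt = q{$data =~ s/(\whis)\((\d)\)/that[$2]/g};
sub with_eval {
    my $data = $text;
    eval $stmt;
    return $data;
}

# Run each approach for about one CPU second and print a comparison table.
cmpthese(-1, { sub_ref => \&with_sub, string_eval => \&with_eval });
```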
I have already resigned myself to it being the caller's responsibility
to ensure proper regexp usage, so I am just providing the rope.
In my circumstances, it's all about performance. Any added indirection,
calls, assignments, etc. are a hazard in my usage. I won't get into
the gory details unless you want to know.
Below is raw, isolated test code; in the case of method 2, with no error checking.
I already have an object method that an array of regex/code subs could be passed to,
where it then operates on data tightly bound to the object.
Introducing a new object, RegxProc in the simple case below, would alleviate parsing,
but an unknown object type might not be accessible. It would alleviate internal processing, though.
I could internalize RegxProc in the existing class, providing a wrapper method I guess,
but the caller could not specify search/replace/global replace without additional parameter
parsing.
This is a relief for me though. Thanks a lot...
sln
-----------------
use strict;
use warnings;
# method 1
# ------------
# my $data = "This is some data, this gets substituted";
# my $subst = {
# 'regex' => qr/(\whis)/i,
# 'code' => sub { print "$1\n"; return 'That'; }
# };
# $data =~ s/$subst->{'regex'}/ &{$subst->{'code'}}/ge;
# print "$data\n";
# method 2
# -------------
my $data = "This(1) is some data, this(2) gets substituted,
and so does this(3).";
print "\nData = $data\n\n";
my $rxp = new RegxProc (
    'regex' => qr/(\whis\(\d\))/si,
    'code' => sub { print "\ncode: \$1 = $1\n"; return 'That'; }
);
if ($rxp->search ($data)) {
    print "search worked\n";
}
if ($rxp->replace ($data)) {
    print "replace worked, data = $data\n";
}
if ($rxp->replace_g ($data)) {
    print "global replace worked, data = $data\n";
}
package RegxProc;
use vars qw(@ISA);
@ISA = qw();
sub new
{
    my ($class, @args) = @_;
    my $self = {};
    while (my ($name, $val) = splice (@args, 0, 2)) {
        if ('regex' eq lc $name) {
            $self->{regex} = $val;
        }
        elsif ('code' eq lc $name) {
            $self->{code} = $val;
        }
    }
    return bless ($self, $class);
}
sub search
{
    my $self = shift;
    return 0 unless (defined $_[0]);
    return $_[0] =~ /$self->{regex}/;
}
sub replace
{
    my $self = shift;
    return 0 unless (defined $_[0]);
    return $_[0] =~ s/$self->{regex}/&{$self->{code}}/e;
}
sub replace_g
{
    my $self = shift;
    return 0 unless (defined $_[0]);
    return $_[0] =~ s/$self->{regex}/&{$self->{code}}/ge;
}
__END__
Data = This(1) is some data, this(2) gets substituted,
and so does this(3).
search worked
code: $1 = This(1)
replace worked, data = That is some data, this(2) gets substituted,
and so does this(3).
code: $1 = this(2)
code: $1 = this(3)
global replace worked, data = That is some data, That gets substituted,
and so does That.
[snip]
>
>This is a relief for me though. Thanks alot...
>
[snip]
>
I settled on this lightweight class that handles the substitution with some
variable types. It still has minimal error checking, to reduce overhead.
I added a few methods to generalize access, and it benchmarks pretty well.
See any potential problems or performance issues?
sln
----------------------
use strict;
use warnings;
my $data = "This(1) is some data, this(2) gets substituted,
and so does this(3).";
my $tempdata = $data;
my $rxp = RxP->new (
    'regex' => qr/(\whis\(\d\))/si,
    'code' => sub { print "code: \$1 = $1\n"; return 'That'; },
    'type' => 'r'
);
# test apply, set/get_type methods
if (1)
{
    print "\n","-"x20,"\nData = $data\n\n";
    $rxp->set_type('s');
    if ($rxp->apply ($data)) {
        print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
    }
    $rxp->set_type('r');
    if ($rxp->apply ($data)) {
        print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
    }
    $rxp->set_type('g');
    if ($rxp->apply ($data)) {
        print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
    }
}
# test direct call and search, replace, replace_g methods
if (1)
{
    $rxp->set_type('r');
    $data = $tempdata;
    print "\n","-"x20,"\nData = $data\n\n";
    if ($rxp->{'dflt_sub'}($rxp, $data)) {
        print "Direct {dflt_sub} worked, data = $data\n\n";
    }
    if ($rxp->search ($data)) {
        print "Search worked, data = $data\n\n";
    }
    if ($rxp->replace ($data)) {
        print "Replace worked, data = $data\n\n";
    }
    if ($rxp->replace_g ($data)) {
        print "Global replace worked, data = $data\n\n";
    }
}
package RxP;
use vars qw(@ISA);
@ISA = qw();
sub new
{
    my ($class, @args) = @_;
    my $self = {
        'dflt_sub' => \&search,
        'type' => 's'
    };
    while (my ($name, $val) = splice (@args, 0, 2)) {
        if ('regex' eq lc $name) {
            $self->{'regex'} = $val;
        }
        elsif ('code' eq lc $name) {
            $self->{'code'} = $val;
        }
        elsif ('type' eq lc $name && $val =~ /(s|r|g)/i) {
            set_type ($self, $1);
        }
    }
    return bless ($self, $class);
}
sub get_type
{
    return $_[0]->{'type'};
}
sub set_type
{
    return 0 unless (defined $_[1]);
    if ($_[1] =~ /(s|r|g)/i) {
        $_[0]->{'dflt_sub'} = {
            's' => \&search,
            'r' => \&replace,
            'g' => \&replace_g
        }->{$1};
        $_[0]->{'type'} = $1;
        return 1;
    }
    return 0;
}
sub apply
{
    return 0 unless (defined $_[1]);
    return &{$_[0]->{'dflt_sub'}};
}
sub search
{
    return 0 unless (defined $_[1]);
    return $_[1] =~ /$_[0]->{'regex'}/;
}
sub replace
{
    return 0 unless (defined $_[1]);
    return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/e;
}
sub replace_g
{
    return 0 unless (defined $_[1]);
    return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/ge;
}
__END__
--------------------
Data = This(1) is some data, this(2) gets substituted,
and so does this(3).
Apply 's' worked, data = This(1) is some data, this(2) gets substituted,
and so does this(3).
code: $1 = This(1)
Apply 'r' worked, data = That is some data, this(2) gets substituted,
and so does this(3).
code: $1 = this(2)
code: $1 = this(3)
Apply 'g' worked, data = That is some data, That gets substituted,
and so does That.
--------------------
Data = This(1) is some data, this(2) gets substituted,
and so does this(3).
code: $1 = This(1)
Direct {dflt_sub} worked, data = That is some data, this(2) gets substituted,
and so does this(3).
Search worked, data = That is some data, this(2) gets substituted,
and so does this(3).
code: $1 = this(2)
Replace worked, data = That is some data, That gets substituted,
and so does this(3).
code: $1 = this(3)
Global replace worked, data = That is some data, That gets substituted,
and so does That.
[snip]
It's better to have the regexp fail for some other reason than
undefinedness.
Performance benchmarks are very good. Thx...
sub new
{
    my ($class, @args) = @_;
    my $self = {
        'regex' => '',
        'code' => '',
        'type' => 's',
        'dflt_sub' => \&search,
    };
    while (my ($name, $val) = splice (@args, 0, 2)) {
        next if (!defined $val);
>I settled on this lightweight class that handles the substution with some
>variable type's. Still it is with minimal error checking to reduce overhead.
>Added a few methods to generalize access, and it benchmarks pretty good.
>
>See any potential problems or performance issues ?
I don't have enough time to dig through your implementation, but it
seems to me that you have set up a fairly complete thingie. Now,
performance is not generally a concern of mine. If it is for you, then
just profile your app. For the rest, I can only suggest that you set up
a test suite as well. As long as your implementation passes it, you may
consider yourself reasonably safe, no?
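
As a starting point, a minimal sketch of such a test file using the core
Test::More module. The helper apply_global is a made-up stand-in for
whatever method you end up testing; the inputs are invented too.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the regex/code pair under test.
# Returns the modified string and the number of substitutions made.
sub apply_global {
    my ($rx, $code, $string) = @_;
    my $count = ($string =~ s/$rx/$code->()/ge);
    return ($string, $count || 0);
}

my $rx   = qr/(\whis)\(\d\)/i;
my $code = sub { 'That' };

my ($out, $n) = apply_global($rx, $code, 'This(1) and this(2)');
is($out, 'That and That', 'every match is replaced');
is($n, 2, 'substitution count is reported');

($out, $n) = apply_global($rx, $code, 'nothing here');
is($out, 'nothing here', 'non-matching input is left alone');
is($n, 0, 'no substitutions on a miss');

done_testing();
```

Run it with plain perl or through prove; each is() line reports ok or not ok.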
>On Wed, 05 Nov 2008 19:54:14 GMT, s...@netherlands.com wrote:
>
>>I settled on this lightweight class that handles the substution with some
>>variable type's. Still it is with minimal error checking to reduce overhead.
>>Added a few methods to generalize access, and it benchmarks pretty good.
>>
>>See any potential problems or performance issues ?
>
>I don't have time enough to dig through your implementation, but it
>seems to me that you set up a fairly complete thingie: now,
>performance is not generally a concern of mine. If it is for you, then
>just profile your app. For the rest, I can only suggest you to set up
>a test suite as well. As far as your implementation complies, you may
>consider yourself reasonalby safe, ain't it?
>
>
>Michele
Never heard of test suites/cases. On my really big app, I'm making changes
so fast it scares me. I miss a compiler as opposed to a syntax checker.
No, no. NUnit isn't for me. I live on the edge, die on the edge, one
man - one piece of art...
sln
>On Wed, 05 Nov 2008 19:54:14 GMT, s...@netherlands.com wrote:
>
>>I settled on this lightweight class that handles the substution with some
>>variable type's. Still it is with minimal error checking to reduce overhead.
>>Added a few methods to generalize access, and it benchmarks pretty good.
>>
>>See any potential problems or performance issues ?
>
>I don't have time enough to dig through your implementation, but it
>seems to me that you set up a fairly complete thingie: now,
>performance is not generally a concern of mine. If it is for you, then
>just profile your app. For the rest, I can only suggest you to set up
>a test suite as well. As far as your implementation complies, you may
>consider yourself reasonalby safe, ain't it?
>
>
>Michele
I've already integrated this package into my bigger package and have exported
a thin wrapper sub that instantiates objects which are used as a drop-in
by the caller, specifically as a parameter (a ref from NewRxP) that
gets passed to the larger package's method. Almost like a macro.
I'm learning the gory details of classes in Perl, something I didn't think
I would need to know beyond casual knowledge. I'm a hard-core Windows
MFC C++ programmer; it's how I make my living as a contractor.
Periodically, I'm laid off, like now. Perl is like candy to me, sweet to the
tongue, especially regular expressions. It's almost addictive. Unemployment is
running out, nobody is calling, I'm sure I will have to give this up and work
as a bricklayer, my long-past profession, again. So, if I disappear, it's
been nice knowing you!
sln
Ran into issues that were fixed. I just want to close this out with
the correct default 'code' sub, changed types, and added 'search_g()' method.
Thanks.
sln
sub NewRxP
{
    my ($regex,$code,$type) = @_;
    if (defined $code && ref($code) ne 'CODE') {
        my $temp = $type;
        $type = $code;
        $code = $temp;
    }
    return RxP->new('regex'=>$regex,'code'=>$code,'type'=>$type);
}
# =================
package RxP;
use vars qw(@ISA);
@ISA = qw();
sub new
{
    my ($class, @args) = @_;
    my $self = {
        'regex' => '',
        'code' => sub{''},
        'type' => 's',
        'dflt_sub' => \&search
    };
    while (my ($name, $val) = splice (@args, 0, 2)) {
        next if (!defined $val);
        if ('regex' eq lc $name) {
            $self->{'regex'} = $val;
        }
        elsif ('code' eq lc $name && ref($val) eq 'CODE') {
            $self->{'code'} = $val;
        }
        elsif ('type' eq lc $name && $val =~ /(sg|gs|rg|gr|s|r)/i) {
            set_type ($self, lc $1);
        }
    }
    return bless ($self, $class);
}
sub get_type
{
    return $_[0]->{'type'};
}
sub set_type
{
    return 0 unless (defined $_[1]);
    if ($_[1] =~ /(sg|gs|rg|gr|s|r)/i) {
        $_[0]->{'dflt_sub'} = {
            's' => \&search,
            'sg' => \&search_g,
            'gs' => \&search_g,
            'r' => \&replace,
            'rg' => \&replace_g,
            'gr' => \&replace_g
        }->{lc $1};
        $_[0]->{'type'} = lc $1;
        return 1;
    }
    return 0;
}
sub apply
{
    return 0 unless (defined $_[1]);
    return &{$_[0]->{'dflt_sub'}};
}
sub search
{
    return 0 unless (defined $_[1]);
    return $_[1] =~ /$_[0]->{'regex'}/;
}
sub search_g
{
    return 0 unless (defined $_[1]);
    return $_[1] =~ /$_[0]->{'regex'}/g;