
Dynamic regexp

Winston Smith

Nov 13, 2003, 11:54:23 PM
Hi everybody,

I'm looking for a way to make a batch of s/// substitutions. As a code
sample is worth a thousand words, here is what my code currently looks like:

---

my @rules = (
    ['^HELLO (.*)$', 'BONJOUR $1'],
    # ... lots of other rules
);

foreach my $rule (@rules) {
    if ($string =~ s/$rule->[0]/$rule->[1]/ei) {
        last;
    }
}

---

With this code, 'HELLO WINSTON' becomes 'BONJOUR $1' and not 'BONJOUR
WINSTON' as I'd like.

I tried to put double quotes in the table of strings but it makes no
difference.

I also tried adding a second e option to the s/// operator so that the
string 'BONJOUR $1' is reinterpolated. Then I get the message 'Use of
uninitialized value in substitution iterator ...', as if $1 were not
defined. But if I put a
    print $1;
statement just before the
    last;
statement, it works and actually prints 'WINSTON' for the same example as
before.

Thank you in advance for your help.

Eric Joanis

Nov 14, 2003, 5:28:07 AM
Dear Winston,

Winston Smith <winsto...@linuxmail.org> wrote:
>I'm looking for a way to make a batch of s/// substitutions. As a code
>sample is worth a thousand words, let see what me code is presently :
>

> my @rules = ( ['^HELLO (.*)$', 'BONJOUR $1'], ... );


> foreach my $rule (@rules) {
> if ($string =~ s/$rule->[0]/$rule->[1]/ei) {

The problem with this code is that $rule->[1] is only interpolated once,
and thus the "$1" it contains is not itself interpolated to the contents of the
$1 variable. To fix this, you need to get Perl to evaluate the replacement
string twice. After some trial and error, I found that this works:

    if ($string =~ s/$rule->[0]/eval qq("$rule->[1]") /ei) {

If $rule->[1] is 'BONJOUR $1', then qq("$rule->[1]") yields "BONJOUR $1"
(including the double quotes). When this is evaluated again using eval, $1
is interpolated as you want it to be.
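
A minimal, self-contained sketch of the double-evaluation idea described
above. The sub name apply_rules is hypothetical, and the rule table is the
one from the original post. It assumes the replacement text itself contains
no double quotes, since it is wrapped in a pair of them before being eval'ed.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the first matching rule to a copy of $string.
sub apply_rules {
    my ($string, @rules) = @_;
    foreach my $rule (@rules) {
        # /e runs the replacement as code. That code first builds the text
        # "BONJOUR $1" (quotes included), then eval interpolates $1 into it.
        last if $string =~ s/$rule->[0]/eval qq("$rule->[1]")/ei;
    }
    return $string;
}

my @rules = ( ['^HELLO (.*)$', 'BONJOUR $1'] );
print apply_rules('HELLO WINSTON', @rules), "\n";
```

Because the string eval runs on every successful substitution, rules from an
untrusted source should not be fed to this.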

Note that
    eval '"' . $rule->[1] . '"'
would be equivalent to
    eval qq("$rule->[1]")

Warning: I expect this code to be fairly slow, because Perl has to
recompile the expression at every iteration. I'd be happy to see a more
elegant solution to force perl to perform interpolation twice on a string,
if anyone has one, but I couldn't come up with one myself.
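
One eval-free alternative, offered only as a sketch and not something
proposed in this thread: treat $1, $2, ... in the replacement as plain
placeholders and fill them in from the match offsets in @- and @+. The sub
name apply_rules_no_eval is hypothetical. It supports numbered captures only
(no other variables), and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the first matching rule without any string eval.
sub apply_rules_no_eval {
    my ($string, @rules) = @_;
    foreach my $rule (@rules) {
        my ($pattern, $template) = @$rule;
        next unless $string =~ /$pattern/i;

        # Save the extent of the whole match before another regex runs.
        my ($start, $end) = ($-[0], $+[0]);

        # Collect the captured groups from the offset arrays @- and @+.
        my @groups = map {
            defined $-[$_] ? substr($string, $-[$_], $+[$_] - $-[$_]) : ''
        } 1 .. $#+;

        # Replace each $N placeholder with the corresponding group.
        (my $replacement = $template) =~ s{\$([1-9]\d*)}{ $groups[$1 - 1] // '' }ge;

        # Splice the expanded replacement over the matched text.
        substr($string, $start, $end - $start, $replacement);
        last;
    }
    return $string;
}

print apply_rules_no_eval('HELLO WINSTON', ['^HELLO (.*)$', 'BONJOUR $1']), "\n";
```

Nothing in the replacement text is ever executed, so a rule such as
'@{[ ... ]} $1' is copied through literally apart from the $1.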

Cheers,

Eric

Matthias Weckman

Nov 14, 2003, 6:00:54 AM
Eric Joanis wrote:

> Dear Winston,
>
> Winston Smith <winsto...@linuxmail.org> wrote:
>
>>I'm looking for a way to make a batch of s/// substitutions. As a code
>>sample is worth a thousand words, let see what me code is presently :
>>
>>my @rules = ( ['^HELLO (.*)$', 'BONJOUR $1'], ... );
>>foreach my $rule (@rules) {
>> if ($string =~ s/$rule->[0]/$rule->[1]/ei) {
>
>
> The problem with this code is that $rule->[1] is only interpolated once
> and thus the "$1" it contains is not itself interpolated to the contents of the
> $1 variable. To fix this, you need to get Perl to evaluation the replacement
> string twice. After some trial and error, I found that this works:
>
> if ($string =~ s/$rule->[0]/eval qq("$rule->[1]") /ei) {
>

Wouldn't doubling the e at the end work as well? Like so:

    if ($string =~ s/$rule->[0]/$rule->[1]/eei)
                                           ^^

Matthias Weckman

Nov 14, 2003, 7:13:03 AM
> wouldn't doubling the e at the end work as well? like so:
>
> if ($string =~ s/$rule->[0]/$rule->[1]/eei)
> -----^^

argh. I guess I'll have to teach Thunderbird about tabs. :-(

I hope this comes out as intended (and indented:-):

Anno Siegel

Nov 14, 2003, 7:15:34 AM
Matthias Weckman <matthias...@hotmail.com> wrote in comp.lang.perl.misc:

It works, but the Perl code /e has to execute is "BONJOUR $1", that is
"BONJOUR" is interpolated as a bareword. That only works because the
implicit "eval" runs without strictures. When using /e, the right hand
side should be an executable Perl expression. Make the rule

[ '^HELLO (.*)$', '"BONJOUR $1"']

Now the right side is an eval-able string interpolation.

Anno

Winston Smith

Nov 14, 2003, 10:28:28 AM
Hi,

> if ($string =~ s/$rule->[0]/eval qq("$rule->[1]") /ei)

> ...


> I expect this code to be fairly slow

> [ '^HELLO (.*)$', '"BONJOUR $1"']
> ...


> if ($string =~ s/$rule->[0]/$rule->[1]/eei)

I tried both solutions and they work fine. Now I'm bothered just by
Eric's remark that his solution is probably slow. Do you feel the second
solution is faster or is it the same? For the moment I have only a few
rules so I can't tell the difference but this might get important as I
expect to have more than a hundred rules at the end of the project!

Anyway thank you a lot for your help.

Roman Khutkyy

Nov 14, 2003, 11:21:09 AM
It's faster, no doubt. Try checking for yourself.

"Winston Smith" <winsto...@linuxmail.org> wrote in message
news:Bw6tb.15522$IK2.1...@news20.bellglobal.com...

Steve Grazzini

Nov 14, 2003, 11:40:15 AM
Winston Smith <winsto...@linuxmail.org> wrote:
>> if ($string =~ s/$rule->[0]/eval qq("$rule->[1]") /ei)

>> [ '^HELLO (.*)$', '"BONJOUR $1"']
>> ...
>> if ($string =~ s/$rule->[0]/$rule->[1]/eei)
>
> I tried both solutions and they work fine. Now I'm bothered just by
> Eric's remark that his solution is probably slow. Do you feel the second
> solution is faster or is it the same?

There won't be a significant difference.

Using qr// and getting rid of the string eval() would probably speed it
up a bit -- but you'll want to Benchmark it with real rules/data.

    @rules = (
        # /i goes on the qr//: flags on the outer s/// do not reach into
        # an interpolated qr// object.
        [qr/^HELLO (.*)$/i => sub { "BONJOUR $1" }],
        # ...
    );

    foreach $rule (@rules) {
        last if $string =~ s/$rule->[0]/$rule->[1]()/e;
    }
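
A rough sketch of how the two approaches could be compared with the core
Benchmark module. The sub names and the single rule are made up for
illustration; real rules and real input lines would be needed for timings
that mean anything.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# String rules: the replacement is evaluated with a string eval (/ee).
my @string_rules = ( ['^HELLO (.*)$', '"BONJOUR $1"'] );
# Compiled rules: a qr// pattern plus a closure, no string eval.
my @code_rules   = ( [qr/^HELLO (.*)$/i, sub { "BONJOUR $1" }] );

sub with_string_eval {
    my ($string) = @_;
    foreach my $rule (@string_rules) {
        last if $string =~ s/$rule->[0]/$rule->[1]/eei;
    }
    return $string;
}

sub with_closure {
    my ($string) = @_;
    foreach my $rule (@code_rules) {
        last if $string =~ s/$rule->[0]/$rule->[1]->()/e;
    }
    return $string;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    string_eval => sub { with_string_eval('HELLO WINSTON') },
    closure     => sub { with_closure('HELLO WINSTON') },
});
```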

--
Steve

Anno Siegel

Nov 14, 2003, 11:51:09 AM
Winston Smith <winsto...@linuxmail.org> wrote in comp.lang.perl.misc:

For a large-scale project I'd consider the templating modules on CPAN.
They do that kind of thing routinely.

Applying many regexes to all lines of a file is frequently a performance
problem. "perldoc -q many" brings up a pertinent FAQ reply. It doesn't
deal with replacement directly, but it can be used to preselect those
replacements that must be performed on a line.
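
One possible reading of the preselection idea, as a sketch only: compile the
patterns once with qr//, join them into a single alternation, and use that
as a cheap test before walking the rule list. The names @rules, $any_rule
and translate, and the second rule, are assumptions made for illustration.
Whether this actually saves time depends on the rules and the data.

```perl
use strict;
use warnings;

my @rules = (
    [ qr/^HELLO (.*)$/i,   sub { "BONJOUR $1" } ],
    [ qr/^GOODBYE (.*)$/i, sub { "AU REVOIR $1" } ],
);

# One combined pattern, built once, that matches if any rule could match.
my $any_rule = do {
    my $alternation = join '|', map { $_->[0] } @rules;
    qr/$alternation/;
};

sub translate {
    my ($string) = @_;
    # Cheap pre-check: skip the rule loop for lines that no rule applies to.
    return $string unless $string =~ $any_rule;
    foreach my $rule (@rules) {
        last if $string =~ s/$rule->[0]/$rule->[1]->()/e;
    }
    return $string;
}

print translate('HELLO WINSTON'), "\n";
```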

Anno

Klaus Johannes Rusch

Nov 14, 2003, 11:12:39 AM
Winston Smith wrote:

> I'm looking for a way to make a batch of s/// substitutions. As a code
> sample is worth a thousand words, let see what me code is presently :
>
> ---
>
> my @rules = (
> ['^HELLO (.*)$', 'BONJOUR $1'],
> # ... lots of other rules
> );
>
> foreach my $rule (@rules) {
> if ($string =~ s/$rule->[0]/$rule->[1]/ei) {
> last;
> }
> }
>
> ---
>
> With this code, 'HELLO WINSTON' becomes 'BONJOUR $1' and not 'BONJOUR
> WINSTON' as I'd like.

    my @rules = (
        ['^HELLO (.*)$', '"BONJOUR $1"'],
    );

    foreach my $rule (@rules) {
        if ($string =~ s/$rule->[0]/eval($rule->[1])/ei) {
            last;
        }
    }


--
Klaus Johannes Rusch
Klaus...@atmedia.net
http://www.atmedia.net/KlausRusch/

Greg Bacon

Nov 14, 2003, 8:54:42 AM
[Newsgroups field trimmed to remove comp.lang.perl.]

In article <2eZsb.44803$xI2.9...@news20.bellglobal.com>,
Winston Smith <winsto...@linuxmail.org> wrote:

: my @rules = (


: ['^HELLO (.*)$', 'BONJOUR $1'],
: # ... lots of other rules
: );
:
: foreach my $rule (@rules) {
: if ($string =~ s/$rule->[0]/$rule->[1]/ei) {
: last;
: }
: }
:
: ---
:
: With this code, 'HELLO WINSTON' becomes 'BONJOUR $1' and not 'BONJOUR
: WINSTON' as I'd like.

It requires a little chicanery because you can't use the FAQ answer
for "How can I expand variables in text strings?" -- you'd zap the
value of $1 you want to interpolate. Note how I had to modify your
rule and make s/// do a double-eval:

    $ cat try
    #! /usr/local/bin/perl

    use warnings;
    use strict;

    my @rules = (
        ['^HELLO (.*)$', 'qq{BONJOUR $1}'],
        # ... lots of other rules
    );

    my $string = "HELLO WINSTON";

    print "before: [$string]\n";

    foreach my $rule (@rules) {
        if ($string =~ s/$rule->[0]/$rule->[1]/eei) {
            last;
        }
    }

    print "after: [$string]\n";

    $ ./try
    before: [HELLO WINSTON]
    after: [BONJOUR WINSTON]

Hope this helps,
Greg
--
This is the great illusion of our age, the idea that a certain class of
people [i.e., government] is exempt from the moral judgments that apply
to the rest of us.
-- Gene Callahan

Greg Bacon

Nov 14, 2003, 10:50:48 AM
In article <2eZsb.44803$xI2.9...@news20.bellglobal.com>,
Winston Smith <winsto...@linuxmail.org> wrote:

: my @rules = (


: ['^HELLO (.*)$', 'BONJOUR $1'],
: # ... lots of other rules
: );
:
: foreach my $rule (@rules) {
: if ($string =~ s/$rule->[0]/$rule->[1]/ei) {
: last;

: }
: }

Here's a cleaner version than that in my other followup:

    #! /usr/local/bin/perl

    use warnings;
    use strict;

    my @rules = (
        ['^HELLO (.*)$', 'BONJOUR $1'],
        # ... lots of other rules
    );

    my $string = "HELLO WINSTON";

    print "before: [$string]\n";

    foreach my $rule (@rules) {
        if ($string =~ s/$rule->[0]/'qq{' . $rule->[1] . '}'/eei) {
            last;
        }
    }

    print "after: [$string]\n";

Hope this helps,
Greg
--
Sufficiently advanced political correctness is indistinguishable
from irony.
-- unknown

Malcolm Dew-Jones

Nov 14, 2003, 2:50:14 PM
Winston Smith (winsto...@linuxmail.org) wrote:
: Hi everybody,

: I'm looking for a way to make a batch of s/// substitutions. As a code
: sample is worth a thousand words, let see what me code is presently :

: ---

: my @rules = (
: ['^HELLO (.*)$', 'BONJOUR $1'],
: # ... lots of other rules
: );

: foreach my $rule (@rules) {
: if ($string =~ s/$rule->[0]/$rule->[1]/ei) {

Two ways to do this (well, there are more than two, but this is enough):

1

    ($string =~ s/$rule->[0]/"\"$rule->[1]\""/eei)
                             ^^^          ^^^ ^^

OR

2

    '"BONJOUR $1"']
     ^          ^

    ($string =~ s/$rule->[0]/$rule->[1]/eei)
                                         ^

Brian McCauley

Nov 14, 2003, 3:03:52 PM
gba...@hiwaay.net (Greg Bacon) writes:

> It requires a little chicanery because you can't use the FAQ answer
> for "How can I expand variables in text strings?"

This, of course, is because the FAQ answer is $EXPLETIVE!

If the FAQ gave the true answer, rather than pretending that a
different question was asked, then you could use the FAQ answer.

I've tried several times to get the FAQ amended but the maintainers
are more concerned that the FAQ should not expose readers to
potentially dangerous techniques than that they actually answer the
questions honestly.

The honest answer to "How can I expand variables in text strings?" is:

chop( $string = eval "<<__EOS__\n$string\n__EOS__\n" );

There are very good reasons why often the above is a bad idea. (Let's
not discuss them here - we all know what they are).
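
A minimal sketch of the here-document trick above, wrapped in a hypothetical
sub (expand_vars_unsafe is an illustrative name, and $name is made-up data).
It assumes the template never contains a line reading __EOS__. The second
call shows why untrusted input must not reach it: anything Perl can
interpolate gets evaluated, not just variables.

```perl
use strict;
use warnings;

# Package variable, declared first so the string eval can see it under strict.
our $name = 'WINSTON';

# Hypothetical helper: interpolate Perl variables found in $template.
# Assumption: $template contains no line consisting of __EOS__.
sub expand_vars_unsafe {
    my ($template) = @_;
    # Wrap the template in a here-document and let eval interpolate it.
    my $expanded = eval "<<__EOS__\n$template\n__EOS__\n";
    die $@ unless defined $expanded;
    chop $expanded;    # drop the newline added before the terminator
    return $expanded;
}

print expand_vars_unsafe('BONJOUR $name'), "\n";
# Arbitrary expressions are evaluated too.
print expand_vars_unsafe('sum: @{[ 1 + 2 ]}'), "\n";
```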

However there is no good reason not to mention it in the FAQ. When
someone wants to learn how to fell trees you tell them about chainsaws
and you tell them about the dangers of chainsaws. You don't just tell
them some much less effective but less dangerous way and hope that
they won't discover chainsaws. Of course they will discover
chainsaws and then:

1) they'll not have had any training in their safe use.
2) they'll never trust you again as a mentor.

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\

Brian McCauley

Nov 14, 2003, 3:13:11 PM
gba...@hiwaay.net (Greg Bacon) writes:

> my @rules = (
> ['^HELLO (.*)$', 'BONJOUR $1'],

> );

It is better to use the natural representation of things.

The 1st element of each rule is naturally a regex.

The 2nd element is naturally code.

So the natural way to express this is:

    my @rules = (
        [ qr/^HELLO (.*)$/i, sub { "BONJOUR $1" } ],
    );


> if ($string =~ s/$rule->[0]/'qq{' . $rule->[1] . '}'/eei) {

Using the natural representation this becomes:

    if ($string =~ s/$rule->[0]/$rule->[1]->()/e) {

Of course you could argue that the natural representation of the whole
rule is simply CODE.

    my @rules = (
        sub { s/^HELLO (.*)$/BONJOUR $1/i },
    );

    for ( $string ) {
        foreach my $rule (@rules) {
            if ( &$rule ) {
                last;
            }
        }
    }
