I wrote a patch to scratch an itch I've had over the years. The patch
adds a new flag to s/// ("f") to make it functional instead of destructive.
So you can do this:
$a = s/aaa/bbb/f;
Which makes s/// return the replacement instead of changing $_ (or whatever
=~ bound it to).
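For comparison, the proposed behavior can be approximated in current Perl by substituting on a copy inside a do block (perl 5.14 later shipped this behavior as the s///r flag). The following is a sketch, and the variable names are only illustrative:

```perl
use strict;
use warnings;

# Proposed: $new = s/aaa/bbb/f;  Emulation: substitute on a copy of $_
# inside a do block and yield the copy; $_ itself is left untouched.
$_ = 'aaa and aaa';
my $new = do { (my $tmp = $_) =~ s/aaa/bbb/; $tmp };

# Same idea with an explicit target bound by =~ instead of $_.
my $orig   = 'xaaay';
my $result = do { (my $tmp = $orig) =~ s/aaa/bbb/; $tmp };

print "$_ | $new\n";         # aaa and aaa | bbb and aaa
print "$orig | $result\n";   # xaaay | xbbby
```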
In the past I've tried to encapsulate this in some sort of library but
the different flags on the s/// make the interface really ugly. This patch
keeps the beauty, terseness, and functionality of the original s///
syntax and because it's a new flag it shouldn't affect old code.
There is no documentation changed in the patch because I wanted to see if
there was any interest first.
So, I'm looking for comments and hopefully an indication of what it takes
to get this into perl. Be gentle, it's my first foray into the perl
internals and so I've probably done something brain-dead :-).
-David
+1 for the concept.
> So, I'm looking for comments and hopefully an indication of what it
> takes to get this into perl.
Unfortunately you just missed the feature cutoff for 5.12, so this is a
bad time for a feature patch. But basically the process is what you've
just done: mailing a patch to perl5-porters. The patch would have to
include documentation to be fully acceptable.
-zefram
It would also probably want tests that include some more advanced/insane
regex features and tests that "prove" what it should or shouldn't be
doing with captures.
--
> [I wrote a patch to add] a new flag to s/// ("f") to make it functional
> instead of destructive.
> In the past I've tried to encapsulate this in some sort of library but the
> different flags on the s/// make the interface really ugly.
>
use List::MoreUtils qw( apply );
my $bar = apply { s/// } $foo;
use Algorithm::Loops qw( Filter );
my $bar = Filter { s/// } $foo;
Both work on lists as well (like map).
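Without either CPAN module, the same list-wise, non-destructive substitution can be written with core map by copying each element before substituting. A sketch; the @foo/@bar names just follow the ones used elsewhere in this thread:

```perl
use strict;
use warnings;

my @bar = ('foo one', 'two foo', 'none');

# map aliases $_ to each element of @bar, so copy before substituting;
# otherwise s/// would rewrite @bar in place.
my @foo = map { (my $copy = $_) =~ s/foo/bar/; $copy } @bar;

print join(', ', @foo), "\n";   # bar one, two bar, none
print join(', ', @bar), "\n";   # foo one, two foo, none
```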
> I wrote a patch to scratch an itch I've had over the years. The patch adds a new flag to s/// ("f") to make it functional instead of destructive. So you can do this:
>
> $a = s/aaa/bbb/f;
So instead of
($a = $b) =~ s/aaa/bbb/;
We can now do:
$a = $b =~ s/aaa/bbb/f;
Not obviously a big win for me; but there are probably other use cases where this is more useful. For instance:
print HTML "<pre>" . $text =~ s/&/&amp;/fg =~ s/</&lt;/fg . "</pre>";
This is kind of cool; but it's not really obvious to me which substitution would happen first here. It's quite unreadable as well :-(
Another one is:
@foo = map s/foo/bar/f, @bar;
which I often encounter in some form and I always found harder than it ought to be in perl.
--Gisle
Eeek, please no. The precedence of the binding operator is tricky
enough as is without making it subject to change based on one of those
embarrassingly placed postfix flags.
If you want a function suitable for use in C<map> that invokes s///,
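please just write one:
sub substf($$$;$) {
    my ($val, $pat, $repl, $flags) = @_;
    $flags = '' unless defined $flags;
    # build a throwaway sub that runs s/// on its aliased first argument
    (eval "sub { \$_[0] =~ s/\$_[1]/\$_[2]/$flags; shift }")->($val, $pat, $repl);
    $val;   # $val was modified in place through the alias
}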
or a set of them, each with different flags:
sub substf_with_flags::AUTOLOAD {
    ... left as an exercise ...
}
after which you could write
@foo = map substf_with_flags::gi('foo','bar'), @bar;
--
"In the case of an infinite collection, the question of the existence
of a choice function is problematic"
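A variant of the function above that avoids the string eval, sketched under the assumption that the caller passes a precompiled qr// pattern and always wants every match replaced; substf_g is a hypothetical name. Flags that can live inside qr// (/i, /x and so on) still vary per call, but /g is fixed by the function and /e is not supported:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $val, replace every match of $re with the
# literal string $repl, and return the copy. No string eval involved.
sub substf_g {
    my ($val, $re, $repl) = @_;   # my (...) = @_ already copies the value
    $val =~ s/$re/$repl/g;
    return $val;
}

my @bar = ('Foo foo', 'bar');
my @foo = map { substf_g($_, qr/foo/i, 'bar') } @bar;
print "@foo\n";   # bar bar bar
```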
And m//f could return $&. (Got to have some meaning for m//f).
Abigail
> there are probably other use cases where this is more useful. For
> instance:
>
> print HTML "<pre>" . $text =~ s/&/&amp;/fg =~ s/</&lt;/fg . "</pre>";
>
> This is kind of cool; but it's not really obvious to me which substitution
> would happen first here. It's quite unreadable as well :-(
>
=~ is left associative like most ops*, so left to right. And you know how to
make it clear: Add parens.
* -- The assignment ops and exponentiation are right-associative.
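Left-to-right evaluation matters for the HTML example: escaping & before < avoids turning a freshly produced &lt; into &amp;lt;. The following sketch uses today's syntax, with sequential substitutions on a copy standing in for the proposed /f chain; the sample text is made up:

```perl
use strict;
use warnings;

my $text = 'a < b && c';

# Left to right: first & -> &amp;, then < -> &lt; on that result.
my $escaped = do {
    (my $tmp = $text) =~ s/&/&amp;/g;
    $tmp =~ s/</&lt;/g;
    $tmp;
};

# The other order re-escapes the & inside the &lt; just produced.
my $wrong = do {
    (my $tmp = $text) =~ s/</&lt;/g;
    $tmp =~ s/&/&amp;/g;
    $tmp;
};

print "<pre>$escaped</pre>\n";   # <pre>a &lt; b &amp;&amp; c</pre>
print "$wrong\n";                # a &amp;lt; b &amp;&amp; c
```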
Yeah. It's more obvious when I think about it a bit more. Since there has never really been a reason to chain =~ before, I never had to care about the associativity of '=~', while it's very clear to me how '=' works; and they do look similar.
How about:
$a = $b !~ s/aaa/bbb/f;
Would that construct ever be useful?
--Gisle
> On Mon, Nov 23, 2009 at 1:40 PM, Gisle Aas <gi...@activestate.com> wrote:
>> Another thought; if we do this for s/// then we should do it for tr/// as well.
>>
>> --Gisle
>
> Eeek, please no. The precedence of the binding operator is tricky
> enough as is without making it subject to change based on one of those
> embarrassingly placed postfix flags.
I do agree that there is a mismatch between the significance of /f flag and how easy it is to spot a single letter flag at the end of the expression.
> If you want a function suitable for use in C<map> that invokes s///,
> please just write one
The point is that I do _not_ want to write a function :-)
--Gisle
then upload it to CPAN.
although doing it right might imply some prototyping syntax for
binding-op quote preferences... okay I'll go away, please continue
adding bizarre new stuff.
so the /f modifier means the return value is no longer a boolean of whether
it matched or not, but the thing that would have normally been
assigned to the bound l-value, which is unchanged, so you can use
subst on read-only strings without copying them to temps first.
Right?
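The read-only case works like this in current Perl: s/// bound to a read-only value dies, so the value must first be copied into a temporary, which is the step /f would hide. A sketch that uses a loop variable aliased to a string literal as the read-only value:

```perl
use strict;
use warnings;

# s/// on a read-only value dies at run time; $str aliases a string
# literal here, which perl treats as read-only.
my $ok = eval {
    for my $str ('hello world') {
        $str =~ s/world/perl/;
    }
    1;
};
print $ok ? "modified\n" : "died: $@";

# Copying into a temporary first works and leaves the literal alone.
my $copy;
for my $str ('hello world') {
    ($copy = $str) =~ s/world/perl/;
}
print "$copy\n";   # hello perl
```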
> How about:
>
> $a = $b !~ s/aaa/bbb/f;
>
> Would that construct ever be useful?
>
[ Gisle: My previous message sent itself while I was typing it up. Ignore it. ]
That gets compiled as
$a = !( $b =~ s/aaa/bbb/f );
You could do
$num = <>;
if ( $num !~ s/\n//f ) { ... }
as an alternative to
chomp( $num = <> );
if ( !$num ) { ... }
but I wouldn't recommend it.
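The two spellings above can be compared using the do-block emulation in place of /f: negating the substituted copy is true only when the chomped line is false (empty or "0"). A sketch that feeds fixed strings instead of reading <>; not_subst_f and chomp_then_test are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical stand-in for "$num !~ s/\n//f": negate the substituted copy.
sub not_subst_f {
    my ($line) = @_;
    return !do { (my $tmp = $line) =~ s/\n//; $tmp };
}

# The recommended spelling: chomp, then test the value.
sub chomp_then_test {
    my ($line) = @_;
    chomp(my $num = $line);
    return !$num;
}

my @inputs    = ("0\n", "\n", "42\n");
my @via_f     = map { not_subst_f($_)     ? 1 : 0 } @inputs;
my @via_chomp = map { chomp_then_test($_) ? 1 : 0 } @inputs;
print "@via_f\n";       # 1 1 0
print "@via_chomp\n";   # 1 1 0
```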
> And m//f could return $&. (Got to have some meaning for m//f).
I hereby dub this the “MoFo” operator.
;-p
Best,
David
> On Mon, Nov 23, 2009 at 1:40 PM, Gisle Aas <gi...@activestate.com> wrote:
>> Another thought; if we do this for s/// then we should do it for tr///
>> as well.
>>
>> --Gisle
>
> Eeek, please no. The precedence of the binding operator is tricky
> enough as is without making it subject to change based on one of those
> embarrassingly placed postfix flags.
The patch doesn't change the precedence of the binding operator... It
already has "the correct" precedence.
-David
I didn't include any tests for captures because the patch didn't touch
anything that would affect captures. It basically copies the target scalar
before anything is done and operates on the copy instead of the original.
Then it changes the return value to return the target scalar instead of the
number of matches. All the middle subst regexp stuff is untouched.
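The change in return value can be seen with current Perl: s/// yields the number of substitutions made, whereas copying first and returning the copy yields the new string. A small sketch:

```perl
use strict;
use warnings;

my $target = 'banana';

# Ordinary s///g on a copy: the expression's value is the substitution count.
my $count = (my $copy = $target) =~ s/a/o/g;

# What the patch returns instead: the modified copy itself.
my $string = do { (my $tmp = $target) =~ s/a/o/g; $tmp };

print "$count\n";    # 3
print "$string\n";   # bonono
print "$target\n";   # banana
```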
I'm completely open to writing more/better tests. I don't see how testing
the captures adds anything, though. I'm happy to be enlightened,
preferably with one of the lighter weight clue sticks.
-David
> On Nov 22, 2009, at 23:37 , David Caldwell wrote:
>
>> I wrote a patch to scratch an itch I've had over the years. The patch
>> adds a new flag to s/// ("f") to make it functional instead of
>> destructive. So you can do this:
>
> $a = $b =~ s/aaa/bbb/f;
>
> Not obviously a big win for me; but there are probably other use cases
> where this is more useful.
I typically find myself wanting it in function calls, or big map
expressions where creating a new variable becomes unsightly.
some_function($a, $b, "string", $c =~ s/x/y/f);
> For instance:
>
> print HTML "<pre>" . $text =~ s/&/&amp;/fg =~ s/</&lt;/fg . "</pre>";
>
> This is kind of cool; but it's not really obvious to me which
> substitution would happen first here. It's quite unreadable as well :-(
That is genius, I hadn't thought of chaining them. I don't know--I think
it's not bad, readability-wise, especially if you consider =~ to be like a
shell pipe. Lining it up vertically and aligning the =~ might make it look
nicer too, especially if there were more chained together.
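If you read the chain as a shell pipe, you can already lay it out vertically with a small helper that applies pattern/replacement pairs to a copy in order. A sketch; subst_pipeline is a hypothetical name, and it only handles literal replacement strings with /g semantics:

```perl
use strict;
use warnings;

# Hypothetical helper: apply each [qr//, replacement] pair in order,
# left to right like a pipe, to a copy of $str, and return the copy.
sub subst_pipeline {
    my ($str, @steps) = @_;
    for my $step (@steps) {
        my ($re, $repl) = @$step;
        $str =~ s/$re/$repl/g;
    }
    return $str;
}

my $text = 'x < y & z';
my $html = subst_pipeline($text,
    [ qr/&/ => '&amp;' ],
    [ qr/</ => '&lt;'  ],
);
print "<pre>$html</pre>\n";   # <pre>x &lt; y &amp; z</pre>
```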
I added that as a test because I wasn't sure how the bind operator would
bind in that case, but it did "the right thing":
$a = 'david';
$b = $a =~ s/david/sucks/f =~ s/sucks/rules/f;
ok( $a eq 'david' && $b eq 'rules' );
-David
for that, you want HTML::Entities::Interpolate from CPAN, which lets
you spell that
print HTML "<pre>$Entitize{$text}</pre>";
> That is genius, I hadn't thought of chaining them.
so the /f flag turns a substitution expression $target =~ s/pat/repl/xyz
into
do { (my $tmp = $target) =~ s/pat/repl/xyz; $tmp }
do blocks are unmodifiable even when they return lvalues (at least in 5.8.8)
should do blocks be usable as lvalues?
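You can also package the expansion above as a function that takes a block, which keeps any s/// flags inside the block instead of turning them into parameters. This is similar in spirit to List::MoreUtils' apply; copy_subst is a hypothetical name, and this sketch handles a single scalar:

```perl
use strict;
use warnings;

# Hypothetical helper: run BLOCK with $_ set to a copy of VALUE and return
# that copy, i.e. do { (my $tmp = $value) =~ s/.../.../; $tmp }.
sub copy_subst(&$) {
    my ($code, $value) = @_;
    local $_ = $value;   # work on a copy; the caller's variable is untouched
    $code->();
    return $_;
}

my $target = 'Hello, World';
my $result = copy_subst { s/o/0/gi } $target;
print "$result\n";   # Hell0, W0rld
```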
* David Caldwell <da...@porkrind.org> [2009-11-23 17:20]:
> I wrote a patch to scratch an itch I've had over the years.
> The patch adds a new flag to s/// ("f") to make it functional
> instead of destructive.
may I make just one stroke of paint on that shed, please? I’d
like the flag to be something else, and more prominent and
different. I am thinking either /C (for “copy”) or maybe /R (for
“return”).
> So you can do this:
>
> $a = s/aaa/bbb/f;
>
> Which makes s/// return the replacement instead of changing $_
> (or whatever =~ bound it to).
Thanks and ++ for this. It’s something I’ve wanted for as long
as I’ve been writing Perl.
(To the others who’ve replied with “just write a function” or
“here, use one of these modules”: sure, but the resulting code
is just ugly. Not very ugly, but enough to be off-putting.)
Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
I really like this, I'd love to see it in perl.
I'll try to make time to look through the patch itself in the next couple
of days.
Hugo
perhaps s///f should be spelled
m///
since the match doesn't change the string,
it makes more mnemonic sense than substitute /// fake
1 while ( $buf =~ m/ ... / process_matches($1,$2,$3) /eg ) ;
I wonder whether the tokenizer / parser could be readily
adapted to this ?
Consider:
($foo) = m/pattern/m; # Note the /m;
Is that modifying $_, returning the modified $_ in $foo, or just matching
against $_, setting $foo to $1?
Abigail
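Under the current grammar, that line has one meaning: it matches against $_ with the /m modifier, leaves $_ untouched, and in list context assigns the first capture. A sketch with an illustrative pattern standing in for "pattern":

```perl
use strict;
use warnings;

$_ = "first line\nsecond line";

# /m lets ^ match after the embedded newline; list assignment takes $1.
my ($foo) = m/^(second)/m;

print "$foo\n";   # second
print "$_\n";     # $_ is printed unchanged
```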
On Thu, November 26, 2009 3:28 am, Jim Cromie wrote:
> On Tue, Nov 24, 2009 at 3:37 PM, <h...@crypt.org> wrote:
>> David Caldwell <da...@porkrind.org> wrote:
>> : I wrote a patch to scratch an itch I've had over the years. The patch
>> : adds a new flag to s/// ("f") to make it functional instead of
>> : destructive.
>
> perhaps s///f should be spelled
> m///
>
> since the match doesn't change the string,
> it makes more mnemonic sense than substitute /// fake
A couple of weeks ago, I was trying to convince the guys at work that this
is exactly what I needed Perl to do in this situation. Instead, they
called me a heathen and chased me out the door with pitchforks. I'm glad
other people see the need in this one.
s///f looked like exactly what I wanted, but kinda ugly. m/// just makes
sense!
Alfie
If we're going to bike-shed it, I vote to add a new *intro* character
instead of "m" and "s".
my $foo = $bar =~ f/one/two/;
-- David
Both propositions have large parsing and compatibility problems.
First proposal: it will be difficult for the tokenizer to distinguish
between m// and m/// ; in the general case it will need too much
lookahead. (Especially considering the /x flag and arbitrary delimiters)
Second proposal: if we introduce a new lead, like f///, that means that
we define a new keyword: suddenly things like f() will fail to
compile, because perl will expect f()() constructs.
--
The hypothesis of a lone inventor - an infinite Leibniz laboring away darkly
and modestly - has been unanimously discounted. -- Borges
Right. But what I think is important is to make sure that there isn't any
part of the behaviour that *relies* on this implementation. Because it would
be useful in future to have the possibility to optimise the implementation.
Right now there is an explicit, up-front copy, which then gets shuffled around
in-place every time a length-changing substitution is made. It would be nice
to be able to change to having the matching part of the regexp engine walk
the original (read-only), only copying chunks across that are unchanged, and
writing substitutions out directly. But we'd only have the freedom to do this
if we have all the corner cases covered now, to make sure that there isn't
a way to "see through" this and spot the difference, at the point when /f
is first introduced.
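The alternative strategy above can be modeled in pure Perl: walk the read-only original, copy the unchanged chunks between matches, and write each replacement directly into the output, so the original is never modified. This uses the @- and @+ match-position arrays. walk_subst is a hypothetical name, and the code only models the idea; it is not the regex engine's C implementation:

```perl
use strict;
use warnings;

# Hypothetical model: build the result from the untouched original,
# appending each unchanged chunk plus a replacement for each match.
sub walk_subst {
    my ($orig, $re, $repl_cb) = @_;
    my ($out, $pos) = ('', 0);
    while ($orig =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);
        $out .= substr($orig, $pos, $start - $pos);   # unchanged chunk
        $out .= $repl_cb->(substr($orig, $start, $end - $start));
        $pos = $end;
    }
    $out .= substr($orig, $pos);   # tail after the last match
    return $out;
}

my $orig = 'abcbcdcde';
my $new  = walk_subst($orig, qr/c/, sub { 2 ** length $orig });
print "$new\n";   # ab512b512d512de
```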
However, I can't (yet) find a way to "spot" this, as it seems that
1: the variable that one is matching isn't "changed" until the end of the
match, so that actions such as C<length> on it don't alter midway:
$ ./perl -lw
$_ = 'abcbcdcde';
s/c/2 ** length $_/ge;
print $_;
__END__
ab512b512d512de
2: variables such as C<$'>, which *would* differ depending on the
implementation above, can still be given the existing behaviour, by making
them (continue to) track the remainder of the original.
$ ./perl -lw
$_ = 'abcbcdcde';
s/c/2 ** length $'/ge;
print $_;
__END__
ab64b16d4de
Nicholas Clark
We don't want to lose our s/// roots, though.
For maximum clarity, I suggest U+017F:
http://www.fileformat.info/info/unicode/char/017f/index.htm
hdp.
I thought the general "we can't add new keywords" problem had been
addressed some time ago (but I can't remember the details).
Tim.
> Second proposal: if we introduce a new lead, like f///, that means that
> we define a new keyword: suddenly things like f() will fail to
> compile, because perl will expect f()() constructs.
>
use feature, plus a "rarer" keyword than "f"?