This is a bug report for perl from perl-...@ton.iguana.be,
generated with the help of perlbug 1.34 running under perl v5.8.0.
-----------------------------------------------------------------
#! /usr/bin/perl -w
$a="abcdefg";
for (substr($a,0, 4,"")) {
    print "$_\n";
    $_="12";
    print "$_\n";
}
prints the expected:
abcd
12
#! /usr/bin/perl -w
$a="abcdefg";
for (substr($a,0, 4)) {
    print "$_\n";
    $_="12";
    print "$_\n";
}
however prints:
abcd
12ef
Sure, I can see what's going on here from an implementation point of
view, and the substr docs are pretty unspecific on this, so I can't
absolutely claim it as a bug. But it feels wrong to me to be able to
pull in stuff from outside the [0..3] range of the original string
into the substr window. Assigning something to a variable and having it
end up holding a different value is hardly normal lvalue behaviour
(what happens to $a is as expected in all cases). I think an assignment
should narrow or expand not only the original string as needed, but also
the range of the substr alias itself.
If the decision is to leave this as is, I would at least like an
update to the substr manpage. Right now it just explains the result of a
substr as a plain old lvalue, in a way strongly suggesting that what
you do preserves the boundaries.
This led to an interesting bug when I was parsing input from a record-based
protocol on a string like "AAApadBBBpad", using code roughly doing:
parse("AAApadBBBpad");
sub parse {
    ....
    while ($arg ne "") {
        parts(substr(substr($arg, 0, 6, ""), 0, 3));
    }
}
sub parts {
    ....
    while ($_[0] ne "") {
        process(substr($_[0], 0, 1, ""));
    }
}
because I just kept pulling in the padding.
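Whichever way the decision goes, one way to sidestep the aliasing in code like the above is to copy the substr() result into a lexical before consuming it. A minimal sketch, assuming the record layout from the example (3 data characters followed by 3 pad characters); the bodies and return values of parse() and parts() are made up for illustration and are not the original code:

```perl
use strict;
use warnings;

# Consume a field one character at a time, working on a private copy.
sub parts {
    my ($field) = @_;            # copies the value, so no substr lvalue is kept
    my @chars;
    while ($field ne "") {
        push @chars, substr($field, 0, 1, "");
    }
    return @chars;
}

# Split the input into 6-character records and hand the first 3 characters
# of each record to parts().
sub parse {
    my ($arg) = @_;
    my @all;
    while ($arg ne "") {
        my $record = substr($arg, 0, 6, "");   # 4-arg substr returns a plain value
        push @all, parts(substr($record, 0, 3));
    }
    return @all;
}

print join(",", parse("AAApadBBBpad")), "\n";   # prints A,A,A,B,B,B
```

Because parts() never assigns through $_[0], the padding can no longer be pulled into the window.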
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl v5.8.0:
Configured by ton at Tue Nov 12 01:56:18 CET 2002.
Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=linux, osvers=2.4.19, archname=i686-linux-thread-multi-64int-ld
uname='linux quasar 2.4.19 #5 wed oct 2 02:34:25 cest 2002 i686 unknown '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=undef uselongdouble=define
usemymalloc=y, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -fomit-frame-pointer',
cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lndbm -ldb -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.2.4'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.0:
/usr/lib/perl5/5.8.0/i686-linux-thread-multi-64int-ld
/usr/lib/perl5/5.8.0
/usr/lib/perl5/site_perl/5.8.0/i686-linux-thread-multi-64int-ld
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
.
---
Environment for perl v5.8.0:
HOME=/home/ton
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/local/jre/bin:/home/oracle/product/9.0.1/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:.
PERL_BADLANG (unset)
SHELL=/bin/bash
Yes, I'd like to see this addressed for 5.10.0. Issues need to be worked out for how it should work with negative offset or length.
> If the decision is to leave this as is, it would at least like an
> update to the substr manpage. Now it just explains the result of a
> substr as a plain old lvalue in a way strongly suggesting that what
> you do preserves the boundaries.
I used my time machine. Rather than document the existing behaviour
that I want to change, I just warned against trusting it. I think it
is in 5.8.1.
> sub parse {
> ....
> while ($arg ne "") {
> parts(substr(substr($arg, 0, 6, ""), 0, 3));
> }
> }
You can use foo(scalar substr(...)) to suppress creating an lvalue.
If you can think of a good place to document this, I'll try to
write up something (or feel free to do it yourself).
Why would negative offsets be an issue ? When the LV is created any
negative offset is resolved and the LV is created with an absolute
offset and a length.
From what I can see, all that should be needed is for assignment to an LV
to change LvTARGLEN of the LV to be the length of the value assigned.
Graham.
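To make the "resolved at creation" point concrete, here is a rough pure-Perl sketch of turning substr-style arguments into an absolute offset and length. The helper name resolve_window is hypothetical and this is not the code in pp_substr; it assumes the offset falls inside the string and ignores the out-of-range cases.

```perl
use strict;
use warnings;

# Hypothetical helper: turn substr-style (offset, length) arguments into an
# absolute (offset, length) pair for a string of the given length.
# Assumes the offset falls inside the string; out-of-range cases are ignored.
sub resolve_window {
    my ($strlen, $off, $len) = @_;
    $off += $strlen if $off < 0;                 # negative offset counts from the end
    my $end = !defined $len ? $strlen            # no length: run to end of string
            : $len < 0      ? $strlen + $len     # negative length: stop short of the end
            :                 $off + $len;
    $end = $strlen if $end > $strlen;            # clip to the string
    $end = $off    if $end < $off;               # never a negative length
    return ($off, $end - $off);
}

my $x = "abcdef";
my ($off, $len) = resolve_window(length $x, -4, -1);
print "$off $len\n";                             # prints 2 3
print substr($x, $off, $len), "\n";              # prints cde, same as substr($x, -4, -1)
```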
It gets worse
#! /usr/bin/perl -w
$a="abcdefg";
for (substr($a,0, 4)) {
    print "a=$a\n";
    print "$_\n";
    $_="12";
    print "$_\n";
    print "a=$a\n";
    $_=".";
    print "$_\n";
    print "a=$a\n";
}
prints:
a=abcdefg
abcd
12ef
a=12efg
.g
a=.g
So multiple assignments to the LV with strings that are shorter than the
original LV length will result in the string being nibbled away. Below
is a patch that makes it output what I think is expected:
a=abcdefg
abcd
12
a=12efg
.
a=.efg
All tests still pass, so if there are no objections to this patch I
shall add some tests to t/op/substr.t to test for this specifically.
--- mg.c.orig Wed Oct 29 08:28:52 2003
+++ mg.c Wed Oct 29 09:20:31 2003
@@ -1744,16 +1744,20 @@
sv_utf8_upgrade(lsv);
sv_pos_u2b(lsv, &lvoff, &lvlen);
sv_insert(lsv, lvoff, lvlen, tmps, len);
+ LvTARGLEN(sv) = sv_len_utf8(sv);
SvUTF8_on(lsv);
}
else if (lsv && SvUTF8(lsv)) {
sv_pos_u2b(lsv, &lvoff, &lvlen);
+ LvTARGLEN(sv) = len;
tmps = (char*)bytes_to_utf8((U8*)tmps, &len);
sv_insert(lsv, lvoff, lvlen, tmps, len);
Safefree(tmps);
}
- else
- sv_insert(lsv, lvoff, lvlen, tmps, len);
+ else {
+ sv_insert(lsv, lvoff, lvlen, tmps, len);
+ LvTARGLEN(sv) = len;
+ }
return 0;
}
Graham.
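For anyone who wants to compare the two behaviours without building a patched perl, the effect can be modelled in plain Perl. This is only a sketch: make_window and its track_len flag are hypothetical names, the model treats the string as bytes and so ignores the UTF-8 branches of the patch, and it does not depend on how the running perl implements substr lvalues.

```perl
use strict;
use warnings;

# Hypothetical model of a substr lvalue: an (offset, length) window onto a
# target string. With $track_len set, an assignment resizes the window to the
# length of the assigned value, which is what the patch does to LvTARGLEN.
sub make_window {
    my ($target_ref, $off, $len, $track_len) = @_;
    return {
        get => sub { substr($$target_ref, $off, $len) },
        set => sub {
            my ($new) = @_;
            substr($$target_ref, $off, $len, $new);   # stands in for sv_insert()
            $len = length $new if $track_len;         # stands in for LvTARGLEN(sv) = len
            return;
        },
    };
}

for my $track_len (0, 1) {
    my $str = "abcdefg";
    my $win = make_window(\$str, 0, 4, $track_len);
    my @seen = ($win->{get}->());
    for my $new ("12", ".") {
        $win->{set}->($new);
        push @seen, $win->{get}->(), "a=$str";
    }
    print join(" ", @seen), "\n";
}
# prints: abcd 12ef a=12efg .g a=.g     (fixed window, as before the patch)
# prints: abcd 12 a=12efg . a=.efg      (window tracks the assigned length)
```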
Depends on why this actually works, which is currently unclear to me.
It doesn't work if you try to do 'scalar' when the context can't reach
the substr anymore, or try to use a scalar prototype on the call:
perl -le '
sub foo($) {
for (scalar shift) {
print $_="q"
}
}
foo(substr($a="abcdefgh", 0, 3))
'
qde
So is this special compile time magic ? If it's purely depending on the
scalar context of substr, why doesn't it kick in for a scalar prototype ?
(not that I would want it to kick in in that case, that would be wrong)
Under what circumstances exactly *does* it kick in ?
Sorry, now I see you have parts modifying $_[0], so scalar isn't helpful.
By visual inspection, looks good. At least one person claimed to be
using the previous "fixed window" functionality, so I would hesitate
to put this in maint.
That is how it is now, yes. But I can see a case for storing the negative
values instead for situations like this:
perl -wle'$x = "abcdef"; for (substr($x,-4,-1)) { chop$x; chop$x; print $_ }'
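The difference between the two choices can be shown with ordinary (non-lvalue) substr calls. In this sketch the absolute values 2 and 3 are written out by hand to stand for what "-4, -1" resolves to on a 6-character string; nothing here relies on how any particular perl stores the window.

```perl
use strict;
use warnings;

my $x = "abcdef";

# Resolved at creation time: substr($x, -4, -1) on a 6-character string
# becomes absolute offset 2, length 3 ("cde").
my ($abs_off, $abs_len) = (2, 3);

chop $x for 1 .. 2;                             # $x is now "abcd"

my $frozen = substr($x, $abs_off, $abs_len);    # window stays at offset 2
my $late   = substr($x, -4, -1);                # window re-resolved against "abcd"

print "$frozen\n";                              # prints cd
print "$late\n";                                # prints abc
```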
Whether this goes into maint is up to Nick. But I have not seen any
claim of people using the fixed window. Can you point us in the right
direction?
Graham.
A decision can wait until (at least) 5.8.3, I feel.
I'd second the request for pointers to people using the existing fixed
window behaviour.
Nicholas Clark
The reference I'd seen was in the posts by tlhf in this thread:
http://perlmonks.com/index.pl?node_id=191334
I'm not completely clear that he/she actually understands the current
behavior, and there have been two or three other bug reports I've seen
that Graham's patch would address (#16834, #24069, and I vaguely recall
another).
Thanks, applied to bleed as change #22414.
Are you still intending to write those tests? ;-)
--
Technology is dominated by two types of people: those who understand what
they do not manage, and those who manage what they do not understand.
I tried to get some input on this at:
http://perlmonks.org/index.pl?node_id=306449
without much success, other than Abigail arguing strongly for backward
compatibility even for misfeatures. At the moment, I'm inclined to
think Graham's patch should be applied to blead but not maint. It
needs documentation, also.
So bugs that have a usable side-effect shouldn't be fixed ?
I kinda liked the s///e bug. Can I keep it ? :-)
From my point of view:
- It caused a hard to track down intermittent problem in real code
on the current perl. I'm not convinced not applying the patch keeps
more programs working than it causes to stop working.
- The effect it had after applying once could somewhat be defended,
but the repeated effect Graham Barr found makes this an all out bug
to my mind.
- I wasn't planning to go with 5.9 and later for the moment (I hate the
fact that all my "use fields" based classes will stop working), and
I'd like to be able to pass the result of a substr() as a function
argument (which is the case where I ran into the bug).
How about the following:
--- perlfunc.pod- Mon Mar 1 23:41:25 2004
+++ perlfunc.pod Mon Mar 1 23:58:25 2004
@@ -5578,15 +5578,21 @@
parts of the EXPR and return what was there before in one operation,
just as you can with splice().
-If the lvalue returned by substr is used after the EXPR is changed in
-any way, the behaviour may not be as expected and is subject to change.
-This caveat includes code such as C<print(substr($foo,$a,$b)=$bar)> or
-C<(substr($foo,$a,$b)=$bar)=$fud> (where $foo is changed via the
-substring assignment, and then the substr is used again), or where a
-substr() is aliased via a C<foreach> loop or passed as a parameter or
-a reference to it is taken and then the alias, parameter, or deref'd
-reference either is used after the original EXPR has been changed or
-is assigned to and then used a second time.
+Note that the lvalue returned by the 3-arg version of substr() acts as
+a 'magic bullet'; each time it is assigned to, it remembers which part
+of the original string is being modified; for example:
+
+ $x = '1234';
+ for (substr($x,1,2)) {
+ $_ = 'a'; print $x,"\n"; # prints 1a4
+ $_ = 'xyz'; print $x,"\n"; # prints 1xyz4
+ $x = '56789';
+ $_ = 'pq'; print $x,"\n"; # prints 5pq9
+ }
+
+
+Prior to Perl version 5.9.1, the result of using an lvalue multiple times was
+unspecified.
=item symlink OLDFILE,NEWFILE
--
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
-- Things That Never Happen in "Star Trek" #7
Shouldn't that last one be C< # prints 5pq89 >?
Hugo
Thank you, that makes it perfectly clear to me. I'd suggest a similar
amplification be included in the docpatch.
:So the question is which functionality do we want, this or the ability
:to modify outside the original window? It would be possible to do both,
:but frankly, is it worth the extra overhead that would be required?
Now that I understand the behaviour being described it seems perfectly
reasonable.
Hugo
I can see why you say that, but that's not what happens. The reason is
that the length of the LVALUE changes to be the length of the last
assignment to it; this is how modification outside its original window
is prevented.
As the last assignment via the LVALUE was 3 characters long, this assignment
replaces three characters, even though the underlying SV has changed.
So the question is which functionality do we want: this, or the ability
to modify outside the original window? It would be possible to do both,
but frankly, is it worth the extra overhead that would be required?
Graham.
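The sequence described here can be followed step by step with a small stand-in for the lvalue. This is a sketch only: $assign is a hypothetical closure that mimics "replace the current window, then set the window length to the length of the new value", using plain 4-arg substr, so it behaves the same on any perl.

```perl
use strict;
use warnings;

my $x = '1234';
my ($off, $len) = (1, 2);           # the window created by substr($x, 1, 2)

# Hypothetical stand-in for assigning through the lvalue: replace the current
# window, then shrink or grow the window to the length of the new value.
my $assign = sub {
    my ($new) = @_;
    substr($x, $off, $len, $new);
    $len = length $new;
    return $x;
};

print $assign->('a'),   "\n";       # prints 1a4    (window is now 1 char)
print $assign->('xyz'), "\n";       # prints 1xyz4  (window is now 3 chars)
$x = '56789';                       # direct assignment leaves the window alone
print $assign->('pq'),  "\n";       # prints 5pq9   (the 3 chars "678" are replaced)
```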
It's a bug, right?
And we fix bugs in maintenance releases?
Nicholas Clark
What makes something a bug? It's non-intuitive but consistent, and
has been reported as a bug several times.
What do you think:
$x = "abc";
for (substr($x, 1, 1)) {
    print $_; # "b"
    $_ = "bb";
    print $_; # NEW: "bb" OLD: "b"
}
$x = "abc";
for (substr($x, -2, -1)) {
    print $_; # b
    $_ = "bcd";
    print $_; # NEW: "bcd" OLD: "b"
}
It causes action at a distance if you pass a substr() result
as an argument.
It makes lvalues end up with a different value from the one you
assigned to them.
#! /usr/bin/perl -wl
# Suppose this is in some module by author X
sub process {
    print $& while $_[0] =~ s/.//;
}
# suppose user Y tries to use the module like this:
$a="aBCDef";
process(substr($a, 1, 3));
It was perfectly reasonable for the author of "process"
to write a sub that "consumes" its argument.
It was perfectly reasonable for the caller of process
to expect 3 chars to get processed.
But actually the whole string gets consumed, it eats away
OUTSIDE the substr bounds.
Leaving this as it was basically means you can never
pass the result of a substr to a sub for modification
unless you KNOW the sub does its work in ONE step.
I think the old effect on ONE assign is somewhat
defensible, but the fact that repeated short assigns
pull in more and more of the original string makes it
an outright bug.
Not trying to argue that it's not a bug; but you can say:
process(scalar substr($a, 1, 3));
to prevent this. Should this be documented? Why does it work?
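A version-independent alternative that does not rely on what scalar() does to lvalue-ness is to take an explicit copy, let the sub consume the copy, and write the result back by hand if the caller wants the original string changed. A sketch, with process() altered from the example above to return what it consumed (that return value is an assumption made for illustration):

```perl
use strict;
use warnings;

# Same idea as process() above, but returning what it consumed.
sub process {
    my @seen;
    push @seen, $1 while $_[0] =~ s/(.)//s;   # eats its argument in place
    return @seen;
}

my $str   = "aBCDef";
my $chunk = substr($str, 1, 3);      # plain copy: "BCD"
my @seen  = process($chunk);         # consumes the copy only
substr($str, 1, 3, $chunk);          # write the (now empty) result back explicitly

print "@seen\n";                     # prints B C D
print "$str\n";                      # prints aef
```

Only the three characters inside the requested range are ever visible to process(), so nothing outside the bounds can be consumed.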
yst> On Tue, Mar 09, 2004 at 10:46:51AM +0000, Ton Hospel
yst> <perl5-...@ton.iguana.be> wrote:
th> It causes action at a distance if you pass a substr() result as an
th> argument. It makes lvalues get a different value as what you assigned
th> to it.
[snippage by /sb]
th> But actually the whole string gets consumed, it eats away OUTSIDE the
th> substr bounds.
yst> Not trying to argue that it's not a bug; but you can say:
yst> process(scalar substr($a, 1, 3));
yst> to prevent this. Should this be documented? Why does it work?
Well, as the person most responsible for the bug, I suppose I should chime
in here. First, I agree that it's a bug. It's one of a couple which I'd
meant to revisit, then never got the time because of changes in my
circumstances. The assignments through an LV-substr should adjust the
'margins' to account for changes in the 'window'. [The other has already
been addressed, so far as I can tell, in that the LV now starts with its
fetched contents when passed -- as it should.]
As to why C<scalar()> avoids the issue, it's because in the cited code
above you're now ref'ing an OP_SCALAR instead of an OP_SUBSTR -- and the
scalar is not passing ref-ness down to the substr. Thus, it's a
better-optimized form of C< '' . substr($a, 1, 3) >. In other words, it's
an expression instead of a direct reference. The modifiable value passed
to the sub is a temporary rather than an LV-substr. [Or maybe it's
readonly? I haven't checked exactly what's passed, only that it's not
been made an LV.]
That's why/how it works. I don't think it should be documented, really,
because I think it's wrong. IMnsHO, the ref-ness of scalar should be
passed down to certain LV-able children OPs, such as substr and (probably)
keys.
When these manipulations first got done, C< \keys %h > was a one-shot LV,
in that it disassociated itself from %h after the first assignment. This
was because LVs weren't yet ref-counting their targets. They now do, and
if LV-keys is still a one-shot, that's another bug that should get fixed.
Hope this helps,
--s.
Since scalar is just a directive to change context, I don't see why it
should change *anything* else. So the LVALUE creation code (wherever
it is) should skip OP_SCALAR. IMO.
--
Chip Salzenberg - a.k.a. - <ch...@pobox.com>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K
Looks to me like scalar passes ref-ness down:
Perl_ref(pTHX_ OP *o, I32 type)
{
...
case OP_SCALAR:
case OP_NULL:
if (!(o->op_flags & OPf_KIDS))
break;
ref(cBINOPo->op_first, type);
Looks like it's Perl_mod not passing lvalueness down.
> Thus, it's a
> better-optimized form of C< '' . substr($a, 1, 3) >. In other words, it's
> an expression instead of a direct reference. The modifiable value passed
> to the sub is a temporary rather than an LV-substr. [Or maybe it's
> readonly? I haven't checked exactly what's passed, only that it's not
> been made an LV.]
>
> That's why/how it works. I don't think it should be documented, really,
> because I think it's wrong. IMnsHO, the ref-ness of scalar should be
> passed down to certain LV-able children OPs, such as substr and (probably)
> keys.
I kind of like having an operator that turns off lvalueness. Especially
one with no runtime impact.
> When these manipulations first got done, C< \keys %h > was a one-shot LV,
> in that it disassociated itself from %h after the first assignment. This
> was because LVs weren't yet ref-counting their targets. They now do, and
> if LV-keys is still a one-shot, that's another bug that should get fixed.
I don't seem to be able to get an lvalue with \keys %h. The \ forces
list context. Doing it this way seems to show it working, unless I
misunderstand what you are questioning:
$ perl -wle'sub foo ($) { print $_[0]; %h = 0..999; print $_[0] } foo(keys %h)'
0
500
What do you think about ties on lvalues:
http://rt.perl.org/rt3/Ticket/Display.html?id=27010
Are you saying Perl_mod should or should not recurse on an OP_SCALAR's
child?
I'm saying it should. Not with great conviction, though.
yst> On Tue, Mar 09, 2004 at 11:16:02AM -0500, Spider Boardman
yst> <spi...@leggy.zk3.dec.com> wrote:
sb> As to why C<scalar()> avoids the issue, it's because in the cited code
sb> above you're now ref'ing an OP_SCALAR instead of an OP_SUBSTR -- and
sb> the scalar is not passing ref-ness down to the substr.
yst> Looks to me like scalar passes ref-ness down:
yst> Looks like it's Perl_mod not passing lvalueness down.
Yes, I mis-remembered where in op.c I was making changes back when. It is
indeed mod-ness, not ref-ness, that's the issue.
sb> Thus, it's a better-optimized form of C< '' . substr($a, 1, 3) >.
sb> That's why/how it works. I don't think it should be documented,
sb> really, because I think it's wrong. IMnsHO, the ref-ness of scalar
sb> should be passed down to certain LV-able children OPs, such as substr
sb> and (probably) keys.
yst> I kind of like having an operator that turns off lvalueness.
yst> Especially one with no runtime impact.
While I can understand that, I still think Chip's explanation of his
agreement was spot on. It really only ought to affect context, not
lvalueness. Also, even though this gets into "in a perfect world" type of
discussions, you seem only to really care about eliminating lvalueness
because it's not working properly. If it worked, would you still care?
In any case, it only sometimes stops lvalueness in general, as distinct
from LV-ness, as this shows:
$ perl -le 'sub a($){$_[0]x=2} $a="a";a scalar $a;print $a'
aa
sb> When these manipulations first got done, C< \keys %h > was a one-shot
sb> LV, in that it disassociated itself from %h after the first
sb> assignment. This was because LVs weren't yet ref-counting their
sb> targets. They now do, and if LV-keys is still a one-shot, that's
sb> another bug that should get fixed.
yst> I don't seem to be able to get an lvalue with \keys %h. The \ forces
yst> list context. Doing it this way seems to show it working, unless I
yst> misunderstand what you are questioning:
You did misunderstand my sloppy explanation. However, the behaviour in
question has since been fixed, as this shows:
$ perl -le '%a=(a=>1);$a=\(keys %a=42);$$a=63;$$a=65;print scalar %a'
1/128
yst> What do you think about ties on lvalues:
yst> http://rt.perl.org/rt3/Ticket/Display.html?id=27010
I think that attempts to stack various types of assignment-intercepting
magic in perl5 expose the lairs of dragons. That's without having had the
time to follow the reference, or otherwise to refresh myself on that
thread. I hope to find the time to re-read it, and thus to be able to
make more meaningful comments, but I won't promise anything, given my
current schedule.
--s.
It wouldn't "consume" the substr anymore, which supposedly was
the point of "process".
I could argue this is a bug in "scalar" in fact...
Thanks, applied as #22488.