[perl #24346] pulling in stuff from outside the substr lvalue window

perlbug-...@perl.org

Oct 28, 2003, 9:06:12 PM

to bugs-bi...@netlabs.develooper.com

# New Ticket Created by perl-...@ton.iguana.be
# Please include the string: [perl #24346]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=24346 >

This is a bug report for perl from perl-...@ton.iguana.be,
generated with the help of perlbug 1.34 running under perl v5.8.0.

-----------------------------------------------------------------
[Please enter your report here]

#! /usr/bin/perl -w
$a="abcdefg";
for (substr($a,0, 4,"")) {
print "$_\n";
$_="12";
print "$_\n";
}

prints the expected:
abcd
12

#! /usr/bin/perl -w
$a="abcdefg";
for (substr($a,0, 4)) {
print "$_\n";
$_="12";
print "$_\n";
}

however prints:
abcd
12ef

Sure, I can see what's going on here from an implementation point of
view, and the substr docs are pretty unspecific on this, so I can't
absolutely claim it as a bug. But it feels wrong to me to be able to
pull in stuff from outside the [0..3] range of the original string
into the substr window. Assigning something to a variable and then finding
that it holds something different is hardly normal lvalue behaviour.
(what happens to $a is as expected in all cases). I think it should
not only narrow or expand the original string as needed, but also
the range of the substr alias itself.

If the decision is to leave this as is, I would at least like an
update to the substr manpage. Now it just explains the result of a
substr as a plain old lvalue in a way strongly suggesting that what
you do preserves the boundaries.

This led to an interesting bug when I was parsing input from a record
based protocol on a string like "AAApadBBBpad" using code roughly doing:

parse("AAApadBBBpad");

sub parse {
....
while ($arg ne "") {
parts(substr(substr($arg, 0, 6, ""), 0, 3));
}
}

sub parts {
....
while ($_[0] ne "") {
process(substr($_[0], 0, 1, ""));
}
}

because I just kept pulling in the padding.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl v5.8.0:

Configured by ton at Tue Nov 12 01:56:18 CET 2002.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
Platform:
osname=linux, osvers=2.4.19, archname=i686-linux-thread-multi-64int-ld
uname='linux quasar 2.4.19 #5 wed oct 2 02:34:25 cest 2002 i686 unknown '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=undef uselongdouble=define
usemymalloc=y, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -fomit-frame-pointer',
cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='2.95.3 20010315 (release)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lndbm -ldb -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
perllibs=-lnsl -ldl -lm -lpthread -lc -lposix -lcrypt -lutil
libc=/lib/libc-2.2.4.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.2.4'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

---
@INC for perl v5.8.0:
/usr/lib/perl5/5.8.0/i686-linux-thread-multi-64int-ld
/usr/lib/perl5/5.8.0
/usr/lib/perl5/site_perl/5.8.0/i686-linux-thread-multi-64int-ld
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
.

---
Environment for perl v5.8.0:
HOME=/home/ton
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/ton/bin.Linux:/home/ton/bin:/home/ton/bin.SampleSetup:/usr/local/bin:/usr/local/sbin:/usr/local/jre/bin:/home/oracle/product/9.0.1/bin:/usr/local/ar/bin:/usr/games/bin:/usr/X11R6/bin:/usr/share/bin:/usr/bin:/usr/sbin:/bin:/sbin:.
PERL_BADLANG (unset)
SHELL=/bin/bash

Yitzchak Scott-Thoennes

Oct 29, 2003, 12:55:42 AM

to perl5-...@perl.org

On Wed, Oct 29, 2003 at 02:06:12AM -0000, "perl-...@ton.iguana.be (via RT)" <perlbug-...@perl.org> wrote:
> Sure, I can see what's going on here from an implementation point of
> view, and the substr docs are pretty unspecific on this, so I can't
> absolutely claim it as a bug. But it feels wrong to me to be able to
> pull in stuff from outside the [0..3] range of the original string
> into the substr window. Assigning something to a variable and have it
> be different as a result is hardly normal lvalue behaviour.
> (what happens to $a is as expected in all cases). I think it should
> not only narrow or expand the original string as needed, but also
> the range of the substr alias itself.

Yes, I'd like to see this addressed for 5.10.0. Issues need to be worked out for how it should work with negative offset or length.

> If the decision is to leave this as is, it would at least like an
> update to the substr manpage. Now it just explains the result of a
> substr as a plain old lvalue in a way strongly suggesting that what
> you do preserves the boundaries.

I used my time machine. Rather than document the existing behaviour
that I want to change, I just warned against trusting it. I think it
is in 5.8.1.

> sub parse {
> ....
> while ($arg ne "") {
> parts(substr(substr($arg, 0, 6, ""), 0, 3));
> }
> }

You can use foo(scalar substr(...)) to suppress creating an lvalue.
If you can think of a good place to document this, I'll try to
write up something (or feel free to do it yourself).

Graham Barr

Oct 29, 2003, 3:08:37 AM

to Yitzchak Scott-Thoennes, perl5-...@perl.org

On Oct 29, 2003, at 5:55, Yitzchak Scott-Thoennes wrote:
> On Wed, Oct 29, 2003 at 02:06:12AM -0000, "perl-...@ton.iguana.be
> (via RT)" <perlbug-...@perl.org> wrote:
>> Sure, I can see what's going on here from an implementation point of
>> view, and the substr docs are pretty unspecific on this, so I can't
>> absolutely claim it as a bug. But it feels wrong to me to be able to
>> pull in stuff from outside the [0..3] range of the original string
>> into the substr window. Assigning something to a variable and have it
>> be different as a result is hardly normal lvalue behaviour.
>> (what happens to $a is as expected in all cases). I think it should
>> not only narrow or expand the original string as needed, but also
>> the range of the substr alias itself.
>
> Yes, I'd like to see this addressed for 5.10.0. Issues need to be
> worked out for how it should work with negative offset or length.

Why would negative offsets be an issue ? When the LV is created any
negative offset is resolved and the LV is created with an absolute
offset and a length.

From what I can see what should be needed is for assignment to an LV to
change LvTARGLEN of the LV to be the length of the value assigned.
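
The difference between the existing fixed window and this suggestion can be modelled in plain Perl. This is only a rough sketch: make_window and its $track flag are hypothetical names invented for illustration, not anything in the perl core, and the closures merely imitate the offset and length that a real substr LV keeps in LvTARGOFF and LvTARGLEN.

```perl
use strict;
use warnings;

# Hypothetical model of a substr lvalue: an (offset, length) window
# onto a target string. $track selects the suggested behaviour.
sub make_window {
    my ($target, $off, $len, $track) = @_;
    return (
        sub { substr($$target, $off, $len) },      # read through the window
        sub {                                      # assign through the window
            my ($new) = @_;
            substr($$target, $off, $len, $new);
            $len = length $new if $track;          # suggested: resize the window
        },
    );
}

my %result;
for my $track (0, 1) {
    my $str = "abcdefg";
    my ($get, $set) = make_window(\$str, 0, 4, $track);
    $set->("12");
    my $after_first = $get->();
    $set->(".");
    $result{$track} = [$after_first, $get->(), $str];
    my $label = $track ? "tracking" : "fixed";
    print "$label: ", join(" / ", @{ $result{$track} }), "\n";
}
```

With the fixed window the model reads back "12ef" and then ".g", eating into the rest of the string; with the length tracked it reads back "12" and then ".", leaving ".efg".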

Graham.

Graham Barr

Oct 29, 2003, 4:56:53 AM

to perl5-...@perl.org

On Oct 29, 2003, at 2:06, perl-...@ton.iguana.be (via RT) wrote:
> #! /usr/bin/perl -w
> $a="abcdefg";
> for (substr($a,0, 4)) {
> print "$_\n";
> $_="12";
> print "$_\n";
> }
>
> however prints:
> abcd
> 12ef

It gets worse

#! /usr/bin/perl -w
$a="abcdefg";
for (substr($a,0, 4)) {

print "a=$a\n";

print "$_\n";
$_="12";
print "$_\n";

print "a=$a\n";
$_=".";
print "$_\n";
print "a=$a\n";
}

prints:

a=abcdefg
abcd
12ef
a=12efg
.g
a=.g

So multiple assigns to the LV with strings that are shorter than the
original LV length will result in the string being nibbled away. Below
is a patch that makes it output what I think is expected

a=abcdefg
abcd
12
a=12efg
.
a=.efg

All tests still pass, so if there are no objections to this patch I
shall add some tests to t/op/substr.t to test for this specifically

--- mg.c.orig Wed Oct 29 08:28:52 2003
+++ mg.c Wed Oct 29 09:20:31 2003
@@ -1744,16 +1744,20 @@
sv_utf8_upgrade(lsv);
sv_pos_u2b(lsv, &lvoff, &lvlen);
sv_insert(lsv, lvoff, lvlen, tmps, len);
+ LvTARGLEN(sv) = sv_len_utf8(sv);
SvUTF8_on(lsv);
}
else if (lsv && SvUTF8(lsv)) {
sv_pos_u2b(lsv, &lvoff, &lvlen);
+ LvTARGLEN(sv) = len;
tmps = (char*)bytes_to_utf8((U8*)tmps, &len);
sv_insert(lsv, lvoff, lvlen, tmps, len);
Safefree(tmps);
}
- else
- sv_insert(lsv, lvoff, lvlen, tmps, len);
+ else {
+ sv_insert(lsv, lvoff, lvlen, tmps, len);
+ LvTARGLEN(sv) = len;
+ }

return 0;
}

Graham.

Ton Hospel

Oct 29, 2003, 6:40:02 AM

to perl5-...@perl.org

In article <20031029055542.GA2612@e_n.org>,

Yitzchak Scott-Thoennes <stho...@efn.org> writes:
> I used my time machine. Rather than document the existing behaviour
> that I want to change, I just warned against trusting it. I think it
> is in 5.8.1.
>

>> sub parse {
>> ....
>> while ($arg ne "") {
>> parts(substr(substr($arg, 0, 6, ""), 0, 3));
>> }
>> }
>

> You can use foo(scalar substr(...)) to suppress creating an lvalue.
> If you can think of a good place to document this, I'll try to
> write up something (or feel free to do it yourself).

Depends on why this actually works, which is currently unclear to me.
It doesn't work if you try to do 'scalar' when the context can't reach
the substr anymore, or try to use a scalar prototype on the call:

perl -le '
sub foo($) {
for (scalar shift) {
print $_="q"
}
}
foo(substr($a="abcdefgh", 0, 3))
'

qde

So is this special compile time magic ? If it's purely depending on the
scalar context of substr, why doesn't it kick in for a scalar prototype ?
(not that I would want it to kick in in that case, that would be wrong)
Under what circumstances exactly *does* it kick in ?

Yitzchak Scott-Thoennes

Oct 29, 2003, 12:05:49 PM

to perl5-...@perl.org

On Tue, Oct 28, 2003 at 09:55:42PM -0800, Yitzchak Scott-Thoennes <stho...@efn.org> wrote:
> > sub parse {
> > ....
> > while ($arg ne "") {
> > parts(substr(substr($arg, 0, 6, ""), 0, 3));
> > }
> > }
>
> You can use foo(scalar substr(...)) to suppress creating an lvalue.
> If you can think of a good place to document this, I'll try to
> write up something (or feel free to do it yourself).

Sorry, now I see you have parts modifying $_[0], so scalar isn't helpful.

Yitzchak Scott-Thoennes

Oct 29, 2003, 1:58:36 PM

to Graham Barr, perl5-...@perl.org

On Wed, Oct 29, 2003 at 09:56:53AM +0000, Graham Barr <gb...@pobox.com> wrote:
> All tests still pass, so if there are no objections to this patch I
> shall add some tests to t/op/substr.t to test for this specifically

By visual inspection, looks good. At least one person claimed to be
using the previous "fixed window" functionality, so I would hesitate
to put this in maint.

Yitzchak Scott-Thoennes

Oct 29, 2003, 1:58:58 PM

to Graham Barr, perl5-...@perl.org

On Wed, Oct 29, 2003 at 08:08:37AM +0000, Graham Barr <gb...@pobox.com> wrote:
> On Oct 29, 2003, at 5:55, Yitzchak Scott-Thoennes wrote:
> >On Wed, Oct 29, 2003 at 02:06:12AM -0000, "perl-...@ton.iguana.be
> >(via RT)" <perlbug-...@perl.org> wrote:
> >>Sure, I can see what's going on here from an implementation point of
> >>view, and the substr docs are pretty unspecific on this, so I can't
> >>absolutely claim it as a bug. But it feels wrong to me to be able to
> >>pull in stuff from outside the [0..3] range of the original string
> >>into the substr window. Assigning something to a variable and have it
> >>be different as a result is hardly normal lvalue behaviour.
> >>(what happens to $a is as expected in all cases). I think it should
> >>not only narrow or expand the original string as needed, but also
> >>the range of the substr alias itself.
> >
> >Yes, I'd like to see this addressed for 5.10.0. Issues need to be
> >worked out for how it should work with negative offset or length.
>
> Why would negative offsets be an issue ? When the LV is created any
> negative offset is resolved and the LV is created with an absolute
> offset and a length.

That is how it is now, yes. But I can see a case for storing the negative
values instead for situations like this:

perl -wle'$x = "abcdef"; for (substr($x,-4,-1)) { chop$x; chop$x; print $_ }'
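
For comparison, the substr entry in the perlfunc of later perls says that an lvalue created with a negative offset remembers its position from the end of the string, and that results with negative offsets were unspecified before 5.16. A small sketch of that documented case, assuming perl 5.16 or newer; the @seen array is only there for illustration:

```perl
use strict;
use warnings;

my $x = '1234';
my @seen;
for (substr($x, -3, 2)) {     # window initially covers "23"
    $_ = 'a';
    push @seen, $x;           # 1a4
    $x = 'abcdefg';           # replace the whole target string
    push @seen, $_;           # f, the same distance from the end
}
print "$_\n" for @seen;
```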

Graham Barr

Oct 29, 2003, 2:49:44 PM

to Yitzchak Scott-Thoennes, perl5-...@perl.org

Whether this goes into maint is up to Nick. But I have not seen any
claim of people using the fixed window. Can you point us in the right
direction?

Graham.

Nicholas Clark

Oct 29, 2003, 4:54:03 PM

to Graham Barr, Yitzchak Scott-Thoennes, perl5-...@perl.org

A decision can wait until (at least) 5.8.3, I feel.
I'd second the request for pointers to people using the existing fixed
window behaviour.

Nicholas Clark

Ton Hospel

Oct 29, 2003, 6:26:45 PM

to perl5-...@perl.org

In article <20031029170549.GB3540@e_n.org>,

Yitzchak Scott-Thoennes <stho...@efn.org> writes:
>> You can use foo(scalar substr(...)) to suppress creating an lvalue.
>> If you can think of a good place to document this, I'll try to
>> write up something (or feel free to do it yourself).
>

> Sorry, now I see you have parts modifying $_[0], so scalar isn't helpful.

Actually, scalar applied to the substr extracting the record with
padding worked perfectly, thanks for the tip. But it was all getting
just too magical, so I introduced a temporary variable in my real code.
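
A minimal sketch of the temporary-variable approach, reusing the parse/parts/process names from the original report. The sub bodies are guesses at the elided code and @processed is a hypothetical stand-in for whatever process() really does. Because parts() is handed an ordinary scalar copy, no substr lvalue is involved, so this should behave the same on any perl version.

```perl
use strict;
use warnings;

my @processed;                        # stand-in for the real processing
sub process { push @processed, $_[0] }

sub parts {
    # Consumes its argument one character at a time.
    while ($_[0] ne "") {
        process(substr($_[0], 0, 1, ""));
    }
}

sub parse {
    my ($arg) = @_;
    while ($arg ne "") {
        my $record = substr($arg, 0, 6, "");   # cut off one padded record
        my $data   = substr($record, 0, 3);    # plain copy, not an lvalue
        parts($data);                          # can only ever eat $data
    }
}

parse("AAApadBBBpad");
print join("", @processed), "\n";     # AAABBB
```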

Yitzchak Scott-Thoennes

Oct 30, 2003, 10:36:44 AM

to Nicholas Clark, Graham Barr, perl5-...@perl.org

The reference I'd seen was in the posts by tlhf in this thread:
http://perlmonks.com/index.pl?node_id=191334

I'm not completely clear that he/she actually understands the current
behavior, and there have been two or three other bug reports I've seen
that Graham's patch would address (#16834, #24069, and I vaguely recall
another).

Dave Mitchell

Feb 29, 2004, 11:43:01 AM

to Graham Barr, perl5-...@perl.org

Thanks, applied to blead as change #22414.
Are you still intending to write those tests? ;-)

>
> --- mg.c.orig Wed Oct 29 08:28:52 2003
> +++ mg.c Wed Oct 29 09:20:31 2003
> @@ -1744,16 +1744,20 @@
> sv_utf8_upgrade(lsv);
> sv_pos_u2b(lsv, &lvoff, &lvlen);
> sv_insert(lsv, lvoff, lvlen, tmps, len);
> + LvTARGLEN(sv) = sv_len_utf8(sv);
> SvUTF8_on(lsv);
> }
> else if (lsv && SvUTF8(lsv)) {
> sv_pos_u2b(lsv, &lvoff, &lvlen);
> + LvTARGLEN(sv) = len;
> tmps = (char*)bytes_to_utf8((U8*)tmps, &len);
> sv_insert(lsv, lvoff, lvlen, tmps, len);
> Safefree(tmps);
> }
> - else
> - sv_insert(lsv, lvoff, lvlen, tmps, len);
> + else {
> + sv_insert(lsv, lvoff, lvlen, tmps, len);
> + LvTARGLEN(sv) = len;
> + }
>
> return 0;
> }
>
> Graham.

--
Technology is dominated by two types of people: those who understand what
they do not manage, and those who manage what they do not understand.

Yitzchak Scott-Thoennes

Feb 29, 2004, 3:07:02 PM

to perl5-...@perl.org

On Sun, Feb 29, 2004 at 04:43:01PM +0000, Dave Mitchell <da...@fdisolutions.com> wrote:
> Thanks, applied to bleed as change #22414.
> Are you still intending to write those tests? ;-)

I tried to get some input on this at:

http://perlmonks.org/index.pl?node_id=306449

without much success, other than Abigail arguing strongly for backward
compatibility even for misfeatures. At the moment, I'm inclined to
think Graham's patch should be applied to blead but not maint. It
needs documentation, also.

Ton Hospel

Feb 29, 2004, 10:20:43 PM

to perl5-...@perl.org

In article <20040229200702.GA204@e_n.org>,

Yitzchak Scott-Thoennes <stho...@efn.org> writes:
> I tried to get some input on this at:
>
> http://perlmonks.org/index.pl?node_id=306449
>
> without much success, other than Abigail arguing strongly for backward
> compatibility even for misfeatures. At the moment, I'm inclined to
> think Graham's patch should be applied to blead but not maint. It
> needs documentation, also.

So bugs that have a usable side-effect shouldn't be fixed ?
I kinda liked the s///e bug. Can I keep it ? :-)

From my point of view:

- It caused a hard-to-track-down intermittent problem in real code
on the current perl. I'm not convinced that withholding the patch keeps
more programs working than it breaks.
- The effect it had after applying once could somewhat be defended,
but the repeated effect Graham Barr found makes this an all out bug
to my mind.
- I wasn't planning to go with 5.9 and later for the moment (I hate the
fact that all my "use fields" based classes will stop working), and
I'd like to be able to pass the result of a substr() as a function
argument (which is the case where I ran into the bug).

Dave Mitchell

Mar 1, 2004, 6:59:21 PM

to Yitzchak Scott-Thoennes, perl5-...@perl.org

How about the following:

--- perlfunc.pod- Mon Mar 1 23:41:25 2004
+++ perlfunc.pod Mon Mar 1 23:58:25 2004
@@ -5578,15 +5578,21 @@
parts of the EXPR and return what was there before in one operation,
just as you can with splice().

-If the lvalue returned by substr is used after the EXPR is changed in
-any way, the behaviour may not be as expected and is subject to change.
-This caveat includes code such as C<print(substr($foo,$a,$b)=$bar)> or
-C<(substr($foo,$a,$b)=$bar)=$fud> (where $foo is changed via the
-substring assignment, and then the substr is used again), or where a
-substr() is aliased via a C<foreach> loop or passed as a parameter or
-a reference to it is taken and then the alias, parameter, or deref'd
-reference either is used after the original EXPR has been changed or
-is assigned to and then used a second time.
+Note that the lvalue returned by the 3-arg version of substr() acts as
+a 'magic bullet'; each time it is assigned to, it remembers which part
+of the original string is being modified; for example:
+
+ $x = '1234';
+ for (substr($x,1,2)) {
+ $_ = 'a'; print $x,"\n"; # prints 1a4
+ $_ = 'xyz'; print $x,"\n"; # prints 1xyz4
+ $x = '56789';
+ $_ = 'pq'; print $x,"\n"; # prints 5pq9
+ }
+
+
+Prior to Perl version 5.9.1, the result of using an lvalue multiple times was
+unspecified.

=item symlink OLDFILE,NEWFILE

--
The Enterprise successfully ferries an alien VIP from one place to another
without serious incident.
-- Things That Never Happen in "Star Trek" #7

h...@crypt.org

Mar 1, 2004, 9:01:30 PM

to perl5-...@perl.org

Dave Mitchell <da...@fdisolutions.com> wrote:
:How about the following:
[...]
:+ $x = '1234';

:+ for (substr($x,1,2)) {
:+ $_ = 'a'; print $x,"\n"; # prints 1a4
:+ $_ = 'xyz'; print $x,"\n"; # prints 1xyz4
:+ $x = '56789';
:+ $_ = 'pq'; print $x,"\n"; # prints 5pq9
:+ }

Shouldn't that last one be C< # prints 5pq89 >?

Hugo

h...@crypt.org

Mar 2, 2004, 6:52:10 AM

to Graham Barr, perl5-...@perl.org

Graham Barr <gb...@pobox.com> wrote:
:On 2 Mar 2004, at 02:01, h...@crypt.org wrote:

:
:I can see why you say that, but that's not what happens. The reason is
:that the length of the LVALUE changes to be the length of the last
:assignment to it, this is how modification outside its original window
:is prevented.
:
:As last assignment via the LVALUE was 3 characters long this assignment
:replaces three characters, even though the underlying SV has changed.

Thank you, that makes it perfectly clear to me. I'd suggest a similar
amplification be included in the docpatch.

:So the question is which functionality do we want, this or the ability
:to modify outside the original window ? It would be possible to do both,
:but frankly, is it worth the extra overhead that would be required.

Now that I understand the behaviour being described it seems perfectly
reasonable.

Hugo

Graham Barr

Mar 2, 2004, 3:09:45 AM

to h...@crypt.org, perl5-...@perl.org

On 2 Mar 2004, at 02:01, h...@crypt.org wrote:

I can see why you say that, but that's not what happens. The reason is
that the length of the LVALUE changes to be the length of the last
assignment to it; this is how modification outside its original window
is prevented.

As the last assignment via the LVALUE was 3 characters long, this assignment
replaces three characters, even though the underlying SV has changed.

So the question is which functionality do we want: this, or the ability
to modify outside the original window? It would be possible to do both,
but frankly, is it worth the extra overhead that would be required?
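
The explanation can be traced step by step. The sketch below reruns the example from the proposed doc patch and records the target string after each assignment. It assumes a perl with this change applied (5.10 or later, going by the current perlfunc); the @log array is only there for illustration.

```perl
use strict;
use warnings;

my $x = '1234';
my @log;
for (substr($x, 1, 2)) {
    $_ = 'a';   push @log, $x;   # window shrinks to 1 char:  1a4
    $_ = 'xyz'; push @log, $x;   # window grows to 3 chars:   1xyz4
    $x = '56789';                # target replaced; window is still
                                 # offset 1, length 3, i.e. "678"
    $_ = 'pq';  push @log, $x;   # "678" replaced by "pq":    5pq9
}
print "$_\n" for @log;
```

The last line comes out as 5pq9 rather than 5pq89 because the window is three characters wide at that point, not the two it started with.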

Graham.

Nicholas Clark

Mar 6, 2004, 12:09:21 PM

to Yitzchak Scott-Thoennes, Graham Barr, perl5-...@perl.org

Mmm. I ought to make a decision on this before 5.8.4

It's a bug, right?
And we fix bugs in maintenance releases?

Nicholas Clark

Yitzchak Scott-Thoennes

Mar 8, 2004, 11:40:06 AM

to perl5-...@perl.org, Graham Barr

On Sat, Mar 06, 2004 at 05:09:21PM +0000, Nicholas Clark <ni...@ccl4.org> wrote:
> Mmm. I ought to make a decision on this before 5.8.4
>

> It's a bug, right?
> And we fix bugs in maintenance releases?

What makes something a bug? It's non-intuitive but consistent, and
has been reported as a bug several times.

What do you think:

$x = "abc";
for (substr($x, 1, 1)) {
print $_; # "b"
$_ = "bb";
print $_; # NEW: "bb" OLD: "b"
}

$x = "abc";
for (substr($x, -2, -1)) {
print $_; # b
$_ = "bcd";
print $_; # NEW: "bcd" OLD: "b"
}

Ton Hospel

Mar 9, 2004, 5:46:51 AM

to perl5-...@perl.org

In article <20040308163945.GA1284@e_n.org>,

Yitzchak Scott-Thoennes <stho...@efn.org> writes:
> What makes something a bug? It's non-intuitive but consistent, and
> has been reported as a bug several times.
>
> What do you think:
>
> $x = "abc";
> for (substr($x, 1, 1)) {
> print $_; # "b"
> $_ = "bb";
> print $_; # NEW: "bb" OLD: "b"
> }
>
> $x = "abc";
> for (substr($x, -2, -1)) {
> print $_; # b
> $_ = "bcd";
> print $_; # NEW: "bcd" OLD: "b"
> }

It causes action at a distance if you pass a substr() result
as an argument.
It makes an lvalue end up with a different value than what you
assigned to it.

#! /usr/bin/perl -wl
# Supposes this is in some module by author X
sub process {
print $& while $_[0] =~ s/.//;
}

# suppose user Y tries to use the module like this:
$a="aBCDef";
process(substr($a, 1, 3));

It was perfectly reasonable for the author of "process"
to write a sub that "consumes" its argument.

It was perfectly reasonable for the caller of process
to expect 3 chars to get processed.

But actually the whole string gets consumed, it eats away
OUTSIDE the substr bounds.

Leaving this as it was basically means you can never
pass the result of a substr to a sub for modification
unless you KNOW the sub does its work in ONE step.

I think the old effect on ONE assign is somewhat
defensible, but the fact that repeated short assigns
pull in more and more of the original string makes it
an outright bug.
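
A sketch of the same scenario using plain assignments through the $_[0] alias, one short assignment per character. consume() is a hypothetical stand-in for process(), and the results noted in the comments assume a perl where assignment resizes the substr window (5.10 or later, going by perlfunc); on the perls discussed above it would keep pulling in the rest of the string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for process(): strips one character at a time
# from its argument by assigning back through the $_[0] alias.
sub consume {
    my @chars;
    while ((my $cur = $_[0]) ne "") {
        push @chars, substr($cur, 0, 1);
        $_[0] = substr($cur, 1);      # one short assignment per character
    }
    return @chars;
}

my $str = "aBCDef";
my @got = consume(substr($str, 1, 3));
print "consumed: @got\n";             # B C D when the window is resized
print "left:     $str\n";             # aef when the window is resized
```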

Yitzchak Scott-Thoennes

Mar 9, 2004, 10:45:26 AM

to Ton Hospel, perl5-...@perl.org

On Tue, Mar 09, 2004 at 10:46:51AM +0000, Ton Hospel <perl5-...@ton.iguana.be> wrote:
> It causes action at a distance if you pass a substr() result
> as an argument.
> It makes lvalues get a different value as what you assigned
> to it.
>
> #! /usr/bin/perl -wl
> # Supposes this is in some module by author X
> sub process {
> print $& while $_[0] =~ s/.//;
> }
>
> # suppose user Y tries to use the module like this:
> $a="aBCDef";
> process(substr($a, 1, 3));
>
> It was perfectly reasonable for the author of "process"
> to write a sub that "consumes" it's argument.
>
> It was perfectly reasonable for the caller of process
> to expect 3 chars to get processed.
>
> But actually the whole string gets consumed, it eats away
> OUTSIDE the substr bounds.
>
> Leaving this as it was basically means you can never
> pass the result of a substr to a sub for modification
> unless you KNOW the sub does its work in ONE step.

Not trying to argue that it's not a bug; but you can say:

process(scalar substr($a, 1, 3));

to prevent this. Should this be documented? Why does it work?

Spider Boardman

Mar 9, 2004, 11:16:02 AM

to Yitzchak Scott-Thoennes, Ton Hospel, perl5-...@perl.org

On Tue, 9 Mar 2004 07:45:26 -0800, Yitzchak Scott-Thoennes wrote (in part):

yst> On Tue, Mar 09, 2004 at 10:46:51AM +0000, Ton Hospel
yst> <perl5-...@ton.iguana.be> wrote:

th> It causes action at a distance if you pass a substr() result as an
th> argument. It makes lvalues get a different value as what you assigned
th> to it.

[snippage by /sb]

th> But actually the whole string gets consumed, it eats away OUTSIDE the
th> substr bounds.

yst> Not trying to argue that it's not a bug; but you can say:

yst> process(scalar substr($a, 1, 3));

yst> to prevent this. Should this be documented? Why does it work?

Well, as the person most responsible for the bug, I suppose I should chime
in here. First, I agree that it's a bug. It's one of a couple which I'd
meant to revisit, then never got the time because of changes in my
circumstances. The assignments through an LV-substr should adjust the
'margins' to account for changes in the 'window'. [The other has already
been addressed, so far as I can tell, in that the LV now starts with its
fetched contents when passed -- as it should.]

As to why C<scalar()> avoids the issue, it's because in the cited code
above you're now ref'ing an OP_SCALAR instead of an OP_SUBSTR -- and the
scalar is not passing ref-ness down to the substr. Thus, it's a
better-optimized form of C< '' . substr($a, 1, 3) >. In other words, it's
an expression instead of a direct reference. The modifiable value passed
to the sub is a temporary rather than an LV-substr. [Or maybe it's
readonly? I haven't checked exactly what's passed, only that it's not
been made an LV.]

That's why/how it works. I don't think it should be documented, really,
because I think it's wrong. IMnsHO, the ref-ness of scalar should be
passed down to certain LV-able children OPs, such as substr and (probably)
keys.

When these manipulations first got done, C< \keys %h > was a one-shot LV,
in that it disassociated itself from %h after the first assignment. This
was because LVs weren't yet ref-counting their targets. They now do, and
if LV-keys is still a one-shot, that's another bug that should get fixed.

Hope this helps,

--s.

Chip Salzenberg

Mar 9, 2004, 11:28:48 AM

to Yitzchak Scott-Thoennes, Ton Hospel, perl5-...@perl.org

According to Spider Boardman:

> IMnsHO, the ref-ness of scalar should be passed down to certain
> LV-able children OPs, such as substr and (probably) keys.

Since scalar is just a directive to change context, I don't see why it
should change *anything* else. So the LVALUE creation code (wherever
it is) should skip OP_SCALAR. IMO.
--
Chip Salzenberg - a.k.a. - <ch...@pobox.com>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K

Yitzchak Scott-Thoennes

Mar 9, 2004, 12:06:33 PM

to perl5-...@perl.org, Ton Hospel

On Tue, Mar 09, 2004 at 11:16:02AM -0500, Spider Boardman <spi...@leggy.zk3.dec.com> wrote:
> On Tue, 9 Mar 2004 07:45:26 -0800, Yitzchak Scott-Thoennes wrote (in part):
> As to why C<scalar()> avoids the issue, it's because in the cited code
> above you're now ref'ing an OP_SCALAR instead of an OP_SUBSTR -- and the
> scalar is not passing ref-ness down to the substr.

Looks to me like scalar passes ref-ness down:

Perl_ref(pTHX_ OP *o, I32 type)
{
...
case OP_SCALAR:
case OP_NULL:
if (!(o->op_flags & OPf_KIDS))
break;
ref(cBINOPo->op_first, type);

Looks like it's Perl_mod not passing lvalueness down.

> Thus, it's a
> better-optimized form of C< '' . substr($a, 1, 3) >. In other words, it's
> an expression instead of a direct reference. The modifiable value passed
> to the sub is a temporary rather than an LV-substr. [Or maybe it's
> readonly? I haven't checked exactly what's passed, only that it's not
> been made an LV.]
>
> That's why/how it works. I don't think it should be documented, really,
> because I think it's wrong. IMnsHO, the ref-ness of scalar should be
> passed down to certain LV-able children OPs, such as substr and (probably)
> keys.

I kind of like having an operator that turns off lvalueness. Especially
one with no runtime impact.

> When these manipulations first got done, C< \keys %h > was a one-shot LV,
> in that it disassociated itself from %h after the first assignment. This
> was because LVs weren't yet ref-counting their targets. They now do, and
> if LV-keys is still a one-shot, that's another bug that should get fixed.

I don't seem to be able to get an lvalue with \keys %h. The \ forces
list context. Doing it this way seems to show it working, unless I
misunderstand what you are questioning:

$ perl -wle'sub foo ($) { print $_[0]; %h = 0..999; print $_[0] } foo(keys %h)'
0
500

What do you think about ties on lvalues:
http://rt.perl.org/rt3/Ticket/Display.html?id=27010

Yitzchak Scott-Thoennes

Mar 9, 2004, 12:20:25 PM

to Chip Salzenberg, Ton Hospel, perl5-...@perl.org

On Tue, Mar 09, 2004 at 11:28:48AM -0500, Chip Salzenberg <ch...@pobox.com> wrote:
> According to Spider Boardman:
> > IMnsHO, the ref-ness of scalar should be passed down to certain
> > LV-able children OPs, such as substr and (probably) keys.
>
> Since scalar is just a directive to change context, I don't see why it
> should change *anything* else. So the LVALUE creation code (wherever
> it is) should skip OP_SCALAR. IMO.

Are you saying Perl_mod should or should not recurse on an OP_SCALAR's
child?

Chip Salzenberg

Mar 9, 2004, 12:30:09 PM

to Yitzchak Scott-Thoennes, Ton Hospel, perl5-...@perl.org

According to Yitzchak Scott-Thoennes:

I'm saying it should. Not with great conviction, though.

Spider Boardman

Mar 9, 2004, 1:37:37 PM

to Yitzchak Scott-Thoennes, perl5-...@perl.org, Ton Hospel

On Tue, 9 Mar 2004 09:06:33 -0800, Yitzchak Scott-Thoennes wrote (in part):

yst> On Tue, Mar 09, 2004 at 11:16:02AM -0500, Spider Boardman
yst> <spi...@leggy.zk3.dec.com> wrote:

sb> As to why C<scalar()> avoids the issue, it's because in the cited code
sb> above you're now ref'ing an OP_SCALAR instead of an OP_SUBSTR -- and
sb> the scalar is not passing ref-ness down to the substr.

yst> Looks to me like scalar passes ref-ness down:

yst> Looks like it's Perl_mod not passing lvalueness down.

Yes, I mis-remembered where in op.c I was making changes back when. It is
indeed mod-ness, not ref-ness, that's the issue.

sb> Thus, it's a better-optimized form of C< '' . substr($a, 1, 3) >.

sb> That's why/how it works. I don't think it should be documented,
sb> really, because I think it's wrong. IMnsHO, the ref-ness of scalar
sb> should be passed down to certain LV-able children OPs, such as substr
sb> and (probably) keys.

yst> I kind of like having an operator that turns off lvalueness.
yst> Especially one with no runtime impact.

While I can understand that, I still think Chip's explanation of his
agreement was spot on. It really only ought to affect context, not
lvalueness. Also, even though this gets into "in a perfect world" type of
discussions, you seem only to really care about eliminating lvalueness
because it's not working properly. If it worked, would you still care?
In any case, it only sometimes stops lvalueness in general, as distinct
from LV-ness, as this shows:

$ perl -le 'sub a($){$_[0]x=2} $a="a";a scalar $a;print $a'
aa

sb> When these manipulations first got done, C< \keys %h > was a one-shot
sb> LV, in that it disassociated itself from %h after the first
sb> assignment. This was because LVs weren't yet ref-counting their
sb> targets. They now do, and if LV-keys is still a one-shot, that's
sb> another bug that should get fixed.

yst> I don't seem to be able to get an lvalue with \keys %h. The \ forces
yst> list context. Doing it this way seems to show it working, unless I
yst> misunderstand what you are questioning:

You did misunderstand my sloppy explanation. However, the behaviour in
question has since been fixed, as this shows:

$ perl -le '%a=(a=>1);$a=\(keys %a=42);$$a=63;$$a=65;print scalar %a'
1/128

yst> What do you think about ties on lvalues:
yst> http://rt.perl.org/rt3/Ticket/Display.html?id=27010

I think that attempts to stack various types of assignment-intercepting
magic in perl5 expose the lairs of dragons. That's without having had the
time to follow the reference, or otherwise to refresh myself on that
thread. I hope to find the time to re-read it, and thus to be able to
make more meaningful comments, but I won't promise anything, given my
current schedule.

--s.

Ton Hospel

Mar 9, 2004, 2:40:54 PM

to perl5-...@perl.org

In article <20040309154526.GB3952@e_n.org>,
Yitzchak Scott-Thoennes <stho...@efn.org> writes:
> Not trying to argue that it's not a bug; but you can say:
>
>     process(scalar substr($a, 1, 3));
>
> to prevent this. Should this be documented? Why does it work?

It wouldn't "consume" the substr anymore, which supposedly was
the point of "process".

I could argue this is a bug in "scalar" in fact...
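
For readers following along, here is a minimal sketch of the call forms being compared in this subthread. The consume() helper is hypothetical: it only stands in for the "process" mentioned above by emptying its argument through the @_ alias. The string values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the thread's process(): it "consumes" its
# argument by assigning the empty string through the @_ alias.
sub consume { $_[0] = "" }

# Passed directly, substr() is handed over as an lvalue, so the
# assignment inside consume() writes through to the original string.
my $direct = "abcdefg";
consume(substr($direct, 1, 3));
print "direct: $direct\n";    # aefg

# Concatenating with '' produces a temporary copy, so the original
# string is left alone.
my $copied = "abcdefg";
consume('' . substr($copied, 1, 3));
print "copied: $copied\n";    # abcdefg

# The scalar() form discussed above. The thread reports that it stops
# the write-through; the result is printed rather than asserted here,
# since it depends on how your perl treats scalar() and lvalueness.
my $wrapped = "abcdefg";
consume(scalar substr($wrapped, 1, 3));
print "scalar: $wrapped\n";
```

The first two cases show why the scalar() form cannot "consume" anything: if it behaves like the concatenation, the callee only ever sees a copy.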

Rafael Garcia-Suarez

Mar 11, 2004, 6:16:41 PM

to Dave Mitchell, Yitzchak Scott-Thoennes, perl5-...@perl.org

Dave Mitchell wrote:
> On Sun, Feb 29, 2004 at 12:07:02PM -0800, Yitzchak Scott-Thoennes wrote:
> > At the moment, I'm inclined to
> > think Graham's patch should be applied to blead but not maint. It
> > needs documentation, also.
>
> How about the following:
>
>
> --- perlfunc.pod- Mon Mar 1 23:41:25 2004
> +++ perlfunc.pod Mon Mar 1 23:58:25 2004

Thanks, applied as #22488.

Father Chrysostomos via RT

Dec 3, 2011, 4:52:10 PM

to perl5-...@perl.org

Unfortunately this never was addressed for 5.10.0. It is getting in the
way of other bugs, such as this:

sub myprint { print @_ }
print substr("", 2); # warning
myprint substr("", 2); # error

so I’m working on it for 5.15.6.

See
<http://www.nntp.perl.org/group/perl.perl5.porters/;msgid=63562EAE-84D6-48BF...@cpan.org>.

(I’m adding it to this ticket mainly for future reference.)

--

Father Chrysostomos

Father Chrysostomos via RT

Dec 4, 2011, 4:35:13 PM

to perl5-...@perl.org

On Sat Dec 03 13:52:09 2011, sprout wrote:
> On Wed Oct 29 10:59:37 2003, ysth wrote:
> > On Wed, Oct 29, 2003 at 08:08:37AM +0000, Graham Barr
> > <gb...@pobox.com> wrote:
> > > On Oct 29, 2003, at 5:55, Yitzchak Scott-Thoennes wrote:
> > > >On Wed, Oct 29, 2003 at 02:06:12AM -0000, "perl-

> 5....@ton.iguana.be

> 84D6-48BF-856...@cpan.org>.

It’s now fixed with commit 83f78d1a2.

--

Father Chrysostomos

Father Chrysostomos via RT

Dec 5, 2011, 4:43:03 PM

to perl5-...@perl.org

Several others have argued the same thing in this ticket. Attached is a
patch to fix it. It also allows (foo(), scalar bar())=@list for lvalue
subroutines, which I think is good. But it allows scalar($foo)=3, but
not scalar(@foo)=3. Do we want this? I think it is harmless and makes
scalar() consistent with the implicit scalar() provided by the ($)
prototype.

--

Father Chrysostomos

let scalar() propagate lvalueness.txt

Father Chrysostomos via RT

Dec 13, 2011, 11:54:22 AM

to perl5-...@perl.org

I’ve now applied it as d408447.

--

Father Chrysostomos

Father Chrysostomos via RT

Dec 18, 2011, 2:16:12 PM

to perl5-...@perl.org

On Tue Dec 13 08:54:22 2011, sprout wrote:
> On Mon Dec 05 13:43:02 2011, sprout wrote:
> > On Tue Mar 09 11:42:01 2004, perl5-...@ton.iguana.be wrote:

> > Several others have argued the same thing in this ticket. Attached is a
> > patch to fix it. It also allows (foo(), scalar bar())=@list for lvalue
> > subroutines, which I think is good. But it allows scalar($foo)=3, but
> > not scalar(@foo)=3. Do we want this? I think it is harmless and makes
> > scalar() consistent with the implicit scalar() provided by the ($)
> > prototype.
> >
>
> I’ve now applied it as d408447.
>

And reverted it as 41b1a11c4. See ticket #106288 for details.

--

Father Chrysostomos