Memoize problem with forbidden scalar context

Tels

unread,

Aug 29, 2002, 12:17:48 PM8/29/02

to Tony Bowden, perl5-...@perl.org, mjd-perl...@plover.com

-----BEGIN PGP SIGNED MESSAGE-----

Moin,

Tony wrote:

>I'm tripping over a strange problem with Memoize. I've a simple script
>that plays the 'Snakes and Ladders' calculator game[1] (get a number -

I remember this under another name, but forgot which. I remember also
trying this with BigInt and bignum, but not with Memoize.

Anyway it is present in v5.8.0:

te@null:~> perl snake.pl
Anonymous function called in forbidden scalar context; faulting at snake.pl
line 16
te@null:~> perl -v

This is perl, v5.8.0 built for i686-linux

HTH,

Tels

- --
perl -MDev::Bollocks -e'print Dev::Bollocks->rand(),"\n"'
conveniently facilitate scalable infrastructures

http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBPW5JJXcLPEOTuEwVAQEemwf8Cp02ZOMLh/Ld14hP5OuNwFKt57S6EV5x
L6hfuBRCmJnZuCXAdtaCDdpNTSwM+Hi/TtHSchtaDF9PvMedzgFhj5wSSDiAQ47f
JxsezkDbl4vrC/ToC4M6gZ2GgAtCpoW2ZZ10rY0kmSzJr88pzOIHKuJQrZqOr8Q6
GtGK5lywreCT/B5XqWZ/Z3/AiEGWdE6r2yEECpWCsqNM/A6w/nmY4HgiKGebdwn8
m5M5BaHzebpDB8KniTgAaEJoSeVyRO3X4FvRo9w4/k6xQdSgL/svX0fNH4owWFiM
nGg49uErA62K1NSEJeVkMNVRADcIq2tOLUfmN3SRaceuMax09hHUTQ==
=U49v
-----END PGP SIGNATURE-----

Tony Bowden

unread,

Aug 29, 2002, 12:09:34 PM8/29/02

to perl5-...@perl.org, mjd-perl...@plover.com

I'm tripping over a strange problem with Memoize. I've a simple script
that plays the 'Snakes and Ladders' calculator game[1] (get a number -

if it's even divide by 2, if it's odd multiply by 3 and add 1 - see how
long it takes you get to 1) [2]:

#!/usr/bin/perl

use strict;
use warnings;
no warnings 'recursion';

use Memoize;
memoize('count_steps');

my $i = 703;
printf "%d: %d\n", $i, count_steps($i);

sub count_steps {
my $num = shift;
return 0 if $num == 1;
return 1 + (($num % 2)
? count_steps($num * 3 + 1)
: count_steps($num / 2)
)
}

This all works fine for any value of $i up to this point. But at 703 it
falls over with:

Anonymous function called in forbidden scalar context; faulting at

sandl.simple line 16

I sprinkled some debug prints and things got even stranger...

If I add the following line to the start of sub _memoizer:

sub _memoizer {
warn "MEMO @_\n";
my $orig = shift; # stringized version of ref to original func.

Suddenly everything starts working! Adding this one line stops the error
from happening...

Moving it one line later, triggers the bug again:

sub _memoizer {
my $orig = shift; # stringized version of ref to original func.
warn "MEMO @_\n";

Then gives up after a while with:

....
MEMO 35720
MEMO 17860
MEMO CODE(0x813dc3c)

(In the previous version, that worked, that line was, as expected:
MEMO CODE(0x813dc3c) 35720
MEMO CODE(0x813dc3c) 17860
MEMO CODE(0x813dc3c) 8930
)

My brain was hurting too much at this point, so I gave up, and decided
to report it as a bug...

(lastest CPAN version, with 5.6.1 - I don't have 5.8 installed anywhere
yet...)

Tony

[1] I'm sure there's a more official name for this, but this is how I
learnt it...
[2] I also believe that no one has yet actually proven that this will
always reach one ..

Abigail

unread,

Aug 29, 2002, 1:18:39 PM8/29/02

to perl5-...@perl.org

On Thu, Aug 29, 2002 at 06:17:48PM +0200, Tels wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
>
> Moin,
>
> Tony wrote:
>

> >I'm tripping over a strange problem with Memoize. I've a simple script
> >that plays the 'Snakes and Ladders' calculator game[1] (get a number -
>

> I remember this under another name, but forgot which. I remember also
> trying this with BigInt and bignum, but not with Memoize.
>
> Anyway it is present in v5.8.0:
>
> te@null:~> perl snake.pl
> Anonymous function called in forbidden scalar context; faulting at snake.pl
> line 16
> te@null:~> perl -v

That's more than I get - all I get is "Segmentation fault".
Here's a stack trace:

#0 0x80fe470 in Perl_pp_grepstart () at pp_ctl.c:775
#1 0x80b3583 in Perl_runops_debug () at dump.c:1398
#2 0x8061c31 in S_run_body (oldscope=1) at perl.c:1681
#3 0x80617dd in perl_run (my_perl=0x81765e8) at perl.c:1600
#4 0x805df2b in main (argc=2, argv=0xbffff5f4, env=0xbffff600)
at perlmain.c:85
#5 0x4006f2eb in __libc_start_main (main=0x805de90 <main>, argc=2,
ubp_av=0xbffff5f4, init=0x805cf5c <_init>, fini=0x81516ec <_fini>,
rtld_fini=0x4000c130 <_dl_fini>, stack_end=0xbffff5ec)
at ../sysdeps/generic/libc-start.c:129

valgrind also says:

==1617== Invalid read of size 4
==1617== at 0x80FE470: Perl_pp_grepstart (pp_ctl.c:775)
==1617== by 0x80B3583: Perl_runops_debug (dump.c:1398)
==1617== by 0x8061C31: S_run_body (perl.c:1684)
==1617== by 0x80617DD: perl_run (perl.c:1600)
==1617== Address 0x8 is not stack'd, malloc'd or free'd

Abigail

Mark-Jason Dominus

unread,

Aug 29, 2002, 2:05:35 PM8/29/02

to Tony Bowden, perl5-...@perl.org, mjd-perl...@plover.com

> I'm tripping over a strange problem with Memoize.

Thanks for the report. This appears to be a bug with Perl's handling
of @_ in conjunction with magic goto. I've encountered other bugs
with @_ and magic goto before, but been unable to resolve them. It's
possible that I will be able to make more progress now. But I won't
be able to look into it in detail for some time, since I'm moving next
week.

--- Technical notes follow:

I have no good characterization of the bug at present. It's highly
perturbable.

Memoize uses string 'eval' to create a stub function which it installs
into the symbol table in place of count_steps. The stub function's
job is to unshift a reference to count_steps onto @_ and then use
magic 'goto' to transfer control to Memoize::_memoizer, which manages
the cache and calls the real count_steps function if necessary.

I ran perl5.7.3 -Dst snakes.pl and examined the stack trace output.
The trace indicated that although the stub is modifying @_ correctly,
the contents of @_ are wrong when _memoizer is called. An excerpt
follows. First we're going to execute the stub function, which is:

unhift @_, $cref; goto &_memoizer;

There are two statements here. The first does the 'unshift':

((eval 1):1) pushmark => ... * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) * IV(1) *

These many IV(1)'s are to be expected, because Tony's count_steps
subroutine looks like sub count_steps { ... return 1 + ... count_steps(...) }.
I will omit them in the following lines.

((eval 1):1) gv(main::_)
=> ... * GV()
((eval 1):1) rv2av => ... * AV()
((eval 1):1) padsv($cref)
=> ... * AV() \CV(count_steps)

As we can see on the previous line, $cref has the right value and @_
is on the stack, ready to receive it. Then the 'unshift' happens:

((eval 1):1) unshift => ... IV(2)
((eval 1):1) nextstate => ...

Now we're in the "goto &_memoizer;" statement:

((eval 1):1) pushmark => ... *
((eval 1):1) gv(Memoize::_memoizer)
=> ... * GV()
((eval 1):1) rv2cv => ... * CV(_memoizer)
((eval 1):1) refgen => ... \CV(_memoizer)
((eval 1):1) goto => ... * \CV(count_steps)

On the last line, the goto took place. It removed \&_memoizer from
the stack and is about to jump into the code for _memoizer, which
begins at Memoize.pm:222.

But how did \&count_steps get onto the stack? If the contents of @_
were spilled onto the stack, then why is only $_[0] there? ($_[1] at
this point should be the real original argument to count_steps, which
I think should be IV(8390).)

((eval 1):1) nextstate => ... * \CV(count_steps)
(/tmp/memo/Memoize.pm:222) gv(main::_)
=> ... * \CV(count_steps) GV()

Now we're in _memoizer. The first line, line 222, is:

my $orig = shift;

The previous trace line shows *_ on the stack, waiting for the shift().

(/tmp/memo/Memoize.pm:222) rv2av => ... * \CV(count_steps) AV()

Now @_ is on the stack.

(/tmp/memo/Memoize.pm:222) shift => ... * \CV(count_steps) \CV(_memoizer)

The shift() found \&_memoizer in @_. But @_ was supposed to contain
(\&count_steps, 8390)! How did \&_memoizer get there? I certainly
didn't put it there.

A slightly different version of Memoize.pm produces SV_UNDEF at this
point instead of \&_memoizer. This is equally disastrous.

(/tmp/memo/Memoize.pm:222) padsv($orig)
=> ... * \CV(count_steps) \CV(_memoizer) UNDEF
(/tmp/memo/Memoize.pm:222) sassign => ... * \CV(count_steps) \CV(_memoizer) UNDEF

Now the '$orig' argument contains \&_memoizer, but it should contain
\&_count_steps.

After this point, everything goes haywire, because _memoize
is going to try to act on behalf of _memoize instead of count_steps.
Since the appropriate cache structures were never set up, it gets
confused and eventually croaks.

The only fault in Memoize itself is that the diagnostic was bad. It
should say "Hey! My argument is a reference to a function that is not
memoized!" and immediately drop dead, instead of going ahead and
producing the spurious 'forbidden scalar context' error.

I don't know why \&count_steps got onto the stack during the magic
goto, whether this is correct behavior or not. It does not appear to
have been spilled in this way in the many previous calls to the stub.
For example:

((eval 1):1) padsv($cref)
=> ... * AV() \CV(count_steps)
((eval 1):1) unshift => ... IV(2)
((eval 1):1) nextstate => ...
((eval 1):1) pushmark => ... *
((eval 1):1) gv(Memoize::_memoizer)
=> ... * GV()
((eval 1):1) rv2cv => ... * CV(_memoizer)
((eval 1):1) refgen => ... \CV(_memoizer)
((eval 1):1) goto => ...
((eval 1):1) nextstate => ...
(/tmp/memo/Memoize.pm:222) gv(main::_)
=> ... GV()

I've placed a copy of the complete, unedited trace at
http://www.plover.com/~mjd/misc/trace.gz, in case anyone wants to see
it. I hope these notes are helpful.

Yves Orton

unread,

Aug 30, 2002, 10:55:07 AM8/30/02

to Mark-Jason Dominus, Tony Bowden, perl5-...@perl.org, mjd-perl...@plover.com

I did a little checking into this before I read MJD's comments, and I got a
similar result by hacking Memoize::_crap_out() to do a Carp::confess()
instead of a Carp::carp() (which ive chopped for the same reason as MJD
did.)

Anonymous function called in forbidden scalar context; faulting at

E:/Perl/site/lib/Memoize.pm line 351
Memoize::_crap_out(undef, 'scalar') called at
E:/Perl/site/lib/Memoize.pm line 243
Memoize::_memoizer(undef, undef) called at c:\temp\snake.pl line 19
main::count_steps(17860) called at E:/Perl/site/lib/Memoize.pm line
247
Memoize::_memoizer('CODE(0x1be452c)', 17860) called at
c:\temp\snake.pl line 19
main::count_steps(35720) called at E:/Perl/site/lib/Memoize.pm line
247
Memoize::_memoizer('CODE(0x1be452c)', 35720) called at
c:\temp\snake.pl line 19
main::count_steps(71440) called at E:/Perl/site/lib/Memoize.pm line
247
Memoize::_memoizer('CODE(0x1be452c)', 71440) called at
c:\temp\snake.pl line 19
....

However while trying to figure out what was going on I modified the orginal
code as follows:

sub count_steps {
my $num = shift;

my $foo=join("--",$num,($num * 3 + 1),($num/2)); # this for some reason
saves the day....

return 0 if $num == 1;
return 1 + (($num % 2)
? count_steps($num * 3 +1)
: count_steps($num / 2)
)
}

Which stops the fault from occuring on my machine. So maybe if someone who
understands perlguts enough can figure out why the join() stops this from
happening, then they will also have a good idea where the bug is. Another
point is that a handrolled sub _doesnt_ have the same effect as join() did,
so I surmise that there is something happening in the CORE:: subs that isnt
happening in user defined subs. For instance changing the call to join() to
a call to

sub outside {
return @_;
}

doesnt have the same effect as join().

As an aside though I dont really understand why Tony is using memoize() for
this anyway:
BEGIN {
my %cache = ( 1 => 0 );

sub count_steps {
my $num = shift;

return $cache{$num} if exists $cache{$num};
$cache{$num} = 1 + (
( $num % 2 )
? ( exists( $cache{ +$num * 3 + 1 } ) )
? $cache{ +$num * 3 + 1 }
: count_steps( $num * 3 + 1 )
: ( exists( $cache{ +$num / 2 } ) )
? $cache{ +$num / 2 }
: count_steps( $num / 2 )
);
return $cache{$num};
}
}
would I think be faster than using memoize as it uses many less subroutine
calls than memoize would.

Yves

Tony Bowden

unread,

Aug 30, 2002, 11:09:59 AM8/30/02

to Orton, Yves, perl5-...@perl.org, mjd-perl...@plover.com

On Fri, Aug 30, 2002 at 03:55:07PM +0100, Orton, Yves wrote:
> As an aside though I dont really understand why Tony is using memoize() for
> this anyway:

[ code elided]

> would I think be faster than using memoize as it uses many less subroutine
> calls than memoize would.

Because it's just a toy script, and just dropping in Memoize gives me
a much bigger bang for the buck than hand-rolling my own version of it.

Surely you could apply this argument to any place where Memoize could be
used?

But thanks for investigating this. There seems to be several things that,
quite strangely, stop the bug happening. It seems to be quite a weird
problem indeed.

Tony

Mark-Jason Dominus

unread,

Aug 30, 2002, 11:08:50 AM8/30/02

to Orton, Yves, Mark-Jason Dominus, Tony Bowden, perl5-...@perl.org

> However while trying to figure out what was going on I modified the orginal
> code as follows:
>

> my $foo=join("--",$num,($num * 3 + 1),($num/2)); # this for some reason
> saves the day....
>

> Which stops the fault from occuring on my machine.

When I said

[The bug is] highly perturbable.

this is the sort of thing I meant. All sorts of little changes to the
source code that should be irrelevant change the bug in various ways.
For example, adding

print "_memoizer(@_)\n";

to the top of Memoize::_memoizer makes the bug 'go away'. But adding

my $s = join " ", @_;

does not.

> So maybe if someone who understands perlguts enough can figure out
> why the join() stops this from happening, then they will also have a
> good idea where the bug is.

My experience with this kind of perturbable bug suggests that that
isn't likely to yield much of use. It's analogous to the kind of
errors that you see in a C programs with a corrupt malloc() arena.
All sorts of things are going to behave very weirdly, because the
fundamental rules that you usually use to reason about the program's
behavior are no longer true.

> As an aside though I dont really understand why Tony is using memoize() for
> this anyway:

The main value of using Memoize is exactly that you *don't* have to
modify the code in the way that you showed. Hand-rolled caching code
like that is always going to beat Memoize. The big gain from Memoize
over hand-rolled caches is in development and maintenance time.

Presumably, Tony thinks that the speed loss is offset by the
readability gain in this case; I would agree.

Yves Orton

unread,

Aug 30, 2002, 11:36:18 AM8/30/02

to Tony Bowden, Orton, Yves, perl5-...@perl.org, mjd-perl...@plover.com

Tony Bowden said on 30 August 2002 17:10

>> would I think be faster than using memoize as it uses many less
subroutine
>> calls than memoize would.
>
>Because it's just a toy script, and just dropping in Memoize gives me
>a much bigger bang for the buck than hand-rolling my own version of it.

Oh sorry, i didn't realize that that was just a toy. (nor would i have
rewritten it... :-)

>Surely you could apply this argument to any place where Memoize could be
used?

I wouldnt say so. Its a question of scale. If I need some basic caching of
a function I'll roll my own every time (only after investigating changes in
algorithm to avoid using caching in the first place). If I need
sophisticated caching handling both LIST and SCALAR context, a normalizer or
equivelent methods then that would be the time I pull MJD's super duper
version out of the tool box.

Consider the classic example: a recursive fibonacci number generator. Which
makes more sense, using the recursive routine with Memoize wrapped around it
or rewriting it to use a combination of iteration and (maybe) cached values?
To me the latter every time... But i admit that others may not take the same
approach.

>But thanks for investigating this. There seems to be several things that,
>quite strangely, stop the bug happening. It seems to be quite a weird
>problem indeed.

No prob, I was interested because a similar but different bug has bitten me
in the past.

yves

Yves Orton

unread,

Aug 30, 2002, 11:26:00 AM8/30/02

to Mark-Jason Dominus, perl5-...@perl.org, Tony Bowden

Mark-Jason Dominus said on 30 August 2002 17:09

> When I said
>
> [The bug is] highly perturbable.
>
> this is the sort of thing I meant. All sorts of little changes to the
> source code that should be irrelevant change the bug in various ways.
> For example, adding
>
> print "_memoizer(@_)\n";
>
> to the top of Memoize::_memoizer makes the bug 'go away'. But adding
>
> my $s = join " ", @_;
>
> does not.

This is all too weird to me. But i thought it was useful to point out that a
change in the memoized function would stop the error, versus a change
_memoizer() which involves changing your module locally.

>
> > So maybe if someone who understands perlguts enough can figure out
> > why the join() stops this from happening, then they will also have a
> > good idea where the bug is.
>
> My experience with this kind of perturbable bug suggests that that
> isn't likely to yield much of use. It's analogous to the kind of
> errors that you see in a C programs with a corrupt malloc() arena.
> All sorts of things are going to behave very weirdly, because the
> fundamental rules that you usually use to reason about the program's
> behavior are no longer true.

Ah so. Well I was just trying to help. (I still think in the back of my mind
that if it were possible to identify exhaustively which keywords avoided the
problem it would be a place to start, but I bow to your experience.)

> The main value of using Memoize is exactly that you *don't* have to
> modify the code in the way that you showed. Hand-rolled caching code
> like that is always going to beat Memoize. The big gain from Memoize
> over hand-rolled caches is in development and maintenance time.

Ah see, theres the difference in our thinking. Ive always looked at Memoize
as a tool to handle more sophisticated caching scenarios, such as when both
LIST and SCALR context are involved, so the simple example threw me off.

Sorry about that Tony.

yves

Benjamin Goldberg

unread,

Aug 30, 2002, 3:34:08 PM8/30/02

to Orton, Yves, p5p

Orton, Yves wrote:
[snip]

> As an aside though I dont really understand why Tony is using
> memoize() for this anyway:

[snip modified version of code which does caching]

> would I think be faster than using memoize as it uses many less
> subroutine calls than memoize would.

As has been said, the point was that it was a toy script... however, I
would like to suggest my own version of how one would rewrite it with
caching:

BEGIN {
my %cache = ( 1 => 0 );
sub count_steps {

my $num = int(shift());
my @steps;
until( exists $cache{$num} ) {
unshift @steps, $num;
($num & 1) ?
($num = $num * 3 + 1) :
($num /= 2);
}
@cache{ @steps } =
$cache{$num}+1 .. $cache{$num}+@steps;
$cache{$num}+@steps;
}
}
[untested]

PS: In isolation, push() seems to be 11% faster than unshift()... I'm
not sure whether or not it would be faster to replace the unshift() with
push(), then later reverse() when slicing %cache. Is there any
particular reason for unshift() to be slower?

--
tr/`4/ /d, print "@{[map --$| ? ucfirst lc : lc, split]},\n" for
pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';

Yves Orton

unread,

Aug 30, 2002, 3:49:07 PM8/30/02

to Benjamin Goldberg, Orton, Yves, p5p

Benjamin Goldberg said on 30 August 2002 21:34

> BEGIN {
> my %cache = ( 1 => 0 );
> sub count_steps {
> my $num = int(shift());
> my @steps;
> until( exists $cache{$num} ) {
> unshift @steps, $num;
> ($num & 1) ?
> ($num = $num * 3 + 1) :
> ($num /= 2);
> }
> @cache{ @steps } =
> $cache{$num}+1 .. $cache{$num}+@steps;
> $cache{$num}+@steps;
> }
> }

Heh. MJD sent me a very similar piece of code. If you'd like ill send you a
copy...

>
> PS: In isolation, push() seems to be 11% faster than unshift()... I'm
> not sure whether or not it would be faster to replace the
> unshift() with
> push(), then later reverse() when slicing %cache. Is there any
> particular reason for unshift() to be slower?

Apparently to optimize for the common case. (People push/pop more often than
they unshift/shift.) There is an excellent article on it on perlmonks titled
"Shift, Pop, Unshift and Push with Impunity!" reachable from
http://www.perlmonks.org/index.pl?node_id=17890

Yves

Benjamin Goldberg

unread,

Aug 30, 2002, 4:17:27 PM8/30/02

to Orton, Yves, p5p

> Orton, Yves wrote:
>
> Benjamin Goldberg said on 30 August 2002 21:34

[snip]

> > PS: In isolation, push() seems to be 11% faster than unshift()...
> > I'm not sure whether or not it would be faster to replace the
> > unshift() with push(), then later reverse() when slicing %cache. Is
> > there any particular reason for unshift() to be slower?
>
> Apparently to optimize for the common case. (People push/pop more
> often than they unshift/shift.) There is an excellent article on it on
> perlmonks titled "Shift, Pop, Unshift and Push with Impunity!"
> reachable from
>
> http://www.perlmonks.org/index.pl?node_id=17890

It seems to me that although a newly created AV should be biased to make
push()es more efficient than unshift()s, perl *could* be even more
clever -- if an AV needs to be re-allocated due to a push, bias the AV
so push is more efficient; if an AV needs to be re-allocated due to an
unshift, bias the AV so that unshift is more efficient. This of course
assumes that a bunch of push()es will be followed shortly thereafter by
more push()es, and that a bunch of unshift()s will be followed shortly
thereafter by more unshift()s -- not an unreasonable assumption, IMHO.

If this sounds like a good idea to anyone else besides me, I'll write a
patch.

Nicholas Clark

unread,

Aug 30, 2002, 5:16:49 PM8/30/02

to Benjamin Goldberg, Orton, Yves, p5p

On Fri, Aug 30, 2002 at 04:17:27PM -0400, Benjamin Goldberg wrote:
> It seems to me that although a newly created AV should be biased to make
> push()es more efficient than unshift()s, perl *could* be even more
> clever -- if an AV needs to be re-allocated due to a push, bias the AV
> so push is more efficient; if an AV needs to be re-allocated due to an
> unshift, bias the AV so that unshift is more efficient. This of course
> assumes that a bunch of push()es will be followed shortly thereafter by
> more push()es, and that a bunch of unshift()s will be followed shortly
> thereafter by more unshift()s -- not an unreasonable assumption, IMHO.
>
> If this sounds like a good idea to anyone else besides me, I'll write a
> patch.

Seems like a good idea to me.
Hardest part - are you able to write a benchmark that shows the new unshift
code actually speeds things up by a measurable amount.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

h...@crypt.org

unread,

Aug 30, 2002, 8:19:37 PM8/30/02

to Benjamin Goldberg, p5p

Benjamin Goldberg <gol...@earthlink.net> wrote:
:It seems to me that although a newly created AV should be biased to make

:push()es more efficient than unshift()s, perl *could* be even more
:clever -- if an AV needs to be re-allocated due to a push, bias the AV
:so push is more efficient; if an AV needs to be re-allocated due to an
:unshift, bias the AV so that unshift is more efficient. This of course
:assumes that a bunch of push()es will be followed shortly thereafter by
:more push()es, and that a bunch of unshift()s will be followed shortly
:thereafter by more unshift()s -- not an unreasonable assumption, IMHO.
:
:If this sounds like a good idea to anyone else besides me, I'll write a
:patch.

If I remember rightly, this has been proposed and maybe tried before.
It would be worth having a go at the archives.

Hugo

Benjamin Goldberg

unread,

Aug 30, 2002, 11:36:25 PM8/30/02

to Nicholas Clark, p5p

That's actually fairly easy -- simply use cmpthese with push and unshift
before and after the change. After the change, if there was any
improvement, they should be much closer to each other in terms of speed.

Benjamin Goldberg

unread,

Aug 31, 2002, 12:39:51 AM8/31/02

to Nicholas Clark, p5p

Nicholas Clark wrote:
>
> On Fri, Aug 30, 2002 at 04:17:27PM -0400, Benjamin Goldberg wrote:
> > It seems to me that although a newly created AV should be biased to
> > make push()es more efficient than unshift()s, perl *could* be even
> > more clever -- if an AV needs to be re-allocated due to a push, bias
> > the AV so push is more efficient; if an AV needs to be re-allocated
> > due to an unshift, bias the AV so that unshift is more efficient.
> > This of course assumes that a bunch of push()es will be followed
> > shortly thereafter by more push()es, and that a bunch of unshift()s
> > will be followed shortly thereafter by more unshift()s -- not an
> > unreasonable assumption, IMHO.
> >
> > If this sounds like a good idea to anyone else besides me, I'll
> > write a patch.
>
> Seems like a good idea to me.

Having looked through the code, I learned that pp_unshift and av_unshift
*do* put the old values on the end, so that the free space is up in
front. The difference is that there are more C-level subroutine calls
when doing pp_unshift than when doing pp_push.

I'm not sure how to fix that.

> Hardest part - are you able to write a benchmark that shows the new
> unshift code actually speeds things up by a measurable amount.
>
> Nicholas Clark
> --
> Even better than the real thing: http://nms-cgi.sourceforge.net/

--

Nick Ing-Simmons

unread,

Sep 2, 2002, 4:29:16 AM9/2/02

to gol...@earthlink.net, Orton, Yves, p5p

Benjamin Goldberg <gol...@earthlink.net> writes:
>> Orton, Yves wrote:
>>
>> Benjamin Goldberg said on 30 August 2002 21:34
>[snip]
>> > PS: In isolation, push() seems to be 11% faster than unshift()...
>> > I'm not sure whether or not it would be faster to replace the
>> > unshift() with push(), then later reverse() when slicing %cache. Is
>> > there any particular reason for unshift() to be slower?

The existing AV infrastructure easily allows for a hunk of the memory
allocated to be left unused at the end. To leave a chunk unused
at the begining would seem to need an extra index of "start of used"
in addition to the current "allocated" and "end" values.

>>
>> Apparently to optimize for the common case. (People push/pop more
>> often than they unshift/shift.) There is an excellent article on it on
>> perlmonks titled "Shift, Pop, Unshift and Push with Impunity!"
>> reachable from
>>
>> http://www.perlmonks.org/index.pl?node_id=17890
>
>It seems to me that although a newly created AV should be biased to make
>push()es more efficient than unshift()s, perl *could* be even more
>clever -- if an AV needs to be re-allocated due to a push, bias the AV
>so push is more efficient; if an AV needs to be re-allocated due to an
>unshift, bias the AV so that unshift is more efficient. This of course
>assumes that a bunch of push()es will be followed shortly thereafter by
>more push()es, and that a bunch of unshift()s will be followed shortly
>thereafter by more unshift()s -- not an unreasonable assumption, IMHO.
>
>If this sounds like a good idea to anyone else besides me, I'll write a
>patch.
--

Nick Ing-Simmons
http://www.ni-s.u-net.com/

Dave Mitchell

unread,

Sep 2, 2002, 5:38:10 AM9/2/02

to Nick Ing-Simmons, gol...@earthlink.net, Orton, Yves, p5p

On Mon, Sep 02, 2002 at 09:29:16AM +0100, Nick Ing-Simmons wrote:
> Benjamin Goldberg <gol...@earthlink.net> writes:
> >> Orton, Yves wrote:
> >>
> >> Benjamin Goldberg said on 30 August 2002 21:34
> >[snip]
> >> > PS: In isolation, push() seems to be 11% faster than unshift()...
> >> > I'm not sure whether or not it would be faster to replace the
> >> > unshift() with push(), then later reverse() when slicing %cache. Is
> >> > there any particular reason for unshift() to be slower?
>
> The existing AV infrastructure easily allows for a hunk of the memory
> allocated to be left unused at the end. To leave a chunk unused
> at the begining would seem to need an extra index of "start of used"
> in addition to the current "allocated" and "end" values.

Surely xav_array and xav_alloc already allow the start of the array to
differ from the physically malloc()ed area? Eg see av_shift().

Dave.

--
Nothing ventured, nothing lost.

Mark-Jason Dominus

unread,

Sep 2, 2002, 10:59:14 AM9/2/02

to Nick Ing-Simmons, p5p, m...@plover.com

Says Nick Ing-Simmons <nick.ing...@elixent.com>:

> The existing AV infrastructure easily allows for a hunk of the memory
> allocated to be left unused at the end. To leave a chunk unused
> at the begining would seem to need an extra index of "start of used"
> in addition to the current "allocated" and "end" values.

Unless I misunderstand your point, it already does have this.

struct xpvav {
char* xav_array; /* pointer to first array element */
...
SV** xav_alloc; /* pointer to beginning of C array of SVs */
...
};

And av_unshift already has code to take advantage of this unused space
if it is present.