If I do 32 "save"s in a row, this will certainly be slower than doing a
single "push".
If I do 1 "save", this will (hopefully) be faster than 1 "push".
How many "save"s does it take to be to be slower than one "push"?
(When writing pasm by hand, what's a reasonable cutoff?)
--
I would guess that about 256 million saves is the same as 8 million
pushes, but that really depends on how much space an entry on the
respective stacks takes and how large your virtual address space can be
-- because the only way to make them equivalent is to call enough of
them to run out of memory and crash. :-)
Sorry, I'm being obnoxious. You fell into the same trap as I recently
did. pushx and save are not interchangeable; they operate on
completely different stacks. 'save' pushes an entry of arbitrary type
onto the user stack; pushi, pushs, and friends push onto type-specific
register frame stacks.
Somehow, this needs to be documented better, because it's quite
surprising (to me, at least).
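To make the distinction concrete, here is a rough, untested pasm sketch
(not from the original posts; register numbers and values are arbitrary):

    set I0, 42
    set S0, "greetings"
    save I0         # one typed entry (an int) goes onto the user stack
    save S0         # the same stack takes a string entry right on top
    restore S0      # entries come back newest-first; the type tag must match
    restore I0
    set I5, 7
    pushi           # copies the whole frame of 32 I registers onto the int stack
    set I5, 0       # clobber one of them...
    popi            # ...and popping the frame brings all 32 back: I5 is 7 again
    print I5
    print "\n"
    end

The point being: save/restore and pushi/popi operate on entirely
separate stacks, so one cannot be used to undo the other.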
> The answer to this question varies from platform to platform, and I've
> only got Windows to test on...
>
> If I do 32 "save"s in a row, this will certainly be slower than doing a
> single "push".
>
> If I do 1 "save", this will (hopefully) be faster than 1 "push".
Yep, slightly.
> How many "save"s does it take to be to be slower than one "push"?
This really depends on the architecture, the running core, and so on. But
Dan estimated a cutoff value of 3; this test program indicates a cutoff
of 2:
    set I0, 1000000
    time N0
lp:
    pushp           # or save P0, ...
    popp            # or restore P0, ...
    dec I0
    if I0, lp
    time N1
    sub N1, N0
    print N1
    print " s\n"
    end
Loop only                 0.02 s   (0.002 s with -j)
1 save_p + 1 restore_p    0.2  s
2 save_p + 2 restore_p    0.4  s
3 save_p + 3 restore_p    0.6  s
1 pushp  + 1 popp         0.38 s
All run with the CGP core (-P switch), which is fastest here because
pushX/save/restore are not JITed.
Athlon 800, i386/linux, non-optimized compile.
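(For anyone who wants to rerun this: the snippet should work saved as
a plain .pasm file, say bench.pasm (the name is just an example), and
run as something like "parrot -P bench.pasm"; -P selects the CGP core
and -j the JIT core, as above.)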
leo
Regarding memory:
Entries on the type-specific stacks consume no more memory than the
registers themselves.
Entries on the general stack (where save/restore go) also have, in
addition to the saved register, an integer indicating the type of
register that's been saved, to prevent pushing an int and popping a
string.
Thus, 16 int save ops consume as much memory as one pushi, even
though the pushi is saving 32 int registers: each save entry is a
value plus a type tag, so 16 of them add up to roughly 32
register-sized slots.
Regarding speed:
Each save is a (non-jit, non-inlined) function call.
Each push is a (non-jit, non-inlined) function call.
Even if there is *little* overhead for function calls, it's nonzero,
and it adds up.
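To put "it adds up" in concrete terms (again just a sketch, not from
the original mails): saving an entire int frame by hand costs one call
per register, while pushi does the whole frame in one:

    save I0         # each save is its own op dispatch / function call
    save I1
    save I2
    # ... 29 more saves to cover I3 through I31 ...
    pushi           # one call copies all 32 I registers at once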
> Sorry, I'm being obnoxious. You fell into the same trap as I recently
> did. pushx and save are not interchangeable; they operate on
> completely different stacks. 'save' pushes an entry of arbitrary type
> onto the user stack; pushi, pushs, and friends push onto type-specific
> register frame stacks.
>
> Somehow, this needs to be documented better, because it's quite
> surprising (to me, at least).
Actually, I did realize that they go on completely different stacks.
What I didn't know (though I do now, thanks to Leopold's post) was their
relative speeds.
For me, one push cost about the same as 2.37 saves.
Pushes also have the advantage of, on some architectures (like SPARC),
not polluting the cache. Doing the saves dirties your L1 & L2 caches
for both the source and destination, while the push doesn't, though
the registers are likely already in L2, if not L1, cache. SPARC has a
cache-bypassing memcpy, which is kind of cool. While you might think
bypassing the cache is a bad thing, it actually isn't: the bulk copy
skips the cache precisely so it won't evict data you'll want again.
--
Dan