[ this came up WRT calling conventions ]
> I assume he's doing bsr/ret to get into and
> out of the sub, which is going to be significantly faster.
Who says that?
As already stated, I don't consider these as either light-weight nor
faster. Here is a benchmark.
Below are 2 version of a recursive factorial program. fact(100) is
calculated 1000 times:
PIR 1.1 s
bsr/ret 2.4 s
PIR/tailcall 0.2s
Unoptimized Parrot, default i.e. slow run core.
leo
> [pmichaud@localhost pmichaud]$ parrot pmfact.imc #PIR
> 500500
> 5.819842
> [pmichaud@localhost pmichaud]$ parrot pmfactbsr.imc #bsr/ret
> 500500
> 2.010935
Ok:
$ parrot pmfactbsr.imc
500500
3.459947
$ parrot -Oc pmfact.imc
500500
1.237185
Now what ;)
Are you sure, that you can't do a tailcall sometimes? What about calling
back into PIR code?
Please no premature optimizations.
> Please keep in mind that I'm a newcomer to Parrot, so it's entirely
> possible that I'm made some invalid assumptions in my code that skew
> these results (and I'll freely admit them if pointed out).
I've first to understand the generated rules engine a bit. But generally
speaking: let's first do it right and then fast.
> And I will admit that the PIR code is still impressive speed-wise
> relative to what it is doing, but it's hard to ignore a 60% improvement.
Or more ...
> Pm
leo
Way to go with the overkill. I'm impressed. However, written more
sanely the results are:
PIR:
real 0m4.149s
user 0m4.120s
sys 0m0.030s
bsr/ret:
real 0m1.266s
user 0m1.260s
sys 0m0.000s
Chopping out the multiplication (since that's a not-insignificant
amount of the runtime for the bsr/ret version) gives:
PIR:
real 0m3.016s
user 0m2.990s
sys 0m0.030s
bsr/ret
real 0m0.344s
user 0m0.340s
sys 0m0.010s
The bsr/ret version is:
start:
new P16, .PerlInt
set P16, 1000
elements I16, P5
lt I16, 2, def
set S0, P5[1]
set P16, S0
def:
set I16, 0
time N16
save N16
loop:
clone P1, P16
new P0, .PerlInt
set P0, 1
save P16
save I16
bsr fact
restore I16
restore P16
inc I16
lt I16, 1000, loop
restore N16
time N17
sub N17, N17, N16
print P0
print "\n"
print N17
print "\n"
end
# in: P0 is product, p1 is count
# out: P0 is new product
fact:
gt P1, 1, doit
ret
doit:
mul P0, P0, P1
dec P1
bsr fact
ret
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk
It's important to note that
premature optimization == things Leo disapproves of
The bsr/ret version of things is fine. In the absolute best case
it'll be the same speed as tail calls, and in normal cases it'll be
significantly faster since it, by definition, has a lot less work to
do.
and with -Oc, for completeness:
real 0m0.416s
user 0m0.380s
sys 0m0.030s
> BTW, it may be very possible for me to write the p6ge generator so
> that it can be switched between the PIR and bsr/ret calling conventions,
> so we don't need to resolve this entirely now. And we could then benchmark
> the two against each other.
Keeping the code that flexible would be very interesting. If you can
achieve this without much extra pain, I think that it would be worth it.
Nicholas Clark
> BTW, it may be very possible for me to write the p6ge generator so
> that it can be switched between the PIR and bsr/ret calling conventions,
> so we don't need to resolve this entirely now. And we could then benchmark
> the two against each other.
That would be really great. There are a lot of things to consider, which
might or might not have an influence.
- tailcalls are faster then bsr/ret
- error traceback: not really easy with bsr/ret
- GC issues: the stack pushes consume GC-able object
- calling back into PIR (might work seemlessly or not - it's untested)
> Pm
Thanks,
leo
Sure, p6ge already makes heavy use of tailcalls. In a bsr/ret scheme,
the tail calls become branches (and, using a caller-saves convention,
there's no saving/restoring of registers). So, we end up with:
[pmichaud@localhost pmichaud]$ parrot pmfact.imc
500500
5.950604
[pmichaud@localhost pmichaud]$ parrot pmfactbsr.imc
500500
1.988304
[pmichaud@localhost pmichaud]$ parrot -Oc pmfact.imc
500500
0.933195
[pmichaud@localhost pmichaud]$ parrot -Oc pmfactbsrtail.imc
500500
0.221527
[pmichaud@localhost pmichaud]$ perl -e 'print 0.93 / 0.22, "\n"'
4.22727272727273
> What about calling back into PIR code?
Sure, I'll just use the standard PIR calling conventions for those--
no problem.
Pm
Believe it or not, I think it is less pain to do it this way
(keeping the conventions switchable) -- it certainly cleans up the
generator code a bit. We'll see how it works out. :-)
Pm
Sure, but the bsr/ret in your version is making lots of saveall calls
that I'd be avoiding. Also, this code is saving pmc's (big ones at
that) whereas I'll generally be pushing a few ints and maybe a string
onto the stack. So, rewriting the above for ints instead of PerlInts,
changing the multiply op to add to stay within the range of ints, and
removing the unneeded saves/restores for things that are being passed
as parameters anyway (and doubling the count save/restore to make it
somewhat closer to what I'd expect...):
[pmichaud@localhost pmichaud]$ parrot pmfact.imc #PIR
500500
5.819842
[pmichaud@localhost pmichaud]$ parrot pmfactbsr.imc #bsr/ret
500500
2.010935
Please keep in mind that I'm a newcomer to Parrot, so it's entirely
possible that I'm made some invalid assumptions in my code that skew
these results (and I'll freely admit them if pointed out).
And I will admit that the PIR code is still impressive speed-wise
relative to what it is doing, but it's hard to ignore a 60% improvement.
Pm
BTW, it may be very possible for me to write the p6ge generator so
that it can be switched between the PIR and bsr/ret calling conventions,
so we don't need to resolve this entirely now. And we could then benchmark
the two against each other.
Pm