light-weight calling conventions (was: Second cut at a P6 grammar engine, in Parrot)

Leopold Toetsch

unread,

Nov 17, 2004, 4:03:14 PM11/17/04

to Dan Sugalski, Patrick R. Michaud, Perl 6 Internals

Dan Sugalski wrote:

[ this came up WRT calling conventions ]

> I assume he's doing bsr/ret to get into and
> out of the sub, which is going to be significantly faster.

Who says that?

As already stated, I don't consider these as either light-weight nor
faster. Here is a benchmark.

Below are 2 version of a recursive factorial program. fact(100) is
calculated 1000 times:

PIR 1.1 s
bsr/ret 2.4 s
PIR/tailcall 0.2s

Unoptimized Parrot, default i.e. slow run core.

leo

fact.imc

factbsr.imc

Leopold Toetsch

unread,

Nov 17, 2004, 5:07:28 PM11/17/04

to Patrick R. Michaud, Dan Sugalski, Perl 6 Internals

Patrick R. Michaud wrote:
> On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:

> [pmichaud@localhost pmichaud]$ parrot pmfact.imc #PIR
> 500500
> 5.819842
> [pmichaud@localhost pmichaud]$ parrot pmfactbsr.imc #bsr/ret
> 500500
> 2.010935

Ok:

$ parrot pmfactbsr.imc
500500
3.459947

$ parrot -Oc pmfact.imc
500500
1.237185

Now what ;)

Are you sure, that you can't do a tailcall sometimes? What about calling
back into PIR code?

Please no premature optimizations.

> Please keep in mind that I'm a newcomer to Parrot, so it's entirely
> possible that I'm made some invalid assumptions in my code that skew
> these results (and I'll freely admit them if pointed out).

I've first to understand the generated rules engine a bit. But generally
speaking: let's first do it right and then fast.

> And I will admit that the PIR code is still impressive speed-wise
> relative to what it is doing, but it's hard to ignore a 60% improvement.

Or more ...

> Pm

leo

Dan Sugalski

unread,

Nov 17, 2004, 5:08:30 PM11/17/04

to Leopold Toetsch, Patrick R. Michaud, Perl 6 Internals

Way to go with the overkill. I'm impressed. However, written more
sanely the results are:

PIR:
real 0m4.149s
user 0m4.120s
sys 0m0.030s

bsr/ret:
real 0m1.266s
user 0m1.260s
sys 0m0.000s

Chopping out the multiplication (since that's a not-insignificant
amount of the runtime for the bsr/ret version) gives:

PIR:
real 0m3.016s
user 0m2.990s
sys 0m0.030s

bsr/ret
real 0m0.344s
user 0m0.340s
sys 0m0.010s

The bsr/ret version is:
start:
new P16, .PerlInt
set P16, 1000

elements I16, P5
lt I16, 2, def
set S0, P5[1]
set P16, S0

def:
set I16, 0
time N16
save N16

loop:
clone P1, P16
new P0, .PerlInt
set P0, 1
save P16
save I16
bsr fact
restore I16
restore P16
inc I16
lt I16, 1000, loop
restore N16
time N17
sub N17, N17, N16
print P0
print "\n"
print N17
print "\n"
end

# in: P0 is product, p1 is count
# out: P0 is new product
fact:
gt P1, 1, doit
ret
doit:
mul P0, P0, P1
dec P1
bsr fact
ret

--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Dan Sugalski

unread,

Nov 17, 2004, 5:16:40 PM11/17/04

to Leopold Toetsch, Patrick R. Michaud, Perl 6 Internals

At 11:07 PM +0100 11/17/04, Leopold Toetsch wrote:
>Please no premature optimizations.

It's important to note that

premature optimization == things Leo disapproves of

The bsr/ret version of things is fine. In the absolute best case
it'll be the same speed as tail calls, and in normal cases it'll be
significantly faster since it, by definition, has a lot less work to
do.

Dan Sugalski

unread,

Nov 17, 2004, 5:19:55 PM11/17/04

to Leopold Toetsch, Patrick R. Michaud, Perl 6 Internals

At 5:08 PM -0500 11/17/04, Dan Sugalski wrote:
>Chopping out the multiplication (since that's a not-insignificant
>amount of the runtime for the bsr/ret version) gives:
>
>PIR:
>real 0m3.016s
>user 0m2.990s
>sys 0m0.030s
>
>bsr/ret
>real 0m0.344s
>user 0m0.340s
>sys 0m0.010s

and with -Oc, for completeness:

real 0m0.416s
user 0m0.380s
sys 0m0.030s

Nicholas Clark

unread,

Nov 17, 2004, 5:35:47 PM11/17/04

to Patrick R. Michaud, Leopold Toetsch, Dan Sugalski, Perl 6 Internals

On Wed, Nov 17, 2004 at 02:47:09PM -0700, Patrick R. Michaud wrote:

> BTW, it may be very possible for me to write the p6ge generator so
> that it can be switched between the PIR and bsr/ret calling conventions,
> so we don't need to resolve this entirely now. And we could then benchmark
> the two against each other.

Keeping the code that flexible would be very interesting. If you can
achieve this without much extra pain, I think that it would be worth it.

Nicholas Clark

Leopold Toetsch

unread,

Nov 18, 2004, 1:55:47 AM11/18/04

to Patrick R. Michaud, perl6-i...@perl.org

Patrick R. Michaud <pmic...@pobox.com> wrote:

> BTW, it may be very possible for me to write the p6ge generator so
> that it can be switched between the PIR and bsr/ret calling conventions,
> so we don't need to resolve this entirely now. And we could then benchmark
> the two against each other.

That would be really great. There are a lot of things to consider, which
might or might not have an influence.
- tailcalls are faster then bsr/ret
- error traceback: not really easy with bsr/ret
- GC issues: the stack pushes consume GC-able object
- calling back into PIR (might work seemlessly or not - it's untested)

> Pm

Thanks,
leo

Patrick R. Michaud

unread,

Nov 17, 2004, 6:12:01 PM11/17/04

to Leopold Toetsch, Dan Sugalski, Perl 6 Internals

On Wed, Nov 17, 2004 at 11:07:28PM +0100, Leopold Toetsch wrote:
> $ parrot pmfactbsr.imc
> 500500
> 3.459947
>
> $ parrot -Oc pmfact.imc
> 500500
> 1.237185
>
> Now what ;)
> Are you sure, that you can't do a tailcall sometimes?

Sure, p6ge already makes heavy use of tailcalls. In a bsr/ret scheme,
the tail calls become branches (and, using a caller-saves convention,
there's no saving/restoring of registers). So, we end up with:

[pmichaud@localhost pmichaud]$ parrot pmfact.imc

500500
5.950604

[pmichaud@localhost pmichaud]$ parrot pmfactbsr.imc

500500
1.988304
[pmichaud@localhost pmichaud]$ parrot -Oc pmfact.imc
500500
0.933195
[pmichaud@localhost pmichaud]$ parrot -Oc pmfactbsrtail.imc
500500
0.221527

[pmichaud@localhost pmichaud]$ perl -e 'print 0.93 / 0.22, "\n"'
4.22727272727273

> What about calling back into PIR code?

Sure, I'll just use the standard PIR calling conventions for those--
no problem.

Pm

Patrick R. Michaud

unread,

Nov 17, 2004, 6:34:29 PM11/17/04

to Leopold Toetsch, Dan Sugalski, Perl 6 Internals

Believe it or not, I think it is less pain to do it this way
(keeping the conventions switchable) -- it certainly cleans up the
generator code a bit. We'll see how it works out. :-)

Pm

Patrick R. Michaud

unread,

Nov 17, 2004, 4:41:08 PM11/17/04

to Leopold Toetsch, Dan Sugalski, Perl 6 Internals

On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:

> As already stated, I don't consider these as either light-weight nor
> faster. Here is a benchmark.
>
> Below are 2 version of a recursive factorial program. fact(100) is
> calculated 1000 times:
>
> PIR 1.1 s
> bsr/ret 2.4 s
> PIR/tailcall 0.2s
>
> Unoptimized Parrot, default i.e. slow run core.

Sure, but the bsr/ret in your version is making lots of saveall calls
that I'd be avoiding. Also, this code is saving pmc's (big ones at
that) whereas I'll generally be pushing a few ints and maybe a string
onto the stack. So, rewriting the above for ints instead of PerlInts,
changing the multiply op to add to stay within the range of ints, and
removing the unneeded saves/restores for things that are being passed
as parameters anyway (and doubling the count save/restore to make it
somewhat closer to what I'd expect...):

[pmichaud@localhost pmichaud]$ parrot pmfact.imc #PIR
500500
5.819842
[pmichaud@localhost pmichaud]$ parrot pmfactbsr.imc #bsr/ret
500500
2.010935

Please keep in mind that I'm a newcomer to Parrot, so it's entirely
possible that I'm made some invalid assumptions in my code that skew
these results (and I'll freely admit them if pointed out).

And I will admit that the PIR code is still impressive speed-wise
relative to what it is doing, but it's hard to ignore a 60% improvement.

Pm

pmfact.imc

pmfactbsr.imc

Patrick R. Michaud

unread,

Nov 17, 2004, 4:47:09 PM11/17/04

to Leopold Toetsch, Dan Sugalski, Perl 6 Internals

On Wed, Nov 17, 2004 at 10:03:14PM +0100, Leopold Toetsch wrote:

> Dan Sugalski wrote:
>
> As already stated, I don't consider these as either light-weight nor
> faster. Here is a benchmark.
>
> Below are 2 version of a recursive factorial program. fact(100) is
> calculated 1000 times:
>
> PIR 1.1 s
> bsr/ret 2.4 s
> PIR/tailcall 0.2s
>
> Unoptimized Parrot, default i.e. slow run core.

BTW, it may be very possible for me to write the p6ge generator so

that it can be switched between the PIR and bsr/ret calling conventions,
so we don't need to resolve this entirely now. And we could then benchmark
the two against each other.

Pm