
[RFC] imcc calling conventions


Leopold Toetsch

Feb 18, 2003, 6:31:31 AM
to P6I
Attached is a pod
- describing the currently existing stack calling convention
- proposing a syntax for parrot's NCI calling convention.

Comments ... welcome,
leo

calling_conventions.pod

Steve Fink

Feb 20, 2003, 12:55:35 PM
to Leopold Toetsch, P6I
On Feb-18, Leopold Toetsch wrote:
> =head1 Stack calling conventions
>
> Arguments are B<save>d in reverse order onto the user stack:
>
> .arg y # save args in reversed order
> .arg x
> call _foo #(r, s) = _foo(x,y)
> .local int r
> .local int s
> .result s # restore results in reversed order
> .result r # [1]
>
> and return values are B<restore>d in reversed order from there.
>
> The subroutine is responsible for preserving registers.
>
> .sub _foo # sub foo(int a, int b)
> saveall
> .param int a
> .param int b
> ...
>
> .return pl # return (pl, mi)
> .return mi # [1]
> restoreall
> ret
> .end

My immediate reaction to this (okay, I'd really seen this before in
perl6-generated code) is "why don't the values from .return and
restoreall get mixed up?"

You may want to add a brief description of the kazillion different
stacks that Parrot uses. There are six, I think:

1. The user stack -- holds a stack of unions, so entries can be
int/float/pmc/string
2. The control stack -- holds return addresses
3. The int register stack -- holds frames of 32 ints each
4. The float register stack
5. The string register stack
6. The pmc register stack

I think this has been discussed before, but are we all okay with this
callee-save-everything policy? At the very least, I'd be tempted to
add a bitmasked saveall/restoreall pair to reduce the amount of cache
thrashing. ("saveall 0b00100110111111110000000000000000") It just
seems odd that you have to either save all 32 of one of the types of
registers, or to save selected ones onto a different stack. But it
*is* simpler to copy over the entire register file to a stack frame, I
guess.
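A minimal sketch of the bitmask idea (illustrative Python; the register file, mask layout, and function names are hypothetical, not Parrot's actual API):

```python
# Hypothetical sketch: bitmask-selected register saves versus a
# whole-frame copy. Nothing here is Parrot's real implementation.

def save_masked(registers, mask):
    """Push only the registers whose bit is set in `mask`."""
    return [(i, registers[i]) for i in range(len(registers)) if mask & (1 << i)]

def restore_masked(registers, saved):
    for i, value in saved:
        registers[i] = value

def save_all(registers):
    """Whole-frame save: one flat copy, no per-register tests."""
    return list(registers)

regs = list(range(32))              # pretend integer register file I0..I31
frame = save_masked(regs, 0b10110)  # save only I1, I2, I4
regs[1] = regs[2] = regs[4] = -1    # callee clobbers them
restore_masked(regs, frame)
assert regs == list(range(32))      # selected registers came back intact
```

The trade-off the thread debates is visible here: `save_masked` touches less memory but pays a test per register, while `save_all` is a single block copy.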

Taking that farther, I've always liked systems that divide up the
registers into callee-save, caller-save, and scratch (nobody-save?)
Maybe that's just me. And I vaguely recall that there was some
discussion I didn't follow about how that interferes with tail-call
optimization. (To me, "tail call optimization" == "replace recursive
call with a goto to the end of the function preamble")
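The "recursive call == goto" view can be sketched like this (illustrative Python; the loop stands in for the goto back to the top of the function):

```python
# Sketch of tail-call optimization as "replace the recursive call with
# a goto": rebind the parameters and jump back, instead of growing the
# call stack.

def fact_recursive(n, acc=1):
    if n == 0:
        return acc
    return fact_recursive(n - 1, acc * n)   # tail call

def fact_goto(n, acc=1):
    while True:                             # "goto top of function"
        if n == 0:
            return acc
        n, acc = n - 1, acc * n             # rebind params, jump back

assert fact_goto(10) == fact_recursive(10) == 3628800
```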

Or, as another stab at the same problem, does Parrot really need 32*4
registers? I keep thinking we might be better off with 16 of each
type. But maybe I'm just grumbling.

[It's nice to have a net connection again!]

Nicholas Clark

Feb 22, 2003, 6:02:27 AM
to Steve Fink, Leopold Toetsch, P6I
On Thu, Feb 20, 2003 at 09:55:35AM -0800, Steve Fink wrote:
> I think this has been discussed before, but are we all okay with this
> callee-save-everything policy? At the very least, I'd be tempted to
> add a bitmasked saveall/restoreall pair to reduce the amount of cache
> thrashing. ("saveall 0b00100110111111110000000000000000") It just
> seems odd that you have to either save all 32 of one of the types of
> registers, or to save selected ones onto a different stack. But it
> *is* simpler to copy over the entire register file to a stack frame, I
> guess.

I agree that the bitmasked savesome/restoresome would be less simple.
I suspect that a bitmasked version would JIT very nicely.

Since when did parrot trade simplicity for speed?

Nicholas Clark

Leopold Toetsch

Feb 22, 2003, 6:09:12 AM
to Steve Fink, P6I
Steve Fink wrote:

> On Feb-18, Leopold Toetsch wrote:

[ ... ]


>> .return mi # [1]
>> restoreall
>> ret
>> .end
>>
>
> My immediate reaction to this (okay, I really saw this before in
> perl6-generated code) is "why don't the values from .return and
> restoreall get mixed up?"


Yep. It is at least confusing at the beginning.


> You may want to add a brief description of the kazillion different
> stacks that Parrot uses. There are six, I think:


7 if you count the intstack too.


> I think this has been discussed before, but are we all okay with this
> callee-save-everything policy?


I'm not that convinced of it - I still think the subroutine knows best
how to preserve its registers.

> ... At the very least, I'd be tempted to


> add a bitmasked saveall/restoreall pair to reduce the amount of cache
> thrashing. ("saveall 0b00100110111111110000000000000000") It just
> seems odd that you have to either save all 32 of one of the types of
> registers, or to save selected ones onto a different stack. But it
> *is* simpler to copy over the entire register file to a stack frame, I
> guess.


Probably yes.


> Taking that farther, I've always liked systems that divide up the
> registers into callee-save, caller-save, and scratch (nobody-save?)


Register windows, like (AFAIK) IA64 has? This would avoid a lot of
saveall/restoreall/saveX/restoreX for smaller subroutines.


> Maybe that's just me. And I vaguely recall that there was some
> discussion I didn't follow about how that interferes with tail-call
> optimization. (To me, "tail call optimization" == "replace recursive
> call with a goto to the end of the function preamble")


Dan did mention it, when arguing for callee-save calling conventions. But
I don't know exactly how it works.


> Or, as another stab at the same problem, does Parrot really need 32*4
> registers? I keep thinking we might be better off with 16 of each
> type. But maybe I'm just grumbling.


I think 16 might not be enough, so the current setting is OK.


leo


Dan Sugalski

Feb 22, 2003, 5:59:58 PM
to Steve Fink, Leopold Toetsch, P6I
At 9:55 AM -0800 2/20/03, Steve Fink wrote:
>I think this has been discussed before, but are we all okay with this
>callee-save-everything policy?

Nope. It's safe to say that a lot of folks aren't. :)

Still, I think it's the right way to go in the general case. Most sub
calls of any complexity will be saving off all the I and P registers
(and probably the S registers too), so I don't see a finer-grained
scheme saving anything except in the trivial leaf-sub case. Which, I
realize, is common for some styles of programming, but those aren't the
styles that people use with perl/python/ruby generally.

>At the very least, I'd be tempted to
>add a bitmasked saveall/restoreall pair to reduce the amount of cache
>thrashing. ("saveall 0b00100110111111110000000000000000") It just
>seems odd that you have to either save all 32 of one of the types of
>registers, or to save selected ones onto a different stack. But it
>*is* simpler to copy over the entire register file to a stack frame, I
>guess.

Faster in a number of ways. I considered the bitmask, but then you
have the issue of lots of bit tests, which aren't cheap in software.
(Hardware, yes, but we don't have that on our side.) Besides that,
doing a simple bit-blast is hardware-accelerated on many systems, and
cache-friendly on others, which makes it a better option overall.

At one point I did a test and found that whole-register-frame saves
were faster than saving three individual registers in a frame, though
we do have a relatively heavy-weight general purpose stack.

>Taking that farther, I've always liked systems that divide up the
>registers into callee-save, caller-save, and scratch (nobody-save?)
>Maybe that's just me. And I vaguelly recall that there was some
>discussion I didn't follow about how that interferes with tail-call
>optimization. (To me, "tail call optimization" == "replace recursive
>call with a goto to the end of the function preamble")

I can see doing this. If there were some sort of metainformation that
would allow us to know at compile time that registers were safe, we
could emit different code, though there's still the issue of nested
calls where there's limited info.

>Or, as another stab at the same problem, does Parrot really need 32*4
>registers? I keep thinking we might be better off with 16 of each
>type. But maybe I'm just grumbling.

Yeah, 32 is a bunch. I've considered going with 16 on and off, and still might.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Benjamin Goldberg

Feb 22, 2003, 8:15:43 PM
to perl6-i...@perl.org
Dan Sugalski wrote:
>
> At 9:55 AM -0800 2/20/03, Steve Fink wrote:
[snip]

> >Or, as another stab at the same problem, does Parrot really need 32*4
> >registers? I keep thinking we might be better off with 16 of each
> >type. But maybe I'm just grumbling.
>
> Yeah, 32 is a bunch. I've considered going with 16 on and off, and
> still might.

Given that registers are allocated with the lower numbers being the ones
used more often, how about having 32 registers, as we now have, but two
different ops for saving -- one of which saves registers 0 .. 15, the
other saves all 0 .. 31. Or is this just a dumb idea?

--
$;=qq qJ,krleahciPhueerarsintoitq;sub __{0 &&
my$__;s ee substr$;,$,&&++$__%$,--,1,qq;;;ee;
$__>2&&&__}$,=22+$;=~y yiy y;__ while$;;print

Leopold Toetsch

Feb 23, 2003, 4:14:28 AM
to Benjamin Goldberg, perl6-i...@perl.org
Benjamin Goldberg wrote:

> Dan Sugalski wrote:
>
>>Yeah, 32 is a bunch. I've considered going with 16 on and off, and
>>still might.


16 is not enough for non-trivial subroutines. E.g. the _Generate sub in
life.p6 consumes 25 P regs.


> Given that registers are allocated with the lower numbers being the ones
> used more often, how about having 32 registers, as we now have, but two
> different ops for saving -- one of which saves registers 0 .. 15, the
> other saves all 0 .. 31. Or is this just a dumb idea?

I think this is a good idea. For caller-save subroutines we could have:

saveall 10,0,1,2 # save I0..I9, no S, P0, N0..N1
...
restoreall 10,0,1,2
ret

accompanied by push{i,s,p,n}_i and pop{i,s,p,n}_i, for
saving/restoring just a certain number of registers of one kind.
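A sketch of this counted-save idea (illustrative Python; the encoding and names are hypothetical, not a proposed Parrot API):

```python
# Hypothetical sketch of a counted save: preserve only the first N
# registers of each kind instead of a full 32-register frame.

def saveall_counted(files, ni, ns, np, nn):
    """files: dict of register files keyed by kind; returns a frame
    holding the first ni I-regs, ns S-regs, np P-regs, nn N-regs."""
    counts = {"I": ni, "S": ns, "P": np, "N": nn}
    return {kind: files[kind][:n] for kind, n in counts.items()}

def restoreall_counted(files, frame):
    for kind, saved in frame.items():
        files[kind][:len(saved)] = saved

files = {kind: [0] * 32 for kind in "ISPN"}
files["I"][:10] = range(10)
frame = saveall_counted(files, 10, 0, 1, 2)  # I0..I9, no S, P0, N0..N1
files["I"][5] = 99                           # callee clobbers I5
restoreall_counted(files, frame)
assert files["I"][5] == 5                    # restored from the frame
```

Like Benjamin's two-op proposal, this keeps lower-numbered (more frequently allocated) registers cheap to save without a per-register bitmask test.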

leo

Piers Cawley

Feb 24, 2003, 12:29:26 PM
to Steve Fink, Leopold Toetsch, P6I
Steve Fink <st...@fink.com> writes:
> I think this has been discussed before, but are we all okay with this
> callee-save-everything policy? At the very least, I'd be tempted to
> add a bitmasked saveall/restoreall pair to reduce the amount of cache
> thrashing. ("saveall 0b00100110111111110000000000000000") It just
> seems odd that you have to either save all 32 of one of the types of
> registers, or to save selected ones onto a different stack. But it
> *is* simpler to copy over the entire register file to a stack frame, I
> guess.
>
> Taking that farther, I've always liked systems that divide up the
> registers into callee-save, caller-save, and scratch (nobody-save?)
> Maybe that's just me. And I vaguely recall that there was some
> discussion I didn't follow about how that interferes with tail-call
> optimization. (To me, "tail call optimization" == "replace recursive
> call with a goto to the end of the function preamble")

Um... no. Tail call optimization implies being able to replace *any*
tail call, not just a recursive one, with a simple goto. Consider the
following calling sequence:

sub a {
    b(arg);
    # Continuation A
    ...;
}

sub b {
    ...;
    c(different_arg);
    # Continuation B
    c(another_arg);
}

sub c {
    ...
}

We have three function calls here, b(arg), c(different_arg), and
c(another_arg).

Under caller saves, those calls look like the following:

b(arg)           -> Push Continuation A onto the continuation chain
                    Setup the parameters
                    Save anything important
                    Jump to B

c(different_arg) -> Push Continuation B onto the continuation chain
                    Setup the params
                    Save anything important
                    Jump to C; C will return to Continuation B

at Continuation B
                 -> pop the continuation stack
                 -> restore saved params

c(another_arg)   -> Setup the params
                    Jump to C; *C will return to Continuation A*

Under caller saves, this is easy to do. Under callee saves, b's second
call to c() cannot simply return to Continuation A but must unwind the
call stack via b to make sure that the right things are restored.

Thinking about this still further, I'm not sure whether it's tail call
optimization or simply the presence of first-class continuations that
makes callee saves an unwise proposition.
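The caller-saves argument can be sketched in continuation-passing style (illustrative Python; the names are hypothetical): the *last* call in b hands c the caller's continuation directly, so nothing has to return through b.

```python
# CPS sketch of the calling sequence above. Under caller-saves, b's
# final call to c passes along a's continuation k, so c returns
# straight to Continuation A without unwinding through b.

def c(arg, k):
    return k(arg * 2)                 # do some work, return via k

def b(arg, k):
    def continuation_b(result):
        return c(result + 1, k)       # tail call: reuse a's continuation
    return c(arg, continuation_b)     # first call returns to Continuation B

def a(arg):
    return b(arg, lambda r: r)        # Continuation A: just yield the result

assert a(3) == 14                     # c(3)->6, +1 -> 7, c(7)->14
```

Under callee-saves this short-circuit would be unsafe, because b's pending restores would be skipped, which is exactly the objection in the message above.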

Why do I have the feeling that I've managed to make this whole thing
even less clear now?

--
Piers

Leopold Toetsch

Feb 25, 2003, 2:47:55 AM
to Piers Cawley, Steve Fink, P6I
Piers Cawley wrote:

> Steve Fink <st...@fink.com> writes:
>
>>... I didn't follow about how that interferes with tail-call


>>optimization. (To me, "tail call optimization" == "replace recursive
>>call with a goto to the end of the function preamble")
>>
>
> Um... no. tail call optimization implies being able to replace *any*
> tail call, not just a recursive one with a simple goto. Consider the
> following calling sequence:

> b(arg) -> Push Continuation A onto the continuation chain


Continuations are different anyway. They store the context of the parent
at the point they were created, and on invoke they swap context.


> Why do I have the feeling that I've managed to make this whole thing
> even less clear now?

You said it ;-)
leo

Jerome Vouillon

Feb 25, 2003, 4:27:20 AM
to Leopold Toetsch, Piers Cawley, Steve Fink, P6I
On Tue, Feb 25, 2003 at 08:47:55AM +0100, Leopold Toetsch wrote:
> >Um... no. tail call optimization implies being able to replace *any*
> >tail call, not just a recursive one with a simple goto. Consider the
> >following calling sequence:
>
>
> > b(arg) -> Push Continuation A onto the continuation chain
>
>
> Continuations are different anyway. They store the context of the parent
> at the point they were created, and on invoke they swap context.

You don't mean the same thing by "continuation". For Piers, a
continuation is just a place where one can jump. So, for a function
call, you push the return continuation (the place where you must jump
on return) onto the stack and jump to the continuation corresponding
to the function body. What you mean by a continuation is what one may
call a "first-class continuation", that is, a continuation which can be
manipulated like any other "object" (you can pass it as an argument to a
function, return it from a function, put it in a variable, ...)

-- Jerome

Jerome Vouillon

Feb 25, 2003, 4:33:11 AM
to Piers Cawley, Steve Fink, Leopold Toetsch, P6I
On Mon, Feb 24, 2003 at 05:29:26PM +0000, Piers Cawley wrote:
[Tail-call optimization]

> Under caller saves, this is easy to do. Under callee saves, b's second
> call to c() cannot simply return to Continuation A but must unwind the
> call stack via b to make sure that the right things are restored.

Note that we have the same situation with exceptions. Under caller
saves, we just need to consider the deeper stack frame, while under
callee saves one needs to unwind the stack.

-- Jerome

Piers Cawley

Feb 25, 2003, 6:18:47 AM
to Leopold Toetsch, Steve Fink, P6I
Leopold Toetsch <l...@toetsch.at> writes:

> Piers Cawley wrote:
>
>> Steve Fink <st...@fink.com> writes:
>>
>>>... I didn't follow about how that interferes with tail-call
>>>optimization. (To me, "tail call optimization" == "replace recursive
>>>call with a goto to the end of the function preamble")
>>>
>> Um... no. tail call optimization implies being able to replace *any*
>> tail call, not just a recursive one with a simple goto. Consider the
>> following calling sequence:
>
>
>> b(arg) -> Push Continuation A onto the continuation chain
>
>
> Continuations are different anyway. They store the context of the
> parent at the point they were created, and on invoke they swap context.

But the 'return target' of the current subroutine can be thought of as
a continuation. It often makes sense to do so.

--
Piers

Piers Cawley

Feb 25, 2003, 11:45:25 AM
to Jerome Vouillon, Leopold Toetsch, Steve Fink, P6I
Jerome Vouillon <voui...@pps.jussieu.fr> writes:

> On Tue, Feb 25, 2003 at 08:47:55AM +0100, Leopold Toetsch wrote:
>> >Um... no. tail call optimization implies being able to replace *any*
>> >tail call, not just a recursive one with a simple goto. Consider the
>> >following calling sequence:
>>
>>
>> > b(arg) -> Push Continuation A onto the continuation chain
>>
>>
>> Continuations are different anyway. They store the context of the parent
>> at the point they were created, and on invoke they swap context.
>
> You don't mean the same thing by continuation. For Piers, a
> continuation is just a place where one can jump.

No, a continuation is just a place to which one might return.

> So, for a function call, you push the return continuation (the place
> where you must jump on return) onto the stack and jump to the
> continuation corresponding to the function body. What you mean by a
> continuation is what one may call "first-class continuation", that
> is a continuation which can be manipulated like any other "object"
> (you can pass it as argument of a function, return it from a
> function, put it in a variable, ...)

Which can also be thought of as 'just a place to which one can
return', but you can explicitly alter what one returns.

--
Piers

Benjamin Goldberg

Feb 25, 2003, 6:04:13 PM
to perl6-i...@perl.org
Piers Cawley wrote:
[snip]

> Um... no. tail call optimization implies being able to replace *any*
> tail call, not just a recursive one with a simple goto.
[snip]

In perl5, a tail call optimization can be done with just a simple
goto... well, 'goto &subname', anyway. (Well, first you'd assign
something to @_, then goto &subname.)

Since (supposedly) there's going to be a perl5->parrot compiler, there
needs to be support for perl5's goto &subname. ISTM that once we have
figured out an efficient way of implementing that, we'll also have an
efficient way of doing native tail call optimization.

As a wild-ass guess, an optimized tail call will look something like:

.sub _foo                 # sub foo(int a, int b)
    saveall

    .param int a          # a = pop @_
    .param int b          # b = pop @_
    ...

    .arg int x            # push @_, x
    .arg int u            # push @_, u
    .arg int q            # push @_, q
    restoreall
    jnsr _bar             # goto &_bar
.end

.sub _bar                 # sub bar(int q, int u, int x)
    saveall
    .param int q          # q = pop @_
    .param int u          # u = pop @_
    .param int x          # x = pop @_
    ...

    .return int pl        # push @_, pl
    .return int ml        # push @_, ml
    restoreall
    ret
.end

The 'jnsr' opcode (JumpNoSaveReturn) might be spelled as 'j' or as
'goto', or something else entirely, depending on what's least confusing
and most aesthetically pleasing.
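The jnsr idea can be sketched as a trampoline (illustrative Python; the names and the driver loop are hypothetical): a tail-calling sub *returns* a jump request instead of creating a new frame, so the original caller's return address is reused.

```python
# Hypothetical trampoline sketch of 'jnsr': instead of calling _bar and
# saving a return address, foo returns a request to jump to bar. The
# driver keeps jumping, so the stack never grows per tail call.

def foo(a, b):
    # ... foo's own work ...
    return ("jump", bar, (a + b, a, b))   # jnsr _bar: no new frame

def bar(q, u, x):
    return ("done", q * u * x)

def run(entry, args):
    """Driver loop: keeps jumping until a sub actually returns."""
    tag, *rest = entry(*args)
    while tag == "jump":
        target, target_args = rest
        tag, *rest = target(*target_args)
    return rest[0]

assert run(foo, (2, 3)) == 30             # bar(5, 2, 3) -> 30
```

This is only an analogy for the control flow; the register save/restore pairing that the following messages argue about is the part the sketch deliberately leaves out.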

Piers Cawley

Feb 26, 2003, 2:34:58 AM
to Benjamin Goldberg, perl6-i...@perl.org
Benjamin Goldberg <gol...@earthlink.net> writes:

> Piers Cawley wrote:
> [snip]
>> Um... no. tail call optimization implies being able to replace *any*
>> tail call, not just a recursive one with a simple goto.
> [snip]
>
> In perl5, doing a tail call optimization can be done with just a simple
> goto... well, 'goto &subname', anyway. (Well, first you'd assign
> something to @_, then goto &subname).

Ah... this discussion has been had on p5p and elsewhere; whilst goto
&sub could, in theory, do tail call optimization, in practice it seems
to be as slow as any other function call.

The problem here is that you've pushed two loads of registers onto the
register stack, and the return is only going to pop one set off. And
it'll be the wrong set at that. And you can't add an extra
'restoreall' to _bar because _bar could easily be called normally as
well as via a tailcall.

--
Piers

Benjamin Goldberg

Feb 26, 2003, 10:03:14 PM
to perl6-i...@perl.org

Piers Cawley wrote:
>
> Benjamin Goldberg <gol...@earthlink.net> writes:
>
> > Piers Cawley wrote:
> > [snip]
> >> Um... no. tail call optimization implies being able to replace *any*
> >> tail call, not just a recursive one with a simple goto.
> > [snip]
> >
> > In perl5, doing a tail call optimization can be done with just a simple
> > goto... well, 'goto &subname', anyway. (Well, first you'd assign
> > something to @_, then goto &subname).
>
> Ah... this discussion has been done in p5p and elsewhere; whilst goto
> &sub could, in theory, do tail call optimization, in practice it seems
> to be as slow as any other function call.

Really? I did not know that.

However, even if it's not significantly faster, at least it saves
memory, by discarding a level of saved stack information.

I think you're overlooking the "restoreall" done just before the
jump-no-save-returnaddress operation... I see two "saveall"s and
two "restoreall"s.

At the point that 'jnsr _bar' is called, all the stacks should look
exactly as if the code which called _foo had called _bar instead.

(The "ret" doesn't pop a set of registers, just an address to goto.)

> And it'll be the wrong set at that. And you can't add an extra
> 'restoreall' to _bar because _bar could easily be called normally as
> well as via a tailcall.

I would not have suggested such a thing. Tail call optimization in
parrot should be about the same logical semantics as perl5's goto
&subname (except maybe being faster).

Dan Sugalski

Feb 27, 2003, 8:48:45 AM
to perl6-i...@perl.org
At 10:03 PM -0500 2/26/03, Benjamin Goldberg wrote:
>I would not have suggested such a thing. Tail call optimization in
>parrot should be about the same logical semantics as perl5's goto
>&subname (except maybe being faster).

No, it shouldn't. Tail call optimization is far more useful in the case of:

sub foo {
    # do something
    bar();
}

or

sub foo {
    # do something
    foo();
}

than in perl's "goto &foo" feature, if for no other reason (and there
are other reasons) than that it's far more common.

Benjamin Goldberg

Feb 27, 2003, 6:57:04 PM
to perl6-i...@perl.org
Dan Sugalski wrote:
>
> At 10:03 PM -0500 2/26/03, Benjamin Goldberg wrote:
> >I would not have suggested such a thing. Tail call optimization in
> >parrot should be about the same logical semantics as perl5's goto
> >&subname (except maybe being faster).
>
> No it shouldn't. Tail call optimization is far more useful in the case
> of:
>
> sub foo {
> # do something
> bar();
> }
>
> or
>
> sub foo {
> # do something
> foo();
> }
>
> than in perl's "goto &foo" feature, if for no other reason (and there
> are other reasons) than that it's far more common.

The above two things *could* be written as:

sub foo {
    # do something

    @_ = (); goto &bar;
}

or

sub foo {
    # do something

    @_ = (); goto &foo;
}

These are, for all intents and purposes, tail-call-optimized versions of
the two functions you showed. True, this might not gain us much speed
in perl5, but it demonstrates the type of effect that I think we're
aiming at for parrot's tail-call optimization -- except, of course, that
in parrot it would hopefully be faster.

Dan Sugalski

Feb 27, 2003, 7:27:13 PM
to perl6-i...@perl.org
At 6:57 PM -0500 2/27/03, Benjamin Goldberg wrote:
>These are, for all intents and purposes, tail-call optimized versions of
>the two functions you showed. True, this might not gain us much speed
>in perl5, but it demonstrates the type of effect that I think we're
>aiming at, for parrot's tail-call optimization -- except of course that
>in parrot, it would hopefully be faster.

But Ben... that's not what we're aiming for with the tail call
optimization stuff. We're shooting for the traditional tail call
semantics, the way I showed. Perl's funky goto, while useful and
something we definitely need to support, isn't the proper design
target here, as it's so infrequent relatively speaking.

Piers Cawley

Mar 3, 2003, 8:07:36 AM
to Benjamin Goldberg, perl6-i...@perl.org

But with proper tail call optimization you'd only need *one*
saveall. That's why it's an optimization.

--
Piers

Jason Gloudon

Mar 3, 2003, 9:46:24 AM
to Piers Cawley, Benjamin Goldberg, perl6-i...@perl.org
On Mon, Mar 03, 2003 at 01:07:36PM +0000, Piers Cawley wrote:

> > I think you're overlooking the "restoreall" done just before the
> > jump-no-save-returnaddress operation... I see two "saveall"s and
> > two "restoreall"s.
>
> But with proper tail call optimization you'd only need *one*
> saveall. That's why it's an optimization.

Tail call optimization is a space (stack-size) optimization. The Scheme
definition only requires space efficiency. The time optimization is
important, but a secondary consideration for the functional-language folks.

--
Jason

Benjamin Goldberg

Mar 3, 2003, 4:00:49 PM
to perl6-i...@perl.org

Here's an idea -- considering that the first op of every sub seems to be
a 'saveall' instruction, and in my suggested optimized tail call
example, the one op immediately prior to the 'jnsr _bar' op was a
'restoreall', then the end of _foo might possibly have been written as:


    .arg int x            # push @_, x
    .arg int u            # push @_, u
    .arg int q            # push @_, q

    branch (_bar + 1)     # goto *second* op of sub _bar
.end

That is, go to the op immediately after the 'saveall'.

Thus, one saveall is performed, and one restoreall is performed.

The possible problems with this that I foresee are:

1/ If _bar expects some data to be passed in with registers, rather
than with the stack.
2/ If _bar does not have 'saveall' as its first op.

If _bar is a subroutine generated by the compiler, neither of these
would ever actually occur in practice.

Well, they *could* occur if _bar was written by hand rather than
compiler-generated. Then for each of the problem cases mentioned above:
1/ If _bar expects some data to be passed in with registers, then
there's no possible way that it could be called by a compiler-generated
function and have behavior that's guaranteed to be consistent -- nor
should we expect this, or worry about it. (Or, we could mark _bar as
expecting to be passed data through registers, and *forbid* it from ever
being called by compiler-generated subroutines, and only allow hand
written subs to call it).
2/ The compiler can easily examine the first op of _bar, and if it
isn't 'saveall', choose not to make the tail call optimization.

Dan Sugalski

Mar 3, 2003, 4:04:04 PM
to perl6-i...@perl.org
At 4:00 PM -0500 3/3/03, Benjamin Goldberg wrote:
>Jason Gloudon wrote:
>>
>> On Mon, Mar 03, 2003 at 01:07:36PM +0000, Piers Cawley wrote:
>>
>> > > I think you're overlooking the "restoreall" done just before the
>> > > jump-no-save-returnaddress operation... I see two "saveall"s and
>> > > two "restoreall"s.
>> >
>> > But with proper tail call optimization you'd only need *one*
>> > saveall. That's why it's an optimization.
>>
>> Tail call optimization is a space (stack size) optimization. The
>> scheme definition only requires space efficiency. The time
>> optimization is important but a secondary consideration for the
>> functional language folks.
>
>Here's an idea -- considering that the first op of every sub seems to be
>a 'saveall' instruction,

If that's true, we need to thump IMCC some. There's no reason for
this to be the first op -- if the caller wanted any registers saved, it
had darned well better have taken care of it itself. That's the point
of caller-save, after all...

Benjamin Goldberg

Mar 3, 2003, 7:36:22 PM
to perl6-i...@perl.org
Dan Sugalski wrote:
> Benjamin Goldberg wrote:
>> Jason Gloudon wrote:
>>> Piers Cawley wrote:
>>>
>>>>> I think you're overlooking the "restoreall" done just before
>>>>> the jump-no-save-returnaddress operation... I see two "saveall"s
>>>>> and two "restoreall"s.
>>>>
>>>> But with proper tail call optimization you'd only need *one*
>>>> saveall. That's why it's an optimization.
>>>
>>> Tail call optimization is a space (stack size) optimization. The
>>> scheme definition only requires space efficiency. The time
>>> optimization is important but a secondary consideration for the
>>> functional language folks.
>>
>> Here's an idea -- considering that the first op of every sub seems to
>> be a 'saveall' instruction,
>
> If that's true, we need to thump IMCC some. There's no reason for
> this to be the first op--if the caller wanted any registers saved it
> had darned well better have taken care of it itself. That's the point
> of caller-save, after all...

Erm, my statement was actually just an assumption that the first op
would be a 'saveall' -- I haven't looked at actual imcc output. By your
comment, I'll assume that my earlier assumption was wildly wrong.

-------

Is it possible to divide subroutines into two groups, caller-save and
callee-save? If so, then I *think* that we should be able to make at
least *some* tail-call optimizations:

If a caller-save subroutine is going to return the results of another
caller-save subroutine, the first can jump directly into the other.

If a callee-save subroutine is going to return the results of another
callee-save subroutine, the first can jump to the Nth opcode of the
second subroutine, in the manner described down below.

In the other cases (callee-save calling caller-save, or vice versa), I
suppose (but might be mistaken) that the only tail-call optimization
that can be done is to have the caller skip the step of removing
results from the stack and then pushing them back on... which might
already be done.

---------

There are three or four ways that I can think of for an ordinary
callee-save subroutine to work:
1/ saveall/restoreall
2/ push[ipns]/pop[ipns]
3/ push/pop (with the help of rotate_up)
4/ a combo of 2 and 3.

I assume that most callee-save subroutines have the following stages:
1/ save the registers we're going to be dirtying
2/ (optional, use with methods 3&4 mentioned above)
rotate_up the stack so that our arguments are on the top.
3/ pop the arguments into registers
4/ do our work
5/ push the return values onto the stack.
6/ (optional) rotate_up the stack
7/ restore the registers we've dirtied.
8/ pop an address off the stack and goto it. (op "ret")

If a caller is going to perform a tail-optimized function call, then it
could look like:
1/ save the registers that the callee is going to restore in the
callee's step 7
2/ save any registers that we're using that weren't already saved
3/ rotate_up the stack, if step 2 used "push" ops.
4/ pop the arguments into registers.
5/ do our work
6/ copy the callee's arguments into whatever registers the callee
would put them into in the callee's step 3.
7/ restore the registers that were saved in our step 2.
8/ branch to step 4 of the callee.

It may be possible for the caller to avoid an explicit step 6, if it can
have its computations put the arguments in the target registers
directly.

Note that steps 2,7 of the caller can't use saveall/restoreall, since
that would interfere with the registers written to in step 6. (If you
do want to use saveall/restoreall for 2,7, then step 6 becomes "push the
arguments for the callee onto the stack", and step 8 becomes "branch to
step 3 of the callee"). Similarly, if the callee puts arguments into
integer (etc.) registers, you probably can't use savei/restorei.

Leopold Toetsch

Mar 4, 2003, 3:28:11 AM
to Benjamin Goldberg, perl6-i...@perl.org
Benjamin Goldberg wrote:


> Erm, my statement was actually just an assumption that the first op
> would be a 'saveall' -- I haven't looked at actual imcc output.


imcc currently does not emit any additional instructions. This is why
I started this thread in the first place:
- imcc takes the code produced by the $HL compiler *as is* and
- has to figure out what $HL actually wants to do (WRT CFG and data flow)

Currently imcc checks the first few instructions of a subroutine. If a
saveall is found, imcc assumes that there is no unofficial (via
registers) data flow into the sub or back.
If no saveall is found, imcc assumes that data are passed in registers
and accounts for this in the CFG.

This is all fine for making the *hand-crafted* test cases assemble and
run. But this is not (IMHO) the way to go for using imcc as a compiler
backend for another HL.

I would like to have the calling conventions nailed down, and I would
like to have hints in imcc's input file stating that we are going to
call this sub with caller-save or callee-save conventions, or use PDD03 ...

Please imagine something like this:

_main:
    $P0 = new Sub
    $P1 = new PerlArray
    ...
    many set_addr
    ...
    $P0 = $P1[$I99]
    invoke $P0
    ret
_sub1: ...
_sub2: ...
_sub3: ...

The address that the invoke branches to is calculated somehow, then
goes into an array (or may be passed from another sub), and can be
anywhere. When imcc makes no assumptions about calling conventions, it
has to consider each label a possible branch target and has to follow
the data flow.
This typically leads to a lot of spilling, due to the register
allocator running out of free registers.

OTOH, with known calling conventions, imcc could optimize a few simple
leaf subs in the call chain (of procedural code) and e.g. reorganize
registers so that pushx/saveall/popx... are just thrown away.

leo


Dan Sugalski

Mar 4, 2003, 11:27:20 AM
to perl6-i...@perl.org
At 7:36 PM -0500 3/3/03, Benjamin Goldberg wrote:
>Dan Sugalski wrote:
>> Benjamin Goldberg wrote:
>>> Jason Gloudon wrote:
>>>> Piers Cawley wrote:
>>>>
>>>>>> I think you're overlooking the "restoreall" done just before
>>>>>> the jump-no-save-returnaddress operation... I see two "saveall"s
>>>>>> and two "restoreall"s.
>>>>>
>>>>> But with proper tail call optimization you'd only need *one*
>>>>> saveall. That's why it's an optimization.
>>>>
>>>> Tail call optimization is a space (stack size) optimization. The
>>>> scheme definition only requires space efficiency. The time
>>>> optimization is important but a secondary consideration for the
>>>> functional language folks.
>>>
>>> Here's an idea -- considering that the first op of every sub seems to
>>> be a 'saveall' instruction,
>>
>> If that's true, we need to thump IMCC some. There's no reason for
>> this to be the first op--if the caller wanted any registers saved it
>> had darned well better have taken care of it itself. That's the point
>> of caller-save, after all...
>
>Erm, my statement was actually just an assumption that the first op
>would be a 'saveall' -- I haven't looked at actual imcc output. By your
>comment, I'll assume that my earlier assumption was wildly wrong.

Ben, please don't make assumptions. We've done this to death in the
past, and it's all in the archives to go digging out. While there's
plenty of useful metadata we can expose to imcc to get better-optimized
output, the *last* thing we need is for someone to make a bunch of
WAGs and take the conversation off based on them.
