Another issue with pdd03

Leopold Toetsch

Nov 14, 2004, 12:32:50 PM

to Perl 6 Internals

As outlined in the analysis of dumper.t failures with the new register
allocator, we have another problem with the current calling, or rather
return, conventions.

Given this simple program:

$ cat ret.imc
.sub main @MAIN
P5 = new PerlString
P5 = "ok\n"
foo()
print P5
.end
.sub foo
.local pmc ok
ok = new PerlString
ok = "bug\n"
.return(ok)
.end

$ parrot ret.imc
bug

The use of register P5 in main isn't invalid: in a more complex program
the register allocator might simply not have found any other free
register. And since the main program calls the function in void
context, P5 could legitimately have been used.

If we now define that P5 has to be preserved in main, because it's a
possible return result of foo() and may therefore be clobbered by foo(),
that means we effectively have just 16 registers per kind available for
allocation around a function call.

If the latter is true according to current pdd03 then we are wasting
half of our registers for a very doubtful advantage: being able to pass
return values in R5..R15.

leo
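
For illustration, the behaviour reported above can be modelled in a few lines of Python. This is a toy sketch, not Parrot's implementation: it assumes each sub runs in a fresh register frame and that returned values are copied into the caller's frame starting at P5, whether or not the caller asked for them. The names `new_frame`, `call` and the choice of register 16 for the callee's local are hypothetical.

```python
# Toy model (assumption, not Parrot's real code): every sub gets a fresh
# frame, and results are copied back into the caller starting at P5.

NUM_REGS = 32
FIRST_RETURN_REG = 5  # per the thread, results land in P5 and upward


def new_frame():
    """Return a fresh frame of 32 empty PMC registers."""
    return [None] * NUM_REGS


def call(caller_frame, sub):
    """Run `sub` in its own frame, then copy its results into the caller."""
    callee_frame = new_frame()
    results = sub(callee_frame)
    for offset, value in enumerate(results):
        caller_frame[FIRST_RETURN_REG + offset] = value


def foo(frame):
    frame[16] = "bug\n"      # the local 'ok' lives in some callee register
    return [frame[16]]       # .return(ok)


def main():
    frame = new_frame()
    frame[5] = "ok\n"        # P5 = "ok\n"
    frame[20] = "safe\n"     # a value kept outside the return-register range
    call(frame, foo)         # foo() called in void context
    return frame


frame = main()
print(frame[5], end="")      # prints "bug"
print(frame[20], end="")     # prints "safe"
```

In this model the value in register 20 survives the call because the callee works in its own frame, while register 5 is overwritten by the unused return value.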

Bill Coffman

Nov 15, 2004, 12:17:48 AM

to Leopold Toetsch, Perl 6 Internals

PDD03: Responsibility for environment preservation
PDD03:
PDD03: The caller is responsible for preserving any environment it is interested
PDD03: in keeping. This includes any and all registers, lexical scoping and
PDD03: scratchpads, opcode libraries, and so forth.
PDD03:
PDD03: Use of the savetop opcode is recommended if the caller wishes to save
PDD03: everything, and the restoretop opcode to restore everything savetop
PDD03: saved. This saves off the top 16 of each register type, leaving the
PDD03: bottom 16 registers, which generally contain the return values, intact.

The way I read it, paragraph one implies that when you print P5 after
calling foo(), you are expecting to get the return value. You didn't
save and restore register P5, so you wanted foo() to do something to
it.

The above docs may be slightly ambiguous. The first paragraph says
that you have to save everything. The second says that
savetop/restoretop commands are there to help the user, by allowing
them to take care of registers 16-31, but what about 0-15? Is there
an implication that you don't have to save those? Based on the way
people are coding IMC, that seems to be the case.

Both the old and new register allocator are kind of hands off the
first 16 registers. Recall that I was asking about this some time
ago, and found that if the allocator tries to use the bottom half
first, chaos ensues. Our convention then ... our interpretation of
the calling convention, is that we "try to allocate" only the top 16
registers. If more are needed, the new allocator starts from R15 and
goes on downward. The old one just went up from R0.

Note that PDD03 is specifying a dynamic calling convention. Because
of this, the register allocator cannot use the convention to find what
should be saved, and what it can use. If this were indicated at
compile time, the register allocator could keep hands off only the
used registers. As it is, we may need to do as you suggest, and leave
the bottom 16 registers for parameter passing, and the top 16 for
local symbols and register allocation. If so, we will obviously need
a better register allocator. There is one exception to this: if a sub
calls no other subs, it can use all 32 registers. Also, P4, S1-S4, N0-N4
seem to be free.

~Bill
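
The two allocation orders described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the thread says both allocators prefer the top 16 registers and differ in where they go next, but the order within the top half and the function names used here are hypothetical.

```python
# Hypothetical sketch of the register preference orders described above.
# Assumption: both allocators hand out R16..R31 first, in ascending order.

TOP_HALF = list(range(16, 32))


def new_allocator_order():
    """Top half first, then R15 downward to R0."""
    return TOP_HALF + list(range(15, -1, -1))


def old_allocator_order():
    """Top half first, then upward from R0."""
    return TOP_HALF + list(range(0, 16))


def assign(symbols, order):
    """Map each symbol to the next register in the given preference order."""
    if len(symbols) > len(order):
        raise ValueError("more symbols than registers; spilling not modelled")
    return dict(zip(symbols, order))


symbols = [f"sym{i}" for i in range(18)]
new_map = assign(symbols, new_allocator_order())
old_map = assign(symbols, old_allocator_order())

print(new_map["sym16"], new_map["sym17"])  # prints "15 14"
print(old_map["sym16"], old_map["sym17"])  # prints "0 1"
```

With 18 live symbols, both orders spill two of them into the bottom half. The downward order reaches R15 and R14 first, so it still lands in the R5..R15 range that unused return values may overwrite.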

Leopold Toetsch

Nov 15, 2004, 3:38:18 AM

to Bill Coffman, perl6-i...@perl.org

Bill Coffman <bill.c...@gmail.com> wrote:

[ pdd03 ]

> The way I read it, paragraph one implies that when you print P5 after
> calling foo(), you are expecting to get the return value. You didn't
> save and restore register P5, so you wanted foo() to do something to
> it.

The nasty thing about a function call is:

i = foo() # fine P5 returned

vs.

foo() # P5 clobbered by foo

(but you can replace P5 with P15 too, or with any register R5..R15)

> The above docs may be slightly ambiguous. The first paragraph says
> that you have to save everything.

Not quite. You have to save everything that should be preserved, i.e.
everything you reuse later.

> ... The second says that
> savetop/restoretop commands are there to help the user, by allowing
> them to take care of registers 16-31, but what about 0-15?

Well, that's obsolete and needs rewording. We changed from a scheme
where register saving is done by opcodes to a scheme where the
system preserves registers (as well as lexicals and such, which were
already saved by switching contexts).

By providing a new register frame for the called sub, basically all
registers are preserved. The exceptions are unused return values, as in
the above example, which are actively returned by the framework when the
subroutine indicates the presence of return values.

> Both the old and new register allocator are kind of hands off the
> first 16 registers. Recall that I was asking about this some time
> ago, and found that if the allocator tries to use the bottom half
> first, chaos ensues.

Yes, exactly. And that's just because of unused returns.

> ... As it is, we may need to do as you suggest, and leave
> the bottom 16 registers for parameter passing, and the top 16 for
> local symbols and register allocation. If so, we will obviously need
> a better register allocator.

We need to refine the calling conventions. Reserving 16 registers of all
four kinds just for what is most likely a single return value is
super-inefficient.

> ~Bill

leo

Jeff Clites

Nov 16, 2004, 1:05:30 PM

to l...@toetsch.at, perl6-i...@perl.org, Bill Coffman

On Nov 15, 2004, at 12:38 AM, Leopold Toetsch wrote:

> Bill Coffman <bill.c...@gmail.com> wrote:
>
> [ pdd03 ]
>
>> The way I read it, paragraph one implies that when you print P5 after
>> calling foo(), you are expecting to get the return value. You didn't
>> save and restore register P5, so you wanted foo() to do something to
>> it.
>
> The nasty thing of a function call is:
>
> i = foo() # fine P5 returned
>
> vs.
>
> foo() # P5 clobbered by foo
>
> (but you can replace P5 with P15 too, or every register R5..R15)

This has never bothered me, probably because of the comparison to the
register-based calling conventions that the PPC uses: A called function
(which returns a value) has to store its result in some register,
whether or not the caller wants it. A call like "i = foo()" is really
two steps: call the function, then copy the result from r3 to the
appropriate location (maybe a location on the stack, maybe another
register). It may be possible to optimize so that "i" is already using
the same register as the return value, but in general that can't be
arranged for most cases; consider how this would compile:

i = foo()  // call foo, copy result from r3 to other register--must,
           // since bar() would clobber
j = bar()  // call bar, copy result from r3 to other register--could
           // avoid copy, if j not needed past baz
baz(i + j) // add those two other registers into r3, and call baz

And due to the register-preservation semantics on the PPC, even a call
to a void-return function could clobber r3, since it could call another
function which returns a result and thus uses r3.

Not that parrot has to necessarily work this way, but it at least has
precedent, so it's not totally strange behavior.

JEff
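
The copy-out-of-r3 pattern described above can be mimicked with a small Python model. It is a sketch under assumptions: a dictionary stands in for the machine registers, r3 is the single volatile result and argument register, r14 and r15 stand in for preserved registers, and the values returned by `foo` and `bar` as well as the `log_only` helper are made up for the example.

```python
# Toy model of a PPC-style convention (illustrative only): r3 carries
# results and the first argument; r14/r15 stand in for preserved registers.
regs = {"r3": 0, "r14": 0, "r15": 0}
seen_by_baz = []


def foo():
    regs["r3"] = 2           # hypothetical result of foo


def bar():
    regs["r3"] = 40          # hypothetical result of bar


def baz():
    seen_by_baz.append(regs["r3"])   # the argument arrives in r3


def log_only():
    """A void function that still clobbers r3, because it calls foo()."""
    foo()


foo()
regs["r14"] = regs["r3"]     # i = foo(): must copy, bar() will overwrite r3
bar()
regs["r15"] = regs["r3"]     # j = bar(): copied here for clarity
regs["r3"] = regs["r14"] + regs["r15"]
baz()                        # baz(i + j)

regs["r3"] = 7               # caller parks a value in the volatile register
log_only()                   # a void call, yet r3 changes underneath it

print(seen_by_baz[0])        # prints "42"
print(regs["r3"])            # prints "2"
```

The last two statements show the void-call case: the caller's 7 in r3 is gone after `log_only()` returns, even though `log_only` itself returns nothing.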

Jeff Clites

Nov 16, 2004, 1:05:41 PM

to Leopold Toetsch, Perl 6 Internals

On Nov 14, 2004, at 9:32 AM, Leopold Toetsch wrote:

> Defining now that P5 has to be preserved in main, because it's a
> possible return result of foo() and therefore may be clobbered by
> foo() is meaning, that we have effectively just 16 registers per kind
> available for allocation around a function call.
>
> If the latter is true according to current pdd03 then we are wasting
> half of our registers for a very doubtful advantage: being able to
> pass return values in R5..R15.

In effect this is quite similar to the PPC calling conventions: we have
roughly half of the registers preserved across function calls. In terms
of the volatile registers, it's fine to use them for local
calculations, as long as either you're using them to hold values which
don't need to persist across function calls, or you
preserve-and-restore them yourself.

But that loops back to a previous proposal of mine: If they're not
being preserved, and in fact need to be "synced" between caller and
callee, then having these registers physically located in the
interpreter structure, rather than in the bp-referenced frame, saves
all the copying, and makes it more obvious what's going on.

JEff
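
A rough Python sketch of the layout proposed above, for illustration only. It assumes the volatile registers are R0..R15 and live once in the interpreter, while R16..R31 live in a per-sub frame; the `Interpreter` class and its method names are hypothetical and do not correspond to Parrot's actual structures.

```python
# Hypothetical layout: volatile registers stored once in the interpreter,
# preserved registers stored in a per-sub frame. Not Parrot's real design.

class Interpreter:
    def __init__(self):
        self.volatile = [None] * 16   # R0..R15, shared by every sub
        self.frames = []              # stack of frames holding R16..R31

    def push_frame(self):
        self.frames.append([None] * 16)

    def pop_frame(self):
        self.frames.pop()

    def get(self, n):
        # Two addressing schemes: one for volatile, one for frame registers.
        if n < 16:
            return self.volatile[n]
        return self.frames[-1][n - 16]

    def set(self, n, value):
        if n < 16:
            self.volatile[n] = value
        else:
            self.frames[-1][n - 16] = value


interp = Interpreter()
interp.push_frame()                 # caller's frame
interp.set(16, "caller local")
interp.push_frame()                 # callee's frame
interp.set(16, "callee local")
interp.set(5, "result")             # callee writes its return value to R5
interp.pop_frame()                  # return: no registers are copied

print(interp.get(5))                # prints "result"
print(interp.get(16))               # prints "caller local"
```

The return value is visible to the caller without any copy, at the price of a branch on the register number in every `get` and `set`.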

Leopold Toetsch

Nov 16, 2004, 2:52:15 PM

to Jeff Clites, perl6-i...@perl.org

Jeff Clites <jcl...@mac.com> wrote:

[ PPC ABI ]

> Not that parrot has to necessarily work this way, but it at least has
> precedent, so it's not totally strange behavior.

Sure, it's neither strange nor dissimilar. Except that the PPC ABI defines
more preserved registers (r13..r31), assuming pressure is on P registers
only, and although I don't know how and where the hardware is preserving
them, it obviously does not cause level 2 cache misses. Refetching from
the stack is of course cheaper in hardware too.

> JEff

leo

Leopold Toetsch

Nov 16, 2004, 2:58:35 PM

to Jeff Clites, perl6-i...@perl.org

Jeff Clites <jcl...@mac.com> wrote:

> But that loops back to a previous proposal of mine: If they're not
> being preserved, and in fact need to be "synced" between caller and
> callee, then having these registers physically located in the
> interpreter structure, rather than in the bp-referenced frame, saves
> all the copying, and makes it more obvious what's going on.

Well I answered that already. Having two distinct addressing schemes for
volatile and non-volatile registers has a serious overhead for
non-prederefed run cores. OTOH in the light of a recent discussion this
approach could be an alternative.

> JEff

leo

Leopold Toetsch

Nov 16, 2004, 4:07:58 PM

to l...@toetsch.at, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> wrote:
> Jeff Clites <jcl...@mac.com> wrote:

>> But that loops back to a previous proposal of mine: If they're not
>> being preserved, and in fact need to be "synced" between caller and
>> callee, then having these registers physically located in the
>> interpreter structure, rather than in the bp-referenced frame, saves
>> all the copying, and makes it more obvious what's going on.

> Well I answered that already. Having two distinct addressing schemes for
> volatile and non-volatile registers has a serious overhead for
> non-prederefed run cores.

Err, for all but unrolled run-cores (i.e. only JIT could cope with it).
For prederefed cores all OUT arguments would need duplication; IN
arguments, which usually have constant addressing too, would use the
addressing of the constants for the volatiles.
The CGoto and plain function cores would grow towards insanity.

> ... OTOH in the light of a recent discussion this
> approach could be an alternative.

So not really, sorry.

>> JEff

leo