Gmail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Register allocation/spilling and register volitility
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  8 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Jeff Clites  
View profile  
 More options Nov 8 2004, 4:03 am
Newsgroups: perl.perl6.internals
From: jcli...@mac.com (Jeff Clites)
Date: Mon, 8 Nov 2004 01:03:48 -0800
Local: Mon, Nov 8 2004 4:03 am
Subject: Register allocation/spilling and register volitility
 From other threads:

> Now we are placing arguments or return values in registers according
> to PDD03 and the other end has immediately access to the placed
> values, because the register file is in the interpreter.

> With the indirect addressing of the register frame, this argument
> passing is probably the most costly part of function calls and
> returns, because we have to copy the values.

and

>> Anyway, if you can pop both register frames -and- context structures,
>> you won't run GC too often, and everything will nicely fit into the
>> cache.

> I thought about that too, but I'd rather have registers adjacent, so
> that the caller can place function arguments directly into the callers
> in arguments.

> OTOH it doesn't really matter, if the context structure is in the
> frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
> valid as I0 or I4, as long as it's assured, that it's exactly
> addressing the incoming argument area of the called function.

A problem with this is that you can't be sure that you can actually
have the "next" frame of registers adjacent to the current frame--they
might already be taken. Imagine A calls B, then B creates a
continuation and stores it in a global, then returns. If A makes
another function call, it can't just use the adjacent register
frame--it's still "taken". (I'm referring here to the "sliding register
window" idea of course.) But this is reminding me of another idea I've
had.

I've been thinking for a while about another idea to decrease some of
the copying which we are now doing for function calls, as well as the
allocation of register sets during subroutine calls, and which would as
a side-effect allow us to reduce the need for register spilling.

First, a code snippet. Consider the following C code:

int a(int x) { return b(x); }
int b(int x) { return c(x + 1); }
int c(int x) { return x + 7; }

As compiled on the PPC, this code is very compact--each function is
implemented by just one or two instructions, and only one register is
used. There are two key factors here: (1) no register preservation is
needed, and (2) the call from a() to b() is just a branch, since the
call to a() has already loaded the registers with the correct values
for the call to b().

By my thinking, register usage falls into three basic categories:

1) Registers used for parameter passing and returns values for function
calls.

2) Registers used to hold values which need to be preserved across
function calls.

3) Registers used to hold values which do not need to be preserved
across function calls.

Really, (1) and (3) are similar--in either case, you have registers
whose values are allowed to change across function calls. (For register
which hold return values that's obvious, and for those used to pass
parameters, it seems fair that these be expected to change across a
function call.) So we really have two cases--volatile and non-volatile
register usage.

In the PPC ABI, half the registers are treated as volatile, and the
other half as non-volatile. For the PPC, this corresponds to
caller-preserved v. callee-preserved. Because of continuations[1],
parrot can't have callee-preserved registers, but it could still have
volatile v. non-volatile registers. Here's what I'm thinking:

Keep the old-scheme registers inside the interpreter structure, *as
well as* the new indirect registers. Call the registers in the
interpreter I0 to I31, and the indirect registers I32 to whatever.[2]
I'll call these the "direct" and "indirect" registers. By their very
nature, the direct registers are volatile, and the indirect registers
are non-volatile. What does this buy us? The following:

1) Parameter passing and return occur via the established calling
conventions, in the volatile registers. No extra copying is needed upon
function call or return--the volatile registers are physically the same
for the caller and the callee.

2) "Temporary" calculations can use the volatile registers. Values
which need preserving can use the non-volatile registers directly.

3) In functions which need no non-volatile registers, there's no need
to allocate the indirect registers at all.

4) Cases like my example code above can compile just as compactly as
the PPC case. With the volatile Parrot registers mapped to volatile CPU
registers (in the PPC case at least), this would be highly efficient:
you'd end up with asm similar to what you have in the C case above, and
you'd not have a need to save the CPU registers back to Parrot
registers (other than those used in the calling conventions) for Parrot
function calls out of JIT code.

5) Because this opens things up to more than 32 registers, code could
use as many registers as needed. You'd never have register spilling
per-se in order to accommodate local variables, though you'd still want
a smart register allocator which minimized the number of registers
needed (for memory locality as well as minimizing allocation). But a
very simple one-non-volatile-register-per-local would function
correctly, as a baseline implementation (correct but slow).

The main value here is in the ability to avoid allocation of indirect
registers in many cases, minimize copying, and simplify the register
allocation policy.

Further notes:

1) For the JIT case, correct code can be emitted to handle direct v.
indirect registers (relative to the interpreter pointer, or relative to
the new base pointers, depending on the register). For the other cores,
we'd either need to generate additional op variants to handle direct v.
indirect registers (probably lots more op variants), or we'd need to
have register-accessing code be conditional on the register number
(slow--lots of branches). The latter approach doesn't bother me for the
non-prederef cores (they're slow anyway, and mostly good for testing),
but I don't know enough about the prederef cores to see if we could do
something efficient, without an op explosion.

2) Since the number of indirect registers could vary from
function-to-function, we'd need a syntax to specify how may are needed
for a given function--probably an op to allocate the right number of
registers, *or* it could somehow be metadata kept as part of the sub
itself. With the op approach, for safety upon function call we could
swap into the indirect register pointer, a pointer to a zero-filled
read-only chunk of memory, to catch attempts to use indirect registers
before allocating any. But we'd still have to deal with the issue of
how to catch trying to access more registers than you actually
allocated. A compiler could ensure that this would never happen, but it
would feel dangerous if the VM didn't have a way to catch this, for
hand-generated assembly. (On the other hand, a bytecode verifier could
catch it.)

3) Really, only the direct registers are "real" registers,
philosophically--they're the only ones which live at fixed memory
locations. The indirect registers are fulfilling a purpose that other
VMs (or calling conventions) would use "the stack" for. By calling them
registers, we can use them directly for calculations, but still pretend
that we are RISC.

4) This may not be directly relevant to the "slow fib benchmark"
problem, *except* that for tail calls this does speed things up
because, by definition, you don't need any register preservation for
tail calls. This is true (and an advantage) even without actual
tail-call optimizations (ie, even if you have the usual frame overhead,
at least you don't need to allocate indirect registers). And I believe
there's a theorem that all recursive operation can be re-written to be
tail-recursive, so it's sort of relevant, though I guess what's
important about the fib benchmark isn't specifically that it's
recursive, but rather just that it creates a lot of frames (lots of
function calls, deep call depth).

Comments of course welcome, and encouraged. Thanks if you read all of
that--it ended up longer than I was expecting.

JEff

[1] Callee-preserved means "callee saved and then restored just before
return". With continuations, you can jump past the restoration code,
and ruin this strategy.

[2] These could be numbered starting from 32, or they could be referred
to via another notation indexed from 0--NVI0 for "non-volatile I
register 0", or something.


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Leopold Toetsch  
View profile  
 More options Nov 8 2004, 4:34 am
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Mon, 8 Nov 2004 10:34:46 +0100
Local: Mon, Nov 8 2004 4:34 am
Subject: Re: Register allocation/spilling and register volitility

Jeff Clites <jcli...@mac.com> wrote:
>> OTOH it doesn't really matter, if the context structure is in the
>> frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
>> valid as I0 or I4, as long as it's assured, that it's exactly
>> addressing the incoming argument area of the called function.
> A problem with this is that you can't be sure that you can actually
> have the "next" frame of registers adjacent to the current frame--they
> might already be taken. Imagine A calls B, then B creates a
> continuation and stores it in a global, then returns.

Please read the proposal summary by Miroslav Silovic, keyword "watermark".
If frames aren't adjacent, normal argument copying can be done anyway.

> Keep the old-scheme registers inside the interpreter structure, *as
> well as* the new indirect registers. Call the registers in the
> interpreter I0 to I31, and the indirect registers I32 to whatever.

That would need two different addressing modes depending on the register
number. That'll lead to considerable code bloat: we'd have all possible
permutations for direct/indirect registers. Doing it at runtime only
would be a serious slowdown.

It's not needed. I've a better scheme in mind, which addressess
efficieny as well as argument passing.

leo


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeff Clites  
View profile  
 More options Nov 8 2004, 5:18 am
Newsgroups: perl.perl6.internals
From: jcli...@mac.com (Jeff Clites)
Date: Mon, 8 Nov 2004 02:18:01 -0800
Local: Mon, Nov 8 2004 5:18 am
Subject: Re: Register allocation/spilling and register volitility
On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:

> Jeff Clites <jcli...@mac.com> wrote:

>>> OTOH it doesn't really matter, if the context structure is in the
>>> frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
>>> valid as I0 or I4, as long as it's assured, that it's exactly
>>> addressing the incoming argument area of the called function.

>> A problem with this is that you can't be sure that you can actually
>> have the "next" frame of registers adjacent to the current frame--they
>> might already be taken. Imagine A calls B, then B creates a
>> continuation and stores it in a global, then returns.

> Please read the proposal summary by Miroslav Silovic, keyword
> "watermark".
> If frames aren't adjacent, normal argument copying can be done anyway.

This would seem to require the same types of runtime checks that you
are objecting to below, unless user code is expected to explicitly
check whether it's supposed to be assigning to I3 v I(3 + 32x?). And
that still seems to require copying, in my case of a function a() which
calls a function b() with the exact same arguments.

>> Keep the old-scheme registers inside the interpreter structure, *as
>> well as* the new indirect registers. Call the registers in the
>> interpreter I0 to I31, and the indirect registers I32 to whatever.

> That would need two different addressing modes depending on the
> register
> number. That'll lead to considerable code bloat: we'd have all possible
> permutations for direct/indirect registers. Doing it at runtime only
> would be a serious slowdown.

I discussed that in item (1) under "Further notes".

> It's not needed. I've a better scheme in mind, which addressess
> efficieny as well as argument passing.

And spilling?

JEff


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Leopold Toetsch  
View profile  
 More options Nov 8 2004, 6:20 am
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Mon, 8 Nov 2004 12:20:19 +0100
Local: Mon, Nov 8 2004 6:20 am
Subject: Re: Register allocation/spilling and register volitility

Jeff Clites <jcli...@mac.com> wrote:
> On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:
>> If frames aren't adjacent, normal argument copying can be done anyway.
> This would seem to require the same types of runtime checks that you
> are objecting to below,

Not runtime. The register allocator knows the amount of needed
non-volatiles and volatile registers. Depending on that a call can place
arguments directly into the incoming regs of the caller or not. Then if
at runtime copying is needed (because e.g. frames aren't adjacent), the
function arguments are copyied. This is fully transparent.

> I discussed that in item (1) under "Further notes".

Yep, saw that then ;)

>> It's not needed. I've a better scheme in mind, which addressess
>> efficieny as well as argument passing.
> And spilling?

Well, I'm proposing a variable-sized register frame. With very little
additions we could run with more then 32 registers per kind (there are a
few bitmasks currently that would need adaptions, but not much).

But basically, spilling should not be needed at all, if the register
allocator isn't as broken as it now is. Dan got some really evil
subroutines, so we have RL test cases. We'll see.

> JEff

leo

    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Bill Coffman  
View profile  
 More options Nov 12 2004, 7:10 am
Newsgroups: perl.perl6.internals
From: bill.coff...@gmail.com (Bill Coffman)
Date: Fri, 12 Nov 2004 04:10:20 -0800
Local: Fri, Nov 12 2004 7:10 am
Subject: Re: Register allocation/spilling and register volitility

I think this is a great idea.  Mostly, I was thinking of the case
where there are less than 32 registers needed.  The allocator can try
to reduce the number of registers that will need to be used.  I think
that addressing could become an issue is there are too many registers,
but I still haven't figured out how the bytecode looks yet.

> But basically, spilling should not be needed at all, if the register
> allocator isn't as broken as it now is. Dan got some really evil
> subroutines, so we have RL test cases. We'll see.

In that case, I'll focus on the register renaming, live-range
analysis, etc.  The memory leak, which might not be actually lost
memory, but just wasteful useage of memory, is pretty much related to
the spill code.  Most of the grotesqueness of the allocator is in the
spiller.  If this has a high probability of changing, I'll focus on
other stuff, like register renaming, which is a lot harder, but with
higher longterm payoff.

In light of this, I think I'll send in my patch, in case it is
helpful.  The problem is that some routine outside of imc_reg_alloc()
is sending in conflicting register preallocation.  My code does lot's
of assertions, and finds this, one way or another.  I'll put in a
variable to turn off color consistency checking, which will probably
then pass the tests.

Bill


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Leopold Toetsch  
View profile  
 More options Nov 12 2004, 8:29 am
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Fri, 12 Nov 2004 14:29:41 +0100
Local: Fri, Nov 12 2004 8:29 am
Subject: Re: Register allocation/spilling and register volitility

Bill Coffman wrote:
> In that case, I'll focus on the register renaming, live-range
> analysis, etc.  

Great.

> In light of this, I think I'll send in my patch, in case it is
> helpful.  The problem is that some routine outside of imc_reg_alloc()
> is sending in conflicting register preallocation.

Yep. I've already found two flaws within pcc.c. One is fixed (thanks to
your allocation conflict check).

The second is tricky: we have

    _global_dumper()
    ...
    object."dumper"(pmc, name)    # args are P5, S5

and _global_dumper() does:

    .return(object)

The function is returning the object in P5, which means that return
conventions are indicating a PMC result in P5, and that get's copied
into the caller's frame. But the caller doesn't use it. The usage of P5
in the caller (pmc) is clobbered...

I don't know, if that is an abuse of prototyped calling conventions (we
need a definition WRT call/return argument mismatches).

If not, we can never assign possible return values around a function call.

> Bill

leo

    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Michel Pelletier  
View profile  
 More options Nov 12 2004, 1:51 pm
Newsgroups: perl.perl6.internals
From: mic...@dialnetwork.com (Michel Pelletier)
Date: Fri, 12 Nov 2004 10:51:45 -0800
Local: Fri, Nov 12 2004 1:51 pm
Subject: Re: Register allocation/spilling and register volitility

> Date: Fri, 12 Nov 2004 04:10:20 -0800
> From: Bill Coffman <bill.coff...@gmail.com>
> > > And spilling?

> > Well, I'm proposing a variable-sized register frame. With very little
> > additions we could run with more then 32 registers per kind (there are a
> > few bitmasks currently that would need adaptions, but not much).

> I think this is a great idea.  

So do I.  Having a fixed number of registes is like having the C keywords
register0 through registerN.  All registers should be $Pn registers and let
the guts deal with how that maps physically.  I certainly don't care which
$Pn register maps to which Pn register, should a PIR programmer ever care?  
As far as I can tell the only reason why a PIR programmer would even use Pn
registers is for implicit arguments, leading me into my next point:

I strongly feel that Leo was right to suggest removing implicit register
arguments from opcodes.  I responded to that email but somehow it didn't get
to the list AFAICT.  I can't think of any good reason or use case for
implicit arguments.  Can one be provided?  From my experiences with Parakeet,
it's just more weird, implicit stuff that I have to keep in my head that
confuses new PIR programmers who look at my code.

Since I'm on the box I might as well get it all out: I don't think
unprototyped pcc calls should use the first 11 arguments in P registers and
the rest in an array, again it just doesn't make any sense to me other than
to violate DON (Don't Optimize Now, in tribute to Don Knuth).  Just pass an
array and be done with it.  If the guts want to register-optimize arguments
then fine, why does the PIR programmer have to do it by hand? (this is really
an specific extension of the argument against implicit register usage)

Unprototyped arguments are usually built dynamically at run time anyway or
called in from an external language.  I wager the performance benefit is
miniscule to nothing compared to the contortious PIR code needed to pack and
unpack arguments into registers, spill arguments into an array, avoid those
11 P regisers all the time, and the pressure it all puts on the register
allocator.  For prototyped calls I can (barely) see this as being useful, but
not unprototyped.

Fixed registers, implicit arguments, and calling contortions are "dead
chickens" (coined, AFAIK, by Jim Fulton).   They're the completely useless
things you have to wave over the machine to get it to do what you want.  The
less dead chickens you have to wave over Parrot, the better.

-Michel


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Dan Sugalski  
View profile  
 More options Nov 12 2004, 3:35 pm
Newsgroups: perl.perl6.internals
From: d...@sidhe.org (Dan Sugalski)
Date: Fri, 12 Nov 2004 15:35:44 -0500
Local: Fri, Nov 12 2004 3:35 pm
Subject: Re: Register allocation/spilling and register volitility
At 10:51 AM -0800 11/12/04, Michel Pelletier wrote:

>  > Date: Fri, 12 Nov 2004 04:10:20 -0800
>>  From: Bill Coffman <bill.coff...@gmail.com>

>>  > > And spilling?

>>  > Well, I'm proposing a variable-sized register frame. With very little
>>  > additions we could run with more then 32 registers per kind (there are a
>>  > few bitmasks currently that would need adaptions, but not much).

>>  I think this is a great idea.

>So do I.

I don't. It's staying the way it is, as are the implicit operands to
some of the ops, and the unprototyped calling conventions.
--
                                Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                       teddy bears get drunk


    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2010 Google