Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: register allocation questions

33 views
Skip to first unread message

Dan Sugalski

unread,
Oct 25, 2004, 9:31:10 PM10/25/04
to Bill Coffman, Perl 6 Internals
At 6:18 PM -0700 10/25/04, Bill Coffman wrote:
>Hello All,
>
>I have been hard at work, trying to grok the reg_alloc.c code, and
>with some success. My code is assigning registers, so that none are
>conflicting (which I double-verify), and I'm getting to the end of
>"make".

Woohoo! A non-trivial thing. :)

>Here's some questions:
>
>1) In the existing parrot code, when a register is assigned, it uses
>the following code:
> for (color = 0; color < MAX_COLOR; color++) {
> int c = (color + MAX_COLOR/2) % MAX_COLOR;
> if (!colors[c]) {
> reglist[x]->color = c;
>
> debug(interpreter, DEBUG_IMC,
> "#[%s] provisionally gets color [%d]"
> "(%d free colors, score %d)\n",
> reglist[x]->name, c,
> free_colors, reglist[x]->score);
> break;
> }
> }
>
>Thus, it seems to prefer to use register #16 and up, first, before it
>uses 0-15. This is mistifying to me, since it's not so trivial to
>code it this way, and yet I cannot figure out why this color
>preference. It seems like maybe there's a reason which I need to
>understand here. My new code doesn't do this behaviour.

Registers 0-15 are the registers used for sub/method/function calls.
Leaving them empty makes calls easier -- if the low-registers are in
use you need to shuffle the contents somewhere else.

[Snippage]

Looks like we've got some... incidental behavior. This isn't
particularly good. Feel free to rip out anything you want -- this is
all internals stuff, much of it's first-generation code, and nobody's
got any attachment to it. (I hope :)

>Incidentally, reg
>allocation is done on the following subs:
>_main __library_dumper_onload _dumper _register_dumper _global_dumper
>__library_data_dumper_onload dumper
>__library_data_dumper_default_onload dumpWithName dumpCached
>dumpProperties dumpHash dumpStringEscaped pmcDefault pmcPMCArray
>pmcIntList pmcStringArray pmcPerlArray pmcPerlHash pmcPerlString
>pmcPerlInt pmcPerlNum pmcPerlUndef pmcSub pmcNull pmcOrderedHash
>pmcManagedStruct pmcUnManagedStruct pmcArray
>__library_data_dumper_base_onload prepare cache createIndent indent
>newIndent deleteIndent dump simple String
>
>which seems like a lot to me, but I guess it's compiling a lot of stuff.

That doesn't seem right. Leo may well have an idea of why. On the
other hand, I'm fine with ripping a lot of this stuff out and
centralizing it.

--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

unread,
Oct 26, 2004, 5:09:54 AM10/26/04
to Bill Coffman, perl6-i...@perl.org
Bill Coffman <bill.c...@gmail.com> wrote:
> Hello All,

> I have been hard at work, trying to grok the reg_alloc.c code, and
> with some success. My code is assigning registers, so that none are
> conflicting (which I double-verify), and I'm getting to the end of
> "make".

Wow.


> 1) In the existing parrot code, when a register is assigned, it uses
> the following code:

> int c = (color + MAX_COLOR/2) % MAX_COLOR;

> Thus, it seems to prefer to use register #16 and up, first, before it


> uses 0-15. This is mistifying to me, since it's not so trivial to
> code it this way,

Well, some time ago, register allocation started at zero. The split of
register frames in upper and lower halfs *plus* the premature
optimization to save only the upper half of registers made it necessary
to allocate from 16 up.

But things are a bit more compilcated. For function calls, we are
passing arguments from register x5 ..x15 and I0..I4 plus some more have
a special meaning. See docs/ppds/pdd03_calling_conventions for all the
details. The same convention is used for function returns.

So, if you want that really super efficient, you would allocate
registers around function calls directly to that wanted register number,
which should be in the SymReg's want_regno.

> 2) When function imc_reg_alloc (the main register allocation thingy)
> is called, some of the variables have already expressed a preference
> as to which register they want. I understand that this can optimize
> certain parameter passing, etc. The question that arrises is, what if
> two conflicting variables want the same color (reg#)? Obviously, they
> don't get their way. But what are the consequences, and what must be
> done to fix this.

Ah, ok. For each function call registers are moved around to the
correct number, if they aren't there. This is done by code in
imcc/pcc.c. But the problem still is there that your code really
shouldn't assign these registers in a conflicting way.

> ... Incidentally, reg


> allocation is done on the following subs:
> _main __library_dumper_onload _dumper _register_dumper _global_dumper

[ ...]

Sure. Register allocation code is done per .sub/.end always. So that's
fine. But you might start with simpler source code having one or two
function calls only.

> Maybe someone can point me at something to look at to fix this. I'll
> provide the patch if someone would like to play with it.

The main problem probably is to follow register usage in calling
conventions.

BTW: preserving the upper half of registers will be tossed soon. To
simulate this behavior in current CVS, you could run the code with -Oc,
which should preserve all allocated registers. I don't know, when the
indirect register frame patch whill hit CVS, but it should be in a few
days. With that in place, you can allocate registers as you like,
with the caveat that rules in PDD03 are used.

> Thanks to all who made it this far,
> Bill Coffman

leo

Dan Sugalski

unread,
Oct 27, 2004, 1:50:19 PM10/27/04
to l...@toetsch.at, Bill Coffman, perl6-i...@perl.org
At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

>Bill Coffman <bill.c...@gmail.com> wrote:
> > 1) In the existing parrot code, when a register is assigned, it uses
>> the following code:
>
>> int c = (color + MAX_COLOR/2) % MAX_COLOR;
>
>> Thus, it seems to prefer to use register #16 and up, first, before it
>> uses 0-15. This is mistifying to me, since it's not so trivial to
>> code it this way,
>
>Well, some time ago, register allocation started at zero. The split of
>register frames in upper and lower halfs *plus* the premature
>optimization to save only the upper half of registers made it necessary
>to allocate from 16 up.
>
>But things are a bit more compilcated. For function calls, we are
>passing arguments from register x5 ..x15 and I0..I4 plus some more have
>a special meaning. See docs/ppds/pdd03_calling_conventions for all the
>details. The same convention is used for function returns.
>
>So, if you want that really super efficient, you would allocate
>registers around function calls directly to that wanted register number,
>which should be in the SymReg's want_regno.

While true, in the general case leaving 0-15 as non-preferred
registers will probably make things easier. Those registers,
especially the PMC ones, are going to see a lot of thrash as function
calls are made, and it'll probably be easier to have them as scratch
registers.

It's distinctly possible, of course, that there'll be very little
pressure to actually *use* them for most code, as we've got plenty of
registers in general. That's the hope, at least.

Leopold Toetsch

unread,
Oct 27, 2004, 3:36:21 PM10/27/04
to Dan Sugalski, Bill Coffman, perl6-i...@perl.org
Dan Sugalski wrote:
> At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

>> So, if you want that really super efficient, you would allocate
>> registers around function calls directly to that wanted register number,
>> which should be in the SymReg's want_regno.
>
> While true, in the general case leaving 0-15 as non-preferred registers
> will probably make things easier. Those registers, especially the PMC
> ones, are going to see a lot of thrash as function calls are made, and
> it'll probably be easier to have them as scratch registers.

Yep, that's the easy part ;) OTOH when the register allocator is doing
register renaming anyway, the most inner loop with a function call
should get registers assigned already matching the calling convemtions.
With more then one call at that loop level, you have to move around
registers anyway.

> It's distinctly possible, of course, that there'll be very little
> pressure to actually *use* them for most code, as we've got plenty of
> registers in general. That's the hope, at least.

Yes, 16 regs are plenty and do suffice for all "normal"[1] code.
Assigning to wanted reg numbers for a function is a nice optimization.

[1] all except Dan's 6000 lines subroutines :) Did you start creating
real subs for your code already?

leo

Dan Sugalski

unread,
Oct 28, 2004, 9:07:05 AM10/28/04
to Leopold Toetsch, Bill Coffman, perl6-i...@perl.org
At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
>Dan Sugalski wrote:
>>At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
>
>>>So, if you want that really super efficient, you would allocate
>>>registers around function calls directly to that wanted register number,
>>>which should be in the SymReg's want_regno.
>>
>>While true, in the general case leaving 0-15 as non-preferred
>>registers will probably make things easier. Those registers,
>>especially the PMC ones, are going to see a lot of thrash as
>>function calls are made, and it'll probably be easier to have them
>>as scratch registers.
>
>Yep, that's the easy part ;) OTOH when the register allocator is
>doing register renaming anyway, the most inner loop with a function
>call should get registers assigned already matching the calling
>convemtions. With more then one call at that loop level, you have to
>move around registers anyway.

Oh, sure, but keeping your scratch PMCs out of the way makes life a
lot easier for the register coloring algorithms. Might not be
optimal, but if it makes life simpler to start, optimal can come
later.

>>It's distinctly possible, of course, that there'll be very little
>>pressure to actually *use* them for most code, as we've got plenty
>>of registers in general. That's the hope, at least.
>
>Yes, 16 regs are plenty and do suffice for all "normal"[1] code.
>Assigning to wanted reg numbers for a function is a nice
>optimization.
>
>[1] all except Dan's 6000 lines subroutines :) Did you start
>creating real subs for your code already?

I wish. :( Unfortunately not, outside some simple stuff, and I doubt
I will. The language just doesn't lend itself to that sort of thing.
We're going to add actual real subroutines to the language after we
roll out into production, but that doesn't help now, alas.

Matt Fowles

unread,
Oct 28, 2004, 6:17:57 PM10/28/04
to Bill Coffman, Dan Sugalski, Leopold Toetsch, perl6-i...@perl.org
Bill~

I have to say that I am really impressed by all of the work that you
are doing, and if you can make the internals of imcc a little more
approachable, you would be doing a great service.

Thanks,
Matt


On Thu, 28 Oct 2004 15:08:23 -0700, Bill Coffman <bill.c...@gmail.com> wrote:
> Hi all,
>
> Thanks for your continued comments. Btw, I usually read all the
> parrot list, so don't think I'm not paying attention.
>
> Currently, here's how the register allocator is doing.
>
> Failed Test Stat Wstat Total Fail Failed List of Failed
> -------------------------------------------------------------------------------
> t/library/dumper.t 5 1280 13 5 38.46% 1-2 5 8 13
> 4 tests and 51 subtests skipped.
> Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.
>
> I recall Leo, or someone, saying that the data dumper routines are not
> following the calling convention properly. So I've decided not to
> worry about it too much. It passes the other tests, plus the
> randomized tests that I created, up to 150 symbols. At that range, it
> still takes about 20x longer than g++ -O2, for equivalent programs to
> compile (see gen4.pl).
>
> Also, it is currently running about O(n^2) for n symbols, where the
> old one was running about O(n^3) from my analysis. The spill code is
> still very expensive, and has a large constant associate. I also have
> data, which is attached. The difference doesn't show up until a lot
> of spilling is going on, around 80 symbols or so.
>
> I've learned a lot about how the compiler works at this point, and I'd
> like to contribute more :)
>
> Would you like a patch? Should I fix the data dumper routines first?
> What is all this talk about deferred registers? What should I do
> next?
>
> Well, I'm making some comments on the below stuff.


>
> On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski <d...@sidhe.org> wrote:
> > At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:

> > >Dan Sugalski wrote:
> > >>At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
> > >
> > >>>So, if you want that really super efficient, you would allocate
> > >>>registers around function calls directly to that wanted register number,
> > >>>which should be in the SymReg's want_regno.
>

> Yes, I think we are kind of doing this. It's best to pass the
> registers straight through though. Like when a variable will be used
> as a parameter, give it the appropriate reg num. Sort of outside the
> immediate scope of register coloring, but as I've learned, one must go
> a little beyond, to see the input and output for each sub.


>
> > >>While true, in the general case leaving 0-15 as non-preferred
> > >>registers will probably make things easier. Those registers,
> > >>especially the PMC ones, are going to see a lot of thrash as
> > >>function calls are made, and it'll probably be easier to have them
> > >>as scratch registers.
>

> I guess I don't agree. I'd like to pack down the number of registers
> used to a minimum. Then when a function is called, only those needed
> registers are copied in/out. Don't think the functionality exists.
> But the idea is to have each sub declare how many registers to
> save/restore. This would then save 0-k such registers. Where k is
> the number of registers used by the sub. Pack 'em down, minimize the
> number needed.
>
> We can also minimize this number to match the physical architecture
> that parrot is running on (for an arch specific optimization). The
> imc_reg_alloc function does not have 32 hard coded in there (well a
> little bit, but can be easily changed). It's pretty dynamic.


>
> > >Yep, that's the easy part ;) OTOH when the register allocator is
> > >doing register renaming anyway, the most inner loop with a function
> > >call should get registers assigned already matching the calling
> > >convemtions. With more then one call at that loop level, you have to
> > >move around registers anyway.
>

> Yes, yes, renaming! I want to do register renaming!


>
> > Oh, sure, but keeping your scratch PMCs out of the way makes life a
> > lot easier for the register coloring algorithms. Might not be
> > optimal, but if it makes life simpler to start, optimal can come
> > later.
>

> p31 holds all the spill stuff. It's a pain. Maybe I'll move that
> around, but if p31 is used, it means that there is no more room for
> symbols, in at least one of the reg sets.


>
> > >[1] all except Dan's 6000 lines subroutines :) Did you start
> > >creating real subs for your code already?
> >

> > I wish. :( Unfortunately not, outside some simple stuff, and I doubt
> > I will. The language just doesn't lend itself to that sort of thing.
> > We're going to add actual real subroutines to the language after we
> > roll out into production, but that doesn't help now, alas.
>

> Interesting. I'd like to test on something like that. Maybe SPEC99 as well.
>
> - Bill Coffman
>
>
>


--
"Computer Science is merely the post-Turing Decline of Formal Systems Theory."
-???

Dan Sugalski

unread,
Oct 28, 2004, 9:58:19 PM10/28/04
to Bill Coffman, Leopold Toetsch, perl6-i...@perl.org
At 3:08 PM -0700 10/28/04, Bill Coffman wrote:
> It passes the other tests, plus the
>randomized tests that I created, up to 150 symbols. At that range, it
>still takes about 20x longer than g++ -O2, for equivalent programs to
>compile (see gen4.pl).

Still, that's not bad.

>Also, it is currently running about O(n^2) for n symbols, where the
>old one was running about O(n^3) from my analysis. The spill code is
>still very expensive, and has a large constant associate. I also have
>data, which is attached. The difference doesn't show up until a lot
>of spilling is going on, around 80 symbols or so.

I'm curious to see how it behaves once the spilling gets up into the
1000+ symbol range. Dropping from cubic to quadratic time ought to
make a not-insignificant change in the running time, even if that
constant's pretty big. :)

>I've learned a lot about how the compiler works at this point, and I'd
>like to contribute more :)
>
>Would you like a patch?

Yes! Oh, yeah, definitely.

>Well, I'm making some comments on the below stuff.
>
>On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski <d...@sidhe.org> wrote:
>> At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:

>> >Dan Sugalski wrote:
> > >>At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
> > >>While true, in the general case leaving 0-15 as non-preferred
>> >>registers will probably make things easier. Those registers,
>> >>especially the PMC ones, are going to see a lot of thrash as
>> >>function calls are made, and it'll probably be easier to have them
>> >>as scratch registers.
>

>I guess I don't agree. I'd like to pack down the number of registers
>used to a minimum. Then when a function is called, only those needed
>registers are copied in/out. Don't think the functionality exists.
>But the idea is to have each sub declare how many registers to
>save/restore. This would then save 0-k such registers. Where k is
>the number of registers used by the sub. Pack 'em down, minimize the
>number needed.
>
>We can also minimize this number to match the physical architecture
>that parrot is running on (for an arch specific optimization). The
>imc_reg_alloc function does not have 32 hard coded in there (well a
>little bit, but can be easily changed). It's pretty dynamic.

By all means, go for it. I certainly don't want to curb your
enthusiasm. It's the right thing to do, ultimately. I didn't want to
presume on your time. Happy to have it, of course. :)

> > >[1] all except Dan's 6000 lines subroutines :) Did you start
>> >creating real subs for your code already?
>>

>> I wish. :( Unfortunately not, outside some simple stuff, and I doubt
>> I will. The language just doesn't lend itself to that sort of thing.
>> We're going to add actual real subroutines to the language after we
>> roll out into production, but that doesn't help now, alas.
>
>Interesting. I'd like to test on something like that. Maybe SPEC99 as well.

If you've got a patch, I'd be more than happy to give it a whirl, and
I can likely get you a copy of the code in question to give a run on.

Bill Coffman

unread,
Oct 29, 2004, 1:07:14 AM10/29/04
to Dan Sugalski, Perl 6 Internals
When I cvs up'd, cleared and reConfigure'd I got these stats:

Failed Test Stat Wstat Total Fail Failed List of Failed
-------------------------------------------------------------------------------

t/library/streams.t 1 256 21 1 4.76% 11
t/op/gc.t 1 256 18 1 5.56% 13
4 tests and 66 subtests skipped.
Failed 2/124 test scripts, 98.39% okay. 2/1957 subtests failed, 99.90% okay.

> I'm curious to see how it behaves once the spilling gets up into the
> 1000+ symbol range. Dropping from cubic to quadratic time ought to
> make a not-insignificant change in the running time, even if that
> constant's pretty big. :)

I think it's a bit more complicated. M=number lines in code, N=number
variables.
- Time = O(M^2+N^2)
- old time = O(M^2 +N^3)
Not quite sure of this either. But the N^3 eventually dominates, I
think. The data seems to bear this out.

There are more fixes I'd like to make as well. I spotted several
things that could be fixed. And I think the spill code can be
optimized a lot to reduce the big-O time as well.

More statistics #vars v. time in seconds:
#vars gcc parrot2
200 7.92 89.20
201 11.86 146.31
202 18.11 246.37
203 9.54 107.88
204 11.81 134.60
205 14.75 190.95
206 13.25 161.83
207 10.63 138.83
208 11.02 117.73
209 7.14 88.29
210 15.14 176.69

I am also running gen3.pl with 1000 vars. It's still on gcc. We'll
see if parrot doesn't crash my 1Gigabyte, 2.4Ghz workstation tonight.

> >Would you like a patch?
>
> Yes! Oh, yeah, definitely.

> [...]


> If you've got a patch, I'd be more than happy to give it a whirl, and
> I can likely get you a copy of the code in question to give a run on.

Soon, I'll send one the proper way.


> > The
> >imc_reg_alloc function does not have 32 hard coded in there (well a
> >little bit, but can be easily changed). It's pretty dynamic.
> By all means, go for it. I certainly don't want to curb your
> enthusiasm. It's the right thing to do, ultimately. I didn't want to
> presume on your time. Happy to have it, of course. :)

Thanks. I've had a great time doing this. Remembering graph
algorithms and compilers. Great fun! I'd also like to contribute to
getting Parrot out there, sooner rather than later. So if I can help
with that, I'd like to hear suggestions.

-Bill

Bill Coffman

unread,
Oct 29, 2004, 1:42:14 AM10/29/04
to Matt Fowles, Perl 6 Internals
Thanks Matt,

I hope I can help out. The patch I am submitting actually does
simplify register coloring a bit. I've been waiting for perl6 with so
much anticipation, I just couldn't stand it any more, and I had to
participate.

-Bill

Leopold Toetsch

unread,
Oct 29, 2004, 3:08:36 AM10/29/04
to Bill Coffman, perl6-i...@perl.org
Bill Coffman <bill.c...@gmail.com> wrote:

> Currently, here's how the register allocator is doing.

> Failed Test Stat Wstat Total Fail Failed List of Failed
> -------------------------------------------------------------------------------


> t/library/dumper.t 5 1280 13 5 38.46% 1-2 5 8 13
> 4 tests and 51 subtests skipped.
> Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.

> I recall Leo, or someone, saying that the data dumper routines are not
> following the calling convention properly.

I didn't look too close, but it's probably only the entry points:

.sub _dumper
_global_dumper()

That's missing C<.param> statements, so there are none.

> I've learned a lot about how the compiler works at this point, and I'd
> like to contribute more :)

Great. Thanks.

> Would you like a patch? Should I fix the data dumper routines first?

Definitely - dumper.t tests are currently disabled, don't worry.

> What is all this talk about deferred registers? What should I do
> next?

"deferred registers" doesn't make bells ring. What do you mean with
that? - Send patch with explanation of algorithm.

> Yes, I think we are kind of doing this. It's best to pass the
> registers straight through though. Like when a variable will be used
> as a parameter, give it the appropriate reg num. Sort of outside the
> immediate scope of register coloring, but as I've learned, one must go
> a little beyond, to see the input and output for each sub.

Well, it's not really outside of register coloring. It's part of
parrot's calling conventions. You can think of it as part of the Parrot
machine ABI. When you write a compiler for darwin-PPC, you have to pass
function arguments in r3, r4, ... and you get a return value in r3. If
you don't do that, you'll not be able to make any C library call.

In Parrot we have similar calling conventions and the register allocator
must be aware of that. E.g. when you have:

some_function() # (i, j) = some_function()
$I0 = I5
$I1 = I6

you know that I5 and I6 are return results. The live range or the
previous usage of I5 and I6 is cut by the function call.

Using the return values directly is of course an optimization and not
strictly necessary, nethertheless the allocator has to be aware that the
function call invalidates previous I5 and I6.

> But the idea is to have each sub declare how many registers to
> save/restore.

Don't worry about save/restore. That's already changed. imcc doesn't
emit any savetop/restoretop or similar opcodes any more. Registers are
preserved now by allocating a new register frame for the subroutine.

> We can also minimize this number to match the physical architecture
> that parrot is running on (for an arch specific optimization).

Yes. I did that some time ago in imcc/jit.c, which produced register
mapping for the underlying hardware CPU. Parrot registers 0.. n-1 were
given negative numbers and src/jit.c used these directly as mapping for
CPU registers. This vastly reduced JIT startup time.

> Yes, yes, renaming! I want to do register renaming!

Go for it please.

> p31 holds all the spill stuff. It's a pain. Maybe I'll move that
> around, but if p31 is used, it means that there is no more room for
> symbols, in at least one of the reg sets.

I'd say that with register renaming, spilling will be very rare. But
there is of course no need to use P31 for it. If we really have to spill
we can optimize that a bit.

> - Bill Coffman

leo

Leopold Toetsch

unread,
Oct 29, 2004, 4:28:52 AM10/29/04
to perl6-i...@perl.org
Leopold Toetsch <l...@toetsch.at> wrote:
> Bill Coffman <bill.c...@gmail.com> wrote:

>> t/library/dumper.t 5 1280 13 5 38.46% 1-2 5 8 13

> I didn't look too close, but it's probably only the entry points:

> .sub _dumper
> _global_dumper()

Fixed.

leo

0 new messages