>I have been hard at work, trying to grok the reg_alloc.c code, and >with some success. My code is assigning registers, so that none are >conflicting (which I double-verify), and I'm getting to the end of >"make".
>1) In the existing parrot code, when a register is assigned, it uses >the following code: > for (color = 0; color < MAX_COLOR; color++) { > int c = (color + MAX_COLOR/2) % MAX_COLOR; > if (!colors[c]) { > reglist[x]->color = c;
>Thus, it seems to prefer to use register #16 and up, first, before it >uses 0-15. This is mistifying to me, since it's not so trivial to >code it this way, and yet I cannot figure out why this color >preference. It seems like maybe there's a reason which I need to >understand here. My new code doesn't do this behaviour.
Registers 0-15 are the registers used for sub/method/function calls. Leaving them empty makes calls easier -- if the low-registers are in use you need to shuffle the contents somewhere else.
[Snippage]
Looks like we've got some... incidental behavior. This isn't particularly good. Feel free to rip out anything you want -- this is all internals stuff, much of it's first-generation code, and nobody's got any attachment to it. (I hope :)
>which seems like a lot to me, but I guess it's compiling a lot of stuff.
That doesn't seem right. Leo may well have an idea of why. On the other hand, I'm fine with ripping a lot of this stuff out and centralizing it.
-- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
Bill Coffman <bill.coff...@gmail.com> wrote: > Hello All, > I have been hard at work, trying to grok the reg_alloc.c code, and > with some success. My code is assigning registers, so that none are > conflicting (which I double-verify), and I'm getting to the end of > "make".
Wow.
> 1) In the existing parrot code, when a register is assigned, it uses > the following code: > int c = (color + MAX_COLOR/2) % MAX_COLOR; > Thus, it seems to prefer to use register #16 and up, first, before it > uses 0-15. This is mistifying to me, since it's not so trivial to > code it this way,
Well, some time ago, register allocation started at zero. The split of register frames in upper and lower halfs *plus* the premature optimization to save only the upper half of registers made it necessary to allocate from 16 up.
But things are a bit more compilcated. For function calls, we are passing arguments from register x5 ..x15 and I0..I4 plus some more have a special meaning. See docs/ppds/pdd03_calling_conventions for all the details. The same convention is used for function returns.
So, if you want that really super efficient, you would allocate registers around function calls directly to that wanted register number, which should be in the SymReg's want_regno.
> 2) When function imc_reg_alloc (the main register allocation thingy) > is called, some of the variables have already expressed a preference > as to which register they want. I understand that this can optimize > certain parameter passing, etc. The question that arrises is, what if > two conflicting variables want the same color (reg#)? Obviously, they > don't get their way. But what are the consequences, and what must be > done to fix this.
Ah, ok. For each function call registers are moved around to the correct number, if they aren't there. This is done by code in imcc/pcc.c. But the problem still is there that your code really shouldn't assign these registers in a conflicting way.
> ... Incidentally, reg > allocation is done on the following subs: > _main __library_dumper_onload _dumper _register_dumper _global_dumper
[ ...]
Sure. Register allocation code is done per .sub/.end always. So that's fine. But you might start with simpler source code having one or two function calls only.
> Maybe someone can point me at something to look at to fix this. I'll > provide the patch if someone would like to play with it.
The main problem probably is to follow register usage in calling conventions.
BTW: preserving the upper half of registers will be tossed soon. To simulate this behavior in current CVS, you could run the code with -Oc, which should preserve all allocated registers. I don't know, when the indirect register frame patch whill hit CVS, but it should be in a few days. With that in place, you can allocate registers as you like, with the caveat that rules in PDD03 are used.
> Thanks to all who made it this far, > Bill Coffman
>Bill Coffman <bill.coff...@gmail.com> wrote: > > 1) In the existing parrot code, when a register is assigned, it uses >> the following code:
>> int c = (color + MAX_COLOR/2) % MAX_COLOR;
>> Thus, it seems to prefer to use register #16 and up, first, before it >> uses 0-15. This is mistifying to me, since it's not so trivial to >> code it this way,
>Well, some time ago, register allocation started at zero. The split of >register frames in upper and lower halfs *plus* the premature >optimization to save only the upper half of registers made it necessary >to allocate from 16 up.
>But things are a bit more compilcated. For function calls, we are >passing arguments from register x5 ..x15 and I0..I4 plus some more have >a special meaning. See docs/ppds/pdd03_calling_conventions for all the >details. The same convention is used for function returns.
>So, if you want that really super efficient, you would allocate >registers around function calls directly to that wanted register number, >which should be in the SymReg's want_regno.
While true, in the general case leaving 0-15 as non-preferred registers will probably make things easier. Those registers, especially the PMC ones, are going to see a lot of thrash as function calls are made, and it'll probably be easier to have them as scratch registers.
It's distinctly possible, of course, that there'll be very little pressure to actually *use* them for most code, as we've got plenty of registers in general. That's the hope, at least.
-- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
Dan Sugalski wrote: > At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote: >> So, if you want that really super efficient, you would allocate >> registers around function calls directly to that wanted register number, >> which should be in the SymReg's want_regno.
> While true, in the general case leaving 0-15 as non-preferred registers > will probably make things easier. Those registers, especially the PMC > ones, are going to see a lot of thrash as function calls are made, and > it'll probably be easier to have them as scratch registers.
Yep, that's the easy part ;) OTOH when the register allocator is doing register renaming anyway, the most inner loop with a function call should get registers assigned already matching the calling convemtions. With more then one call at that loop level, you have to move around registers anyway.
> It's distinctly possible, of course, that there'll be very little > pressure to actually *use* them for most code, as we've got plenty of > registers in general. That's the hope, at least.
Yes, 16 regs are plenty and do suffice for all "normal"[1] code. Assigning to wanted reg numbers for a function is a nice optimization.
[1] all except Dan's 6000 lines subroutines :) Did you start creating real subs for your code already?
>Dan Sugalski wrote: >>At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
>>>So, if you want that really super efficient, you would allocate >>>registers around function calls directly to that wanted register number, >>>which should be in the SymReg's want_regno.
>>While true, in the general case leaving 0-15 as non-preferred >>registers will probably make things easier. Those registers, >>especially the PMC ones, are going to see a lot of thrash as >>function calls are made, and it'll probably be easier to have them >>as scratch registers.
>Yep, that's the easy part ;) OTOH when the register allocator is >doing register renaming anyway, the most inner loop with a function >call should get registers assigned already matching the calling >convemtions. With more then one call at that loop level, you have to >move around registers anyway.
Oh, sure, but keeping your scratch PMCs out of the way makes life a lot easier for the register coloring algorithms. Might not be optimal, but if it makes life simpler to start, optimal can come later.
>>It's distinctly possible, of course, that there'll be very little >>pressure to actually *use* them for most code, as we've got plenty >>of registers in general. That's the hope, at least.
>Yes, 16 regs are plenty and do suffice for all "normal"[1] code. >Assigning to wanted reg numbers for a function is a nice >optimization.
>[1] all except Dan's 6000 lines subroutines :) Did you start >creating real subs for your code already?
I wish. :( Unfortunately not, outside some simple stuff, and I doubt I will. The language just doesn't lend itself to that sort of thing. We're going to add actual real subroutines to the language after we roll out into production, but that doesn't help now, alas. -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
I have to say that I am really impressed by all of the work that you are doing, and if you can make the internals of imcc a little more approachable, you would be doing a great service.
On Thu, 28 Oct 2004 15:08:23 -0700, Bill Coffman <bill.coff...@gmail.com> wrote: > Hi all,
> Thanks for your continued comments. Btw, I usually read all the > parrot list, so don't think I'm not paying attention.
> Currently, here's how the register allocator is doing.
> Failed Test Stat Wstat Total Fail Failed List of Failed > --------------------------------------------------------------------------- ---- > t/library/dumper.t 5 1280 13 5 38.46% 1-2 5 8 13 > 4 tests and 51 subtests skipped. > Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.
> I recall Leo, or someone, saying that the data dumper routines are not > following the calling convention properly. So I've decided not to > worry about it too much. It passes the other tests, plus the > randomized tests that I created, up to 150 symbols. At that range, it > still takes about 20x longer than g++ -O2, for equivalent programs to > compile (see gen4.pl).
> Also, it is currently running about O(n^2) for n symbols, where the > old one was running about O(n^3) from my analysis. The spill code is > still very expensive, and has a large constant associate. I also have > data, which is attached. The difference doesn't show up until a lot > of spilling is going on, around 80 symbols or so.
> I've learned a lot about how the compiler works at this point, and I'd > like to contribute more :)
> Would you like a patch? Should I fix the data dumper routines first? > What is all this talk about deferred registers? What should I do > next?
> Well, I'm making some comments on the below stuff.
> On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski <d...@sidhe.org> wrote: > > At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote: > > >Dan Sugalski wrote: > > >>At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
> > >>>So, if you want that really super efficient, you would allocate > > >>>registers around function calls directly to that wanted register number, > > >>>which should be in the SymReg's want_regno.
> Yes, I think we are kind of doing this. It's best to pass the > registers straight through though. Like when a variable will be used > as a parameter, give it the appropriate reg num. Sort of outside the > immediate scope of register coloring, but as I've learned, one must go > a little beyond, to see the input and output for each sub.
> > >>While true, in the general case leaving 0-15 as non-preferred > > >>registers will probably make things easier. Those registers, > > >>especially the PMC ones, are going to see a lot of thrash as > > >>function calls are made, and it'll probably be easier to have them > > >>as scratch registers.
> I guess I don't agree. I'd like to pack down the number of registers > used to a minimum. Then when a function is called, only those needed > registers are copied in/out. Don't think the functionality exists. > But the idea is to have each sub declare how many registers to > save/restore. This would then save 0-k such registers. Where k is > the number of registers used by the sub. Pack 'em down, minimize the > number needed.
> We can also minimize this number to match the physical architecture > that parrot is running on (for an arch specific optimization). The > imc_reg_alloc function does not have 32 hard coded in there (well a > little bit, but can be easily changed). It's pretty dynamic.
> > >Yep, that's the easy part ;) OTOH when the register allocator is > > >doing register renaming anyway, the most inner loop with a function > > >call should get registers assigned already matching the calling > > >convemtions. With more then one call at that loop level, you have to > > >move around registers anyway.
> Yes, yes, renaming! I want to do register renaming!
> > Oh, sure, but keeping your scratch PMCs out of the way makes life a > > lot easier for the register coloring algorithms. Might not be > > optimal, but if it makes life simpler to start, optimal can come > > later.
> p31 holds all the spill stuff. It's a pain. Maybe I'll move that > around, but if p31 is used, it means that there is no more room for > symbols, in at least one of the reg sets.
> > >[1] all except Dan's 6000 lines subroutines :) Did you start > > >creating real subs for your code already?
> > I wish. :( Unfortunately not, outside some simple stuff, and I doubt > > I will. The language just doesn't lend itself to that sort of thing. > > We're going to add actual real subroutines to the language after we > > roll out into production, but that doesn't help now, alas.
> Interesting. I'd like to test on something like that. Maybe SPEC99 as well.
> - Bill Coffman
-- "Computer Science is merely the post-Turing Decline of Formal Systems Theory." -???
> It passes the other tests, plus the >randomized tests that I created, up to 150 symbols. At that range, it >still takes about 20x longer than g++ -O2, for equivalent programs to >compile (see gen4.pl).
Still, that's not bad.
>Also, it is currently running about O(n^2) for n symbols, where the >old one was running about O(n^3) from my analysis. The spill code is >still very expensive, and has a large constant associate. I also have >data, which is attached. The difference doesn't show up until a lot >of spilling is going on, around 80 symbols or so.
I'm curious to see how it behaves once the spilling gets up into the 1000+ symbol range. Dropping from cubic to quadratic time ought to make a not-insignificant change in the running time, even if that constant's pretty big. :)
>I've learned a lot about how the compiler works at this point, and I'd >like to contribute more :)
>Well, I'm making some comments on the below stuff.
>On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski <d...@sidhe.org> wrote: >> At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote: >> >Dan Sugalski wrote: > > >>At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote: > > >>While true, in the general case leaving 0-15 as non-preferred >> >>registers will probably make things easier. Those registers, >> >>especially the PMC ones, are going to see a lot of thrash as >> >>function calls are made, and it'll probably be easier to have them >> >>as scratch registers.
>I guess I don't agree. I'd like to pack down the number of registers >used to a minimum. Then when a function is called, only those needed >registers are copied in/out. Don't think the functionality exists. >But the idea is to have each sub declare how many registers to >save/restore. This would then save 0-k such registers. Where k is >the number of registers used by the sub. Pack 'em down, minimize the >number needed.
>We can also minimize this number to match the physical architecture >that parrot is running on (for an arch specific optimization). The >imc_reg_alloc function does not have 32 hard coded in there (well a >little bit, but can be easily changed). It's pretty dynamic.
By all means, go for it. I certainly don't want to curb your enthusiasm. It's the right thing to do, ultimately. I didn't want to presume on your time. Happy to have it, of course. :)
> > >[1] all except Dan's 6000 lines subroutines :) Did you start >> >creating real subs for your code already?
>> I wish. :( Unfortunately not, outside some simple stuff, and I doubt >> I will. The language just doesn't lend itself to that sort of thing. >> We're going to add actual real subroutines to the language after we >> roll out into production, but that doesn't help now, alas.
>Interesting. I'd like to test on something like that. Maybe SPEC99 as well.
If you've got a patch, I'd be more than happy to give it a whirl, and I can likely get you a copy of the code in question to give a run on. -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
When I cvs up'd, cleared and reConfigure'd I got these stats:
Failed Test Stat Wstat Total Fail Failed List of Failed --------------------------------------------------------------------------- ---- t/library/streams.t 1 256 21 1 4.76% 11 t/op/gc.t 1 256 18 1 5.56% 13 4 tests and 66 subtests skipped. Failed 2/124 test scripts, 98.39% okay. 2/1957 subtests failed, 99.90% okay.
> I'm curious to see how it behaves once the spilling gets up into the > 1000+ symbol range. Dropping from cubic to quadratic time ought to > make a not-insignificant change in the running time, even if that > constant's pretty big. :)
I think it's a bit more complicated. M=number lines in code, N=number variables. - Time = O(M^2+N^2) - old time = O(M^2 +N^3) Not quite sure of this either. But the N^3 eventually dominates, I think. The data seems to bear this out.
There are more fixes I'd like to make as well. I spotted several things that could be fixed. And I think the spill code can be optimized a lot to reduce the big-O time as well.
I am also running gen3.pl with 1000 vars. It's still on gcc. We'll see if parrot doesn't crash my 1Gigabyte, 2.4Ghz workstation tonight.
> >Would you like a patch?
> Yes! Oh, yeah, definitely. > [...] > If you've got a patch, I'd be more than happy to give it a whirl, and > I can likely get you a copy of the code in question to give a run on.
Soon, I'll send one the proper way.
> > The > >imc_reg_alloc function does not have 32 hard coded in there (well a > >little bit, but can be easily changed). It's pretty dynamic. > By all means, go for it. I certainly don't want to curb your > enthusiasm. It's the right thing to do, ultimately. I didn't want to > presume on your time. Happy to have it, of course. :)
Thanks. I've had a great time doing this. Remembering graph algorithms and compilers. Great fun! I'd also like to contribute to getting Parrot out there, sooner rather than later. So if I can help with that, I'd like to hear suggestions.
I hope I can help out. The patch I am submitting actually does simplify register coloring a bit. I've been waiting for perl6 with so much anticipation, I just couldn't stand it any more, and I had to participate.
On Thu, 28 Oct 2004 18:17:57 -0400, Matt Fowles <uberm...@gmail.com> wrote: > Bill~
> I have to say that I am really impressed by all of the work that you > are doing, and if you can make the internals of imcc a little more > approachable, you would be doing a great service.
Bill Coffman <bill.coff...@gmail.com> wrote: > Currently, here's how the register allocator is doing. > Failed Test Stat Wstat Total Fail Failed List of Failed > --------------------------------------------------------------------------- ---- > t/library/dumper.t 5 1280 13 5 38.46% 1-2 5 8 13 > 4 tests and 51 subtests skipped. > Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay. > I recall Leo, or someone, saying that the data dumper routines are not > following the calling convention properly.
I didn't look too close, but it's probably only the entry points:
.sub _dumper _global_dumper()
That's missing C<.param> statements, so there are none.
> I've learned a lot about how the compiler works at this point, and I'd > like to contribute more :)
Great. Thanks.
> Would you like a patch? Should I fix the data dumper routines first?
Definitely - dumper.t tests are currently disabled, don't worry.
> What is all this talk about deferred registers? What should I do > next?
"deferred registers" doesn't make bells ring. What do you mean with that? - Send patch with explanation of algorithm.
> Yes, I think we are kind of doing this. It's best to pass the > registers straight through though. Like when a variable will be used > as a parameter, give it the appropriate reg num. Sort of outside the > immediate scope of register coloring, but as I've learned, one must go > a little beyond, to see the input and output for each sub.
Well, it's not really outside of register coloring. It's part of parrot's calling conventions. You can think of it as part of the Parrot machine ABI. When you write a compiler for darwin-PPC, you have to pass function arguments in r3, r4, ... and you get a return value in r3. If you don't do that, you'll not be able to make any C library call.
In Parrot we have similar calling conventions and the register allocator must be aware of that. E.g. when you have:
you know that I5 and I6 are return results. The live range or the previous usage of I5 and I6 is cut by the function call.
Using the return values directly is of course an optimization and not strictly necessary, nethertheless the allocator has to be aware that the function call invalidates previous I5 and I6.
> But the idea is to have each sub declare how many registers to > save/restore.
Don't worry about save/restore. That's already changed. imcc doesn't emit any savetop/restoretop or similar opcodes any more. Registers are preserved now by allocating a new register frame for the subroutine.
> We can also minimize this number to match the physical architecture > that parrot is running on (for an arch specific optimization).
Yes. I did that some time ago in imcc/jit.c, which produced register mapping for the underlying hardware CPU. Parrot registers 0.. n-1 were given negative numbers and src/jit.c used these directly as mapping for CPU registers. This vastly reduced JIT startup time.
> Yes, yes, renaming! I want to do register renaming!
Go for it please.
> p31 holds all the spill stuff. It's a pain. Maybe I'll move that > around, but if p31 is used, it means that there is no more room for > symbols, in at least one of the reg sets.
I'd say that with register renaming, spilling will be very rare. But there is of course no need to use P31 for it. If we really have to spill we can optimize that a bit.