- Applied Matula/Chaitin/Briggs algorithm for register allocation. - Color the graph all at once, and spill all symbols with high colors. Spill all at once to speed things up. - Remove several of the functions, which are incorporated into the new algorithm.
- Shortcomming: doesn't use score anymore, but the algorithm is smart enough that I hope it's okay to do that. - Failed 2 tests for latest CVS. (See earlier posting.)
WANT TO DO: - Apparently, there's a memory leak which prevents from coloring graphs with more than a few hundred registers. I suspect this is in the spill, or update_life routine. Not sure if it's mine or pre-existing. - Interference graph is using 8 times the memory it needs to use. This is still trivial compared to lost data in above bug. - Smarten up algorithm to use score again. A good way to do so is commented in the code. - Create spilling score, that prints out with a debug option. This can be a metric to compare various algorithms. - Improve spill to spill all registers at once, adding speed. - Introduce proper analysis of flow graph, to create less conservative interference graph. - Color each of the four register types separately. Be sure to compare gains with losses for this, as it is not entirely cear. - Introduce register renaming. When variable is reassigned, it might as well be considered a new symbol... well, much of the time, anyway. - Introduce variable register size, in coordination with subroutine calls, to reduce copy cost. Coordinate with Dan and Leo on this. - Improve flow-graph, basic block calculation, etc. Make it all a little easier to understand, and more efficient. Knuth style, literate programming. Well, just good comments, and a couple of decent pods.
> - Applied Matula/Chaitin/Briggs algorithm for register allocation. > - Color the graph all at once, and spill all symbols with high colors. > Spill all at once to speed things up.
Good. Hopefully Dan can provide some compile number compares.
> - Shortcomming: doesn't use score anymore, but the algorithm is smart > enough that I hope it's okay to do that. > - Failed 2 tests for latest CVS. (See earlier posting.)
I've fixed dumper.t in CVS. Only streams_11 is currently failing here.
> WANT TO DO: > - Apparently, there's a memory leak which prevents from coloring > graphs with more than a few hundred registers. I suspect this is in > the spill, or update_life routine. Not sure if it's mine or > pre-existing.
There probably were already some leaks. But we really have to get rid of memory leaks alltogether.
> - Interference graph is using 8 times the memory it needs to use. > This is still trivial compared to lost data in above bug.
That might kill Dan's 6000-liner.
> - Color each of the four register types separately. Be sure to > compare gains with losses for this, as it is not entirely cear.
That would reduce memory, wouldn't it?
> - Introduce register renaming. When variable is reassigned, it might > as well be considered a new symbol... well, much of the time, anyway.
Number 1 in my priority list.
> - Introduce variable register size, in coordination with subroutine > calls, to reduce copy cost. Coordinate with Dan and Leo on this.
Not needed. We don't copy registers anymore.
> - Improve flow-graph, basic block calculation, etc.
Yeah. And create some means to test it.
Some more notes WRT the patch:
* the Dbg and Dbg2 debug macros aren't needed. Just use the existing debug(interp, level, ...) function in src/debug.c. If you need some extra levels, you can use some more bits in imcc/debug.h
* The global G is a no no, and I don't think you need it for qsort (If you need it you should just use the global around the qsort). We finally have to have a reentrant compiler. Yes I know, there are still some other globals around, they are being reduced ...
* all functions should have an Interp* and a IMC_Unit* argument to allow reentrancy. I.e. all state should be in the unit structure.
* Variable names should be a bit more verbose, G.V is to terse.
* alloca() isn't portable and not available everywhere
At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote:
>Bill Coffman (via RT) wrote:
>>Patch does the following:
>>- Applied Matula/Chaitin/Briggs algorithm for register allocation. >>- Color the graph all at once, and spill all symbols with high colors. >> Spill all at once to speed things up.
>Good. Hopefully Dan can provide some compile number compares.
I'll give it a shot as soon as I can.
>>WANT TO DO: >>- Apparently, there's a memory leak which prevents from coloring >>graphs with more than a few hundred registers. I suspect this is in >>the spill, or update_life routine. Not sure if it's mine or >>pre-existing.
>There probably were already some leaks. But we really have to get >rid of memory leaks alltogether.
>>- Interference graph is using 8 times the memory it needs to use. >>This is still trivial compared to lost data in above bug.
>That might kill Dan's 6000-liner.
I should point out, for the folks following along at home, that it's 6K lines of source in the original language (DecisionPlus). The actual PIR code generated runs to 84k lines in the biggest sub.
>Some more notes WRT the patch:
>* the Dbg and Dbg2 debug macros aren't needed. Just use the existing >debug(interp, level, ...) function in src/debug.c. If you need some >extra levels, you can use some more bits in imcc/debug.h
>* The global G is a no no, and I don't think you need it for qsort >(If you need it you should just use the global around the qsort). We >finally have to have a reentrant compiler. Yes I know, there are >still some other globals around, they are being reduced ...
I'd like to get us down to a single global for all of Parrot. I don't think it's possible to safely go any lower than that, though I suppose we could if we really, really tried, and didn't mind things crashing and burning in some really odd fringe edge cases.
>* all functions should have an Interp* and a IMC_Unit* argument to >allow reentrancy. I.e. all state should be in the unit structure.
Definitely.
>* Variable names should be a bit more verbose, G.V is to terse.
Yeah. This stuff is abstruse enough as it is -- take pity on those uf us with Very Little Brain. :)
>* alloca() isn't portable and not available everywhere
Yep. This is a gcc-ism. Use Parrot's memory allocation functions instead.
>I'm waiting for Dan's comments on usability.
I'd like the code issues cleaned up before it gets committed. I'll let you know the timing as soon as I can, though it'll probably take a few hours. -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote:
>Bill Coffman (via RT) wrote:
>>Patch does the following:
>>- Applied Matula/Chaitin/Briggs algorithm for register allocation. >>- Color the graph all at once, and spill all symbols with high colors. >> Spill all at once to speed things up.
>Good. Hopefully Dan can provide some compile number compares.
The numbers are... not good.
I took one of the mid-sized programs and threw it at the new code. Parrot in CVS takes about 10 minutes to run through this program. The main sub's about 30Klines of code, and the stat from a parrot -v is:
sub _MAIN: registers in .imc: I2875, N0, S868, P7615 0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch 0 used once deleted 0 invariants_moved registers needed: I2883, N0, S873, P7741 registers in .pasm: I31, N0, S31, P32 - 37 spilled 5845 basic_blocks, 47622 edges
I applied the patch to a copy of parrot and ran it. After 37 minutes I killed the thing. It had 1.6G of RAM allocated at the time of death, too. -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
On Fri, 29 Oct 2004 10:15:34 -0400, Dan Sugalski <d...@sidhe.org> wrote: > At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote: > >Bill Coffman (via RT) wrote:
> >>Patch does the following:
> >>- Applied Matula/Chaitin/Briggs algorithm for register allocation. > >>- Color the graph all at once, and spill all symbols with high colors. > >> Spill all at once to speed things up.
> >Good. Hopefully Dan can provide some compile number compares.
> The numbers are... not good.
> I took one of the mid-sized programs and threw it at the new code. > Parrot in CVS takes about 10 minutes to run through this program. The > main sub's about 30Klines of code, and the stat from a parrot -v is:
> I applied the patch to a copy of parrot and ran it. After 37 minutes > I killed the thing. It had 1.6G of RAM allocated at the time of > death, too. > --
> Dan
> --------------------------------------it's like this------------------- > Dan Sugalski even samurai > d...@sidhe.org have teddy bears and even > teddy bears get drunk
[Leo]* alloca() isn't portable and not available everywhere [Dan] Yep. This is a gcc-ism. Use Parrot's memory allocation functions instead.
reg_alloc.c uses malloc(). I saw alloca() used in the CVS tree earlier, but it's gone now. What is the correct way to allocate memory in Parrot?
[Leo]* the Dbg and Dbg2 debug macros aren't needed. Just use the existing debug(interp, level, ...) function in src/debug.c. If you need some extra levels, you can use some more bits in imcc/debug.h
I notice an IMC_TRACE macro, as well as the associated debug stuff. Debug looks like it has a lot of overhead, even when turned off. I didn't really get Dbg cleaned up either, because I couldn't figure out how to handle variable number of arguments, but something like assert() would be much nicer. I'll go ahead and follow the convention of adding debug calls. Should the IMC_TRACE stuff be removed?
[Leo] * all functions should have an Interp* and a IMC_Unit* argument to allow [Leo] reentrancy. I.e. all state should be in the unit structure.
I'm not quite sure I understand this. I get that each function call has (interpreter,unit) as parameters. But why is this? Parrot handles interrupts, I understand, so this probably has something to do with interrupt processing. Does this also mean that there should be no other data structures, except those pointed to by interpreter or unit?
[Dan] I'd like the code issues cleaned up before it gets committed.
I want to fix the memory leak, which involves rewriting a few functions. This might go fast, or it could become difficult. It's probably best to send an intermediate patch that has the code issues fixed. That way we can maximize code review, and make sure I'm writing parrot compliant code.
>[Leo]* alloca() isn't portable and not available everywhere >[Dan] Yep. This is a gcc-ism. Use Parrot's memory allocation >functions instead.
>reg_alloc.c uses malloc(). I saw alloca() used in the CVS tree >earlier, but it's gone now. What is the correct way to allocate >memory in Parrot?
That depends. If it's GC-able memory (that is, hanging off a buffer) then use Parrot_allocate or Parrot_reallocate (depending on whether you're doing a malloc or realloc). For memory you track by hand, use mem_sys_allocate, mem_sys_reallocate, and mem_sys_free. The first set are in resources.c with docs, the second in memory.c.
>[Leo] * all functions should have an Interp* and a IMC_Unit* argument to allow >[Leo] reentrancy. I.e. all state should be in the unit structure.
>I'm not quite sure I understand this. I get that each function call >has (interpreter,unit) as parameters. But why is this? Parrot >handles interrupts, I understand, so this probably has something to do >with interrupt processing. Does this also mean that there should be >no other data structures, except those pointed to by interpreter or >unit?
To allocate anything in parrot you need an interpreter pointer. (To do a lot of other stuff too) It's worth slinging them around everywhere, since it's a big pain to add them all in after the fact, piecemeal. (Been there, done that :)
>[Dan] I'd like the code issues cleaned up before it gets committed.
>I want to fix the memory leak, which involves rewriting a few >functions. This might go fast, or it could become difficult. It's >probably best to send an intermediate patch that has the code issues >fixed. That way we can maximize code review, and make sure I'm >writing parrot compliant code.
Cool. And you've got that monstrosity of mine to test on :) -- Dan
--------------------------------------it's like this------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
[ Dan answered already, so I'll provide just a few additional notes ]
> reg_alloc.c uses malloc().
Yep, there are still a few malloc()s around, they should be mem_sys_allocate(), which is just a macro for malloc() mostly, but leaves us the chance to do our own implementation.
> ...I saw alloca() used in the CVS tree > earlier, but it's gone now. What is the correct way to allocate > memory in Parrot?
Not inside Parrot. Oh well:
$ find . -name '*.c' | xargs grep -w alloca ./ast/astparser.c:/* The parser invokes alloca or malloc; \ define the necessary symbols. */ ...
The yacc, bison, whatever defines some macros for extending internal tables that might get triggered or not. I trust that piece of code to use a different strategy, if alloca isn't available.
Inside the Parrot tree, we don't use alloca().
> [Leo]* the Dbg and Dbg2 debug macros aren't needed. > I notice an IMC_TRACE macro, as well as the associated debug stuff. > Debug looks like it has a lot of overhead, even when turned off.
There is a bit of overhead if useed inside a loop or such. You can always wrap #ifndef NDEBUG around it.
> ... , because I couldn't figure out > how to handle variable number of arguments,
The debug() function handles variable arguments as well as different debugging verbosity levels.
> assert() would be much nicer.
assert() is usable as is, just include <assert.h>. But assert() is of course a different thing. First is: make it working correctly. We'll optimize out the possible debug stuff later. Inside loops place NDEBUG macros around it.
> [Leo] * all functions should have an Interp* and a IMC_Unit* argument > I'm not quite sure I understand this. I get that each function call > has (interpreter,unit) as parameters. But why is this? Parrot > handles interrupts,
It's not because of interrupts. We'll never run that code from a signal handler. But imagine, we have not only one parser that calls the register allocation stuff, we have more parsers handling even different languages.
E.g. You are creating code for
BEGIN { bar = 2; foo() };
or some such. During compiling that, a different compiler is invoked that compiles the "foo" function. This compiler may run in a separate thread and starts overwriting global state inmidst of compiling the first piece of code.
Having the state in an interpreter variable prevents from that. C<interpreter> is not really needed inside most of these functions currently, but is e.g. badly needed, if we change mem_sys_allocate() to use interpreter-internal memory allocation. C<IMC_Unit> holds the state of the current function or unit. So these are needed function arguments.
We already had tons of patches, to bring these arguments to the most functions used inside imcc, but that's not finished and the reason for some nasty issues with state in globals. It's a PITA.
So for new code just use these 2 additional function arguments.
> ... Does this also mean that there should be > no other data structures, except those pointed to by interpreter or > unit?
No. You can use all possible data structures you need. Register allocation is done per unit, just hang your needed structure(s) off IMC_Unit defined in F<imcc/unit.h>.