Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[perl #32418] Re: [PATCH] Register allocation patch - scales better to more symbols

0 views
Skip to first unread message

Bill Coffman

unread,
Nov 12, 2004, 7:55:02 AM11/12/04
to bugs-bi...@rt.perl.org
# New Ticket Created by Bill Coffman
# Please include the string: [perl #32418]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32418 >


Failed Test Stat Wstat Total Fail Failed List of Failed
-------------------------------------------------------------------------------
t/op/gc.t 1 256 18 1 5.56% 13
t/pmc/sub.t 1 256 78 1 1.28% 78
4 tests and 59 subtests skipped.
Failed 2/124 test scripts, 98.39% okay. 2/1977 subtests failed, 99.90% okay.

I'll wager that these failed because of the improper coloring. Eg,
conflicting symbols wanting the same register. Since I disabled the
color test, one of them fails by segfault. The other in a similarly
weird way. Not sure why they worked for the other allocator, but if
the test is designed to fit the peculiarities of a the system, then
it's not too surprising that it succeeds.

THINGS DONE:

* fixed adjacency matrix with Bits_per_int(). So it's eight times
more space efficient.

* Eliminate Dbg{,2} (previous patch), replace with DEBUG_REG{,2} and
DEBUG_SPILL in debug.h. DEBUG_SPILL is supposed to produce only spill
info, for evaluating how well algorithms do.

* use (interpreter,unit) as params in new functions -- threadsafer.

* eliminated global several variables, except for a few, which are part of
a suite of globabls used for debugging and other nefarious purposes.

* Use mem_sys_allocate_zeroed and mem_sys_free instead of calloc/free.


WANT
don't use 'G', and 'V'. These seem very natural to me. G for graph,
and V for it's vertices... Oh well, I guess not everyone studies graph
theory for several years. ;)

* move data structures to hang off the IMC_Unit type. imcc/unit.h

* Fix memory leak, if indeed it is. Or just use memory better.

* Do full DU-chain analysis, register renaming, etc. This should
speed it up a lot.

* Need to implement scoring better, to minimize spilling.


On Thu, 28 Oct 2004 23:18:53 -0700, Bill Coffman <bill.c...@gmail.com> wrote:
> Patch does the following:
>
> - Applied Matula/Chaitin/Briggs algorithm for register allocation.
> - Color the graph all at once, and spill all symbols with high colors.
> Spill all at once to speed things up.
> - Remove several of the functions, which are incorporated into the new
> algorithm.
>
> - Shortcomming: doesn't use score anymore, but the algorithm is smart
> enough that I hope it's okay to do that.
> - Failed 2 tests for latest CVS. (See earlier posting.)
>
> WANT TO DO:
> - Apparently, there's a memory leak which prevents from coloring
> graphs with more than a few hundred registers. I suspect this is in
> the spill, or update_life routine. Not sure if it's mine or
> pre-existing.
> - Interference graph is using 8 times the memory it needs to use.
> This is still trivial compared to lost data in above bug.
done

> - Smarten up algorithm to use score again. A good way to do so is
> commented in the code.
> - Create spilling score, that prints out with a debug option. This
> can be a metric to compare various algorithms.
> - Improve spill to spill all registers at once, adding speed.
Done.

> - Introduce proper analysis of flow graph, to create less conservative
> interference graph.
High priority, along with register renaming.

> - Color each of the four register types separately. Be sure to
> compare gains with losses for this, as it is not entirely cear.
> - Introduce register renaming. When variable is reassigned, it might
> as well be considered a new symbol... well, much of the time, anyway.
> - Introduce variable register size, in coordination with subroutine
> calls, to reduce copy cost. Coordinate with Dan and Leo on this.
> - Improve flow-graph, basic block calculation, etc. Make it all a
> little easier to understand, and more efficient. Knuth style,
> literate programming. Well, just good comments, and a couple of
> decent pods.
Documenation is a lower priority, but some explanation is waranted.
Especially for flow analysis stuff, which is rather complex.

>
>
>

x4.patch

Leopold Toetsch

unread,
Nov 15, 2004, 10:08:37 AM11/15/04
to perl6-i...@perl.org, Dan Sugalski

[ x4.patch ]

The register allocator seems to do a great jub. It does e.g. color a
diamond-like interference graph correctly with two colors only:

x
/ \
w z
\ /
y

(the lines denote an interference - BTW you might create some PIR could
that has exactly this interference graph - not quite easy). The old
allocator took three colors.

Anyway: are there already numbers from the big evil subs?

leo

Dan Sugalski

unread,
Nov 15, 2004, 10:57:26 AM11/15/04
to Leopold Toetsch, perl6-i...@perl.org
At 4:08 PM +0100 11/15/04, Leopold Toetsch wrote:
>[ x4.patch ]
>
>The register allocator seems to do a great jub. It does e.g. color a
>diamond-like interference graph correctly with two colors only:
>
> x
> / \
> w z
> \ /
> y
>
>(the lines denote an interference - BTW you might create some PIR
>could that has exactly this interference graph - not quite easy).
>The old allocator took three colors.

Sweet.

>Anyway: are there already numbers from the big evil subs?

I'd love to see 'em. (Or if you're asking if I've applied this and
tried it, the answer's no. I can if that's what's needed)
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

unread,
Nov 15, 2004, 11:32:33 AM11/15/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> At 4:08 PM +0100 11/15/04, Leopold Toetsch wrote:

>>The old allocator took three colors.

> Sweet.

To be precise, it used 3 colors with the pre-allocation hack turned on
that colored temps. W/o that it also used 2 colors.

>>Anyway: are there already numbers from the big evil subs?

> I'd love to see 'em. (Or if you're asking if I've applied this and
> tried it, the answer's no. I can if that's what's needed)

Yes please - the one hunk in debug.h will fail, it's aready applied.

leo

Dan Sugalski

unread,
Nov 15, 2004, 11:46:18 AM11/15/04
to l...@toetsch.at, perl6-i...@perl.org

Okay. I'll apply it and take a shot. May take a few hours to get a real number.

Leopold Toetsch

unread,
Nov 17, 2004, 5:35:32 AM11/17/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski wrote:

> Okay. I'll apply it and take a shot. May take a few hours to get a real
> number.

How does it look like? Any results already?

Thanks,
leo

Dan Sugalski

unread,
Nov 17, 2004, 9:15:54 AM11/17/04
to Leopold Toetsch, perl6-i...@perl.org

Nope, haven't had time, unfortunately. Work's been busy. Today, if I get lucky.

Dan Sugalski

unread,
Nov 23, 2004, 9:27:12 AM11/23/04
to Leopold Toetsch, perl6-i...@perl.org
At 11:35 AM +0100 11/17/04, Leopold Toetsch wrote:

Okay, got some time this morning. Two of the patch hunks were already
in, so I skipped 'em. The results... not good.

The parrot I have, which is a day or two out of date, takes 7m to
churn through one of my pir files. With this patch, I killed the run
at 19.5 minutes.

Leopold Toetsch

unread,
Nov 23, 2004, 9:54:13 AM11/23/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski wrote:
>
> The parrot I have, which is a day or two out of date, takes 7m to churn
> through one of my pir files. With this patch, I killed the run at 19.5
> minutes.

Sh... That's one of the smaller ones I presume. How many basic blocks
and variables are listed with -v?

Thanks,
leo


Dan Sugalski

unread,
Nov 23, 2004, 10:40:45 AM11/23/04
to Leopold Toetsch, perl6-i...@perl.org
At 4:02 PM +0100 11/23/04, Leopold Toetsch wrote:
>Dan Sugalski wrote:
>>The parrot I have, which is a day or two out of date, takes 7m to
>>churn through one of my pir files. With this patch, I killed the
>>run at 19.5 minutes.
>
>One more note: be sure to compile Parrot optimized - the new
>reg_alloc.c has some very expensive sanity checks in debug mode, or
>better if NDEBUG isn't defined.

I can't. My dev machine's running gcc 2.95.4, and gcc throws lisp
error messages compiling the switch core if I turn on optimizations.

Dan Sugalski

unread,
Nov 23, 2004, 10:39:35 AM11/23/04
to Leopold Toetsch, perl6-i...@perl.org
At 3:54 PM +0100 11/23/04, Leopold Toetsch wrote:
>Dan Sugalski wrote:
>>
>>The parrot I have, which is a day or two out of date, takes 7m to
>>churn through one of my pir files. With this patch, I killed the
>>run at 19.5 minutes.
>
>Sh... That's one of the smaller ones I presume.

Nope, one of the biggest. Sixth largest, at 800KB of pir code.

> How many basic blocks and variables are listed with -v?

sh-2.05a$ ~/src/parrot/parrot -v -o forms/shipper.pbc forms/shipper.imc
debug = 0x0
Reading forms/shipper.imc
using optimization '0' (0)
Starting parse...


build_reglist: 9941 symbols
allocate_non_interfering, now: 3167 symbols
sub _MAIN:
registers in .imc: I2891, N0, S912, P6115
0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch
0 used once deleted
0 invariants_moved
registers needed: I3597, N0, S962, P6207
registers in .pasm: I32, N0, S32, P32 - 271 spilled
5679 basic_blocks, 47459 edges
build_reglist: 1342 symbols
allocate_non_interfering, now: 1003 symbols
sub __Internal_Startup:
registers in .imc: I25, N0, S0, P1290
0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch
0 used once deleted
0 invariants_moved
registers needed: I34, N0, S3, P2828
registers in .pasm: I31, N0, S7, P32 - 476 spilled
650 basic_blocks, 663 edges


Past that there are a mass of very small subs that don't spill.

This is interesting, though. I'd not looked at the numbers too
closely, and I've been assuming that the _MAIN sub was the real cause
of the register coloring code going insane, but the internal_startup
code doesn't look too good either there. I think I'll go see what
things look like for the big evil program.

Bill Coffman

unread,
Nov 23, 2004, 11:25:12 AM11/23/04
to Dan Sugalski, Leopold Toetsch, perl6-i...@perl.org
On Tue, 23 Nov 2004 09:27:12 -0500, Dan Sugalski <d...@sidhe.org> wrote:
> The parrot I have, which is a day or two out of date, takes 7m to
> churn through one of my pir files. With this patch, I killed the run
> at 19.5 minutes.

I'd be interested in getting this code for testing. The big one you
sent is hard to get a sense of how it's doing. A smaller one could be
helpful.

This is interesting, since the new one pretty much beat the old one on
my test data. However, I did add more validation since that test.
Adding the -DNDEBUG (defining NDEBUG, not turning it off) turns off
those tests, as Leo pointed out. I've found that lot's of assertions
save me time from chasing bugs in the long run.

Lately, I've been rewriting the spill code. I think that can be
optimized to a much greater extent. As I was reading the cfg code,
and learning how this all worked, I found that the spill code could be
done much better. Been a bit busy lately, so it's been hard to find
time to finish this. Also, unresolved issues make it a little
difficult.

Bill

Dan Sugalski

unread,
Nov 23, 2004, 11:27:44 AM11/23/04
to Bill Coffman, Leopold Toetsch, perl6-i...@perl.org
At 8:25 AM -0800 11/23/04, Bill Coffman wrote:
>On Tue, 23 Nov 2004 09:27:12 -0500, Dan Sugalski <d...@sidhe.org> wrote:
>> The parrot I have, which is a day or two out of date, takes 7m to
>> churn through one of my pir files. With this patch, I killed the run
>> at 19.5 minutes.
>
>I'd be interested in getting this code for testing. The big one you
>sent is hard to get a sense of how it's doing. A smaller one could be
>helpful.

Sure. I'll send it off-list in a minute.

Leopold Toetsch

unread,
Nov 23, 2004, 11:40:37 AM11/23/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

> I can't. My dev machine's running gcc 2.95.4, and gcc throws lisp
> error messages compiling the switch core if I turn on optimizations.

You could try:
- perl Configure.pl --optimize
- make -s
- wait a bit until first files start compiling
- interrupt it
- put #ifdef 0 / #endif around the big switch
- make -s
- don't run parrot -S ;)

*But*, I've looked again at the new reg_alloc.c code. It seems to have a
piece of code with qubic order in registers, which is for sure killing
all performance advantage it has for a few hundreds of symbols.

So the "scales better to more symbols" has some limits when "more"
reaches 10K ;)

I've sent that to Bill too.

leo

Dan Sugalski

unread,
Nov 23, 2004, 11:55:47 AM11/23/04
to l...@toetsch.at, perl6-i...@perl.org

I'll hold off then. I can't picture anything that -O3 could do that
wouldn't get swamped by a cubic time algorithm.

Bill Coffman

unread,
Nov 23, 2004, 12:17:11 PM11/23/04
to Dan Sugalski, l...@toetsch.at, perl6-i...@perl.org
Wait, I just thought of a huge change.

Dan, Does the patch you have implement Leo's U_NON_VOLATILE patch? If
so, that restricts from 32 to 16 registers, in various cases for
*non-volatile* symbols (did I get that right?). Anyway, the symbols
that cross sub calls can only use 16 registers, where they used 32
before. That could have a huge effect in that more variables are
spilling.

Well, if you don't have that patch, then back to the drawing board.

~Bill

On Tue, 23 Nov 2004 11:55:47 -0500, Dan Sugalski <d...@sidhe.org> wrote:
> At 5:40 PM +0100 11/23/04, Leopold Toetsch wrote:

> >*But*, I've looked again at the new reg_alloc.c code. It seems to have a
> >piece of code with qubic order in registers, which is for sure killing
> >all performance advantage it has for a few hundreds of symbols.
> >
> >So the "scales better to more symbols" has some limits when "more"
> >reaches 10K ;)
>

Dan Sugalski

unread,
Nov 23, 2004, 12:27:20 PM11/23/04
to Bill Coffman, l...@toetsch.at, perl6-i...@perl.org
At 9:17 AM -0800 11/23/04, Bill Coffman wrote:
>Wait, I just thought of a huge change.
>
>Dan, Does the patch you have implement Leo's U_NON_VOLATILE patch?

It was the patch originally attached to this ticket, over a stock
parrot from CVS. If there's something else to try let me know -- I'm
all for it. :)

Dan Sugalski

unread,
Nov 23, 2004, 4:49:38 PM11/23/04
to perl6-i...@perl.org
At 12:27 PM -0500 11/23/04, Dan Sugalski wrote:
>At 9:17 AM -0800 11/23/04, Bill Coffman wrote:
>>Wait, I just thought of a huge change.
>>
>>Dan, Does the patch you have implement Leo's U_NON_VOLATILE patch?
>
>It was the patch originally attached to this ticket, over a stock
>parrot from CVS. If there's something else to try let me know -- I'm
>all for it. :)

I should point out that stock parrot does, amazingly, manage to
compile the Evil Program. And from the -v output for the first sub:

build_reglist: 33064 symbols
allocate_non_interfering, now: 8312 symbols
sub _MAIN:
registers in .imc: I9980, N0, S2895, P20164


0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch
0 used once deleted
0 invariants_moved

registers needed: I9988, N0, S2900, P20290
registers in .pasm: I31, N0, S31, P32 - 37 spilled
14722 basic_blocks, 271989 edges


That's quite a feat. And yes, that's 270K edges. It's no wonder this
thing takes 3G of RAM to build... (I really need to see about
reducing the edge count there)

Leopold Toetsch

unread,
Nov 23, 2004, 10:02:39 AM11/23/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski wrote:
> The parrot I have, which is a day or two out of date, takes 7m to churn
> through one of my pir files. With this patch, I killed the run at 19.5
> minutes.

One more note: be sure to compile Parrot optimized - the new reg_alloc.c

has some very expensive sanity checks in debug mode, or better if NDEBUG
isn't defined.

leo


Leopold Toetsch

unread,
Nov 24, 2004, 12:42:49 PM11/24/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

> I can't. My dev machine's running gcc 2.95.4, and gcc throws lisp
> error messages compiling the switch core if I turn on optimizations.

I've checked in a hook in ops2c.pl that splits the switched core all 300
ops. If that works, we can provide some config support.
You might also vary the 300 a bit.

leo

0 new messages