IMCC temp optimizations...

Dan Sugalski

unread,

Apr 21, 2004, 6:50:17 PM4/21/04

to perl6-i...@perl.org

As I sit here and wait for parrot to churn on the output from
compiling a relatively small program, I'm reminded again that imcc's
got some degenerate behaviour when it comes to register coloring and
.locals.

I think it may be a handy thing if someone'd go through and draw up a
set of rules for the use of temps, and things that'll cause the
register coloring algorithm to go mad. (I'd like to avoid 30 minute
compile sessions--it's a tad tedious :)
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

unread,

Apr 22, 2004, 1:55:29 AM4/22/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> I think it may be a handy thing if someone'd go through and draw up a
> set of rules for the use of temps, and things that'll cause the
> register coloring algorithm to go mad. (I'd like to avoid 30 minute
> compile sessions--it's a tad tedious :)

Are you still sticking everything in one big _main?

leo

Leopold Toetsch

unread,

Apr 22, 2004, 7:22:00 AM4/22/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski wrote:
> ... (I'd like to avoid 30 minute compile

> sessions--it's a tad tedious :)

Should be faster now by some factor.

How many symbols are in the biggest compilation unit (parrot -v
"registers in .imc")?

leo

Dan Sugalski

unread,

Apr 22, 2004, 8:30:37 AM4/22/04

to Leopold Toetsch, perl6-i...@perl.org

At 1:22 PM +0200 4/22/04, Leopold Toetsch wrote:
>Dan Sugalski wrote:
>>... (I'd like to avoid 30 minute compile sessions--it's a tad tedious :)
>
>Should be faster now by some factor.

Cool, thanks. I've an optimized build of parrot going now, and I'll
see what things look like when it's dine.

>How many symbols are in the biggest compilation unit (parrot -v
>"registers in .imc")?

Dunno what parrot thinks--it's not done yet. grep says 569 .locals
and 473 temp PMC registers. ($Px) I think that can reasonably be
considered A Lot. I'm rejigging the compiler to cut down on the
number of .local declarations, but that'll increase the temp pmc
usage, at least with the relatively simple temp system I've got now.
(I can throw dummy labels in to create fake basic blocks if that'll
help the register coloring code)

Dan Sugalski

unread,

Apr 22, 2004, 8:30:56 AM4/22/04

to l...@toetsch.at, perl6-i...@perl.org

Except for the library code, yup. No way around that.

Leopold Toetsch

unread,

Apr 22, 2004, 9:46:42 AM4/22/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski wrote:
> As I sit here and wait for parrot to churn on the output from compiling
> a relatively small program

I've put in another factor ~2.5 change for a unit with 2000 temps.

leo

Leopold Toetsch

unread,

Apr 22, 2004, 10:03:20 AM4/22/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> Dunno what parrot thinks--it's not done yet. grep says 569 .locals
> and 473 temp PMC registers.

I've now enabled some more code that speeds up temp allocation more
(~2.5 for 2000 temps in a unit). This needs that the usage range is
limited to 10 lines. If the code from a compiler looks like the output
from the program below, this works fine.

> ... I'm rejigging the compiler to cut down on the

> number of .local declarations, but that'll increase the temp pmc
> usage, at least with the relatively simple temp system I've got now.

Temps are fine. ".local"s ranging from top of the program (not counting
the declaration) down do hurt. Many small short ranged temps are much
better then long ranged vars because they have more interferences on
each other.

> (I can throw dummy labels in to create fake basic blocks if that'll
> help the register coloring code)

That makes it worse. More blocks, more life ranges to compare.

If you can emit code similar to "gen.pl" it could take advantage of my
last change.

leo

$ cat gen.pl
#!/usr/bin/perl
use strict;

# a = 1 + 1 + 1 ... + 1

my ($i, $t, $N);
$N = $ARGV[0] || 299;
print ".sub _main \@MAIN\n";
$t++;
print "\t\$P$t = new PerlInt\n";
print "\t\$P$t = 1\n";
for $i (0..$N) {
$t++;
print "\t\$P$t = new PerlInt\n";
print "\t\$P$t = 1\n";
$t++;
print "\t\$P$t = new PerlInt\n";
print "\t\$P$t = \$P@{[$t-1]} + \$P@{[$t-2]}\n";
}
print "\tprint \$P$t\n";
print "\tprint \"\\n\"\n";
print ".end\n";

Angel Faus

unread,

Apr 22, 2004, 10:45:30 AM4/22/04

to l...@toetsch.at, Dan Sugalski, perl6-i...@perl.org

A good register allocation algorithm will always have problems with
big subs, there is nothing that we can do about it.

I think that what "real compilers" do to solve this problem is
implement two different algorithms: one for normal subs which tries
to generate optimal code, and a naive one for very large subs with
many virtual registers.

That makes compilation much faster, and the execution penalty doesn't
hurt too much.

Actually, it's (for me) an open question whether the "good" register
allocation should be the default one. Perl (and python and..) users
expect blazing compilation times, so maybe we should reserve it for
higher -O levels.

But then, we won't know how bad are our compilation times until there
are "real" programs written in perl6/parrot.

-angel

Dan Sugalski

unread,

Apr 22, 2004, 11:08:20 AM4/22/04

to l...@toetsch.at, perl6-i...@perl.org

At 4:03 PM +0200 4/22/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>
>> Dunno what parrot thinks--it's not done yet. grep says 569 .locals
>> and 473 temp PMC registers.
>
>I've now enabled some more code that speeds up temp allocation more
>(~2.5 for 2000 temps in a unit). This needs that the usage range is
>limited to 10 lines. If the code from a compiler looks like the output
>from the program below, this works fine.

This sped it up a lot. The output is:

Starting parse...
sub _MAIN:
registers in .imc: I34, N0, S7, P1014
0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch
0 used once deleted
0 invariants_moved
registers needed: I43, N0, S12, P3327
registers in .pasm: I25, N0, S20, P32 - 464 spilled
2007 basic_blocks, 2079 edges
13691 lines compiled.
Writing sample.pbc
packed code 348208 bytes
sample.pbc written.

Which actually came out in reasonable time, rather than me giving up
after 45 minutes. :) Still takes ages, so I've a lot of work to do
here.

> > ... I'm rejigging the compiler to cut down on the
>> number of .local declarations, but that'll increase the temp pmc
>> usage, at least with the relatively simple temp system I've got now.
>
>Temps are fine. ".local"s ranging from top of the program (not counting
>the declaration) down do hurt. Many small short ranged temps are much
>better then long ranged vars because they have more interferences on
>each other.

Hrm. Does the code currently consider something like:

$P0 = foo

to start a new lifetime for $P0?

Leopold Toetsch

unread,

Apr 22, 2004, 12:04:56 PM4/22/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:
> registers needed: I43, N0, S12, P3327
> registers in .pasm: I25, N0, S20, P32 - 464 spilled
> 2007 basic_blocks, 2079 edges

Ouch. Register allocation is spending huge times during spilling.
Something is definetely wrong with your code - wrong in the sense of: it
doesn't play with, what imcc expects ;)

You must have too much pseudo-globals in that unit, spanning a huge
range and interfering with one another. Do you use lexical vars or
Parrot globals?

> Hrm. Does the code currently consider something like:

> $P0 = foo

> to start a new lifetime for $P0?

If you don't use $P0 beyond that point yes. Do you name all
temps $P0 or some such? Or are you giving them unique names. You should
do the latter.

leo

Dan Sugalski

unread,

Apr 22, 2004, 12:16:25 PM4/22/04

to l...@toetsch.at, perl6-i...@perl.org

At 6:04 PM +0200 4/22/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>> registers needed: I43, N0, S12, P3327
>> registers in .pasm: I25, N0, S20, P32 - 464 spilled
>> 2007 basic_blocks, 2079 edges
>
>Ouch. Register allocation is spending huge times during spilling.
>Something is definetely wrong with your code - wrong in the sense of: it
>doesn't play with, what imcc expects ;)

Yep, the 45 minute compile times were the hint that maybe imcc and I
were fighting a bit. :)

>You must have too much pseudo-globals in that unit, spanning a huge
>range and interfering with one another. Do you use lexical vars or
>Parrot globals?

I was using .locals for the actual variables in the source program,
and $Px for all the temps the compiler generated. I've been migrating
a lot of the code to use a few .local-defined hashes and indexing
into them, and it looks like that's the way to go. (This'd be easier
if this language had, y'know, actual subroutines and stuff...)

> > Hrm. Does the code currently consider something like:
>
>> $P0 = foo
>
>> to start a new lifetime for $P0?
>
>If you don't use $P0 beyond that point yes. Do you name all
>temps $P0 or some such? Or are you giving them unique names. You should
>do the latter.

I've a lot of 'constant' temps named $P0. I'll go fix that and see where we go.

Leopold Toetsch

unread,

Apr 22, 2004, 1:35:18 PM4/22/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> I was using .locals for the actual variables in the source program,

Well, you know it: .locals aren't vars.

> and $Px for all the temps the compiler generated. I've been migrating
> a lot of the code to use a few .local-defined hashes and indexing
> into them, and it looks like that's the way to go.

If you use a hash anyway, just use globals - this seems to match the
target language's POV:

$Px = global "foo" # fetch
global "foo" = $Py # store

If spilling is still needed, the spill code can be improved, if the
variable already has a store (like a lexical or a global). Needs only
cutting down the life range, and a refetch in the other locations of
usage. This saves the store and simplifies spilling.

But if you treat all your variables like that, there will be no spilling
at all.

> ... (This'd be easier

> if this language had, y'know, actual subroutines and stuff...)

Yep.

leo