Philipp Klaus Krause <p...@spth.de> writes:
>See "Optimal Register Allocation in Polynomial Time". A graph-coloring
>approach that can handle irregularities well (as long as there are not
>too many registers). SDCC uses such a register allocator for some
>backends, including z80.
Interesting paper. I had trouble following the theoretical part, but
going by the practical part and by what I understood of the theory:
you try out all possible assignments of registers to the maximum
clique (the set of live ranges that are live at the same time), and
because the number of registers is a constant (for a given
architecture), the search is polynomial.
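If my reading is right, the core step could be sketched like this (a
toy model of my own, not the paper's actual implementation; the
register names and the cost function are made up):

```python
from itertools import permutations

# Hypothetical register file and cost function, just for illustration;
# the paper's actual data structures will differ.
REGS = ["a", "b", "c", "d", "e", "h", "l"]

def best_assignment(clique, cost):
    """Exhaustively try every injective assignment of registers to the
    live ranges of one clique and keep the cheapest.

    Since the number of registers k is an architectural constant, the
    number of assignments per clique is at most k!/(k - n)! -- also a
    constant -- so the exhaustive search stays polynomial in program
    size."""
    best, best_cost = None, float("inf")
    for regs in permutations(REGS, len(clique)):
        assignment = dict(zip(clique, regs))
        c = cost(assignment)
        if c < best_cost:
            best, best_cost = assignment, c
    return best, best_cost

# Toy usage: three simultaneously live ranges; the cost prefers
# keeping "x" in the accumulator "a".
clique = ["x", "y", "z"]
asg, c = best_assignment(clique, lambda a: 0 if a["x"] == "a" else 1)
```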
You discuss splitting live ranges at control-flow-graph boundaries,
but it seems to me that in some cases you want to split live ranges
between instructions within a single control-flow node; this does not
change the complexity-theoretic result, but it does change the
implementation.
If only the maximum clique size plays a role, it is as if the
assignments to the cliques were independent; but then you have to
reconcile the assignments of different cliques by introducing reg-reg
move instructions, take their costs into account, and optimize them,
too. So I don't see how the cliques can be treated as independent, and
you probably don't treat them that way, since there are additional
factors in the complexity bound. In any case, I am missing something;
maybe you have an idea what it is.
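To make the concern concrete, a toy model (entirely my construction,
not from the paper) of where the reconciliation cost comes from:

```python
# Where two cliques overlap, each live range that is assigned to
# different registers on the two sides needs a reg-reg move at the
# boundary; these moves have a cost that the allocator must optimize.
def reconciliation_moves(asg1, asg2):
    return [(rng, asg1[rng], asg2[rng])
            for rng in sorted(asg1.keys() & asg2.keys())
            if asg1[rng] != asg2[rng]]

# "x" changes register across the boundary, "y" does not:
moves = reconciliation_moves({"x": "a", "y": "b"}, {"x": "c", "y": "b"})
```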
I am wondering about one thing in the empirical results in your
paper: why does the code size not fall monotonically with an
increasing number of assignments? Are these independent runs with
different (pseudo-random) assignments?
I had some questions which were mostly answered by the paper, but
maybe you can offer additional insights:
* Am I right that earlier register allocators were bad at irregular
register sets, and that is why general-purpose register sets won once
compilers became dominant? Why did general-purpose registers become
dominant?
The advantage of general-purpose registers is given in that paper as
reducing the complexity of the algorithm by a factor of:
(2*(tree-width(G)+1)*#registers)!
E.g., for tree-width 2 and 9 registers, that factor is 54! ≈ 2.30843697*10^71.
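A quick check of that arithmetic (the formula and the numbers are the
ones quoted above):

```python
import math

# Sanity check of the factor quoted above: tree-width 2, 9 registers.
tree_width, registers = 2, 9
factor = math.factorial(2 * (tree_width + 1) * registers)  # 54!
print(f"54! = {factor:.8e}")  # prints 54! = 2.30843697e+71
```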
Which raises the question: in your empirical work you stopped at 10^8
assignments (in some cases, fewer). How did you get provably optimal
assignments on the Z80 with its 9 registers?
* What are the key reasons your work can deal with irregular register
sets, where earlier approaches were pretty bad at it? It seems to me
that you use the CPU power available now to try out many different
assignments, while earlier work balked at that.
* Do you have any idea why no good approach for dealing with
irregular register sets was found in, say, the 1970s and 1980s, when
irregular register sets were more common (e.g. on the Z80 and the
8086)?
Your approach is an (ideally exhaustive) search that uses more CPU
power (and memory?) than was available back then. At the time, one
would have resorted to heuristics, but apparently no generally
effective heuristics were found.