Predict register usage

Tim Frink

unread,

Apr 25, 2008, 9:56:25 AM4/25/08

to

Hi,

When the optimization inline extension is performed on a high-level
intermediate representation or on a low-level IR before register
allocation, the final result for the inlined code after the register
allocation is not known. Thus, it might happen that due to the
increased register pressure after inlining, spill code must be added
heavily leading to a worse performance than before the
optimization. Do you know of any approaches which try to predict the
resulting spill code added by a register allocation when inlining is
done for a particular function?

My idea was that lifetime analyses might be used for that purpose. One
could find out how many variables (in high-level) or register are life
across a call. This is an indication for spill code candidates. Based
on this one could develop some heuristics that omit inlining for
functions that are supposed to add too much spilling. What do you
think about this idea?

Are there in general any approaches that try to predict the register
usage/spill code generation in the generated assembly code exclusively
based on the consideration of the source code (or high-level code) and
the knowledge of the underlying hardware the compiler produces code
for? This is probably very hard since it requires a deep knowledge of
the register allocator.

Regards,
Tim

Chris F Clark

unread,

Apr 25, 2008, 11:36:26 PM4/25/08

to

Tim Frink <plf...@yahoo.de> writes:

> When the optimization inline extension is performed on a high-level
> intermediate representation or on a low-level IR before register
> allocation, the final result for the inlined code after the register
> allocation is not known. Thus, it might happen that due to the
> increased register pressure after inlining, spill code must be added
> heavily leading to a worse performance than before the
> optimization. Do you know of any approaches which try to predict the
> resulting spill code added by a register allocation when inlining is
> done for a particular function?

> My idea was that lifetime analyses might be used for that
> purpose. One could find out how many variables (in high-level) or
> register are life across a call. This is an indication for spill
> code candidates. Based on this one could develop some heuristics
> that omit inlining for functions that are supposed to add too much
> spilling. What do you think about this idea?

While this seems like it would be useful, to understand the tradeoff
you have to consider what happens if you don't inline the function and
in fact call the function.

If the variable is held in a caller-save register, then the caller
must spill the register before the call, thus you haven't saved
executing any spill code. Similarly, if the variable is held in a
callee-save register and the possibly-to-be-inlined function uses the
register, the function needs to save the register even if called and
no spill code has been saved.

The one calling (register based) convention that I'm aware of where
you might not need spill code by doing a call, is if the architecture
supports a rotating register window. In that case, the hardware
itself will take care of the spills.

Now, it wouldn't surprise me to learn of other calling conventions
where a call doesn't require spills, but caller and callee-save
registers are the most popular, and in both of those cases you won't
save spills by not inlining the call. In fact, when I was doing code
generator and optimizer work, it was often worth inlining the call
even if the only improvement was the elimination of superfluous spills
that the calling convention required.

Hope this helps,
-Chris

******************************************************************************
Chris Clark Internet: christoph...@compiler-resources.com
Compiler Resources, Inc. or: com...@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)

Inderaj Bains

unread,

Apr 26, 2008, 1:01:47 PM4/26/08

to

Function call has overheads including calling convention spill/reload
or maybe you do global register allocation. Finally the code is
executed at the same place, right? call or no call there is no magic,
you need the resources. Though inlining reduces call overhead,
subsequent code motion can increase register usage. Maybe by not
inlining you are limiting the ability of downstream optimizations to
mess up and inlining is not the issue. So what you can do here might
be driven less by the particular goodness of a particular approach and
more by what your downstream optimizations do, including how good
allocation is. For a cheap solution, you could try putting code motion
barriers around your inlined function when you see high register
pressure.

For a general solution, to me it seems that my ability to consider all
inter-optimization tradeoffs, architectural tradeoffs, fiddle
endlessly with switches to produce good results is limited. Using
machine learning or autotuners and letting machine do a good part of
this work might be the long term solution in coming time.

Inderaj

Sid Touati

unread,

Apr 28, 2008, 6:11:26 AM4/28/08

to

Before instruction scheduling, predicting the register usage is
difficult. If you target a computer engineering purpose, you can
imagine plenty of heuristics and techniques; in general, no guarantee
is required for engineering solutions, practical efficiency on a set
of benchmarks would be sufficient.

If you target a computer science purpose, you would require to think
about some formal guarantees. There are some fundamental answers to
your question. For instance, what you can predict is a interval for
register usage: say an exact maximal register usage, or an exact
minimal register usage.

For the exact minimal register usage, it is commonly called register
sufficiency. It has been proved that this minimal register usage depends
on the target architecture, that is, this minimal register usage is NOT
a reachable minimal bound for any instruction schedule.

For the exact maximal register usage, it is called register
saturation. Contrary to the register sufficiency, the register
saturation does not depend on the target architecture. If the
instruction selection pass has been done, it is proved that the
register saturation IS a reachable upper bound for register usage for
any target architecture and for any instruction schedule. You can then
use register saturation to have a safe estimation of your maximum
register usage, independently on the final instruction schedule. And
this maximum register usage is always reachable (not over-estimated).

For more information:
Register Saturation in Instruction Level Parallelism. International
Journal of Parallel Programming, Springer-Verlag, Volume 33, Issue 4,
August 2005. 57 pages.

andreyb...@gmail.com

unread,

May 2, 2008, 3:33:54 AM5/2/08

to

My practical experience suggests that if you have a proper optimizing
compiler with lots of phases, you *can't* predict effects of inlining
on register allocation. There are too many phases between -- many of
them able to radically impact register allocation. You can't predict
all these phases.

I saw cases when a single additional strength reduction blew up
register allocation.

Andrey

andreyb...@gmail.com

unread,

May 2, 2008, 3:35:46 AM5/2/08

to

On Apr 26, 7:36 am, Chris F Clark <c...@shell01.TheWorld.com> wrote:
> The one calling (register based) convention that I'm aware of where
> you might not need spill code by doing a call, is if the architecture
> supports a rotating register window. In that case, the hardware
> itself will take care of the spills.

And even then spills are not for free. Sure, they are more effective
if done en masse, but still very costly.

Andrey

Björn Franke

unread,

May 9, 2008, 10:33:49 AM5/9/08

to

> Do you know of any approaches which try to predict the resulting
> spill code added by a register allocation when inlining is done for
> a particular function?

Have a look at these two papers, they may contain some useful
information for you:

Automatic Tuning of Inlining Heuristics for Java JIT Compilation.
John Cavazos and Michael F. P. O'Boyle.
CPC 2006

Automatic Tuning of Inlining Heuristics.
John Cavazos and Michael F. P. O'Boyle.
Supercomputing 2005

Found at: http://www.cis.udel.edu/~cavazos/index.php?page=publications

Bjoern