At a high level, I would like the PGO harness to contain the following modules:
Profile generators
These modules represent sources of profile information. Most of them work by instrumenting the user program so that it produces profile information. However, other sources of profile information (e.g., samples, hardware counters, static predictors) would also be supported.
Profile Analysis Oracles
Profile information is loaded into the compiler and translated into analysis data which the optimizers can use. These oracles become the one and only source of profile information used by transformations. Direct access to the raw profile data generated externally is not allowed.
Translation from profile information into analysis can be done by adding IR metadata or altering compiler internal data structures directly. I prefer IR metadata because it simplifies debugging, unit testing and bug reproduction.
Analyses should be narrow in the specific type of information they provide (e.g., branch probability), and no two analyses should provide overlapping information. We could later provide broader analysis types by aggregating the existing ones.
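A minimal sketch of what such a narrow oracle could look like, written in Python for brevity rather than against LLVM's actual interfaces. The class name `BranchProbabilityOracle`, the dictionary-based CFG, and the uniform fallback for blocks without counts are all assumptions made for illustration; none of them come from the plan above.

```python
from fractions import Fraction


class BranchProbabilityOracle:
    """Hypothetical narrow analysis: it answers only edge-probability queries."""

    def __init__(self, cfg, edge_counts):
        # cfg: {block: [successor blocks]}
        # edge_counts: {(src, dst): execution count} from some profile source.
        self._cfg = cfg
        # Private copy, so clients never touch the raw profile data.
        self._counts = dict(edge_counts)

    def probability(self, src, dst):
        successors = self._cfg[src]
        if dst not in successors:
            raise ValueError(f"{dst} is not a successor of {src}")
        total = sum(self._counts.get((src, s), 0) for s in successors)
        if total == 0:
            # Assumption: with no profile data, treat all successors as equally likely.
            return Fraction(1, len(successors))
        return Fraction(self._counts.get((src, dst), 0), total)


cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
oracle = BranchProbabilityOracle(cfg, {("entry", "then"): 90, ("entry", "else"): 10})
hot = oracle.probability("entry", "then")
```

A transformation would only ever call `probability`, so the profile source behind the oracle can change without touching the transformation.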
Transformations
Transformations should naturally take advantage of profile information by consulting the analyses. The better information they get from the analysis oracles, the better their decisions.
My plan is to start by making sure that the infrastructure exists and provides the basic analyses.
I have two primary goals in this first phase:
Augment the PGO infrastructure where required.
Fix existing transformations that are not taking advantage of profile data.
In evaluating and triaging the existing infrastructure, I will use test cases taken from GCC’s own testsuite, a collection of Google’s internal applications and any other code base folks consider useful.
In using GCC’s testsuite, my goal is not to mimic how GCC does its work, but make sure that the two compilers implement functionally equivalent transformations. That is, make sure that LLVM is not leaving optimization opportunities behind.
This may require implementing missing profile functionality. From a brief inspection of the code, most of the major kinds of profile (edge, path, block) seem to be there, but I don’t know what state they are in.
Some of the properties I would like to maintain or add to the current framework:
Profile data is never accessed directly by analyses and transformations. Rather, it is translated into IR metadata.
Graceful degradation in the presence of stale profiles. Old profile data should only result in degraded optimization opportunities. It should neither confuse the compiler nor cause erroneous code generation.
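One possible way to get this behavior, sketched in Python under stated assumptions: each profile record carries a fingerprint of the function's control flow graph taken when the profile was collected, and a record whose fingerprint no longer matches is dropped instead of applied. The fingerprint scheme, the record layout, and both function names are hypothetical; the plan above does not say how staleness would be detected.

```python
import hashlib


def cfg_fingerprint(cfg):
    """Hash block names and their ordered successors into a stable digest."""
    h = hashlib.sha256()
    for block in sorted(cfg):
        h.update(block.encode())
        h.update(b"->")
        for succ in cfg[block]:
            h.update(succ.encode())
            h.update(b",")
        h.update(b";")
    return h.hexdigest()


def load_function_profile(cfg, record):
    """Return edge counts if the record matches the current CFG, else None.

    Returning None means "no profile for this function", so the compiler
    falls back to whatever it does for unprofiled code.
    """
    if record is None or record.get("fingerprint") != cfg_fingerprint(cfg):
        return None
    return record["edge_counts"]


old_cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
record = {
    "fingerprint": cfg_fingerprint(old_cfg),
    "edge_counts": {("entry", "then"): 90, ("entry", "else"): 10},
}

# The source was edited after profiling: the "else" arm is gone.
new_cfg = {"entry": ["then", "exit"], "then": ["exit"], "exit": []}

fresh = load_function_profile(old_cfg, record)
stale = load_function_profile(new_cfg, record)
```

Dropping the whole record is the most conservative choice. A finer-grained scheme could try to salvage counts for unchanged regions, at the cost of more ways to go wrong.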
After the basic profile-based transformations are working, I would like to add new sources of profile information. Mainly, I am thinking of implementing Auto FDO. FDO stands for Feedback Directed Optimization (PGO and FDO tend to be used interchangeably in the GCC community). In this scheme, the compiler does not instrument the code. Rather, it uses an external sample collection tool (e.g., perf) to collect samples from the program’s execution. These samples are then converted to the format that the instrumented program would have emitted.
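To make the conversion step concrete, here is a deliberately simplified Python sketch that buckets sampled instruction addresses into basic blocks to obtain approximate block counts. It assumes the samples are plain addresses and that each block's address range is known; the function name and data layout are hypothetical, and a real converter would also have to map addresses back to source locations and normalize the counts.

```python
from bisect import bisect_right
from collections import Counter


def samples_to_block_counts(block_ranges, sample_addresses):
    """Count how many samples fall inside each basic block.

    block_ranges: non-overlapping (start, end, name) tuples, half-open [start, end).
    sample_addresses: iterable of sampled instruction addresses.
    """
    ranges = sorted(block_ranges)
    starts = [start for start, _end, _name in ranges]
    counts = Counter()
    for addr in sample_addresses:
        # Find the last block whose start address is <= addr.
        i = bisect_right(starts, addr) - 1
        if i < 0:
            continue  # sample lies before the first block
        _start, end, name = ranges[i]
        if addr < end:
            counts[name] += 1
        # Otherwise the sample falls in a gap between blocks and is ignored.
    return dict(counts)


blocks = [(0x10, 0x20, "entry"), (0x20, 0x30, "loop"), (0x40, 0x48, "exit")]
samples = [0x10, 0x24, 0x28, 0x2F, 0x30, 0x44, 0x05]
block_counts = samples_to_block_counts(blocks, samples)
```

Unlike instrumentation counts, these numbers are statistical estimates, so anything consuming them has to tolerate noise and missing blocks.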
In terms of optimizations, our (Google) experience is that inlining is the key beneficiary of profile information, particularly in big C++ applications. I expect to focus most of my attention on the inliner.
I have started looking at the state of PGO (Profile Guided Optimization) in LLVM. I want to discuss my high-level plan and make sure I'm not missing anything interesting. I would appreciate any feedback on this, as well as pointers to existing work, patches and anything else related to PGO in LLVM.
I will be keeping changes to this plan in this web document:
https://docs.google.com/document/d/1b2XFuOkR2K-Oao4u5fR3a9Ok83IB_W4EJWVmNak4GRE/pub
> It doesn't use MachineBlockFrequency?
It predates the block frequency interface. It just needs to be hooked up, patches welcome. It would also be nice to remove the floating point computations from the spill placement code.
> After the basic profile-based transformations are working, I would like to add new sources of profile. Mainly, I am thinking of implementing Auto FDO.
Apple folks are also gearing up to push on the PGO front. We are primarily interested in using instrumentation, rather than sampling, to collect profile info. However, I suspect the way profile data ends up being used in the various optimization and codegen passes will be largely similar.
There is also some interest in pursuing profile-directed specialization, but that can wait. I think it makes sense for us to get together and discuss our plans to make sure there won't be duplication of effort.
>> This computation can overflow.
>
> Yep, I went down the easy route and converted it to floating point arithmetic. Is that OK here?

Yes, that should be fine.