[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

vivek pandya via llvm-dev

May 28, 2016, 10:01:24 AM

to Hal Finkel, llvm-dev, Tim Amini Golling

Dear community,

This is a brief update on the progress of Interprocedural Register Allocation (IPRA). Those who are interested in seeing the progress in terms of code, please consider http://reviews.llvm.org/D20769

This patch contains simple infrastructure to propagate the register usage information of a callee to its callers in the call graph. The code generation order is changed to follow a bottom-up order on the call graph; thanks to Mehdi Amini for the patch for that! I will write a blog post on this very soon.
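As a rough illustration of what a bottom-up order on a call graph means (this is not the code from the patch), the sketch below computes a post-order over a toy call graph, so that every callee is emitted before its callers. The `CallGraph` alias and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical call graph: CG[i] lists the functions called by function i.
using CallGraph = std::vector<std::vector<std::size_t>>;

// Depth-first post-order walk: a function is appended only after all of its
// callees have been appended. Cycles (recursion) are cut at the first revisit.
static void visit(const CallGraph &CG, std::size_t F, std::vector<bool> &Seen,
                  std::vector<std::size_t> &Order) {
  if (Seen[F])
    return;
  Seen[F] = true;
  for (std::size_t Callee : CG[F])
    visit(CG, Callee, Seen, Order);
  Order.push_back(F);
}

// Returns the function indices with callees ahead of their callers.
std::vector<std::size_t> bottomUpOrder(const CallGraph &CG) {
  std::vector<bool> Seen(CG.size(), false);
  std::vector<std::size_t> Order;
  for (std::size_t F = 0; F < CG.size(); ++F)
    visit(CG, F, Seen, Order);
  return Order;
}
```

With this order, the register usage of a callee is already known when code is generated for its caller.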

As per the proposed schedule, this week was meant for studying the related infrastructure in LLVM and finalizing an approach for IPRA. Instead I was able to implement a working (though maybe not fully correct) prototype, because I used the community bonding period to discuss and learn the related material with the mentors, and because the patch for CodeGen reordering was provided by my mentor Mehdi Amini.

I summarize the work done during this week as follows:

Implementation :

============

The following passes have been implemented during this week: an immutable pass to store the computed RegMasks; a machine function pass that iterates through each register, checks whether it is used, and creates a RegMask based on those details; and a target-specific machine function pass that uses the RegMask created by the second pass and propagates the information by updating the RegMask of call instructions. To update the RegMask of an MI, a setRegMask() function has been added to MachineOperand. A command line option -enable-ipra and a debug type -debug-only="ipra" have been added to control the optimization through llc.
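To make the RegMask idea concrete, here is a minimal standalone sketch (it does not use the LLVM classes). It assumes the LLVM convention that a register mask holds one bit per physical register, packed into 32-bit words, where a set bit means the register is preserved across the call. The helper names `buildRegMask` and `clobbers` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a mask with one bit per register: registers that the function does
// not use are marked as preserved (bit set), used ones as clobbered (clear).
std::vector<std::uint32_t> buildRegMask(unsigned NumRegs,
                                        const std::vector<bool> &IsUsed) {
  assert(IsUsed.size() == NumRegs && "one flag per register expected");
  std::vector<std::uint32_t> Mask((NumRegs + 31) / 32, 0);
  for (unsigned Reg = 0; Reg < NumRegs; ++Reg)
    if (!IsUsed[Reg])
      Mask[Reg / 32] |= 1u << (Reg % 32);
  return Mask;
}

// A call carrying this mask clobbers Reg when the corresponding bit is clear.
bool clobbers(const std::vector<std::uint32_t> &Mask, unsigned Reg) {
  return !(Mask[Reg / 32] & (1u << (Reg % 32)));
}
```

A caller can then keep a value in any register for which `clobbers` returns false across the call.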

Testing:

=====

The implementation described above has been tested on the SNU Real-Time benchmark suite (http://www.cprover.org/goto-cc/examples/snu.html) and on some simple programs that use library functions (register allocation for a library function is not done by LLVM, so this optimization simply skips such calls).

Study and Other:

=============

I have learned the following things about LLVM: how it stores register clobbering information; how that information is used by the register allocators through LivePhysRegs, LiveRegMatrix and other related passes; how to schedule a pass using TargetPassConfig and TargetMachine; what callee-saved registers are; and what an immutable pass is. Apart from that, I have also learned how to use Phabricator to send a review request, and I have read some related literature.

The tough task during this week was to schedule the passes in the proper order so that the dependencies of related passes are satisfied.

Plan for next week:

1) Perform more testing and debug any known issues

2) Fine-tune the implementation so as to eliminate any unnecessary work

3) During testing I observed from the stats that IPRA does not always improve the work of the intra-procedural register allocators, and that the amount of benefit (in terms of spilled live ranges) is not deterministic. I would like to find the reasons for this behavior.

4) Start implementing the target-specific pass for other targets, if the review passes with no major bugs.

Please provide any feedback or suggestions, including on the format of this email.

I would also like to thank my mentors Mehdi Amini, Hal Finkel, Quentin Colombet, Matthias Braun and other community members for providing quick help every time I asked (I have got replies even after 8 PM (PDT)!).

Sincerely,

Vivek

vivek pandya via llvm-dev

Jun 4, 2016, 11:18:55 PM

to Hal Finkel, llvm-dev, Tim Amini Golling

Dear Community,

This week I got my patch reviewed by the mentors and made changes based on that. We also identified a problem with callee-saved registers being marked as clobbered registers, and we fixed that bug. I have described the other minor changes in the following section.

The patch was expected to be committed by the end of this week, but due to an unexpected mistake I was not able to complete writing the test cases. Sorry for that.

I had built LLVM with IPRA enabled by default, and those build files were on my path! Because of that, the next time I tried to build LLVM it was terribly slow (almost 1 hour for 10% of the build). I spent too much time on fixing this by playing around with environment variables, cmake options etc.

But I think this is a serious concern. We need to verify this time complexity, otherwise building large software with IPRA enabled would be very time consuming.

The toughest part of this week was getting lit and FileCheck to work as expected, especially when an analysis pass prints info on stdio and there is also an output file generated by the llc or opt command.

So here is brief summary :

Implementation:

============

RegUsageInfoCollector is now calling convention aware, so that the RegMask does not mark callee-saved registers as clobbered. Due to this, the register allocator can use callee-saved registers in the caller.
PhysicalRegisterUsageInfo.cpp renamed to RegisterUsageInfo.cpp.
StringMap used in RegisterUsageInfo.cpp is replaced by a DenseMap of GlobalVariable * to RegMask.
DummyCGSCCPass moved from TargetPassConfig.cpp to CallGraphSCCPass.h.
Minor corrections in comments, and changes to adhere to the LLVM coding standards.
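The calling-convention change in the first item can be pictured with a small standalone sketch. It is an illustration only, not the RegUsageInfoCollector code: registers are plain strings, and `clobberedAcrossCall` is a hypothetical helper. The point is that a register which the callee modifies but which is callee-saved gets restored before the callee returns, so the caller need not treat it as clobbered.

```cpp
#include <cassert>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Registers a caller must assume are overwritten by a call: everything the
// callee modifies, minus the callee-saved registers, which the callee saves
// in its prologue and restores in its epilogue.
RegSet clobberedAcrossCall(const RegSet &Modified, const RegSet &CalleeSaved) {
  RegSet Clobbered;
  for (const std::string &Reg : Modified)
    if (CalleeSaved.count(Reg) == 0)
      Clobbered.insert(Reg);
  return Clobbered;
}
```

For example, with modified registers {RAX, RBX, RCX} and callee-saved registers {RBX, RBP}, only RAX and RCX are reported as clobbered.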

Testing:

=====

The above mentioned changes have been tested with the SNU Real-Time benchmarks.

Studied the lit and FileCheck tools and wrote a simple test to verify the functionality of the code.

Study and other:

============

Studied some examples of lit-compatible LLVM IR with comments to RUN test cases, the FileCheck tool syntax, and how to use it within the lit infrastructure.

I now also understand the X86 calling convention in more detail.

I also studied basic concepts of the LLVM IR language while reading .ll files written for lit.

I learned about rvalue references and move semantics introduced in C++11.

Plan for next week:

1) Get the patch committed along with proper test cases.

2) Analyze the time complexity of the approach.

3) Move the target-specific pass to CodeGen, as it seems it is not required to be target specific.

4) If possible, build a large application with IPRA enabled and analyze the impact.

Sincerely,

Vivek

vivek pandya via llvm-dev

Jun 12, 2016, 12:49:53 AM

to Hal Finkel, llvm-dev, Tim Amini Golling

Dear Community,

The patch for Interprocedural Register Allocation has been committed now, thanks to Mehdi Amini for that. We would like you to play with it and let us know your views and, more importantly, ideas to improve it.

The test-suite run has indicated a non-trivial issue that results in run time failures of the programs; we will be investigating it further. Here are some stats:

test-suite has been tested with IPRA enabled and the overall results are not very encouraging. On average there is a 30% increase in compile time. Many programs also have an increase in execution time (20% on average), which is a really serious concern for us. About 60 tests have failed at run time, which indicates errors in compilation. However, 3 tests have an improvement in their run time, of 7% on average.

This week I think the good thing for me to learn was to set up the LLVM development environment properly, otherwise one can end up wasting too much time building LLVM itself.

So here is brief summary:

Implementation:

============

The patch has been split into analysis and transformation passes. The pass responsible for register usage propagation has been made target independent. A print method and a command line option -print-regusage have been added so that RegMask details can also be printed in Release builds; this enables the lit test cases to be run in Release builds too. Other minor changes to adhere to coding and naming conventions.

Testing:

======

test-suite has been tested with IPRA enabled.

Study and other:

=============

Learned about LNT, the LLVM test-suite, inline assembly in LLVM IR, fastcc, local functions, and the MCStreamer class. In C++ I learned about the emplace family of methods in the STL and perfect forwarding introduced in C++11.

Plan for next week:

1) Investigate the functional correctness issue that leads to run time failures

2) Profile the compilation process to verify the increase in time due to IPRA

3) Improve IPRA by instructing codegen to not save registers for local functions.

4) Make the pass emit asm comments to indicate the registers clobbered by a function call at the call site in the generated ASM file.

Sincerely,

Vivek

Quentin Colombet via llvm-dev

Jun 14, 2016, 8:47:03 PM

to vivek pandya, llvm-dev

Hi Vivek,

How much of the slow down on runtime comes from the different layout of the function in the asm file? (I.e., because of the dummy scc pass.)

Cheers,

Q

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

vivek pandya via llvm-dev

Jun 14, 2016, 11:10:35 PM

to Quentin Colombet, llvm-dev, Matthias Braun

On Wed, Jun 15, 2016 at 6:16 AM, Quentin Colombet <qcol...@apple.com> wrote:

Hi Vivek,

How much of the slow down on runtime comes from the different layout of the function in the asm file? (I.e., because of the dummy scc pass.)

Hello Quentin,

Please do not consider the previous results, as there was a major bug in the RegMask calculation: the RegMasks of callees in the MF body were not considered while calculating register usage information. That has been fixed now (as discussed with Matthias Braun and Mehdi Amini), and after this bug fix I have run test-suite with and without IPRA. Yes, there is a run time slowdown for some test cases, ranging from 1% to 64%; similarly, the compile time slowdown ranges from 1% to 48%. The run time performance improvement ranges from 1% to 35%, and surprisingly there is also a compile time improvement ranging from 1% to 60%. I would request you to go through the complete results at https://docs.google.com/document/d/1cavn-POrZdhw-rrdPXV8mSvyppvOWs2rxmLgaOnd6KE/edit?usp=sharing

Also, there are no extra failures due to IPRA now, so I have removed failures from the results above.

Sincerely,

Vivek

vivek pandya via llvm-dev

Jun 14, 2016, 11:15:50 PM

to Quentin Colombet, llvm-dev, Matthias Braun

In the above results the baseline is with IPRA and current is without IPRA. So the data with a red background is actually an improvement and green is a regression.

-Vivek

vivek pandya via llvm-dev

Jun 19, 2016, 7:29:37 AM

to Quentin Colombet, llvm-dev, Matthias Braun

Dear Community,

Please find the summary of work done during this week below:

Implementation:

============

During this week we identified a bug in IPRA caused by not considering the RegMasks of function calls in a given machine function. The same bug on AArch64 has been reported by Chad Rosier, and a more detailed description can be found at https://llvm.org/bugs/show_bug.cgi?id=28144 . To fix this bug, the RegMask calculation has been modified to consider the RegMask of each function call in a machine function. The patch is here: http://reviews.llvm.org/D21395.
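The nature of this fix can be shown with a small standalone model (an illustration under simplifying assumptions, not the code in D21395). Registers are plain numbers, calling conventions are ignored, and `FunctionInfo` and `computeClobbers` are hypothetical names. A function's clobber set is the set of registers its own body writes plus everything clobbered by the calls it makes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical per-function summary used only for this illustration.
struct FunctionInfo {
  std::set<unsigned> DirectlyModified; // registers written by the body itself
  std::vector<std::size_t> Callees;    // indices of directly called functions
};

// Functions must be listed bottom-up (callees before callers), so that a
// callee's summary is final by the time its caller is processed.
std::vector<std::set<unsigned>>
computeClobbers(const std::vector<FunctionInfo> &BottomUp) {
  std::vector<std::set<unsigned>> Clobbers(BottomUp.size());
  for (std::size_t I = 0; I < BottomUp.size(); ++I) {
    Clobbers[I] = BottomUp[I].DirectlyModified;
    // Registers clobbered by calls inside the body count as well.
    for (std::size_t Callee : BottomUp[I].Callees) {
      assert(Callee < I && "callee must precede caller in bottom-up order");
      Clobbers[I].insert(Clobbers[Callee].begin(), Clobbers[Callee].end());
    }
  }
  return Clobbers;
}
```

Without the inner loop, a function that writes no registers itself but calls a function that does would wrongly appear to preserve everything, which is the kind of miscompilation the bug report describes.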

AsmPrinter.cpp is modified to print call-preserved registers in comments at the call site in the generated assembly file. This was suggested by Quentin Colombet to improve the readability of asm files while experimenting with RegMasks, calling conventions etc. This simple patch can be found here: http://reviews.llvm.org/D21490.

We have also experimented with a simple improvement to IPRA, setting the callee-saved registers to none for local functions, and we have found some performance improvement.

Testing:

======

After the fix for bug 28144 there are no run time failures in the test suite. Due to bug 28144 there were about 60 run time failures, and the total time taken for test suite compilation was 30% more compared to without IPRA. After the bug fix, the total compile time improvement with IPRA compared to without IPRA is about 4 to 8 minutes.

Study:

=====

This week I studied the code responsible for adding spills and restores for callee-saved registers. I also studied how calling conventions are defined in target-specific .td files. I studied AsmPrinter.cpp, specifically the emitComments() method, which is responsible for adding comments in LLVM-generated assembly files. I also studied some linkage types in LLVM IR, such as ‘internal’, which represents a function local to the module.

Plan for next week:

1) Submit patch related to local function optimization for review

2) Find more possible improvements

3) Get active patches committed

4) Compile large software with IPRA enabled

Sincerely,

Vivek

Adve, Vikram Sadanand via llvm-dev

Jun 19, 2016, 1:26:55 PM

to llvm...@lists.llvm.org, llvm-dev...@lists.llvm.org

Hi Vivek,

I have one question (and I apologize if I missed this in your previous messages): Do you handle, or do you expect to handle, indirect function calls? If so, how exactly are you going about doing that?

For context, I’m interested because I’ve been working on a set of passes to do profile-based devirtualization and IPO, and I’m wondering if this pass could benefit from that. Thanks,

--Vikram

// Vikram S. Adve
// Professor, Department of Computer Science
// University of Illinois at Urbana-Champaign
// va...@illinois.edu
// http://llvm.org

> On Jun 19, 2016, at 7:46 AM, via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Date: Sun, 19 Jun 2016 16:59:27 +0530
> From: vivek pandya via llvm-dev <llvm...@lists.llvm.org>
> To: Quentin Colombet <qcol...@apple.com>
> Cc: llvm-dev <llvm...@lists.llvm.org>, Matthias Braun
> <ma...@braunis.de>
> Subject: Re: [llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural
> Register Allocation
> Message-ID:
> <CAHYgpoL+gZmmyTzujXhfcM6i...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"


vivek pandya via llvm-dev

Jun 19, 2016, 3:24:14 PM

to va...@illinois.edu, llvm-dev, Matthias Braun

Dear Professor,

Thanks for bringing this to my notice. I tried out a simple test case with indirect function calls:

int foo() {
  return 12;
}

int bar(int a) {
  return foo() + a;
}

int (*fp)() = 0;
int (*fp1)(int) = 0;

int main() {
  fp = foo;
  fp();
  fp1 = bar;
  fp1(15);
  return 0;
}

and currently IPRA skips the optimization for indirect calls. But I think this can be handled, and I will inform you if I update the implementation to cover it. Currently IPRA uses Function * to hold register usage information across the passes, so my hunch is that if the Function * can be derived from the call instruction of an indirect call, then it should be possible to handle indirect function calls to procedures defined in the current module.

Sincerely,

Vivek

vivek pandya via llvm-dev

Jun 20, 2016, 2:59:06 PM

to Hal Finkel, Tim Amini Golling, llvm-dev, Matthias Braun, Vikram Adve

On Mon, Jun 20, 2016 at 12:54 AM, vivek pandya <vivekv...@gmail.com> wrote:

Dear Professor,

Thanks for bringing this to my notice. I tried out a simple test case with indirect function calls:

int foo() {
  return 12;
}

int bar(int a) {
  return foo() + a;
}

int (*fp)() = 0;
int (*fp1)(int) = 0;

int main() {
  fp = foo;
  fp();
  fp1 = bar;
  fp1(15);
  return 0;
}

I have experimented with indirect calls, specifically those due to the use of function pointers as shown in the above example.

The following code in RegUsageInfoPropagate.cpp handles this kind of indirect call:

for (MachineBasicBlock &MBB : MF) {
  for (MachineInstr &MI : MBB) {
    if (!MI.isCall())
      continue;

    DEBUG(dbgs()
          << "Call Instruction Before Register Usage Info Propagation : \n");
    DEBUG(dbgs() << MI << "\n");

    // Replace the call's RegMask with the one recorded for callee F, if any.
    auto UpdateRegMask = [&](const Function *F) {
      const auto *RegMask = PRUI->getRegUsageInfo(F);
      if (!RegMask)
        return;
      setRegMask(MI, &(*RegMask)[0]);
      Changed = true;
    };

    MachineOperand &Operand = MI.getOperand(0);
    if (Operand.isGlobal())
      UpdateRegMask(cast<Function>(Operand.getGlobal()));
    else if (Operand.isSymbol())
      UpdateRegMask(M->getFunction(Operand.getSymbolName()));
    else if (Operand.isReg()) {
      // changes start here
      // Walk backwards in the same block to the instruction that defines
      // the register holding the call target.
      unsigned VReg = Operand.getReg();
      MachineBasicBlock::iterator CallInstIterator(&MI);
      MachineBasicBlock *Parent = MI.getParent();
      while (CallInstIterator != Parent->begin() &&
             !CallInstIterator->definesRegister(VReg))
        --CallInstIterator;
      DEBUG(dbgs() << "Candidate for indirect call \n");
      if (CallInstIterator != Parent->begin()) {
        // If the defining instruction names a function, use its RegMask.
        for (MachineOperand &MO : CallInstIterator->operands()) {
          if (MO.isGlobal()) {
            UpdateRegMask(cast<Function>(MO.getGlobal()));
            break;
          } else if (MO.isSymbol()) {
            UpdateRegMask(M->getFunction(MO.getSymbolName()));
            break;
          }
          DEBUG(dbgs() << *CallInstIterator);
        }
      }
    }

    DEBUG(dbgs()
          << "Call Instruction After Register Usage Info Propagation : \n");
    DEBUG(dbgs() << MI << "\n");
  }
}

So I would like to have the mentors' review and suggestions on this.

For the virtual function kind of case we have to think differently. Is this a valid approach to deal with indirect calls?

Please let me know your thoughts.

-Vivek

Sanjoy Das via llvm-dev

Jun 20, 2016, 3:52:59 PM

to vivek pandya, llvm-dev, Matthias Braun, Vikram Adve

Hi Vivek,

vivek pandya via llvm-dev wrote:
> int foo() {
> return 12;
> }
>
> int bar(int a) {
> return foo() + a;
> }
>
> int (*fp)() = 0;
> int (*fp1)(int) = 0;
>
> int main() {
> fp = foo;
> fp();
> fp1 = bar;
> fp1(15);
> return 0;
> }

IMO it is a waste of time trying to do a better job at the IPRA level on
IR like the above ^. LLVM should be folding the indirect calls to
direct calls at the IR level, and if it isn't that's a bug in the IR
level optimizer.

The interesting cases are when you have a call like:

fnptr target = object->callback;
target(foo, bar);

and the IR level optimizer has failed to optimize the indirect call to
`target` to a direct call.

-- Sanjoy

Matthias Braun via llvm-dev

Jun 20, 2016, 4:15:46 PM

to Sanjoy Das, llvm-dev, Vikram Adve, vivek pandya

> On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Hi Vivek,
>
> vivek pandya via llvm-dev wrote:
> > int foo() {
> > return 12;
> > }
> >
> > int bar(int a) {
> > return foo() + a;
> > }
> >
> > int (*fp)() = 0;
> > int (*fp1)(int) = 0;
> >
> > int main() {
> > fp = foo;
> > fp();
> > fp1 = bar;
> > fp1(15);
> > return 0;
> > }
>
> IMO it is waste of time trying to do a better job at the IPRA level on
> IR like the above ^. LLVM should be folding the indirect calls to
> direct calls at the IR level, and if it isn't that's a bug in the IR
> level optimizer.

+1 from me.

The interesting cases are the non-obvious ones (assuming foo/bar have the same parameters). Things get interesting once you have uncertainty in the mix. The minimally interesting case would look like this:

int main() {
  int (*fp)();
  if (rand()) {
    fp = foo;
  } else {
    fp = bar;
  }
  fp(42);
}

However predicting the possible targets of a call is IMO a question of computing a call graph datastructure and improving upon that. We should be sure that we discuss and implement this independently of the register allocation work!

- Matthias

vivek pandya via llvm-dev

Jun 20, 2016, 11:56:13 PM

to Matthias Braun, llvm-dev, Vikram Adve

On Tue, Jun 21, 2016 at 1:45 AM, Matthias Braun <ma...@braunis.de> wrote:

> On Jun 20, 2016, at 12:53 PM, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Hi Vivek,
>
> vivek pandya via llvm-dev wrote:
> > int foo() {
> > return 12;
> > }
> >
> > int bar(int a) {
> > return foo() + a;
> > }
> >
> > int (*fp)() = 0;
> > int (*fp1)(int) = 0;
> >
> > int main() {
> > fp = foo;
> > fp();
> > fp1 = bar;
> > fp1(15);
> > return 0;
> > }
>
> IMO it is waste of time trying to do a better job at the IPRA level on
> IR like the above ^. LLVM should be folding the indirect calls to
> direct calls at the IR level, and if it isn't that's a bug in the IR
> level optimizer.
+1 from me.

Yes, at the -O3 level simple indirect calls, including virtual functions, are getting optimized to direct calls.

The interesting cases are the non-obvious ones (assuming foo/bar have the same parameters). Things get interesting once you have uncertainty in the mix. The minimally interesting case would look like this:

int main() {
  int (*fp)();
  if (rand()) {
    fp = foo;
  } else {
    fp = bar;
  }
  fp(42);
}

I tried this case and my simple hack fails to optimize it :-) . This requires discussion on IRC.

Sincerely,

-Vivek

vivek pandya via llvm-dev

Jun 26, 2016, 7:48:40 AM

to llvm-dev, Hal Finkel, Tim Amini Golling

Hello LLVM Developers,

Please find below the summary of work done during this week.

Implementation:

============

During this week the patch for the fix of bug 28144 was updated after finding further refinements in the regmask calculation. As per the suggestion from Matthias Braun and Hal Finkel, the regmask calculation code is the same as MachineRegisterInfo::isPhysRegModified() except that no isNoReturnDef() check is required. So we proposed to add a bool argument SkipNoReturnDef, with default value false, to the isPhysRegModified method, so that we can reuse that code for the purpose of IPRA without breaking current uses of isPhysRegModified. The patch can be found here: http://reviews.llvm.org/D21395
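The shape of the proposed interface change can be sketched with a simplified standalone model. This is not the MachineRegisterInfo signature: the real method takes only the register (and the new flag) and also walks register aliases, which this model leaves out. `RegDef` is a hypothetical stand-in for a register definition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a register definition: which register is written
// and whether the write comes from a call to a function that never returns.
struct RegDef {
  unsigned Reg;
  bool FromNoReturnCall;
};

// By default, definitions made by no-return calls are ignored, as existing
// callers expect. With SkipNoReturnDef = true that check is skipped and every
// definition counts, which is what the IPRA regmask calculation wants.
bool isPhysRegModified(const std::vector<RegDef> &Defs, unsigned PhysReg,
                       bool SkipNoReturnDef = false) {
  for (const RegDef &D : Defs) {
    if (D.Reg != PhysReg)
      continue;
    if (!SkipNoReturnDef && D.FromNoReturnCall)
      continue;
    return true;
  }
  return false;
}
```

Because the new parameter defaults to false, existing call sites keep their behavior, and only the IPRA code needs to pass true.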

To improve code quality with IPRA, call sites of local functions are forced to have caller-saved registers only (more refined heuristics will be implemented). I have been experimenting with this on my local machine, and I discovered that tail call optimization is affected by this optimization: some test cases in test-suite fail with a segmentation fault or infinite recursion because a counter value gets overwritten. Please find more details and an example bug at https://groups.google.com/d/msg/llvm-dev/TSoYxeMMzxM/rb9e_M2iEwAJ

I have also tried a very simple method to handle indirect function calls in IPRA, but at higher optimization levels indirect function calls are converted to direct function calls, so I request interested community members to guide me. We can have a discussion about this on Monday morning (PDT). More discussion on this can be found here: https://groups.google.com/d/msg/llvm-dev/dPk3lKwH1kU/GNfhD_jKEQAJ

Testing:

======

During this week I think the IPRA optimization has become more stable after the bug fix, so I have run test-suite with it. Also, as per the suggestion from Quentin Colombet, I ran test-suite with only the codegen order changed to bottom up on the call graph. Overall this codegen order improves run time and compile time. I have shared the results here:

https://docs.google.com/document/d/1At3QqEWmeDEXnDVz-CGh2GDlYQR3VRz3ipIfcXoLC3c/edit?usp=sharing

https://docs.google.com/document/d/1hS-Cj3mEDqUCTKTYaJpoJpVOBk5E2wHK9XSGLowNPeM/edit?usp=sharing

Plan for next week:

==============

1) Rebase pending patches and get the review process completed.

2) Solve tail call related bug.

3) Discuss some ideas and heuristics for improving IPRA.

4) Discuss how to handle indirect function call with in IPRA.

5) More testing with llvm test-suite

Sincerely,

Vivek

vivek pandya via llvm-dev

Jul 3, 2016, 8:13:17 AM

to llvm-dev, Hal Finkel, Tim Amini Golling

Hello LLVM Developers.

This week much of my time was consumed in debugging IPRA's effect on higher-level optimizations, specifically due to not having callee-saved registers. It was hard, but I learned a lot and LLDB helped me a lot.

Here is summary for this week:

Implementation:

============

Implemented a very simple check to prevent the no-callee-saved-registers optimization from being applied to functions which are recursive or may be optimized as a tail call. A simple statistic has been added to count the number of functions optimized to not have callee-saved registers.
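A check of this kind could look roughly like the sketch below. It is a standalone illustration and not the committed code; `FunctionTraits`, its fields, and `canDropCalleeSavedRegs` are hypothetical, and in a real pass these facts would have to be derived from the IR and the call graph.

```cpp
#include <cassert>

// Hypothetical per-function facts for this illustration.
struct FunctionTraits {
  bool HasLocalLinkage;    // e.g. 'internal' linkage: all callers are visible
  bool IsRecursive;        // calls itself, directly or through a cycle
  bool MayInvolveTailCall; // a call may be turned into a tail call
};

// Conservative filter: treat a function as having no callee-saved registers
// only when it is local and neither recursion nor tail calls are involved.
bool canDropCalleeSavedRegs(const FunctionTraits &F) {
  return F.HasLocalLinkage && !F.IsRecursive && !F.MayInvolveTailCall;
}
```

Any function that fails the filter simply keeps the callee-saved registers of its calling convention, so the check can only cost performance, not correctness.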

Testing:

======

Debugged the test cases failing due to the no-callee-saved-registers optimization. More details with examples can be found here: https://groups.google.com/d/topic/llvm-dev/TSoYxeMMzxM/discussion . Now all tests in the LLVM test-suite pass.

Study:

=====

To find some ideas to improve the current IPRA I read 2 papers, namely “Minimizing Register Usage Penalty at Procedure Calls” by Fred C. Chow and “Register Allocation Across Procedure and Module Boundaries” by Santhanam and Odnert.

1) From the first paper I like the idea of shrink wrap analysis. LLVM currently has this optimization, but the approach is completely different. I have initiated a discussion on that, which can be found here: https://groups.google.com/d/topic/llvm-dev/_mZoGUQDMGo/discussion . I would like to talk to Quentin Colombet more about this.

2) From the second paper I like the idea of spill code motion. In this optimization the spill due to a callee-saved register is pushed to a less frequently called caller. But the approach described in that paper requires call frequency details, and it also defers register allocation to a very late stage; the optimization itself requires register usage details, but it operates on a register usage estimate made at an earlier stage. This optimization also requires help from the intra-procedural register allocators. I would like to have more discussion on this over IRC this Monday with my mentors.

Plan for next week:

==============

1) Rebase pending patches and get the review process completed.

2) Discuss how the identified ideas can be implemented within the current infrastructure.

3) Discuss how to handle indirect function call with in IPRA.

Sincerely,

Vivek

vivek pandya via llvm-dev

Jul 10, 2016, 12:42:29 AM

to llvm-dev, Hal Finkel, Tim Amini Golling

Hello LLVM Developers,

Please feel free to send any ideas you can think of to improve the current IPRA. I will work on them and, if possible, implement them.

Please consider the summary of work done during this week.

Implementation:

============

The review requests have been updated to reflect the reviews.

Testing:

=====

To get more benefit from IPRA I experimented with it under LTO, and the results were positive. For the SPASS application (one of the multi-source benchmarks in test-suite) execution time is reduced by 0.02s with LTO+IPRA compared to LTO only. The current IPRA works at compile time and its optimization scope is limited to a module. LTO produces one large module from the small modules and then generates machine code for it, so it helps IPRA by providing a huge module with very few external functions. To use IPRA with LTO one can pass the following arguments to clang: -flto -Wl,-mllvm,-enable-ipra . A more detailed discussion can be found here: https://groups.google.com/d/topic/llvm-dev/Vkd-NOytdcA/discussion

Study:

=====

I spent the majority of my time on finding new ideas to improve the current IPRA. As the current approach can only see information within the current module, it is very hard to improve it further. Most of the approaches described in the literature require help from a program analyzer and from intra-procedural register allocation, and register allocation for the whole module is deferred until IPRA is completed. These approaches also require heavy data flow analysis, so they are totally orthogonal to the current approach. But we were still able to identify two possible improvements, and many thanks to Peter Lawrence for his suggestions and questions.

The first improvement is to help IPRA by using __attribute__. This can be particularly useful when working with a library or external code which is written completely in assembly, where a user is able to provide accurate register usage information. So the idea is to supply regmask details for such external functions with __attribute__ in the function declaration and let IPRA propagate them to improve register allocation. I will be working on this next week.
The second improvement is to make a less frequently executed function save every register it clobbers, thus making it preserve all registers, and to propagate this information to more frequently executed callers to improve their register allocation. This leads us to PGO-driven IPRA. More details can be found here: https://groups.google.com/d/topic/llvm-dev/jhC7L50el8k/discussion
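The second idea can be modeled in a few lines. This is only a sketch under stated assumptions: the hot/cold decision is taken as given, the callee-saved registers of the calling convention are ignored for simplicity, and `ProfiledFunction` and both helpers are hypothetical names.

```cpp
#include <cassert>
#include <set>

struct ProfiledFunction {
  std::set<unsigned> Clobbered; // registers the body overwrites
  bool IsCold;                  // taken from profile data (assumed available)
};

// Registers the function saves in its prologue and restores in its epilogue
// under this scheme: a cold function saves everything it clobbers.
std::set<unsigned> savedInPrologue(const ProfiledFunction &F) {
  if (F.IsCold)
    return F.Clobbered;
  return std::set<unsigned>();
}

// What callers see as clobbered by a call to F: nothing for a cold function,
// because it restores every register it touched before returning.
std::set<unsigned> clobbersSeenByCallers(const ProfiledFunction &F) {
  if (F.IsCold)
    return std::set<unsigned>();
  return F.Clobbered;
}
```

The trade-off is that the cold function pays for extra saves and restores on each of its rare executions, while its hot callers get more registers that stay live across the call.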

Plan for Next Week:

===============

1) Start implementing above two improvements.

2) Run test-suite with the --benchmark option so that the improvement can be calculated more precisely.

Sincerely,

Vivek

vivek pandya via llvm-dev

Jul 17, 2016, 11:03:31 AM

to llvm-dev, Hal Finkel, Tim Amini Golling

Dear Community,

Sorry for being late with the weekly status.

Please find the summary for this week below:

Implementation

============

This week I have implemented support for __attribute__((regmask("clobbered list here"))). This is currently applicable to function declarations only, and it gives the user a chance to help IPRA by specifying the actual register usage of a function which is not defined in the current module. One such case is when functions written in pure assembly are used inside the current module: if this attribute is not present, IPRA will fall back to the calling convention, which limits the performance benefit from IPRA. Alternatively, in this particular case one can use the preserve_all or preserve_most attributes specified with clang to help IPRA, but I believe that in some cases the user may not be able to describe the register usage with such a CC, and then the "regmask" attribute can help.

For this support I needed to hack both clang and LLVM. However, it seems that the applicability of this kind of attribute is not limited to IPRA, so we have initiated a discussion on the mailing list: https://groups.google.com/d/topic/llvm-dev/w70_WljNCHE/discussion.

I have also implemented a patch which fixes a very subtle bug in the regmask calculation. Thanks to zan jyu Wong <zyf...@gmail.com> for bringing this to notice.

For example, if only CL is clobbered then CH should not be marked clobbered, but CX, ECX and RCX should be marked clobbered. Previously, for each modified register all of its aliases were marked clobbered by markRegClobbred() in RegUsageInfoCollector.cpp. That is wrong: when CL is clobbered, MRI::isPhysRegModified() returns true for CL, CX, ECX and RCX, which is the correct behavior, but then for CX, ECX and RCX we also marked CH as clobbered, because CH aliases CX, ECX and RCX. So markRegClobbred() is not required, because isPhysRegModified() already takes care of the aliasing registers properly. A very simple test case has been added to verify this change.
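The CL/CH case can be reproduced with a toy model of the RCX register family. The sketch below is standalone and does not use the LLVM register iterators; all helper names are hypothetical. `withSuperRegs` plays the role of the correct answer (the register plus everything that contains it), while `markAllAliases` mimics the extra step that over-approximated.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using RegSet = std::set<std::string>;

// Toy model of the x86 RCX family: each register maps to its immediate
// super-register. CL and CH are the two halves of CX.
static const std::map<std::string, std::string> &superOf() {
  static const std::map<std::string, std::string> Super = {
      {"CL", "CX"}, {"CH", "CX"}, {"CX", "ECX"}, {"ECX", "RCX"}};
  return Super;
}

// Reg plus every register that contains it.
RegSet withSuperRegs(const std::string &Reg) {
  RegSet Result = {Reg};
  for (auto It = superOf().find(Reg); It != superOf().end();
       It = superOf().find(It->second))
    Result.insert(It->second);
  return Result;
}

// Two registers alias when they overlap, i.e. one contains the other.
bool aliases(const std::string &A, const std::string &B) {
  return withSuperRegs(A).count(B) != 0 || withSuperRegs(B).count(A) != 0;
}

// The over-approximation: for each modified register, mark every register
// that aliases it.
RegSet markAllAliases(const RegSet &Modified) {
  static const RegSet All = {"CL", "CH", "CX", "ECX", "RCX"};
  RegSet Result;
  for (const std::string &M : Modified)
    for (const std::string &R : All)
      if (aliases(M, R))
        Result.insert(R);
  return Result;
}
```

Here `withSuperRegs("CL")` gives CL, CX, ECX and RCX, whereas feeding that set to `markAllAliases` also pulls in CH, because CH overlaps CX even though it does not overlap CL.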

Testing

=====

This week I ran test-suite with the --benchmark-only flag. On average a 4% improvement is noted in execution time; however, on average a 5% increase is noted in compile time.

Study

====

For PGO-driven IPRA I am able to access the profile summary info in an LLVM pass to decide if a function is hot or cold. But to implement it target independently I also need to determine if a function is hot or cold inside TargetFrameLoweringImpl, and my attempts at that failed. I will look for other ways to solve this problem.

Plan for next week

==============

1) Work to improve attribute regmask support as per suggestion on RFC.

2) Implement experimental PGO-driven IPRA

Sincerely,

Vivek

vivek pandya via llvm-dev

Jul 25, 2016, 12:42:51 PM

to llvm-dev, Hal Finkel, Tim Amini Golling

Dear Community,

Sorry for the late weekly status report; I was traveling back to my college.

Please consider this week's summary as follows:

Implementation:

============

This week I tried to get experimental PGO-driven IPRA to work. The idea is to save all registers in the prologue and restore them in the epilogue of cold functions, so that IPRA can propagate some free registers to the upper regions of the call graph. For this I changed the functions in the PrologEpilogInserter pass that spill callee-saved registers to take a ProfileSummaryInfo object as a parameter, so that ultimately I can access it in TargetFrameLoweringImpl and override the callee-saved register details of the default regmask. This also required changing the function signatures in the target-specific TargetFrameLowering code.

This is not sufficient when compiling with the -g flag or when the function is an error-handling function, because for such functions a mapping from LLVM registers to debug (DWARF) registers is required at object code emission time. This mapping is not defined for every register; for example, RCX is not defined.
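Setting the debug-info problem aside, the intended effect on callers can be pictured with a minimal sketch. It is a toy model under stated assumptions, not the LLVM implementation: ClobberMask, ToyNode and propagateClobbers() are hypothetical names, the machine has only eight registers (one bit each), and the hot/cold flag is simply given as input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per physical register in a toy 8-register machine.
using ClobberMask = std::uint8_t;

struct ToyNode {
    ClobberMask OwnWrites;    // registers written by the function body
    bool IsCold;              // hot/cold decision (hypothetical input)
    std::vector<int> Callees; // indices of callees, each smaller than
                              // this node's own index
};

// Functions are listed callees-first (bottom-up call graph order), so
// every callee's visible mask is ready when its caller is visited.
std::vector<ClobberMask> propagateClobbers(const std::vector<ToyNode> &Fns) {
    std::vector<ClobberMask> Visible(Fns.size(), 0);
    for (std::size_t I = 0; I < Fns.size(); ++I) {
        ClobberMask M = Fns[I].OwnWrites;
        for (int C : Fns[I].Callees)
            M |= Visible[C];
        // A cold function saves and restores every register it (or its
        // callees) may write, so its callers see no clobbers at all.
        Visible[I] = Fns[I].IsCold ? ClobberMask(0) : M;
    }
    return Visible;
}
```

If a leaf that writes registers 1 and 2 is cold, a caller that itself writes only register 0 ends up with a visible mask containing just register 0; if the same leaf is treated as hot, the caller's mask also includes registers 1 and 2.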

Testing:

======

This week I also benchmarked some selected test cases from the LLVM test suite. I compiled and ran each program 10 times and measured the average impact on compile time and execution time, as well as the code size reduction. Every compilation was done at the O1 level. The results are good, but it is also clear that IPRA may not be beneficial for every program; for such cases, however, the increase in run time is not significant. Please find the details at https://docs.google.com/spreadsheets/d/1d34UcGuUK36B3AY8HN8fvgZZnF4k5KPI1snfYqIIZxU/edit?usp=sharing

Other:

====

A small patch that fixes the regmask calculation so that aliased registers are handled properly has been committed. Thanks to Matthias Braun for committing it. The details can be found here: https://reviews.llvm.org/D22400

Sincerely,

Vivek

Mehdi Amini via llvm-dev

Jul 25, 2016, 2:26:10 PM

to vivek pandya, llvm-dev

Hi Vivek,

On Jul 25, 2016, at 9:42 AM, vivek pandya <vivekv...@gmail.com> wrote:

Dear Community,

Sorry for being late for weekly status report but I was traveling back to my college.

Please consider this week's summary as follows:

Implementation:
============
This week I tried to get experimental PGO driven IPRA work. The idea is to save all register in prolog and restore it in epilog for cold function so that IPRA can propagate some free register to upper region of call graph. For this I changed spill callee saved regs related functions in PrologEpilogInserter pass to pass ProfileSummaryInfo object as parameter so that ultimately I can access it at TargetFrameLowringImpl and override callee saved register details of default regmask. But this also required to change function signature to in target specific code of TargetFrameLowring.
This is not sufficient when running with -g flag or function is a error handler function because for such function at time of object code emission debug register to LLVM register mapping is required. This mapping is not done for each register. For example RCX is not defined.

It is not clear to me how the problem you mention relates to the PGO-driven part only, and not IPRA.

IPRA in general is already changing the calling-convention of functions, and I was expecting the “PGO driven” part to just change “how much” it is changing it (ideally gradually and not “all or nothing”).

--

Mehdi

vivek pandya via llvm-dev

Jul 31, 2016, 11:19:54 AM

to llvm-dev, Hal Finkel, Tim Amini Golling

Dear Community,

I hope you have gone through the results for IPRA. Please feel free to share your thoughts on the spreadsheet.

Please consider this week's work as follows:

Implementation:

============

This week I found the exact problems with the PGO-driven improvement to IPRA. I sought help from the community; the discussion can be found here https://groups.google.com/d/msg/llvm-dev/jhC7L50el8k/-_rCqdn5BgAJ . The previous approach requires changes to the mapping from LLVM registers to DWARF registers. In this discussion we concluded that what I want to experiment with is similar to setting the “preserve_all” CC on cold functions. Thanks to Mehdi Amini for this idea!

So now I am doing a similar optimization with the help of the “preserve_all” CC. For this purpose I have changed RegisterUsageInfoPropagate.cpp; this pass runs before code generation, so changing the CC there affects the generated code. However, I have found that this optimization cannot be applied to all cold functions. For example, functions that modify data through a pointer passed as a parameter should not save and restore the registers that hold parameter data. I am working on this bug. Once it is solved, I will measure the performance impact on several programs, and based on that the community can decide on this change.
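The selection step can be pictured with a minimal sketch. It is a toy model, not the code in RegisterUsageInfoPropagate.cpp: ToyCC, ToyFunction, markColdPreserveAll() and the entry-count threshold are all hypothetical, and the exception for functions that modify data through pointer parameters is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins; none of these types are LLVM's.
enum class ToyCC { Default, PreserveAll };

struct ToyFunction {
    std::string Name;
    std::uint64_t EntryCount; // profile entry count (hypothetical input)
    ToyCC CC;
};

// Give every function whose entry count is below ColdThreshold the
// PreserveAll convention. Returns the number of functions changed.
unsigned markColdPreserveAll(std::vector<ToyFunction> &Fns,
                             std::uint64_t ColdThreshold) {
    unsigned Changed = 0;
    for (ToyFunction &F : Fns) {
        if (F.EntryCount < ColdThreshold && F.CC != ToyCC::PreserveAll) {
            F.CC = ToyCC::PreserveAll;
            ++Changed;
        }
    }
    return Changed;
}
```

For a module with a function entered 100000 times and an error reporter entered twice, a threshold of 10 switches only the error reporter to the toy PreserveAll convention.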

Study:

=====

To understand what was going wrong, I read about Call Frame Information and how it is used for exception handling. I also learned about the DWARF register mapping for the i386 and x86_64 architectures and how LLVM generates this mapping with the help of tablegen. In addition, I learned how the MC framework is used to emit registers into assembly or object files.