[llvm-dev] IPRA, interprocedural register allocation, question


Lawrence, Peter via llvm-dev

Jul 6, 2016, 12:50:59 PM
to llvm-dev

Vivek,

          I have an application where many of the leaf functions are
hand-coded assembly language, because they use special IO instructions
that only the assembler knows about. These functions typically don’t
use any registers besides the incoming argument registers, i.e. they don’t
need to use any additional callee-save or caller-save registers.

Is there any way in your IPRA (interprocedural register allocation) project that
the user can supply this information for external functions?

Perhaps using some form of __attribute__?

Maybe __attribute__ ((registermask = ….))?

--Peter Lawrence.

vivek pandya via llvm-dev

Jul 6, 2016, 5:09:25 PM
to llvm-dev, llvm-dev...@lists.llvm.org, Peter Lawrence
Hello Peter,

Thanks for pointing out this interesting case.
Vivek,
          I have an application where many of the leaf functions are
hand-coded assembly language, because they use special IO instructions
that only the assembler knows about. These functions typically don't
use any registers besides the incoming argument registers, i.e. they don't
need to use any additional callee-save or caller-save registers.

If the inline asm statement specifies its clobber list correctly, IPRA is able to use that information and propagate the correct register mask (which also means that omitting the clobber list while IPRA is enabled may break the executable).
For example, in the following code:
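int gcd( int a, int b ) {
    int result ;
    /* Compute Greatest Common Divisor using Euclid's Algorithm (x86-64 only).
       idivl implicitly divides EDX:EAX, so EAX and EDX must be clobbered too;
       numeric local labels (1:, 2:) stay unique if gcd is inlined. */
    __asm__ __volatile__ ( "movl %1, %%r15d;"      /* r15d = a              */
                          "movl %2, %%ecx;"        /* ecx  = b              */
                          "1: cmpl $0, %%ecx;"     /* loop while b != 0     */
                          "je 2f;"
                          "movl %%r15d, %%eax;"
                          "cltd;"                  /* edx:eax = sign-extended a */
                          "idivl %%ecx;"           /* edx = a % b           */
                          "movl %%edx, %%r13d;"
                          "movl %%ecx, %%r15d;"    /* a = b                 */
                          "movl %%r13d, %%ecx;"    /* b = old a % b         */
                          "jmp 1b;"
                          "2: movl %%r15d, %0;"    /* result = a            */
                          : "=g" (result) : "g" (a), "g" (b)
                          : "eax", "ecx", "edx", "r13", "r15", "cc"
    );

    return result ;
}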
IPRA calculates and propagates the correct regmask: it marks ECX and its aliases (CH, CL, ...) as clobbered, but does not mark R13 and R15 as clobbered, because they are callee-saved and the LLVM code generator inserts spill/restore code for them.
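To make the rule above concrete, here is a small sketch of how the clobber set seen by callers can be derived from the registers the asm writes: listed registers count, except callee-saved ones such as R13/R15, which the prologue/epilogue spill and restore. This is a hypothetical model, not LLVM's data structures; the register numbering and the `call_clobbers` name are assumptions, and sub-registers such as ECX/CL/CH are folded into RCX.

```c
#include <stdint.h>

/* Hypothetical register model (not LLVM's numbering): one bit per x86-64
 * general-purpose register. Sub-registers such as ECX, CX, CL and CH are
 * folded into their full register (RCX). RBP and RSP are left out. */
enum Reg { RAX, RCX, RDX, RBX, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15 };

typedef uint32_t RegSet;               /* bit r set => register r is in the set */
#define BIT(r) ((RegSet)1u << (r))

/* Callee-saved registers of the SysV x86-64 ABI within this model. */
static const RegSet kCalleeSaved = BIT(RBX) | BIT(R12) | BIT(R13) | BIT(R14) | BIT(R15);

/* Registers callers must treat as clobbered, given the registers the callee
 * writes (e.g. its inline-asm clobber list). Callee-saved registers drop out
 * because the callee's prologue/epilogue spills and restores them. */
RegSet call_clobbers(RegSet written) {
    return written & ~kCalleeSaved;
}
```

With the original clobber list (ecx, r13, r15), `call_clobbers(BIT(RCX) | BIT(R13) | BIT(R15))` leaves only RCX, which matches the regmask described above.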

Is there any way in your IPRA interprocedural register allocation project that
the user can supply this information for external functions?
By “external” do you mean a function defined in a module other than the one where it is used? In that case, since IPRA can only operate on one module at a time, register usage propagation is not possible. But there is a workaround for this problem: you can use IPRA with link time optimization enabled. Because of the way LLVM LTO works, it creates one big IR module out of the source files and then optimizes and codegens it, so in that case IPRA can have the actual register usage info (as long as the function is compiled in the current module).

In case you want to experiment with IPRA, please apply this patch before you begin: http://reviews.llvm.org/D21395

-Vivek
 

Lawrence, Peter via llvm-dev

Jul 8, 2016, 12:18:12 AM
to vivek pandya, llvm-dev, llvm-dev...@lists.llvm.org

Vivek,

             I am looking into these function attributes in the clang docs:

                preserve_most

                preserve_all

They are not available in the 3.6.2 release I am currently using, but I hope they exist in 3.8.
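For reference, a minimal sketch of how such a declaration might look, assuming a clang version that supports the `preserve_most` attribute (it is documented in clang's attribute reference); the `__has_attribute` guard falls back to the default convention on compilers without it. `io_read_port` and `sum_ports` are hypothetical names, and the C body of `io_read_port` is only a stand-in: in the scenario above it would be hand-written assembly, which must then really preserve every register the convention promises.

```c
/* Use clang's preserve_most calling convention when available; otherwise
 * expand to nothing so the code still builds with the default convention. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_most)
#    define PRESERVE_MOST __attribute__((preserve_most))
#  endif
#endif
#ifndef PRESERVE_MOST
#  define PRESERVE_MOST  /* attribute unavailable: default convention */
#endif

/* Hypothetical leaf routine. Placeholder C body; the real one would be a
 * hand-coded assembly function honoring preserve_most. */
PRESERVE_MOST int io_read_port(int port) { return port * 2; }

/* Caller: with preserve_most, `total` and `i` can stay in registers that
 * the default convention would treat as clobbered by the call. */
int sum_ports(int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += io_read_port(i);
    return total;
}
```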

 

These should provide enough info to solve my problem:
at the MC level, calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info.

 

This CALL instruction register USE and DEF info should already be useful
to the intra-procedural register allocator (allowing values live across these
calls to be in what are otherwise caller-save registers);
at least that’s how I read the MC dumps: every call instruction seems to have
every caller-save register flagged as “imp-def”, i.e. implicitly defined by the instruction,
and hopefully what is considered a caller-save register at a call site is defined by the callee.
And this should be the information that IPRA takes advantage of in its bottom-up analysis.

 

 

Which leads me to this question: when compiling an entire whole program at one time,
so there is no linking and no LTO, will there ever be IPRA that works within LLC for this scenario,
and is this an objective of your project, or are you focusing only on LTO?

 

 

I know this is not the typical “linux” scenario (dynamic linking of not only standard libraries,
but also sometimes even application libraries, and lots of static linking because of program
size), but it is a typical “embedded” scenario, which is where I am currently.

 

 

Other thoughts or comments?

 

 

--Peter Lawrence.

vivek pandya via llvm-dev

Jul 8, 2016, 4:12:55 AM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org
On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Vivek,

             I am looking into these function attributes in the clang docs:

                preserve_most

                preserve_all

They are not available in the 3.6.2 release I am currently using, but I hope they exist in 3.8.

These should provide enough info to solve my problem:
at the MC level, calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info.

 

Yes, I believe that preserve_most or preserve_all should help you even without IPRA. But note that IPRA can help further. For example, on X86 the preserve_most CC does not preserve R11 (this can be verified in X86CallingConv.td and X86RegisterInfo.cpp); however, IPRA calculates the regmask from actual register usage, so if a procedure with the preserve_most CC does not use R11, and none of the calls inside its body clobber R11, IPRA will mark R11 as preserved. In other words, IPRA produces a RegMask that is a superset of the RegMask implied by the calling convention alone.
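The R11 point can be sketched as a small mask computation. This is a hypothetical model (16 registers, one bit each, a set bit meaning "preserved across the call", which is how LLVM call regmasks are read); `ipra_preserved` and its parameters are illustrative names, not LLVM APIs. It shows why the IPRA-derived mask can only add preserved registers to what the calling convention already guarantees.

```c
#include <stdint.h>

/* Hypothetical 16-register model: bit r set in a RegMask means register r
 * is PRESERVED across a call. */
typedef uint16_t RegMask;
#define BIT(r) ((RegMask)(1u << (r)))
enum { R11 = 11 };

/* Preserved set IPRA could attach to call sites of a callee:
 *   cc_preserved    - registers the callee's calling convention preserves
 *   used_by_body    - registers the callee's own code writes
 *   callees_clobber - registers clobbered by calls made inside the callee
 * A register that is neither written nor clobbered by a nested call is
 * preserved as well, so the result is a superset of cc_preserved. */
RegMask ipra_preserved(RegMask cc_preserved, RegMask used_by_body,
                       RegMask callees_clobber) {
    RegMask untouched = (RegMask)~(used_by_body | callees_clobber);
    return (RegMask)(cc_preserved | untouched);
}
```

For a preserve_most-like mask that leaves R11 unpreserved, a callee that never touches R11 (directly or through its callees) still gets R11 marked as preserved.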

This CALL instruction register USE and DEF info should already be useful
to the intra-procedural register allocator (allowing values live across these
calls to be in what are otherwise caller-save registers);
at least that’s how I read the MC dumps: every call instruction seems to have
every caller-save register flagged as “imp-def”, i.e. implicitly defined by the instruction,
and hopefully what is considered a caller-save register at a call site is defined by the callee.
And this should be the information that IPRA takes advantage of in its bottom-up analysis.

 

Yes, that is the expected help from IPRA.

 

Which leads me to this question: when compiling an entire whole program at one time,
so there is no linking and no LTO, will there ever be IPRA that works within LLC for this scenario,
and is this an objective of your project, or are you focusing only on LTO?

The current IPRA infrastructure works at compile time, so its scope of optimization is restricted to a compilation unit. IPRA can only construct correct register usage information if the procedure's code is generated by the same compiler instance, which means we can't optimize calls to library functions or to procedures defined in other modules. This is because we can't keep register usage information across two different compiler instances.

Now if we consider LTO, it eliminates the above limitation by building one large IR module from the smaller modules before generating code, so we can have register usage information (at least) for procedures that were previously defined in other modules, because with LTO everything is in one module. That also clarifies that IPRA does not do anything at link time.

Now coming to LLC: it can use IPRA and optimize calls to functions defined in the current module. So yes, while compiling a whole program (a single huge .bc file), IPRA can be used with LLC. Also note that if the software is written as separate files per module (which is very common) and you still want to maximize the benefits of IPRA, you can use the llvm-link tool to combine several .bc files into one huge .bc file and use that with LLC to get the maximum benefit.

 

I know this is not the typical “linux” scenario (dynamic linking of not only standard libraries,
but also sometimes even application libraries, and lots of static linking because of program
size), but it is a typical “embedded” scenario, which is where I am currently.

 

I don't fully understand this use case, but we can make further improvements to IPRA. For example, if you have several libraries that have already been compiled and codegen'd, and you are able to provide register usage information for the functions in those libraries, then we can think about an approach where we store register usage information in a file (which will obviously increase compile time) and use that information across different compiler instances, so that we can provide register usage information without having the actual code while compiling.

 

Other thoughts or comments?

 

I am looking for ideas that can improve the current IPRA, so if you think of anything relevant, please let me know; we can discuss and implement the feasible ideas.

Thanks,
Vivek  

vivek pandya via llvm-dev

Jul 8, 2016, 1:24:24 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org
I believe that __attribute__ ((registermask = ....)) can provide more flexibility than the preserve_all or preserve_most CC in some cases, so I believe we should try it out.

-Vivek

Mehdi Amini via llvm-dev

Jul 8, 2016, 1:58:02 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
On Jul 7, 2016, at 9:17 PM, Lawrence, Peter via llvm-dev <llvm...@lists.llvm.org> wrote:

Vivek,
             I am looking into these function attributes in the clang docs:
                preserve_most
                preserve_all
They are not available in the 3.6.2 release I am currently using, but I hope they exist in 3.8.

These should provide enough info to solve my problem:
at the MC level, calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info.

This CALL instruction register USE and DEF info should already be useful
to the intra-procedural register allocator (allowing values live across these
calls to be in what are otherwise caller-save registers);
at least that’s how I read the MC dumps: every call instruction seems to have
every caller-save register flagged as “imp-def”, i.e. implicitly defined by the instruction,
and hopefully what is considered a caller-save register at a call site is defined by the callee.
And this should be the information that IPRA takes advantage of in its bottom-up analysis.

The idea of IPRA is to *produce* a more accurate list of the registers clobbered by a function, so that at the call site the caller only needs to save/restore the minimum number of registers across the call.
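As a rough illustration of that payoff: the registers a caller has to save around a call are the ones that are both live across the call and clobbered by the callee, so a tighter clobber set from IPRA shrinks that intersection. The bit-per-register model and the function names below are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical model: bit r set = register r is in the set. */
typedef uint32_t RegSet;

/* Registers the caller must save/restore (or keep values out of) around one
 * call: values live across the call that sit in callee-clobbered registers. */
RegSet regs_to_save_around_call(RegSet live_across, RegSet callee_clobbers) {
    return live_across & callee_clobbers;
}

/* Number of registers in a set (clears the lowest set bit each step). */
int count_regs(RegSet s) {
    int n = 0;
    for (; s != 0; s &= s - 1)
        ++n;
    return n;
}
```

For example, with values live in registers 3 to 7, a callee assumed to clobber registers 4 to 11 forces four saves, while an IPRA-derived clobber set of only registers 4 and 5 needs two.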

  
Which leads me to this question: when compiling an entire whole program at one time,
so there is no linking and no LTO, will there ever be IPRA that works within LLC for this scenario,
and is this an objective of your project, or are you focusing only on LTO?


LTO is just a way of exposing more to the analysis: IPRA can only “optimize” calls to functions that are codegen’d during the same compilation. With LTO, since you codegen the full program at once, you can basically optimize “every” call.

So IPRA works well without LTO, but it will only be able to operate on calls to functions that are part of the current compilation.

 
I know this is not the typical “linux” scenario (dynamic linking of not only standard libraries,
but also sometimes even application libraries, and lots of static linking because of program
size), but it is a typical “embedded” scenario, which is where I am currently.
 
 
Other thoughts or comments?

Any reason *not* to use LTO in your case?

— 
Mehdi

Lawrence, Peter via llvm-dev

Jul 8, 2016, 10:45:50 PM
to vivek pandya, llvm-dev, llvm-dev...@lists.llvm.org

Vivek,

           IIUC, it seems that we need two pieces of information to do IPRA:

1. what registers the callee clobbers

2. what the callee does to the call graph

And it is #2 that we are missing when we declare an external function,
even when we declare it with a preserve_most/preserve_all or a regmask attribute.

So what I / we need is another attribute that says this is a leaf function;
at least in my case, all I’m really concerned with are leaf functions.

Thoughts?

vivek pandya via llvm-dev

Jul 9, 2016, 12:26:15 AM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org
On Sat, Jul 9, 2016 at 8:15 AM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Vivek,

           IIUC, it seems that we need two pieces of information to do IPRA:

1. what registers the callee clobbers

2. what the callee does to the call graph

Yes, I think this is enough, but in your case we don't require #2.

And it is #2 that we are missing when we declare an external function,
even when we declare it with a preserve_most/preserve_all or a regmask attribute.

 

That is because, once the attribute's effect is visible at the IR/MI level, we can just parse it, populate the register usage information vector for the declared function, and then propagate the regmask to each call site we encounter.
But I am not sure whether it will be easy to get a new attribute working, or whether we may need to hack clang for that too.

I would also like to have thoughts from my mentors (Mehdi Amini and Hal Finkel) about this.

So what I / we need is another attribute that says this is a leaf function;
at least in my case, all I’m really concerned with are leaf functions.

 

I am starting with a simple function declaration that has a custom attribute.

-Vivek

vivek pandya via llvm-dev

Jul 11, 2016, 2:27:39 PM
to Lawrence, Peter, Tim Amini Golling, Hal Finkel, llvm-dev, cfe-dev@lists.llvm.org Developers, llvm-dev...@lists.llvm.org
Dear Peter, Hal and Mehdi, 

I did some hacking in clang so that I can attach a string attribute to a function declaration.
So I think that, instead of adding a new regmask attribute, it would be better to use the existing annotate attribute. For example, we can use it as follows:

extern void foo() __attribute__((annotate("REGMASK:R11,R8")));  // here R11, R8 are clobbered regs

This will add the string REGMASK:R11,R8 to the llvm.metadata section, and it will be tied to function foo via llvm.global.annotations. (This currently works only with function definitions; work is needed to make it work with function declarations.) The llvm.metadata should be accessed at the IR level, and then it can be parsed to create a regmask.

The parsing will need access to the Module object, and I hope that, when parsing, reconnecting each function's global annotation with its string value will be simple.
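The string-parsing step could look roughly like the sketch below. It assumes the `REGMASK:R11,R8` format proposed above and a made-up register table (`R0`..`R15`); `parse_regmask_annotation` is a hypothetical helper, and a real pass would resolve register names through the target's register info instead. Rejecting unknown names here is one place where a mistake in the annotation could be diagnosed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for annotation strings like "REGMASK:R11,R8".
 * Only names R0..R15 are accepted in this sketch. On success, bit n of
 * *clobbered is set for each listed register Rn and 0 is returned;
 * -1 is returned for a missing prefix or an unknown/malformed name. */
int parse_regmask_annotation(const char *s, uint32_t *clobbered) {
    static const char prefix[] = "REGMASK:";
    const size_t plen = sizeof prefix - 1;
    if (strncmp(s, prefix, plen) != 0)
        return -1;

    uint32_t mask = 0;
    const char *p = s + plen;
    while (*p != '\0') {
        if (*p != 'R')
            return -1;                     /* expected a name like "R8" */
        char *end;
        long n = strtol(p + 1, &end, 10);
        if (end == p + 1 || n < 0 || n > 15)
            return -1;                     /* e.g. "R45" is rejected */
        mask |= (uint32_t)1u << n;
        p = end;
        if (*p == ',')
            ++p;                           /* move on to the next name */
        else if (*p != '\0')
            return -1;                     /* junk after a register name */
    }
    *clobbered = mask;
    return 0;
}
```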

Another approach would be to add a new regmask attribute; during codegen to IR, this attribute would be lowered to a corresponding string attribute in LLVM IR (which would also need to be added), and then a pass would iterate through all functions that have such an attribute and populate the register usage container.

But any ideas to simplify this are welcome. Please share your views.

I have CCed the clang mailing list so that clang developers can correct me if I have made any mistakes in the context of clang.

Sincerely,
Vivek

Lawrence, Peter via llvm-dev

Jul 11, 2016, 9:45:49 PM
to vivek pandya, llvm-dev, llvm-dev...@lists.llvm.org

Vivek,

          Here’s the way I see it; let me know if you agree or disagree:

You cannot optimize a function’s calling convention (register usage) unless
you can see and change every caller, and you only know this for non-static functions
if you know that all calls to external functions cannot call back into the current
compilation unit.

 

#1 gives you the info necessary to change the call site to the external function.

 

So you don’t need #2 to do RA around the call site to the external function; instead,
you need #2 before you can change any non-static function’s calling convention
within the current compilation unit, assuming you have this information for all
external functions.

 

To be more concrete, let foo() be a non-static function in the current compilation
unit. Any calls to foo() from external functions will have to use the “default”
calling convention, so foo’s calling convention cannot be changed. We have to
know that none of the external functions can call back into the compilation unit
(they are “leaf” functions relative to the compilation unit) before we can change
foo()’s calling convention.
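A small C illustration of this linkage argument, with hypothetical names: `scale` has internal linkage and its address is never taken, so every call site is visible in this unit and a compiler could give it a custom register convention, while the externally visible `foo` may be called from code the compiler never sees, which will use the default calling convention.

```c
/* Internal linkage and address never taken: all callers are in this unit,
 * so a compiler that sees them all may change its register convention. */
static int scale(int x) { return 3 * x; }

/* Externally visible: may be called from other units or from hand-written
 * assembly using the default convention, so its convention must stay
 * compatible with the default one. */
int foo(int x) {
    return scale(x) + 1;
}
```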

 

 

Also, the issue of escaping pointers to functions is made clear by the example
of the atexit() and exit() library functions, i.e. even static functions can end up
being called by external functions. So exit() can never be declared “leaf”, and
to get the benefit of IPRA it needs to be within the compilation unit, either
by whole-program compilation or by LTO, if it is used.
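The atexit() case in code: `flush_log` below is `static`, yet once its address is passed to the standard `atexit()` it is called by the C library during `exit()`, from outside anything the compiler sees in this unit, through a function pointer and with the default calling convention. The function names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Internal linkage, but its address escapes below. */
static void flush_log(void) {
    fputs("log flushed\n", stdout);
}

/* Registers flush_log with the C library; exit() (or returning from main)
 * will later call it. atexit() returns 0 on success. */
int install_exit_handler(void) {
    return atexit(flush_log);
}
```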

 

 

--Peter.

Lawrence, Peter via llvm-dev

Jul 11, 2016, 9:51:28 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

           The external functions I need to call are all hand-written assembly language.
How would/could LTO handle that?

 

--Peter Lawrence.

 

 


Mehdi Amini via llvm-dev

Jul 11, 2016, 9:53:58 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
On Jul 11, 2016, at 6:51 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Mehdi,
           The external functions I need to call are all hand-written assembly language,
How would/could LTO handle that ?

I was thinking of inline asm functions, not pure .s files.

— 
Mehdi

Mehdi Amini via llvm-dev

Jul 11, 2016, 10:06:02 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
On Jul 11, 2016, at 6:45 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Vivek,
          Here’s the way I see it; let me know if you agree or disagree:

You cannot optimize a function’s calling convention (register usage) unless
you can see and change every caller,

That’s true only if you want to “downgrade” the guarantees, i.e. if you want to reduce the set of callee-saved registers.
You can freely provide more information to limit the number of caller-saved registers at a subset of call sites, which in practice changes the “local” calling convention while keeping it compatible with the public one.
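The distinction can be written as a one-line check in a hypothetical bit-per-register model where a set bit means "preserved across the call": a local convention stays compatible with the public one exactly when it preserves at least everything the public one does. `compatible_with_public` is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit r set = register r preserved across a call. */
typedef uint32_t RegMask;

/* An "upgrade" (superset of preserved registers) can be used at some call
 * sites while other callers keep assuming the public convention. */
bool compatible_with_public(RegMask local_preserved, RegMask public_preserved) {
    return (local_preserved & public_preserved) == public_preserved;
}
```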


and you only know this for non-static functions
if you know that all calls to external functions cannot call back into the current
compilation unit.

I’m not sure why you are considering calls to external functions and call-backs. If you don’t see main() (the common case), no call to an external function is needed for an externally visible function in the current module to possibly be called.

 
#1 gives you the info necessary to change the call site to the external function.

So you don’t need #2 to do RA around the call site to the external function; instead,
you need #2 before you can change any non-static function’s calling convention
within the current compilation unit, assuming you have this information for all
external functions.

If I understand the case you have in mind, it is only when you see the main() function in the current module and you’re basically trying to prove that an externally visible function cannot be called from outside the module?

It seems to me that this is a bit orthogonal to IPRA: multiple optimizations (IPRA included) work best when functions are deduced local, non-recursive, are not tail called (for IPRA in particular), and don’t have their address taken. 
The “infer-func-attr” and “globalopt” passes try to do their best to make this happen, especially during LTO.

The attribute case that Vivek is adding seems more murky though.

— 
Mehdi

Lawrence, Peter via llvm-dev

Jul 11, 2016, 10:48:53 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

            I’m compiling embedded applications which are small enough to do
whole-program compilation. There’s no advantage in breaking them up into
separate compilation pieces and linking them, even though in source form
they are composed of a couple of separate source files.

 

So for me the compilation unit is always the entire program (and includes main()),
except for some hand-coded assembly language support functions that are “external”
to the compilation unit and in my case never call back into the compilation unit,
i.e. they are always “leaf” functions from the point of view of the compilation unit’s call graph.

Hence I would like a clang function attribute that says this function is “leaf”,
so that IPRA can know that none of the functions it is compiling is ever called
from outside this compilation unit.

 

And I apologize to everyone for confusingly using the term “compilation unit”
when I meant “whole program”.

 

 

Yes, I am aware of the fact that if you change a function’s calling convention
by converting some scratch regs into save regs (for example because they aren’t even touched),
then you are safe to call it from either the default calling convention or the
optimized calling convention. This is the safe thing to do, and is why I will
only use the “preserve_most” and “preserve_all” optimized calling conventions,
as those will have been implemented by a back-end writer who is aware of
all these complications (as opposed to the “registermask=” calling convention,
which is much less safe).

 

I do, however, feel that IPRA in the whole-program case should not be restricted to
only scratch-becoming-save changes. I don’t have any data to support the notion,
but it begs to be investigated, unless someone can somehow prove that it can’t help
performance.

 

 

--Peter.

Mehdi Amini via llvm-dev

Jul 11, 2016, 11:41:23 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya



On Jul 11, 2016, at 7:48 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Mehdi,

            I’m compiling embedded applications which are small enough to do
whole-program compilation. There’s no advantage in breaking them up into
separate compilation pieces and linking them, even though in source form
they are composed of a couple of separate source files.


Ok, so LTO case basically.


 

So for me the compilation unit is always the entire program (and includes main()),
except for some hand-coded assembly language support functions that are “external”
to the compilation unit and in my case never call back into the compilation unit,
i.e. they are always “leaf” functions from the point of view of the compilation unit’s call graph.

Hence I would like a clang function attribute that says this function is “leaf”,
so that IPRA can know that none of the functions it is compiling is ever called
from outside this compilation unit.


I believe the usual (and, from the compiler’s point of view, best) way to address your particular scenario is to have a proper export list and use LTO.
For instance, if you never call into the program from one of your hand-coded assembly routines, LTO should be able to turn every global function/variable into a local one.



 

And I apologize to everyone for confusingly using the term “compilation unit”
when I meant “whole program”.

 

 

Yes, I am aware of the fact that if you change a function’s calling convention
by converting some scratch regs into save regs (for example because they aren’t even touched),
then you are safe to call it from either the default calling convention or the
optimized calling convention. This is the safe thing to do, and is why I will
only use the “preserve_most” and “preserve_all” optimized calling conventions,
as those will have been implemented by a back-end writer who is aware of
all these complications (as opposed to the “registermask=” calling convention,
which is much less safe).

 

I do, however, feel that IPRA in the whole-program case should not be restricted to
only scratch-becoming-save changes. I don’t have any data to support the notion,
but it begs to be investigated, unless someone can somehow prove that it can’t help
performance.


Beside an attribute on declarations, what do you suggest exactly?


-- 
Mehdi

vivek pandya via llvm-dev

Jul 12, 2016, 1:08:37 AM
to Mehdi Amini, llvm-dev, llvm-dev...@lists.llvm.org, Lawrence, Peter
On Tue, Jul 12, 2016 at 9:11 AM, Mehdi Amini <mehdi...@apple.com> wrote:



On Jul 11, 2016, at 7:48 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Mehdi,

            I’m compiling embedded applications which are small enough to do
whole-program compilation. There’s no advantage in breaking them up into
separate compilation pieces and linking them, even though in source form
they are composed of a couple of separate source files.


Ok, so LTO case basically.


 

So for me the compilation unit is always the entire program (and includes main()),
except for some hand-coded assembly language support functions that are “external”
to the compilation unit and in my case never call back into the compilation unit,
i.e. they are always “leaf” functions from the point of view of the compilation unit’s call graph.

Hence I would like a clang function attribute that says this function is “leaf”,
so that IPRA can know that none of the functions it is compiling is ever called
from outside this compilation unit.

@Peter How do you want this information to be used by IPRA? I.e., if a function (probably a function declaration) is marked as "leaf", how should that impact IPRA from your point of view?


I believe the usual (and, from the compiler’s point of view, best) way to address your particular scenario is to have a proper export list and use LTO.
For instance, if you never call into the program from one of your hand-coded assembly routines, LTO should be able to turn every global function/variable into a local one.

I agree with Mehdi here. Also, as he has summarized, with a regmask attribute we don't want to change the calling convention of the callee (i.e. reduce its callee-saved registers); it would just help the intra-procedural register allocator at call sites inside the callers of a particular hand-written assembly function. If you use such assembly functions very frequently and preserve_all or preserve_most cannot describe their exact register usage, then a regmask attribute will help a lot.
I believe that propagating the actual register usage is totally safe, provided the user has supplied correct information (the compiler can only help detect semantic errors, for example R45 being specified when no such register exists on the given platform).

-Vivek

Lawrence, Peter via llvm-dev

Jul 12, 2016, 3:22:35 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

             I am looking for an understanding of   1) IPRA in general,   2) IPRA in LLVM.

Whether I want to use LTO or not is a separate issue.

 

1)  I currently believe it is a true statement that:

                If all external functions are known to not call back into the “whole-program”
                being compiled, then IPRA is free to do anything at all to the functions being
                compiled, not limited to only “upgrades” calling convention changes, but
                also allowing “downgrades” calling convention changes as well.

 

Do you think my current belief #1 is correct?

 

 

2) It seems that LLVM currently limits itself to “upgrades” calling convention changes,
the reason being that not all call sites are then required to be changed,
so calls through function pointers can use the default calling convention
if, for example, there is insufficient analysis to know for sure what functions can be
called from that site.

 

Is my understanding #2 of IPRA in LLVM correct?

 

 

--Peter.

 

 

“whole-program” here is a misnomer since there are external functions, but I don’t
have a better term for this.

“upgrades” means some scratch regs are converted to save
                (the callee either doesn’t touch them at all, or does save/restore them)
“downgrades” means some save regs are converted to scratch
                (the callee no longer does save/restore for some registers, and does clobber them)
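Using those definitions, a change between two conventions could be classified as in the sketch below. It is a hypothetical model (bit r set means register r is a save register, clear means scratch) with illustrative names; a change that moves registers in both directions is neither a pure upgrade nor a pure downgrade.

```c
#include <stdint.h>

/* Hypothetical model: bit r set = register r is callee-saved ("save"). */
typedef uint32_t SavedSet;

enum CcChange { CC_SAME, CC_UPGRADE, CC_DOWNGRADE, CC_MIXED };

enum CcChange classify_cc_change(SavedSet old_saved, SavedSet new_saved) {
    SavedSet gained = new_saved & ~old_saved;   /* scratch -> save */
    SavedSet lost   = old_saved & ~new_saved;   /* save -> scratch */
    if (!gained && !lost) return CC_SAME;
    if (gained && !lost)  return CC_UPGRADE;
    if (!gained && lost)  return CC_DOWNGRADE;
    return CC_MIXED;                            /* both kinds at once */
}
```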

Mehdi Amini via llvm-dev

Jul 12, 2016, 3:31:08 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
On Jul 12, 2016, at 12:20 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Mehdi,
             I am looking for an understanding of   1) IPRA in general,   2) IPRA in LLVM.
Whether I want to use LTO or not is a separate issue.
 
1)  I currently believe it is a true statement that:
                If all external functions are known to not call back into the “whole-program”
                being compiled, then IPRA is free to do anything at all to the functions being
                compiled, not limited to only “upgrades” calling convention changes, but
                also allowing “downgrades” calling convention changes as well.
 
Do you think my current belief #1 is correct?

Yes, with some extra assumptions (you don’t use dlsym for instance, and you won’t link to another file with a global initializer that can call any of these).

I expressed this earlier (which include the other issues I mentioned just before) as “we can turn the linkage of every function into local” (or private, or static, whatever denomination you prefer).


2) It seems that LLVM currently limits itself to “upgrades” calling convention changes,
the reason being that not all call sites are then required to be changed,
so calls through function pointers can use the default calling convention
if, for example, there is insufficient analysis to know for sure what functions can be
called from that site.
 
Is my understanding #2 of IPRA in LLVM correct?


I don’t believe this is correct: currently IPRA limits itself to this for functions that can be called from another module.
It will freely change the calling convention, including downgrades, when it knows that it can see all call sites (+ extra conditions, like no recursion being involved, I think).
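Why upgrades are safe for externally visible functions while downgrades are not can be sketched as a subset check on clobber masks. This is a minimal C++ toy model, not LLVM code: the 8-register machine, the `RegMask` type and `callIsSafe` are hypothetical, and r0-r3 stand in for the default convention's scratch registers. A caller compiled against some assumed clobber set stays correct as long as the callee clobbers nothing outside that set.

```cpp
#include <cstdint>

// Toy model (hypothetical, not LLVM's representation): bit i set means
// "register i may be clobbered by the call". 8-register machine where
// r0-r3 are scratch under the default convention and r4-r7 are callee-saved.
using RegMask = std::uint8_t;
constexpr RegMask kDefaultClobbers = 0x0F;  // r0-r3

// A caller that assumes `assumed` may be clobbered stays correct if the
// callee clobbers nothing outside that set.
constexpr bool callIsSafe(RegMask assumed, RegMask actual) {
  return (actual & static_cast<RegMask>(~assumed)) == 0;
}

// "Upgrade": the callee only touches r0, so r1-r3 are effectively preserved.
constexpr RegMask kUpgraded = 0x01;
// "Downgrade": the callee no longer saves r4, so it clobbers r0-r4.
constexpr RegMask kDowngraded = 0x1F;

// An unknown caller (another module, or an unanalyzed function pointer)
// can only assume the default convention.
static_assert(callIsSafe(kDefaultClobbers, kUpgraded), "upgrade is safe");
static_assert(!callIsSafe(kDefaultClobbers, kDowngraded), "downgrade is not");

// A caller that has been updated with the callee's real mask is safe either way.
static_assert(callIsSafe(kDowngraded, kDowngraded), "known caller is safe");
```

In this model a downgrade only becomes safe once every call site uses the callee's real mask, which is exactly the "can see all call sites" condition described above.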


 
“whole-program” here is a misnomer since there are external functions, but I don’t
have a better term for this.

I believe you can talk about the “main module”, i.e. the module that defines the entry point for the program.
Note that LLVM can’t make assumptions about the absence of dlsym() or of global initializers in other modules, for example, so the linkage type of a function is what tells us whether it can be called back or not.


— 
Mehdi

Lawrence, Peter via llvm-dev
Jul 12, 2016, 3:55:09 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

In my mind at least, “whole program” means no dynamic libraries, so the only external functions are simple runtime support. Do you have a suggested term for that?

--Peter.

From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Tuesday, July 12, 2016 12:31 PM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: vivek pandya <vivekv...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question

On Jul 12, 2016, at 12:20 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

vivek pandya via llvm-dev
Jul 13, 2016, 2:46:40 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org
Hello Peter,

Are you still interested in __attribute__(regmask)?
I have done a hack (both clang + IPRA) to get it working; if you want to play around with it, I can send a patch by tomorrow.

Sincerely,
Vivek

Lawrence, Peter via llvm-dev
Jul 13, 2016, 3:26:59 PM
to vivek pandya, llvm-dev, llvm-dev...@lists.llvm.org

Vivek,

I apologize if you took my original email as a request for implementation; I meant to be asking what is already available. I think the answer to that is the ‘preserve_most’ and ‘preserve_all’ attributes, but I will also use ‘regmask’ if those prove to be too sub-optimal.

I am still interested in figuring out the necessary and sufficient conditions for LLC to do optimal IPRA when given a “whole program” (as per my previous definition of “whole program”), as opposed to how to accomplish this with LTO.

If you are open to having such discussions, even though your focus IIUC is supposed to be LTO, then great. I think Mehdi is stuck trying to convince me to use LTO, but given all the changes I’ve had to make to CodeGen (i.e. outside of my Target sub-dir) for having separate Data and Address register sets, I think using LTO is a long-term solution that I can’t take on just now (i.e. the svn branch merge problem).

As one of my old math professors used to say, “don’t use a sledgehammer to crush a pea”; to wit, I am only compiling a single source file as an entire whole program and I don’t do any linking, so why should I have to use a linker?

--Peter Lawrence

From: vivek pandya [mailto:vivekv...@gmail.com]
Sent: Wednesday, July 13, 2016 11:47 AM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: mehdi...@apple.com; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question

Hello Peter,

Mehdi Amini via llvm-dev
Jul 13, 2016, 3:40:34 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
On Jul 13, 2016, at 12:26 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Vivek,
I apologize if you took my original email as a request for implementation;
I meant to be asking what is already available. I think the answer to that
is the ‘preserve_most’ and ‘preserve_all’ attributes, but I will also
use ‘regmask’ if those prove to be too sub-optimal.

I am still interested in figuring out the necessary and sufficient conditions
for LLC to do optimal IPRA when given a “whole program”
(as per my previous definition of “whole program”),
as opposed to how to accomplish this with LTO.

Easy: mark *all* of your functions “static” (or “internal” in LLVM terminology).
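In source terms, this means giving every function except the entry point internal linkage. A minimal C++ sketch; `scale`, `offset` and `process` are hypothetical names. With `static` (or an unnamed namespace, its C++ equivalent), no other module can call these helpers, so the compiler may assume it sees every one of their call sites, which is the precondition described in this thread for changing their calling convention freely.

```cpp
#include <cstdint>

// Hypothetical helper: internal linkage, only callable from this
// translation unit, so every call site is visible to the compiler.
static std::uint32_t scale(std::uint32_t x) { return x * 3u; }

// An unnamed namespace gives the same internal linkage in C++.
namespace {
std::uint32_t offset(std::uint32_t x) { return x + 7u; }
}  // namespace

// The single externally visible entry point (hypothetical). Its own calling
// convention must stay compatible with callers the compiler cannot see.
std::uint32_t process(std::uint32_t x) { return offset(scale(x)); }
```

Whether the compiler actually exploits this still depends on IPRA being enabled; the linkage only makes it legal.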

If you are open to having such discussions, even though your focus
IIUC is supposed to be LTO, then great. I think Mehdi is stuck trying
to convince me to use LTO, but given all the changes I’ve had to make
to CodeGen (i.e. outside of my Target sub-dir) for having separate Data and Address
register sets, I think using LTO is a long-term solution that I can’t take
on just now (i.e. the svn branch merge problem).

As one of my old math professors used to say, “don’t use a sledgehammer
to crush a pea”; to wit, I am only compiling a single source file as an entire whole
program and I don’t do any linking, so why should I have to use a linker?

It is just a semantic issue: you need to tell the optimizer what it can and can’t do. In general, we can’t assume that the code being optimized or generated won’t be accessed through dlopen/dlsym, for instance.
I’d prefer everything to be hidden/private by default, with the user having to explicitly export symbols, but unfortunately that’s not the current model.

The LTO API is here to circumvent this issue: by delaying the optimizations/codegen until link time, we have more information about which functions can or can’t be called from another module.
One of the key points of LTO is the linker telling us “I don’t need to export this symbol”, and we turn it into an “internal” one.

— 
Mehdi

Mehdi Amini via llvm-dev
Jul 13, 2016, 3:42:31 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
What I meant here is: you don’t *have to* use LTO (especially if you don’t use a linker right now), but it is an existing mechanism to achieve what you want. So you need to accomplish the same thing somehow (i.e. mark as many functions as possible “internal”).
The best way to implement it is very dependent on your workflow.

— 
Mehdi

Lawrence, Peter via llvm-dev
Jul 13, 2016, 7:24:20 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

I am perusing the 3.8 trunk sources, and don’t find evidence where I would expect it for LLVM “downgrading” a function’s calling convention.

PrologEpilogInserter() {                                          “CodeGen/”
    ...
    TFI->determineCalleeSaves() {                                 “Target/XYZ/”
        TargetFrameLowering::determineCalleeSaves() {             “CodeGen/”
            return <<< some object derived from “*CallingConv.td” >>>;    “build/lib/Target/XYZ/”
        }
        ...
        SavedRegs.set(Reg);  // to “add” a reg, e.g. for ‘hasFP’, etc.
        ...
    }
}

The SavedRegs set always starts out with a predefined calling-convention value that typically comes from “*CallingConv.td”, hence is not function-specific.

The only time SavedRegs.reset() is ever called (which is rare to begin with) is for target-specific, calling-convention-specific reasons, never function-specific ones.

Perhaps I’m looking in the wrong place?

But I think that while we both agree that in principle LLVM could “downgrade” a function, given that it can provably see every call site to it, it does not seem like this is actually happening, unless I’m missing something???

(Even if true, I’m not claiming we’re missing an important case; I don’t have any logical arguments either way and don’t have any evidence either way. I’m just trying to understand what LLVM actually does or does not do.)

--Peter Lawrence.

Mehdi Amini via llvm-dev
Jul 13, 2016, 7:25:59 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
On Jul 13, 2016, at 4:24 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Mehdi,
               I am perusing the 3.8 trunk sources, and don’t find evidence where I
would expect it for LLVM “downgrading” a function’s calling convention.

IPRA is a project that started ~2 months ago; there is nothing like that in 3.8 (neither downgrading nor upgrading).

— 
Mehdi

Lawrence, Peter via llvm-dev
Jul 13, 2016, 7:29:40 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

I’m seeing lots of “upgrading” logic:

    if (UseIPRA)
        createPass(new DummyCGSCCPass);

    if (UseIPRA)
        addPass(createRegUsageInfoPropPass());

    if (UseIPRA)
        addPass(createRegUsageInfoCollector());

???

--Peter.

From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Wednesday, July 13, 2016 4:26 PM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: vivek pandya <vivekv...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question

On Jul 13, 2016, at 4:24 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Mehdi Amini via llvm-dev
Jul 13, 2016, 7:48:32 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
You mentioned 3.8 earlier, and I doubt the code you are referring to here is in there; it is probably in trunk.

Anyway, the patch hadn’t actually been committed until now; it is in as of r275347.


— 
Mehdi

Lawrence, Peter via llvm-dev
Jul 13, 2016, 7:51:16 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

My bad, I said “3.8 trunk” when I should have said “trunk”.

vivek pandya via llvm-dev
Jul 14, 2016, 12:51:04 AM
to Mehdi Amini, llvm-dev, llvm-dev...@lists.llvm.org, Lawrence, Peter
On Thu, Jul 14, 2016 at 1:10 AM, Mehdi Amini <mehdi...@apple.com> wrote:

On Jul 13, 2016, at 12:26 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:

Vivek,
I apologize if you took my original email as a request for implementation;
I meant to be asking what is already available. I think the answer to that
is the ‘preserve_most’ and ‘preserve_all’ attributes, but I will also
use ‘regmask’ if those prove to be too sub-optimal.
Peter, there is no need to apologize, as we want to get the most benefit out of this work (this is our aim for the GSoC project).
Yes, 'regmask' can be useful when you can't exactly describe register usage with preserve_most/preserve_all. I just asked before sending because getting this feature into trunk will take some time (review process).
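For the original use case (hand-written assembly leaf routines that touch few registers), the existing clang attribute can be sketched as below. This is a hedged C++ example: `io_read_port` and `poll_port` are hypothetical stand-ins, and the attribute is only applied when compiling with clang for x86-64, a target where `preserve_most` is supported; elsewhere the hypothetical `LEAF_CC` macro expands to nothing. In a real build the attribute would go on the declaration of the external assembly routine, and that routine would then have to actually preserve every register the convention promises, since the compiler trusts the declaration.

```cpp
#include <cstdint>

// Hypothetical macro: apply clang's preserve_most convention only where it
// is known to be supported; otherwise use the default convention.
#if defined(__clang__) && defined(__x86_64__)
#define LEAF_CC __attribute__((preserve_most))
#else
#define LEAF_CC
#endif

// Hypothetical stand-in for an assembly I/O routine. Under preserve_most the
// callee preserves almost all general-purpose registers, so callers can keep
// values in registers across the call instead of spilling them.
LEAF_CC std::uint32_t io_read_port(std::uint32_t port) {
  return port ^ 0x5Au;  // placeholder for the special I/O instruction
}

// Hypothetical caller that benefits from the cheaper call.
std::uint32_t poll_port(std::uint32_t port) {
  return io_read_port(port) + 1u;
}
```

`preserve_all` goes further and also preserves floating-point/vector registers; a per-function register mask, as discussed in this thread, would cover routines that fit neither preset.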

As far as LLC is concerned, what Mehdi has suggested should be enough. Also, as I have mentioned already, even if you want to compile multiple source files and get the benefits with LLC, I believe you can use llvm-link to combine all .bc files into one module and use the resulting .bc file with LLC to get most of the benefits of IPRA.

-Vivek 

vivek pandya via llvm-dev
Jul 14, 2016, 7:19:17 AM
to zan jyu Wong, llvm-dev


On Thu, Jul 14, 2016 at 4:01 PM, zan jyu Wong <zyf...@gmail.com> wrote:
Vivek,

First of all, I'd like to thank you for your hard work. Your work really helps me a lot.
 
Thanks, Zan Jyu Wong. I am glad that this helped, but I would like to share the credit with my mentors and the LLVM community, who have helped me with this.

I am adding the llvm-dev list here so that, if my reasoning is wrong, someone can help both of us understand it correctly.

But I have a question about the regmask collector.
In lib/CodeGen/RegUsageInfoCollector.cpp, there's a for-loop that iterates over all registers to check
if they are modified:
  for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg)
    if (MRI->isPhysRegModified(PReg, true))
      markRegClobbered(TRI, &RegMask[0], PReg);

void RegUsageInfoCollector::markRegClobbered(const TargetRegisterInfo *TRI,
                                             uint32_t *RegMask, unsigned PReg) {
  // If PReg is clobbered then all of its alias are also clobbered.
  for (MCRegAliasIterator AI(PReg, TRI, true); AI.isValid(); ++AI) {
    DEBUG(dbgs() << "mark: " << TRI->getName(*AI) << "\n");
    RegMask[*AI / 32] &= ~(1u << (*AI % 32));
  }
}

Suppose that r0 and r1 are sub-registers of d0, and the function uses only r0. Then both r0 and d0 will return true
when queried with MRI->isPhysRegModified. When `markRegClobbered` is called with d0, r1 will be marked as clobbered, too.
But I don't think that r1 should be marked as clobbered.

I'm wondering whether this is expected behavior. Thanks again.
No, I don't think that r1 will be clobbered here. My reasons are as follows, with a slightly different example:
Consider AL | AH | AX | EAX | RAX and the way LLVM models these registers, so that AL is aliased to AX, EAX and RAX, and similarly for AH. This can be verified in the lib/Target/X86/X86RegisterInfo.td file; consider the following comments from the file:
// In the register alias definitions below, we define which registers alias
// which others.  We only specify which registers the small registers alias,
// because the register file generator is smart enough to figure out that
// AL aliases AX if we tell it that AX aliased AL (for example).

See the definitions of registers like:
def AL : X86Reg<"al", 0>;
def AH : X86Reg<"ah", 4>;
def AX : X86Reg<"ax", 0, [AL,AH]>;
...
So if we mark AX as used/modified then obviously we can't use AL or AH, but if only AL is clobbered then AH can still be used.
I hope this helps; LLVM devs, please correct me if necessary.

-Vivek
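To make the r0/r1/d0 question concrete, here is a toy C++ model of the quoted loop. Everything in it is hypothetical rather than LLVM's real data structures: each register is a set of register units, `isModified` reports a register as modified if any of its units was written (the premise stated in the question for d0), and two registers alias when they share a unit, with a register counting as its own alias (as with the `true` IncludeSelf argument passed to `MCRegAliasIterator` above). It compares marking every alias of each modified register, as `markRegClobbered` does, with marking only the modified register itself.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy register file (hypothetical): each register is a set of register units.
// d0 overlaps r0 and r1; r0 and r1 do not overlap each other.
struct Reg {
  std::string name;
  std::set<int> units;
};

const std::vector<Reg> kRegs = {{"r0", {0}}, {"r1", {1}}, {"d0", {0, 1}}};

// Two registers alias if they share at least one unit.
bool overlaps(const Reg &a, const Reg &b) {
  for (int u : a.units)
    if (b.units.count(u)) return true;
  return false;
}

// Assumption: a register counts as modified if any of its units was written.
bool isModified(const Reg &r, const std::set<int> &writtenUnits) {
  for (int u : r.units)
    if (writtenUnits.count(u)) return true;
  return false;
}

// Returns the names of registers marked clobbered. With expandAliases, every
// register overlapping a modified one is marked (like markRegClobbered);
// otherwise only the modified register itself is marked.
std::set<std::string> clobbered(const std::set<int> &writtenUnits,
                                bool expandAliases) {
  std::set<std::string> out;
  for (const Reg &r : kRegs) {
    if (!isModified(r, writtenUnits)) continue;
    if (!expandAliases) {
      out.insert(r.name);
      continue;
    }
    for (const Reg &alias : kRegs)
      if (overlaps(r, alias)) out.insert(alias.name);  // includes r itself
  }
  return out;
}
```

With only r0 written, `clobbered({0}, true)` yields {d0, r0, r1}: d0 passes the modified check because it overlaps r0, and expanding d0's aliases then reaches r1. `clobbered({0}, false)` yields {d0, r0}, leaving r1 preserved. In this model the difference comes entirely from expanding the aliases of a register that is only partially written, which is the same situation as AX when only AL is written.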

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



Lawrence, Peter via llvm-dev
Jul 14, 2016, 1:48:17 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

Bravo, well done Mr Amini!

Ordinarily I would find adding “static” to all functions objectionable: we’re doing whole-program compilation and optimization, and don’t use a linker, so “static” currently doesn’t appear anywhere in our sources. And I view “static” as not really part of the language, but rather more of a linker directive. But we really only have a small handful of real-time, performance-critical functions, so it will be trivial to declare them static.

--Peter Lawrence.

From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Wednesday, July 13, 2016 12:42 PM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: vivek pandya <vivekv...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question

On Jul 13, 2016, at 12:40 PM, Mehdi Amini <mehdi...@apple.com> wrote:

Mehdi Amini via llvm-dev
Jul 14, 2016, 1:58:20 PM
to Lawrence, Peter, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya
Just to clarify, because I don't know your exact build flow: LLVM has an internalize pass that can turn all symbols static (optionally controlled with a whitelist) without having to change the source.
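Conceptually, internalization flips every externally visible definition to internal linkage unless its name is on a preserve list. Below is a toy C++ model of that idea only; the `Linkage` and `Symbol` types and the `internalize` function are hypothetical and are not the API or options of LLVM's internalize pass, which should be checked in its own documentation.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical linkage model: just external vs. internal.
enum class Linkage { External, Internal };

struct Symbol {
  std::string name;
  bool isDefinition;  // declarations (e.g. runtime or assembly routines) stay as-is
  Linkage linkage;
};

// Make every external definition internal unless its name is preserved.
void internalize(std::vector<Symbol> &symbols,
                 const std::set<std::string> &preserve) {
  for (Symbol &s : symbols)
    if (s.isDefinition && s.linkage == Linkage::External &&
        preserve.count(s.name) == 0)
      s.linkage = Linkage::Internal;
}
```

For example, with the symbols `main` and `filter` (both defined in the module) and `io_read_port` (only declared, defined elsewhere in assembly), calling `internalize` with the preserve list `{"main"}` keeps `main` external, makes `filter` internal, and leaves the `io_read_port` declaration untouched.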

-- 
Mehdi

Lawrence, Peter via llvm-dev
Jul 15, 2016, 1:49:19 PM
to mehdi...@apple.com, llvm-dev, llvm-dev...@lists.llvm.org, vivek pandya

Mehdi,

Many thanks, this really helps me understand “the LLVM way”.
