Vivek,
I have an application where many of the leaf functions are
hand-coded assembly language, because they use special IO instructions
that only the assembler knows about. These functions typically don’t
use any registers besides the incoming argument registers, i.e. they don’t
need to use any additional callee-save or caller-save registers.
Is there any way in your IPRA (interprocedural register allocation) project that
the user can supply this information for external functions?
Perhaps using some form of __attribute__?
Maybe __attribute__ ((registermask = ….))?
--Peter Lawrence.
Vivek,
I am looking into these function attributes in the clang docs:
preserve_most
preserve_all
They are not available in the 3.6.2 that I am currently using, but I hope they exist in 3.8.
These should provide enough info to solve my problem:
at the MC level, calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info.
This CALL instruction register USE and DEF info should already be useful
to the intra-procedural register allocator (allowing values live across these
calls to be in what are otherwise caller-save registers).
At least that’s how I read the MC dumps: every call instruction seems to have
every caller-save register flagged as “imp-def”, i.e. implicitly defined by the instruction,
and hopefully what is considered a caller-save register at a call site is defined by the callee.
And this should be the information that IPRA takes advantage of in its bottom-up analysis.
Which leads me to this question: when compiling an entire whole program at one time,
so there is no linking and no LTO, will there ever be IPRA that works within LLC for this scenario,
and is this an objective of your project, or are you focusing only on LTO?
I know this is not the typical “linux” scenario (dynamic linking of not only standard libraries,
but also sometimes even application libraries, and lots of static linking because of program
size), but it is a typical “embedded” scenario, which is where I am currently.
Other thoughts or comments?
--Peter Lawrence.
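For illustration, a minimal sketch of what a call site using the attribute discussed in this message might look like. It assumes a clang that accepts __attribute__((preserve_most)); on other compilers the macro below expands to nothing. The names LEAF_CC, io_read and read_two are hypothetical, and the C++ body of io_read only stands in for what would really be a hand-written assembly routine.

```cpp
#include <cassert>
#include <cstdint>

// Use preserve_most only where the compiler reports support for it.
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(preserve_most)
#    define LEAF_CC __attribute__((preserve_most))
#  endif
#endif
#ifndef LEAF_CC
#  define LEAF_CC  // fallback: default calling convention
#endif

// Hypothetical stand-in for a hand-coded assembly leaf function.
LEAF_CC std::uint32_t io_read(std::uint32_t port) {
  return port ^ 0x5Au;
}

// The first result is live across the second call. With a convention that
// preserves more registers, the allocator may be able to keep it in a
// register that the default convention would treat as caller-save.
std::uint32_t read_two(std::uint32_t a, std::uint32_t b) {
  std::uint32_t x = io_read(a);
  std::uint32_t y = io_read(b);
  return x + y;
}
```

Whether the generated code actually improves depends on the target and the compiler version, so the machine code for a function like read_two would need to be inspected.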
On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:
Vivek,
I am looking into these function attributes in the clang docs:
preserve_most
preserve_all
They are not available in the 3.6.2 that I am currently using, but I hope they exist in 3.8.
These should provide enough info to solve my problem:
at the MC level, calls to functions with these attributes
will be code-gen’ed through different “calling conventions”,
and CALL instructions to them should have different register USE and DEF info.
Yes, I believe that preserve_most or preserve_all should help you even without IPRA. But note that IPRA can help even further. For example, on X86 the preserve_most cc will not preserve R11 (this can be verified from X86CallingConv.td and X86RegisterInfo.cpp). However, IPRA calculates the regmask based on actual register usage, so if a procedure with the preserve_most cc does not use R11, and no call site inside the function body clobbers it, then IPRA will mark R11 as preserved. Also, the RegMask that IPRA produces is a superset of the RegMask due to the calling convention.
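The superset relationship described here can be sketched with a toy bitset model. This is not LLVM's implementation: the 16-register file, the register numbering, and the names ccPreservedMask, ipraMask and isSuperset are assumptions made for illustration only. A set bit means the register is preserved across the call.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy register file; the numbering is hypothetical, not LLVM's.
constexpr std::size_t kNumRegs = 16;
constexpr std::size_t R11 = 11;
using RegMask = std::bitset<kNumRegs>;  // bit set = preserved across the call

// Registers preserved by an illustrative calling convention.
RegMask ccPreservedMask() {
  RegMask m;
  m.set();       // start with everything preserved
  m.reset(R11);  // the convention treats R11 as scratch
  m.reset(0);    // register 0 carries the return value
  return m;
}

// Usage-based mask: a register counts as preserved if the convention
// preserves it, or if the function and its callees never clobber it.
RegMask ipraMask(const RegMask &ccMask, const RegMask &clobbered) {
  return ccMask | ~clobbered;
}

// True if every register preserved in b is also preserved in a.
bool isSuperset(const RegMask &a, const RegMask &b) {
  return (a & b) == b;
}
```

With a clobber set containing only register 0, the usage-based mask keeps R11 preserved even though the convention alone does not, and it is a superset of the convention's mask.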
Vivek,
IIUC it seems that we need two pieces of information to do IPRA:
1. what registers the callee clobbers
2. what the callee does to the call-graph
And it is #2 that we are missing when we declare an external function,
even when we declare it with a preserves or a regmask attribute.
So what I / we need is another attribute that says this is a leaf function.
At least in my case, all I’m really concerned with are leaf functions.
Thoughts?
Vivek,
Here’s the way I see it, let me know if you agree or disagree.
You cannot optimize a function’s calling convention (register usage) unless
you can see and change every caller, and you only know this for non-static functions
if you know that all calls to external functions cannot call back into the current
compilation unit.
#1 gives you the info necessary to change the call site to the external function,
so you don’t need #2 to do RA around the call site to the external function. Instead,
you need #2 before you can change any non-static function’s calling convention
within the current compilation unit, assuming you have this information for all
external functions.
To be more concrete, let foo() be a non-static function in the current compilation
unit. Any calls to foo() from external functions will have to use the “default”
calling convention, so foo()’s calling convention cannot be changed. We have to
know that none of the external functions can call back into the compilation unit
(they are “leaf” functions relative to the compilation unit) before we can change
foo()’s calling convention.
Also, the issue of an escaping pointer-to-function is made clear by the example
of the atexit() and exit() library functions, i.e. even static functions can end up
being called by external functions. So exit() can never be declared “leaf”, and
to get the benefit of IPRA it needs to be within the compilation unit, either
by whole-program compilation or by LTO, if it is used.
--Peter.
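The condition argued for in this message can be written down as a small, deliberately conservative predicate. This is only a sketch of the reasoning, not anything LLVM provides; the struct and field names (FunctionInfo, ProgramInfo, hasInternalLinkage, addressEscapes, allExternalsAreLeaf) are hypothetical.

```cpp
#include <cassert>

// Hypothetical summary of one function in the unit being compiled.
struct FunctionInfo {
  bool hasInternalLinkage;  // e.g. declared static
  bool addressEscapes;      // pointer to it is stored or passed away (atexit-style)
};

// Hypothetical summary of the externals the unit calls.
struct ProgramInfo {
  bool allExternalsAreLeaf;  // no external function calls back into the unit
};

// True if every caller of f is visible, so its convention may be changed.
bool allCallersVisible(const FunctionInfo &f, const ProgramInfo &p) {
  // An escaped pointer may be called from code we cannot see or rewrite,
  // even when the function is static.
  if (f.addressEscapes) return false;
  // Otherwise either nobody outside can name it, or nobody outside calls back.
  return f.hasInternalLinkage || p.allExternalsAreLeaf;
}
```

Treating every escaped address as disqualifying is stricter than strictly necessary, but it matches the atexit() example: a static function whose pointer escapes can still be reached from outside.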
Mehdi,
The external functions I need to call are all hand-written assembly language,
How would/could LTO handle that ?
--Peter Lawrence.
From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Friday, July 08, 2016 10:58 AM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: vivek pandya <vivekv...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question
Mehdi,
I’m compiling embedded applications which are small enough to do
whole-program compilation. There’s no advantage in breaking them up into
separate compilation pieces and linking them, even though in source form
they are composed of a couple of separate source files.
So for me the compilation unit is always the entire program (and includes main()),
except for some hand-coded assembly-language support functions that are “external”
to the compilation unit and in my case never call back into the compilation unit,
i.e. they are always “leaf” functions from the point of view of the compilation unit’s call-graph.
Hence I would like a clang function attribute that says this function is “leaf”,
so that IPRA can know that none of the functions it is compiling is ever called
from outside this compilation unit.
And I apologize to everyone for confusingly using the term “compilation unit”
when I meant “whole program”.
Yes, I am aware of the fact that if you change a function’s calling convention
by converting some scratch regs into save regs (for example because they aren’t even touched),
then you are safe to call it from either the default calling convention or the
optimized calling convention. This is the safe thing to do, and is why I will
only use the “preserve_most” and “preserve_all” optimized calling conventions,
as those will have been implemented by a back-end writer who is aware of
all these complications (as opposed to the “registermask=” calling convention,
which is much less safe).
I do however feel that IPRA in the whole-program case should not be restricted to
only scratch-becoming-save changes. I don’t have any data to support the notion,
but it begs to be investigated, unless someone can somehow prove that it can’t help
performance.
--Peter.
Mehdi,
I’m compiling embedded applications which are small enough to do
whole-program compilation. There’s no advantage in breaking them up into
separate compilation pieces and linking them, even though in source form
they are composed of a couple of separate source files.
Ok, so LTO case basically.
So for me the compilation unit is always the entire program (and includes main()),
except for some hand-coded assembly-language support functions that are “external”
to the compilation unit and in my case never call back into the compilation unit,
i.e. they are always “leaf” functions from the point of view of the compilation unit’s call-graph.
Hence I would like a clang function attribute that says this function is “leaf”,
so that IPRA can know that none of the functions it is compiling is ever called
from outside this compilation unit.
I believe the usual way (and the best way from the compiler's point of view) to address your particular scenario is to have a proper export list and use LTO. For instance, if you never call into the program from one of your hand-coded assembly routines, LTO should be able to turn every global function/variable into a local one.
Mehdi,
I am looking for an understanding of 1) IPRA in general, 2) IPRA in LLVM.
Whether I want to use LTO or not is a separate issue.
1) I currently believe it is a true statement that:
if all external functions are known to not call back into the “whole-program”
being compiled, then IPRA is free to do anything at all to the functions being
compiled, not limited to only “upgrade” calling convention changes, but
also allowing “downgrade” calling convention changes as well.
Do you think my current belief #1 is correct?
2) It seems that LLVM currently limits itself to “upgrade” calling convention changes,
the reason being that not all call sites are then required to be changed,
so calls through function pointers can use the default calling convention
if, for example, there is insufficient analysis to know for sure what functions can be
called from that site.
Is my understanding #2 of IPRA in LLVM correct?
--Peter.
“whole-program” here is a misnomer since there are external functions, but I don’t
have a better term for this.
“upgrade” means some scratch regs are converted to save
(the callee either doesn’t touch them at all, or does do save/restore)
“downgrade” means some save regs are converted to scratch
(the callee no longer does save/restore to some registers, and does clobber them)
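The “upgrade” and “downgrade” definitions above can be made concrete by comparing sets of preserved registers. This is a toy sketch under assumed names (PreservedSet, Change, classify, safeForUnmodifiedCallers) and an assumed 8-register file; it models the definitions only, not LLVM's data structures.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kNumRegs = 8;          // hypothetical register count
using PreservedSet = std::bitset<kNumRegs>;  // bit set = callee preserves it

enum class Change { None, Upgrade, Downgrade, Mixed };

// Classify a convention change by which registers gained or lost
// "preserved" status.
Change classify(const PreservedSet &oldP, const PreservedSet &newP) {
  bool gained = (newP & ~oldP).any();  // scratch converted to save
  bool lost = (oldP & ~newP).any();    // save converted to scratch
  if (gained && lost) return Change::Mixed;
  if (gained) return Change::Upgrade;
  if (lost) return Change::Downgrade;
  return Change::None;
}

// A caller still compiled against the old convention stays correct only
// if nothing it relied on being preserved has become scratch.
bool safeForUnmodifiedCallers(const PreservedSet &oldP,
                              const PreservedSet &newP) {
  return (oldP & ~newP).none();
}
```

This shows why an upgrade does not require changing every call site, while a downgrade does: only the downgrade removes a guarantee that an unmodified caller may depend on.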
Mehdi,
In my mind at least, “whole program” means no dynamic libraries, so the only
external functions are simple runtime support, do you have a suggested term for that ?
--Peter.
From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Tuesday, July 12, 2016 12:31 PM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: vivek pandya <vivekv...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question
On Jul 12, 2016, at 12:20 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:
Vivek,
I apologize if you took my original email as a request for implementation.
I meant to be asking what is already available. I think the answer to that
is the ‘preserve_most’ and ‘preserve_all’ attributes, but I will also
use ‘regmask’ if those prove to be too sub-optimal.
I am still interested in figuring out the necessary and sufficient conditions
for LLC to do optimal IPRA when given a “whole program”
(as per my previous definition of “whole program”),
as opposed to how to accomplish this with LTO.
If you are open to having such discussions, even though your focus
IIUC is supposed to be LTO, then great. I think Mehdi is stuck trying
to convince me to use LTO, but given all the changes I’ve had to make
to CodeGen (i.e. outside of my Target sub-dir) for having separate Data and Address
register sets, I think using LTO is a long-term solution that I can’t take
on just now (i.e. the svn branch merge problem).
As one of my old math professors used to say, “don’t use a sledge hammer
to crush a pea”. To wit, I am only compiling a single source file as an entire whole
program and I don’t do any linking, so why should I have to use a linker?
--Peter Lawrence
From: vivek pandya [mailto:vivekv...@gmail.com]
Sent: Wednesday, July 13, 2016 11:47 AM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: mehdi...@apple.com; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question
Hello Peter,
Mehdi,
I am perusing the 3.8 trunk sources, and don’t find evidence where I
would expect it for LLVM “downgrading” a function’s calling convention.
PrologEpilogInserter() {                                   “CodeGen/”
  ...
  TFI->determineCalleeSaves() {                            “Target/XYZ/”
    TargetFrameLowering::determineCalleeSaves() {          “CodeGen/”
      return <<< some object derived from “*CallingConv.td” >>>;   “build/lib/Target/XYZ/”
    }
    ...
    SavedRegs.set(Reg);   // to “add” a reg, EG for ‘hasFP’, ETC
    ...
  }
}
The SavedRegs set always starts out with a predefined calling-convention value
that typically comes from “*CallingConv.td”, hence is not function-specific.
The only times SavedRegs.reset() is ever called (which is rare to begin with)
are for target-specific, calling-convention-specific reasons, never function-specific ones.
Perhaps I’m looking in the wrong place?
But I think while we both agree that in principle LLVM could “downgrade” a function,
given that it can provably see every call site to it, it does not seem like this is actually
happening, unless I’m missing something?
(Even if true, I’m not claiming we’re missing an important case. I don’t have any
logical arguments either way and don’t have any evidence either way. I’m just
trying to understand what LLVM actually does or does not do.)
--Peter Lawrence.
Mehdi,
I’m seeing lots of “upgrading” logic,
  if (UseIPRA)
    createPass(new DummyCGSCCPass);
  if (UseIPRA)
    addPass(createRegUsageInfoPropPass());
  if (UseIPRA)
    addPass(createRegUsageInfoCollector());
???
--Peter.
From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Wednesday, July 13, 2016 4:26 PM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: vivek pandya <vivekv...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question
On Jul 13, 2016, at 4:24 PM, Lawrence, Peter <c_pl...@qca.qualcomm.com> wrote:
Mehdi,
My bad, I said “3.8 trunk” when I should have said “trunk”
Vivek,
First of all, I'd like to thank you for your hard work. Your work really helps me a lot.
But I have a question about the regmask collector.
In lib/CodeGen/RegUsageInfoCollector.cpp, there's a for-loop that iterates over all registers to check
whether they are modified:
  for (unsigned PReg = 1, PRegE = TRI->getNumRegs(); PReg < PRegE; ++PReg)
    if (MRI->isPhysRegModified(PReg, true))
      markRegClobbered(TRI, &RegMask[0], PReg);

  void RegUsageInfoCollector::markRegClobbered(const TargetRegisterInfo *TRI,
                                               uint32_t *RegMask, unsigned PReg) {
    // If PReg is clobbered then all of its aliases are also clobbered.
    for (MCRegAliasIterator AI(PReg, TRI, true); AI.isValid(); ++AI) {
      DEBUG(dbgs() << "mark: " << TRI->getName(*AI) << "\n");
      RegMask[*AI / 32] &= ~(1u << (*AI % 32));
    }
  }
Suppose that r0 and r1 are sub-registers of d0, and a function uses only r0. Then both r0 and d0 will return true
when queried with MRI->isPhysRegModified. When `markRegClobbered' is called with d0, r1 will be marked as clobbered, too.
But I don't think that r1 should be marked as clobbered.
I'm wondering whether this is expected behavior. Thanks again.
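The scenario in this question can be reproduced with a small stand-alone model. It does not use LLVM: the three-register file (r0, r1, d0), the overlaps helper, and the alias-aware isModified query are assumptions that follow the premise stated in the message, namely that d0 is reported as modified whenever r0 is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy register file: d0 is a super-register of r0 and r1.
enum Reg : unsigned { NoReg = 0, R0 = 1, R1 = 2, D0 = 3, NumRegs = 4 };

// Two registers overlap if they are equal or one is d0 and the other a half.
bool overlaps(unsigned a, unsigned b) {
  if (a == b) return true;
  return (a == D0 && (b == R0 || b == R1)) ||
         (b == D0 && (a == R0 || a == R1));
}

// Stand-in for the alias-aware query assumed in the message: true if any
// register overlapping reg is written.
bool isModified(const std::vector<unsigned> &written, unsigned reg) {
  for (unsigned w : written)
    if (overlaps(w, reg)) return true;
  return false;
}

// Clear the "preserved" bit of reg and of every register aliasing it.
void markClobbered(std::vector<std::uint32_t> &mask, unsigned reg) {
  for (unsigned a = 1; a < NumRegs; ++a)
    if (overlaps(a, reg)) mask[a / 32] &= ~(1u << (a % 32));
}

// Same shape as the collector loop: visit every register, mark if modified.
std::vector<std::uint32_t> collect(const std::vector<unsigned> &written) {
  std::size_t words = (NumRegs + 31) / 32;
  std::vector<std::uint32_t> mask(words, ~0u);  // start with all preserved
  for (unsigned r = 1; r < NumRegs; ++r)
    if (isModified(written, r)) markClobbered(mask, r);
  return mask;
}

bool preserved(const std::vector<std::uint32_t> &mask, unsigned reg) {
  return (mask[reg / 32] >> (reg % 32)) & 1u;
}
```

In this model, writing only r0 ends with r1 marked clobbered as well: the loop reaches d0, finds it modified through r0, and then clears every alias of d0.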
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Mehdi,
Bravo, well done Mr Amini!
Ordinarily I would find adding “static” to all functions objectionable:
we’re doing whole-program compilation and optimization, and don’t
use a linker, so “static” currently doesn’t appear anywhere in our sources.
And I view “static” as not really part of the language, rather more
of a linker directive. But we really only have a small handful of real-time
performance-critical functions, so it will be trivial to declare them static.
--Peter Lawrence.
From: mehdi...@apple.com [mailto:mehdi...@apple.com]
Sent: Wednesday, July 13, 2016 12:42 PM
To: Lawrence, Peter <c_pl...@qca.qualcomm.com>
Cc: vivek pandya <vivekv...@gmail.com>; llvm-dev <llvm...@lists.llvm.org>; llvm-dev...@lists.llvm.org; Hal Finkel <hfi...@anl.gov>
Subject: Re: [llvm-dev] IPRA, interprocedural register allocation, question
On Jul 13, 2016, at 12:40 PM, Mehdi Amini <mehdi...@apple.com> wrote:
Mehdi,
Many thanks, this really helps me understand “the llvm way”