[LLVMdev] [Clang] [lld] [llvm-link] Whole program / dead-code optimization

e...@modk.it

Jul 16, 2015, 2:41:27 PM
to LLVM Developers Mailing List
Hi All,

After the initial learning curve, we're excited to have put together a completely gcc/binutils-free toolchain based on LLVM.  Now that we have things working, we desperately need to optimize the resulting binaries.  Our bin files are up to 10x the size of their fully optimized gcc equivalents (16k vs. 1.5k).  This is for a bare-metal ARM-based system, so the difference is significant.

We're using lld for linking and the following dead code elimination techniques seem to be dead ends:

1) whole program optimization on our most egregious space waster (-fwhole-program not supported by clang)
2) link time optimization (looks like this is only supported by lld for the COFF path not the ELF path)
3) using a linker plugin like gold (-fuse-linker-plugin doesn't seem to be supported by clang)

We have control over the whole codebase and could essentially compile all of our C/C++ code as a single file if there were a way to tell clang that it is seeing the whole program.

Any thoughts on how we could achieve this?  This slide deck suggests using llvm-link to accomplish it: http://llvm.org/devmtg/2013-11/slides/Gao-LTO.pdf.  Is this the most promising way forward?

Thanks,
Ed

John Criswell

Jul 16, 2015, 4:15:49 PM
to e...@modk.it, LLVM Developers Mailing List
Is there a reason why LLVM's link-time optimization won't work for you?

http://llvm.org/docs/GoldPlugin.html
http://llvm.org/docs/LinkTimeOptimization.html

Regards,

John Criswell


_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

e...@modk.it

Jul 16, 2015, 4:32:57 PM
to John Criswell, LLVM Developers Mailing List

> Is there a reason why LLVM's link-time optimization won't work for you?
>
> http://llvm.org/docs/GoldPlugin.html
> http://llvm.org/docs/LinkTimeOptimization.html


Well, the primary motivation to move to LLVM is licensing, which is why we also ditched binutils: we can't package gcc for iOS due to the GPL.  So in the end the gold plugin wouldn't work for licensing reasons, even if we could get it to work technically.  Thanks for the links, though; I'm still trying to wrap my head around the problem and any info helps.

-Ed

Nick Lewycky

Jul 17, 2015, 2:21:30 AM
to e...@modk.it, LLVM Developers Mailing List
e...@modk.it wrote:
>
> Is there a reason why LLVM's link-time optimization won't work for you?
>
> http://llvm.org/docs/GoldPlugin.html
> http://llvm.org/docs/LinkTimeOptimization.html
>
>
> Well the primary motivation to move to LLVM is licensing which is why we
> also ditched binutils since we can't package gcc for iOS due to the
> GPL. So in the end the gold plugin wouldn't work for licensing reasons
> even if we can get it to work technically but thanks for the links I'm
> still trying to wrap my head around the problem and any info helps.

The right future is a world where lld performs llvm lto for you.

Until then, the technique in Gao's PDF is what I would recommend.

Nick

e...@modk.it

Jul 18, 2015, 10:50:29 AM
to Nick Lewycky, LLVM Developers Mailing List
Thanks Nick.  I've been pursuing Gao's technique but can't seem to get opt to remove obviously dead code from even the following trivial example:

int mult(int a, int b){
    return a*b;
}

int main(void){
    return 0;
}


While mult is never called it still is not removed.  I just can't seem to get opt to understand it's seeing the whole program so it can remove this globally accessible function.  What am I missing?  Seems related to the missing -fwhole-program flag in clang.  Perhaps this is not even possible?  If I can't get any answers here I may repost that specific question since I didn't list [opt] in the original question subject.


Thanks,

Ed

e...@modk.it

Jul 18, 2015, 1:00:57 PM
to Nick Lewycky, LLVM Developers Mailing List
After digging a bit more it seems we can achieve the same as gcc's -fwhole-program by simply marking the mult function as "static" which is all -fwhole-program does anyway.  From https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

 -fwhole-program
Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers.

So we can accomplish that for now with a simple pass on the source.  But that had me thinking: how do we accomplish the same for unused C++ classes or member functions within classes?  I figured we could accomplish that by changing the linkage type within the LLVM IR.  But it turns out these already get linkonce_odr linkage.  http://llvm.org/docs/LangRef.html states "Unreferenced linkonce globals are allowed to be discarded".

class num{
  private:
    int number;
  public:
    num(int n):number(n){}
    int mult(int other){
        return number*other;
    }
};

int main(void){
    return 0;
}

If I compile the above to LLVM IR, there is actually no trace of the num class, which baffles me: what if I were compiling a library and the library consumer needed this class?

Either way with this knowledge I think we can get the results we're looking for in the short term and will follow up if we find or come up with anything that could be generally useful to others.

Thanks,
Ed

Nick Lewycky

Jul 19, 2015, 12:31:50 AM
to e...@modk.it, LLVM Developers Mailing List
e...@modk.it wrote:
> After digging a bit more it seems we can achieve the same as gcc's
> -fwhole-program by simply marking the mult function as "static" which is
> all -fwhole-program does anyway. From
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html:
>
> /-fwhole-program/
> /Assume that the current compilation unit represents the whole
> program being compiled. All public functions and variables with the
> exception of |main| and those merged by
> attribute |externally_visible| become static functions and in effect
> are optimized more aggressively by interprocedural optimizers. /
>
> So we can accomplish that for now with a simple pass on the source.

The pass is called "-internalize". You can control its operation a bit
with the -internalize-public-api-file=<filename> and
-internalize-public-api-list=<list> flags.
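
As a rough illustration of how that pass might be tried on a single translation unit, here is a minimal sketch. The flags -internalize and -internalize-public-api-list are the ones named above; everything else in the commented command lines (the file names, -Os, -globaldce) and the entry point name app_entry are assumptions made for the example, and opt's command-line syntax has changed across LLVM releases, so treat the commands as a starting point rather than a recipe.

```cpp
// example.cpp (hypothetical file name)
//
// Assumed command lines, in the style of 2015-era LLVM tools:
//   clang++ -c -emit-llvm -Os example.cpp -o example.bc
//   opt -internalize -internalize-public-api-list=app_entry \
//       -globaldce example.bc -o example.opt.bc

// External linkage: without further information the optimizer has to
// assume another object file may call this, so it is kept even though
// nothing in this file references it.
int mult(int a, int b) {
    return a * b;
}

// Hypothetical entry point, standing in for main or a reset handler.
// extern "C" keeps the symbol name unmangled, so it can be listed
// verbatim in the public API list.
extern "C" int app_entry(void) {
    return 0;
}

// With app_entry as the only listed public symbol, internalize should
// switch mult to internal linkage, after which a dead-global
// elimination pass is free to delete it.
```

Any symbol that code outside the bitcode still needs (interrupt handlers, symbols referenced from assembly or a linker script) would also have to appear in the list, otherwise it is eligible for removal as well.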

> But
> that had me thinking, how do we accomplish the same for unused C++
> classes or member functions within classes. I figured we could
> accomplish that by changing the linkage type within the llvm IR. But it
> turns out these already get linkonce_odr linkage.
> http://llvm.org/docs/LangRef.html states "Unreferenced linkonce globals
> are allowed to be discarded"

The answer is still internalize. Don't include their names in the list
of public APIs and they'll be switched to 'internal' linkage, then
deleted by the LTO passes.

>
> class num{
>   private:
>     int number;
>   public:
>     num(int n):number(n){}
>     int mult(int other){
>         return number*other;
>     }
> };
>
> int main(void){
>     return 0;
> }
>
> If I compile the above to LLVM IR there is actually no trace of the num
> class which kind of baffles me because what if I was compiling a library
> and this class was needed in the library consumer?

There is no representation of a class in llvm or .o files. Instead,
there's a plain struct which represents the typed memory (ie., storage
for the non-static data members only), plus a pile of functions which
take a pointer to that memory as their first argument (the member
functions taking their 'this' argument).
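
To make that description concrete, here is a small sketch that writes the same class twice: once as C++ source and once by hand in roughly the shape the description above gives (a storage-only struct plus free functions taking the object pointer first). The names num_storage, num_init, num_mult, via_class and via_struct are made up for this example; the real symbols are mangled names chosen by the compiler.

```cpp
// The class as written in C++.
class num {
    int number;
public:
    explicit num(int n) : number(n) {}
    int mult(int other) const { return number * other; }
};

// Hand-written approximation of what is left after lowering:
// a plain struct holding only the non-static data members...
struct num_storage {
    int number;
};

// ...and ordinary functions that take the object's address as their
// first argument, playing the role of 'this'.
void num_init(num_storage *self, int n) {
    self->number = n;
}

int num_mult(const num_storage *self, int other) {
    return self->number * other;
}

// Helpers that run the same computation through both forms.
int via_class(int n, int other) {
    return num(n).mult(other);
}

int via_struct(int n, int other) {
    num_storage s;
    num_init(&s, n);
    return num_mult(&s, other);
}
```

Both helpers compute the same product, which is the point: once the member functions are unused, nothing class-shaped remains that the optimizer would need to keep.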

Note that 'static' functions also just change the linkage to internal.
So does putting code in an anonymous namespace. The main thing you get
out of linker integration into LLVM LTO is that the linker can look at
the pile of non-llvm code and determine which symbols are required by
the rest of the system and compute that public API list for internalize.
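
A short sketch of the three cases, for comparison. The function names are invented for the example; the linkage noted in each comment follows from the C++ rules described above.

```cpp
// Internal linkage via 'static': invisible outside this file.
static int mult_static(int a, int b) {
    return a * b;
}

namespace {
// Internal linkage via an anonymous namespace: same effect as 'static'.
int mult_anon(int a, int b) {
    return a * b;
}
} // namespace

// External linkage: other object files may reference it, so it must be
// kept unless something (such as internalize) says otherwise.
int mult_public(int a, int b) {
    return a * b;
}

// Uses all three so the example compiles without unused-function
// warnings. Remove the calls and the two internal functions become
// removable without any whole-program knowledge.
int sum_of_products(int a, int b) {
    return mult_static(a, b) + mult_anon(a, b) + mult_public(a, b);
}
```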

Nick

> Either way with this knowledge I think we can get the results we're
> looking for in the short term and will follow up if we find or come up
> with anything that could be generally useful to others.
>
> Thanks,
> Ed

e...@modk.it

Jul 19, 2015, 11:22:06 AM
to Nick Lewycky, LLVM Developers Mailing List
Nick, 
 
>> that had me thinking, how do we accomplish the same for unused C++
>> classes or member functions within classes.  I figured we could
>> accomplish that by changing the linkage type within the llvm IR.  But it
>> turns out these already get linkonce_odr linkage.
>> http://llvm.org/docs/LangRef.html states "Unreferenced linkonce globals
>> are allowed to be discarded"
>
> The answer is still internalize. Don't include their names in the list of public APIs and they'll be switched to 'internal' linkage, then deleted by the LTO passes.

Thanks. Someone else suggested this, but I didn't realize it affected the link step rather than the opt step, so I thought it wasn't working.
I'm looking at this now, but to be clear: is this known to work with LLVM's lld, or just GNU ld? Your previous answer, "the right future is a world where lld performs llvm lto for you", had me thinking not to expect any LTO from lld, which is why I've been focused on opt.

> There is no representation of a class in llvm or .o files. Instead, there's a plain struct which represents the typed memory (ie., storage for the non-static data members only), plus a pile of functions which take a pointer to that memory as their first argument (the member functions taking their 'this' argument).

Got it. You do see a lot of class.xyz references in the LLVM assembly, so it's clear what they represent in the C++ code. But I came to C++ from C, so I always think of classes that way anyway ;)

 
> Note that 'static' functions also just change the linkage to internal. So does putting code in an anonymous namespace. The main thing you get out of linker integration into LLVM LTO is that the linker can look at the pile of non-llvm code and determine which symbols are required by the rest of the system and compute that public API list for internalize.

What surprised me was the effect "inline" has on linkage in C++, and the fact that defining a member function or constructor within the class body makes it inline.  While I don't understand the choice of the name inline semantically, it makes sense that member functions defined within the class body can be removed when not referenced locally, since a library would provide a header with the declarations and put the definitions in separate C++ files.  (How the compiler knew to discard all traces of an unused class without knowledge of the whole program is what baffled me earlier, but it makes sense now.)
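
To spell out the distinction I mean, here is a small sketch. The add member is something I've added purely for contrast; it is not part of the earlier example.

```cpp
class num {
    int number;
public:
    // Defined inside the class body, so implicitly inline. The compiler
    // only emits these if this translation unit uses them, and they get
    // linkonce_odr linkage in the LLVM IR when it does.
    explicit num(int n) : number(n) {}
    int mult(int other) const { return number * other; }

    // Only declared here; the definition lives outside the class body.
    int add(int other) const;
};

// Defined outside the class body without 'inline': an ordinary function
// with external linkage, emitted into this object file whether or not
// anything here calls it.
int num::add(int other) const {
    return number + other;
}
```

So in a file that never touches num, the constructor and mult leave no trace in the IR, while add would still be emitted and would need internalize (or static-like treatment) to be dropped.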

Thanks,
Ed