Teresa Johnson | Software Engineer | tejo...@google.com | 408-460-2413
Thanks for the writeup! A few notes/questions below. Sorry
for the slow response.
> 1 Introduction
>
> This document describes handling of symbols that may need linkage type changes or renaming to support ThinLTO importing. This applies to the symbol both in its original module and in the module importing it.
>
> In LLVM, the GlobalValue class is used to represent function, variable and alias symbols. The Function and GlobalVariable classes are both derived from the GlobalObject class which itself is derived from GlobalValue. The GlobalAlias class is derived directly from GlobalValue. Note that LLVM GlobalValues include static variables and functions in C/C++.
>
> During ThinLTO importing of a function from another module, the symbols from that module are parsed and imported as either a declaration or definition. Note that while we are importing one function at a time from another module, we typically import all variable definitions (the only exceptions are variables with AppendingLinkage and WeakAnyLinkage, as described in Section 2.2).
>
> The following sections discuss the handling of values with the given original linkage types (in the original module). The effects on the linkage type in both the original module and in the imported copy in another module are described.
>
>
> 2 Non-Discardable Values
>
> Non-discardable values are those that cannot be discarded by a module even when the value is unreferenced, i.e. it may be referenced by another module. This includes all linkage types except local (internal or private) and linkonce.
(Isn't available_externally also discardable?)
> For these values (both variables and functions), a reference may be imported into another module without fear that its definition may be eliminated in the original module. As a result, when the definitions of these non-discardable values are imported, the imported copy may safely be eliminated after optimizations such as inlining, as a copy is guaranteed to be available in the original module. In practice we may not eliminate these imported definitions, as discussed below, depending on the linkage type handling in the imported copy.
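To make the discardable question concrete, here is a minimal sketch of the classification I have in mind. The `Linkage` enum is a hypothetical stand-in for `GlobalValue::LinkageTypes`, not the real LLVM header, and the predicate reflects my understanding that available_externally belongs with local and linkonce.

```cpp
// Hypothetical stand-in for llvm::GlobalValue::LinkageTypes (illustration only).
enum class Linkage {
  External, AvailableExternally, LinkOnceAny, LinkOnceODR,
  WeakAny, WeakODR, Appending, Internal, Private, ExternalWeak, Common
};

// True when a module may drop an unreferenced definition with this linkage.
// Assumption: available_externally is discardable, like local and linkonce.
constexpr bool isDiscardableIfUnused(Linkage L) {
  switch (L) {
  case Linkage::LinkOnceAny:
  case Linkage::LinkOnceODR:
  case Linkage::Internal:
  case Linkage::Private:
  case Linkage::AvailableExternally:
    return true;
  default:
    return false;
  }
}
```

Under this reading, "non-discardable" is everything for which the predicate returns false.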
>
> 2.2 Linkage Effects
>
> There is no change to the linkage type required in the original module for any non-discardable values that may be imported to another module.
>
> During importing, ideally all non-discardable definitions have their linkage type changed in the imported copy to AvailableExternallyLinkage. This signals to the compiler that the definition can safely be eliminated after inlining (i.e. by the new EliminateAvailableExternally pass). In practice, we do not change the linkage for all imported non-discardable defs. The linkage effects for the non-discardable linkage types are described below:
>
> 2.2.2 ExternalLinkage
>
> An imported ExternalLinkage definition can be changed to AvailableExternallyLinkage. If it is eliminated later by the EliminateAvailableExternally pass, the resulting decl becomes ExternalLinkage.
>
> 2.2.1 WeakAnyLinkage
>
> Importing a WeakAnyLinkage definition could change the result of the program as it could cause a different weak definition to be selected by the linker. WeakAnyLinkage can be specified via __attribute__ ((weak)) on a function definition, to allow overriding by a strong definition in another module. If no strong definition exists, the linker will select the first weak definition. Importing a weak definition into a different module can change the order the weak defs are seen by the linker and change the program semantics. Therefore, any WeakAnyLinkage definitions are only imported as declarations, which are given ExternalWeakLinkage. WeakAny aliases are handled similarly (imported as ExternalWeakLinkage aliases).
>
> 2.2.2 WeakODRLinkage
>
> For WeakODRLinkage, there is a guarantee that all copies will be equivalent, so the issue described above for WeakAny does not exist, and the definition can be imported. For WeakODRLinkage, the imported definition should retain the original WeakODRLinkage.
I think LinkOnceODRLinkage has the right semantics here.
> If imported as a declaration, it should instead have ExternalWeakLinkage.
I think you can get away with just ExternalLinkage for this? We
found a definition to import, so it must exist somewhere.
Moreover, thinking about where ODR comes from in C++, this:
--
template <class T> T foo() { return 0; }
extern template int foo<int>();
int bar() { return foo<int>(); }
--
creates:
--
declare i32 @_Z3fooIiET_v() #1
--
and this:
--
template <class T> T foo() { return 0; }
template int foo<int>();
--
creates:
--
define weak_odr i32 @_Z3fooIiET_v() #0 {
  ret i32 0
}
--
> WeakODRLinkage symbols cannot be marked AvailableExternallyLinkage, because if the def is later dropped (by the EliminateAvailableExternally pass), the new decl is marked ExternalLinkage. For these weak symbols, however, the correct linkage for the decl is actually ExternalWeakLinkage, so that they get treated appropriately by the linker. But the information about their original weak linkage would be gone once they were changed to AvailableExternallyLinkage. For now, since weak symbols are expected to be uncommon, we will leave these symbols with their original weak linkage, which means they will not be discardable in the imported destination. If this becomes a problem, we can investigate retaining information about the original linkage type.
I think importing the definitions as LinkOnceODRLinkage will just
do the right thing here.
> 2.2.3 AppendingLinkage
>
> This applies to special variables such as the global constructors and destructors lists. We never import these as they would get executed multiple times, which is incorrect.
>
> 2.2.4 CommonLinkage
>
> Since common symbols are always zero-initialized variables, they do not take up room. It is simplest to import these defs as common.
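To check that I am reading Section 2.2 correctly, here is a rough sketch of the rules as one table-like function. All names (`Linkage`, `Action`, `decideImport`) are hypothetical and only model the text quoted above; the `WeakODRAsLinkOnceODR` flag is there so my LinkOnceODRLinkage suggestion can be compared against the proposal.

```cpp
// Hypothetical model of the linkage types discussed in Sections 2.2 and 3.1.
enum class Linkage {
  External, AvailableExternally, LinkOnceODR,
  WeakAny, WeakODR, Appending, Common, ExternalWeak
};

enum class Action { ImportDefinition, ImportDeclaration, Skip };

struct ImportDecision {
  Action What;     // what is brought into the importing module
  Linkage InCopy;  // linkage given to the imported copy (unused for Skip)
};

// Maps the linkage in the original module to the handling of the imported copy.
constexpr ImportDecision decideImport(Linkage Orig, bool WeakODRAsLinkOnceODR) {
  switch (Orig) {
  case Linkage::External:  // may be dropped again after inlining
    return {Action::ImportDefinition, Linkage::AvailableExternally};
  case Linkage::WeakAny:   // importing a def could change the linker's pick
    return {Action::ImportDeclaration, Linkage::ExternalWeak};
  case Linkage::WeakODR:   // proposal keeps weak_odr; the flag models linkonce_odr
    return {Action::ImportDefinition,
            WeakODRAsLinkOnceODR ? Linkage::LinkOnceODR : Linkage::WeakODR};
  case Linkage::Common:
    return {Action::ImportDefinition, Linkage::Common};
  case Linkage::LinkOnceODR:  // Section 3.1: stays linkonce
    return {Action::ImportDefinition, Linkage::LinkOnceODR};
  case Linkage::Appending:    // never imported
    return {Action::Skip, Orig};
  default:                    // not covered by the quoted sections
    return {Action::Skip, Orig};
  }
}
```

With the flag set, the imported WeakODR copy becomes discardable in the importing module, which is the property the proposal gives up by keeping the original weak linkage.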
>
> 3 Linkonce Values
>
> The LinkOnceODRLinkage and LinkOnceAnyLinkage types refer to linkonce linkage, which allows merging of different globals with the same name. Unreferenced linkonce globals may also be discarded. Linkonce values include some COMDAT functions (COMDAT may also have Weak linkage) and vtable variables. For linkonce values, duplicates are allowed and the linker selects one.
>
> 3.1 Linkage Effects
>
> Since duplicates are allowed and the linkonce values are already discardable, imported linkonce values can remain linkonce in the imported copy. Any duplicate imported copy will be handled by the linker, and it may remain discardable in the importing module if it isn’t referenced after importing/inlining.
>
> Similarly, there is also no change in linkage type required in the original module.
>
> 3.2 Importing Strategy
>
> The main issue with linkonce values is that they are discardable in the original module (e.g. if all references are inlined in the original module). However, ThinLTO importing may introduce a cross-module reference to a linkonce value in the original module. Care must be taken to ensure that such a reference imported into another module is satisfied at link time by a definition somewhere. To handle this for linkonce functions, the ThinLTO importer must force-import any linkonce functions referenced by another imported function. To do this, after importing a function, the ThinLTO importer walks all newly imported operations looking for references to functions with linkonce linkage type. Any found are also imported, along with functions and variables in the same COMDAT group (note the COMDAT group must always be imported in its entirety regardless of whether it has linkonce or weak linkage). For linkonce variables, since GlobalVariable definitions are always imported when we import a function from the same module as described in the introduction, the linkonce variable definitions are therefore imported and available.
This scares me a little for linkonce -- there's a minor change to
semantics if the importing module would have linked against a
*different* definition of the same symbol -- but I'm not really
sure it matters much.
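For reference, this is how I understand the force-import walk in 3.2, as a minimal worklist sketch over a toy module. `FuncInfo`, `Module`, and `importWithLinkOnceClosure` are hypothetical names, references are modeled as plain strings, and COMDAT groups are left out entirely.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of a source module: each function records whether it is linkonce
// and which functions it references.
struct FuncInfo {
  bool IsLinkOnce;
  std::vector<std::string> Refs;
};
using Module = std::map<std::string, FuncInfo>;

// Returns every function that ends up imported when Root is requested:
// Root itself plus the transitive closure of linkonce functions it references.
std::set<std::string> importWithLinkOnceClosure(const Module &Src,
                                                const std::string &Root) {
  assert(Src.count(Root) && "root must be defined in the source module");
  std::set<std::string> Imported{Root};
  std::vector<std::string> Worklist{Root};
  while (!Worklist.empty()) {
    std::string Cur = Worklist.back();
    Worklist.pop_back();
    for (const std::string &Ref : Src.at(Cur).Refs) {
      auto It = Src.find(Ref);
      // Only linkonce definitions are force-imported; anything else stays a
      // plain reference that the original module satisfies at link time.
      if (It == Src.end() || !It->second.IsLinkOnce)
        continue;
      if (Imported.insert(Ref).second)
        Worklist.push_back(Ref);
    }
  }
  return Imported;
}
```

Note that a linkonce function reachable only through a non-imported, non-linkonce function is not pulled in, since no imported body references it.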
This promotion can change semantics in important ways if the
executable is a plugin (or a new object file, like with `ld -r`).
I think you can fix this by giving promoted statics "hidden"
visibility, instructing the linker to demote them back to where
they came from.
I think this will also fix symbol table bloat. The resulting link
will downgrade the symbols back to (effectively) InternalLinkage,
and the user has the option of -strip-symbols/etc. to effectively
downgrade everything to PrivateLinkage if there's an obscurity
concern.
> 4.3.2 Renaming
>
> Since there may be multiple static functions from different modules with the same name before importing/promotion, or a global function which already has the same name, the promoted static functions must be renamed to avoid naming conflicts. It is important that the promoted definition in the original module is given the same name as the promoted reference in the importing module, so that the reference in the importing module can be satisfied by the original module at link time.
>
> To do the renaming consistently, we can include a module-specific identifier in the new name. The plugin step in phase-2 of ThinLTO (which builds the combined function index/summary) has visibility into all modules included in the ThinLTO build. It can simply number the modules and record the assigned module ID in the function summary information (either along with each function from that module, or along with the module name which will be shared in a module string table for efficiency). Then during promotion, the module ID in the combined function index/summary can be consulted and appended to the original name, along with an LLVM-specific suffix identifying this as a promoted static.
>
> For example,
> static void foo(); // In module with ID 1
> originally has bitcode definition in module ID 1:
> define internal void @foo() // InternalLinkage
> which becomes after promotion/renaming:
> define void @foo.llvm.1() // ExternalLinkage, new suffix “.llvm.1”
>
> When we import a static definition into another module (say module ID 2), before promotion it has the definition:
> define internal void @foo() // InternalLinkage, func summary info: from Module ID 1
> which becomes after promotion/renaming:
> define available_externally void @foo.llvm.1() // AvailableExternallyLinkage, new suffix “.llvm.1”
>
> If a reference is imported but not the definition, before promotion the new declaration is:
> declare internal void @foo() // InternalLinkage, func summary info: from Module ID 1
> which becomes after promotion/renaming:
> declare void @foo.llvm.1() // ExternalLinkage, new suffix “.llvm.1”
>
> Note that the new EliminateAvailableExternally pass (under review) will change the linkage type from AvailableExternallyLinkage to ExternalLinkage on the declaration it leaves for any eliminated available_externally definitions, which is consistent with the above behavior.
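The naming scheme in the examples above boils down to something like the following sketch. `promotedName` is a hypothetical helper, and the ".llvm.<ID>" suffix is taken from the examples in the writeup, not from any existing LLVM API.

```cpp
#include <cassert>
#include <string>

// Builds the promoted name for a static from its original name and the ID
// assigned to its defining module in the combined index.
std::string promotedName(const std::string &Name, unsigned ModuleId) {
  assert(!Name.empty() && "a promoted static must have a name");
  return Name + ".llvm." + std::to_string(ModuleId);
}
```

Both the original module and every importing module have to call this with the defining module's ID (1 in the examples), never the importer's ID, or the reference will not resolve at link time.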
>
I wonder whether a prefix would be better?
All the promotion/renaming scares me. I feel like there may be
dragons here we're not aware of.
The only concrete concern I have (once you switch to hidden
visibility) is the interaction with non-LTO'd objects being linked
into the same executable, but I wonder if I'm missing something
else, too...
> <ThinLTOSymbolLinkageandRenaming.pdf>
> _______________________________________________
> LLVM Developers mailing list
> LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Nope, just thinking out loud. Suffix sounds fine to me.