Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone's list though, I realize)
Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?
This might be slightly sub-optimal, due to, say, the namespace being foreign to that CU. But it's how we do types currently, I think? So at least it'd be consistent and probably cheap enough/better.
I should see the IR metadata in both cases to really know how it worked before, but it seems that to be able to merge the two subprogram definitions when linking, we’d need to be able to have a list of CU per subprogram instead of a single one, right?
> Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?
Looks like this should work as well, but I don’t enough about the way we emit Dwarf...
—
Mehdi
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Trying to wrap my brain around this, so a few questions below. =)
Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone's list though, I realize)
Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?I can see how this would be done in LTO where the compiler has full visibility. For ThinLTO presumably we would need to do some index-based marking? Can we at least do something when we import an inlined SP and drop it since we know it is defined elsewhere?
—
Mehdi
Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone's list though, I realize)
Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?I can see how this would be done in LTO where the compiler has full visibility. For ThinLTO presumably we would need to do some index-based marking? Can we at least do something when we import an inlined SP and drop it since we know it is defined elsewhere?Complete visibility isn't required to benefit here - and unfortunately there's nothing fancier (that I know of) that we can do to avoid emitting one definition of each used inline function in each thinlto object file we produce (we can't say "oh, the name of the function, its mangled name, the names and types of its parameters are over in that other object file/somewhere else" - but we can avoid emitting those descriptions in each /CU/ that uses the inlined function within a single ThinLTO object)
I can provide some more thorough examples if that'd be helpful :)
Thanks,Teresa
This might be slightly sub-optimal, due to, say, the namespace being foreign to that CU. But it's how we do types currently, I think? So at least it'd be consistent and probably cheap enough/better.
Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone's list though, I realize)
Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?I can see how this would be done in LTO where the compiler has full visibility. For ThinLTO presumably we would need to do some index-based marking? Can we at least do something when we import an inlined SP and drop it since we know it is defined elsewhere?Complete visibility isn't required to benefit here - and unfortunately there's nothing fancier (that I know of) that we can do to avoid emitting one definition of each used inline function in each thinlto object file we produce (we can't say "oh, the name of the function, its mangled name, the names and types of its parameters are over in that other object file/somewhere else" - but we can avoid emitting those descriptions in each /CU/ that uses the inlined function within a single ThinLTO object)
I can provide some more thorough examples if that'd be helpful :)Ok, I think I understand. This is only emitting once per object file, which with ThinLTO can contain multiple CUs due to importing. But then with full LTO it sounds like we would be in even better shape, since it has a single module with all the CUs?
Any thoughts? I imagine this is probably worth a reasonable amount of savings in an optimized build. Not huge, but not nothing. (probably not the top of anyone's list though, I realize)
Should we remove the CU link from a non-internal linkage subprogram (& this may have an effect on the GV representation issue originally being discussed) and just emit it into whichever CU happens to need it first?I can see how this would be done in LTO where the compiler has full visibility. For ThinLTO presumably we would need to do some index-based marking? Can we at least do something when we import an inlined SP and drop it since we know it is defined elsewhere?Complete visibility isn't required to benefit here - and unfortunately there's nothing fancier (that I know of) that we can do to avoid emitting one definition of each used inline function in each thinlto object file we produce (we can't say "oh, the name of the function, its mangled name, the names and types of its parameters are over in that other object file/somewhere else" - but we can avoid emitting those descriptions in each /CU/ that uses the inlined function within a single ThinLTO object)
I can provide some more thorough examples if that'd be helpful :)Ok, I think I understand. This is only emitting once per object file, which with ThinLTO can contain multiple CUs due to importing. But then with full LTO it sounds like we would be in even better shape, since it has a single module with all the CUs?Right - currently we emit once per CU, but with a change in format we could emit once per object file - which hurts ThinLTO over non-LTO (because ThinLTO produces more CUs (due to imports) per object file) and is neutral for full LTO (since it produces the same number of CUs, just in one object file).
Improving this representation to produce once per object would help get ThinLTO back what it's currently paying - and improve full LTO further than its current position in this regard.
But the gains might not be major/significant - I've done nothing to assess that, just observing that it is suboptimal.
Thanks for asking/helping me explain it further, hopefully this is more descriptive.
- DaveTeresaThanks,Teresa
This might be slightly sub-optimal, due to, say, the namespace being foreign to that CU. But it's how we do types currently, I think? So at least it'd be consistent and probably cheap enough/better.
// !dbgFunction -> DISubprogram// Declaration (optional, only for member functions).DISubprogramDefn -> DISubprogramDecl// Variables:DISubprogramDefn -> MDNode -> DILocalVariable// Terminator for scope chains:DILocalVariable [-> DILocalScope [-> ...]] -> DISubprogramDefnDILocation [-> DILocalScope [-> ...]] -> DISubprogramDefnDICompositeType -> DISubprogramDefn // function-local type// Elements (member functions):DICompositeType -> MDNode -> DISubprogramDeclDISubprogramDecl -> DICompositeTypeDISubprogramDefn -> DICompositeType
A few disjoint thoughts; sorry they're so delayed (I skimmed the responses below, and I think these are still relevant/not covered elsewhere).Firstly, why *should* DISubprogram definitions be distinct? There were two reasons this was valuable (this was from before there was a cu: link).- It helped to fix long-standing bugs in the IRLinker, where uniqued-DISubprograms in different compile units would coalesce.
Now that the Function <-> DISubprogram connection has been reversed, I'm not entirely sure it's still a necessary precaution.- It broke a several uniquing cycles involving DISubprogram. The existence of the uniquing cycles meant that affected DISubprograms weren't really uniqued, anyway (since we don't bother to solve graph isomorphism); and breaking the cycles sped up the IRLinker and reduced memory waste.
Making DISubprogram uniqued again would either have no real effect (because distinct nodes they reference aren't uniqued anymore either) or bring back uniquing cycles (which would have effects... but not the desired one).But there ought to be a way to get coalescing in the right places. We just need to ensure that the uniqued/coalesced parts of the graph:- are acyclic, and- reference only coalesced nodes (note that leaves can happily include the strangely-coalesced distinct DICompositeType beast I created).
On Fri, Dec 23, 2016 at 11:47 AM Duncan P. N. Exon Smith <dexon...@apple.com> wrote:A few disjoint thoughts; sorry they're so delayed (I skimmed the responses below, and I think these are still relevant/not covered elsewhere).Firstly, why *should* DISubprogram definitions be distinct? There were two reasons this was valuable (this was from before there was a cu: link).- It helped to fix long-standing bugs in the IRLinker, where uniqued-DISubprograms in different compile units would coalesce.Any idea why that ^ was a problem/bugs?Now that the Function <-> DISubprogram connection has been reversed, I'm not entirely sure it's still a necessary precaution.- It broke a several uniquing cycles involving DISubprogram. The existence of the uniquing cycles meant that affected DISubprograms weren't really uniqued, anyway (since we don't bother to solve graph isomorphism); and breaking the cycles sped up the IRLinker and reduced memory waste.
Ah, fair point - my trivial example didn't have any local variables, so didn't hit this problem. They do indeed form cycles and I haven't tested, but seems totally reasonable/likely that that would break my simple example because cycles break uniquing and all that.
Making DISubprogram uniqued again would either have no real effect (because distinct nodes they reference aren't uniqued anymore either) or bring back uniquing cycles (which would have effects... but not the desired one).But there ought to be a way to get coalescing in the right places. We just need to ensure that the uniqued/coalesced parts of the graph:- are acyclic, and- reference only coalesced nodes (note that leaves can happily include the strangely-coalesced distinct DICompositeType beast I created).
Perhaps before I dive into your suggestion (which, it seems, is to separate out the definition but let the declaration/locals form cycles
- that would at least be dropped because the definition would only be kept due to it being referenced from the llvm::Function), what about something simpler:
What if scope chains didn't go all the way to the top (the subprogram) but stopped short of that?
Now, granted - this was part of the great assertion I added several years back that caught all kinds of silliness - mostly around inlined call sites not having debugloc and leading to scope chains that lead to distinct roots. But the alternative to having and maintaining an invariant - is just to make things true by construction. All debug info scope chains within a function with an associated DISubprogram have an implicit last element of that DISubprogram?
That would mean it'd be harder to catch the bugs I found
On Fri, Dec 23, 2016 at 11:47 AM Duncan P. N. Exon Smith <dexon...@apple.com> wrote:A few disjoint thoughts; sorry they're so delayed (I skimmed the responses below, and I think these are still relevant/not covered elsewhere).Firstly, why *should* DISubprogram definitions be distinct? There were two reasons this was valuable (this was from before there was a cu: link).- It helped to fix long-standing bugs in the IRLinker, where uniqued-DISubprograms in different compile units would coalesce.Any idea why that ^ was a problem/bugs?Now that the Function <-> DISubprogram connection has been reversed, I'm not entirely sure it's still a necessary precaution.- It broke a several uniquing cycles involving DISubprogram. The existence of the uniquing cycles meant that affected DISubprograms weren't really uniqued, anyway (since we don't bother to solve graph isomorphism); and breaking the cycles sped up the IRLinker and reduced memory waste.
Ah, fair point - my trivial example didn't have any local variables, so didn't hit this problem. They do indeed form cycles and I haven't tested, but seems totally reasonable/likely that that would break my simple example because cycles break uniquing and all that.
Making DISubprogram uniqued again would either have no real effect (because distinct nodes they reference aren't uniqued anymore either) or bring back uniquing cycles (which would have effects... but not the desired one).But there ought to be a way to get coalescing in the right places. We just need to ensure that the uniqued/coalesced parts of the graph:- are acyclic, and- reference only coalesced nodes (note that leaves can happily include the strangely-coalesced distinct DICompositeType beast I created).
Perhaps before I dive into your suggestion (which, it seems, is to separate out the definition but let the declaration/locals form cyclesNo (new) cycles; in fact this breaks some.- the Function itself references the defn and the locals;- the locals and the defn reference (transitively) the decl; and- the decl doesn't reference any of those.The only cycle is the decl <-> composite type one that already exists (and that odr magic (mostly) fixes). Maybe that one could be removed somehow too, but it isn't directly relevant to the problem you're looking at.
(Note that this is somewhat inspired by Reid's point in his recent debug info talk, that CodeView avoids type cycles by breaking types into two records.)
- that would at least be dropped because the definition would only be kept due to it being referenced from the llvm::Function), what about something simpler:
What if scope chains didn't go all the way to the top (the subprogram) but stopped short of that?
Now, granted - this was part of the great assertion I added several years back that caught all kinds of silliness - mostly around inlined call sites not having debugloc and leading to scope chains that lead to distinct roots. But the alternative to having and maintaining an invariant - is just to make things true by construction. All debug info scope chains within a function with an associated DISubprogram have an implicit last element of that DISubprogram?That seems reasonable, I think, now that the function points directly at its subprogram.I'm not sure it's simpler to get right though. Dropping the scope runs the risk of coalescing things incorrectly. I imagine the backend relies on pointer distinctness of two local variables called "a" that are in (or are parameters of) unrelated subprograms. I think in some situations they could be coalesced incorrectly if they had no scope.
$ cat a.cpp
#include "a.h"
void f3() { f2(); }
$ cat b.cpp
#include "a.h"
void f3();
void f1() { }
int main() {
f3();
f2();
}
$ cat a.h
void f1();
struct t1 { int x; };
inline __attribute__((always_inline)) void f2(t1 = t1()) { f1(); }
$ $ clang++-tot -fuse-ld=lld -flto a.cpp b.cpp -g && llvm-dwarfdump-tot a.out -debug-info | grep "DW_AT_name\|DW_TAG_\|DW_AT_type" | sed -e "s/^............//"
DW_TAG_compile_unit
DW_AT_name ("a.cpp")
DW_TAG_subprogram
DW_AT_name ("f2")
DW_TAG_formal_parameter
DW_AT_type (0x0000003e "t1")
DW_TAG_structure_type
DW_AT_name ("t1")
DW_TAG_member
DW_AT_name ("x")
DW_AT_type (0x00000054 "int")
...
DW_TAG_compile_unit
DW_AT_name ("b.cpp")
DW_TAG_subprogram
DW_AT_name ("f2")
DW_TAG_formal_parameter
DW_AT_type (0x000000000000003e "t1")
...
DW_TAG_compile_unit
DW_AT_name ("a.cpp")
DW_TAG_subprogram
DW_AT_name ("f2")
DW_TAG_formal_parameter
DW_AT_type (0x0000003e "t1")
DW_TAG_structure_type
DW_AT_name ("t1")
DW_TAG_member
DW_AT_name ("x")
DW_AT_type (0x00000054 "int")
DW_TAG_base_type
DW_AT_name ("int")
DW_TAG_subprogram
DW_AT_name ("f3")
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x0000002a "_Z2f22t1")
DW_TAG_formal_parameter
DW_AT_abstract_origin (0x00000036)
DW_TAG_compile_unit
DW_AT_name ("b.cpp")
DW_TAG_subprogram
DW_AT_name ("f1")
DW_TAG_subprogram
DW_AT_name ("main")
DW_AT_type (0x0000000000000054 "int")
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x000000000000002a "_Z2f22t1")
(side note: Anyone interested in making ThinLTO home type definitions? That'd be great to reduce type duplication - would mean ThinLTO could turn off type units and/or that .o/.dwo/etc files would just generally be smaller anyway)
My recollection is that type units can end up being used for types that actually don’t take up a lot of space, and so the overhead of the type units ends up costing extra. I put a task on our internal tracker to look into this properly, but we wouldn’t mind if someone else had a run at it (if only to prove that my recollection is incorrect).
Thanks,
--paulr
who is on vacation and really shouldn’t be looking at these emails
My recollection is that type units can end up being used for types that actually don’t take up a lot of space, and so the overhead of the type units ends up costing extra. I put a task on our internal tracker to look into this properly, but we wouldn’t mind if someone else had a run at it (if only to prove that my recollection is incorrect).
Thanks,
--paulr
who is on vacation and really shouldn’t be looking at these emails
From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Reid Kleckner via llvm-dev
Sent: Friday, October 15, 2021 3:26 PM
To: David Blaikie <dbla...@gmail.com>
Cc: llvm-dev <llvm...@lists.llvm.org>
Subject: Re: [llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions
Sorry for derailing into your aside, but...
On Fri, Oct 15, 2021 at 12:07 PM David Blaikie <dbla...@gmail.com> wrote:
(side note: Anyone interested in making ThinLTO home type definitions? That'd be great to reduce type duplication - would mean ThinLTO could turn off type units and/or that .o/.dwo/etc files would just generally be smaller anyway)
Hey, I think that is a great idea! ThinLTO tends to be used in release build configurations, which tend to have debug info enabled. The size of symbols in release build config matters a lot because it is typically archived for a long time.
COFF & MachO platforms have other ways to deduplicate types (PDBs & dsymutil), but less duplicate stuff always makes things faster.
For ELF users already using type units, this optimization will only allow us to recover the overhead of type units, which I recall is significant. In our evaluation of the feature for Chrome, we found it increased the final binary size significantly: crbug.com/1031936#c4