[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.


Alexey Lapshin via llvm-dev

May 8, 2020, 9:18:36 AM
to llvm...@lists.llvm.org

Folks, we are working on reducing binary size and improving debug info quality.
To reduce the size of the binary we use -ffunction-sections so that unused code can be garbage collected.
When the linker does garbage collection, a lot of abandoned debug info is left behind.
Besides the inflated debug info size, we end up with overlapping address ranges and no way to tell valid ranges from garbage ones (D59553).
To resolve these two problems, we use an implementation extracted from dsymutil: https://reviews.llvm.org/D74169.
It adds a --gc-debuginfo command line option to the linker to remove obsolete debug info.
Currently, it has the following limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections, split DWARF, or ThinLTO.

The following are size/performance results for D74169:

A: -ffunction-sections --gc-sections
B: -ffunction-sections --gc-sections --gc-debuginfo
C: -ffunction-sections --gc-sections -fdebug-types-section
D: -ffunction-sections --gc-sections -gsplit-dwarf
E: -ffunction-sections --gc-sections --gc-debuginfo --compress-debug-sections=zlib

LLVM code base:
--------------------------------------------------------------
| Options |    build time   |    bin size   |    lib size    |
--------------------------------------------------------------
|    A    |    54min(100%)  |   19.0G(100%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    B    |    65min(120%)  |    9.7G( 51%) |  12.0G( 80.0%) |
--------------------------------------------------------------
|    C    |    53min( 98%)  |   12.0G( 63%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    D    |    52min( 96%)  |   12.0G( 63%) |   8.2G( 55.0%) |
--------------------------------------------------------------
|    E    |    64min(118%)  |    5.3G( 28%) |  12.0G( 80.0%) |
--------------------------------------------------------------


Clang binary:
-------------------------------------------------------------
| Options |      size      |     link time  |  used memory  |
-------------------------------------------------------------
|    A    |    1.50G(100%) |    9sec(100%)  |  9307MB(100%) |
-------------------------------------------------------------
|    B    |    0.76G( 50%) |   68sec(755%)  | 15055MB(161%) |
-------------------------------------------------------------
|    C    |    0.82G( 54%) |    8sec( 89%)  |  8402MB( 90%) |
-------------------------------------------------------------
|    D    |    0.96G( 64%) |    6sec( 67%)  |  4273MB( 46%) |
-------------------------------------------------------------
|    E    |    0.43G( 29%) |   77sec(855%)  | 15000MB(161%) |
-------------------------------------------------------------


lldb loading time:
--------------------------------------------
| Options |      time     |   used memory  |
--------------------------------------------
|    A    |  6.4sec(100%) |  1495MB(100%)  |
--------------------------------------------
|    B    |  4.0sec( 63%) |   826MB( 55%)  |
--------------------------------------------
|    C    |  3.7sec( 58%) |   877MB( 59%)  |
--------------------------------------------
|    D    |  4.3sec( 67%) |  1023MB( 69%)  |
--------------------------------------------
|    E    |  2.1sec( 33%) |   478MB( 32%)  |
--------------------------------------------

I would like to discuss the results and decide whether it is worth integrating D74169:

improvements:

1. Reduces the size of debug info(50%).
2. Resolves overlapping of address ranges(D59553).
3. Reduced size of debug info allows tools to work faster and to require less memory.

drawbacks and not implemented features:

1. Linking time increases (to 755% of the baseline for the clang binary).

  The --gc-debuginfo option is off by default, so it would affect only those who need it and explicitly specify it.

  I think the current DWARFLinker code could be optimized further to improve the performance results.

2. Support of type units.

  That could be implemented later.

3. DWARF5.

   The current DWARFEmitter/DWARFStreamer has an implementation for DWARF generation which does not support
DWARF5 (only the debug_names table). At the same time, there is already code in CodeGen/AsmPrinter/DwarfDebug.h
which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer should be rewritten using
DwarfDebug/DwarfFile, though I am not sure whether it would be easy to reuse DwarfDebug/DwarfFile.
It would probably be necessary to factor out some intermediate layer of DwarfDebug/DwarfFile.

4. split DWARF support.

   This solution does not currently work with split DWARF. But it could be useful for split DWARF in two ways:

   a) The generation of the skeleton file could be changed so that address ranges pointing to garbage
collected code are replaced with lowpc=0, highpc=0. That would solve the problem of overlapping address
ranges (D59553).

   b) An approach similar to the dsymutil implementation could be used to generate monolithic debug info
from .dwo files. That suggestion is from https://reviews.llvm.org/D74169#1888386.
      I.e., DWARFLinker could be taught to generate the same output as D74169 but with split DWARF as the source.

5. -fmodules-debuginfo

   That problem was described in this review: https://reviews.llvm.org/D54747#1505462. Currently, DWARFLinker/dsymutil has the same problem. It could be solved using the fact that DWARFLinker analyzes debug info: it could recognize debug info generated for a module and keep it (compile units containing debug info for modules do not have low_pc, high_pc).

6. -flto=thin

   That problem was described in this review: https://reviews.llvm.org/D54747#1503720. It also exists in the current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could probably be fixed by avoiding generation of such incomplete declarations during ThinLTO or, alternatively, DWARFLinker could recognize such a situation and copy the missing type declaration.

=======================================================================================

Debuginfo, Linker folks, what do you think about the current results and future directions?


It introduces quite a significant linking time increase (6x-8x). But it would affect only those who use that feature,
so users will be able to decide whether that linking time increase is acceptable or not.
Resolving all of points 1-6 is a significant amount of work. But as a result, debug info becomes more correct and more compact.

Do you think it would be good to integrate it and start working on improving it?

Thank you, Alexey.




James Henderson via llvm-dev

May 11, 2020, 4:25:44 AM
to Alexey Lapshin, llvm...@lists.llvm.org
Hi Alexey,

Regarding the link performance timings, have you tried profiling to see if there are any obvious performance improvements that could be made? A slowdown of 7x seems like an awfully large amount given what this should be doing after all. Also, do you have an idea whether the slowdown is linear, exponential, etc. with respect to the size?

The problem is that if it is opt-in, but the link time cost is so high, it may put people off ever enabling it, which would be a shame, as the debugger load time improvements seem worthwhile having.

James

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Blaikie via llvm-dev

May 11, 2020, 4:12:22 PM
to Alexey Lapshin, llvm...@lists.llvm.org
Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)? - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.

In general, in the current state, I don't have strong feelings either way about this going in as-is with the intent to improve it to make it more viable - or some of that work being done out-of-tree until it's a more viable performance tradeoff. Mostly happy to leave that up to folks more involved with lld.

A couple of minor points...

On Fri, May 8, 2020 at 6:18 AM Alexey Lapshin via llvm-dev <llvm...@lists.llvm.org> wrote:

Folks, we work on optimization of binary size and improvement of debug info quality.
To reduce the size of the binary we use -ffunction-sections so that unused code would be garbage collected.
When the linker does garbage collection, a lot of abandoned debug info is left behind.
Besides inflated debug info size, we ended up with overlapping address ranges and no way to say valid vs garbage ranges(D59553).
To resolve these two problems, we use implementation extracted from dsymutil https://reviews.llvm.org/D74169.
It adds --gc-debuginfo command line option to the linker to remove obsolete debug info.
Currently, it has the following limitations: does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types,


These last 3 ^ are all the same thing, FWIW. (well, in DWARFv5 they go in debug_info, but it's the same feature)
 

multiple .debug_info sections, split DWARF, thin lto.

Following are size/performance results for the D74169:

A: -ffunction-sections --gc-sections
B: -ffunction-sections --gc-sections --gc-debuginfo
C: -ffunction-sections --gc-sections -fdebug-types-section

 ^ not sure of the point of testing/showing comparisons with a situation that's currently unsupported
Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to get the linking improvements. 


3. DWARF5.

   Current DWARFEmitter/DWARFStreamer has an implementation for DWARF generation, which does not support
DWARF5(only debug_names table). At the same time, there already exists code in CodeGen/AsmPrinter/DwarfDebug.h,
which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer should be rewritten using
DwarfDebug/DwarfFile. Though I am not sure whether it would be easy to re-use DwarfDebug/DwarfFile.
It would probably be necessary to separate some intermediate level of DwarfDebug/DwarfFile.

4. split DWARF support.

   This solution does not work with split DWARF currently. But it could be useful for the split dwarf in two ways:

   a) The generation of skeleton file could be changed in such a way that address ranges pointing to garbage
collected code would be replaced with lowpc=0, highpc=0. That would solve the problem of overlapping address
ranges(D59553).

This wouldn't/couldn't completely address the issue - because some address ranges would be in the .dwo files the linker can't see - and they'd still end up with the interesting address ranges.


   b) The approach similar to dsymutil implementation could be used to generate monolithic debuginfo created
from .dwo files. That suggestion is from - https://reviews.llvm.org/D74169#1888386.
      i.e., DWARFLinker could be taught to generate the same output as D74169 but for split DWARF as the source.

5. -fmodules-debuginfo

   That problem was described in this review - https://reviews.llvm.org/D54747#1505462 . Currently, DWARFLinker/dsymutil has the same problem. It could be solved using the fact that DWARFLinker analyzes debuginfo. It could recognize debug info generated for the module and keep it(compile units containing debug info for modules do not have low_pc, high_pc).

6. -flto=thin

   That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could probably be fixed by avoiding generation of such incomplete declaration during thinlto,

That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type definition is known to exist in another Module, etc)

I don't know if it's a problem since that patch was reverted.

or, alternatively, DWARFLinker could recognize such situation and copy missed type declaration.





Alexey Lapshin via llvm-dev

May 11, 2020, 5:07:01 PM
to jh737...@my.bristol.ac.uk, llvm...@lists.llvm.org

>Hi Alexey,


Hi James, Thank you for your comments. Please, find my answers below:

>Regarding the link performance timings, have you tried profiling to see if there are any obvious performance improvements that could be made? A slow down of 7x seems like an awfully large amount given what this should be doing after all.

I do not see any "easy to fix" alternatives. But there are some possibilities to improve performance:

1. ~10% improvement could probably be achieved by optimizing string pools
   (NonRelocatableStringpool/DwarfStringPool).

   Measurements show that ~10 sec is spent in llvm::StringMapImpl::LookupBucketFor(). The problem
   is that the same strings are added to the string pool again and again. Every attribute with
   a string value is analyzed (its hash is calculated) and looked up in the string pool, even if
   the string already lives in a string table (DW_FORM_strp, DW_FORM_strx).
   The process could be optimized for string tables: if a string from the string table has been
   accessed previously, the table would keep a reference into the string pool. This would eliminate
   a lot of string pool searches.

2. ~20-30% improvement by processing each object file in parallel.

   Currently, all object files are analyzed sequentially and cloned sequentially;
   cloning is started in parallel with analyzing. That scheme could be changed:
   analyzing and cloning could be done in parallel for each object file.
   That requires refactoring DWARFLinker and making the string pools and DeclContextTree
   thread-safe.

3. ~10-20% improvement by supporting type units.

   Currently, dsymutil/DWARFLinker does not support type units. If type units were supported, the "analyzing" step could be skipped for a significant part of the debug info data. This would save time.

4. ~2-3% improvement could probably be achieved by optimizing DWARF parser classes.
   Following is a list of ideas:
   
   https://reviews.llvm.org/D78672#inline-720056
   https://reviews.llvm.org/D78672#2000012
   https://reviews.llvm.org/D78672#2000363.
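
To make idea 1 (string pools) a bit more concrete, here is a minimal sketch in Python rather than in the actual DWARFLinker code. All names (StringPool, CachedStringTable, translate) are hypothetical and only model the idea: each input string table remembers, per input offset, where the string landed in the output pool, so a string that is referenced many times is hashed only once. In C++ the per-offset cache could be a plain array or map keyed by offset; that detail is an assumption too.

```python
class StringPool:
    """Toy output string pool: one offset per distinct string."""

    def __init__(self):
        self._offsets = {}   # string -> offset in the output string table
        self._size = 0
        self.lookups = 0     # counts hash-and-search operations on strings

    def add(self, s):
        self.lookups += 1    # stands in for the cost of LookupBucketFor()
        off = self._offsets.get(s)
        if off is None:
            off = self._size
            self._offsets[s] = off
            self._size += len(s.encode("utf-8")) + 1   # NUL terminator
        return off


class CachedStringTable:
    """Hypothetical wrapper around one input string table (.debug_str)."""

    def __init__(self, strings_by_offset, pool):
        self._strings = strings_by_offset   # input offset -> string
        self._pool = pool
        self._seen = {}                     # input offset -> output offset

    def translate(self, in_offset):
        out = self._seen.get(in_offset)
        if out is None:                     # first access: hash the string once
            out = self._pool.add(self._strings[in_offset])
            self._seen[in_offset] = out
        return out


pool = StringPool()
table = CachedStringTable({0: "int", 4: "Foo"}, pool)
# Five DW_FORM_strp-style references, but only two distinct input offsets.
outputs = [table.translate(off) for off in (0, 4, 0, 0, 4)]
```

With the cache, the five references above cost two string pool lookups instead of five; how much that saves in the real linker would have to be measured.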

>Also, do you have an idea whether the slow down is exponential for the size/linear etc?

It is linear. The following is the data for different runs (output size is the size of the overall binary):

---------------------------------------
| linking time, sec | Output size, MB |
---------------------------------------
|         4         |        64       |
|         5         |        79       |
|        18         |       211       |
|        25         |       308       |
|        29         |       356       |
|        51         |       526       |
|        72         |       788       |
---------------------------------------
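
One way to sanity-check the "linear" claim is an ordinary least-squares fit over the table above. This is only a sketch over the seven data points listed; fit_line is a hypothetical helper, not part of any tool mentioned in this thread.

```python
# Data copied from the table above.
sizes_mb = [64, 79, 211, 308, 356, 526, 788]
link_sec = [4, 5, 18, 25, 29, 51, 72]


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(sizes_mb, link_sec)
print(f"slope={slope:.4f} sec/MB, intercept={intercept:.2f} sec, R^2={r_squared:.4f}")
```

For this data the fit gives a slope of roughly 0.1 sec per MB of output with R^2 above 0.99, which is consistent with linear growth over the measured range.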

>The problem is that if it is opt-in, but the link time cost is so high, it may put people off ever enabling it, which would be a shame, as the debugger load time improvements seem worthwhile having.

On the other hand, integrating D74169 allows us to do things iteratively. Doing the above performance optimizations would require significant time, and so would implementing DWARF5 support. It would take much longer to implement the whole thing at once. Also, if D74169 were integrated, additional people could probably join that work. I think the LLVM developer policy encourages splitting work into smaller pieces and integrating them iteratively.

Thank you, Alexey.

>James

Alexey Lapshin via llvm-dev

May 13, 2020, 3:37:10 PM
to David Blaikie, llvm...@lists.llvm.org

Hi David, excuse me for the delayed answer. It took some time to prepare. Please find the answers below...


>Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?

> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.


There are two general requirements:
 1) Remove (or clean) invalid debug info.
 2) Optimize the DWARF size.

Specifics of our users:
 - an embedded platform which uses 0 as the start of the .text section.
 - a custom toolset which does not support all features yet (e.g. split DWARF).
 - tolerant of the link-time increase.
 - need a useful way to share debug builds.

For the first point: we have the problem "Overlapping address ranges starting from 0" (D59553).
We use a custom solution, but a general solution like D74169 would be better here.
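
To illustrate the overlap problem, here is a small sketch. It assumes that address ranges of garbage-collected functions end up starting at 0 in the linked debug info, which collides with live code when .text itself starts at 0. The function names and addresses are made up, and find_overlaps is a hypothetical helper.

```python
def find_overlaps(ranges):
    """Return name pairs whose half-open [low, high) ranges overlap.

    ranges: iterable of (name, low_pc, high_pc); empty ranges are ignored.
    """
    ordered = sorted((lo, hi, name) for name, lo, hi in ranges if lo < hi)
    overlaps = []
    for i, (lo, hi, name) in enumerate(ordered):
        for lo2, hi2, name2 in ordered[i + 1:]:
            if lo2 >= hi:          # sorted by low_pc: nothing later can overlap
                break
            overlaps.append((name, name2))
    return overlaps


ranges = [
    ("main", 0x00, 0x40),   # live code; .text starts at address 0
    ("used", 0x40, 0x60),   # live code
    ("dead", 0x00, 0x19),   # garbage-collected; assumed to resolve to 0
]
overlaps = find_overlaps(ranges)
print(overlaps)
```

A debugger looking up an address in [0x00, 0x19) cannot tell from the ranges alone whether it belongs to "main" or to the removed "dead".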

For the second point: split DWARF could be a good alternative for getting debug info of minimal size.
Still, it has drawbacks (it is not currently supported by tools, does not solve the "Overlapping address ranges"
problem, and is not very convenient to share, even using .dwp).

Thus, in the long term, D74169 looks like a good solution for us: it resolves the "Overlapping address ranges"
problem, produces a binary of minimal size, is supported by current tools, and makes debug builds easy to share
(a single binary of minimal size).

> In general, in the current state, I don't have strong feelings either way about this going in as-is with the intent to improve it to make it more viable - or some of that work being done out-of-tree until it's a more viable performance tradeoff. Mostly happy to leave that up to folks more involved with lld.
>
>A couple of minor points...

>> C: -ffunction-sections --gc-sections -fdebug-types-section
> ^ not sure of the point of testing/showing comparisons with a situation that's currently unsupported


That situation is currently supported (--gc-debuginfo is not used in this measurement).
"-fdebug-types-section" is supported functionality.
The purpose of this data is to compare the results for "-fdebug-types-section" and "--gc-debuginfo".


>>2. Support of type units.


>>  That could be implemented further.

>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
>get the linking improvements. 

But DWARFLinker should work adequately with type units, since they are already implemented.
If someone uses -fdebug-types-section, then it should work adequately together
with --gc-debuginfo (if --gc-debuginfo is accepted).
Right?

Another thing is that the idea behind type units has the potential to help a DWARF-aware linker work faster.
Currently, DWARFLinker analyzes context to understand whether types are the same or not.
But the context is known when the types are generated, so there is no need to spend time analyzing it.
If types could be compared without analyzing context, a DWARF-aware linker would work faster.
Here is just an idea (not for immediate implementation): if types were stored in some "type table"
(instead of a COMDAT section group) and could be accessed through a hash-id (like type units),
then that would be a solution requiring fewer bits to store while allowing types to be compared
by hash-id (without analyzing context).
In this case, the size increase would be small, and processing could be done faster.
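
A rough sketch of what "compare types by hash-id" could look like on the linker side. Everything here is hypothetical: type_id, TypeTable, and the idea of hashing a qualified name plus a layout string are stand-ins for whatever the producer would really hash, and the 64-bit id is only similar in spirit to a type unit signature.

```python
import hashlib


def type_id(qualified_name, layout):
    """64-bit id computed by the producer (inputs are assumptions)."""
    data = f"{qualified_name}|{layout}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class TypeTable:
    """Hypothetical linker-side table keyed by hash-id."""

    def __init__(self):
        self._types = {}

    def add(self, tid, die):
        """Keep the first copy of each type; return True if it was new."""
        if tid in self._types:
            return False        # duplicate: no context analysis needed
        self._types[tid] = die
        return True

    def __len__(self):
        return len(self._types)


table = TypeTable()
object_files = [
    [("Foo", "int a; int b;"), ("ns::Bar", "char c;")],
    [("Foo", "int a; int b;")],     # second object repeats Foo
]
added = [table.add(type_id(name, layout), (name, layout))
         for obj in object_files for name, layout in obj]
```

Here the second "Foo" is recognized as a duplicate by a single dictionary lookup on its id; whether such ids can be made both cheap and collision-safe is part of what would need discussion.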

This is just an idea and could be discussed separately from the question of integrating D74169.


>>4. split DWARF support.


>>   This solution does not work with split DWARF currently. But it could be useful for the split dwarf in two ways:
>>   a) The generation of skeleton file could be changed in such a way that address ranges pointing to garbage

>>   collected code would be replaced with lowpc=0, highpc=0. That would solve the problem of overlapping

>> address ranges(D59553).


>This wouldn't/couldn't completely address the issue - because some address ranges would be in the .dwo files the linker can't see - and they'd still end up with the interesting address ranges.

I see, Thank you. Thus it would not be a complete solution.


>> 6. -flto=thin


>>    That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in

>> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could

>> probably be fixed by avoiding generation of such incomplete declaration during thinlto,

> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
> definition is known to exist in another Module, etc)

>I don't know if it's a problem since that patch was reverted.

Yes. That patch was reverted, but this patch (D74169) has the same problem.
If D74169 were applied and --gc-debuginfo used, the structure type
definition would be removed.

DWARFLinker could handle that case - "removing definitions from some llvm Modules if the type
definition is known to exist in another Module".
I.e., DWARFLinker could replace the declaration with the definition.

But that problem could be resolved more easily when debug info is generated (probably without
a significant increase in debug info size):

Let's look at an example:

0x0000000b: DW_TAG_compile_unit
              DW_AT_low_pc      (0x0000000000201700)
              DW_AT_high_pc     (0x0000000000201719)

0x0000002a:   DW_TAG_subprogram
0x00000043:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x0000000000000086 "_Z1fv")
                  DW_AT_low_pc  (0x0000000000201700)
                  DW_AT_high_pc (0x0000000000201718)

0x00000057:       DW_TAG_variable
                    DW_AT_abstract_origin       (0x0000000000000096 "var")
0x00000065:       NULL

0x00000073: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00000080)

0x00000086:   DW_TAG_subprogram
                DW_AT_name      ("f")
                DW_AT_inline    (DW_INL_inlined)

0x00000096:     DW_TAG_variable
                  DW_AT_name    ("var")
                  DW_AT_type    (0x000000a9 "volatile Foo")
0x000000a1:     NULL

0x000000a9:   DW_TAG_volatile_type
                DW_AT_type      (0x000000ae "Foo")

0x000000ae:   DW_TAG_structure_type  
                DW_AT_name      ("Foo")
                DW_AT_declaration       (true)

0x000000c1: DW_TAG_compile_unit
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_high_pc     (0x0000000000000019)

0x000000e0:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x0000000000000019)
                DW_AT_name      ("f")

0x000000fd:     DW_TAG_variable
                  DW_AT_name    ("var")
                  DW_AT_type    (0x00000119 "volatile Foo")

0x00000119:   DW_TAG_volatile_type
                DW_AT_type      (0x0000011e "Foo")

0x0000011e:   DW_TAG_structure_type
                DW_AT_name      ("Foo")
                DW_AT_decl_line (1)

Here we have:

DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function "f".
DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function "f".
DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f" definition.

The code for function "f" was deleted. --gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
containing the "f" definition (since there is no corresponding code). But that unit holds the structure "Foo" definition
DW_TAG_structure_type(0x0000011e), which is referenced from DW_TAG_compile_unit(0x00000073)
by the declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case where the definition
was removed by ThinLTO and replaced with a declaration.
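
The situation can be modeled with a few lines of Python. This is only a sketch: the dictionaries below are a hand-written summary of the three compile units from the dump (not parsed DWARF), and orphaned_declarations and the "kept" flag are hypothetical names for "which declarations lose their only definition when dead units are dropped".

```python
def orphaned_declarations(units):
    """Find declarations in kept units whose definitions exist only in dropped units."""
    kept_defs, dropped_defs = set(), set()
    for cu in units:
        (kept_defs if cu["kept"] else dropped_defs).update(cu["definitions"])

    orphans = []
    for cu in units:
        if not cu["kept"]:
            continue
        for name in cu["declarations"]:
            if name in dropped_defs and name not in kept_defs:
                orphans.append((cu["offset"], name))
    return orphans


units = [
    # Concrete inlined instance of "f": has live code, so it is kept.
    {"offset": 0x0B, "kept": True, "definitions": [], "declarations": []},
    # Abstract instance root of "f": kept because 0x0b refers to it.
    {"offset": 0x73, "kept": True, "definitions": [], "declarations": ["Foo"]},
    # Standalone "f": its code was garbage-collected, so the unit is dropped.
    {"offset": 0xC1, "kept": False, "definitions": ["Foo"], "declarations": []},
]
orphans = orphaned_declarations(units)
print([(hex(off), name) for off, name in orphans])
```

For the example above, the declaration of "Foo" in the unit at 0x73 is reported as orphaned, which is the case a DWARF-aware linker would have to detect before dropping the unit at 0xc1.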

Would it cost too much if the type definition were not replaced with a declaration for the "abstract instance root"?
The number of concrete instances is bigger than the number of abstract instance roots.
Probably it would not be too costly to leave the definition in the abstract instance root?

Alternatively, would it cost too much if the type definition were not replaced with a declaration when the declaration references a type from an unused function? (LTO could know that the concrete function is not used.)

David Blaikie via llvm-dev

May 14, 2020, 9:34:57 PM
to Alexey Lapshin, llvm...@lists.llvm.org
On Wed, May 13, 2020 at 12:36 PM Alexey Lapshin <alap...@accesssoftek.com> wrote:

Hi David, Excuse me for delayed answer. It took some time to prepare. Please, find the answers bellow...


>Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?

> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.


There are two general requirements:
 1) Remove (or clean) invalid debug info.

Perhaps a simpler direct solution for your immediate needs might be a much narrower, and more efficient linker-DWARF-awareness feature:

With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges without parsing the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries can be referenced either directly, or by index. If all rnglists are referenced by index, then a linker could parse only the debug_rnglists section and rewrite ranges to remove any address ranges that refer to optimized-out code.

This would only be correct for rnglists that had no direct references to them (that only were referenced via the indexes) - but we could either implement it with that assumption, or could add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists via rnglistx forms/indexes". If this DWARF-aware linking would have to read the CU DIE (not all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges... but that wouldn't come up in the function-removal case, because then you'd have ranges anyway, so no need for that.

Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions ended up being laid out next to each other, the linker could coalesce their ranges together.
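
A minimal sketch of the rnglist rewriting described above, in Python. It does not parse real .debug_rnglists data: entries are modeled as (section, offset, length) triples, layout is a hypothetical map from input section to output address in which discarded sections are simply absent, and rewrite_rnglist is an invented name.

```python
def rewrite_rnglist(entries, layout):
    """Drop ranges of discarded sections, then coalesce adjacent ranges.

    entries: list of (section_name, offset_in_section, length)
    layout:  dict section_name -> output address; discarded sections are absent
    """
    resolved = []
    for section, offset, length in entries:
        base = layout.get(section)
        if base is None:            # section was garbage-collected
            continue
        resolved.append((base + offset, base + offset + length))

    merged = []
    for low, high in sorted(resolved):
        if merged and low <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


entries = [(".text.f", 0, 0x20), (".text.g", 0, 0x10), (".text.h", 0, 0x30)]
layout = {".text.f": 0x1000, ".text.h": 0x1020}   # .text.g was discarded
rewritten = rewrite_rnglist(entries, layout)
print([(hex(lo), hex(hi)) for lo, hi in rewritten])
```

In this toy layout "f" and "h" end up next to each other once "g" is gone, so the three input entries shrink to the single range [0x1000, 0x1050).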

I imagine this could be implemented with very little overhead to linking, especially compared to the overhead of full DWARF-aware linking.

Though none of this fixes Split DWARF, where the linker doesn't get a chance to see the addresses being used - but if you only want/need the CU-level ranges to be correct, this might be a viable fix, and quite efficient.
 
 2) Optimize the DWARF size.

Do your users care much about this? I imagine if they had significant DWARF size issues, they'd have significant link time issues and the kind of cost to link time this feature has would be prohibitive - but perhaps they're sharing linked binaries much more often than they're actually performing linking.
 

The specifics which our users have:
 - embedded platform which uses 0 as start of .text section.
 - custom toolset which does not support all features yet(f.e. split dwarf).
 - tolerant of the link-time increase.
 - need a useful way to share debug builds.

Sharing two files (executable and dwp) is significantly less useful than sharing one file? 
 

For the first point: we have a problem "Overlapping address ranges starting from 0"(D59553).
We use custom solution, but the general solution like D74169 would be better here.

If CU ranges are the only ones that need fixing, then I think the above solution might be as good/better - if more than CU ranges need fixing, then I think we might want to start talking about how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a specific blessed value that linkers would need to implement - because with Split DWARF there's no way to solve the non-CU addresses at the linker.
 
For the second point: split dwarf could be a good alternative to have debug info with minimal size.
Still, it has drawbacks (not supported by tools currently, does not solve the "Overlapping address ranges"
problem, not very convenient to share(even using .dwp)).

Thus in long terms, the D74169 looks to be a good solution for us: resolves "Overlapping address ranges"
problem, binary with minimal size, supported by current tools, easy to share debug build(single binary with
minimal size).

> In general, in the current state, I don't have strong feelings either way about this going in as-is with the intent to improve it to make it more viable - or some of that work being done out-of-tree until it's a more viable performance tradeoff. Mostly happy to leave that up to folks more involved with lld.
>
>A couple of minor points...

>> C: --function-sections --gc-sections --fdebug-types-section
> ^ not sure of the point of testing/showing comparisons with a situation that's currently unsupported


that situation is currently supported(--gc-debuginfo is not used in this measurement).

Ah, I was confused because it looks like/the description said it was... 
 
"--fdebug-types-section" is supported functionality.
The purpose of these data is to compare results for "--fdebug-types-section" and "--gc-debuginfo".

OK
 


>>2. Support of type units.


>>  That could be implemented further.

>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
>get the linking improvements. 

But, DWARFLinker should adequately work with type units since they are already implemented.

Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.
 
If someone uses --fdebug-types-section, then it should adequately work when used together
with --gc-debuginfo(if --gc-debuginfo would be accepted).
Right?

Another thing is that the idea behind type units has the potential to help Dwarf-aware linker to work faster.
Currently, DWARFLinker analyzes context to understand whether types are the same or not.

When you say "analyzes context" what do you mean? Usually I'd take that to mean "looks at things outside the type itself - like what namespace it's in, etc" - which, yes, it should do that, but it doesn't seem very expensive to do. But I guess you actually mean something about doing structural equivalence in some way, looking at things inside the type?
I don't follow this example - could you provide a small concrete test case I could reproduce?

Oh, I guess this is happening perhaps because ThinLTO can't know for sure that a standalone definition of 'f' won't be needed - so it produces one in case one of the inlining opportunities doesn't end up inlining. Then it turns out all calls got inlined, so the external definition wasn't needed.

Oh, you're suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker drops a CU without any code in it - even though... So far as I know, in LTO, LLVM directly references types across units if the CUs are all emitted in the same object file. (and if they weren't in the same object file - then the abstract_origin couldn't be pointing cross-CU).

I guess some basic things to say:

With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don't end up being inlined. So we know it'll be emitted (but might not be needed by the actual linker)
Any number of inline calls might exist - but we shouldn't put the type information into those, because they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to enforce the type being emitted) - and even if we forced the type information to be emitted into one object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.

So, no, I don't think there's much we can do to keep the size of object files down, while guaranteeing the type information will be emitted with the usual linker semantics.

Alexey Lapshin via llvm-dev

May 19, 2020, 10:18:04 AM
to David Blaikie, llvm...@lists.llvm.org

Hi David, please find my comments inside:


Yes, we are considering that alternative. This would resolve our problem of invalid debug info
and would work much faster. Thus, if we do not get good results for D74169, we
will implement it. Do you think it would be useful to have this solution upstream?


 
>> 2) Optimize the DWARF size.

> Do your users care much about this? I imagine if they had significant DWARF size issues,
> they'd have significant link time issues and the kind of cost to link time this feature has would
> be prohibitive - but perhaps they're sharing linked binaries much more often than they're
> actually performing linking.

Yes, they do. They also have significant link-time issues.
So the current performance results of D74169 are not really acceptable.
We hope to improve them.

 
>>The specifics which our users have:
>>  - embedded platform which uses 0 as start of .text section.
>>  - custom toolset which does not support all features yet(f.e. split dwarf).
>>  - tolerant of the link-time increase.
>>  - need a useful way to share debug builds.

> Sharing two files (executable and dwp) is significantly less useful than sharing one file? 

Probably not significantly, but yes, it looks less useful compared to D74169.
Having only two files (executable and .dwp) looks significantly better than having an executable and multiple .dwo files.
Having only one file (executable) with minimal size looks better than two files with a bigger size.

clang built with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
clang built with --gc-debuginfo takes only 0.76G for a single executable.

 
>>For the first point: we have a problem "Overlapping address ranges starting from 0"(D59553).
>>We use custom solution, but the general solution like D74169 would be better here.

> If CU ranges are the only ones that need fixing, then I think the above solution might be as
> good/better - if more than CU ranges need fixing, then I think we might want to start talking about
> how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a
> specific blessed value that linkers would need to implement - because with Split DWARF there's
> no way to solve the non-CU addresses at the linker.

I think a worthwhile choice for that signal value would be LowPC > HighPC.
That does not require additional bits in DWARF.
It would be natural to skip such address ranges since they are explicitly marked as invalid.
It could be implemented in a linker very easily. Probably, it would make sense to describe that
usage in the DWARF standard.
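
As a rough illustration of the consumer side of that idea (a sketch only, not part of D74169), assuming ranges are plain (low_pc, high_pc) pairs; the names `is_valid_range` and `live_ranges` and the sample addresses are hypothetical:

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # (low_pc, high_pc), high_pc exclusive


def is_valid_range(r: Range) -> bool:
    """A range whose low_pc exceeds its high_pc is treated as explicitly dead."""
    low_pc, high_pc = r
    return low_pc <= high_pc


def live_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Drop ranges that were marked invalid (low_pc > high_pc)."""
    return [r for r in ranges if is_valid_range(r)]


# Two functions were garbage collected; a hypothetical linker rewrote their
# ranges so that low_pc > high_pc instead of leaving them starting at 0.
ranges = [(0x1000, 0x1040), (0x1, 0x0), (0x2000, 0x2010), (0x1, 0x0)]
print([(hex(lo), hex(hi)) for lo, hi in live_ranges(ranges)])
```

The check needs no knowledge of where .text starts, so it also works on a platform where 0 is a valid code address.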

As to the addresses which are not seen by the linker(since they are in .dwo files) - yes,
they need to have another solution. Could you show an example of such a case, please?



>>>2. Support of type units.
>>>
>>>>  That could be implemented further.
>>>
>>>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
>>>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
>>>get the linking improvements. 
>>
>>But, DWARFLinker should adequately work with type units since they are already implemented.

> Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using
> a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone
> doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.

I see.

>>Another thing is that the idea behind type units has the potential to help Dwarf-aware linker to work faster.
>>Currently, DWARFLinker analyzes context to understand whether types are the same or not.

>When you say "analyzes context" what do you mean? Usually I'd take that to mean
> "looks at things outside the type itself - like what namespace it's in, etc" - which, yes,
> it should do that, but it doesn't seem very expensive to do. But I guess you actually
> mean something about doing structural equivalence in some way, looking at things inside the type?

I think it could be useful for both cases. Currently, dsymutil does only the first thing
(looking at the type name, namespace name, etc.) and does not do the second thing
(checking structural equivalence). Analyzing type names is currently quite expensive
(the search in the string pool alone takes ~10 sec out of 70 sec of overall time).
It is expensive because many things must be done to work with strings:
parse DWARF, search for and resolve relocations, compute a hash for strings,
put data into a string pool, create a fully qualified name (like namespace::function::name).
It looks like this could be optimized to require less time, but it would still be a noticeable
part of the overall time.

If dsymutil started to check for structural equivalence, the process would be even slower.
So if, instead of comparing type structure, a single hash-id were checked, this process
would also be faster.

Thus I think using a hash-id to compare types would make the current implementation faster and would
also allow DWARFLinker to handle incomplete types without massive performance degradation.

>> But the context is known when types are generated. So, no need to spend the time analyzing it.
>> If types could be compared without analyzing context, then Dwarf-aware linker would work faster.
>> That is just an idea (not for immediate implementation): If types would be stored in some "type table"
>> (instead of COMDAT section group) and could be accessed through hash-id (like type units)
>> - then it would be the solution requiring fewer bits to store but allowing to compare types
>> by hash-id (not analysing context).
>> In this case, the size increase would be small. And processing could be done faster.
>>
>> this is just an idea and could be discussed separately from the problem of integrating D74169.

>> >> 6. -flto=thin
>> >>    That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in
>> >> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could
>> >> probably be fixed by avoiding generation of such incomplete declaration during thinlto,

>> >> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
>> >> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
>> >> definition is known to exist in another Module, etc)
>> >I don't know if it's a problem since that patch was reverted.
>>
>> Yes. That patch was reverted, but this patch(D74169) has the same problem.
>> if D74169 would be applied and --gc-debuginfo used then structure type
>> definition would be removed.

>> DWARFLinker could handle that case - "removing definitions from some llvm Modules if the type
>> definition is known to exist in another Module".
>> i.e. DWARFLinker could replace the declaration with the definition.

>> But that problem could be more easily resolved when debug info is generated(probably without
>> significant increase of debug info size):

>> Here we have:

>> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function "f".
>> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function "f".
>> DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f" definition.

>> Code for function "f" was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
>> containing "f" definition (since there is no corresponding code). But it has structure "Foo" definition
>> DW_TAG_structure_type(0x0000011e) referenced from DW_TAG_compile_unit(0x00000073)
>> by declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when definition
>> was removed by thinlto and replaced with declaration.

>> Would it cost too much if type definition would not be replaced with declaration for "abstract instance root"?
>> The number of concrete instances is bigger than number of abstract instance roots.
>> Probably, it would not be too costly to leave definition in abstract instance root?
 
>> Alternatively, Would it cost too much if type definition would not be replaced with declaration when
>> declaration references type from not used function? (lto could understand that concrete function is not used).

>I don't follow this example - could you provide a small concrete test case I could reproduce?

I would provide a test case if necessary. But it looks like this issue is finally clear, and you already commented on that.


> Oh, I guess this is happening perhaps because ThinLTO can't know for sure that a standalone
> definition of 'f' won't be needed - so it produces one in case one of the inlining opportunities
> doesn't end up inlining. Then it turns out all calls got inlined, so the external definition wasn't needed.

> Oh, you're suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker
> drops a CU without any code in it - even though... So far as I know, in LTO, LLVM directly references
> types across units if the CUs are all emitted in the same object file. (and if they weren't in the same
> object file - then the abstract_origin couldn't be pointing cross-CU).

> I guess some basic things to say:

> With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don't end up
> being inlined. So we know it'll be emitted (but might not be needed by the actual linker)
> Any number of inline calls might exist - but we shouldn't put the type information into those, because
> they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
> enforce the type being emitted) - and even if we forced the type information to be emitted into one
> object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.

> So, no, I don't think there's much we can do to keep the size of object files down, while guaranteeing
> the type information will be emitted with the usual linker semantics.

Then dsymutil/DWARFLinker could be changed to handle that (though it would probably not be very efficient).
If thinlto could determine that the function ends up unused (and therefore must not hold the referenced type definition),
then this situation could be handled more effectively.

David Blaikie via llvm-dev

Jun 2, 2020, 2:24:27 PM
to Alexey Lapshin, llvm...@lists.llvm.org

A pure rnglist rewriting - I think it'd be OK to have in upstream -
again, cost/benefit/etc would have to be weighed. I'm not sure it
would save enough space to be particularly valuable beyond the
correctness issue - and it doesn't completely solve the correctness
issue for zero-address usage or low-address usage (because you could
still have overlapping subprograms inside a CU - so if you were
symbolizing you could use the correct rnglist to filter, but then go
look inside the CU only to find two subprograms that had that address
& not know which one was the correct one and which one was the
discarded one).

rnglist rewriting might be easy enough to prototype - but depends what
you want to spend your time on, I know this whole issue has been a
huge investment of your time already - but maybe this recent
revitalization of the conversation around having an explicit value in
the linker might be sufficient to address everyone's needs... *fingers
crossed*

James Henderson via llvm-dev

Jun 3, 2020, 3:48:42 AM
to David Blaikie, llvm...@lists.llvm.org
It makes me sad that the linker (via a library or otherwise) has to be "DWARF-aware" to be able to effectively handle --gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks of data kicking around.

The patching to -1 (or equivalent) is probably a good lightweight solution (though I'd love it if it could be done based on section type in the future rather than section name, but that's probably outside the realm of DWARF), as it requires only minimal understanding in the linker, but anything beyond that seems to be complicated logic that is mostly due to the structure of DWARF. Patching to -1 does feel a bit like a sticking plaster/band aid to patch over the issue rather than properly solving it too - there will still be debug data (potentially significant amounts in COMDAT-heavy objects) that the linker has to write and the debugger has to somehow know how to skip (even if it knows that -1 is special-case due to the standard being updated, it needs to get as far as the -1), which is all wasted effort.
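
To make the "minimal understanding in the linker" point concrete, here is a small sketch of both halves of the scheme. It assumes an 8-byte address size and an all-ones tombstone; the function names, section names and layout are hypothetical and do not describe lld's actual code:

```python
from typing import Dict, List, Optional

ADDRESS_SIZE = 8
TOMBSTONE = (1 << (8 * ADDRESS_SIZE)) - 1  # all ones, i.e. -1 as an unsigned address


def resolve_debug_reloc(section_addr: Dict[str, Optional[int]],
                        target: str, addend: int) -> int:
    """Return the value to write into a debug section for one relocation.

    `section_addr` maps a section name to its final address, or to None if
    the section was discarded (for example by --gc-sections or COMDAT
    deduplication).
    """
    base = section_addr[target]
    if base is None:
        return TOMBSTONE
    return base + addend


def live_entries(low_pcs: List[int]) -> List[int]:
    """Consumer side: ignore entries whose address is the tombstone."""
    return [pc for pc in low_pcs if pc != TOMBSTONE]


layout = {".text.main": 0x0, ".text.unused": None, ".text.helper": 0x40}
written = [resolve_debug_reloc(layout, name, 0) for name in layout]
print([hex(pc) for pc in live_entries(written)])
```

Note that the discarded entry is still written and still has to be read before it can be skipped, which is exactly the wasted effort described above.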

We've already seen from Alexey's prototyping, and from our own experiences with the Sony proprietary linker (which tried to rewrite .debug_line only) that deconstructing the DWARF so that it can be more optimally reassembled at link time is slow going, and will probably inevitably be however much effort is put into optimising it. For a start, given the current standards, it's impossible to know how to deconstruct it without having to parse vast amounts of DWARF, which is typically going to mean a lot more parsing work than the linker would normally have to deal with. Additionally, much of this parsing work is wasted effort, since it seems unlikely in many links that large amounts of the DWARF will be redundant. Having an option to opt-in doesn't help much there, since it just means the logic exists without most people using it, due to it not being good enough, or potentially they don't even know it exists.

I don't have particularly concrete suggestions as to how to solve the structural problems with DWARF at this point. The only thing that seems obvious to me is a more "blessed" approach to fragmentation of sections, similar to what I tried with my prototype mentioned earlier in the thread, although we'd need to figure out the previously stated performance issues. Other ideas might tie into this, like somehow sharing the various table headers a bit like CIEs in .eh_frame that could be merged by the linker - each object could have separate table header sections, which are referenced by the individual .debug_* blocks, which in turn are one per function/data piece and easily discardable/merged by the linker.

Just some thoughts.

James

Robinson, Paul via llvm-dev

Jun 3, 2020, 9:37:26 AM
to jh737...@my.bristol.ac.uk, David Blaikie, llvm...@lists.llvm.org

DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common, certainly not when talking about function code.  The overhead of a unit occurred only once per translation unit, so that expense was reasonably amortized.

 

Splitting functions into their own object-file sections and making them excludable is an evolution of compiler/linker technology that DWARF has not kept up with.  The linker-friendly solutions (COMDAT DWARF) would put function-related .debug_* contributions into a section-group along with the function .text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the content of each per-function DWARF section.  The fully DWARF-conformant solution would create one partial_unit per function, with the corresponding overhead of unit headers (especially painful in the .debug_line section).  Alternatively we fragment DWARF into sections without headers and rely on the linker to make everything look right in the linked executable; this produces .o files that are not DWARF conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers other than the linker.

 

Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the linker.

--paulr

Alexey Lapshin via llvm-dev

Jun 3, 2020, 1:27:16 PM
to jh737...@my.bristol.ac.uk, David Blaikie, Robinson, Paul, llvm...@lists.llvm.org

>DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common,
>certainly not when talking about function code.  The overhead of a unit occurred only once per
>translation unit, so that expense was reasonably amortized.
>
>Splitting functions into their own object-file sections and making them excludable is an evolution of
>compiler/linker technology that DWARF has not kept up with.  The linker-friendly solutions (COMDAT
>DWARF) would put function-related .debug_* contributions into a section-group along with the function
>.text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the
>content of each per-function DWARF section.  The fully DWARF-conformant solution would create one
>partial_unit per function, with the corresponding overhead of unit headers (especially painful in the
>.debug_line section).  Alternatively we fragment DWARF into sections without headers and rely on the
>linker to make everything look right in the linked executable; this produces .o files that are not DWARF
>conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers
>other than the linker.
>
>Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the linker.


Perhaps we could try to make DWARF easy to parse, trim, and rewrite, so that a full DWARF
parsing solution would not take too much time?

For example, the -fdebug-types-section solution uses COMDAT sections to split and deduplicate types.
That solution works quite fast. It has the already mentioned drawback of a big size
overhead (because of section header/type unit header sizes). But the fact that type units
can be identified just by hash-id (without parsing type names and type hierarchies)
allows the linker to reject duplicates quickly. Another thing is that the linker drops
duplicated COMDAT sections without any additional check. After duplicates are deleted,
the debug info is still consistent.

A DWARF-aware solution could be built on the same two principles:

1. compare types by hash-id.
2. drop duplications without analyzing contents.

If all types are put into a separate type table and have a hash-id, then it would be much easier to
deduplicate them. The idea is demonstrated here - https://reviews.llvm.org/P8164. (It still has
open questions: whether base types should be put into the type table, whether references into the type table
should be done by DW_AT_signature or just by offset, etc.) While handling that separate type table,
the DWARF-aware linker would check only the hash-id and put only one type description
with a given id into the final type table. It would also allow us to solve that -flto=thin problem -
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is a dsymutil example there),
i.e., the case when a type definition would be removed would not occur.
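
The two principles can be sketched in a few lines. This is only an illustration of the idea, not the P8164 format: the type table is modeled as a list of (hash-id, opaque bytes) pairs, and `merge_type_tables` and the sample ids are hypothetical:

```python
from typing import Dict, Iterable, List, Tuple

TypeEntry = Tuple[int, bytes]  # (hash_id, opaque type description)


def merge_type_tables(tables: Iterable[List[TypeEntry]]) -> Dict[int, bytes]:
    """Keep the first description seen for each hash-id.

    The description bytes are never parsed or compared: identity is decided
    by the hash-id alone, and later duplicates are dropped unexamined.
    """
    merged: Dict[int, bytes] = {}
    for table in tables:
        for hash_id, description in table:
            merged.setdefault(hash_id, description)
    return merged


# Two object files both describe the type with hash-id 0xAAAA.
obj_a = [(0xAAAA, b"struct Foo {...}"), (0xBBBB, b"struct Bar {...}")]
obj_b = [(0xAAAA, b"struct Foo {...}"), (0xCCCC, b"struct Baz {...}")]
final_table = merge_type_tables([obj_a, obj_b])
print(sorted(hex(h) for h in final_table))
```

The cost per type is one hash-table lookup, with no string pool or fully qualified name construction involved.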


Thank you, Alexey.

David Blaikie via llvm-dev

Jun 3, 2020, 5:31:49 PM
to Robinson, Paul, llvm...@lists.llvm.org
On Wed, Jun 3, 2020 at 6:34 AM Robinson, Paul <paul.r...@sony.com> wrote:
>
> DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common, certainly not when talking about function code. The overhead of a unit occurred only once per translation unit, so that expense was reasonably amortized.
>
>
>
> Splitting functions into their own object-file sections and making them excludable is an evolution of compiler/linker technology that DWARF has not kept up with. The linker-friendly solutions (COMDAT DWARF) would put function-related .debug_* contributions into a section-group along with the function .text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the content of each per-function DWARF section. The fully DWARF-conformant solution would create one partial_unit per function, with the corresponding overhead of unit headers (especially painful in the .debug_line section). Alternatively we fragment DWARF into sections without headers and rely on the linker to make everything look right in the linked executable; this produces .o files that are not DWARF conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers other than the linker.

"object files don't contain DWARF, but they contain stuff that the
linker will turn into DWARF" wouldn't seem like the worst thing to me
- what sort of pre-linking parsing of DWARF use cases do you have in
mind, other than for our own compiler development uses?
(notwithstanding in-object Split DWARF (where the .dwo sections would
have to remain usable without linking) or the MachO style debug
info distribution model which is similar)

But even then, I'm not sure how viable it would be - as Fangrui
pointed out on another thread about this: ELF section overhead itself
is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
rather difficult to reconstruct header-less slice-and-dicable sections
in some cases. For type information (a reduced overhead version of
-fdebug-types-section) I could see it - but for functions, they need
to refer to addresses - preferably in the debug_addr section, and
that's accessed by index, so taking chunks out of it would break other
references to it, etc... adding the header would be expensive, and how
would the CU construct its DW_AT_ranges value if that has to be sliced
and diced? Again, some amount of linker magic might solve some of
these problems - but I think there's still a lot of overhead to making
a solution that's workable with a DWARF-agnostic linker (or even with
a DWARF aware one, but in an efficient amount of time/space where it's
not only usable for small programs, or for linking when you're
shipping a final production binary, etc)

& as always, not sure how any of this would work for Split DWARF -
just a debug_addr section that has some addresses that point to
discardable functions... if we want those addresses themselves to be
discardable (so we don't have to use a tombstone value inserted by the
linker) then they'd need to be in separate debug_addr contributions
with headers, etc - the overhead just seems too high to me in all the
ways I can look at that.
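
A toy illustration of the index problem described above, with a plain list standing in for a .debug_addr contribution; the addresses, the stored index and both helper functions are hypothetical:

```python
from typing import List

# DIEs refer to .debug_addr entries by index, not by offset or relocation.
debug_addr: List[int] = [0x1000, 0x2000, 0x3000]
ref_to_third_function = 2  # index stored in some DIE


def drop_entry(table: List[int], index: int) -> List[int]:
    """Physically remove one entry, as discarding a fragment would."""
    return table[:index] + table[index + 1:]


def tombstone_entry(table: List[int], index: int,
                    tombstone: int = 0xFFFFFFFFFFFFFFFF) -> List[int]:
    """Keep the slot but overwrite it, so every other index stays valid."""
    patched = list(table)
    patched[index] = tombstone
    return patched


sliced = drop_entry(debug_addr, 1)
patched = tombstone_entry(debug_addr, 1)
# After slicing, index 2 is out of bounds; after patching it still names 0x3000.
print(ref_to_third_function < len(sliced), hex(patched[ref_to_third_function]))
```

Slicing the table would require rewriting every index that follows the removed slot, which is the kind of DWARF-aware work the tombstone approach avoids.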

David Blaikie via llvm-dev

Jun 3, 2020, 5:43:28 PM
to Alexey Lapshin, llvm...@lists.llvm.org

I think there is scope for lower-overhead type deduplication,
especially now with type units being merged into the debug_info
section. Perhaps we could drop dwo_ids and use section references to
refer to types & rely on the linker to keep those referenced sections
alive - though section references are longer than CU-relative
references. (but we need the extra length - because if the linker
deduplicates a type definition - one CU may be referencing a type very
far away, so the shorter reference might be inadequate) I don't think
the indirection through the type hash is /super/ significant to the
cost - I think it's more in the duplication of many DIEs especially
for function definitions (since the type unit sig8 system only
provides a way to reference the type - not its member functions, their
parameters, etc - so all those DIEs get duplicated in any CU that
needs to provide a definition of a member function). We could
prototype cross-unit DIE references to lower the cost of that
duplication, though rumor has it that constructor based type homing
might provide enough value to obviate the need for type units (or at
least make the overhead not worthwhile - so revisiting the overhead to
reduce it might make it worthwhile again... ).

Probably wouldn't be super hard to use LLVM's existing cross-unit DIE
referencing machinery (implemented for LTO) to refer directly to DIEs
in a type unit without using the signature system... - hmm, that'd
only work if your type unit DIEs were identical? /maybe/ ? Not sure
how that'd work if you wanted to refer into a type unit, but the type
unit got deduplicated. Might be able to rely on the linker to preserve
every unique copy of the type unit that's referenced if we phrase
things carefully - so if your compiler does produce exactly identical
type units they get deduplicated and sec_refs refer to the uniquely
preserved copy - but otherwise it preserves as many distinct copies as
needed. (I don't know enough about how that works to be sure - but I
know that this linkonce/inline function deduplication does seem to
cause the DWARF to refer to the singular function if that function is
identical, and if it isn't, then you get 0 - so there's /something/ in
the linker that can adjust for deduplicating identical duplicates... )

Robinson, Paul via llvm-dev

Jun 4, 2020, 11:28:09 AM
to David Blaikie, llvm...@lists.llvm.org


> -----Original Message-----
> From: David Blaikie <dbla...@gmail.com>
> Sent: Wednesday, June 3, 2020 5:31 PM
> To: Robinson, Paul <paul.r...@sony.com>
> Cc: jh737...@my.bristol.ac.uk; llvm...@lists.llvm.org
> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
> in lld.
>
> On Wed, Jun 3, 2020 at 6:34 AM Robinson, Paul <paul.r...@sony.com>
> wrote:
> >
> > DWARF was designed in an era when COMDAT and ICF were not a thing, or at
> > least not common, certainly not when talking about function code. The
> > overhead of a unit occurred only once per translation unit, so that
> > expense was reasonably amortized.
> >
> > Splitting functions into their own object-file sections and making them
> > excludable is an evolution of compiler/linker technology that DWARF has
> > not kept up with. The linker-friendly solutions (COMDAT DWARF) would put
> > function-related .debug_* contributions into a section-group along with
> > the function .text itself; this multiplies the total number of sections to
> > deal with, regardless of the tactics used for the content of each
> > per-function DWARF section. The fully DWARF-conformant solution would create
> > one partial_unit per function, with the corresponding overhead of unit
> > headers (especially painful in the .debug_line section). Alternatively we
> > fragment DWARF into sections without headers and rely on the linker to
> > make everything look right in the linked executable; this produces .o
> > files that are not DWARF conformant (unless we can standardize this in
> > DWARF v6) and would be a big hassle for consumers other than the linker.
>
> "object files don't contain DWARF, but they contain stuff that the
> linker will turn into DWARF" wouldn't seem like the worst thing to me
> - what sort of pre-linking parsing of DWARF use cases do you have in
> mind, other than for our own compiler development uses?

No, that wouldn't seem like the worst thing. Obviously llvm-dwarfdump
would want to be able to report what's actually happening, but indeed
all the other use-cases that come to mind are not looking at .o files.

> (notwithstanding in-object Split DWARF (where the .dwo sections would
> have to remain usable without linking) or the MachO style debug
> info distribution model which is similar)

I expect Split DWARF would be incompatible with fragments. I don't
know details about MachO but seems likely the same is true there.

> But even then, I'm not sure how viable it would be - as Fangrui
> pointed out on another thread about this: ELF section overhead itself
> is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> rather difficult to reconstruct header-less slice-and-dicable sections
> in some cases. For type information (a reduced overhead version of
> -fdebug-types-section) I could see it - but for functions, they need
> to refer to addresses - preferably in the debug_addr section, and
> that's accessed by index, so taking chunks out of it would break other
> references to it, etc... adding the header would be expensive, and how
> would the CU construct its DW_AT_ranges value if that has to be sliced
> and diced? Again, some amount of linker magic might solve some of
> these problems - but I think there's still a lot of overhead to making
> a solution that's workable with a DWARF-agnostic linker (or even with
> a DWARF aware one, but in an efficient amount of time/space where it's
> not only usable for small programs, or for linking when you're
> shipping a final production binary, etc)

The idea we have blue-skied internally would work something like this
(initially explicated in terms of the .debug_info section, then seeing
how that tactic applies to other sections):

There's a top fragment, containing the CU header and the CU DIE itself.
Linker magic makes this first in the output file.
Types also go here; certainly base types, and other file-scope types
can be included here or put into type units. (Type units aren't
fragmented, they are their own thing same as always.)
There's a matching bottom fragment, which is just the terminating NULL
for the CU DIE; linker magic makes this last in the output file.

Each function has its own fragment, which is in the same link-group
(COMDAT or whatever) as the function's .text section; that way, if the
function is discarded, so is the .debug_info fragment. Offhand I can't
think of any cases (other than DW_AT_specification, addressed below) of
references to a subprogram DIE from elsewhere, so it should be fine to
discard the entire function fragment as needed. Linker magic puts all
function fragments between the top and bottom fragments, in some
indeterminate order. Each function fragment is the usual complete
subtree, rooted in DW_TAG_subprogram. References to types are either
to type units as normal, or to types in the top fragment. Note that
these references do not require relocations; type units are by signature
as always, and for types in the top fragment, the offsets into the top
fragment are known at compile time.

Inlined functions are described as part of the function they have been
inlined into, being children of the function DIE. DW_AT_specification
refers to the abstract declaration which is in its own fragment (or the
top fragment, but that keeps the declaration from being elided if all
references go away).

If functions are inside namespaces, each function fragment will need
to have namespace DIEs around the function DIE. This adds overhead
but it's pretty small.

I hand-wave filling in the CU header's unit length. I'd expect a
relocation with a reference to the bottom fragment should be able to
compute the correct value.
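
A rough sketch of what that linker magic amounts to for .debug_info, treating fragments as opaque byte strings. It assumes 32-bit DWARF (a 4-byte unit_length that counts everything after itself) and little-endian output; `link_debug_info` and the fragment contents are hypothetical placeholders, not real DWARF:

```python
import struct
from typing import Dict, List, Set


def link_debug_info(top: bytes, functions: Dict[str, bytes], bottom: bytes,
                    discarded: Set[str]) -> bytes:
    """Concatenate top + surviving function fragments + bottom, then fix the length.

    The first 4 bytes of `top` are the unit_length placeholder; the value
    written there covers every byte of the unit after the field itself.
    """
    kept: List[bytes] = [frag for name, frag in functions.items()
                         if name not in discarded]
    unit = top + b"".join(kept) + bottom
    unit_length = len(unit) - 4
    return struct.pack("<I", unit_length) + unit[4:]


top = b"\x00\x00\x00\x00" + b"HDR+CU-DIE"   # length placeholder + opaque payload
bottom = b"\x00"                             # terminating NULL of the CU DIE
functions = {"f": b"<subprogram f>", "g": b"<subprogram g>"}

unit = link_debug_info(top, functions, bottom, discarded={"g"})
print(struct.unpack("<I", unit[:4])[0], len(unit))
```

Nothing inside a function fragment is inspected; the only computed value is the length, which is the part a relocation against the bottom fragment would have to supply.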

That's the story for .debug_info; what about other sections?

Sections referenced by index from .debug_info can't be fragmented;
this would be: .debug_abbrev, .debug_addr, .debug_str_offsets.

.debug_str doesn't need to be fragmented, linkers DTRT already.
.debug_macro contents are not tied to functions and won't be fragmented.

.debug_loclists and .debug_rnglists should be fragmentable the same
way as .debug_info; they exist only as extensions of .debug_info, and
the range list for the CU itself is merely a concatenated set of
contributions from each constituent function, so that should Just Work
(although it won't be optimal, adjacent ranges won't be coalesced).
I believe the same is true for .debug_loc and .debug_ranges, although
I haven't checked.
.debug_aranges is functionally equivalent to the CU rangelist.

.debug_line can work the same way as .debug_info but is worth a word.
The top fragment has the header, including the directory/file lists
because those are referenced by index. DW_LNE_define_file can't be
used. Each function has a fragment containing the sequence for that
function, starting with set_address and ending with end_sequence.
The bottom fragment is empty, existing only to allow the length to
be computed.
.debug_line_str is a string section and requires nothing special.
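
As an illustration of what one per-function .debug_line fragment would contain, the following Python sketch encodes a single sequence. The opcode values and the extended-opcode layout are the standard DWARF ones; the helper names and the example address are made up, and in a relocatable object the address operand would of course be a relocation against the function symbol rather than a constant.

```python
import struct

DW_LNS_copy = 0x01
DW_LNE_end_sequence = 0x01
DW_LNE_set_address = 0x02

def extended_op(opcode, operand=b""):
    """Encode an extended opcode: 0x00, ULEB128 length (opcode + operands), opcode, operands."""
    length = 1 + len(operand)
    assert length < 0x80  # a single-byte ULEB128 is enough for this sketch
    return bytes([0x00, length, opcode]) + operand

def function_sequence(address, row_opcodes):
    """One self-contained line-number sequence for a single function (64-bit addresses)."""
    start = extended_op(DW_LNE_set_address, struct.pack("<Q", address))
    end = extended_op(DW_LNE_end_sequence)
    return start + row_opcodes + end

# A trivial sequence: set the address, emit one row, end the sequence.
seq = function_sequence(0x401000, bytes([DW_LNS_copy]))
print(seq.hex())
```

Because every fragment starts with set_address and ends with end_sequence, dropping one fragment does not disturb the state machine for its neighbours.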

.debug_names ... haven't looked at it but I suspect either it doesn't
survive or it has to be generated post-link (or by the linker).
.debug_frame I *think* can be fragmented, but I haven't taken the
time to look at it to make sure.

Those are all the sections I see in DWARF v5 Appendix B.

So that's the blue-sky vision of linker-magic COMDAT DWARF, which
took me about an hour to write down just now. There is certainly
a non-trivial overhead in terms of ELF sections; in the general
case we would have 5 per-function fragments (for .debug_info,
.debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).

Not small, but then other features in the works are using huge
quantities of ELF sections too (section-per-basic-block).

>
> & as always, not sure how any of this would work for Split DWARF -
> just a debug_addr section that has some addresses that point to
> discardable functions... if we want those addresses themselves to be
> discardable (so we don't have to use a tombstone value inserted by the
> linker) then they'd need to be in separate debug_addr contributions
> with headers, etc - the overhead just seems too high to me in all the
> ways I can look at that.

Yeah I think .dwo sections can't take advantage of fragmenting, and
.debug_addr is referenced by index so it can't be fragmented. Although
the point is not to avoid tombstone values, but to do a more efficient
job of editing the final DWARF to omit gc'd functions; it's no problem
at all to use a tombstone value in .debug_addr IMO.
--paulr
> https://reviews.llvm.org/D54747#1503720 . It also exists in

David Blaikie via llvm-dev

Jun 4, 2020, 2:43:58 PM
to Robinson, Paul, llvm...@lists.llvm.org

Yep, if they're sub-contribution regions, that wouldn't play well with
Split DWARF. (& full contribution isolation would have the DWARF header
overhead, etc)

I'd still be concerned about the ELF header overhead even of this
sub-contribution scheme, but could be interesting to see how it plays
out in practice.

All that said, to avoid burying the lede here, I'll splice something
from the end up here:

> Although the point is not to avoid tombstone values, but to do a more efficient job of editing the final DWARF to omit gc'd functions; it's no problem at all to use a tombstone value in .debug_addr IMO.

But the tombstone values are Alexey's underlying issue (this ongoing
design discussion for over a year now) & /sort/ of mine too recently
(which, unfortunately, is what's reinvigorated this discussion -
would've been nice if I/we/someone had identified this sooner &
could've helped Alexey in a more timely manner): Alexey is dealing
with a platform where 0 is a valid address so the lld/gold strategy of
resolving relocations to dead code to "0+addend" creates ambiguous
DWARF. I'm dealing with a case of zero-length functions ("int f1() {
}" or "void f2() { __builtin_unreachable(); }") causing early
termination of DWARFv4 range lists.
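
To illustrate the zero-length-function problem, here is a small Python sketch of a DWARFv4 .debug_ranges reader. It is deliberately simplified (64-bit addresses assumed, base address selection entries are skipped rather than applied) and the function names are hypothetical; it only shows why a dead zero-length function resolved to 0 cuts the list short, while a tombstone of -2 does not. A tombstone of -1 would not work here either, since a begin value of all ones marks a base address selection entry.

```python
import struct

MAX_ADDR = 0xFFFFFFFFFFFFFFFF  # 64-bit target assumed

def parse_debug_ranges(data):
    """Parse one DWARF v4 .debug_ranges list made of 8-byte (begin, end) pairs."""
    ranges = []
    for offset in range(0, len(data), 16):
        begin, end = struct.unpack_from("<QQ", data, offset)
        if begin == 0 and end == 0:
            break                        # end-of-list entry
        if begin == MAX_ADDR:
            continue                     # base address selection entry (ignored in this sketch)
        if begin != end:
            ranges.append((begin, end))  # empty ranges carry no information
    return ranges

def range_list(tombstone):
    """f1 is live, f2 is a discarded zero-length function, f3 is live."""
    dead = tombstone & MAX_ADDR
    pairs = [(0x1000, 0x1010), (dead, dead), (0x2000, 0x2020), (0, 0)]
    return b"".join(struct.pack("<QQ", b, e) for b, e in pairs)

with_zero = parse_debug_ranges(range_list(0))        # f3's range is lost
with_minus_two = parse_debug_ranges(range_list(-2))  # both live ranges survive
print(with_zero)
print(with_minus_two)
```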

The reason for the DWARF-aware linker proposal was because the "let's
choose a better tombstone" discussion didn't go anywhere & people sort
of encouraged in this direction of "what if we didn't need a
tombstone/the linker fixed up the debug info instead". So if the DWARF
redundancy elimination doesn't address the issue of zero as a valid
address, it doesn't address Alexey's needs, unfortunately. :/

That said, I super appreciate the time you've put into writing this up
and it is valuable & I'd love to see some (even hand-crafted assembly)
prototypes, maybe do some back-of-the-envelope numbers to see whether
the ELF header overhead would be worth it, etc.

> > But even then, I'm not sure how viable it would be - as Fangrui
> > pointed out on another thread about this: ELF section overhead itself
> > is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> > rather difficult to reconstruct header-less slice-and-dicable sections
> > in some cases. For type information (a reduced overhead version of
> > -fdebug-types-section) I could see it - but for functions, they need
> > to refer to addresses - preferably in the debug_addr section, and
> > that's accessed by index, so taking chunks out of it would break other
> > references to it, etc... adding the header would be expensive, and how
> > would the CU construct its DW_AT_ranges value if that has to be sliced
> > and diced? Again, some amount of linker magic might solve some of
> > these problems - but I think there's still a lot of overhead to making
> > a solution that's workable with a DWARF-agnostic linker (or even with
> > a DWARF aware one, but in an efficient amount of time/space where it's
> > not only usable for small programs, or for linking when you're
> > shipping a final production binary, etc)
>
> The idea we have blue-skied internally would work something like this
> (initially explicated in terms of the .debug_info section, then seeing
> how that tactic applies to other sections):
>
> There's a top fragment, containing the CU header and the CU DIE itself.
> Linker magic makes this first in the output file.

Quick curiosity: Is there existing linker magic for this? What does it
look like? I'd love to know so I can play around with hand crafted
prototypes/keep it in mind for such things.

(basically the ability for an object file to say "here's the start and
end of my contribution to this section, and some bits that /can/ go in
the middle, but you can drop them if you like")

> Types also go here; certainly base types, and other file-scope types
> can be included here or put into type units. (Type units aren't
> fragmented, they are their own thing same as always.)

Separately, it might be worth considering putting types in such a
thing - but, yes, the "How do you reference them when they might be in
your unit or someone else's unit", etc, would have to be figured out.
I guess using an external symbol might be the solution there - again,
with a better understanding of the ^ mentioned linker magic, I'd
probably play around with hand crafting some examples just to see how
this could work.

> There's a matching bottom fragment, which is just the terminating NULL
> for the CU DIE; linker magic makes this last in the output file.

Last of all the contributions from this object file, not last in the
whole output file, right? (please excuse the pedantry, just double
checking)

> Each function has its own fragment, which is in the same link-group
> (COMDAT or whatever) as the function's .text section; that way, if the
> function is discarded, so is the .debug_info fragment. Offhand I can't
> think of any cases (other than DW_AT_specification, addressed below) of
> references to a subprogram DIE from elsewhere,

The call_site DWARF would want to refer to a subprogram DIE, but that
could be handled by (first pass) having a declaration subprogram in
the initial fragment that the call_site could refer to using the usual
assembler-resolved CU-relative offset. Of course that'd mean a bunch
of (probably the bigger part) of the function's DWARF footprint
wouldn't be deduplicated, but would address this part of the address
tombstone issue (if not using debug_addr) & reduce some of the DWARF -
the addresses are pretty big (if you're not pooling them), etc.

> so it should be fine to
> discard the entire function fragment as needed. Linker magic puts all
> function fragments between the top and bottom fragments, in some
> indeterminate order. Each function fragment is the usual complete
> subtree, rooted in DW_TAG_subprogram.

Rooted at the top level (well, below the DW_TAG_compile_unit) DIE, as
you mention later - namespace, or whatever else.

> References to types are either
> to type units as normal, or to types in the top fragment. Note that
> these references do not require relocations; type units are by signature
> as always, and for types in the top fragment, the offsets into the top
> fragment are known at compile time.
>
> Inlined functions are described as part of the function they have been
> inlined into, being children of the function DIE. DW_AT_specification
> refers to the abstract declaration which is in its own fragment (or the
> top fragment, but that keeps the declaration from being elided if all
> references go away).

Yep, this overlaps with the call_site stuff I mentioned earlier - same
ideas. Either top fragment, or its own fragment. Keeping its own
fragment alive, and figuring out how to reference it (depending on
fragment layout/elision) would require some work, but I think it's
do-able. Might even be do-able so it can be deduplicated across CUs
(use a sec_offset form, use a linker-resolved relocation to it) - this
infrastructure would overlap with type deduplication without type
units too.

Though linker resolved relocations add more bytes...

> If functions are inside namespaces, each function fragment will need
> to have namespace DIEs around the function DIE. This adds overhead
> but it's pretty small.
>
> I hand-wave filling in the CU header's unit length. I'd expect a
> relocation with a reference to the bottom fragment should be able to
> compute the correct value.

*nod*

> That's the story for .debug_info; what about other sections?
>
> Sections referenced by index from .debug_info can't be fragmented;
> this would be: .debug_abbrev, .debug_addr, .debug_str_offsets.
>
> .debug_str doesn't need to be fragmented, linkers DTRT already.

(linkers deduplicate debug_str - but can they be made to remove
unreferenced strings too? in that case we'd have an interesting
tradeoff of maybe using FORM_strp rather than strx - if we wanted the
linker to be able to drop strings from dropped function definitions,
etc)

> .debug_macro contents are not tied to functions and won't be fragmented.
>
> .debug_loclists and .debug_rnglists should be fragmentable the same
> way as .debug_info; they exist only as extensions of .debug_info, and
> the range list for the CU itself is merely a concatenated set of
> contributions from each constituent function, so that should Just Work
> (although it won't be optimal, adjacent ranges won't be coalesced).

At least the way we currently emit loclists and rnglists is by using
an index (the header of loclists and rnglists has an index to offset
mapping) - like strx, this would make it hard/impossible for a
DWARF-agnostic linker to see through to find out which indexes were
actually used. We could potentially not use the loclistx/rnglistx
forms/indexes from fragments - instead using sec_offsets that would
make them relocatable/removable/etc. (so long as all the index-based
referenced lists came in the debug_loclist/debug_rnglist header
fragment)

> I believe the same is true for .debug_loc and .debug_ranges, although
> I haven't checked.

Yep, those ones are easier - there's no contribution header, they can
only be referenced via sec_offset, so slicing and dicing them is
cheap.

But the tombstone problem still exists for the CU's debug_ranges -
though /maybe/ it could be carefully constructed from fragments...
that's going to be a /lot/ of sections in the end though.

> .debug_aranges is functionally equivalent to the CU rangelist.

Yup. (as we've touched on before, we don't use aranges at Google -
instead relying on CU's ranges which are just a little more expensive
to retrieve - but no need to duplicate the data in both places - if
consumers really find the aranges worthwhile to avoid parsing a few
attributes on the CU DIE, perhaps a future spec could let
debug_aranges reference a range list? so that aranges and the CU could
share the same data?)

> .debug_line can work the same way as .debug_info but is worth a word.
> The top fragment has the header, including the directory/file lists
> because those are referenced by index. DW_LNE_define_file can't be
> used. Each function has a fragment containing the sequence for that
> function, starting with set_address and ending with end_sequence.
> The bottom fragment is empty, existing only to allow the length to
> be computed.

Yep - can't remove dead file and directory names, unfortunately - and
the line table's pretty compact, so not sure it'd be a great savings
(especially compared to the ELF section overhead - at the object file
size at least (though probably a small win for linked executable
size)). Chances are those strings (now in debug_line_str) would be
used /somewhere/ in the program, so linker string deduplication would
get most of the wins - just dead offset entries in the line table
header.

> .debug_line_str is a string section and requires nothing special.
>
> .debug_names ... haven't looked at it but I suspect either it doesn't
> survive or it has to be generated post-link (or by the linker).

Generally you're going to want a DWARF-aware linker for debug_names,
same as gdb-index, etc.

> .debug_frame I *think* can be fragmented, but I haven't taken the
> time to look at it to make sure.
>
> Those are all the sections I see in DWARF v5 Appendix B.
>
> So that's the blue-sky vision of linker-magic COMDAT DWARF, which
> took me about an hour to write down just now. There is certainly
> a non-trivial overhead in terms of ELF sections; in the general
> case we would have 5 per-function fragments (for .debug_info,
> .debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).
>
> Not small, but then other features in the works are using huge
> quantities of ELF sections too (section-per-basic-block).

That work's being scoped to be fairly selective about which basic
blocks it puts in unique sections - just those that are especially
performance sensitive, so the cost isn't as high as you might
otherwise imagine. Adding 5 new sections per function would be
probably a significantly larger growth than anything else I'm aware
of, but I haven't run the numbers by any means.
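
For a first back-of-the-envelope number, a Python sketch of the section header cost alone might look like the following. The Elf64_Shdr field layout is the standard one (64 bytes, as quoted earlier); the function count is an arbitrary assumption, and the estimate leaves out relocation sections, section group sections and any extra section-name strings, all of which would add to it.

```python
import struct

# Elf64_Shdr: sh_name, sh_type (4 bytes each); sh_flags, sh_addr, sh_offset,
# sh_size (8 each); sh_link, sh_info (4 each); sh_addralign, sh_entsize (8 each).
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")

# .debug_info, .debug_line, .debug_rnglists, .debug_loclists, .debug_aranges
FRAGMENTS_PER_FUNCTION = 5

def section_header_overhead(num_functions):
    """Bytes of section headers added to the object files, ignoring everything else."""
    return num_functions * FRAGMENTS_PER_FUNCTION * ELF64_SHDR.size

print(ELF64_SHDR.size)                   # 64
print(section_header_overhead(1))        # 320
print(section_header_overhead(100_000))  # 32000000
```

So each fragmented function costs at least 320 bytes of headers in the object file before any DWARF is emitted, which is the number the per-function DWARF savings would have to beat.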

Thanks again for the write up!

- Dave

Fangrui Song via llvm-dev

Jun 4, 2020, 5:31:16 PM
to jh737...@my.bristol.ac.uk, llvm...@lists.llvm.org

Your proposed option --dead-reloc-addend=.debug_info=0xffffffffffffffff
seems like a good idea. (I'd expect it to support signed -1 and -2 for
convenience & consistency in some other places (we sometimes use addends
as signed values)).

LLD only supports absolute relocation types (plus R_PPC64_DTPREL64 which
can go to .debug_addr, plus R_RISCV_{ADD,SUB}*).

The computed value is S + A.
We still consider the symbolic value S as zero, but override A with the
supplied option --dead-reloc-addend=.debug_info=-1
I particularly like that `addend` is part of the option name.

My only complaint is that the relocation record is not dead, but rather
its referenced symbol is dead. However, I can't think of a better
name...

Checked with Martin Storsjö, this option may be useful for other binary
formats supporting DWARF. (binutils does not like ELF-specific options
not called -z foobar).
I think it is fine to add this option to LLD if GNU ld is also happy
with the name. I'll check with them.

"There is a danger that one community won't accept an extension that
they haven't been involved in the design process for." :) (Courtesy of Peter)

The built-in rules of the linker are the following:

--dead-reloc-addend=.debug_loc=-2
--dead-reloc-addend=.debug_ranges=-2
--dead-reloc-addend=.debug_*=-1

They can be overridden.
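
A rough Python sketch of how such rules could be evaluated, purely to pin down the semantics being discussed. Nothing here is LLD code: the option is only a proposal at this point, and the first-match-wins ordering, the glob matching and the helper name are assumptions made for the example.

```python
from fnmatch import fnmatchcase

# The built-in rules listed above, most specific first (first match wins in this sketch).
BUILTIN_RULES = [
    (".debug_loc", -2),
    (".debug_ranges", -2),
    (".debug_*", -1),
]

def dead_reloc_value(section, width=8, overrides=()):
    """Value written for an absolute relocation whose target symbol was discarded.

    Computed as S + A with S treated as 0 and A replaced by the matching rule's
    addend, truncated to the relocation width. Returns None if no rule matches.
    """
    for pattern, addend in list(overrides) + BUILTIN_RULES:
        if fnmatchcase(section, pattern):
            return (0 + addend) & ((1 << (8 * width)) - 1)
    return None

print(hex(dead_reloc_value(".debug_info")))
print(hex(dead_reloc_value(".debug_ranges")))
print(hex(dead_reloc_value(".debug_info", width=4)))
print(hex(dead_reloc_value(".debug_info", overrides=[(".debug_info", 0)])))
```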

David Blaikie via llvm-dev

Jun 4, 2020, 5:41:21 PM
to Fangrui Song, llvm...@lists.llvm.org
FWIW, I think it's probably best to at least initially frame the
discussion around a non-configurable value for the sake of reducing the
scope/possible surface area of the feature/users/etc. I'd probably
only encourage adding the user-configurable flag if/when someone has a
use case for it.

Robinson, Paul via llvm-dev

Jun 4, 2020, 6:11:56 PM
to David Blaikie, bd1976 llvm, llvm...@lists.llvm.org
+ Ben Dunbobbin, whose name I take in vain below.
He's my local expert on weird ELF features.
But, upthread we had a tombstone discussion IIRC, which seemed to converge
on "-1 except .debug_loc/.debug_ranges use -2" didn't it? If we're still
going on about having the linker rewrite DWARF, then the fragmenting
idea is worth pursuing as an alternative to Alexey's current work.

>
> That said, I super appreciate the time you've put into writing this up
> and it is valuable & I'd love to see some (even hand-crafted assembly)
> prototypes, maybe do some back-of-the-envelope numbers to see whether
> the ELF header overhead would be worth it, etc.

It would be nice to verify that the section-fragment idea would produce
something that looked usable. Hand-written assembly... would require
research into how to specify the right section attributes, but would
likely be less effort than trying to make LLVM do something plausible.

I'll see about creating an internal task for this.

>
> > > But even then, I'm not sure how viable it would be - as Fangrui
> > > pointed out on another thread about this: ELF section overhead itself
> > > is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> > > rather difficult to reconstruct header-less slice-and-dicable sections
> > > in some cases. For type information (a reduced overhead version of
> > > -fdebug-types-section) I could see it - but for functions, they need
> > > to refer to addresses - preferably in the debug_addr section, and
> > > that's accessed by index, so taking chunks out of it would break other
> > > references to it, etc... adding the header would be expensive, and how
> > > would the CU construct its DW_AT_ranges value if that has to be sliced
> > > and diced? Again, some amount of linker magic might solve some of
> > > these problems - but I think there's still a lot of overhead to making
> > > a solution that's workable with a DWARF-agnostic linker (or even with
> > > a DWARF aware one, but in an efficient amount of time/space where it's
> > > not only usable for small programs, or for linking when you're
> > > shipping a final production binary, etc)
> >
> > The idea we have blue-skied internally would work something like this
> > (initially explicated in terms of the .debug_info section, then seeing
> > how that tactic applies to other sections):
> >
> > There's a top fragment, containing the CU header and the CU DIE itself.
> > Linker magic makes this first in the output file.
>
> Quick curiosity: Is there existing linker magic for this? What does it
> look like? I'd love to know so I can play around with hand crafted
> prototypes/keep it in mind for such things.

Ben Dunbobbin did research into this some time ago, under the auspices
of a "COMDAT DWARF" investigation. He's part of Sony's linker team, and
it was a discussion with that team where I became convinced that the
fragmenting idea was feasible using existing defined ELF capabilities,
although perhaps in ways nobody had really taken advantage of. It
involved section groups and/or section ordering, but somebody much more
familiar with ELF than I am would have to explain it. I've cc'd Ben.

Regarding my discussion with our linker team:
They asked me whether it was feasible to use sections to subset the
DWARF, and I described the functional need (top & bottom fragments,
arbitrary stuff in between) and they thought the ELF section-group
and/or section-ordering features would be able to provide that.

I'm not aware that anyone actually tried prototyping that. The work
that James did (mentioned upthread) IIRC was using COMDAT and full
units with unit headers. My fading memory suggests the discussion
described just above was after that.

>
> (basically the ability for an object file to say "here's the start and
> end of my contribution to this section, and some bits that /can/ go in
> the middle, but you can drop them if you like")
>
> > Types also go here; certainly base types, and other file-scope types
> > can be included here or put into type units. (Type units aren't
> > fragmented, they are their own thing same as always.)
>
> Separately, it might be worth considering putting types in such a
> thing - but, yes, the "How do you reference them when they might be in
> your unit or someone else's unit", etc, would have to be figured out.
> I guess using an external symbol might be the solution there - again,
> with a better understanding of the ^ mentioned linker magic, I'd
> probably play around with hand crafting some examples just to see how
> this could work.
>
> > There's a matching bottom fragment, which is just the terminating NULL
> > for the CU DIE; linker magic makes this last in the output file.
>
> Last of all the contributions from this object file, not last in the
> whole output file, right? (please excuse the pedantry, just double
> checking)

The object file would (loosely speaking) have a ".debug_info.first",
some number of ".debug_info.excludable-middle", and a ".debug_info.last"
which would all be glommed together in first-middle-last order in the
output .debug_info section. I believe I was told that this would be
per-object-file, otherwise yeah it wouldn't work at all.
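
Loosely, the layout I have in mind could be sketched in Python like this. The section names are the made-up ones from the paragraph above, the function names are hypothetical, and this is not a description of what any existing linker does; it only shows the per-object first-middle-last ordering with discarded middle fragments dropped.

```python
def kind(section):
    """Rank a (hypothetically named) input section: first < middle < last."""
    if section == ".debug_info.first":
        return 0
    if section == ".debug_info.last":
        return 2
    return 1

def layout_debug_info(objects, discarded=frozenset()):
    """Return the output order of .debug_info input sections.

    objects:   list of (object_name, [section_name, ...]) in link order.
    discarded: set of (object_name, section_name) pairs dropped with their function.
    """
    output = []
    for obj, sections in objects:
        kept = [s for s in sections if (obj, s) not in discarded]
        # sorted() is stable, so middle fragments keep their input order.
        output += [(obj, s) for s in sorted(kept, key=kind)]
    return output

objects = [
    ("a.o", [".debug_info.first", ".debug_info.last",
             ".debug_info.middle.f1", ".debug_info.middle.f2"]),
    ("b.o", [".debug_info.first", ".debug_info.middle.g1", ".debug_info.last"]),
]
order = layout_debug_info(objects, discarded={("a.o", ".debug_info.middle.f2")})
for obj, section in order:
    print(obj, section)
```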

This is why we need input from somebody who actually knows ELF. 😊

>
> > Each function has its own fragment, which is in the same link-group
> > (COMDAT or whatever) as the function's .text section; that way, if the
> > function is discarded, so is the .debug_info fragment. Offhand I can't
> > think of any cases (other than DW_AT_specification, addressed below) of
> > references to a subprogram DIE from elsewhere,
>
> The call_site DWARF would want to refer to a subprogram DIE, but that
> could be handled by (first pass) having a declaration subprogram in
> the initial fragment that the call_site could refer to using the usual
> assembler-resolved CU-relative offset. Of course that'd mean a bunch
> of (probably the bigger part) of the function's DWARF footprint
> wouldn't be deduplicated, but would address this part of the address
> tombstone issue (if not using debug_addr) & reduce some of the DWARF -
> the addresses are pretty big (if you're not pooling them), etc.

Ah, forgot about call_site. Yeah referring to a declaration should work.

>
> > so it should be fine to
> > discard the entire function fragment as needed. Linker magic puts all
> > function fragments between the top and bottom fragments, in some
> > indeterminate order. Each function fragment is the usual complete
> > subtree, rooted in DW_TAG_subprogram.
>
> Rooted at the top level (well, below the DW_TAG_compile_unit) DIE, as
> you mention later - namespace, or whatever else.

Right, each fragment would be a complete subtree that would ordinarily
be a direct child of DW_TAG_compile_unit. With whatever DIE it needed.
Future refinements are quite possible!

>
> > .debug_macro contents are not tied to functions and won't be fragmented.
> >
> > .debug_loclists and .debug_rnglists should be fragmentable the same
> > way as .debug_info; they exist only as extensions of .debug_info, and
> > the range list for the CU itself is merely a concatenated set of
> > contributions from each constituent function, so that should Just Work
> > (although it won't be optimal, adjacent ranges won't be coalesced).
>
> At least the way we currently emit loclists and rnglists is by using
> an index (the header of loclists and rnglists has an index to offset
> mapping) - like strx, this would make it hard/impossible for a
> DWARF-agnostic linker to see through to find out which indexes were
> actually used. We could potentially not use the loclistx/rnglistx
> forms/indexes from fragments - instead using sec_offsets that would
> make them relocatable/removable/etc. (so long as all the index-based
> referenced lists came in the debug_loclist/debug_rnglist header
> fragment)

Ah, I hadn't looked at how we do those lists. But sounds solvable.

Sony does squeeze out the sequences for dead functions; I think it's
not a huge win, in terms of total debug info size, but the .debug_line
section does not let you skip dead sequences; you still have to parse
the whole thing. Our debugger guys were pleased at not having to
spend time doing something that useless. (Yeah it does mean the
linker has to parse the whole .debug_line section; but our theory is
that you probably run the debugger more than you run the linker, and
in any case you do it interactively, so debugger load time is probably
more annoying than some fractional increase in build/link time.)

The dir/file tables can't be squeezed, but one expects it's not a
huge cost with .debug_line_str having lots of deduplication
opportunities.

>
> > .debug_line_str is a string section and requires nothing special.
> >
> > .debug_names ... haven't looked at it but I suspect either it doesn't
> > survive or it has to be generated post-link (or by the linker).
>
> Generally you're going to want a DWARF-aware linker for debug_names,
> same as gdb-index, etc.
>
> > .debug_frame I *think* can be fragmented, but I haven't taken the
> > time to look at it to make sure.
> >
> > Those are all the sections I see in DWARF v5 Appendix B.
> >
> > So that's the blue-sky vision of linker-magic COMDAT DWARF, which
> > took me about an hour to write down just now. There is certainly
> > a non-trivial overhead in terms of ELF sections; in the general
> > case we would have 5 per-function fragments (for .debug_info,
> > .debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).
> >
> > Not small, but then other features in the works are using huge
> > quantities of ELF sections too (section-per-basic-block).
>
> That work's being scoped to be fairly selective about which basic
> blocks it puts in unique sections - just those that are especially
> performance sensitive, so the cost isn't as high as you might
> otherwise imagine. Adding 5 new sections per function would be
> probably a significantly larger growth than anything else I'm aware
> of, but I haven't run the numbers by any means.

Doing it for *every* function would be the worst case, for when
you're trying to squeeze everything (gc + icf). We could likely
get wins if we did it just for the functions that today end up in
a COMDAT section (inline functions, template instantiations) which
previous research has found to be pretty significant (and major
motivation for the Program Repository work that we've previously
described at a Dev Meeting, https://llvm.org/devmtg/2016-11/#talk22)

>
> Thanks again for the write up!

NP, it was fun to trot out this stuff.
--paulr

Fangrui Song via llvm-dev

Jun 4, 2020, 8:12:22 PM
to Robinson, Paul, llvm...@lists.llvm.org
On 2020-06-04, Robinson, Paul via llvm-dev wrote:
>+ Ben Dunbobbin, whose name I take in vain below.
>He's my local expert on weird ELF features.

Hey, I have read
https://groups.google.com/forum/#!msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
"monolithic input section handling" from Ben:)

+1 for "-1 except .debug_loc/.debug_ranges use -2"

>>
>> That said, I super appreciate the time you've put into writing this up
>> and it is valuable & I'd love to see some (even hand-crafted assembly)
>> prototypes, maybe do some back-of-the-envelope numbers to see whether
>> the ELF header overhead would be worth it, etc.
>
>It would be nice to verify that the section-fragment idea would produce
>something that looked usable. Hand-written assembly... would require
>research into how to specify the right section attributes, but would
>likely be less effort than trying to make LLVM do something plausible.
>
>I'll see about creating an internal task for this.

According to Peter Smith, Arm Compiler 5 splits up DWARF v3 debugging
information and puts these sections into comdat groups:

"This approach did produce significantly more debug information than gcc
did. For small microcontroller projects this wasn't a problem. For
larger feature phone problems we had to put a lot of work into keeping
the linker's memory usage down as many of our customers at the time were
using 32-bit Windows machines with a default maximum virtual memory of 2Gb."

I'd also love to see some examples (even hand-crafted assembly).

We probably have to reuse the ".debug_info" string (in assembly this requires
unique linkage, which has been implemented in LLVM for a while but relatively
new in binutils (future 2.35)) which is already an entry in .strtab, otherwise
the string itself can cost quite a lot.

(Mostly https://sourceware.org/pipermail/binutils/2020-May/111361.html )

James Henderson via llvm-dev

Jun 5, 2020, 6:02:27 AM
to David Blaikie, llvm...@lists.llvm.org
On Thu, 4 Jun 2020 at 19:43, David Blaikie <dbla...@gmail.com> wrote:

> That said, I super appreciate the time you've put into writing this up
> and it is valuable & I'd love to see some (even hand-crafted assembly)
> prototypes, maybe do some back-of-the-envelope numbers to see whether
> the ELF header overhead would be worth it, etc.

Given my past experience in this area, I'm happy to help out with this,
once I've got my current plans for the debug line work largely concluded,
so from sometime next week probably. My thinking is to simulate the DWARF
usage with blank sections of the appropriate size and with the right
number of relocations, instead of the real contents, to give us a rough
idea of the link-time performance hit of fragmenting using the current LLD.

James

Alexey Lapshin via llvm-dev

Jun 5, 2020, 7:32:35 AM
to Fangrui Song, David Blaikie, llvm...@lists.llvm.org

>FWIW, I think it's probably best to at least initially frame the
>discussion around non-configurable value for the sake of reducing the
>scope/possible surface area of the feature/users/etc. I'd probably
>only encourage adding the user-configurable flag if/when someone has a
>use case for it.

I second that: "it's probably best to at least initially frame the
discussion around non-configurable value for the sake of reducing the
scope/possible surface area of the feature/users/etc".

The need for a different concrete value would most probably arise
only if there is a tool that uses that other value.
Until there is a known use case, it would be better to use just:

--dead-reloc-addend

Thank you, Alexey.

Alexey Lapshin via llvm-dev

Jun 5, 2020, 8:04:05 AM
to Fangrui Song, David Blaikie, Alexey Lapshin, llvm...@lists.llvm.org
small typo in previous letter:

>>FWIW, I think it's probably best to at least initially frame the
>>discussion around non-configurable value for the sake of reducing the
>>scope/possible surface area of the feature/users/etc. I'd probably
>>only encourage adding the user-configurable flag if/when someone has a
>>use case for it.

>I second that: "it's probably best to at least initially frame the
>discussion around non-configurable value for the sake of reducing the
>scope/possible surface area of the feature/users/etc".

>The need for a different concrete value would most probably
>arise only if there is a tool which uses that other value.
>Until there is a known use case, it would be better to use just:

>--dead-reloc-addend

--dead-reloc-addend=<value>

Alexey Lapshin via llvm-dev

Jun 5, 2020, 4:55:39 PM
to David Blaikie, llvm...@lists.llvm.org
>>
>> >DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common,
>> >certainly not when talking about function code. The overhead of a unit occurred only once per
>> >translation unit, so that expense was reasonably amortized.
>> >
>> >Splitting functions into their own object-file sections and making them excludable is an evolution of
>> >compiler/linker technology that DWARF has not kept up with. The linker-friendly solutions (COMDAT
>> >DWARF) would put function-related .debug_* contributions into a section-group along with the function
>> >.text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the
>> >content of each per-function DWARF section. The fully DWARF-conformant solution would create one
>> >partial_unit per function, with the corresponding overhead of unit headers (especially painful in the
>> >.debug_line section). Alternatively we fragment DWARF into sections without headers and rely on the
>> >linker to make everything look right in the linked executable; this produces .o files that are not DWARF
>> >conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers
>> >other than the linker.
>> >Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the linker.
>>
>Alexey> Perhaps we could try to make DWARF easy to parse, trim, and rewrite, so that a full
>Alexey> DWARF-parsing solution would not take too much time?
>Alexey>
>Alexey> E.g., the -fdebug-types-section solution uses COMDAT sections to split and deduplicate types.
>Alexey> That solution works quite fast. It has the already mentioned drawback of a big size
>Alexey> overhead (because of the sizes of section headers and type unit headers). But the fact that type units
>Alexey> can be identified just by hash-id (without parsing type names and type hierarchies)
>Alexey> allows the linker to reject duplicates quickly. Another thing is that the linker drops
>Alexey> duplicated COMDAT sections without any additional check. After the duplicates are deleted,
>Alexey> the debug info is still consistent.
>Alexey> A DWARF-aware solution could be built on the same two principles:
>Alexey> 1. compare types by hash-id.
>Alexey> 2. drop duplicates without analyzing their contents.
>Alexey>
>Alexey> If all types are put into a separate type table and have a hash-id, then it would be much easier to
>Alexey> deduplicate them. The idea is demonstrated here - https://reviews.llvm.org/P8164. (It still has
>Alexey> open questions: whether base types should be put into the type table, whether references into the
>Alexey> type table should be made by DW_AT_signature or just by offset, etc.) While handling that separate
>Alexey> type table, the DWARF-aware linker would check only the hash-id and put only one type description
>Alexey> with a given id into the final type table. It would also allow us to solve the -flto=thin problem -
>Alexey> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is a dsymutil example there),
>Alexey> i.e., the case where a type definition gets removed would not occur.

David>I think there is scope for lower-overhead type deduplication,
David>especially now with type units being merged into the debug_info
David>section. Perhaps we could drop dwo_ids and use section references to
David>refer to types & rely on the linker to keep those referenced sections
David>alive - though section references are longer than CU-relative
David>references. (but we need the extra length - because if the linker
David>deduplicates a type definition - one CU may be referencing a type very
David>far away, so the shorter reference might be inadequate) I don't think
David>the indirection through the type hash is /super/ significant to the
David>cost - I think it's more in the duplication of many DIEs especially
David>for function definitions (since the type unit sig8 system only
David>provides a way to reference the type - not its member functions, their
David>parameters, etc - so all those DIEs get duplicated in any CU that
David>needs to provide a definition of a member function). We could
David>prototype cross-unit DIE references to lower the cost of that
David>duplication, though rumor has it that constructor based type homing
David>might provide enough value to obviate the need for type units (or at
David>least make the overhead not worthwhile - so revisiting the overhead to
David>reduce it might make it worthwhile again... ).

David>Probably wouldn't be super hard to use LLVM's existing cross-unit DIE
David>referencing machinery (implemented for LTO) to refer directly to DIEs
David>in a type unit without using the signature system... - hmm, that'd
David>only work if your type unit DIEs were identical? /maybe/ ? Not sure
David>how that'd work if you wanted to refer into a type unit, but the type
David>unit got deduplicated. Might be able to rely on the linker to preserve
David>every unique copy of the type unit that's referenced if we phrase
David>things carefully - so if your compiler does produce exactly identical
David>type units they get deduplicated and sec_refs refer to the uniquely
David>preserved copy - but otherwise it preserves as many distinct copies as
David>needed. (I don't know enough about how that works to be sure - but I
David>know that this linkonce/inline function deduplication does seem to
David>cause the DWARF to refer to the singular function if that function is
David>identical, and if it isn't, then you get 0 - so there's /something/ in
David>the linker that can adjust for deduplicating identical duplicates... )

Probably I was a bit unclear: the above idea is not about types
(placed in COMDAT sections) deduplicated by the linker.
This idea goes in a different direction from fragmenting DWARF
using ELF sections and tricks. It seems to me that the cost of fragmenting is too high.
It is not only the size of the structures describing fragments but also the complexity
of the tools that would have to be taught to work with fragmented DWARF
(e.g., llvm-dwarfdump applied to an object file should be able to read fragmented DWARF,
but applied to a linked executable it should work with non-fragmented DWARF).
The idea is for a tool that works the same way as dsymutil ODR.

I will briefly describe the idea of making DWARF easier to process by dsymutil/DWARFLinker:

The idea is to have only one "type table" per object file (a special section, .debug_types_table).
This "type table" would contain all types.
There could be a special kind of reference - type_offset - an offset that points into the type table.
Basic types could always be placed at the start of the "type table"; thus, offsets to basic types
would most often take 1 byte. There would also be a special kind of reference for references inside a type.
The type unit sig8 system would not be used to reference types.

Type deduplication is assumed to be done not by the linker's COMDAT mechanism
but by a tool like dsymutil. This tool would create the resulting .debug_types_table by copying in
types from the source .debug_types_table-s. Only one copy of each type would be placed into the
resulting table. All references pointing to a deleted copy would be corrected to point
to the single copy inside the "type table" (that is how dsymutil works currently).

The sig8 hash-id would be used to compare types and to deduplicate them.
It would speed up the current dsymutil context analysis.
Types having the same hash-id could be deduplicated.
This would allow deduplicating more types than the current dsymutil does.
Incomplete type definitions having a similar set of members are not deduplicated by dsymutil currently.
In this case they would have the same hash-id.
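
To make the merging step concrete, here is a minimal Python sketch of the "type table" idea. It is only an illustration under stated assumptions: the data model (a type as a (hash_id, payload) pair, a reference as an index into the per-object table) and the function name are hypothetical and are not taken from dsymutil, DWARFLinker, or P8164.

```python
# Hypothetical model, not LLVM code: each object file carries a type table
# (a list of (hash_id, payload) entries) and references into it by index.

def merge_type_tables(objects):
    """Merge per-object type tables, keeping one entry per hash_id.

    `objects` is a list of dicts with keys:
      "types": list of (hash_id, payload) tuples
      "refs":  list of indexes into that object's "types"
    Returns (merged_table, rewritten_refs_per_object).
    """
    merged = []          # the resulting type table
    index_by_hash = {}   # hash_id -> index in `merged`
    all_refs = []
    for obj in objects:
        remap = []       # local index -> index in `merged`
        for hash_id, payload in obj["types"]:
            # Types are compared by hash_id only; payloads are not inspected.
            if hash_id not in index_by_hash:
                index_by_hash[hash_id] = len(merged)
                merged.append((hash_id, payload))
            remap.append(index_by_hash[hash_id])
        # References to a dropped copy are redirected to the surviving copy.
        all_refs.append([remap[r] for r in obj["refs"]])
    return merged, all_refs
```

The only per-type work in this sketch is a hash lookup, which is the property the proposal relies on; the cost of finding and rewriting the references inside real DIEs is not modelled here.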

This "type table" would take less space than the current "type units" and the current ODR solution.

The above is just an idea for how to help a DWARF-aware linker (based on the idea of removing obsolete debug info)
work faster (if that is interesting).

Alexey.

James Henderson via llvm-dev

Jun 7, 2020, 8:08:09 PM
to Fangrui Song, llvm...@lists.llvm.org
On Fri, 5 Jun 2020 at 01:12, Fangrui Song via llvm-dev <llvm...@lists.llvm.org> wrote:
On 2020-06-04, Robinson, Paul via llvm-dev wrote:
>+ Ben Dunbobbin, whose name I take in vain below.
>He's my local expert on weird ELF features. 

Hey, I have read
https://groups.google.com/forum/#!msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
"monolithic input section handling" from Ben:)
Just for full clarity - I'm one of Ben's team-mates on the linker and binutils team, so hopefully my ELF knowledge is also up to scratch! Ben and I have bounced a number of these ideas off of each other, so should have a roughly equivalent understanding of the topic. I believe he's got today and next week off, so I don't know if he'll answer anything on here for the next week or two.
I think it should be fairly straightforward for dumping tools that know about fragmented DWARF to just glue it all together before dumping. In an ideal world, it would be something in the section or DWARF header that told the tool that this needed to be done, although I'm not entirely sure what.
Also +1. I'm happy for this approach to go ahead for current DWARF versions, since we already actually do this in our downstream port.
I definitely looked at this myself at some point, and IIRC, the prototype performance figures I posted earlier actually had very minimal linker work required to get this to work, but I might be getting myself mixed up with a different experiment! Anyway, LLD does already have sufficient support to do most, maybe all of this, I believe. Linkers not using linker scripts automatically group sections with the same name into a single output section. Sections with the same name within the same object are grouped consecutively, in order according to their input order, so a series of <header>,<body>,<body>,<footer> sections within the same CU would end up as a single cohesive section, as long as they were all named the same thing (strictly speaking, I don't think anything in the ELF standard requires this, but every linker that I know of behaves this way). It's possible to do different things using linker scripts (e.g. grouping sections with different names into a single output section), but I don't think that's needed for this approach.
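
As a rough illustration of the grouping behaviour described above, here is a small Python model. It is a sketch, not LLD code: the tuple layout and the function name are assumptions made for the example, and it only models "same name, input order, drop discarded COMDAT groups".

```python
# Hypothetical model of how a linker without a linker script concatenates
# same-named input sections into one output section.

def layout_output_section(input_files, name, discarded=frozenset()):
    """Concatenate same-named input sections in input order.

    `input_files` is a list (in link-line order) of lists of
    (section_name, group, data) tuples in object-file order; `group` is a
    COMDAT group name or None. Sections in `discarded` groups are dropped.
    """
    out = []
    for sections in input_files:
        for sec_name, group, data in sections:
            if sec_name != name:
                continue
            if group is not None and group in discarded:
                continue  # the whole group was discarded, fragment included
            out.append(data)
    return out
```

With one object containing header, f1, f2 and footer fragments all named .debug_info, discarding group f2 yields header, f1, footer: the header and footer stay in place because they are not in any group.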

When it comes to making the discarding happen naturally, there are a few approaches. One is to use COMDATs. The idea is that the header and footer sections would not be in a group, but the other fragments would be in the same group as their corresponding function section. The problem with this approach is that the function sections must be COMDATs themselves, so this wouldn't help with functions that are not in COMDATs for semantic reasons, even if they are in their own sections.

A second approach is similar to the first, with the addition that functions don't have to be COMDATs. However, it would require linker and assembler changes to support "non-COMDAT" groups. From the ELF spec, it's technically not a requirement that all section groups are COMDATs. Groups merely are kept or discarded all at once, whilst COMDAT groups are a special case that say there can be more than one group, of which only one is kept in the end. Last time I checked (it was a while ago), LLD didn't support section groups other than COMDAT groups, so this probably won't fly. Similarly, the assembler only provided syntax to support COMDAT groups, although I did experiment with a version that supported non-COMDAT groups too. I don't think I have any performance numbers or similar for the results though.

The third approach, which probably makes the second approach redundant, and maybe also the first, uses the ELF SHF_LINK_ORDER flag to achieve the goal of discarding debug information. The SHF_LINK_ORDER section flag causes a set of sections at link time to be concatenated in the same order as their linked-to sections, and if the linked-to section is discarded, so is the referencing section. Thus, if there were text sections .text.1, .text.2, etc. in that order in the object, with corresponding debug data fragments associated with each, all called .debug_info (the same applies for the other sections), the fragment for .text.1 would appear first, then that for .text.2, etc. The problem is what to do with the header and footer fragments. In these cases, I believe they both end up at one end, which is obviously no good. I circumvented this in my prototype by using linker scripts, IIRC. The ELF spec doesn't say what to do with sections without the flag if they end up in the same output section as those with the flag. In an ideal world, they'd be preserved in the output in the same relative order (i.e. a section before the ordered ones would appear first, and after the ordered ones last), but I don't know how viable that is in a general sense.
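
The header/footer problem can be seen in a small Python model of SHF_LINK_ORDER handling. This is a sketch under assumptions: the function name and data layout are hypothetical, and placing all unflagged fragments before the ordered ones is just one possible behaviour chosen to show the issue, since the ELF spec leaves it open.

```python
# Hypothetical model of one output section that mixes SHF_LINK_ORDER
# fragments with fragments that do not have the flag.

def order_link_order_sections(fragments, text_order, live_text):
    """Return the fragment data in output order.

    `fragments`: list of (data, linked_to) in input order; linked_to is the
    name of the associated text section, or None if the flag is absent.
    `text_order`: final order of the text section names.
    `live_text`: set of text sections that survived garbage collection.
    """
    rank = {name: i for i, name in enumerate(text_order)}
    unordered = [data for data, link in fragments if link is None]
    # A fragment whose linked-to section was discarded is discarded too.
    ordered = [(rank[link], data) for data, link in fragments
               if link is not None and link in live_text]
    ordered.sort(key=lambda pair: pair[0])
    # Assumption: unflagged fragments all end up before the ordered ones.
    return unordered + [data for _, data in ordered]
```

Under that assumption the footer lands next to the header instead of after the last DIE fragment, which is why the prototype needed a linker script.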

.stack_sizes is a section that already follows a combination of approaches 1 and 3 - 1 for .stack_sizes contributions related to COMDAT sections, and 3 for those that aren't COMDATs. However, that section doesn't have a header/footer need, so doesn't quite get us the whole way. Here's example assembly snippets for the first and third approaches using .stack_sizes, but the section name could be switched for .debug_info/.debug_line etc etc easily enough:

# Non-COMDAT pair. This .stack_sizes is linked via SHF_LINK_ORDER
.section .text.main,"ax",@progbits
.section .stack_sizes,"o",@progbits,.text.main,unique,0

# COMDAT pair. This .stack_sizes is linked via SHF_LINK_ORDER and a group. The SHF_LINK_ORDER ensures it is ordered the same as the non-COMDAT versions.
.section .text.bar,"axG",@progbits,bar,comdat
.section .stack_sizes,"Go",@progbits,bar,comdat,.text.bar,unique,1

The "o" in the attributes indicates the SHF_LINK_ORDER flag, and the name before the "unique" bit is the associated section. The "<symbol name>,comdat" bit and "G" make it part of a COMDAT group with the specified symbol as the identifier for that group, whilst the "unique, <number>" simply is the way to make unique sections with the same name.

David Blaikie via llvm-dev

Jun 8, 2020, 8:25:54 PM
to Robinson, Paul, llvm...@lists.llvm.org

Actually the thread Fangrui linked got me thinking & so I poked
around; according to
<https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-94076/index.html>
it actually doesn't require as much magic as I thought it might:

"In the absence of the sh_link ordering information, sections from a
single input file combined within one section of the output file will
be contiguous and have the same relative ordering as they did in the
input file. The contributions from multiple input files will appear in
link-line order."

OK, so at the basic level, we could have a debug_info starting
section, some number of comdat debug_info sections, and the debug_info
tail section (the null terminator).

Keeping types alive/cross-referencing them might be trickier (since
their liveness isn't just tied to an existing real comdat), but
do-able.

Yeah, might try and work that up one of these days...

Turns out I decided to do it now. Some of this relates to
threaded-later-than-this-email replies I've read by the time I'm
writing this email.

Yep - for DIEs that don't need to be referenced (such as subprogram
DIEs, assuming they aren't the target of a call_site, and all global
variable DIEs, since I don't think there's any way to target them in
LLVM's current DWARF emission), I used comdats and relied on the
linker's ordering guarantee, which is at least documented for some
Unix linkers:

"In the absence of the sh_link ordering information, sections from a
single input file combined within one section of the output file will
be contiguous and have the same relative ordering as they did in the
input file. The contributions from multiple input files will appear in
link-line order." -
https://docs.oracle.com/cd/E19683-01/816-1386/chapter6-94076/index.html
"When not otherwise constrained, sections should be emitted in input
order." - https://web.stanford.edu/~ouster/cgi-bin/cs140-spring18/pintos/specs/sysv-abi-update.html/ch4.sheader.html

So that works a treat, exactly as Paul suggested.

Also works to modify the debug_ranges/rnglists section keeping list
fragments in the same comdat group too - though the object size
overhead there is probably higher than worthwhile, given the size of
the contributions versus the size of ELF sections/groups/etc. If
someone really wants to trim their ranges/rnglists down for size - I
think a linker feature would be suitable there, because the section is
small/format is simpler (hmm, except addrx forms - those would require
parsing CU DIEs to get the addr_base to know which addresses were
referenced by the range list, etc... :/).

Doing this with types is a bit more difficult (yes, type units exist,
but I was wondering if we could avoid some of their overhead with
techniques like this) - I'm not sure how to make a hunk of .debug_info
that gets dropped if no symbol in it is referenced. (-gc-sections
didn't seem to activate on the hunk of .debug_info I tried... I guess
if that worked then all the debug_info would be dropped all the time -
so we'd need some way to specifically opt in). I attempted using an
external symbol so type hunks could be deduplicated by comdat, with
cross-CU references resolved by symbol, but that doesn't seem to have
worked out (the final value used for the symbolic reference is not the
desired value... I'm probably holding it wrong).

David Blaikie via llvm-dev

Jun 8, 2020, 8:47:19 PM
to Alexey Lapshin, llvm...@lists.llvm.org

Yeah, I was just wondering whether it could be useful for that too.

> This idea goes in another direction than fragmenting dwarf
> using elf sections&tricks. It seems to me that the cost of fragmenting is too high.

I tend to agree - but I'm sort of leaning towards trying to use object
features as much as possible, then implementing just enough custom
handling in the linker to recoup overhead, etc. (eg: add some kind of
small header/brief description that makes it easy for the linker to
slice-and-dice - but hopefully a domain-specific such header can be a
bit more compact than the fully general ELF form)

> It is not only the sizes of structures describing fragments but also the complexity
> of tools that should be taught to work with fragmented DWARF.
> (f.e. llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
> but applied to linked executable it should work with non-fragmented DWARF).
> That idea is for the tool which works the same way as dsymutil ODR.
>
> I will shortly describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>
> The idea is to have only one "type table" per object file(special section .debug_types_table).
> This "type table" would contain all types.
> There could be a special type of reference - type_offset - that offset points into the type table.
> Basic types could always be placed into the start of "type table" thus, offsets to basic types
> most often would be 1 byte. There also would be a special kind of reference - reference inside the type.
> Type units sig8 system - would not be used to reference types.
>
> Types deduplication is assumed to be done, not by linker mechanism for COMDAT,
> but by a tool like dsymutil. This tool would create resulting .debug_types_table by putting there
> types from source .debug_types_table-s. Only one copy of the type would be placed into the
> resulting table. All references pointing to the deleted copy would be corrected to point
> to the single copy inside "type table". (that is how dsymutil works currently)

^ that's the step that's probably a bit expensive for a general-use
tool - it implies parsing all the DWARF to find those references and
rewrite them, I think. For a high-performance solution that could be
run by the linker I think it'd be necessary to have a solution that
doesn't involve parsing all the DIEs.

One way to do that would be to have a CU-local type indirection table.
DIEs reference local type numbers (like local address/string numbers -
addrx/strx/rnglistx) and that table contains either sig8 (no linker
fixups required) or the local type offsets you describe - the linker
would then only need to read this type number indirection table and
rewrite them to the final type numbers.
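
A minimal Python sketch of that indirection, assuming a hypothetical layout in which each DIE stores a small CU-local type number and the CU has a table mapping those numbers to type offsets. The names are invented for the example and do not correspond to any existing DWARF form or LLVM API.

```python
# Hypothetical model of a CU-local type indirection table. DIEs hold a
# local type number; only the table holds offsets that need fixing up.

def fix_up_indirection_table(type_table, local_to_final):
    """Return the table with local type offsets replaced by final ones.

    `type_table`: list of local type offsets, indexed by local type number.
    `local_to_final`: mapping produced by type deduplication.
    """
    return [local_to_final[offset] for offset in type_table]


def type_of(die, type_table):
    """Resolve a DIE's type through its CU-local type number."""
    return type_table[die["type_index"]]
```

In this model the linker touches one table entry per distinct type used by the CU and never visits the DIEs themselves, which is the point of the suggestion.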

Fangrui Song via llvm-dev

Jun 9, 2020, 1:41:42 PM
to David Blaikie, llvm...@lists.llvm.org

The official ELF specification (acknowledged by multiple parties,
Linux, *BSD, HP-UX, Solaris, haiku, etc) is
http://www.sco.com/developers/gabi/latest/contents.html We need to read
the Solaris Linker and Libraries Guide with a grain of salt.

The special section indexes SHN_BEFORE and SHN_AFTER are currently
Solaris specific and none of GNU ld, gold, LLD recognizes them.
(If we find needs, we can consider them)

The sh_link field of SHF_LINK_ORDER is currently used by !associated.
I need to read more what we can do with the field. https://reviews.llvm.org/D72904

Linkers do retain the input file order. AFAICT this is guaranteed by the
ELF specification. In practice many linkers do this. (LLD has an option
which changes this convention: --shuffle-sections=seed)

In general, --gc-sections retains a non-SHF_ALLOC section
if it is not associated to another SHF_ALLOC section (via SHF_LINK_ORDER,
SHT_RELA, or a section group).
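
The retention rule above can be written as a small Python predicate. This is a simplified model, not LLD's mark-and-sweep: the dict layout is hypothetical, and treating an associated section as living and dying with its partner is an assumption made for the sketch.

```python
# Hypothetical, simplified model of the --gc-sections rule for
# non-SHF_ALLOC sections described above.

def retained_by_gc(section, live_alloc_sections):
    """Decide whether `section` is kept in this simplified model.

    `section` is a dict with:
      "name":            section name
      "alloc":           True if SHF_ALLOC is set
      "associated_with": name of the SHF_ALLOC section it is tied to
                         (link order, relocation, or group), or None
    """
    if section["alloc"]:
        return section["name"] in live_alloc_sections
    if section["associated_with"] is None:
        # Unassociated non-SHF_ALLOC sections (plain .debug_*) are kept.
        return True
    # Assumption: an associated section is kept only if its partner is.
    return section["associated_with"] in live_alloc_sections
```

This is why an ordinary .debug_info hunk is never dropped on its own, while a .stack_sizes fragment tied to a dead function is.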

If we want --gc-sections to be effectful on fragmented .debug_*, we'll
need to carefully construct section references (via relocations).

Lookup table style sections (.debug_addr .debug_str_offsets) will
definitely be difficult to merge. SHF_MERGE (constant merging) is an
optional ELF feature which most linkers implement, but it does not allow
section headers/footers (these DWARF v5 sections all have a header) or
varying entry sizes.

SHF_MERGE
The data in the section may be merged to eliminate duplication. Unless
the SHF_STRINGS flag is also set, the data elements in the section are
of a uniform size. The size of each element is specified in the section
header's sh_entsize field. If the SHF_STRINGS flag is also set, the data
elements consist of null-terminated character strings. The size of each
character is specified in the section header's sh_entsize field.

Since SHF_MERGE isn't usable, performing any constant merging requires
some DWARF awareness :/
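
For reference, here is a minimal Python sketch of what SHF_MERGE-style merging of fixed-size entries amounts to. It is an illustration only (the function name is hypothetical and real linkers also handle SHF_STRINGS and alignment): every byte of the section is treated as part of some sh_entsize-sized element, so there is no room for a per-contribution header.

```python
# Hypothetical model of constant merging for fixed-size entries.

def merge_fixed_size(sections, entsize):
    """Merge input sections made of `entsize`-byte entries.

    Returns (merged_bytes, offset_by_entry), where offset_by_entry maps each
    distinct entry to its offset in the merged output.
    """
    offset_by_entry = {}
    out = bytearray()
    for data in sections:
        if len(data) % entsize:
            raise ValueError("section size is not a multiple of sh_entsize")
        for i in range(0, len(data), entsize):
            entry = bytes(data[i:i + entsize])
            if entry not in offset_by_entry:
                offset_by_entry[entry] = len(out)
                out += entry
    return bytes(out), offset_by_entry
```

A DWARF v5 .debug_addr or .debug_str_offsets contribution starts with a header, which this scheme would either reject or misread as ordinary entries, and references into those sections are by index rather than by relocated offset.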

Alexey Lapshin via llvm-dev

Jun 9, 2020, 5:31:25 PM
to David Blaikie, llvm...@lists.llvm.org

I think this indeed should be implemented and evaluated.
So that various approaches could be compared.

>> It is not only the sizes of structures describing fragments but also the complexity
>> of tools that should be taught to work with fragmented DWARF.
>> (f.e. llvm-dwarfdump applied to object file should be able to read fragmented DWARF,
>> but applied to linked executable it should work with non-fragmented DWARF).
>> That idea is for the tool which works the same way as dsymutil ODR.
>>
>> I will shortly describe the idea of making DWARF be easier processed by dsymutil/DWARFLinker:
>>
>> The idea is to have only one "type table" per object file(special section .debug_types_table).
>> This "type table" would contain all types.
>> There could be a special type of reference - type_offset - that offset points into the type table.
>> Basic types could always be placed into the start of "type table" thus, offsets to basic types
>> most often would be 1 byte. There also would be a special kind of reference - reference inside the type.
>> Type units sig8 system - would not be used to reference types.
>>
>> Types deduplication is assumed to be done, not by linker mechanism for COMDAT,
>> but by a tool like dsymutil. This tool would create resulting .debug_types_table by putting there
>> types from source .debug_types_table-s. Only one copy of the type would be placed into the
>> resulting table. All references pointing to the deleted copy would be corrected to point
>> to the single copy inside "type table". (that is how dsymutil works currently)

>^ that's the step that's probably a bit expensive for a general-use
>tool - it implies parsing all the DWARF to find those references and
>rewrite them, I think. For a high-performance solution that could be
>run by the linker I think it'd be necessary to have a solution that
>doesn't involve parsing all the DIEs.

Judging by the current dsymutil processing,
this particular step is not the most time-consuming one.
It can be done relatively fast.

Anyway, I think the dsymutil approach is still valuable, and it
would be useful to optimize it.
Do you think it would be useful to make dsymutil/DWARFLinker truly multi-threaded?
(To make dsymutil/DWARFLinker able to process each object file in a separate thread)

>One way to do that would be to have a CU-local type indirection table.
>DIEs reference local type numbers (like local address/string numbers -
>addrx/strx/rnglistx) and that table contains either sig8 (no linker
>fixups required) or the local type offsets you describe - the linker
>would then only need to read this type number indirection table and
>rewrite them to the final type numbers.

Yes, that could be done in addition if this process turns out to be time-consuming.

David, thank you for all your comments and explanations. They are extremely helpful.

Thank you, Alexey.

David Blaikie via llvm-dev

Jun 15, 2020, 11:02:07 PM
to Fangrui Song, llvm...@lists.llvm.org
Just a general point to this whole thread:

Linker-based DWARF redundancy/dead-DWARF elimination isn't really a
feature I/Google (in the parts of it I'm involved with) would use. We
use Split DWARF internally & mostly have issues with object size, not
so much with linked executable size - so on multiple fronts, this work
probably wouldn't be deployed in the parts of Google I work with.

& I'm not sure how much Alexey needs this - the original proposal to
remove dead DWARF was as a way to address the 0-as-a-valid-address
issue due to lack of a good tombstone. Now that we're moving forward
on the -2/-1 tombstone thing - I'm not sure if any of us (in this
community/thread) have a deeply pressing need to remove redundant/dead
DWARF any more than we did last week/month/etc.

That said, I do find it a fun/interesting topic & am enjoying playing
around with linker/object features & seeing what can be done here,
what the tradeoffs might be, etc - I just don't want to be misleading
in my level of investment here. Not sure about other folks if anyone
ends up fully prototyping this and the object size/linked executable
size tradeoffs are worthwhile, etc. It'd be interesting to see.

Thanks for the link!

> The special section indexes SHN_BEFORE and SHN_AFTER are currently
> Solaris specific and none of GNU ld, gold, LLD recognizes them.
> (If we find needs, we can consider them)
>
> The sh_link field of SHF_LINK_ORDER is currently used by !associated.
> I need to read more what we can do with the field. https://reviews.llvm.org/D72904
>
> Linkers do retain the input file order. AFAICT this is guaranteed by the
> ELF specification. In practice many linkers do this. (LLD has an option
> which changes this convention: --shuffle-sections=seed)

Great!

Ah, well, that presents a workaround: an empty .text comdat group to
associate with the DWARF type fragment...

> If we want --gc-sections to be effectful on fragmented .debug_*, we'll
> need to carefully construct section references (via relocations).

What sort of relocations did you have in mind? I have managed to use
the above (empty .text comdat to associate with DWARF descriptions of
types - then using a relocation to a symbol in the .debug_info type
fragment from the .debug_info function fragment (using a function's
parameter type as the test case here)) - this does the right thing
(dropping the type if all the function fragments that refer to it are
dropped by -gc-sections, for instance).

Looks something roughly like:

.section .text,"axG",@progbits,_Z2f13foo,comdat,unique,1
...
.section .text,"axG",@progbits,_Z2f23foo,comdat,unique,1
...
.section .text,"axG",@progbits,_Z3foo,comdat,unique,1
# empty .text comdat to make the .debug_info type comdat droppable

.section .debug_info,"",@progbits,unique,1
# DWARF header/CU DIE/etc


.section .debug_info,"G",@progbits,_Z3foo,comdat,unique,1
# label for type DIE - we'd need a more advanced mangling scheme for
# this to ensure it doesn't overlap with C++, etc
_Z3foo:
# type DIEs

.section .debug_info,"G",@progbits,_Z2f13foo,comdat,unique,1
# DWARF fragment for 'f1'
...
.long _Z3foo # DW_AT_type
...
# similar fragment for f2

.section .debug_info,"",@progbits,unique,2
# DWARF footer
.byte 0 # End Of Children Mark
.Ldebug_info_end0:

But I have some trouble making that _Z3foo relocation do what I
want across multiple files. So if two files have code like the above -
the _Z3foo comdat is picked from one of them (if one of the ".long
_Z3foo" is retained from that file), and the ".long _Z3foo" references
are resolved correctly to refer to the offset in the .debug_info
_Z3foo comdat group. But the ".long _Z3foo" references from the other
input file are resolved to zero. Any chance of making that do what I
want?

(honestly, this is probably all too high object overhead for some
users - I mean, it doesn't apply to my/Google's use case at all, given
Split DWARF - but maybe other users would be happy with a
linker-agnostic small-linked-DWARF result even at the cost of
significant object size)

> Lookup table style sections (.debug_addr .debug_str_offsets) will
> definitely be difficult to merge. SHF_MERGE (constant merging) is an
> optional ELF feature which most linkers implement, but it does not allow
> section headers/footers (these DWARF v5 sections all have a header) or
> varying entry sizes.

Yeah, I think they're more or less a lost cause because of the
indexing in them. I don't have any great ideas for them. Generally you
want a DWARF-aware link of those sections to create a single more
efficient lookup table, so that can take into account dropped code,
etc, potentially.

- Dave

David Blaikie via llvm-dev

Jun 15, 2020, 11:04:36 PM
to Alexey Lapshin, llvm...@lists.llvm.org

Fair enough - though I'd still imagine any solution that involves
parsing all the DIEs still wouldn't be fast enough (maybe an order of
magnitude faster than the current solution even - but that's still,
what, 6 or 7x slower than linking without the feature?) for most users
to consider it a good tradeoff.

> Anyway, I think the dsymutil approach is still valuable, and it
> would be useful to optimize it.
> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-thread?
> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)

Perhaps - that I'd probably leave up to the folks who are more
invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
integrated into llvm-dwp and then I'll be interested in getting as
much performance out of it as lld - so multithreading and things would
be on the books.

> >One way to do that would be to have a CU-local type indirection table.
> >DIEs reference local type numbers (like local address/string numbers -
> >addrx/strx/rnglistx) and that table contains either sig8 (no linker
> >fixups required) or the local type offsets you describe - the linker
> >would then only need to read this type number indirection table and
> >rewrite them to the final type numbers.
>
> Yes, that could be additionally done if this process would be time-consuming.
>
> David, thank you for all your comments and explanations. They are extremely helpful.

Sure thing - really appreciate your patience with all this - it's... a
lot of moving parts.
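
For reference, the CU-local type indirection table quoted above can be sketched roughly as follows. This is only an illustration under assumptions: the class and function names (`Sig8`, `LocalOffset`, `rewrite_type_table`) are hypothetical and do not correspond to any existing LLVM API or DWARF form. The point it shows is that the linker touches only the small table, while DIEs keep their local type numbers.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Sig8:
    """Table entry that names a type by 8-byte signature (no fixup needed)."""
    signature: int


@dataclass(frozen=True)
class LocalOffset:
    """Table entry that names a type by its offset in the debug info."""
    offset: int


TypeRef = Union[Sig8, LocalOffset]


def rewrite_type_table(table: List[TypeRef],
                       offset_map: Dict[int, int]) -> List[TypeRef]:
    """Rewrite only the indirection table; DIEs are never re-parsed."""
    rewritten: List[TypeRef] = []
    for entry in table:
        if isinstance(entry, LocalOffset):
            # Map the input offset to its final offset in the output.
            rewritten.append(LocalOffset(offset_map[entry.offset]))
        else:
            # Signature entries are position independent.
            rewritten.append(entry)
    return rewritten


# DIEs refer to types by table index, similar in spirit to addrx/strx.
cu_type_table: List[TypeRef] = [
    Sig8(0x1122334455667788),
    LocalOffset(0x40),
    LocalOffset(0x9C),
]
die_type_indices = [2, 0, 1, 2]  # unchanged by the rewrite

final_table = rewrite_type_table(cu_type_table, {0x40: 0x1040, 0x9C: 0x10F0})
```

The cost of the fixup is proportional to the table size rather than to the number of DIEs, which is the property the proposal relies on.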

- Dave

Alexey Lapshin via llvm-dev

Jun 22, 2020, 2:30:40 PM
to David Blaikie, llvm...@lists.llvm.org

>to consider it a good trade-off.

It seems to me that even the current 6x-7x slowdown could be useful.
Users who already use the dsymutil or llvm-dwp tools (assuming DWARFLinker
would be taught to work with split DWARF) spend this time anyway and,
in some scenarios, waste disk space on intermediate files.
Thus if they used this LLD feature in its current state,
they would still benefit.

Speaking of performance results - LLD is a multi-threaded linker;
it handles sections in parallel. DWARFLinker generates DWARF using
AsmPrinter, which is a stream - so it can only produce the resulting DWARF
sequentially. It is not surprising that the parallel solution works faster.
Making DWARFLinker truly multi-threaded would probably allow us
to bring the slowdown into the 2x-4x range.
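
The difference between a single sequential stream and per-object-file processing can be sketched as below. This is a structural illustration only: `process_object` is a hypothetical stand-in for the per-object DWARF work, not DWARFLinker code, and Python threads would not give a real speedup for CPU-bound work. What it shows is that each input can be handled independently and the results concatenated in input order, so the output stays deterministic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_object(debug_info: bytes) -> bytes:
    """Hypothetical per-object step: emit a length-prefixed contribution."""
    return len(debug_info).to_bytes(4, "little") + debug_info


def link_sequential(objects: List[bytes]) -> bytes:
    """One stream: each object is processed only after the previous one."""
    return b"".join(process_object(obj) for obj in objects)


def link_parallel(objects: List[bytes], workers: int = 4) -> bytes:
    """Process objects concurrently, then join the parts in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of its inputs.
        parts = list(pool.map(process_object, objects))
    return b"".join(parts)


objects = [b"cu-a", b"cu-bb", b"cu-ccc"]
sequential_output = link_sequential(objects)
parallel_output = link_parallel(objects)
```

Both functions produce the same bytes; only the scheduling differs.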

>> Anyway, I think the dsymutil approach is still valuable, and it
>> would be useful to optimize it.
>> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-thread?
>> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)

>Perhaps - that I'd probably leave up to the folks who are more
>invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>integrated into llvm-dwp and then I'll be interested in getting as
>much performance out of it as lld - so multithreading and things would
>be on the books.

I think improving dsymutil is a valuable thing.
Though there are several directions which might be considered
to make it more robust:

1. support of latest DWARF - DWARF5/DWARF64...
2. implement multi-threaded execution.
3. support of split DWARF.
4. implement dsymutil for non-darwin platform.

All of this is a massive piece of work.
Our original investment was to solve two problems:

1. Overlapped address ranges, which is currently close to being solved. Thank you for helping with that!

2. Size of debug info. That still remains an issue, but we are unsure whether we are ready to
invest in solving all the above 1-4 problems and how much the community is interested in it.

Thank you, Alexey.

Fangrui Song via llvm-dev

Jun 22, 2020, 5:05:25 PM
to Alexey Lapshin, llvm...@lists.llvm.org

If it is 6x-7x slowdown (which may be optimized to 2x-4x), I wonder
whether it is a good trade-off keeping it as an in-linker pass, or
rather we should just use another utility compressing the output separately.

If the slowdown is such a pain, I might not consider --gc-debuginfo a
readily usable feature like --gdb-index or future --debug-names (DWARF
v5 accelerator table - I have a plan to add it but I am always distracted
by other priorities at hand). Considering that this breaks GNU linkers,
I will add the following lines to the build system

if LINKER_IS_LLD && ENABLE_GC_DEBUGINFO
  add -Wl,--gc-debuginfo

I don't think this is more complex than:

if ENABLE_GC_DEBUGINFO
  set linker to a wrapper which optimizes the output like dwz
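
As a rough sketch, the two build-system variants above could look like this in a Python-based build script. The function names are made up for illustration, `--gc-debuginfo` is the option proposed in D74169 rather than a released lld flag, and `dwarf-optimizer` is a placeholder name for a post-link tool that does not exist under that name.

```python
from typing import List


def in_linker_flags(linker_is_lld: bool, enable_gc_debuginfo: bool) -> List[str]:
    """Variant 1: pass the (proposed) option only when the linker is lld."""
    flags = ["-Wl,--gc-sections"]
    if linker_is_lld and enable_gc_debuginfo:
        flags.append("-Wl,--gc-debuginfo")
    return flags


def post_link_commands(enable_gc_debuginfo: bool, output: str) -> List[List[str]]:
    """Variant 2: link normally, then run a separate optimizer on the output."""
    if not enable_gc_debuginfo:
        return []
    # "dwarf-optimizer" is a hypothetical tool name used only for this sketch.
    return [["dwarf-optimizer", output, "-o", output + ".opt"]]


lld_flags = in_linker_flags(linker_is_lld=True, enable_gc_debuginfo=True)
gnu_flags = in_linker_flags(linker_is_lld=False, enable_gc_debuginfo=True)
extra_steps = post_link_commands(True, "clang")
```

Either way the build system needs one conditional; the difference is whether the work happens inside the link step or after it.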

We probably should add another -f option for specifying the linker path,
like -fld-path= https://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html

David Blaikie via llvm-dev

Jun 22, 2020, 9:19:45 PM
to Alexey Lapshin, llvm...@lists.llvm.org

FWIW, dwp (llvm-dwp hasn't really been optimized compared to binutils
dwp) is designed to be very quick - by not needing to do a lot of
parsing/fixups. Which, yes, means larger output files than would be
possible with more parsing/etc. It also doesn't take any input from
the linker (so it can run in parallel with the linker) - so it can't
remove dead subprograms. Given Google's the major (perhaps only
significant?) user of Split DWARF - I can say that the needs don't
necessarily overlap well with something that would take significantly
longer to run or use significantly more memory. Faster/cheaper/with
somewhat bigger output files is probably the right tradeoff for
Google's use case, at least.

I imagine Apple's use for dsymutil is somewhat similar - it's not used
in the iterative development cycle, only in final releases - well,
maybe their situation is more "neutral" - not a major pain point in
any case I'd guess.

> Thus if they would use this LLD feature in its current state
> - they would still receive benefits.
>
> Speaking of performance results - LLD is a multi-thread linker;
> it handles sections in parallel. DWARFLinker generates DWARF using
> AsmPrinter which is a stream - so it could make resulting DWARF only
> continuously. It is not surprising that the parallel solution works faster.
> Making DWARFLinker truly multi-threaded would probably allow us
> to make slowdown to be at 2x-4x range.

*nod* that's still a really expensive link - but I understand that's a
suitable tradeoff for your users

> >> Anyway, I think the dsymutil approach is still valuable, and it
> >> would be useful to optimize it.
> >> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-thread?
> >> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
>
> >Perhaps - that I'd probably leave up to the folks who are more
> >invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
> >integrated into llvm-dwp and then I'll be interested in getting as
> >much performance out of it as lld - so multithreading and things would
> >be on the books.
>
> I think improving dsymutil is a valuable thing.
> Though there are several directions which might be considered
> to make it more robust:
>
> 1. support of latest DWARF - DWARF5/DWARF64...

I expect/thought some of the Apple folks had already worked on DWARF5 support?
DWARF64 - that's been around for a while, and just hasn't been needed
by LLVM users thus far, it seems (until recently - where some
developers have started working on that)

> 2. implement multi-threaded execution.
> 3. support of split DWARF.

Maybe, though I'm still not sure it'd be the right tradeoff -
especially if it involved having to wait to run the .dwo merger (call
it DWARF-aware dwp, or dsymutil with dwp support) until after the
linker ran.

> 4. implement dsymutil for non-darwin platform.

That's probably, essentially (3), more-or-less. Split DWARF is
somewhat of a formalization of Apple's/MachO DWARF distribution model
(leave DWARF in files that aren't linked/use them from a debugger,
but also be able to merge them into some final file (dsym or dwp) for
archival purposes)

> All of this is a massive piece of work.
> Our original investment was to solve two problems:
>
> 1. Overlapped address ranges, which is currently close to being solved. Thank you for helping with that!

Yeah, again, sorry that's taken quite so long/somewhat circuitous route.

> 2. Size of debug info. That still becomes an issue, but we are unsure whether we are ready to
> invest in solving all the above 1-4 problems and how much community interested in it.

Fair, for sure - I don't think you'd need to sign up to solve all of
them (don't think they necessarily need solving). Potentially moving
the logic out into a separate tool as Fangrui's considering - a
post-link DWARF optimizer, rather than in-linker DWARF optimization.

I really don't want to give you the runaround like this - but multiple
times slower links is something that seems pretty problematic for most
users, to the point of weighing the maintainability of lld against the
convenience of having this functionality in-linker rather than in a
post-link optimizer.

(I know you've spoken a bit before about your users needs - but if
it's possible, could you explain (again :/) why they have such a
strong need for smaller DWARF? While DWARF size is an ongoing concern
for many users (Google certainly - hence the invention of Split DWARF,
use of type units and compressed DWARF, etc) - usually it's in rather
large programs, but it sounds like you're dealing with relatively
small ones (otherwise the increase in link time, I'd imagine, would be
prohibitive for your users?)? You mentioned that the usability cost of
Split DWARF for your users was too high (or high enough to justify
this alternative work of DWARF-aware linking)? That all seems a bit
surprising to me - though I understand the deployment issues of Split
DWARF do present some challenges to users in more heterogenous
environments than Google's... still, I'd have thought there was some
hope there)

Alexey Lapshin via llvm-dev

Jun 23, 2020, 5:19:58 PM
to Fangrui Song, llvm...@lists.llvm.org

gc-debuginfo could be done not from the linker but as a standalone
tool (like dsymutil, llvm-dwp), as you said. The reasons why it
was suggested to do it from the linker:

1. The linker already has liveness information built and object files loaded.
Thus, it would be the fastest implementation if called from the linker.
Otherwise, an address map would have to be created and written while
linking, and intermediate debug-info files would have to be generated.
Then the separate tool would read that map and load the object files
or intermediate debug-info files again. So the processing time
would become even longer.

2. Linker already processes debug info: error reporting, --gdb-index,
upcoming --debug-names. From the design point of view, it would be
good to have a separate module - DWARFLinker - which implements
all that functionality, so that the linker would not need its own
separate implementation of each. Instead, the already existing
implementation would be called from the linker, i.e. depending on the
task, the linker would call either DWARFLinker.generate-gdb-index(),
DWARFLinker.generate-debug-names(), or DWARFLinker.gc-debuginfo().

The idea behind gc-debuginfo was not to slow down the linking process for everybody,
but to allow generating optimized debug info for those who need it.
That is the same idea as LTO. LTO slows down the usual compilation significantly,
but it creates highly optimized code.

Thank you, Alexey.

David Blaikie via llvm-dev

Jun 23, 2020, 5:47:13 PM
to Alexey Lapshin, llvm...@lists.llvm.org

I think the suggestion would be to link as normal, then process to
optimize (like dwz - I believe it does something like this too). It
wouldn't need a map from the linker - it could use the existing
tombstone values in the linked DWARF to determine what to drop. Yes,
the processing time would be longer, for sure. (I think Fangrui's
suggestion there is "if you're willing to take a (let's say optimized)
2x or more increase to link time, maybe (let's say doing the
intermediate step adds some fairly significant chunk of overhead) 3x
wouldn't make too much of a difference?")
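
A minimal sketch of the tombstone idea, under loud assumptions: the tombstone is taken to be the all-ones 64-bit address, whereas the actual value a linker writes depends on the linker version and the section, and the `Subprogram` record is a simplified stand-in for a parsed DIE, not an LLVM type.

```python
from typing import List, NamedTuple

# Assumed marker for "this code was discarded"; the real value is
# linker- and section-specific and must be checked for the toolchain in use.
TOMBSTONE = 0xFFFFFFFFFFFFFFFF


class Subprogram(NamedTuple):
    """Simplified view of a subprogram entry in linked debug info."""
    name: str
    low_pc: int
    size: int


def drop_dead(subprograms: List[Subprogram],
              tombstone: int = TOMBSTONE) -> List[Subprogram]:
    """Keep only entries whose start address was not tombstoned."""
    return [sp for sp in subprograms if sp.low_pc != tombstone]


linked = [
    Subprogram("main", 0x401000, 0x40),
    Subprogram("unused_helper", TOMBSTONE, 0x20),  # removed by --gc-sections
    Subprogram("used_helper", 0x401040, 0x18),
]
live = drop_dead(linked)
```

With this approach the post-link tool needs no side channel from the linker; the linked DWARF itself carries the liveness information.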

> 2. Linker already processes debug info: error reporting,

It's important that it's only consulted in the error path - if the
link is failing anyway, taking a little longer to fail isn't likely to
significantly hurt the user experience (compared to the time it takes
a human to read the message, process it/think about what it means and
come up with a theory about how to address it). Compared to getting in
the way of the automated path to a working executable.

> --gdb-index,
> upcoming --debug-names.

Importantly, these can be produced without parsing a lot of DWARF if
the input contains gnu_pubnames, or debug_names (yeah, currently
gdb_index still is quite costly (but I think more like 10s of percent
in link time, not multiple 100s of percent) in memory and link time
even when it's only parsing gnu_pubnames).

> From the design point of view, it would be
> good to have a separate module - DWARFLinker - which implements
> all that functionality. So that there would not be additional separate
> specific linker implementation of them. Instead, already existed
> implementation would be called from the linker. i.e. Depending on the
> tasks, the linker would call either DWARFLinker.generate-gdb-index(),
> DWARFLinker.generate-debug-names(), DWARFLinker.gc-debuginfo().
>
> The idea behind gc-debuginfo was not to slowdown the linking process for everybody.
> But to allow generation optimized debug-info for those who need it.
> That is the same idea as LTO. LTO slowdowns usual compilation significantly,
> but it creates a highly optimized code.

Yep - I think the main difference there is going to be the size of the
user base compared to the complexity (certainly LTO adds complexity to
the linker - though without much alternative (well, alternative would
be build systems being aware of this - sort of like Fangrui's
suggestion, LTO could be a separate tool that merges LLVM bitcode
files, then creates real object files that go to the actual native
linker) to gain the desired performance).

Fangrui: What's your assessment of the complexity of adding this
functionality to lld? Are you concerned it'll be an ongoing
maintenance burden on other work in lld? If not, I'd be inclined
to/lean towards accepting this & having some room for Alexey to
improve the performance for his own users needs (& hopefully that'll
improve DWARFLinker functionality/performance as well), see if anyone
else wants this link time tradeoff.

bd1976 llvm via llvm-dev

Jun 24, 2020, 11:16:14 AM
to David Blaikie, paul.r...@sony.com, llvm...@lists.llvm.org
Thanks for copying me in Paul! Sorry, for the late reply.

I have had a personal interest in this subject for a long time and I have had discussions on linking DWARF with many of you in person at LLVM events. I don't have much to add to what's upthread and James Henderson has already answered the questions I was copied in for. However, I did want to make a general point about ELF that I thought might be helpful having read through the above:

It may be that changes to the ELF spec might be warranted in order to improve DWARF linking (on ELF). Everyone may already be aware of this but FYI: the ownership of ELF has been a problem for a while. Currently, the ELF spec is composed of the spec here: http://www.sco.com/developers/gabi/ plus the decisions made on the list: https://groups.google.com/forum/#!forum/generic-abi. Obviously, this is less than ideal. However, this is hopefully changing in the near future see: https://groups.google.com/forum/#!topic/generic-abi/9OO5vhxb00Y. This should mean that modifications to ELF are more viable in the future.

David Blaikie via llvm-dev

Jun 24, 2020, 11:25:47 AM
to bd1976 llvm, llvm...@lists.llvm.org
On Wed, Jun 24, 2020 at 8:15 AM bd1976 llvm <bd197...@gmail.com> wrote:
>
> Thanks for copying me in Paul! Sorry, for the late reply.
>
> I have had a personal interest in this subject for a long time and I have had discussions on linking DWARF with many of you in person at LLVM events. I don't have much to add to what's upthread and James Henderson has already answered the questions I was copied in for. However, I did want to make a general point about ELF that I thought might be helpful having read through the above:
>
> It may be that changes to the ELF spec might be warranted in order to improve DWARF linking (on ELF). Everyone may already be aware of this but FYI: the ownership of ELF has been a problem for a while. Currently, the ELF spec is composed of the spec here: http://www.sco.com/developers/gabi/ plus the decisions made on the list: https://groups.google.com/forum/#!forum/generic-abi. Obviously, this is less than ideal. However, this is hopefully changing in the near future see: https://groups.google.com/forum/#!topic/generic-abi/9OO5vhxb00Y. This should mean that modifications to ELF are more viable in the future.

What sort of changes to the spec are you thinking of here? (the ones I
know of would maybe be something like MachO's subsections via symbols
(to reduce section overhead by just telling the linker it can slice up
sections at public symbol boundaries without the need for section
headers to describe it) & maybe some similar things for DWARF (the
slicable debug_info I was showing/prototyping earlier - would benefit
from some way to communicate the slice boundaries to the linker that
didn't have the overhead of ELF sections) - or maybe just an overhaul
of the section and relocation formats in general to make them more
compact)
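
To make the "slice at symbol boundaries" idea concrete, here is a small sketch. It assumes a section is described only by its size and a map from symbol name to offset; the function names are hypothetical and this is not how any particular linker represents sections.

```python
from typing import Dict, List, Set, Tuple

Slice = Tuple[str, int, int]  # (symbol name, start offset, end offset)


def slice_at_symbols(section_size: int, symbols: Dict[str, int]) -> List[Slice]:
    """Split one section into subsections that each start at a symbol."""
    ordered = sorted(symbols.items(), key=lambda item: item[1])
    slices: List[Slice] = []
    for index, (name, start) in enumerate(ordered):
        # A slice ends where the next symbol starts, or at the section end.
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]
        else:
            end = section_size
        slices.append((name, start, end))
    return slices


def keep_live(slices: List[Slice], live_symbols: Set[str]) -> List[Slice]:
    """Drop subsections whose defining symbol is not live."""
    return [piece for piece in slices if piece[0] in live_symbols]


slices = slice_at_symbols(0x30, {"foo": 0x00, "bar": 0x10, "baz": 0x24})
kept = keep_live(slices, {"foo", "baz"})
```

No per-function section header is needed here: the symbol table alone tells the linker where it may cut.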

Alexey Lapshin via llvm-dev

Jun 25, 2020, 5:23:49 PM
to David Blaikie, llvm...@lists.llvm.org

I see. FWIW, here is a comparison of split DWARF + dwp and DWARFLinker from lld:

1. split-dwarf+llvm-dwp = linking time for clang 6 sec,
generating time for .dwp 53 sec, clang=997M clang.dwp=1.1G.
2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.
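
The totals implied by these two measurements can be worked out as below. The only assumption added is that "1.1G" is treated as 1100M; note also that, as mentioned earlier in the thread, dwp can run in parallel with the linker, so the summed time is an upper bound on wall-clock time rather than a measurement.

```python
# Measurements quoted above (seconds and megabytes).
split_link_s = 6
dwp_s = 53
dwarflinker_link_s = 72

clang_split_mb = 997
dwp_mb = 1100  # "1.1G" approximated as 1100M
dwarflinker_mb = 760

# Upper bound for split DWARF: link and .dwp generation run back to back.
split_total_s = split_link_s + dwp_s

# Everything that has to be stored or shipped for the split DWARF variant.
split_total_mb = clang_split_mb + dwp_mb

size_ratio = split_total_mb / dwarflinker_mb

print(f"split DWARF + dwp: <= {split_total_s} s, {split_total_mb} MB")
print(f"DWARFLinker from lld: {dwarflinker_link_s} s, {dwarflinker_mb} MB")
print(f"size ratio: {size_ratio:.2f}x")
```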


>> Thus if they would use this LLD feature in its current state
>> - they would still receive benefits.
>>
>> Speaking of performance results - LLD is a multi-thread linker;
>> it handles sections in parallel. DWARFLinker generates DWARF using
>> AsmPrinter which is a stream - so it could make resulting DWARF only
>> continuously. It is not surprising that the parallel solution works faster.
>> Making DWARFLinker truly multi-threaded would probably allow us
>> to make slowdown to be at 2x-4x range.
>
>*nod* that's still a really expensive link - but I understand that's a
>suitable tradeoff for your users
>

Btw, 2x or 7x is for pure linking time. The overall compilation slowdown
is not so significant. Building the LLVM codebase shows only a 20% slowdown.
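
As a back-of-envelope check of how these two figures relate, assume (and this is only an assumption) that the whole 20% comes from link steps that each slow down by the same factor. Then the share of the baseline build spent linking follows from 1 + f * (s - 1) = overall slowdown:

```python
def link_fraction(overall_slowdown: float, link_slowdown: float) -> float:
    """Fraction of baseline build time spent linking, under the assumption
    that only link steps slow down and all by the same factor."""
    return (overall_slowdown - 1.0) / (link_slowdown - 1.0)


def overall_slowdown(fraction: float, link_slowdown: float) -> float:
    """Inverse relation: overall build slowdown for a given link slowdown."""
    return 1.0 + fraction * (link_slowdown - 1.0)


# 20% overall slowdown with a 7x slower link implies roughly 3.3% of the
# baseline build time is linking.
fraction = link_fraction(1.20, 7.0)

# What a 2x link slowdown would mean for the same build, same assumption.
projected = overall_slowdown(fraction, 2.0)

print(f"link share of build: {fraction:.1%}")
print(f"projected overall slowdown at 2x link time: {projected:.3f}x")
```

The estimate ignores build parallelism and any other source of slowdown, so it should be read as an order-of-magnitude figure only.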

>> >> Anyway, I think the dsymutil approach is still valuable, and it
>> >> would be useful to optimize it.
>> >> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-thread?
>> >> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
>>
>> >Perhaps - that I'd probably leave up to the folks who are more
>> >invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
>> >integrated into llvm-dwp and then I'll be interested in getting as
>> >much performance out of it as lld - so multithreading and things would
>> >be on the books.
>>
>> I think improving dsymutil is a valuable thing.
>> Though there are several directions which might be considered
>> to make it more robust:
>>
>> 1. support of latest DWARF - DWARF5/DWARF64...
>
>I expect/thought some of the Apple folks had already worked on DWARF5 support?
>DWARF64 - that's been around for a while, and just hasn't been needed
>by LLVM users thus far, it seems (until recently - where some
>developers have started working on that)

The debug_names table is already implemented, but debug_rnglists,
debug_loclists, type units - are not implemented yet. The thing which
should probably be changed is that dsymutil should not have its own version
of the code generating DWARF tables. It should call the already existing
DWARF5/DWARF64 implementations. Then dsymutil would always
use the latest DWARF generators.

We have many large programs and keep daily/nightly debug builds,
which take a lot of disk space. Compilation time for these programs is long.
The scenario is "compile once" (not compile-debug-compile-debug).
So we think that a solution like dsymutil/DWARFLinker would not slow down
the overall build significantly (see the above numbers for the
llvm codebase) and would allow us to reduce the disk space required to keep
all of these builds.

>You mentioned that the usability cost of
>Split DWARF for your users was too high (or high enough to justify
>this alternative work of DWARF-aware linking)? That all seems a bit
>surprising to me - though I understand the deployment issues of Split
>DWARF do present some challenges to users in more heterogenous
>environments than Google's... still, I'd have thought there was some
>hope there)

Our tools do not support split DWARF yet, though we plan to implement it.
Once we have split DWARF support, it would be
convenient to have an easy way to share built debug binaries. llvm-dwp is the
answer to this. DWARFLinker could probably be another answer.

David Blaikie via llvm-dev

Jun 25, 2020, 5:44:33 PM
to Alexey Lapshin, llvm...@lists.llvm.org

FWIW, llvm-dwp is not very well optimized (which is to say: it is not
optimized), binutils dwp might be a better comparison (& even that
doesn't have the parallelism & some potential further memory savings
that lld has that we could take advantage of in a dwp-like tool)

What build mode was the clang binary built in? Optimized or unoptimized?

> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.

It does seem a tad strange that the clang binary would be smaller
non-split with DWARF linking than it was split. Though I could imagine
this might be possible in an optimized build (where debug_ranges
become quite relatively expensive in the .o file contribution with
Split DWARF)

Could you compare the section sizes between these two clang binaries, perhaps?

> >> Thus if they would use this LLD feature in its current state
> >> - they would still receive benefits.
> >>
> >> Speaking of performance results - LLD is a multi-thread linker;
> >> it handles sections in parallel. DWARFLinker generates DWARF using
> >> AsmPrinter which is a stream - so it could make resulting DWARF only
> >> continuously. It is not surprising that the parallel solution works faster.
> >> Making DWARFLinker truly multi-threaded would probably allow us
> >> to make slowdown to be at 2x-4x range.
> >
> >*nod* that's still a really expensive link - but I understand that's a
> >suitable tradeoff for your users
> >
>
> Btw, 2x or 7x is for pure linking time. Overall compilation slowdown
> is not so significant. Building LLVM codebase has only 20% slowdown.

Understood - that's still quite significant to most users, I'd imagine.

> >> >> Anyway, I think the dsymutil approach is still valuable, and it
> >> >> would be useful to optimize it.
> >> >> Do you think it would be useful to make dsymutil/DWARFLinker truly multi-thread?
> >> >> (To make dsymutil/DWARFLinker able to process each object file in a separate thread)
> >>
> >> >Perhaps - that I'd probably leave up to the folks who are more
> >> >invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
> >> >integrated into llvm-dwp and then I'll be interested in getting as
> >> >much performance out of it as lld - so multithreading and things would
> >> >be on the books.
> >>
> >> I think improving dsymutil is a valuable thing.
> >> Though there are several directions which might be considered
> >> to make it more robust:
> >>
> >> 1. support of latest DWARF - DWARF5/DWARF64...
> >
> >I expect/thought some of the Apple folks had already worked on DWARF5 support?
> >DWARF64 - that's been around for a while, and just hasn't been needed
> >by LLVM users thus far, it seems (until recently - where some
> >developers have started working on that)
>
> There already implemented debug_names table, but debug_rnglists,
> debug_loclists, type units - are not implemented yet.

Superficially, type units wouldn't be on the list of features (like
DWARF64 - it's optional) I'd try to support in dsymutil - since their
size overhead is more justified for a DWARF-agnostic linker that's
using comdat groups. With a DWARF-aware linker I'd be specifically
hoping to avoid using type units to help

> The thing which
> should probably be changed is that dsymutil should not have its version
> of code generating DWARF tables. It should call already existed
> DWARF5/DWARF64 implementations. Then dsymutil would always
> use last DWARF generators.

Possibly - I don't know what the architectural tradeoffs for that look
like - I'd imagine DWARFLinker has sufficiently different
needs/tradeoffs than LLVM's DWARF generation code (rewriting existing
DIEs compared to building new ones from scratch, etc) that it might be
hard for them to share a lot of their implementation.

Ah, OK - for archival purposes. So the interactive developers wouldn't
necessarily be using this feature. Makes sense - similar to dsymutil
and dwp, mostly used for archival purposes & you can debug straight
from .o/.dwos for interactive/iterative development.

In that case, it seems more likely that a separate tool might suffice.

Also, out of curiosity - have you tried just compressing the output
(-gz (I think that does the right thing for the linker level
compression too, otherwise -Wl,-compress-debug-sections might do it))
or are you already doing that in addition?

> >You mentioned that the usability cost of
> >Split DWARF for your users was too high (or high enough to justify
> >this alternative work of DWARF-aware linking)? That all seems a bit
> >surprising to me - though I understand the deployment issues of Split
> >DWARF do present some challenges to users in more heterogenous
> >environments than Google's... still, I'd have thought there was some
> >hope there)
>
> Our tools does not support split dwarf yet. Though we plan to implement it.
> When we would have support of split dwarf then it would be
> convenient to have easy way to share built debug binaries. llvm-dwp is the
> answer to this. DWARFLinker could probably be another answer.

Ah, fair enough - thanks for the context!

bd1976 llvm via llvm-dev

Jun 25, 2020, 10:38:58 PM
to David Blaikie, llvm...@lists.llvm.org
Your list covers most of what I have thought about.

I think an extension of the format which allows sections to be represented in a lightweight manner is something that ELF will *need* to adopt eventually to keep pace with the increasing numbers of sections being output by toolchains.

I have some experience with sub-sections via symbols. With sub-sections via symbols a way to tag on additional properties onto relocations would seem useful. For example, DWARF makes use of references that point "one byte past the end" of a region. When you're working with subsections-via-symbols it is useful to have such relocations tagged in some manner, otherwise they appear to be pointing at the next region, not the previous one.
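
The "one byte past the end" ambiguity can be illustrated with a short sketch. The `points_to_end` flag stands in for the kind of relocation tag described above; it is hypothetical and not part of any existing relocation format.

```python
from bisect import bisect_right
from typing import List


def owning_region(region_starts: List[int], offset: int,
                  points_to_end: bool = False) -> int:
    """Return the index of the region that a relocation target belongs to.

    region_starts must be sorted. Without the tag, an offset equal to a
    region's start is attributed to that region. With the tag, it is
    attributed to the previous region, whose end it marks.
    """
    index = bisect_right(region_starts, offset) - 1
    if points_to_end and index > 0 and region_starts[index] == offset:
        index -= 1
    return index


starts = [0x00, 0x10, 0x24]  # three regions split at symbol boundaries

untagged = owning_region(starts, 0x10)                    # looks like region 1
tagged = owning_region(starts, 0x10, points_to_end=True)  # really ends region 0
interior = owning_region(starts, 0x18, points_to_end=True)
```

If region 1 is garbage collected but region 0 is kept, only the tagged lookup keeps the end-of-range reference alive with the right region.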

My hope is that the community can take ownership of ELF and establish a process to improve ELF for all interested parties. I think it would encourage people to bring forward more proposals like Peter's: https://groups.google.com/d/topic/generic-abi/MPr8TVtnVn4/discussion if they know that if the proposal is accepted the official spec will be updated rather than it just remaining on the list. A good process should take into account the priorities of major vendors. If a major vendor is asking for changes because ELF is, for example, causing a performance problem, then that should be given due weight.

Alexey Lapshin via llvm-dev

Jun 26, 2020, 12:29:18 PM
to David Blaikie, llvm...@lists.llvm.org

Right, that is an unoptimized build with -ffunction-sections.

>> 2. DWARFLinker from lld = linking time for clang 72 sec, clang=760M.

>It does seem a tad strange that the clang binary would be smaller
>non-split with DWARF linking than it was split. Though I could imagine
>this might be possible in an optimized build (where debug_ranges
>become quite relatively expensive in the .o file contribution with
>Split DWARF)

>Could you compare the section sizes between these two clang binaries, perhaps?

.debug_ranges is three times bigger and .debug_line is twice bigger.

>> >> Thus if they would use this LLD feature in its current state
>> >> - they would still receive benefits.
>> >>
>> >> Speaking of performance results - LLD is a multi-thread linker;
>> >> it handles sections in parallel. DWARFLinker generates DWARF using
>> >> AsmPrinter which is a stream - so it could make resulting DWARF only
>> >> continuously. It is not surprising that the parallel solution works faster.
>> >> Making DWARFLinker truly multi-threaded would probably allow us
>> >> to make slowdown to be at 2x-4x range.
>> >
>> >*nod* that's still a really expensive link - but I understand that's a
>> >suitable tradeoff for your users
>> >
>>
>> Btw, 2x or 7x is for pure linking time. Overall compilation slowdown
>> is not so significant. Building LLVM codebase has only 20% slowdown.
>
>Understood - that's still quite significant to most users, I'd imagine.

I see.

It is not easy and would require some additions, but the benefit is
that the whole format implementation would be in one place, so a change
there would be reflected everywhere. There are currently at least three
implementations for .debug_ranges and .debug_aranges...

Agreed: if the work on this continues, it makes sense to
do it as a separate tool and make it fast enough. If there is interest
in it, it would then probably be possible to return to the idea of calling it from the linker.

>Also, out of curiosity - have you tried just compressing the output
>(-gz (I think that does the right thing for the linker level
>compression too, otherwise -Wl,-compress-debug-sections might do it))
>or are you already doing that in addition?

sure. we use -Wl,-compress-debug-sections.

Thank you, Alexey.

>> >You mentioned that the usability cost of
>> >Split DWARF for your users was too high (or high enough to justify
>> >this alternative work of DWARF-aware linking)? That all seems a bit
>> >surprising to me - though I understand the deployment issues of Split
>> >DWARF do present some challenges to users in more heterogenous
>> >environments than Google's... still, I'd have thought there was some
>> >hope there)
>>
>> Our tools does not support split dwarf yet. Though we plan to implement it.
>> When we would have support of split dwarf then it would be
>> convenient to have easy way to share built debug binaries. llvm-dwp is the
>> answer to this. DWARFLinker could probably be another answer.

>Ah, fair enough - thanks for the context!

> > >> >One way to do that would be to have a CU-local type indirection table.

David Blaikie via llvm-dev

Jul 28, 2020, 3:29:17 AM
to Alexey Lapshin, llvm...@lists.llvm.org

And this is without Split DWARF? Without linker DWARF compression? -
that seems quite a bit surprising, that the deduplication of DWARF
could fit into less space than the wasted/reclaimed space in ranges (&
line)?

Could you double check these numbers & provide a clearer summary?

Here's my attempt at numbers (all with function-sections+gc-sections)...

Split DWARF tests didn't seem meaningful - gc-debuginfo + split DWARF
seemed to drop all the debug info (except gdb_index) so wasn't
working/comparison wasn't meaningful for Apples to Apples, but
included it for comparing gc'd non-split to non-gc'd split (disabled
gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames) (which turns
on by default with Split DWARF because gdb needs it - but a bit of an
unfair comparison without turning on gnu-pubnames/gdb-index in other
build modes too, since it... /shouldn't/ be necessary) which might've
been a factor in the data you were looking at)

* -O0: (baseline, just using strip -g: 356 MB)
  * compressed: 25% smaller with gc-debuginfo (481 MB / 641 MB) (407 MB split/non-gc)
  * uncompressed: 30% smaller (820 MB / 1.2 GB) (566 MB split/non-gc)
* -O3: (baseline: 116 MB)
  * compressed: 16% smaller (361 MB / 462 MB) (283 MB split/non-gc)
  * uncompressed: 22% smaller (1022 MB / 1.2 GB) (156 MB split/non-gc)


On Fri, Jun 26, 2020 at 9:28 AM Alexey Lapshin

Alexey Lapshin via llvm-dev

Jul 28, 2020, 11:56:04 AM
to David Blaikie, Alexey Lapshin, llvm...@lists.llvm.org

that was without split dwarf, without linker compression.

>
> Could you double check these numbers & provide a clearer summary?

sure, I would re-check it.

>
> Here's my attempt at numbers (all with function-sections+gc-sections)...
>
> Split DWARF tests didn't seem meaningful - gc-debuginfo + split DWARF
> seemed to drop all the debug info (except gdb_index) so wasn't
> working/comparison wasn't meaningful for Apples to Apples, but
> included it for comparing gc'd non-split to non-gc'd split (disabled
> gnu-pubnames/gdb-index (-gsplit-dwarf -gno-gnu-pubnames) (which turns
> on by default with Split DWARF because gdb needs it - but a bit of an
> unfair comparison without turning on gnu-pubnames/gdb-index in other
> build modes too, since it... /shouldn't/ be necessary) which might've
> been a factor in the data you were looking at)

That might be the case, i.e. clang=997M for split DWARF (from my previous
measurement) might include gnu-pubnames.

I will recheck it, and if that is the case then it is an unfair comparison.


My point was that "DWARFLinker from lld" takes less space than a singleton
split DWARF file + .dwp file.

For -O0 uncompressed:

- .dwp took 1.1G (if I built it correctly), singleton clang (from your
  measurements) 566 MB, overall 1.6G.

- The "DWARFLinker from lld" 820 MB (from your measurements).


So "DWARFLinker from lld" looks two times better.
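
A small sketch of the arithmetic behind "two times better", using only the figures quoted above. Treating "1.1G" as roughly 1100 MB is an assumption, and the struct and function names are invented for this example:

```cpp
#include <cassert>

// Sizes in MB, taken from the figures quoted in this message.
// Assumption: "1.1G" is treated as roughly 1100 MB.
struct DebugInfoSizes {
  double dwpMB;         // .dwp package
  double splitBinaryMB; // executable built with split DWARF
  double gcBinaryMB;    // executable produced with --gc-debuginfo
};

// Total footprint of the split DWARF setup: executable plus .dwp.
double splitTotalMB(const DebugInfoSizes &S) {
  return S.dwpMB + S.splitBinaryMB;
}

// How many times larger the split DWARF total is than the gc'd binary.
double splitToGcRatio(const DebugInfoSizes &S) {
  assert(S.gcBinaryMB > 0 && "gc'd binary size must be positive");
  return splitTotalMB(S) / S.gcBinaryMB;
}
```

For DebugInfoSizes{1100, 566, 820} the total is 1666 MB and the ratio is a little over 2.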


Anyway, thank you for pointing out a possible mistake. I will recheck
it and update the results.


Alexey.

David Blaikie via llvm-dev

Jul 28, 2020, 12:29:08 PM
to Alexey Lapshin, llvm...@lists.llvm.org, Alexey Lapshin

Oh, yeah, even if there are some measurement issues, linked executable
+ .dwp is going to be larger than a linked executable using non-split
DWARF (in v5), since v5 uses all the same representations as non-split
DWARF, and split DWARF adds the indirection overhead of a split file,
etc.

Even without DWARF linking, it's true that split DWARF has overhead
(dwp+executable will be larger than executable non-split).

But maybe we've ended up down a bit of a tangent in any case.

Trying to bring this back to "should this be committed to lld" seems
valuable, and I'm not sure what the right criteria are for that.

Ray's the best person to weigh in on that. My 2c is that I think it
probably is worthwhile, even just as an experiment, assuming it's not
too intrusive to lld.

Alexey Lapshin via llvm-dev

Jul 31, 2020, 7:02:08 AM
to David Blaikie, llvm...@lists.llvm.org, Alexey Lapshin
I think it would be useful to do "removing obsolete debug info"
in the linker. First, it would be the fastest way (no need
to copy data, create temp files, or build an address map). Second,
it would be a good separation of concepts. All debug info
processing currently done in the linker (gdb_index, upcoming
debug_names) could be moved into a separate library for processing
debug info. When gdb_index/debug_names are built without
"removing obsolete debug info", it would have the same
performance results as it currently has.

We decided to give the idea of "removing obsolete debug info"
another try and are going to implement it as a separate utility
working on an already built binary. Making it multi-threaded would
probably show better performance results, and then it could
probably be considered acceptable to use from the linker.

Alexey.

Eric Christopher via llvm-dev

Jul 31, 2020, 4:17:17 PM
to Alexey Lapshin, llvm...@lists.llvm.org, Alexey Lapshin
Hi Alexey,

I'm quite interested in this direction. One thought I had was to incorporate such a library into dsymutil but with support for ELF. If you get a proposal written up I'd love to take a look and comment.

Thanks!

-eric

Alexey Lapshin via llvm-dev

Aug 3, 2020, 11:35:21 AM
to Eric Christopher, llvm...@lists.llvm.org, Alexey Lapshin

Hi Eric, please

Yes, I will share the proposal in a separate thread within a week or two.

In short: we decided to move in a slightly different direction than adding this functionality
to dsymutil. Though if there is a preference to implement it as part of dsymutil,
we are OK with doing it that way.

In its first version, this new utility is supposed to receive a built binary with debug info
as input (with the new marking for references to removed code sections, -1/-2,
https://reviews.llvm.org/D84825) and create a new binary with obsolete
debug info removed according to that marking. In later versions, it could be extended
with other debug info optimization tasks, e.g. generating new index tables, optimizing
debug info, etc.
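
A minimal sketch of what "removing according to the marking" could look like. It assumes that a reference to a removed code section shows up as an address equal to -1 or -2 in the target address size; the types and function names are hypothetical, not the DWARFLinker API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one address range read from debug info.
struct AddressRange {
  uint64_t LowPC;
  uint64_t HighPC;
};

// Assumption: a reference to a removed code section is marked with -1 or -2,
// i.e. the two largest values representable in the target address size.
bool isMarkedAsRemoved(uint64_t Address, unsigned AddressSize) {
  assert((AddressSize == 4 || AddressSize == 8) && "unsupported address size");
  const uint64_t MaxAddress = AddressSize == 8 ? UINT64_MAX : UINT32_MAX;
  return Address == MaxAddress || Address == MaxAddress - 1;
}

// Keep only the ranges whose start address is not marked as removed.
std::vector<AddressRange>
dropObsoleteRanges(const std::vector<AddressRange> &Ranges,
                   unsigned AddressSize) {
  std::vector<AddressRange> Live;
  for (const AddressRange &R : Ranges)
    if (!isMarkedAsRemoved(R.LowPC, AddressSize))
      Live.push_back(R);
  return Live;
}
```

The point of such a marking is that the tool does not need an address map from the linker: the marker value alone tells it which debug info entries describe removed code.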

We considered three options:

1. Add the new functionality to dsymutil, so that dsymutil behaves differently
    on a non-darwin platform and supports another set of command-line options.

2. Add the new functionality to llvm-objcopy. llvm-objcopy already supports various
    binary object formats (MachO, ELF, COFF, wasm). It also has several options
    to work with debug info.

3. Create a new utility, llvm-dwarfutil, which would implement the above functionality
    and reuse the DWARFLinker library (extracted from dsymutil) and a new library,
    ObjectCopy (extracted from llvm-objcopy).

So far our preference is number three. The reason is that a separate
utility working specifically with debug info looks like a good separation of concepts.
Adding another behavior to dsymutil does not look very good. Extending the already
rich interface of llvm-objcopy does not look very good either. Keeping in mind that the actual
implementation would be shared through libraries, a separate utility working specifically
with debug info looks like the right choice. That is our current idea.

I will publish the proposal shortly to discuss it.


Thank you, Alexey.

Eric Christopher via llvm-dev

Aug 5, 2020, 1:02:53 PM
to Alexey Lapshin, Jonas Devlieghere, Adrian Prantl, llvm...@lists.llvm.org, Alexey Lapshin
Hi Alexey,



Excellent, thanks :)
 
> Shortly: we decided to move in slightly other direction than adding this functionality
> into dsymutil. Though if there is a preference to implement it as part of dsymutil
> we are OK to do this way.


I have a vague preference since a lot of functionality already exists there on one platform and extending that seems straight forward, however...
 
> In its first version, this new utility supposed to receive built binary with debug info
> as input(with the new marking for references to removed code sections -1/-2
> -https://reviews.llvm.org/D84825) and create a new binary with removed obsolete
> debug info according to the above marking. In the next versions, it could be extended
> with other debug info optimizations tasks. F.e. generation new index tables, debug info
> optimizing... etc...
>
> We considered three options:
>
> 1. add new functionality into dsymutil. So that dsymutil behaves differently
>     on a non-darwin platform and supports another set of command-line options.
>
> 2. add new functionality into llvm-objcopy. llvm-objcopy already supports various
>      binary objects formats(MachO,ELF,COFF,wasm). It also has several options
>      to work with debug-info.
>
> 3. create new utility llvm-dwarfutil which would implement the above functionality
>      and reuse DWARFLinker(extracted from dsymutil) library and new library
>      ObjectCopy(extracted from llvm-objcopy).
>
> So far our preference is number three. The reason for this is that separate
> utility specifically working with debug info looks as good separation of concepts.
> Adding another behavior to dsymutil looks not very good. Extending the already
> rich interface of llvm-objcopy looks also not very good. Having in mind that actual
> implementation would be shared by libraries, the separate utility, working specifically
> with debug info, looks like the right choice. That is our current idea.
>
> I would publish the proposal shortly to discuss it.



These are solid arguments - in particular, I agree with not extending llvm-objcopy :)

+Jonas Devlieghere and +Adrian Prantl for dsymutil comments.

My personal thought would be that extending dsymutil should be ok as the functionality goes well with everything else dsymutil does (other than not support ELF which the dsymutil maintainers are on board with last I checked). That said, I definitely think a write-up will be helpful. No matter what I support extracting all of the behavior into libraries and using that somewhere :)

Thanks!

-eric

Jonas Devlieghere via llvm-dev

Aug 7, 2020, 11:36:26 AM
to Eric Christopher, llvm...@lists.llvm.org, Alexey Lapshin
Hi Alexey,

I should've looked at this earlier. I went through the thread again and I've
made some comments, mostly from the dsymutil point of view.

> Current DWARFEmitter/DWARFStreamer has an implementation for DWARF
> generation, which does not support DWARF5(only debug_names table). At the
> same time, there already exists code in CodeGen/AsmPrinter/DwarfDebug.h,
> which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer
> should be rewritten using DwarfDebug/DwarfFile. Though I am not sure
> whether it would be easy to re-use DwarfDebug/DwarfFile. It would probably
> be necessary to separate some intermediate level of DwarfDebug/DwarfFile.

These classes serve very different purposes. Last time I looked at them there
was very little overlap in functionality. In the compiler we're mostly
concerned with generating the DWARF, while in dsymutil we try to copy
everything we don't need to parse, and fix up what we have to. I don't want
to say it's not possible, but I think supporting DWARF5 in those classes is
going to be a lot less work than trying to reuse the CodeGen variants.

> Measurements show that it is spent ~10 sec in
> llvm::StringMapImpl::LookupBucketFor(). The problem is that the same
> strings, again and again, are added to the string pool. Two attributes
> having the same string value would be analyzed (hash calculated) and
> searched inside the string pool. Even if these strings are already in
> string table(DW_FORM_strp, DW_FORM_strx). The process could be optimized
> for string tables. So that if some string from the string table were
> accessed previously then, it would keep a reference into the string pool.
> This would eliminate a lot of string pool searches.

I'm not sure I fully understand the optimization, but I'd love to speed this
up, if only for dsymutil's sake. I'd love to talk about this in a separate
thread or offline.

> Currently, all object files are analyzed sequentially and cloned
> sequentially. Cloning is started in parallel with analyzing. That scheme
> could be changed: analyzing and cloning could be done in parallel for each
> object file. That requires refactoring of DWARFLinker and making string
> pools and DeclContextTree thread-safe.

I'm less familiar with the way that LLD uses the DWARFOptimizer but this is
not possible for dsymutil as it is trying to deduplicate DIEs from different
compile units.

> I think improving dsymutil is a valuable thing. Though there are several
> directions which might be considered to make it more robust:
>
> 1. support of latest DWARF - DWARF5/DWARF64...

Strong +1 on DWARF5. I haven't had the bandwidth yet to really look at this.
Right now we can't find (at least some) relocations so we bail out. I'd need
to fix that to assess the current state of things and figure out how much
work would be needed.

I don't think anything in LLVM supports generating DWARF64 though.

> 2. implement multi-threaded execution.

See my earlier comment. At least for the dsymutil case, the current approach
is the best we can do, but I'd love to be proven wrong. :-)

> 3. support of split DWARF.
> 4. implement dsymutil for non-darwin platform.

These two seem to go together. Given the work you did to split off the DWARF
optimization part I think we're closer to this than ever. Thanks again for
doing that.

> We considered three options:
>
> 1. add new functionality into dsymutil. So that dsymutil behaves
> differently on a non-darwin platform and supports another set of
> command-line options.
>
> 2. add new functionality into llvm-objcopy. llvm-objcopy already supports
> various binary objects formats(MachO,ELF,COFF,wasm). It also has several
> options to work with debug-info.
>
> 3. create new utility llvm-dwarfutil which would implement the above
> functionality and reuse DWARFLinker(extracted from dsymutil) library and
> new library ObjectCopy(extracted from llvm-objcopy).
>
> So far our preference is number three. The reason for this is that separate
> utility specifically working with debug info looks as good separation of
> concepts. Adding another behavior to dsymutil looks not very good.

In its current state dsymutil itself is a pretty small tool on top of the
DWARFOptimizer/Linker. I'm curious what the benefits of another tool are
compared to a different frontend (like objcopy) for MachO and ELF. It seems
like that would allow for separation of concerns, while still being able to
share common code without having to push it all the way up into LLVM.

> Extending the already rich interface of llvm-objcopy looks also not very
> good. Having in mind that actual implementation would be shared by
> libraries, the separate utility, working specifically with debug info,
> looks like the right choice. That is our current idea.

> My personal thought would be that extending dsymutil should be ok as the
> functionality goes well with everything else dsymutil does (other than not
> support ELF which the dsymutil maintainers are on board with last I
> checked). That said, I definitely think a write-up will be helpful. No
> matter what I support extracting all of the behavior into libraries and
> using that somewhere :)

Ha, so basically what I was trying to say above.

I look forward to seeing the proposal!

Cheers,
Jonas

Alexey Lapshin via llvm-dev

Aug 10, 2020, 12:21:50 PM
to Jonas Devlieghere, Eric Christopher, llvm...@lists.llvm.org, Alexey Lapshin

Hi Jonas,

Thank you for the comments, please find my answers below...

On 06.08.2020 20:39, Jonas Devlieghere wrote:
> Hi Alexey,
>
> I should've looked at this earlier. I went through the thread again and I've
> made some comments, mostly from the dsymutil point of view.
>
> > Current DWARFEmitter/DWARFStreamer has an implementation for DWARF
> > generation, which does not support DWARF5(only debug_names table). At the
> > same time, there already exists code in CodeGen/AsmPrinter/DwarfDebug.h,
> > which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer
> > should be rewritten using DwarfDebug/DwarfFile. Though I am not sure
> > whether it would be easy to re-use DwarfDebug/DwarfFile. It would probably
> > be necessary to separate some intermediate level of DwarfDebug/DwarfFile.
>
> These classes serve very different purposes. Last time I looked at them there
> was very little overlap in functionality. In the compiler we're mostly
> concerned with generating the DWARF, while in dsymutil we try to copy
> everything we don't need to parse, and fix up what we have to. I don't want
> to say it's not possible, but I think supporting DWARF5 in those classes is
> going to be a lot less work than trying to reuse the CodeGen variants.

I agree, in its current state it would be less work to write a separate implementation
than to reuse the CodeGen variants. The bad thing is that in such a case there is a lot of
code duplication:

DwarfStreamer::emitUnitRangesEntries
DwarfDebug::emitDebugARanges
EmitGenDwarfAranges
DWARFYAML::emitDebugAranges

Supporting a new standard would require rewriting/modifying all these places. In an ideal world,
having a single implementation for DWARF generation allows changing one place and getting
the benefit everywhere. Probably, the CodeGen classes could be rewritten, and then it would be useful
to write them with two use cases in mind: generation from scratch and copying/updating
existing data. In the end, there would be a single implementation which could be reused in
many places. Though, it is indeed a lot of work.



> > Measurements show that it is spent ~10 sec in
> > llvm::StringMapImpl::LookupBucketFor(). The problem is that the same
> > strings, again and again, are added to the string pool. Two attributes
> > having the same string value would be analyzed (hash calculated) and
> > searched inside the string pool. Even if these strings are already in
> > string table(DW_FORM_strp, DW_FORM_strx). The process could be optimized
> > for string tables. So that if some string from the string table were
> > accessed previously then, it would keep a reference into the string pool.
> > This would eliminate a lot of string pool searches.
>
> I'm not sure I fully understand the optimization, but I'd love to speed this
> up, if only for dsymutil's sake. I'd love to talk about this in a separate
> thread or offline.

The measurements show that quite a lot of time is taken
by llvm::StringMapImpl::LookupBucketFor(), i.e. searching inside a string
pool takes a significant amount of time. The idea of the optimization was to
reduce the number of string pool searches by remembering previous
results. The DW_FORM_strp and DW_FORM_strx forms do not keep the string itself
but reference a string in a separate table by offset or index. Currently, if there are
duplicated strings of DW_FORM_strp, DW_FORM_strx, there would be
two/three/... (one per duplicate) searches in the string pool
(llvm::StringMapImpl::LookupBucketFor() would be called each time). If the position
in the pool were remembered for the index of the first duplicate,
it would not be necessary to call llvm::StringMapImpl::LookupBucketFor() the next times.

But prototyping that idea did not show any worthwhile performance improvement.
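
A rough sketch of the caching idea described above, written against standard containers rather than the real LLVM string pool classes. StringPool and OffsetCache are invented names, and as noted the prototype did not bring a worthwhile speedup, so this only illustrates the mechanism:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the output string pool: every call hashes the
// string and searches the map.
class StringPool {
public:
  // Returns the index of Str in the pool, adding it if needed.
  size_t getOrInsert(const std::string &Str) {
    ++HashedLookups;
    auto Result = Index.emplace(Str, Strings.size());
    if (Result.second)
      Strings.push_back(Str);
    return Result.first->second;
  }
  size_t HashedLookups = 0; // number of string hash + search operations

private:
  std::unordered_map<std::string, size_t> Index;
  std::vector<std::string> Strings;
};

// Remembers, per input string-table offset, where the string landed in the
// pool, so a repeated offset skips the string hash and search.
class OffsetCache {
public:
  explicit OffsetCache(StringPool &P) : Pool(P) {}

  size_t lookup(uint64_t InputOffset, const std::string &Str) {
    auto It = Seen.find(InputOffset);
    if (It != Seen.end())
      return It->second; // same offset seen before: reuse the result
    size_t PoolIndex = Pool.getOrInsert(Str);
    assert(PoolIndex != static_cast<size_t>(-1) && "invalid pool index");
    Seen.emplace(InputOffset, PoolIndex);
    return PoolIndex;
  }

private:
  StringPool &Pool;
  std::unordered_map<uint64_t, size_t> Seen;
};
```

The cache still performs a map lookup, but on an integer key instead of a string, which is one plausible reason the gain was small.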

Some small performance improvement could be achieved if the string pools used
llvm::hash_value(StringRef S) instead of llvm::djbHash().


> > Currently, all object files are analyzed sequentially and cloned
> > sequentially. Cloning is started in parallel with analyzing. That scheme
> > could be changed: analyzing and cloning could be done in parallel for each
> > object file. That requires refactoring of DWARFLinker and making string
> > pools and DeclContextTree thread-safe.
>
> I'm less familiar with the way that LLD uses the DWARFOptimizer but this is
> not possible for dsymutil as it is trying to deduplicate DIEs from different
> compile units.

Right. dsymutil is trying to de-duplicate DIEs from different
compile units. That, probably, does not rule out a multi-threaded implementation:

1. DeclContextTree.getChildDeclContext() should be made thread-safe.
    Thus, even if CUs are processed in parallel, DIEs could be de-duplicated
    based on DeclContext.
2. UniquingStringPool and OffsetsStringPool should also be made thread-safe.
3. Since compilation units would be processed in parallel,
    the size of a compilation unit would not be known until it is fully processed.
    That means that all of the compilation unit's references should be patched after
    the CU content is generated, in the same manner as forward references
    are currently patched (fixupForwardReferences).
4. DWARFStreamer provides a sequential interface. Instead of a single stream
    as the output, a separate output could be generated for each CU.
    They would be glued together in the end.
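
The points above can be sketched roughly as follows. This is a toy model with standard library threads, not DWARFLinker code: SharedStringPool, InputUnit, OutputUnit, linkInParallel and glue are all hypothetical names, and one thread per unit is used only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical thread-safe string pool (point 2): one mutex guards the map.
class SharedStringPool {
public:
  size_t getOrInsert(const std::string &Str) {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Index.emplace(Str, Index.size()).first->second;
  }
  size_t size() {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Index.size();
  }

private:
  std::mutex Mutex;
  std::unordered_map<std::string, size_t> Index;
};

// Hypothetical input: the strings one compile unit refers to.
struct InputUnit {
  std::vector<std::string> Names;
};

// Output of one unit, produced independently of the others (point 4).
struct OutputUnit {
  std::vector<uint8_t> Bytes; // stand-in for the cloned unit contents
  uint64_t Offset = 0;        // known only after all units are done (point 3)
};

std::vector<OutputUnit> linkInParallel(const std::vector<InputUnit> &Inputs,
                                       SharedStringPool &Pool) {
  std::vector<OutputUnit> Outputs(Inputs.size());
  std::vector<std::thread> Workers;
  for (size_t I = 0; I < Inputs.size(); ++I)
    Workers.emplace_back([&Inputs, &Outputs, &Pool, I] {
      // Each worker writes only to its own OutputUnit.
      for (const std::string &Name : Inputs[I].Names)
        Outputs[I].Bytes.push_back(
            static_cast<uint8_t>(Pool.getOrInsert(Name)));
    });
  for (std::thread &W : Workers)
    W.join();

  // Unit sizes are final now, so offsets can be assigned in input order.
  uint64_t Offset = 0;
  for (OutputUnit &Out : Outputs) {
    Out.Offset = Offset;
    Offset += Out.Bytes.size();
  }
  return Outputs;
}

// Concatenates the per-unit outputs into one section.
std::vector<uint8_t> glue(const std::vector<OutputUnit> &Outputs) {
  std::vector<uint8_t> Section;
  for (const OutputUnit &Out : Outputs) {
    assert(Out.Offset == Section.size() && "offsets must match the layout");
    Section.insert(Section.end(), Out.Bytes.begin(), Out.Bytes.end());
  }
  return Section;
}
```

Any reference from one unit into another would have to be patched after the offset assignment step, since the target offset is not known while the units are still being generated.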

My concern is that this tool would have different source data and a different set of options.
Keeping in mind that handling a different set of input data and a different set of options
means writing another frontend, it would probably be good not to make dsymutil more complex but
to create another small tool. But if extending dsymutil looks OK, I am OK with it.
Let's discuss this approach within the proposal thread.



> > Extending the already rich interface of llvm-objcopy looks also not very
> > good. Having in mind that actual implementation would be shared by
> > libraries, the separate utility, working specifically with debug info,
> > looks like the right choice. That is our current idea.
>
> > My personal thought would be that extending dsymutil should be ok as the
> > functionality goes well with everything else dsymutil does (other than not
> > support ELF which the dsymutil maintainers are on board with last I
> > checked). That said, I definitely think a write-up will be helpful. No
> > matter what I support extracting all of the behavior into libraries and
> > using that somewhere :)
>
> Ha, so basically what I was trying to say above.
>
> I look forward to seeing the proposal!

Yep, will publish it soon.

Thank you, Alexey.

David Blaikie via llvm-dev

Aug 10, 2020, 1:35:13 PM
to Alexey Lapshin, llvm...@lists.llvm.org, Alexey Lapshin

Probably some opportunities to share some code, even if not the whole
generator - might be best to refactor those opportunistically, rather
than a wholesale "change DWARFLinker to use (all) of
lib/CodeGen/AsmPrinter/Dwarf*". Sort of like the approach that's been
taken with lldb's use of libDebugInfoDWARF - picking particular
features that have high overlap and refactoring them to be reusable
between the two different use cases.

Alexey Lapshin via llvm-dev

Aug 14, 2020, 12:50:19 PM
to David Blaikie, llvm...@lists.llvm.org, Alexey Lapshin
One of the problems that complicates such a refactoring is that the DWARF
generation classes are closely coupled with AsmPrinter:

llvm/CodeGen/DIE.h

class DIE* {
  void emitValue(const AsmPrinter *Asm, dwarf::Form Form) const;
  unsigned SizeOf(const AsmPrinter *AP, dwarf::Form Form) const;
}

Having access to all of AsmPrinter public data members
complicates DWARF generation:

void DIEInlineString::emitValue(const AsmPrinter *AP, dwarf::Form Form) const {
  if (Form == dwarf::DW_FORM_string) {
    AP->OutStreamer->emitBytes(S);
    AP->emitInt8(0);
    return;
  }
}

It would be good to do something similar to https://reviews.llvm.org/D76293.
I.e. avoid AsmPrinter dependence using an abstract interface
(llvm/DebugInfo/DWARF/DWARFDebugSection.h):

class DIE* {
  void emitValue(DwarfDebugSection *Dwarf, dwarf::Form Form) const;
  unsigned SizeOf(const DwarfDebugSection *Dwarf, dwarf::Form Form) const;
}

Such a separation could, for example, allow implementing
AsmPrinter::emitDwarfDIE(const DIE &Die)
in some general place (libDebugInfoDWARF), where it could then be reused by others
(without the need to link/use AsmPrinter).

https://reviews.llvm.org/D76293 was meant to be more general, though.
But we could probably start from that smaller change:
avoid the dependence of the DIE* classes on AsmPrinter.
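
To make the decoupling concrete, here is a self-contained toy version of the pattern. It does not use the real LLVM classes: ByteEmitter, BufferEmitter and InlineStringValue are invented for this sketch, and the actual interface proposed in D76293 may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal interface: only what a DIE value needs to write itself.
class ByteEmitter {
public:
  virtual ~ByteEmitter() = default;
  virtual void emitBytes(const std::string &Bytes) = 0;
  virtual void emitInt8(uint8_t Value) = 0;
};

// One possible implementation that writes to memory. An AsmPrinter-backed
// implementation could forward the same calls to its streamer instead.
class BufferEmitter : public ByteEmitter {
public:
  void emitBytes(const std::string &Bytes) override {
    Buffer.insert(Buffer.end(), Bytes.begin(), Bytes.end());
  }
  void emitInt8(uint8_t Value) override { Buffer.push_back(Value); }

  std::vector<uint8_t> Buffer;
};

// Stand-in for DIEInlineString that depends on the interface, not AsmPrinter.
class InlineStringValue {
public:
  explicit InlineStringValue(std::string Str) : S(std::move(Str)) {
    assert(S.find('\0') == std::string::npos &&
           "a null-terminated inline string cannot contain NUL");
  }

  void emitValue(ByteEmitter &Out) const {
    Out.emitBytes(S);
    Out.emitInt8(0); // terminating NUL of the inline string
  }

  unsigned sizeOf() const { return static_cast<unsigned>(S.size() + 1); }

private:
  std::string S;
};
```

With this shape, a tool that only needs to write DWARF bytes can supply its own emitter and never link against the assembly printer.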

Alexey.


>
>> Supporting new standard would require rewriting/modification of all these places. In the ideal world,
>> having single implementation for the DWARF generation allows changing one place and having
>> benefits in others. Probably, CodeGen classes could be rewritten and then it would be useful
>> to write them assuming two use cases - generation from the scratch and copying/updating
>> existing data. In the end, there would be single implementation which could be reused in
>> many places. Though, it is indeed a lot of work.
>>
>>
>>

David Blaikie via llvm-dev

Aug 14, 2020, 1:06:53 PM
to Alexey Lapshin, llvm...@lists.llvm.org, Alexey Lapshin

As mentioned in https://reviews.llvm.org/D76293#1928139 - probably
best not to put this into libDebugInfoDWARF - that library is
currently for DWARF parsing & LLVM proper only really needs DWARF
emission, so bundling them together may confuse things a bit - in
terms of adding unnecessary dependencies, conflating/confusing the
goals/priorities of the different libraries, etc. (see similar
separations between reading and writing with things like libIR
containing IR asm writing, but libAsmParser containing the parsing
code).

> and then be reused by others
> (without necessity to link/use AsmPrinter).
>
> https://reviews.llvm.org/D76293 was though to be more general.
> But we could probably start from that smaller change:
> avoid dependence of DIE* classes on AsmPrinter.

Sure - that'd be the general idea as we discussed in D76293 & in this
thread: start small, share some pieces & build up the necessary
abstractions (streaming APIs, etc) over the two different use cases,
etc.

Alexey via llvm-dev

Aug 14, 2020, 1:30:14 PM
to David Blaikie, llvm...@lists.llvm.org, Alexey Lapshin
Agreed. It would be better to have a separate
libDebugInfoGenDWARF for that.


>
>> and then be reused by others
>> (without necessity to link/use AsmPrinter).
>>
>> https://reviews.llvm.org/D76293 was though to be more general.
>> But we could probably start from that smaller change:
>> avoid dependence of DIE* classes on AsmPrinter.
> Sure - that'd be the general idea as we discussed in D76293 & in this
> thread: start small, share some pieces & build up the necessary
> abstractions (streaming APIs, etc) over the two different use cases,
> etc.

yep.
