[llvm-dev] [LLD] Support DWARF64, debug_info "sorting"

Alexander Yermolovich via llvm-dev

Nov 11, 2020, 12:19:03 AM
to llvm...@lists.llvm.org
This year Igor Kudrin put a lot of work into enabling DWARF64 support in LLVM. At Facebook we are looking into it as one of the options for handling debug information over 4 GiB in a production environment. One concern is that, due to the mix of third-party libraries and LLVM-compiled code, the final library/binary will have a mix of CUs that are DWARF32 and DWARF64. This is supported by the DWARF format. With this mix it is possible that, even with DWARF64 enabled, one can still encounter relocation overflow errors in LLD if DWARF32 sections happen to be processed towards the end.

One proposal that was discussed in https://reviews.llvm.org/D87011 is to modify LLD to arrange debug_info sections so that DWARF32 sections come first and DWARF64 sections after them. This way, as long as the DWARF32 sections don't themselves go over 4 GiB (which I think will be the common case), the final binary can contain debug information that exceeds 4 GiB.
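To make the ordering argument concrete, here is a minimal sketch, not LLD code. It assumes each input contribution to an output debug section is modelled as a (name, format, size) tuple, and that a DWARF32 contribution is at risk once its end offset in the output section passes 4 GiB. The names `dwarf32_first` and `overflowing_dwarf32` are made up for illustration.

```python
LIMIT_32 = 1 << 32  # offsets at or above 4 GiB do not fit in a 32-bit field


def dwarf32_first(contributions):
    """Return contributions with DWARF32 ones first.

    sorted() is stable, so the relative input order is preserved
    within the DWARF32 group and within the DWARF64 group.
    """
    return sorted(contributions, key=lambda c: c[1] == "DWARF64")


def overflowing_dwarf32(contributions):
    """Names of DWARF32 contributions that end past the 4 GiB mark."""
    bad = []
    offset = 0
    for name, fmt, size in contributions:
        end = offset + size
        if fmt == "DWARF32" and end > LIMIT_32:
            bad.append(name)
        offset = end
    return bad


GIB = 1 << 30
inputs = [("a.o", "DWARF64", 3 * GIB), ("b.o", "DWARF32", 2 * GIB)]
print(overflowing_dwarf32(inputs))                 # input order
print(overflowing_dwarf32(dwarf32_first(inputs)))  # DWARF32 first
```

In input order the DWARF32 contribution from b.o lands between 3 GiB and 5 GiB; after sorting it occupies the first 2 GiB and only the DWARF64 contribution crosses the 4 GiB boundary.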

An alternative approach, proposed by James Henderson, is for the build system to take care of it and to use -u to enforce the order.
I would imagine most projects of scale use a configurable build system that pulls in all the various dependencies automatically in a multi-language environment. I think the alternative approach will be more fragile than modifying LLD, as it relies on a more complex system, and each customer of LLD would have to implement this "sorting" in their own build system. The use of -u also kind of abuses this flag and might have unintended consequences, as was pointed out by Wen Lei.
From an overhead perspective, we only need to read a few bytes of DWARF to determine whether it is 32-bit or 64-bit. Customers who need DWARF64 already accept the overhead that it entails.

Any thoughts?

Thank You
Alex

Eric Christopher via llvm-dev

Nov 11, 2020, 12:30:17 AM
to Alexander Yermolovich, Fangrui Song, llvm...@lists.llvm.org
On Wed, Nov 11, 2020 at 12:19 AM Alexander Yermolovich via llvm-dev <llvm...@lists.llvm.org> wrote:
> This year Igor Kudrin put in a lot of work in enabling DWARF64 support in LLVM. At Facebook we are looking into it as one of the options for handling debug information over 4gigs in production environment. One concern is that due to mix of third party libraries and llvm compiled code the final library/binary will have a mix of CU that are DWARF32/64. This is supported by DWARF format. With this mix it is possible that even with DWARF64 enabled one can still encounter relocation overflows errors in LLD if DWARF32 sections happen to be processed towards the end.
>
> One proposal that was discussed in https://reviews.llvm.org/D87011, is to modify LLD linker to arrange debug_info sections so that DWARF32 comes first, and DWARF64 after them. This way as long as DWARF32 sections don't themselves go over 4gigs, the final binary can contain debug information that exceeds 4gig. Which I think will be the common case.
>
> An alternative approach that was proposed by James Henderson is for build system to take care of it, and to use -u to enforce order.

+Fangrui Song here for thread visibility

Of these two approaches I think that the linker sorting is probably the one I'd go with for the reasons you list below - I'm particularly sympathetic to not wanting the unintended consequences of using -u here :)

I do worry about slowing down general debug links so a "debug info sorting" option may make sense, or it may not be worth it after measuring the speed difference.

Thanks for bringing this up on the list! :)

-eric
 

> As, I would imagine, most projects of scale are using configurable build system that pulls in all the various dependencies automatically in a multi-language environment. I think the alternative approach will be more fragile than modifying LLD as it relies on a more complex system, and each customer of LLD will have to implement this "sorting" in their own build systems. The use of -u also kind of abuses this flag, and might have unintended consequences. As was pointed out by Wen Lei.
> From overhead perspective we only need to access few bytes of DWARF to determine if it's 32 or 64 bits. Customers who need DWARF64, already accept the overhead that it entails.


David Blaikie via llvm-dev

Nov 11, 2020, 12:41:39 AM
to Eric Christopher, James Henderson, llvm...@lists.llvm.org
+James for context too (always good to include the folks from the
original threads for continuity)

Yeah, my general attitude there was twofold. One, the discussion had
strayed fairly far from the review (so interested parties might not
see it, both because it's a targeted review thread on the noisy
llvm-commits, and because of the title not having much connection to
the discussion). Two, it seemed to be somewhat abstract/general - and
there's a balance there. "We should do this because I need it" (we
shouldn't be implementing features for especially niche use cases/if
they don't generalize) isn't always a compelling motivation but "we
should do this because someone might need it" isn't either (we
shouldn't be implementing features that have no users).

The major drawback in sorting is the need to parse DWARF, even a
little bit of it: only the first 4 bytes of a section to tell which
version it is, or the first 12 if you want to be able to jump over
contributions and check /all/ contributions coming from a given input
object file (it might contain a combination of DWARFv4 and DWARFv5).
Then there's the hairy uncertainty of which sections to check. Do you
check them all? Well, all the ones with length prefixes that
communicate DWARF32/64 - some sections don't
(debug_ranges/loc/str/macro for instance, if I recall correctly)...
And if something has some 4 and 5, does it get sorted to the start? I
guess so.
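
A rough sketch of the lightweight parse described above, not LLD code. It assumes the section is available as raw, uncompressed, little-endian bytes and that it is one of the sections whose contributions start with an initial length; the name `unit_formats` is made up for illustration. It reads the 4-byte initial length of each contribution, treats 0xffffffff as the DWARF64 escape followed by an 8-byte length (12 bytes in total), and uses the length to jump to the next contribution. What it reports is the 32/64-bit format, not the DWARF version.

```python
import struct

DWARF64_ESCAPE = 0xFFFFFFFF   # initial length value that introduces DWARF64
RESERVED_START = 0xFFFFFFF0   # 0xfffffff0..0xfffffffe are reserved by DWARF


def unit_formats(data):
    """Return [(offset, "DWARF32" | "DWARF64"), ...] for each contribution."""
    formats = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated initial length at offset %d" % pos)
        (length32,) = struct.unpack_from("<I", data, pos)
        if length32 == DWARF64_ESCAPE:
            if pos + 12 > len(data):
                raise ValueError("truncated DWARF64 length at offset %d" % pos)
            (length64,) = struct.unpack_from("<Q", data, pos + 4)
            formats.append((pos, "DWARF64"))
            pos += 12 + length64  # 4-byte escape + 8-byte length + body
        elif length32 >= RESERVED_START:
            raise ValueError("reserved initial length at offset %d" % pos)
        else:
            formats.append((pos, "DWARF32"))
            pos += 4 + length32   # 4-byte length + body
    return formats


# One DWARF32 contribution (3-byte body) followed by one DWARF64
# contribution (2-byte body); the bodies are dummy bytes.
section = (struct.pack("<I", 3) + b"\0" * 3
           + struct.pack("<IQ", DWARF64_ESCAPE, 2) + b"\0" * 2)
print(unit_formats(section))
```

An input section would be treated as mixed when the returned list contains both formats.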

Igor Kudrin via llvm-dev

Nov 11, 2020, 3:50:17 AM
to David Blaikie, Eric Christopher, James Henderson, Alexander Yermolovich, Fangrui Song, llvm...@lists.llvm.org
Thanks Alexander for bringing this up!

> The major drawback in sorting, is the need to parse DWARF, even a little bit
> of it (only the first 4 bytes of a section to tell which version it is - first 12 if you
> want to be able to jump over contributions and check /all/ contributions
> coming from a given input object file (it might contain a combination of
> DWARFv4 and DWARFv5) and then the hairy uncertainty of which sections to
> check (do you check them all? well, all the ones with length prefixes that
> communicate DWARF32/64 - some sections don't
> (debug_ranges/loc/str/macro for instance, if I recall correctly)...
> and if something has some 4 and 5, does it get sorted to the start? I guess so.

Parsing the sections is one possible approach. Another way, maybe an even more "linkish" one, would be to check the relocations that point to the input sections. If all of them are 64-bit, the section can be placed after the ones targeted by 32-bit relocations. Checking relocations is probably slower, but the approach may be generalized to other situations, if necessary.

> I do worry about slowing down general debug links so a "debug info
> sorting" option may make sense, or it may not be worth it after measuring
> the speed difference.

The idea is to do nothing if the size of each output debug section is less than 4GiB. Thus, a noticeable linking speed degradation will exist only for projects which would most probably just fail to link otherwise.
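
As a small illustration of that gate (a sketch only, with made-up names; the mapping from output section name to input contribution sizes is an assumed data model, not an LLD structure):

```python
FOUR_GIB = 1 << 32


def sections_needing_sort(input_sizes_by_output):
    """Output debug sections whose total size reaches 4 GiB.

    Sections below the threshold are left in input order, so ordinary
    links pay only for summing the sizes.
    """
    return sorted(
        name
        for name, sizes in input_sizes_by_output.items()
        if sum(sizes) >= FOUR_GIB
    )


sizes = {
    ".debug_info": [3 << 30, 2 << 30],  # 5 GiB in total
    ".debug_line": [1 << 20],           # 1 MiB
}
print(sections_needing_sort(sizes))
```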

Best Regards,
Igor Kudrin
C++ Developer, Access Softek, Inc.

James Henderson via llvm-dev

Nov 11, 2020, 3:55:23 AM
to David Blaikie, llvm...@lists.llvm.org
I assume this comment is meant to say DWARF32/DWARF64, not DWARFv4 and DWARFv5, as the DWARF version (as opposed to the 32/64 bit style) is irrelevant to this, I believe, at least for the current known DWARF standards. Whilst the majority of objects will only have a single CU in them, there will be exceptions (LTO-generated objects, -r merged objects etc), so we do need to consider this approach. Mixtures would certainly be possible, and there's no guarantee the CUs would be in a nice order with 32-bit blocks before 64-bit blocks. If I follow this to its full conclusion, you could potentially end up with a single .debug_info (.debug_line, .debug_rnglists etc) input section with a mixture of DWARF32/DWARF64 sub-sections, which, if following the reordering approach, the linker might have to split up internally in order to rearrange (aside - there's some interesting crossover with ideas I've been considering regarding the Fragmented DWARF topic discussed elsewhere). Maybe the solution here would be to change producers to produce separate .debug_info sections containing DWARF32 and DWARF64. This would require other tools, like llvm-dwarfdump, to be updated too to handle multiple input .debug_info sections.

I used the -u option more as an example that it might be possible to get things to work the way we want without needing to have the linker do the work. The linker currently has a --symbol-ordering-file option which can be used to request an order for the specified list of symbols. The linker does this by rearranging the input sections to get as close as it can to the requested order. We could maybe implement the same on a file/section basis. It would avoid needing to read the sections themselves, but doesn't solve the "what to do about mixed single input" case directly (though might allow the user to dodge the decision at least).

Other ideas I had involved changing the section header properties. Currently DWARF sections are all SHT_PROGBITS, but we could change that to e.g. SHT_DWARF_32 or similar, and/or use the sh_info field to contain a value that would indicate the 32/64 bit nature. I'm not convinced by these ideas though, as a) I don't know if it translates well to other non-ELF formats, and b) we can't really control the producers of DWARF at this stage to conform.

It would be nice if there was a solution that could be consistently applied across all build systems, linkers and DWARF producers. I don't have one as yet though.

James Henderson via llvm-dev

Nov 11, 2020, 3:57:26 AM
to Igor Kudrin, llvm...@lists.llvm.org
(Igor - I don't know what happened, but your email split the mail thread in gmail for me.)

David Blaikie via llvm-dev

Nov 11, 2020, 12:46:40 PM
to James Henderson, llvm...@lists.llvm.org

Yep! thanks for the correction - had a lot of DWARFv4/v5 on my mind
due to other work, so got the terms jumbled up.

> Whilst the majority of objects will only have a single CU in them, there will be exceptions (LTO-generated objects, -r merged objects etc), so we do need to consider this approach. Mixtures would certainly be possible, and there's no guarantee the CUs would be in a nice order with 32-bit blocks before 64-bit blocks. If I follow this to its full conclusion, you could potentially end up with a single .debug_info (.debug_line, .debug_rnglists etc) input section with a mixture of DWARF32/DWARF64 sub-sections, which, if following the reordering approach, the linker might have to split up internally in order to rearrange (aside - there's some interesting crossover with ideas I've been considering regarding the Fragmented DWARF topic discussed elsewhere).

I think given this is a pragmatic feature I'd be inclined to say "eh,
sort any input object containing at least one DWARF32 contribution
before input objects not containing any DWARF32 contribution" - if that
doesn't solve some real world issues/situations, I'd be willing to
revisit this direction/consider more invasive/expensive solutions.

Though, as Eric said - some of this conversation might be better had
in terms of concrete patches with concrete performance measurements.

> Maybe the solution here would be to change producers to produce separate .debug_info sections containing DWARF32 and DWARF64.

That'd involve changing how certain objects were generated - if that's
possible, then I assume it'd be possible to change that generation to
use DWARF64 anyway - in the limit: one might have precompiled binaries
with debug info that one cannot recompile, so any new format options I
doubt are able to address the original/likely use case for this
functionality.

Robinson, Paul via llvm-dev

Nov 11, 2020, 1:22:04 PM
to David Blaikie, James Henderson, llvm...@lists.llvm.org
I was under the impression that *object* order meant a lot to people,
and changing that would have all sorts of unpleasant fallout. If I'm
remembering that correctly, sorting DWARF sections really should be its
own thing, separate from object order. Shoving DWARF-64 sections to
the end of the line seems like it would be less problematic than
reordering entire objects, if the linker can handle that in some
reasonably efficient way.
--paulr

Fangrui Song via llvm-dev

Nov 11, 2020, 5:59:42 PM
to Robinson, Paul, James Henderson, llvm...@lists.llvm.org
(Adding back Cc: which got dropped)

> (Igor - I don't know what happened, but your email split the mail thread in gmail for me.)

The problem is that https://lists.llvm.org/pipermail/llvm-dev/2020-November/146528.html does not have an In-Reply-To: header.
Added Igor to the Cc: list.

If we go down this route (sorting DWARF64 after DWARF32), compared with a
lightweight parse, I'd prefer the relocation based approach: check whether
a .debug_* section has a 64-bit absolute relocation type (e.g. R_X86_64_64).

In LLD, for an input section, we don't know its associated SHT_REL[A] section.
So when adding an orphan section we would have another loop iterating
over inputSections. We can reuse the dependentSections to have this
piece of information (generalizing the existing special case for -r/--emit-relocs).

> This way as long as DWARF32 sections don't themselves go over 4gigs, the final binary can contain debug information that exceeds 4gig.
> Which I think will be the common case.

I would not expect linking a few additional sections to change the linker's
behavior so drastically, in a way that is not easily explainable. This deserves
a dedicated linker option (see below; I have a concern about the inconsistency
with an input section description).

I'm still learning the internals but would expect that mixed DWARF32/DWARF64 is
a problem for LTO. A relocatable link (-r) can combine DWARF32/DWARF64 object
files and potentially nullify the aforementioned relocation based approach
(we probably just want to check the first relocation to save time;
if we link DWARF64 before DWARF32 we may create a .debug_info
which looks like DWARF64 but is actually restricted by DWARF32 relocations).

>> I think given this is a pragmatic feature I'd be inclined to say "eh,
>> sort any input object containing at least one DWARF32 contribution
>> before input objects not containing any DWARF32 contribution" - if that
>> doesn't solve some real world issues/situations, I'd be willing to
>> revisit this direction/consider more invasive/expensive solutions.
>
>I was under the impression that *object* order meant a lot to people,
>and changing that would have all sorts of unpleasant fallout. If I'm
>remembering that correctly, sorting DWARF sections really should be its
>own thing, separate from object order. Shoving DWARF-64 sections to
>the end of the line seems like it would be less problematic than
>reordering entire objects, if the linker can handle that in some
>reasonably efficient way.
>--paulr

This behavior does add some inconsistency to the system:

For an output section description .debug_info 0 : { *(.debug_info) } ,
should the linker sort DWARF32 and DWARF64 components? If it does, the behavior
will be inconsistent with other input section descriptions *(foo).

If there is a magic keyword, say, SORT_BY_MAGIC_DEBUG, and the internal
linker script does something similar to

*(SORT_BY_MAGIC_DEBUG(.debug_info))

then the system is still consistent.

>>
>> Though, as Eric said - some of this conversation might be better had
>> in terms of concrete patches with concrete performance measurements.
>>
>> > Maybe the solution here would be to change producers to produce separate
>> .debug_info sections containing DWARF32 and DWARF64.
>>
>> That'd involve changing how certain objects were generated - if that's
>> possible, then I assume it'd be possible to change that generation to
>> use DWARF64 anyway - in the limit: one might have precompiled binaries
>> with debug info that one cannot recompile, so any new format options I
>> doubt are able to address the original/likely use case for this
>> functionality.
>>
>> > I used the -u option more as an example that it might be possible to get
>> things to work the way we want without needing to have the linker do the
>> work. The linker currently has a --symbol-ordering-file option which can
>> be used to request an order for the specified list of symbols. The linker
>> does this by rearranging the input sections to get as close as it can to
>> the requested order. We could maybe implement the same on a file/section
>> basis. It would avoid needing to read the sections themselves, but doesn't
>> solve the "what to do about mixed single input" case directly (though
>> might allow the user to dodge the decision at least).

Yeah, --symbol-ordering-file applies on both global and local symbols.
Unfortunately no symbols are defined relative to .debug_* sections
(if we don't consider the STT_SECTION symbols, which cannot be used
anyway because .debug_* do not have unique names).

(The usage of -u still requires the user to add archives (they want to
change order) before other object files. In LLD this requires https://reviews.llvm.org/D81052 )

>> > Other ideas I had involved changing the section header properties.
>> Currently DWARF sections are all SHT_PROGBITS, but we could change that to
>> e.g. SHT_DWARF_32 or similar, and/or use the sh_info field to contain a
>> value that would indicate the 32/64 bit nature. I'm not convinced by these
>> ideas though, as a) I don't know if it translates well to other non-ELF
>> formats, and b) we can't really control the producers of DWARF at this
>> stage to conform.

Inventing a new section type is not bad at a first glance. Leveraging it
can remove the inconsistency in the system as well.
Unfortunately linker scripts (as implemented by GNU ld and emulated by LLD)
don't provide a way to match input sections by section type.

If we are going to have many thoughts on the linker side design, it might
be worth asking on https://groups.google.com/g/generic-abi as well.
That would have to be a separate discussion because the list is moderated
and users who haven't joined the group cannot reply there. If there are
opinions, we can share them with llvm-dev.

Alexander Yermolovich via llvm-dev

Nov 11, 2020, 7:52:42 PM
to Fangrui Song, Robinson, Paul, James Henderson, llvm...@lists.llvm.org
Thanks for the feedback.

I agree that with a patch and numbers this will be a more concrete discussion, but I wanted to gauge the overall receptiveness to this approach and see whether there was a better way.

"Whilst the majority of objects will only have a single CU in them, there will be exceptions (LTO-generated objects, -r merged objects etc), so we do need to consider this approach."
David, can you elaborate on the conditions under which LTO-generated objects will have a mix of DWARF32/64 in the same .debug_info? Looking at how DWARF64 was implemented, the same flag will be used for the entirety of the DWARF output, even if multiple CUs are included.

I think if an object does have a mix of CUs that are 32/64, the linker can do a best-effort ordering and output a warning. My approach here is to cover the common cases while solving the problem of relocation overflows in large libraries/binaries.


@Fangrui Song
That's a good point about relocations. Although, is it always guaranteed that the first one will be representative of the entire relocation record?
For debug_info, even with DWARF32 there can be 64-bit relocations:
0000000000000c57  0000001800000001 R_X86_64_64            0000000000000000 .text._"some_mangeled_name" + 0

On one hand, since this is only applicable when DWARF64 is used, a special option would be the way to go, although the user will need to be aware of yet another LLD option. Maybe the error emitted when relocation overflows occur can be modified to mention this option along with -fdebug-types-section.

Thank You
Alex


David Blaikie via llvm-dev

Nov 11, 2020, 8:47:24 PM
to Fangrui Song, llvm...@lists.llvm.org

I'm not sure/don't think this rises to that level - if a user is able
to regenerate their object files with some new object
feature/flag/attribute/etc, then they are probably able to generate
them with DWARF64. So this seems more about a linker doing something
that might help users who have DWARF32 baked into some precompiled
objects/libraries/things where they otherwise can't change the way it's
built. So it seems to me it's more a linker-doing-something-nice than
linker/object files defining a new mode of interaction.

- Dave

Fangrui Song via llvm-dev

Nov 11, 2020, 9:10:34 PM
to Alexander Yermolovich, llvm...@lists.llvm.org
On 2020-11-12, Alexander Yermolovich wrote:
>That's a good point with relocations. Although is it always a guarantee a first one will be representative of entire relocation record?
>For debug_info even with DWARF32 there can be 64bit relocations.
>0000000000000c57 0000001800000001 R_X86_64_64 0000000000000000 .text._"some_mangeled_name" + 0

It may be weaker than "guaranteed": working in practice.

Let's look at sections that reference these large .debug_* sections (.debug_info, .debug_str, .debug_loclists, .debug_rnglists, ...):

* .debug_info: the first relocation references .debug_abbrev, good indicator
* .debug_names references .debug_info: the first relocation (CU offset) is a good indicator
* .debug_aranges references .debug_info: the first relocation (debug_info_offset) is a good indicator
* .debug_str_offsets references .debug_str: the first relocation (.debug_str offset) is a good indicator
* ...

So checking the first relocation is probably sufficient. Even if we miss
something, we can adjust the heuristic, or rather let the compiler generate an
artificial relocation (R_*_NONE), which will always work.
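
A minimal sketch of that first-relocation heuristic for x86-64, not the LLD implementation. It assumes the relocations for a .debug_* input section are given in section order as (r_offset, r_info) pairs from an ELF64 SHT_RELA section; the name `format_from_first_relocation` is made up for illustration. The relocation type is the low 32 bits of r_info, and R_X86_64_64 = 1 and R_X86_64_32 = 10 are the psABI values.

```python
R_X86_64_64 = 1   # 64-bit absolute relocation
R_X86_64_32 = 10  # 32-bit absolute relocation


def reloc_type(r_info):
    """ELF64: the low 32 bits of r_info hold the relocation type."""
    return r_info & 0xFFFFFFFF


def format_from_first_relocation(relocs):
    """Guess the DWARF format from the first relocation only.

    Later relocations are ignored on purpose: a DWARF32 .debug_info can
    still carry R_X86_64_64 relocations for code addresses.
    """
    if not relocs:
        return None
    first_type = reloc_type(relocs[0][1])
    if first_type == R_X86_64_64:
        return "DWARF64"
    if first_type == R_X86_64_32:
        return "DWARF32"
    return None  # unknown: leave the section where it is


# DWARF32 .debug_info: a 32-bit abbrev-offset relocation first (symbol
# index 2 is arbitrary), then the 64-bit relocation against .text
# quoted earlier in the thread.
dwarf32_relocs = [(0x6, (2 << 32) | R_X86_64_32),
                  (0xC57, 0x0000001800000001)]
print(format_from_first_relocation(dwarf32_relocs))
```

The second entry decodes to R_X86_64_64, yet the section is still classified as DWARF32 because only the first relocation is consulted.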

>On one hand since this is only applicable for when DWARF64 is used, special option would be the way to go. Although the user will need to be aware of yet another LLD option. Maybe an error when relocations overflow occur can be modified to display this option along with -fdebug-types-section

I forgot to mention another drawback with .debug_* parsing. In the
presence of compressed debugging information, currently we uncompress
.debug_* on demand. We usually do it when writing the content of the
output section, which means we can potentially discard the uncompressed
buffers once we are done processing one output section and move on
to the next. This trick can potentially save peak memory usage.

However, if we do .debug_* parsing (to decide ordering among DWARF32/DWARF64),
we either cache the result (lose the trick) or end up uncompressing twice.
Neither is good.

I am quite happy with the relocation approach under a linker option. I'd still
want to know the generic-abi folks' thoughts, though. James may have prepared something
he wants to share with generic-abi :) Let's wait...

James Henderson via llvm-dev

Nov 12, 2020, 5:21:00 AM
to Fangrui Song, llvm...@lists.llvm.org
On Thu, 12 Nov 2020 at 02:10, Fangrui Song <mas...@google.com> wrote:
On 2020-11-12, Alexander Yermolovich wrote:
>"Whilst the majority of objects will only have a single CU in them, there will be exceptions (LTO-generated objects, -r merged objects etc), so we do need to consider this approach."
>David can you elaborate under which conditions LTO-generated objects will have a mix of DWARF32/64 in same .debug_info? Looking at how dwarf64 was implemented same flag will be used for the entirety of the dwarf output, even if multiple CUs are included.

Thinking about it, I wouldn't expect an LTO generated object itself to have a mixture of DWARF32/64, although I guess the 32/64 bit state could be encoded in the IR (I am not familiar enough with it to know if it actually is or not). It might be necessary to find ways to configure LTO to generate DWARF64, possibly via a link-time option.

>
>On one hand since this is only applicable for when DWARF64 is used, special option would be the way to go. Although the user will need to be aware of yet another LLD option. Maybe an error when relocations overflow occur can be modified to display this option along with -fdebug-types-section

> I am quite happy with the relocation approach under a linker option. I'd still
> want to know generic-abi folks's thoughts, though. James may have prepared something
> he wants to share with generic-abi:) Let's wait...

I hadn't prepared anything if I'm honest (though if there's widespread agreement that this would be useful, I certainly can - it would have other positive improvements too, reducing the need for tools to rely on section names to identify debug data for example). It was more a case of bouncing ideas off of people to see what they thought. Any discussion we have will probably also need circulating on the DWARF mailing list too, since it is more a DWARF issue than a gABI issue (unless the solution is a new section type). Further refinements to this idea that might make it more appealing to the generic group: `SHT_DEBUG` for the section type name, with the first N bytes of the sh_info used to specify the variant of debug data it represents (e.g. 0x1 for DWARF, 0x2 for SOME_OTHER_STANDARD etc), and the remainder for use as flags as defined by the standard (I'm thinking for DWARF you could encode the 64-bit/32-bit state in there, possibly the section variant (info/rnglists/line etc) and the DWARF version too), on the understanding that consumers like the linker wouldn't combine sections in a potentially broken way. This has the advantage that it could be retrofitted to the existing standard versions, but as has been pointed out, this won't help those with linker scripts - that could only be solved with a new DWARF standard and separate names for 64/32 bit sections, at least if we wanted to avoid the linker needing to do anything beyond reading the section header.

The relocation approach sounds like a reasonable solution for the current situation - even if we do decide to go the route of changing producers to start emitting a new section type/update the standard etc, it doesn't resolve the problem people may currently face.

James Henderson via llvm-dev

Nov 12, 2020, 5:24:51 AM
to Robinson, Paul, llvm...@lists.llvm.org
Object order means quite a lot, but it usually is only important for the loadable data, as it has cache implications. This isn't an issue for debug data, as far as I understand it. Object order also has a number of other effects like what to do with COMDATs, weak symbol resolution, library inputs etc, but these are all link-time behaviour things, and once the right decisions (e.g. which input contributions to use) have been made, the linker could reorder the debug data as it wishes.

Alexander Yermolovich via llvm-dev

Nov 12, 2020, 7:44:03 PM
to Fangrui Song, jh737...@my.bristol.ac.uk, llvm...@lists.llvm.org
Looks like there is agreement that this path, modifying LLD to order sections using relocations, should be explored.
If Igor doesn't object, since he has been the primary one driving DWARF64 so far, I would like to take a shot at implementing it and collecting some performance numbers. 🙂

Alex



Igor Kudrin via llvm-dev

Nov 12, 2020, 10:11:51 PM
to Alexander Yermolovich, Fangrui Song, jh737...@my.bristol.ac.uk, llvm...@lists.llvm.org
On 13.11.2020 07:43, Alexander Yermolovich wrote:
> Looks like there is an agreement that this path, modifying lld to order
> sections using relocations, should be explored.
> If Igor doesn't object, since he was primary one driving DWARF64 so far,
> I would like to give it a shot at implementing and collecting some
> performance numbers. 🙂

No objections, definitely. Please, go ahead.

--

Best Regards,
Igor Kudrin
C++ Developer, Access Softek, Inc.

Fāng-ruì Sòng via llvm-dev

Nov 13, 2020, 12:03:39 AM
to Alexander Yermolovich, llvm...@lists.llvm.org
On Thu, Nov 12, 2020 at 4:43 PM Alexander Yermolovich <ayer...@fb.com> wrote:
>
> Looks like there is an agreement that this path, modifying lld to order sections using relocations, should be explored.
> If Igor doesn't object, since he was primary one driving DWARF64 so far, I would like to give it a shot at implementing and collecting some performance numbers.
>
> Alex
>
> ________________________________
> From: James Henderson <jh737...@my.bristol.ac.uk>
> Sent: Thursday, November 12, 2020 2:20 AM
> To: Fangrui Song <mas...@google.com>
> Cc: Alexander Yermolovich <ayer...@fb.com>; Robinson, Paul <paul.r...@sony.com>; David Blaikie <dbla...@gmail.com>; Eric Christopher <echr...@gmail.com>; Igor Kudrin <iku...@accesssoftek.com>; llvm...@lists.llvm.org <llvm...@lists.llvm.org>
> Subject: Re: [llvm-dev] [LLD] Support DWARF64, debug_info "sorting"

I probably should have mentioned that I had started a prototype :) (And I realized that I could use firstRelocation instead of dependentSections.)
What I haven't felt comfortable with is the inconsistency in input section descriptions:

> https://sourceware.org/pipermail/binutils/2020-November/114099.html
>
> This behavior does add some inconsistency to the system:
>
> For an output section description .debug_info 0 : { *(.debug_info) } , should the linker sort DWARF32 and DWARF64 components? If it does, the behavior will be inconsistent with other input section descriptions *(foo)
>
> If there is a magic keyword, say, SORT_BY_MAGIC_DEBUG, and the internal
> linker script does something similar to
>
>   *(SORT_BY_MAGIC_DEBUG(.debug_info))
>
> then the system is still consistent.

I also started a thread on the binutils side yesterday (sent in haste): https://sourceware.org/pipermail/binutils/2020-November/114099.html
(We should give them a chance to weigh in on the design and hope for a common option, at least reaching a consensus even if the implementation on their side is low priority.)


>
>
> On Thu, 12 Nov 2020 at 02:10, Fangrui Song <mas...@google.com> wrote:
>
> On 2020-11-12, Alexander Yermolovich wrote:
> >Thanks for feedback.
> >
> >I agree that with a patch and numbers this will be a more concrete discussion, but I wanted to gauge overall receptiveness to this approach and see whether there was a better way.
> >
> >"Whilst the majority of objects will only have a single CU in them, there will be exceptions (LTO-generated objects, -r merged objects etc), so we do need to consider this approach."
> >David, can you elaborate under which conditions LTO-generated objects will have a mix of DWARF32/64 in the same .debug_info? Looking at how DWARF64 was implemented, the same flag will be used for the entirety of the DWARF output, even if multiple CUs are included.
>
>
> Thinking about it, I wouldn't expect an LTO generated object itself to have a mixture of DWARF32/64, although I guess the 32/64 bit state could be encoded in the IR (I am not familiar enough with it to know if it actually is or not). It might be necessary to find ways to configure LTO to generate DWARF64, possibly via a link-time option.
>
> >
> >On one hand, since this is only applicable when DWARF64 is used, a special option would be the way to go, although the user will need to be aware of yet another LLD option. Maybe the error reported when relocation overflows occur can be modified to mention this option along with -fdebug-types-section.
>
> I am quite happy with the relocation approach under a linker option. I'd still
> want to know generic-abi folks's thoughts, though. James may have prepared something
> he wants to share with generic-abi:) Let's wait...
>
>
> I hadn't prepared anything if I'm honest (though if there's widespread agreement that this would be useful, I certainly can - it would have other positive improvements too, reducing the need for tools to rely on section names to identify debug data for example). It was more a case of bouncing ideas off of people to see what they thought. Any discussion we have will probably also need circulating on the DWARF mailing list too, since it is more a DWARF issue than a gABI issue (unless the solution is a new section type). Further refinements to this idea that might make it more appealing to the generic group: `SHT_DEBUG` for the section type name, with the first N bytes of the sh_info used to specify the variant of debug data it represents (e.g. 0x1 for DWARF, 0x2 for SOME_OTHER_STANDARD etc), and the remainder for use as flags as defined by the standard (I'm thinking for DWARF you could encode the 64-bit/32-bit state in there, possibly the section variant (info/rnglists/line etc) and the DWARF version too), on the understanding that consumers like the linker wouldn't combine sections in a potentially broken way. This has the advantage that it could be retrofitted to the existing standard versions, but as has been pointed out, this won't help those with linker scripts - that could only be solved with a new DWARF standard and separate names for 64/32 bit sections, at least if we wanted to avoid the linker needing to do anything beyond reading the section header.
>
> The relocation approach sounds like a reasonable solution for the current situation - even if we do decide to go the route of changing producers to start emitting a new section type/update the standard etc, it doesn't resolve the problem people may currently face.



--
宋方睿

Wenlei He via llvm-dev

Nov 13, 2020, 11:35:43 AM
to jh737...@my.bristol.ac.uk, Fangrui Song, Alexander Yermolovich, llvm...@lists.llvm.org

>  Thinking about it, I wouldn't expect an LTO generated object itself to have a mixture of DWARF32/64, although I guess the 32/64 bit state could be encoded in the IR (I am not familiar enough with it to know if it actually is or not). It might be necessary to find ways to configure LTO to generate DWARF64, possibly via a link-time option.

 

I don’t think we need to encode DWARF32/64 in IR as an attribute for each module. We’re not going to emit mixed DWARF32/64 for a merged LTO module anyway, so allowing each module to express its DWARF setting would only burden LTO with handling inconsistency (a warning?) among input modules. Having a linker switch to pass the setting from the driver to LTO sounds better to me.

David Blaikie via llvm-dev

Nov 13, 2020, 12:52:36 PM
to Wenlei He, llvm...@lists.llvm.org
On Fri, Nov 13, 2020 at 8:35 AM Wenlei He via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> > Thinking about it, I wouldn't expect an LTO generated object itself to have a mixture of DWARF32/64, although I guess the 32/64 bit state could be encoded in the IR (I am not familiar enough with it to know if it actually is or not). It might be necessary to find ways to configure LTO to generate DWARF64, possibly via a link-time option.
>
>
>
> I don’t think we need to encode dwarf32/64 in IR as attribute for each module. We’re not going to emit mixed dwarf32/64 for merged LTO module anyways, so allowing each module to express its dwarf setting would only introduce burden for LTO to deal with inconsistency (warning?) among input modules. Having a linker switch to pass the setting from driver to LTO sounds better to me.

Usually the issue there is that existing build systems may be set up
to pass such flags only to the compilations, and not to the link
invocations. For the DWARF version, for example, we pass it down
through IR, emit warnings/errors when two IR modules with different
DWARF versions are linked together, and then emit only one (the
higher, I believe) DWARF version out the other end.

We aren't 100% consistent on this "anything you could do without LTO,
you should be able to do with LTO by passing the same flags to the
same actions" kind of strategy (e.g. type units and DWARF compression
aren't passed down through IR; if you want those, you have to pass
them to the link invocation (via the clang driver) yourself). So it's
more a question of "is there already systemic use of these flags for
the compilation, and would not supporting it there be a pain?" It's
probably not for DWARF64, since we haven't had any flag to support it
so far anyway.

- Dave


Fāng-ruì Sòng via llvm-dev

Nov 13, 2020, 2:05:36 PM
to David Blaikie, llvm...@lists.llvm.org
I got replies from Nick Clifton and Michael Matz:
https://sourceware.org/pipermail/binutils/2020-November/114116.html
(and its reply).
I have mentioned (a) the difficulty of the
detecting-DWARF64-by-first-relocation approach and (b) the section
type approach in my reply there
https://sourceware.org/pipermail/binutils/2020-November/114125.html

(a) My prototype has made me feel uneasy with this approach.

<quote>
In DWARF v4 or if .debug_str_offsets is not used, it is a problem. A
heuristic is: if an input section in a file is marked DWARF64, we mark
all other .debug_* DWARF64. This makes me feel a bit uneasy because
for an output section description

.debug_str 0 : { *(.debug_str) }

Now the behavior of `*` (or, if we invent a `SORT_*` keyword) is also
dependent on other output sections.
</quote>

(b)
* It needs a section type (either a gABI one or a SHT_GNU_* in GNU
ABI). I am seeking a gABI one not because I think this is particularly
related to gABI, but because I don't want Solaris (which LLVM also
supports) to use a different section type and unnecessarily cause
friction in our implementation.
* It needs a clarification on multiple output section descriptions
with the same name.
* It needs a linker script feature to match input sections by type.

--
宋方睿

David Blaikie via llvm-dev

Nov 13, 2020, 2:17:16 PM
to Fāng-ruì Sòng, llvm...@lists.llvm.org
On Fri, Nov 13, 2020 at 11:05 AM Fāng-ruì Sòng <mas...@google.com> wrote:
>
> I got replies from Nick Clifton and Michael Matz:
> https://sourceware.org/pipermail/binutils/2020-November/114116.html
> (and its reply).
> I have mentioned (a) the difficulty of the
> detecting-DWARF64-by-first-relocation approach and (b) the section
> type approach in my reply there
> https://sourceware.org/pipermail/binutils/2020-November/114125.html
>
> (a) My prototype has made me feel uneasy with this approach.
>
> <quote>
> In DWARF v4 or if .debug_str_offsets is not used, it is a problem. A
> heuristic is: if an input section in a file is marked DWARF64, we mark
> all other .debug_* DWARF64. This makes me feel a bit uneasy because
> for an output section description
>
> .debug_str 0 : { *(.debug_str) }
>
> Now the behavior of `*` (or, if we invent a `SORT_*` keyword) is also
> dependent on other output sections.
> </quote>
>
> (b)
> * It needs a section type (either a gABI one or a SHT_GNU_* in GNU
> ABI). I am seeking a gABI one not because I think this is particularly
> related to gABI, but because I don't want Solaris (which LLVM also
> supports) to use a different section type and unnecessarily cause
> friction in our implementation.

If I'm understanding you correctly, you're suggesting the sorting
behavior would only be implemented if the input object file had some
new attributes in it designating which sections are debug info
sections?

I don't think that's a viable solution to the problem at hand, then -
if someone is able to update their toolchain and rebuild objects with
new attributes, they can probably update the build configuration of
those objects to build them with DWARF64 instead, avoiding the mixed
32/64 problem. I think the solution we're looking for would have to
work with existing precompiled object files using DWARF32 that are in
the wild today, without modification.

Fāng-ruì Sòng via llvm-dev

Nov 13, 2020, 2:24:36 PM
to David Blaikie, llvm...@lists.llvm.org

I know the "no-modification" requirement:) The first paragraph of
https://sourceware.org/pipermail/binutils/2020-November/114125.html
mentioned this.

The section type approach is used this way (in another paragraph):

<quote>
If we invent a keyword (say, TYPE) to match sections by type, we could use

.debug_info 0 : { *(TYPE (SHT_PROGBITS) .debug_info${RELOCATING+ .gnu.linkonce.wi.*}) }
.debug_info 0 : { *(TYPE (SHT_GNU_DWARF64) .debug_info) }

or

.debug_info 0 : { *(TYPE (SHT_PROGBITS) .debug_info${RELOCATING+ .gnu.linkonce.wi.*} TYPE (SHT_GNU_DWARF64) .debug_info) }
</quote>

David Blaikie via llvm-dev

Nov 13, 2020, 2:29:10 PM
to Fāng-ruì Sòng, llvm...@lists.llvm.org

OK - thanks for clarifying. I don't really know much/enough about
linker scripts to comment on the rest of the design, and was just
confused/misunderstanding what was being suggested.

Fāng-ruì Sòng via llvm-dev

Nov 13, 2020, 3:43:01 PM
to David Blaikie, llvm...@lists.llvm.org

For .debug_* in object files:

DWARF32 -> SHT_PROGBITS (unchanged)
DWARF64 -> SHT_DWARF64 or SHT_GNU_DWARF64

In LLD, we will need to allow mixed SHT_PROGBITS and SHT_DWARF64. If
all input sections are SHT_DWARF64, the output section type probably
should also be SHT_DWARF64.
If mixed, SHT_PROGBITS.
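
As a rough illustration of that rule, here is a minimal C++ sketch. It is not LLD code: `SHT_DWARF64_PLACEHOLDER` and its value are made up for the example (no such section type value has been assigned), and `outputDebugSectionType` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t SHT_PROGBITS = 1; // value from the ELF gABI
// Placeholder only: no section type value has been reserved for DWARF64.
constexpr std::uint32_t SHT_DWARF64_PLACEHOLDER = 0x60000001;

// Picks the type of one .debug_* output section from the types of its
// input sections: all-DWARF64 stays DWARF64, while anything mixed (or
// all SHT_PROGBITS) becomes SHT_PROGBITS.
std::uint32_t outputDebugSectionType(const std::vector<std::uint32_t> &inputTypes) {
  assert(!inputTypes.empty() && "an output section needs at least one input");
  bool allDwarf64 =
      std::all_of(inputTypes.begin(), inputTypes.end(), [](std::uint32_t t) {
        return t == SHT_DWARF64_PLACEHOLDER;
      });
  return allDwarf64 ? SHT_DWARF64_PLACEHOLDER : SHT_PROGBITS;
}
```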

FWIW I started a generic-abi thread
https://groups.google.com/g/generic-abi/c/i2Xio-47QdQ ("Reserve a
section type value for DWARF64") to give stakeholders from other ELF
operating systems a chance to participate in the design. I have paid
attention to my wording: a new section type is **not decided yet** on
LLVM/GNU binutils sides. Our discussions on llvm-dev/binutils will
benefit from agreement/disagreement from generic-abi.

Alexander Yermolovich via llvm-dev

Nov 13, 2020, 8:47:20 PM
to Fāng-ruì Sòng, David Blaikie, llvm...@lists.llvm.org
Thanks for doing a diff and asking in other groups.

Let me check that I understand your concern with using the first relocation as it relates to .debug_str.

In DWARF4, .debug_str is referenced from .debug_info and .debug_types using DW_FORM_strp. For DWARF32 it's a 32-bit unsigned offset; for DWARF64 it's a 64-bit unsigned offset. So in the relocation section for some .debug_info section we will have a relocation entry to patch up DW_FORM_strp: either R_X86_64_32 or R_X86_64_64, depending on the DWARF format.

A situation we might have is that an input .debug_info section is DWARF32, so it gets ordered appropriately within the output .debug_info section, but the corresponding input .debug_str section ends up above 4GB in the output .debug_str section, in which case we still hit an overflow. So we also need to order the .debug_str sections, except that in DWARF4 there is no clear link between them, besides looking through the .debug_info relocations.
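
To make the overflow condition concrete, here is a small C++ sketch. It is a simplification under stated assumptions: each input contribution to an output section is modeled as a hypothetical `Piece` with a size and a format flag, and a DWARF32 producer is assumed to reference only its own contribution through 32-bit offsets. Neither `Piece` nor `dwarf32OffsetsFit` comes from LLD.

```cpp
#include <cstdint>
#include <vector>

// One input contribution to an output section such as .debug_str.
struct Piece {
  std::uint64_t size; // size in bytes
  bool isDwarf64;     // true if referenced through 64-bit offsets
};

// Returns true if every DWARF32 piece lies entirely below 4 GiB in the
// given layout, so that 32-bit relocations (e.g. R_X86_64_32) pointing
// into it cannot overflow.
bool dwarf32OffsetsFit(const std::vector<Piece> &layout) {
  constexpr std::uint64_t limit = std::uint64_t{1} << 32;
  std::uint64_t offset = 0; // start offset of the current piece
  for (const Piece &p : layout) {
    if (!p.isDwarf64 && offset + p.size > limit)
      return false;
    offset += p.size;
  }
  return true;
}
```

With a 3 GiB DWARF64 piece placed before a 2 GiB DWARF32 piece the check fails; with the order swapped it passes, even though the output is 5 GiB in both cases.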

Is that a correct summary?

Also I don't quite understand what the issue is with linker script.

My understanding is that: 
.debug_str 0 : { *(.debug_str) }

Just states that all .debug_str input sections should go into the .debug_str output section. It doesn't really specify the ordering within the output .debug_str section.

Thank You
Alex


Igor Kudrin via llvm-dev

Nov 17, 2020, 1:42:31 AM
to Fāng-ruì Sòng, David Blaikie, llvm...@lists.llvm.org
On 14.11.2020 3:42, Fāng-ruì Sòng wrote:
> For .debug_* in object files:
>
> DWARF32 -> SHT_PROGBITS (unchanged)
> DWARF64 -> SHT_DWARF64 or SHT_GNU_DWARF64
>
> In LLD, we will need to allow mixed SHT_PROGBITS and SHT_DWARF64. If
> all input sections are SHT_DWARF64, the output section type probably
> should also be SHT_DWARF64.
> If mixed, SHT_PROGBITS.

I am not really sure that we need a new section type. It gives a small
simplification in one part of the linker but adds a much larger burden
of supporting it in all the tools and specs around. And that support
would be required for a rarely used feature, which almost certainly
means the support would be partial and sometimes erroneous.

If we aim for clarity and not ambiguity, I would suggest considering the
following. The DWARF specs are designed so that the data of DWARF32 and
DWARF64 formats can be intermixed together in one section. What section
type should be used for that combination? Should it be another new
section type, something like SHT_DWARF32_DWARF64? Probably not, that
should be a regular SHT_PROGBITS. That means that DWARF64 data can be
contained in a SHT_PROGBITS section. Consequently, the section which
contains DWARF64 data without DWARF32 data should also be SHT_PROGBITS,
because the latter is just a special case of the former.
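
For reference, the two formats can be told apart unit by unit from the initial length field: a 4-byte value of 0xffffffff marks DWARF64 and is followed by an 8-byte length, and the values 0xfffffff0 through 0xfffffffe are reserved. The following C++ sketch walks a buffer of concatenated units on that basis. The names `UnitInfo` and `scanUnits` are hypothetical, the data is assumed to be in host byte order, and nothing beyond the length field is parsed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

enum class DwarfFormat { DWARF32, DWARF64 };

struct UnitInfo {
  std::size_t offset;   // offset of the unit within the section
  DwarfFormat format;   // format implied by the initial length field
  std::uint64_t length; // number of bytes following the length field
};

// Lists the units in a .debug_info-style buffer, or returns nullopt if the
// buffer is truncated or uses a reserved initial length value.
std::optional<std::vector<UnitInfo>>
scanUnits(const std::vector<std::uint8_t> &data) {
  std::vector<UnitInfo> units;
  std::size_t pos = 0;
  while (pos < data.size()) {
    if (data.size() - pos < 4)
      return std::nullopt;
    std::uint32_t len32;
    std::memcpy(&len32, &data[pos], 4);
    std::size_t headerSize = 4;
    std::uint64_t length = len32;
    DwarfFormat format = DwarfFormat::DWARF32;
    if (len32 == 0xffffffffu) {
      // DWARF64 escape value: the real length is the next 8 bytes.
      if (data.size() - pos < 12)
        return std::nullopt;
      std::memcpy(&length, &data[pos + 4], 8);
      headerSize = 12;
      format = DwarfFormat::DWARF64;
    } else if (len32 >= 0xfffffff0u) {
      return std::nullopt; // reserved range
    }
    if (length > data.size() - pos - headerSize)
      return std::nullopt;
    units.push_back({pos, format, length});
    pos += headerSize + static_cast<std::size_t>(length);
  }
  return units;
}
```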

> FWIW I started a generic-abi thread
> https://groups.google.com/g/generic-abi/c/i2Xio-47QdQ ("Reserve a
> section type value for DWARF64") to give stakeholders from other ELF
> operating systems a chance to participate in the design. I have paid
> attention to my wording: a new section type is **not decided yet** on
> LLVM/GNU binutils sides. Our discussions on llvm-dev/binutils will
> benefit from agreement/disagreement from generic-abi.

Thank you for conducting all these discussions. I hope we will finally
find a way to bring the feature alive.

--
Best Regards,
Igor Kudrin
C++ Developer, Access Softek, Inc.

Fāng-ruì Sòng via llvm-dev

Nov 17, 2020, 1:51:36 AM
to Alexander Yermolovich, llvm...@lists.llvm.org
On 2020-11-14, Alexander Yermolovich wrote:
>Thanks for doing a diff and asking in other groups.
>
>So if I understand your concern with using first reloc as it relates to .debug_str.
>
>In DWARF4, .debug_str is referenced from .debug_info and .debug_types using DW_FORM_strp. For DWARF32 it's a 32-bit unsigned offset; for DWARF64 it's a 64-bit unsigned offset. So in the relocation section for some .debug_info section we will have a relocation entry to patch up DW_FORM_strp: either R_X86_64_32 or R_X86_64_64, depending on the DWARF format.
>
>A situation we might have is that an input .debug_info section is DWARF32, so it gets ordered appropriately within the output .debug_info section, but the corresponding input .debug_str section ends up above 4GB in the output .debug_str section, in which case we still hit an overflow. So we also need to order the .debug_str sections, except that in DWARF4 there is no clear link between them, besides looking through the .debug_info relocations.

Yes. For most other .debug_*, we can check whether the section has a 64-bit
absolute relocation to decide whether it is DWARF64. .debug_str is different in
that we need to check the relocations referencing it. This makes its behavior
dependent on other sections, which is why I feel lost with the relocation
approach: when we write .debug_str 0 : { *(.debug_str) }, we really want the
output section .debug_str to be made up using just the information from the
input section descriptions, not arbitrary information from other .debug_*
sections.
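
A minimal C++ sketch of the relocation check and the resulting ordering follows. It is not the prototype's code: `DebugInputSection`, `looksLikeDwarf64` and `orderDwarf32First` are hypothetical names, and the sketch assumes that the first relocation of a section patches an offset-sized field (for .debug_info, the abbreviation table offset in the unit header). It looks only at the first relocation because address-sized fields are relocated with R_X86_64_64 on x86-64 even in DWARF32. Sections without relocations are treated as DWARF32, which is also an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Relocation type numbers from the x86-64 psABI.
constexpr std::uint32_t R_X86_64_64 = 1;
constexpr std::uint32_t R_X86_64_32 = 10;

// Simplified stand-in for an input .debug_* section.
struct DebugInputSection {
  std::string file;                      // object file it came from
  std::uint64_t size;                    // size in bytes
  std::vector<std::uint32_t> relocTypes; // relocation types, in offset order
};

// Guesses the format from the first relocation only.
bool looksLikeDwarf64(const DebugInputSection &s) {
  return !s.relocTypes.empty() && s.relocTypes.front() == R_X86_64_64;
}

// Moves DWARF32 sections before DWARF64 ones, keeping the input order
// within each group.
void orderDwarf32First(std::vector<DebugInputSection> &sections) {
  std::stable_partition(
      sections.begin(), sections.end(),
      [](const DebugInputSection &s) { return !looksLikeDwarf64(s); });
}
```

Using a stable partition keeps the relative input order inside each group, so the only visible change is that DWARF64 contributions move to the end. As the paragraph above points out, this check does not carry over to .debug_str, whose format can only be inferred from relocations in other sections.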

LLD -O1 (default) and GNU ld -O0 enable constant string merging within
SHF_MERGE&&SHF_STRINGS sections. We need to be careful: if a DWARF32
string gets folded into a DWARF64 string, the section referencing the DWARF32
string can still trigger a relocation overflow.

If we order DWARF32 components before DWARF64 components, with the
llvm::StringTableBuilder usage in LLD, we can make sure DWARF64 strings can get
folded into DWARF32 strings, not the other way around.
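
The ordering argument can be illustrated with a toy string table. This is a stand-in written for this example, not llvm::StringTableBuilder or LLD's merging code: the class `ToyStringTable` is hypothetical and simply gives each distinct string the offset at which it was first added, with no tail merging.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy deduplicating string table: a string's offset is fixed the first
// time it is added; later duplicates reuse that offset.
class ToyStringTable {
public:
  std::uint64_t add(const std::string &s) {
    auto [it, inserted] = offsets_.try_emplace(s, size_);
    if (inserted)
      size_ += s.size() + 1; // account for the NUL terminator
    return it->second;
  }
  std::uint64_t size() const { return size_; }

private:
  std::unordered_map<std::string, std::uint64_t> offsets_;
  std::uint64_t size_ = 0;
};
```

If all DWARF32 contributions are added before any DWARF64 ones, a string shared by both keeps the low offset it received from the DWARF32 side, so the 32-bit reference to it is unaffected by how large the table grows afterwards.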

>Is that a correct summary?
>
>Also I don't quite understand what the issue is with linker script.
>
>My understanding is that:
>
>.debug_str 0 : { *(.debug_str) }
>
>Just states that all .debug_str input sections should go into the .debug_str output section. It doesn't really specify the ordering within the output .debug_str section.

There is an assumption that linkers concatenate input sections in input order,
which is required by the ELF specification. There are options which can change
the semantics of `*`: --sort-section, (gold specific) --section-ordering-file,
(LLD specific) --symbol-ordering-file. As with those options, we should
justify the new `*` ordering by assigning it appropriate semantics.

>Thank You
>Alex

Fāng-ruì Sòng via llvm-dev

Nov 17, 2020, 2:06:11 AM
to Igor Kudrin, llvm...@lists.llvm.org
On Mon, Nov 16, 2020 at 10:42 PM Igor Kudrin <iku...@accesssoftek.com> wrote:
>
> On 14.11.2020 3:42, Fāng-ruì Sòng wrote:
> > For .debug_* in object files:
> >
> > DWARF32 -> SHT_PROGBITS (unchanged)
> > DWARF64 -> SHT_DWARF64 or SHT_GNU_DWARF64
> >
> > In LLD, we will need to allow mixed SHT_PROGBITS and SHT_DWARF64. If
> > all input sections are SHT_DWARF64, the output section type probably
> > should also be SHT_DWARF64.
> > If mixed, SHT_PROGBITS.
>
> I am not really sure that we need a new section type. It gives a small
> simplification in one part of the linker but adds a much larger burden
> of supporting it in all the tools and specs around. And that support
> would be required for a rarely used feature, which almost certainly
> means the support would be partial and sometimes erroneous.

This is not a small simplification. If we consider how we would
address .debug_str with the relocation approach,
it would be quite contrived. A reply I just made
https://lists.llvm.org/pipermail/llvm-dev/2020-November/146673.html
detailed why I don't like the conceptual model: the behavior is
dependent on other sections.
I also felt bad as I had to do string comparison on ".debug_"
(https://reviews.llvm.org/D91404)

I've chatted with gdb folks: gdb (gdb/dwarf2/read.c) is agnostic about
the section type. (They just cannot handle multiple .debug_info
sections.)
SHT_PROGBITS is a default (or "use this if nothing more specific
exists") section type, so probably no tool specifically checks for
it.
Many tools don't understand DWARF64 yet, so there is little they
need to adapt when they add DWARF64 support.
Currently the only problem I can think of is readelf -S (from my
understanding of
https://sourceware.org/pipermail/binutils/2020-November/114116.html ,
Nick will be happy if I write a GNU readelf patch to make the section
header table dump look good:) )

> If we aim for clarity and not ambiguity, I would suggest considering the
> following. The DWARF specs are designed so that the data of DWARF32 and
> DWARF64 formats can be intermixed together in one section. What section
> type should be used for that combination? Should it be another new
> section type, something like SHT_DWARF32_DWARF64? Probably not, that
> should be a regular SHT_PROGBITS. That means that DWARF64 data can be
> contained in a SHT_PROGBITS section. Consequently, the section which
> contains DWARF64 data without DWARF32 data should also be SHT_PROGBITS,
> because the latter is just a special case of the former.

I suggest we follow the `canMergeToProgbits` logic in LLD: mixed
SHT_DWARF64 and SHT_PROGBITS get SHT_PROGBITS. This is the existing
rule for many other section types.
Conceptually, the combined section should impose the rigid restriction
when it is further combined with other sections.
This is probably an extra argument that we don't need SHT_DWARF32.
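
To make the proposed rule concrete, here is a minimal sketch in Python rather than LLD's C++. It is not the actual `canMergeToProgbits` code: `output_section_type` is a hypothetical helper, and the SHT_DWARF64 value is an arbitrary placeholder because no value has been assigned.

```python
SHT_PROGBITS = 1          # value defined by the ELF gABI
SHT_DWARF64 = 0x6FFFFF00  # placeholder only: no value has been assigned


def output_section_type(input_types):
    """Pick the type of an output .debug_* section from its input types."""
    types = set(input_types)
    if types == {SHT_DWARF64}:
        return SHT_DWARF64   # every input section is DWARF64
    if types <= {SHT_PROGBITS, SHT_DWARF64}:
        return SHT_PROGBITS  # DWARF32 only, or a DWARF32/DWARF64 mix
    raise ValueError("unexpected section type for a debug section")


print(output_section_type([SHT_DWARF64, SHT_DWARF64]) == SHT_DWARF64)
print(output_section_type([SHT_PROGBITS, SHT_DWARF64]) == SHT_PROGBITS)
```

Because a mixed result falls back to SHT_PROGBITS, the output of a relocatable link that mixes both formats is treated like DWARF32 in any later link.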

> > FWIW I started a generic-abi thread
> > https://groups.google.com/g/generic-abi/c/i2Xio-47QdQ ("Reserve a
> > section type value for DWARF64") to give stakeholders from other ELF
> > operating systems a chance to participate in the design. I have paid
> > attention to my wording: a new section type is **not decided yet** on
> > LLVM/GNU binutils sides. Our discussions on llvm-dev/binutils will
> > benefit from agreement/disagreement from generic-abi.
>
> Thank you for conducting all these discussions. I hope we will finally
> find a way to bring the feature alive.

Thanks to James and Pavel for quoting the standard.
https://groups.google.com/g/generic-abi/c/i2Xio-47QdQ
It seems that we've made progress on the thread and may be able to get
a generic value.
I am following up with binutils folks
https://sourceware.org/pipermail/binutils/2020-November/
I hope we can get to a design which satisfies all parties.


> --
> Best Regards,
> Igor Kudrin
> C++ Developer, Access Softek, Inc.

--
宋方睿

James Henderson via llvm-dev

Nov 17, 2020, 3:17:50 AM
to Fāng-ruì Sòng, llvm...@lists.llvm.org
Thinking about it, it would probably be wise to raise this discussion on the DWARF mailing list too. They might want to put an addendum in the spec/DWARF wiki/somewhere appropriate along the lines of "ELF support for DWARF64", which can be retroactively applied to existing standards. The committee may also have some thoughts on how tools are expected to work with DWARF64 and DWARF32 mixtures.

James

Igor Kudrin via llvm-dev

Nov 17, 2020, 4:05:42 AM
to Fāng-ruì Sòng, llvm...@lists.llvm.org

On 17.11.2020 14:05, Fāng-ruì Sòng wrote:
> On Mon, Nov 16, 2020 at 10:42 PM Igor Kudrin <iku...@accesssoftek.com> wrote:
>>
>> On 14.11.2020 3:42, Fāng-ruì Sòng wrote:
>>> For .debug_* in object files:
>>>
>>> DWARF32 -> SHT_PROGBITS (unchanged)
>>> DWARF64 -> SHT_DWARF64 or SHT_GNU_DWARF64
>>>
>>> In LLD, we will need to allow mixed SHT_PROGBITS and SHT_DWARF64. If
>>> all input sections are SHT_DWARF64, the output section type probably
>>> should also be SHT_DWARF64.
>>> If mixed, SHT_PROGBITS.
>>
>> I am not really sure that we need a new section type. This gets a small
>> simplification in one part of the linker but adds much more burden of
>> supporting that to all tools and specs around. And that support would be
>> required for a rarely used feature, which certainly results that support
>> to be partial and sometimes erroneous.
>
> This is not a small simplification. If we consider how we would
> address .debug_str with the relocation approach,
> it would be quite contrived. A reply I just made
> https://lists.llvm.org/pipermail/llvm-dev/2020-November/146673.html
> detailed why I don't like the conceptual model: the behavior is
> dependent on other sections.

I do not see real issues in the dependency on other sections. Section
7.4, p. 198, DWARFv5 states: "The 32-bit and 64-bit DWARF format
conventions must not be intermixed within a single compilation unit."
From this, we can conclude that if a ".debug_info" section contains
only 64-bit relocations, all other debug info sections in the object
file are in the 64-bit DWARF format. This can be simplified to checking
just the first relocation of this section, as you have done in
https://reviews.llvm.org/D91404, and then extending that assessment to
all other debug sections in the object file. Sure, this heuristic will
not work for some clumsy partially linked relocatable objects with a
mixture of DWARF64 and DWARF32 data, but I guess they can be ignored for
a first attempt because, firstly, their creation is usually under the
full control of the user and, secondly, the heuristic can be extended to
check all the relocations if that ever becomes necessary.
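
The first-relocation heuristic can be sketched as follows. This is an illustration in Python, not the code from D91404: `looks_like_dwarf64` is a hypothetical helper, it assumes x86-64, and it assumes the first relocation in ".debug_info" applies to an offset-sized field (typically the debug_abbrev_offset in the unit header).

```python
R_X86_64_64 = 1   # 64-bit absolute relocation (x86-64 psABI)
R_X86_64_32 = 10  # 32-bit absolute relocation (x86-64 psABI)


def looks_like_dwarf64(debug_info_reloc_types):
    """Guess the DWARF format from the relocations of .debug_info.

    Only the first relocation is inspected, so a partially linked object
    that mixes DWARF32 and DWARF64 units is classified by its first unit.
    """
    if not debug_info_reloc_types:
        return False  # no relocations: treat the section as DWARF32
    return debug_info_reloc_types[0] == R_X86_64_64


print(looks_like_dwarf64([R_X86_64_64, R_X86_64_64]))  # DWARF64 object
print(looks_like_dwarf64([R_X86_64_32, R_X86_64_64]))  # DWARF32 object
```

The result would then be applied to every other debug section of the same object file, which is the cross-section dependency being debated here.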

> I also felt bad as I had to do string comparison on ".debug_"
> (https://reviews.llvm.org/D91404)

Well, that is a common way to find sections with debugging information, no?

> I've chatted with gdb folks: gdb (gdb/dwarf2/read.c) is agnostic about
> the section type. (They just cannot handle multiple .debug_info
> sections.)
> SHT_PROGBITS is a default (or "use this if nothing more specific
> exists") section type that probably no tool will specifically check
> the type.
> For many tools, they don't understand DWARF64 yet - there is little
> they need to adapt when they add DWARF64 support.
> Currently the only problem I can think of is readelf -S (from my
> understanding of
> https://sourceware.org/pipermail/binutils/2020-November/114116.html ,
> Nick will be happy if I write a GNU readelf patch to make the section
> header table dump look good:) )

My main concern is that the new type is mostly useless for everyone,
except that it provides a small hint for the linker, which is not very
important because the same information can be easily and reliably
gathered in place. Is that hint really so useful that it deserves
special support at the ELF spec level?

--
Best Regards,
Igor Kudrin
C++ Developer, Access Softek, Inc.

Robinson, Paul via llvm-dev

Nov 17, 2020, 12:20:50 PM
to Fāng-ruì Sòng, Alexander Yermolovich, llvm...@lists.llvm.org


> -----Original Message-----
> From: Fāng-ruì Sòng <mas...@google.com>
> Sent: Tuesday, November 17, 2020 1:51 AM
> To: Alexander Yermolovich <ayer...@fb.com>
> Cc: David Blaikie <dbla...@gmail.com>; Wenlei He <wen...@fb.com>; llvm-
> d...@lists.llvm.org; Robinson, Paul <paul.r...@sony.com>; James
> Henderson <jh737...@my.bristol.ac.uk>; Eric Christopher
> <echr...@gmail.com>; Igor Kudrin <iku...@accesssoftek.com>
> Subject: Re: [llvm-dev] [LLD] Support DWARF64, debug_info "sorting"
>
This is a problem only if the .debug_str section *by itself* exceeds 4GB;
are we anticipating that will happen IRL? The section is just a string
section, by itself it has no 32/64 format.

If the .debug_str section *by itself* exceeds 4GB, then yes any string
with a 32-bit reference to it must be in the first 4GB. Strings that
have only 64-bit references to them can be sorted to the end of the
section, if necessary. I wouldn't think anyone guarantees or cares
about the order of strings within a string section.
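
A minimal sketch of that ordering, in Python. `order_strings` is a hypothetical helper, and knowing per string whether any 32-bit reference exists is an assumption about what the linker could track.

```python
def order_strings(strings):
    """Assign .debug_str offsets, 32-bit-referenced strings first.

    `strings` is a list of (text, has_32bit_reference) pairs. Strings with
    only 64-bit references are pushed to the end of the section.
    """
    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(strings, key=lambda item: not item[1])
    offsets = {}
    offset = 0
    for text, _ in ordered:
        offsets[text] = offset
        offset += len(text.encode("utf-8")) + 1  # +1 for the NUL terminator
    return offsets


offsets = order_strings([("only64", False), ("a", True), ("bc", True)])
print(offsets)
```

With this layout only the strings in the first group need to stay below the 4GB boundary.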

But I think this would be the very last thing to care about, with regard
to DWARF-64 concerns.
--paulr

Fāng-ruì Sòng via llvm-dev

Nov 17, 2020, 7:49:42 PM
to Igor Kudrin, llvm...@lists.llvm.org

I don't like the "if .debug_info looks like DWARF64, mark .debug_str"
approach because it makes a section's behavior dependent on other
sections in the input file, diverging a lot from the way existing
output section descriptions/input section descriptions are handled.
This is about system consistency, which I care about a lot and really
don't want to break given that we have compelling alternative designs.

>>I also felt bad as I had to do string comparison on ".debug_"
>>(https://reviews.llvm.org/D91404)
>
>Well, that is a common way to find sections with debugging information, no?
>
>>I've chatted with gdb folks: gdb (gdb/dwarf2/read.c) is agnostic about
>>the section type. (They just cannot handle multiple .debug_info
>>sections.)
>>SHT_PROGBITS is a default (or "use this if nothing more specific
>>exists") section type that probably no tool will specifically check
>>the type.
>>For many tools, they don't understand DWARF64 yet - there is little
>>they need to adapt when they add DWARF64 support.
>>Currently the only problem I can think of is readelf -S (from my
>>understanding of
>>https://sourceware.org/pipermail/binutils/2020-November/114116.html ,
>>Nick will be happy if I write a GNU readelf patch to make the section
>>header table dump look good:) )
>
>My main concern is that the new type is mostly useless for everyone,
>except that it provides a small hint for the linker, which is not very
>important because the same information can be easily and reliably
>gathered in place. Is that hint really so useful that deserves special
>support on the ELF specs level?

It is not useless. Avoiding string comparison on ".debug_" is one thing
(sometimes this can improve performance a bit). The linker complexity is
another. As I mentioned in my previous reply, "SHT_PROGBITS+SHT_DWARF64
=> SHT_PROGBITS" makes the scheme automatically work with relocatable links.

There are 0x60000000 generic section type values and 0x10000000 OS
specific values. The resource is abundant.
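
Those counts follow from the gABI range constants, as this small Python check shows (the constant values are from the ELF gABI; the variable names are just for illustration):

```python
SHT_NULL = 0
SHT_LOOS = 0x60000000  # first OS-specific section type
SHT_HIOS = 0x6FFFFFFF  # last OS-specific section type

generic_values = SHT_LOOS - SHT_NULL          # 0x0 .. 0x5fffffff
os_specific_values = SHT_HIOS - SHT_LOOS + 1  # 0x60000000 .. 0x6fffffff

print(hex(generic_values), hex(os_specific_values))
```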

On https://groups.google.com/g/generic-abi/c/i2Xio-47QdQ , Cary Coutant
has made an alternative proposal: adding a new section flag.
Personally I prefer a section type over a section flag, for the reasons
explained by Ali Bahrami. People are still arguing, but I appreciate
that the theme of the discussion is that people acknowledge the use
cases and agree to use an appropriate ELF-level thing, instead of
adding ad-hoc rules to the linker.

Fāng-ruì Sòng via llvm-dev

Nov 18, 2020, 1:07:45 AM
to James Henderson, llvm...@lists.llvm.org
In https://groups.google.com/g/generic-abi/c/i2Xio-47QdQ (you need to
join the group before making a post)
Cary Coutant raised yet another idea: whether we can use ".debug64" as
the section prefix. I like the idea because:

* It is immediately obvious whether DWARF64 is used and whether
DWARF32 is used along with DWARF64.
* In a relocatable link mixing DWARF32 and DWARF64 sections, DWARF32
and DWARF64 sections will naturally not get mixed. (For a relocation
based approach, if DWARF64 is the first input section, the output may
appear as a "DWARF64" because the proposed approach only checks the
first relocation)

On the other hand,

* It is (slightly) non-conformant because of the different section names.
* Tooling support. Some commonly used consumers already recognize
DWARF64. We'll have to teach these tools about the new section names,
and the number of sections to recognize doubles. This may result in a
fair amount of complexity (DWARFContext/MCObjectFileInfo/llvm-dwarfdump
-debug* options/ld.lld --gdb-index are things I can immediately think
of).

On balance I think this is not as good as the section type idea.

--

Igor Kudrin via llvm-dev

Nov 18, 2020, 2:32:19 AM
to Robinson, Paul, Fāng-ruì Sòng, Alexander Yermolovich, llvm...@lists.llvm.org
On 18.11.2020 0:20, Robinson, Paul wrote:
> This is a problem only if the .debug_str section *by itself* exceeds 4GB;
> are we anticipating that will happen IRL? The section is just a string
> section, by itself it has no 32/64 format.
>
> If the .debug_str section *by itself* exceeds 4GB, then yes any string
> with a 32-bit reference to it must be in the first 4GB. Strings that
> have only 64-bit references to them can be sorted to the end of the
> section, if necessary. I wouldn't think anyone guarantees or cares
> about the order of strings within a string section.
>
> But I think this would be the very last thing to care about, with regard
> to DWARF-64 concerns.

I guess that the relative size of the ".debug_str" section varies depending on the source code, the particular build environment, and lots of other circumstances. I've checked some freshly built samples and see that the section is usually close in size to ".debug_info" and sometimes even bigger. So, this section must be ordered similarly to all other debugging info sections.

--
Best Regards,
Igor Kudrin
C++ Developer, Access Softek, Inc.

James Henderson via llvm-dev

Nov 18, 2020, 3:51:49 AM
to Fāng-ruì Sòng, llvm...@lists.llvm.org
I was thinking about whether I liked the section name idea myself. I think a slight refinement could be to have linkers combine .debug and .debug64 sections into one output section (in non -r links), so that end consumers (debuggers etc.) don't have to worry about it. However, conformance is still a concern to me, as we cannot really retrofit the existing standard versions, and the section names themselves are in the standard. That means that tools that would otherwise work might stop working when presented with a "new" DWARFv3/4/5 output that they could in theory handle. This applies both to debuggers that don't know about the support and to tools like llvm-dwarfdump that work on intermediate objects, until they get updated. One final concern with the section name approach is that there are tools that look for the debug sections in general (e.g. llvm-objcopy --strip-debug), which use the prefix ".debug_" to identify such sections, rather than ".debug", so .debug64 would be a bad name to use (although you could probably use .debug_64_info or .debug_info_64 safely).
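
The naming concern is easy to see with a simplified prefix check. This Python sketch only models the ".debug_" prefix test described above; it is not the actual llvm-objcopy logic, and `is_debug_section` is a hypothetical helper.

```python
def is_debug_section(name):
    """Simplified model of tools that find debug sections by name prefix."""
    return name.startswith(".debug_")


for name in (".debug_info", ".debug64_info", ".debug_64_info", ".debug_info_64"):
    print(name, is_debug_section(name))
```

Only ".debug64_info" fails the check, so a tool written this way would silently skip sections that use the ".debug64" prefix.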

Pavel Labath via llvm-dev

Nov 18, 2020, 4:14:12 AM
to jh737...@my.bristol.ac.uk, Fāng-ruì Sòng, llvm...@lists.llvm.org
On 18/11/2020 09:51, James Henderson via llvm-dev wrote:
> I was thinking about whether I liked the section name idea myself. I
> think a slight refinement could be to have linkers combine .debug and
> .debug64 sections into one output section (in non -r links), so that end
> consumers (debuggers etc) don't have to worry about it.

I was just thinking about this as well, and I think that we should still
have linkers merge the .debug_ and .debug64 sections in the final links
(essentially replacing the section type-based matching in the original
proposal with section name-based matching).

It's true that the .debug64 names are not really conforming, but otoh,
the reason why the SHT_DWARF64 idea was conforming is that DWARF is very
hand-wavy when it comes to how it interacts with linkers and object file
formats.

pl

Teresa Johnson via llvm-dev

Nov 18, 2020, 11:07:12 AM
to David Blaikie, llvm...@lists.llvm.org
On Fri, Nov 13, 2020 at 9:52 AM David Blaikie via llvm-dev <llvm...@lists.llvm.org> wrote:
On Fri, Nov 13, 2020 at 8:35 AM Wenlei He via llvm-dev
<llvm...@lists.llvm.org> wrote:
>
> >  Thinking about it, I wouldn't expect an LTO generated object itself to have a mixture of DWARF32/64, although I guess the 32/64 bit state could be encoded in the IR (I am not familiar enough with it to know if it actually is or not). It might be necessary to find ways to configure LTO to generate DWARF64, possibly via a link-time option.
>
>
>
> I don’t think we need to encode dwarf32/64 in IR as attribute for each module. We’re not going to emit mixed dwarf32/64 for merged LTO module anyways, so allowing each module to express its dwarf setting would only introduce burden for LTO to deal with inconsistency (warning?) among input modules. Having a linker switch to pass the setting from driver to LTO sounds better to me.

Usually the issue there is that existing build systems may be set up
only to pass such flags to the compilations, and not to the link
invocations - like DWARF version, we pass that down through IR, emit
warnings/errors when two IR modules with different DWARF versions are
linked together, and then emit only one (the higher, I believe) DWARF
version out the other end.

We aren't 100% consistent on this "anything you could do without LTO,
you should be able to do with LTO/passing the same flags to the same
actions" kind of strategy (eg: type units and DWARF compression aren't
passed down through IR - if you want those you have to pass them to
the link invocation (via the clang driver) yourself). So it's more "is
there a systemic use of these flags already for the compilation and
would not supporting it there be a pain"? It's probably not for
DWARF64, since we haven't had any flag to support it at the moment
anyway.

Haven't followed this whole discussion closely, but +1 on this. Sounds like a good use of module flags that automatically pick the "max" value on merge. For anything that in a non-LTO build would only need to be passed to the compile step and would not affect the link, we should strive to do the same with LTO (there are some legacy things that are passed through the driver at link time, but we want to avoid new cases).
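
As a rough model of that "max on merge" behavior, here is a Python sketch. It only imitates the idea and is not LLVM's IR linker; `merge_max_flags` is a hypothetical helper, and the "DWARF64" flag name is an assumption used for illustration.

```python
def merge_max_flags(*modules):
    """Merge per-module integer flags, keeping the largest value of each."""
    merged = {}
    for flags in modules:
        for name, value in flags.items():
            merged[name] = max(value, merged.get(name, value))
    return merged


a = {"Dwarf Version": 4, "DWARF64": 0}  # module built as DWARF32
b = {"Dwarf Version": 5, "DWARF64": 1}  # module built as DWARF64
print(merge_max_flags(a, b))
```

Under this model, one DWARF64 module is enough to make the merged LTO module emit DWARF64, with no extra option on the link line.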

Thanks,
Teresa


--
Teresa Johnson | Software Engineer | tejo...@google.com |

Alexander Yermolovich via llvm-dev

Nov 18, 2020, 1:35:09 PM
to Fāng-ruì Sòng, Igor Kudrin, llvm...@lists.llvm.org
My concern with using a section type is that it modifies the ELF format spec and can break various tools that rely on this information. This seems a somewhat heavy-handed approach to solving this problem.

Alternatively, if we do want to go with something more official than just doing it in the linker using the first reloc, why not use sh_info? It seems like it's made for providing extra information for each section type, in this case .debug_*.

With it, we keep the current behavior of using names, which as far as I can tell is the default for identifying debug sections. If the producer provides this extra information, the linker can improve the layout to help with debug section overflows. If the producer doesn't provide it, we get the current behavior, which also adheres to the DWARF spec: if DWARF32 and DWARF64 are mixed, the onus is on the user.

Alex

From: Fāng-ruì Sòng <mas...@google.com>
Sent: Tuesday, November 17, 2020 4:49 PM
To: Igor Kudrin <iku...@accesssoftek.com>
Cc: David Blaikie <dbla...@gmail.com>; Wenlei He <wen...@fb.com>; Alexander Yermolovich <ayer...@fb.com>; llvm...@lists.llvm.org <llvm...@lists.llvm.org>; Robinson, Paul <paul.r...@sony.com>; James Henderson <jh737...@my.bristol.ac.uk>; Eric Christopher <echr...@gmail.com>

Subject: Re: [llvm-dev] [LLD] Support DWARF64, debug_info "sorting"
>in https://reviews.llvm.org/D91404 , and then spreading the assessment

Cary Coutant via llvm-dev

Nov 18, 2020, 3:13:27 PM
to Alexander Yermolovich, llvm...@lists.llvm.org
> My concern with using section type, is that it does modify ELF format spec, and can break various tools that rely on this information. This seems somewhat of a heavy handed approach to solving this problem.
>
> Alternatively, if we do want to go with something more official than just doing it in a linker using first reloc, why not use sh_info? Seems like it's made for providing extra information for each section_type. In this case .debug_*.

No, sh_info is a poor fit for this. Its usage is implied by the
sh_type, and, if used, always contains either a symbol table index or
a section table index. A section flag would be far more appropriate.

-cary

Cary Coutant via llvm-dev

Nov 18, 2020, 3:15:47 PM
to jh737...@my.bristol.ac.uk, llvm...@lists.llvm.org
> Thinking about it, it would probably be wise to raise this discussion on the DWARF mailing list too. They might want to put an addendum in the spec/DWARF wiki/somewhere appropriate along the lines of "ELF support for DWARF64", which can be retroactively applied to existing standards. The committee may also have some thoughts on how tools are expected to work with DWARF64 and DWARF32 mixtures.

Eric, Paul, and I are members of the DWARF committee, so you can be
assured that we will bring this issue up there!

-cary


Cary Coutant via llvm-dev

Nov 18, 2020, 3:21:43 PM
to Igor Kudrin, llvm...@lists.llvm.org
> > If the .debug_str section *by itself* exceeds 4GB, then yes any string
> > with a 32-bit reference to it must be in the first 4GB. Strings that
> > have only 64-bit references to them can be sorted to the end of the
> > section, if necessary. I wouldn't think anyone guarantees or cares
> > about the order of strings within a string section.
> >
> > But I think this would be the very last thing to care about, with regard
> > to DWARF-64 concerns.
>
> I guess that the relative size of the ".debug_str" section may vary and depends on the source code, particular build environment, and lots of other circumstances. I've checked some fresh built samples and always see that the section is usually close in size to ".debug_info" and sometimes even bigger. So, this section must be ordered similarly as all other debugging info sections.

I agree with Paul, and I'm surprised by your findings. Can you post
some actual numbers? In my experience, .debug_str is an order of
magnitude smaller than .debug_info. Perhaps the binaries you're
looking at haven't been string-merged?

-cary

Igor Kudrin via llvm-dev

Nov 19, 2020, 1:00:06 AM
to Cary Coutant, llvm...@lists.llvm.org
On 19.11.2020 03:21, Cary Coutant wrote:
>>> If the .debug_str section *by itself* exceeds 4GB, then yes any string
>>> with a 32-bit reference to it must be in the first 4GB. Strings that
>>> have only 64-bit references to them can be sorted to the end of the
>>> section, if necessary. I wouldn't think anyone guarantees or cares
>>> about the order of strings within a string section.
>>>
>>> But I think this would be the very last thing to care about, with regard
>>> to DWARF-64 concerns.
>>
>> I guess that the relative size of the ".debug_str" section may vary and depends on the source code, particular build environment, and lots of other circumstances. I've checked some fresh built samples and always see that the section is usually close in size to ".debug_info" and sometimes even bigger. So, this section must be ordered similarly as all other debugging info sections.
>
> I agree with Paul, and I'm surprised by your findings. Can you post
> some actual numbers? In my experience, .debug_str is an order of
> magnitude smaller than .debug_info. Perhaps the binaries you're
> looking at haven't been string-merged?

As an example, here are the section sizes (in hex) for a freshly built
Clang in Debug mode on Ubuntu 20.04 using GCC 10.2:

$ readelf -SW clang-12 | grep debug | awk '{print $2, $6}' | column -t
.debug_aranges  00c240
.debug_info     34d1fd
.debug_abbrev   00841f
.debug_line     023093
.debug_str      53685f
.debug_ranges   00c300

As you can see, ".debug_str" is visibly bigger than ".debug_info". Of
course, Clang does not suffer from DWARF32 limits. This is just a
relatively large project, which anyone can easily check by themselves.
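
For readers who prefer decimal, the two hex sizes from the readelf output above convert as follows (a small Python check; the dictionary is only a container for those two numbers):

```python
sizes_hex = {".debug_info": "34d1fd", ".debug_str": "53685f"}

# readelf prints section sizes in hexadecimal.
sizes = {name: int(value, 16) for name, value in sizes_hex.items()}
ratio = sizes[".debug_str"] / sizes[".debug_info"]

for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / 1e6:.2f} MB)")
print(f".debug_str / .debug_info = {ratio:.2f}")
```

That is roughly 3.46 MB of ".debug_info" against 5.47 MB of ".debug_str", a ratio of about 1.58.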

--
Best Regards,
Igor Kudrin
C++ Developer, Access Softek, Inc.

Alexander Yermolovich via llvm-dev

Nov 19, 2020, 1:18:25 PM
to Igor Kudrin, Cary Coutant, llvm...@lists.llvm.org
The debug info section there is still only about 3.4 MB.
Just to add another data point: internally, for a statically linked library where the .debug_info section varies from 3.3GB to 4GB, the .debug_str section varies from 1.8GB to 2.2GB.

I guess the fundamental question is how well .debug_str scales as .debug_info grows from MB to GB.

Alex

From: Igor Kudrin <iku...@accesssoftek.com>
Sent: Wednesday, November 18, 2020 9:59 PM
To: Cary Coutant <ccou...@gmail.com>
Cc: Robinson, Paul <paul.r...@sony.com>; Fāng-ruì Sòng <mas...@google.com>; Alexander Yermolovich <ayer...@fb.com>; llvm...@lists.llvm.org <llvm...@lists.llvm.org>

Subject: Re: [llvm-dev] [LLD] Support DWARF64, debug_info "sorting"