_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
No, I haven't. Thank you for the pointer. It looks like the problem of the inverted edges was discussed there. But I guess my bigger question is this: why do we still create one big .eh_frame even when -ffunction-sections is given? When that option is given, Clang creates .text, .rela.text and .gcc_except_table sections for each function, but it still creates a monolithic .eh_frame that covers all function sections, which seems odd to me.
The .eh_frame section (which is basically a DWARF .debug_frame section) was not designed with deduplication/gc in mind. I haven't studied it closely, but it looks like the bulk of it is frame descriptions which are divided up basically per-function, with some common overhead factored out. If you want to put each per-function part into its own ELF section, there's overhead for that which you are more aware of than I am, and then either you need to replicate the common part into each per-function section or accept a relocation from each per-function section into the separate common section.
Looking at my latest clang build in Ubuntu, the executable has 96320 frame descriptions of which all but one use the same common part; in this case, that common part is 24 bytes. The size is not fixed, but is guaranteed to be a multiple of the target address size, and it probably can't be any smaller than 24 on a normal machine. This might help give you some estimates about the size effect of different choices.
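The "common part" here is the CIE and the per-function parts are the FDEs. To illustrate why the section can be split into records cheaply, here is a sketch in Python; it assumes the common 32-bit DWARF encoding (4-byte length, then a field that is 0 for a CIE) and ignores 64-bit lengths and augmentation data.

```python
import struct

def split_eh_frame(data):
    """Split a raw .eh_frame blob into (offset, kind, body) tuples.

    Assumes the 32-bit encoding: each record starts with a 4-byte length
    (0 terminates the section), followed by a 4-byte field that is 0 for
    a CIE and a backwards CIE offset for an FDE.
    """
    records = []
    off = 0
    while off + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, off)
        if length == 0:  # terminator record
            break
        body = data[off + 4 : off + 4 + length]
        (cie_id,) = struct.unpack_from("<I", body, 0)
        kind = "CIE" if cie_id == 0 else "FDE"
        records.append((off, kind, body))
        off += 4 + length
    return records
```

This is only record framing, not interpretation; counting FDEs per CIE (as in the clang-binary measurement above) needs nothing deeper than this.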
HTH,
--paulr
Hi,
There will be problems with .eh_frame_hdr. .eh_frame_hdr is needed to use binary search instead of linear search. Having one .eh_frame per function will result in either no .eh_frame_hdr or multiple ones, and will degrade the search from binary to linear.
Since we create .eh_frame_hdr in most cases, there is no problem filtering out garbage .eh_frame sections. If there is information about unused symbols, the implementation is very simple. BTW, there is no need to fully decode .eh_frame records to remove garbage.
Paul is right that there will be code-size overhead. .eh_frame is usually created per compilation unit, with the common information in the CIE. Multiple .eh_frame sections will cause a lot of redundant CIEs. There may be cases where the total size of the redundant CIEs is greater than the total size of the removed garbage.
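The binary search in question runs over the table at the end of .eh_frame_hdr: conceptually a list of (initial PC, FDE pointer) pairs sorted by PC. A minimal sketch of the lookup (the entries here are plain pairs; real entries are encoded per the .eh_frame_hdr format):

```python
import bisect

def find_fde(search_table, pc):
    """Binary-search an .eh_frame_hdr-style table.

    search_table is a list of (initial_pc, fde_offset) pairs sorted by
    initial_pc. Returns the fde_offset of the last entry whose
    initial_pc <= pc, or None if pc is below every entry.
    """
    pcs = [entry[0] for entry in search_table]
    i = bisect.bisect_right(pcs, pc) - 1
    if i < 0:
        return None
    return search_table[i][1]
```

With one table this is O(log n); falling back to scanning many per-function .eh_frame sections is the linear search being warned about.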
Thanks,
Evgeny Astigeevich
The Arm Compiler Optimization team
Note that at least on MIPS you pretty much have to do that anyway to convert absolute addresses into PC-relative references due to the f**ked up intra-section constraints.
Joerg
The section is created by the linker; it doesn't matter from an input perspective.
Joerg
Hi Rui,
It is my fault; I misread your RFC. Now I see it is about doing this in the compiler.
Yes, a linker does all the needed magic. It combines all .eh_frame sections, removes garbage and creates .eh_frame_hdr.
And yes, your proposal would simplify garbage collection. The main advantage is that you would not need to parse .eh_frame sections.
Thanks,
Evgeny
It sounds like the linker has to be aware of the .eh_frame section details to be able to generate .eh_frame_hdr and eliminate duplicate CIEs, right?
So, is there any difference whether it knows that in one place or two?
Hi Igor,
> It sounds like the linker has to be aware of the .eh_frame section details to be able to generate .eh_frame_hdr and eliminate duplicate CIEs, right?
Yes, a linker needs some details but not all of them. It needs to know the sizes of the records and the initial locations (PC Begin) to find out which functions the FDEs belong to.
> So, is there any difference whether it knows that in one place or two?
What do you mean by “one place or two”? If .eh_frame_hdr is not created, a linker does not need to parse .eh_frame sections; it simply merges them into one section. The format of .eh_frame allows this to be done without parsing the sections.
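The reason merging needs no parsing is that an FDE locates its CIE via a self-relative backwards offset, so each whole input section stays internally consistent when concatenated. A minimal sketch of that merge, under the assumption of the 32-bit length encoding (and skipping the CIE deduplication a real linker would also do):

```python
import struct

def merge_eh_frames(sections):
    """Concatenate input .eh_frame sections without parsing their records.

    Each input's trailing 4-byte zero terminator (if present) is stripped,
    and a single terminator is emitted at the end of the merged section.
    """
    out = bytearray()
    for sec in sections:
        if len(sec) >= 4 and struct.unpack_from("<I", sec, len(sec) - 4)[0] == 0:
            sec = sec[:-4]  # drop the input's terminator record
        out += sec
    out += struct.pack("<I", 0)  # one terminator for the merged section
    return bytes(out)
```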
Thanks,
Evgeny Astigeevich
From: Igor Kudrin <ikudr...@gmail.com>
Date: Thursday, 9 November 2017 at 11:29
To: Rui Ueyama <ru...@google.com>, Evgeny Astigeevich <Evgeny.As...@arm.com>
Cc: "llvm...@lists.llvm.org" <llvm...@lists.llvm.org>, nd <n...@arm.com>
Subject: Re: [llvm-dev] [RFC] Making .eh_frame more linker-friendly
Hi Evgeny,
> Yes, a linker needs some details but not all of them. It needs to know sizes of records and initial locations (PC Begin) to find out which functions FDEs belong to.
> What do you mean “one place or two”?
If I understand it right, the RFC is about helping a linker to eliminate unneeded .eh_frame items when performing GC. But if we still need to deal with CIEs and generate .eh_frame_hdr in a special way, does it make sense to make this change to simplify only a small part of a linker?
> But if we still need to deal with CIEs and generate .eh_frame_hdr in a special way,
> does it make sense to make this change to simplify only a small part of a linker?
For huge C++ projects this could improve link time if GC is a bottleneck. It would also improve .eh_frame_hdr build time because you don't spend time parsing garbage. However, a linker would have to maintain two versions of GC, one that parses .eh_frame and one that does not, since there can be input object files where .eh_frame is not split.
-Evgeny
From: Igor Kudrin <iku...@accesssoftek.com>
Date: Friday, 10 November 2017 at 12:23
To: Evgeny Astigeevich <Evgeny.As...@arm.com>
On the other hand, in this case, you have to deal with lots of small sections. I can't say for sure which variant is quicker. Anyway, lld doesn't do deep parsing of .eh_frame sections currently, nor does it need to for GC. It just splits them into FDEs and CIEs and then analyzes the corresponding relocations. Thus, the amount of required work is more or less the same, whether we have separate sections or have to split the monolithic one.
I tried to investigate how to implement this and found that Rafael apparently already tried to do the same in 2015: https://marc.info/?l=llvm-commits&m=144683596826489.
Based on the comments, the approach worked but caused slowdowns for bfd and crashed gold (or made it slower too). I can try to investigate this again and either reimplement or resurrect that code to check how multiple .eh_frame sections behave nowadays, but a few things are unclear to me.
Rafael, you wrote in the description that you would not land that patch at the time; do you also think we should try this again now? (Since we have a production-ready LLD now.)
(Assuming we want to try this again:)
What are we going to do about possible slowdowns of the GNU linkers? I guess an option could be introduced to keep the old behavior for them, but then I am also not sure how LLD should handle two different kinds of inputs (with single/multiple .eh_frame sections).
Any thoughts on what we should try/check first? (I guess performance numbers for bfd/gold/LLD are the main point of interest, plus the size of the output; anything else?)
George.
Clarification: not exactly the same, but something close. At least there are multiple .eh_frame sections in the output.
As far as I understand, what we need for this experiment is to emit, for each text section, its own .eh_frame, and set the SHF_LINK_ORDER flag and the sh_link field of the latter pointing to the .text section. It looks like the code from the patch above can be adapted to do that.
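The payoff of sh_link + SHF_LINK_ORDER is that GC of the metadata becomes a lookup instead of a parse: a section carrying the flag lives or dies with the section its sh_link names. A toy model (the section records and names here are hypothetical, not LLD's data structures):

```python
SHF_LINK_ORDER = 0x80  # ELF section flag

def gc_metadata(sections, live_text):
    """Decide which sections survive GC.

    sections maps name -> (flags, linked_section_name); live_text is the
    set of text sections the linker's reachability analysis kept. A
    SHF_LINK_ORDER section is kept iff its linked section is kept.
    """
    kept = set(live_text)
    for name, (flags, linked) in sections.items():
        if flags & SHF_LINK_ORDER:
            if linked in live_text:
                kept.add(name)
        else:
            kept.add(name)  # non-metadata sections: decided elsewhere
    return kept
```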
>Thank you for taking a look. I think that the answer depends on how much slower GNU linkers are with separate .eh_frame sections. If it is not too slow, it may make sense to generate split .eh_frame sections unconditionally. Otherwise, we might want to add a new option so that clang doesn't produce split .eh_frame sections by default.
I'll start investigating the implementation and try to update/reimplement and check all that. (This can probably take some time, because I want to investigate the llvm/mc code a little, as it is new to me.)
Just finished a lib/MC patch that creates an .eh_frame section for each text section and sets the SHF_LINK_ORDER flag and the proper sh_link field.
The patch is based on Rafael's code mentioned earlier in this thread. I uploaded it in case anyone is interested in looking at the current version; it is here: https://reviews.llvm.org/D40352.
Now going to start testing how it works with the GNU and LLD linkers.
George.
>In that way only x64 target is affected and SHT_X86_64_UNWIND sections are known to be .eh_frame so can be
George.
>What we gonna do next ? My plan is to prepare demo patch for LLD to stop parsing .eh_frames
>
>George.
A demo patch for LLD that stops parsing .eh_frame during the GC stage is: https://reviews.llvm.org/D40484. With it, the LLD GC code is slightly simpler.
I tested it both together with the lib/mc patch for emitting multiple .eh_frame sections (https://reviews.llvm.org/D40352) and alone; it looks like it has no visible effect on performance by itself.
George.
I've committed a patch in gold that should fix this problem:
https://sourceware.org/ml/binutils/2017-11/msg00541.html
Can you try gold again with this patch applied? You should at least
get a little further.
If it still doesn't work, could I trouble you for a sample object file?
-cary
I'll try it soon and return with results, thanks!
Just a small clarification: your commit message says that "LLVM is experimenting with placing .eh_frame sections in the COMDAT group". That is true for the original experiment which hit this gold issue (https://marc.info/?l=llvm-commits&m=144683596826489); the current approach I am experimenting with (https://reviews.llvm.org/D40352) sets the sh_link of .eh_frame to the corresponding .text section instead.
Either way, the issue seems to be that gold did not like having multiple .eh_frame sections, which is what both approaches ran into, and your patch seems to fix that.
George.
I can confirm your patch fixes gold behavior.
I built the latest binutils and performed the benchmark tests again today.
The following versions of linkers were used:
* GNU ld (GNU Binutils) 2.29.51.20171129
* GNU gold (GNU Binutils 2.29.51.20171129) 1.14
* LLD 6.0.0 (trunk 319302) (compatible with GNU linkers)
--no-threads was set for gold and LLD tests.
Clang link time with single .eh_frame section in objects.
* ld.bfd: 2.940055212 seconds time elapsed ( +- 0.17% ) (t1)
* ld.gold: 0.994370076 seconds time elapsed ( +- 0.11% ) (t2)
* LLD: 0.445566042 seconds time elapsed ( +- 0.32% ) (t3)
Clang link time with multiple .eh_frame sections in objects.
(latest diff 3 of https://reviews.llvm.org/D40352 applied to produce them).
* ld.bfd: 3.792698701 seconds time elapsed ( +- 0.19% ) (1.29*t1)
* ld.gold: 1.135187654 seconds time elapsed ( +- 0.10% ) (1.1416*t2)
* LLD: 0.506076638 seconds time elapsed ( +- 0.31% ) (1.1358*t3)
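The slowdown factors quoted in parentheses can be re-derived from the raw timings:

```python
# Elapsed link times (seconds) from the benchmark above.
single = {"bfd": 2.940055212, "gold": 0.994370076, "lld": 0.445566042}
multi = {"bfd": 3.792698701, "gold": 1.135187654, "lld": 0.506076638}

# Multi-.eh_frame time relative to single-.eh_frame time, per linker.
slowdown = {k: multi[k] / single[k] for k in single}
```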
I am very interested in reviving this.
Did anyone get any further with these ideas?
@Grimar: Did you do any profiling of the code? Were the slowdowns
you were seeing fundamental (i.e. due to IO) or could a more optimal
implementation reduce the slowdown? Did you do any end to end
timings for compilation + link time?
The same issues arise for all metadata sections:
.eh_frame
.debug_*
.stack_sizes
etc...
In our proprietary linker we've had to implement special handling
for each of these sections, which we'd love to be able to remove or
reduce.
One fundamental problem is overhead. I posted about
this on the gabi list:
https://groups.google.com/d/msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
Take the .stack_sizes section. There is an LLVM bug which suggests that we should produce multiple .stack_sizes sections rather than one monolithic .stack_sizes section: https://bugs.llvm.org/show_bug.cgi?id=36717.
However, for .stack_sizes the payload is on average 10 bytes per function. Splitting into multiple sections on ELF x86-64 adds an overhead of 128 bytes per function, an order of magnitude increase. I don't know of any way to avoid this increase without adding special-case code to the linker.
Another thought is that although the gnu linkers are a concern
upstream, on our platform (and others where we are fully in control),
we could use this approach for .eh_frame. We would be able to test
and maintain the separate code paths in the backend.
>@Grimar: Did you do any profiling of the code? Were the slowdowns
>you were seeing fundamental (i.e. due to IO) or could a more optimal
>implementation reduce the slowdown? Did you do any end to end
>timings for compilation + link time?
George.