_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
No, I haven't. Thank you for the pointer. It looks like the problem of the inverted edges was discussed there. But I guess my bigger question is this: why do we still create one big .eh_frame even when -ffunction-sections is given? When that option is given, Clang creates .text, .rela.text and .gcc_except_table sections for each function, but it still creates a monolithic .eh_frame that covers all function sections, which seems odd to me.
The .eh_frame section (which is basically a DWARF .debug_frame section) was not designed with deduplication/gc in mind. I haven't studied it closely, but it looks like the bulk of it is frame descriptions which are divided up basically per-function, with some common overhead factored out. If you want to put each per-function part into its own ELF section, there's overhead for that which you are more aware of than I am, and then either you need to replicate the common part into each per-function section or accept a relocation from each per-function section into the separate common section.
Looking at my latest clang build in Ubuntu, the executable has 96320 frame descriptions of which all but one use the same common part; in this case, that common part is 24 bytes. The size is not fixed, but is guaranteed to be a multiple of the target address size, and it probably can't be any smaller than 24 on a normal machine. This might help give you some estimates about the size effect of different choices.
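The "common part" here is the CIE and the per-function parts are the FDEs. To illustrate why the section can be split into records cheaply, here is a sketch in Python; it assumes the common 32-bit DWARF encoding (4-byte length, then a field that is 0 for a CIE) and ignores 64-bit lengths and augmentation data.

```python
import struct

def split_eh_frame(data):
    """Split a raw .eh_frame blob into (offset, kind, body) tuples.

    Assumes the 32-bit encoding: each record starts with a 4-byte length
    (0 terminates the section), followed by a 4-byte field that is 0 for
    a CIE and a backwards CIE offset for an FDE.
    """
    records = []
    off = 0
    while off + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, off)
        if length == 0:  # terminator record
            break
        body = data[off + 4 : off + 4 + length]
        (cie_id,) = struct.unpack_from("<I", body, 0)
        kind = "CIE" if cie_id == 0 else "FDE"
        records.append((off, kind, body))
        off += 4 + length
    return records
```

This is only record framing, not interpretation; counting FDEs per CIE (as in the clang-binary measurement above) needs nothing deeper than this.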
HTH,
--paulr
Hi,
There will be problems with .eh_frame_hdr. .eh_frame_hdr is needed to use binary search instead of linear search. Having one .eh_frame per function will result in either no .eh_frame_hdr or multiple ones, and will degrade the search from binary to linear.
Since we create .eh_frame_hdr in most cases, there is no problem filtering out garbage .eh_frame sections. If there is information about unused symbols, the implementation is very simple. BTW, there is no need to fully decode .eh_frame records to remove garbage.
Paul is right that there will be code-size overhead. .eh_frame is usually created per compilation unit, with the common information in the CIE. Multiple .eh_frame sections will cause a lot of redundant CIEs. There may be cases where the total size of the redundant CIEs is greater than the total size of the removed garbage.
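The binary search in question runs over the table at the end of .eh_frame_hdr: conceptually a list of (initial PC, FDE pointer) pairs sorted by PC. A minimal sketch of the lookup (the entries here are plain pairs; real entries are encoded per the .eh_frame_hdr format):

```python
import bisect

def find_fde(search_table, pc):
    """Binary-search an .eh_frame_hdr-style table.

    search_table is a list of (initial_pc, fde_offset) pairs sorted by
    initial_pc. Returns the fde_offset of the last entry whose
    initial_pc <= pc, or None if pc is below every entry.
    """
    pcs = [entry[0] for entry in search_table]
    i = bisect.bisect_right(pcs, pc) - 1
    if i < 0:
        return None
    return search_table[i][1]
```

With one table this is O(log n); falling back to scanning many per-function .eh_frame sections is the linear search being warned about.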
Thanks,
Evgeny Astigeevich
The Arm Compiler Optimization team
Note that at least on MIPS you pretty much have to do that anyway to convert absolute addresses into PC-relative references due to the f**ked up intra-section constraints.
Joerg
The section is created by the linker; it doesn't matter from an input perspective.
Joerg
Hi Rui,
It is my fault; I misread your RFC. Now I see it is about doing this in the compiler.
Yes, a linker does all the needed magic. It combines all .eh_frame sections, removes garbage and creates .eh_frame_hdr.
And yes, your proposal would simplify garbage collection. The main advantage is that you would not need to parse .eh_frame sections.
Thanks,
Evgeny
It sounds like the linker has to be aware of the .eh_frame section details to be able to generate .eh_frame_hdr and eliminate duplicate CIEs, right?
So, is there any difference whether it knows that in one place or two?
Hi Igor,
> It sounds like the linker has to be aware of the .eh_frame section details to be able to generate .eh_frame_hdr and eliminate duplicate CIEs, right?
Yes, a linker needs some details but not all of them. It needs to know the sizes of the records and the initial locations (PC Begin) to find out which functions the FDEs belong to.
> So, is there any difference whether it knows that in one place or two?
What do you mean by “one place or two”? If .eh_frame_hdr is not created, a linker does not need to parse .eh_frame sections; it simply merges them into one section. The format of .eh_frame allows this to be done without parsing the sections.
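The reason merging needs no parsing is that an FDE locates its CIE via a self-relative backwards offset, so each whole input section stays internally consistent when concatenated. A minimal sketch of that merge, under the assumption of the 32-bit length encoding (and skipping the CIE deduplication a real linker would also do):

```python
import struct

def merge_eh_frames(sections):
    """Concatenate input .eh_frame sections without parsing their records.

    Each input's trailing 4-byte zero terminator (if present) is stripped,
    and a single terminator is emitted at the end of the merged section.
    """
    out = bytearray()
    for sec in sections:
        if len(sec) >= 4 and struct.unpack_from("<I", sec, len(sec) - 4)[0] == 0:
            sec = sec[:-4]  # drop the input's terminator record
        out += sec
    out += struct.pack("<I", 0)  # one terminator for the merged section
    return bytes(out)
```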
Thanks,
Evgeny Astigeevich
From: Igor Kudrin <ikudr...@gmail.com>
Date: Thursday, 9 November 2017 at 11:29
To: Rui Ueyama <ru...@google.com>, Evgeny Astigeevich <Evgeny.As...@arm.com>
Cc: "llvm...@lists.llvm.org" <llvm...@lists.llvm.org>, nd <n...@arm.com>
Subject: Re: [llvm-dev] [RFC] Making .eh_frame more linker-friendly
Hi Evgeny,
> Yes, a linker needs some details but not all of them. It needs to know sizes of records and initial locations (PC Begin) to find out which functions FDEs belong to.
> What do you mean “one place or two”?
If I understand it right, the RFC is about helping a linker to eliminate unneeded .eh_frame items when performing GC. But if we still need to deal with CIEs and generate .eh_frame_hdr in a special way, does it make sense to make this change to simplify only a small part of a linker?
> But if we still need to deal with CIEs and generate .eh_frame_hdr in a special way,
> does it make sense to make this change to simplify only a small part of a linker?
For huge C++ projects this could improve link time if GC is a bottleneck. It would also improve .eh_frame_hdr build time because you don't spend time parsing garbage. However, a linker would have to maintain two versions of GC, one that parses .eh_frame and one that does not, since there can be input object files where .eh_frame is not split.
-Evgeny
From: Igor Kudrin <iku...@accesssoftek.com>
Date: Friday, 10 November 2017 at 12:23
To: Evgeny Astigeevich <Evgeny.As...@arm.com>
On the other hand, in this case, you have to deal with lots of small sections. I can't say for sure which variant is quicker. Anyway, lld doesn't do deep parsing of .eh_frame sections currently, nor does it need to for GC. It just splits them into FDEs and CIEs and then analyzes the corresponding relocations. Thus, the amount of required work is more or less the same, whether we have separate sections or have to split the monolithic one.
I tried to investigate how to implement this and found that Rafael apparently already tried to do the same in 2015: https://marc.info/?l=llvm-commits&m=144683596826489.
Based on the comments, the approach worked but caused slowdowns for bfd and crashed gold (or made it slower too). I can try to investigate this again and either reimplement or resurrect that code to check how multiple .eh_frame sections behave nowadays, but a few things are unclear to me.
Rafael, you wrote in the description that you would not land that patch at the time; do you also think we should try this again now? (Since we have a production-ready LLD now.)
(Assuming we want to try this again:)
What are we going to do about possible slowdowns of the GNU linkers? I guess an option could be introduced to keep the old behavior for them, but then I am also not sure how LLD should handle two different kinds of inputs (with single/multiple .eh_frame sections).
Any thoughts on what we should try/check first? (I guess performance numbers for bfd/gold/LLD are the main point of interest, plus the size of the output; anything else?)
George.
Clarification: not exactly the same, but something close. At least there are multiple .eh_frame sections in the output.
As far as I understand, what we need for this experiment is to emit, for each text section, its own .eh_frame, and set the SHF_LINK_ORDER flag and the sh_link field of the latter pointing to the .text section. It looks like the code from the patch above can be adapted to do that.
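The payoff of sh_link + SHF_LINK_ORDER is that GC of the metadata becomes a lookup instead of a parse: a section carrying the flag lives or dies with the section its sh_link names. A toy model (the section records and names here are hypothetical, not LLD's data structures):

```python
SHF_LINK_ORDER = 0x80  # ELF section flag

def gc_metadata(sections, live_text):
    """Decide which sections survive GC.

    sections maps name -> (flags, linked_section_name); live_text is the
    set of text sections the linker's reachability analysis kept. A
    SHF_LINK_ORDER section is kept iff its linked section is kept.
    """
    kept = set(live_text)
    for name, (flags, linked) in sections.items():
        if flags & SHF_LINK_ORDER:
            if linked in live_text:
                kept.add(name)
        else:
            kept.add(name)  # non-metadata sections: decided elsewhere
    return kept
```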
>Thank you for taking a look. I think that the answer depends on how much slower GNU linkers are with separate .eh_frame sections. If it is not too slow, it may make sense to generate split .eh_frame sections unconditionally. Otherwise, we might want to add a new option so that clang doesn't produce split .eh_frame sections by default.
I'll start investigating the implementation and try to update/reimplement and check all that. (This can probably take some time, because I want to investigate the llvm/mc code a little, as it is new to me.)
Just finished a lib/MC patch that creates an .eh_frame section for each text section and sets the SHF_LINK_ORDER flag and the proper sh_link field.
The patch is based on Rafael's code mentioned earlier in this thread. I uploaded it in case anyone is interested in looking at the current version; it is here: https://reviews.llvm.org/D40352.
Now going to start testing how it works with the GNU and LLD linkers.
George.
>In that way only x64 target is affected and SHT_X86_64_UNWIND sections are known to be .eh_frame so can be
George.
>What we gonna do next ? My plan is to prepare demo patch for LLD to stop parsing .eh_frames
>
>George.
A demo patch for LLD that stops parsing .eh_frame during the GC stage is: https://reviews.llvm.org/D40484. With it, the LLD GC code is slightly simpler.
I tested it both together with the lib/mc patch for emitting multiple .eh_frame sections (https://reviews.llvm.org/D40352) and alone; it looks like it has no visible effect on performance by itself.
George.
I've committed a patch in gold that should fix this problem:
https://sourceware.org/ml/binutils/2017-11/msg00541.html
Can you try gold again with this patch applied? You should at least
get a little further.
If it still doesn't work, could I trouble you for a sample object file?
-cary
I'll try it soon and return with results, thanks!
Just a small clarification: your commit message says that "LLVM is experimenting with placing .eh_frame sections in the COMDAT group". That is true for the original experiment which hit this gold issue (https://marc.info/?l=llvm-commits&m=144683596826489); the current approach I am experimenting with (https://reviews.llvm.org/D40352) sets the sh_link of .eh_frame to the corresponding .text section instead.
Either way, the issue seems to be that gold did not like having multiple .eh_frame sections, which is what both approaches ran into, and your patch seems to fix that.
George.
I can confirm your patch fixes gold behavior.
I built the latest binutils and performed the benchmark tests again today.
The following versions of linkers were used:
* GNU ld (GNU Binutils) 2.29.51.20171129
* GNU gold (GNU Binutils 2.29.51.20171129) 1.14
* LLD 6.0.0 (trunk 319302) (compatible with GNU linkers)
--no-threads was set for gold and LLD tests.
Clang link time with single .eh_frame section in objects.
* ld.bfd: 2.940055212 seconds time elapsed ( +- 0.17% ) (t1)
* ld.gold: 0.994370076 seconds time elapsed ( +- 0.11% ) (t2)
* LLD: 0.445566042 seconds time elapsed ( +- 0.32% ) (t3)
Clang link time with multiple .eh_frame sections in objects.
(latest diff 3 of https://reviews.llvm.org/D40352 applied to produce them).
* ld.bfd: 3.792698701 seconds time elapsed ( +- 0.19% ) (1.29*t1)
* ld.gold: 1.135187654 seconds time elapsed ( +- 0.10% ) (1.1416*t2)
* LLD: 0.506076638 seconds time elapsed ( +- 0.31% ) (1.1358*t3)
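The slowdown factors quoted in parentheses can be re-derived from the raw timings:

```python
# Elapsed link times (seconds) from the benchmark above.
single = {"bfd": 2.940055212, "gold": 0.994370076, "lld": 0.445566042}
multi = {"bfd": 3.792698701, "gold": 1.135187654, "lld": 0.506076638}

# Multi-.eh_frame time relative to single-.eh_frame time, per linker.
slowdown = {k: multi[k] / single[k] for k in single}
```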
I am very interested in reviving this.
Did anyone get any further with these ideas?
@Grimar: Did you do any profiling of the code? Were the slowdowns
you were seeing fundamental (i.e. due to IO) or could a more optimal
implementation reduce the slowdown? Did you do any end to end
timings for compilation + link time?
The same issues arise for all metadata sections:
.eh_frame
.debug_*
.stack_sizes
etc...
In our proprietary linker we've had to implement special handling
for each of these sections, which we'd love to be able to remove or
reduce.
One fundamental problem is overhead. I posted about
this on the gabi list:
https://groups.google.com/d/msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
Take the .stack_sizes section. There is an LLVM bug which suggests that we should produce multiple .stack_sizes sections rather than one monolithic .stack_sizes section: https://bugs.llvm.org/show_bug.cgi?id=36717.
However, for .stack_sizes the payload is on average 10 bytes per function. Splitting into multiple sections on ELF x86-64 adds an overhead of 128 bytes per function, an order of magnitude increase. I don't know of any way to avoid this increase without adding special-case code to the linker.
Another thought is that although the gnu linkers are a concern
upstream, on our platform (and others where we are fully in control),
we could use this approach for .eh_frame. We would be able to test
and maintain the separate code paths in the backend.
>@Grimar: Did you do any profiling of the code? Were the slowdowns
>you were seeing fundamental (i.e. due to IO) or could a more optimal
>implementation reduce the slowdown? Did you do any end to end
>timings for compilation + link time?
George.