[llvm-dev] Make LLD output COFF relocatable object file (like ELF's -r does). How much work is required to implement this?


kyra via llvm-dev

Oct 9, 2017, 11:02:50 AM
to llvm...@lists.llvm.org
Hi,

How far are we from having '-r' in the LLD COFF linker?
I'd try to implement this if not too much effort is required.
Any suggestions and/or pointers?

Cheers,
Kyra


_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Rui Ueyama via llvm-dev

Oct 9, 2017, 7:40:05 PM
to kyra, llvm-dev
As far as I know, no one has ever tried to add the -r option to the lld COFF linker. It shouldn't be super hard to add it to the COFF linker, but from our experience implementing it in the lld ELF linker, I can say that it was tricky and somewhat fragile. We had to add a number of small pieces of code here and there.

We wanted to support it in the ELF linker because that's an existing feature and people are actually using it. Otherwise, we wouldn't have added it. So, what is the motivation for adding the feature to the COFF linker? I don't think the MSVC linker supports it.

(For those who are not familiar with -r, the option makes the linker emit a .o file instead of an executable or a shared library. With the option, you can combine multiple object files into one object file.)
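Here is a toy Python model of what a relocatable (`-r`) link does. It is a sketch under stated assumptions, not how lld or GNU ld is implemented. `ObjectFile`, `Symbol`, `Reloc` and `relocatable_link` are hypothetical stand-ins for real COFF/ELF structures, and section alignment, COMDATs and undefined symbols are ignored. The model concatenates same-named sections and shifts symbol values and relocation offsets by where each input's section landed. Relocations are carried through unresolved, so the output is still an object file rather than an executable.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    section: str  # section the symbol is defined in
    value: int    # offset of the symbol within that section


@dataclass
class Reloc:
    section: str  # section containing the field to patch
    offset: int   # offset of that field within the section
    target: str   # name of the symbol the field refers to


@dataclass
class ObjectFile:
    sections: dict[str, bytes] = field(default_factory=dict)
    symbols: list[Symbol] = field(default_factory=list)
    relocs: list[Reloc] = field(default_factory=list)


def relocatable_link(inputs: list[ObjectFile]) -> ObjectFile:
    """Toy `-r`: merge objects into one object without resolving anything.

    Assumes every symbol is defined in one of its own object's sections and
    ignores alignment, COMDATs and undefined symbols.
    """
    out = ObjectFile()
    for obj in inputs:
        # Where each of this object's sections starts in the merged output.
        base = {name: len(out.sections.get(name, b"")) for name in obj.sections}
        for name, data in obj.sections.items():
            out.sections[name] = out.sections.get(name, b"") + data
        # Symbols and relocations are kept, only rebased; none are resolved.
        for s in obj.symbols:
            out.symbols.append(Symbol(s.name, s.section, s.value + base[s.section]))
        for r in obj.relocs:
            out.relocs.append(Reloc(r.section, r.offset + base[r.section], r.target))
    return out


# Usage: two objects whose code refers to each other's functions.
a = ObjectFile({".text": b"\x90" * 8}, [Symbol("f", ".text", 0)], [Reloc(".text", 4, "g")])
b = ObjectFile({".text": b"\x90" * 8}, [Symbol("g", ".text", 0)], [Reloc(".text", 1, "f")])
ab = relocatable_link([a, b])
print(len(ab.sections[".text"]), len(ab.symbols), len(ab.relocs))  # 16 2 2
```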

kyra via llvm-dev

Oct 10, 2017, 4:37:07 AM
to Rui Ueyama, llvm-dev
TL;DR:
I'm trying to evaluate whether LLD can be used with GHC (Glasgow Haskell
Compiler) on Windows.

Haskell binary code is usually deployed in "packages". A package
typically provides static library(ies) and, optionally, shared
library(ies) and/or a prelinked ('ld -r') object file. The latter is the
best way to satisfy the GHC runtime linker, since it requires no separate
compile/link pass (as a shared library does), and it is much faster for
the GHC runtime linker to consume than a static library.

Long story:

To avoid linking unused code, GHC has always supported splitting the
intermediate assembly, which makes compilation horribly slow. GHC now
also supports a direct analogue of '-ffunction-sections'
('-split-sections' in GHC parlance), which dramatically improves compile
times, but the BFD linker is horribly slow on files with a *lot* of
sections. In the *nix world there is the gold linker; in the Windows
world we have nothing other than GNU BFD ld ATM.

GHC on Windows uses MinGW tools, and LLD doesn't fit into the MinGW
ecosystem yet (I know that some support has crept into LLD recently, but
it is still far from complete). Moreover, when assembling GHC native
codegen output, the GNU assembler produces peculiar non-standard COFF
files (with 0x11 relocations), and finally, binutils doesn't (and
probably never will) support the bigobj extension in the 32-bit case.

Windows GHC relies heavily on GCC (in particular, its runtime system's
code is full of GNU-isms), but Clang has a unique ability to combine a
GNU-ish frontend with an MS-ish backend. I've experimented a bit and have
concluded that replacing GCC with Clang as the C compiler/system
assembler in GHC on Windows is very much doable.

GHC uses object file combining ('ld -r') when C stub/wrapper generation
is triggered: these stubs/wrappers are compiled with gcc and linked back
into the 'main' object file. In the MS world this use case can easily be
satisfied by packing the object files into a library, since the MS linker
looks into libraries both when linking a final exe/dll *and/or* when
creating another library (i.e. when creating another library it unpacks
all object files from all libraries it is fed with and repacks them into
the output library; llvm-lib doesn't support this ATM, and AFAIR LLVM
developers are aware of this).
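The librarian behaviour described above amounts to flattening the inputs. The following is a minimal Python sketch. It assumes a library can be modelled as an ordered list of (member name, contents) pairs. `build_library` is a hypothetical name. Real tools also rebuild the archive's symbol index, which this sketch omits.

```python
from __future__ import annotations

Member = tuple[str, bytes]  # (member name, object file contents)
Library = list[Member]      # a library modelled as its ordered members


def build_library(inputs: list[Member | Library]) -> Library:
    """Flatten inputs the way the MS librarian is described as doing.

    Plain objects are added as-is; every member of an input library is
    unpacked and repacked into the output. The symbol index is not modelled.
    """
    out: Library = []
    for item in inputs:
        if isinstance(item, list):  # an input library: take all its members
            out.extend(item)
        else:                       # a single object file
            out.append(item)
    return out


# Usage: fold gcc-compiled stubs (already in a library) together with the
# 'main' object, instead of combining them with 'ld -r'.
stubs = [("stub1.o", b"S1"), ("stub2.o", b"S2")]
combined = build_library([("main.o", b"M"), stubs])
print([name for name, _ in combined])  # ['main.o', 'stub1.o', 'stub2.o']
```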

But my question is motivated by another important use case: when
packaging compiled Haskell code, it is very desirable to provide not only
a static library but also to partially link this library's object
modules into a single big object file, which can then be consumed by the
GHC runtime linker. The GHC runtime linker can link binary code in any
form, but linking a static library is much slower than linking a single
object file.


Rui Ueyama via llvm-dev

Oct 10, 2017, 2:00:57 PM
to kyra, llvm-dev
Thank you for your detailed explanation!

On Tue, Oct 10, 2017 at 1:36 AM, kyra <ky...@mail.ru> wrote:
> Haskell binary code is usually deployed in "packages". A package
> typically provides static library(ies) and optionally – shared
> library(ies) and/or prelinked ('ld -r') object file. The latter is the
> best way to satisfy GHC runtime linker, since it requires no separate
> compile/link pass (as shared library requires), and is much faster to
> consume by GHC runtime linker than a static library.

I'm not sure if I understand correctly. If my understanding is correct, you are saying that GHC can link either .o or .so at runtime, which sounds a bit odd because .o is not designed for dynamic linking. Am I missing something?

I also do not understand why only static libraries need "compile/link pass" -- they at least don't need a compile pass, as they contain compiled .o files, and they indeed need a link pass, but that's also true for a single big .o file generated by -r, no? After all, in order to link against a .a file, I think you need to pull out a .o file from a .a and do whatever you need to do to link a single big .o file.
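Pulling members out of an archive is mechanical. Both `.a` files and COFF `.lib` files use the Unix `ar` layout: an `!<arch>\n` signature, then 60-byte member headers, with member data padded to an even size. This Python sketch handles only short GNU/COFF-style names (`name/`). It skips the `/` symbol-index and `//` long-name members rather than decoding them. `write_ar` and `read_ar` are hypothetical helper names. Each extracted member is still a complete object file, with its own headers and symbol table, that a linker then processes on its own.

```python
ARMAG = b"!<arch>\n"  # global archive signature
HEADER_SIZE = 60      # fixed size of every member header


def write_ar(members: list[tuple[str, bytes]]) -> bytes:
    """Build a minimal archive with short GNU-style names (for testing only)."""
    out = bytearray(ARMAG)
    for name, data in members:
        name_field = (name + "/").encode("ascii")
        if len(name_field) > 16:
            raise ValueError("long names need a '//' table, not handled here")
        out += (name_field.ljust(16)                 # ar_name
                + b"0".ljust(12)                     # ar_date
                + b"0".ljust(6) + b"0".ljust(6)      # ar_uid, ar_gid
                + b"644".ljust(8)                    # ar_mode (octal text)
                + str(len(data)).encode().ljust(10)  # ar_size (decimal text)
                + b"`\n")                            # ar_fmag
        out += data
        if len(data) % 2:
            out += b"\n"  # member data is padded to an even offset
    return bytes(out)


def read_ar(blob: bytes) -> list[tuple[str, bytes]]:
    """Return (name, contents) for each object member, in archive order."""
    if not blob.startswith(ARMAG):
        raise ValueError("not an ar archive")
    pos, members = len(ARMAG), []
    while pos + HEADER_SIZE <= len(blob):
        header = blob[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        name = header[0:16].decode("ascii").rstrip(" ")
        size = int(header[48:58].decode("ascii"))
        data = blob[pos + HEADER_SIZE:pos + HEADER_SIZE + size]
        pos += HEADER_SIZE + size + (size & 1)
        if name in ("/", "//"):
            continue  # symbol index / long-name table, not an object file
        members.append((name.rstrip("/"), data))
    return members


# Usage: round-trip two small members through the archive format.
lib = write_ar([("a.o", b"abc"), ("b.o", b"wxyz")])
for member_name, contents in read_ar(lib):
    print(member_name, len(contents))
```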

> GHC uses object file combining ('ld -r') when C stubs/wrappers
> generation is triggered, these stubs/wrappers are compiled with gcc and
> are linked back into the 'main' object file. In the MS world this use
> case can easily be satisfied by packing the object files into a library
> since MS linker looks into libraries both when linking final exe/dll
> *and/or* creating another library (i.e. when creating another library it
> unpacks all object files from all libraries it is fed with, and repacks
> them into the output library, llvm-lib doesn't support this ATM, and
> AFAIR LLVM developers are aware of this).

I have an in-progress patch to add the feature to llvm-lib. I didn't have time to finish it, but it is on the table, and needs to be done for compatibility with MSVC lib.exe.
 
> But my question is motivated by another important use-case: when
> packaging compiled Haskell code it is very desirable to provide not only
> a static library, but also to partially link this library's object
> modules into the one big object file, which can further be consumed by
> GHC runtime linker. GHC runtime linker can link binary code in any form,
> but linking static library is much slower than linking the single object
> file.

IIUC, GHC is faster when handling .a files compared to a prelinked big .o file, even if they contain the same binary code/data. But it sounds like an artifact of the current implementation of GHC, because, in theory, there's no reason the former is much inefficient than the latter. If that's the case, doesn't it make more sense to improve GHC?

kyra via llvm-dev

Oct 10, 2017, 3:41:58 PM
to Rui Ueyama, llvm-dev
On 10/10/2017 9:00 PM, Rui Ueyama wrote:
> I'm not sure if I understand correctly. If my understanding is
> correct, you are saying that GHC can link either .o or .so at runtime,
> which sounds a bit odd because .o is not designed for dynamic linking.
> Am I missing something?
Yes, the GHC runtime linker *does* link .o files, not only performing all
necessary relocations but also creating trampolines for "far" code to
satisfy the "small" memory model.
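This is a hedged x86-64 sketch of what such a trampoline does. It is not GHC's actual code, and `fits_rel32` and `make_trampoline` are hypothetical names. Under the small code model a call is emitted as a 5-byte `call rel32`. Its signed 32-bit displacement is measured from the end of the instruction, so it reaches only about ±2 GiB. If the loaded target is farther away, the linker can point the call at a nearby stub that jumps through a full 64-bit address. The stub here is `jmp qword ptr [rip+0]` followed by the 8-byte target, so it clobbers no registers.

```python
import struct


def fits_rel32(site: int, target: int) -> bool:
    """Can a 5-byte `call rel32` at address `site` reach `target` directly?"""
    disp = target - (site + 5)  # displacement is relative to the next instruction
    return -2**31 <= disp < 2**31


def make_trampoline(target: int) -> bytes:
    """x86-64 stub: jmp qword ptr [rip+0], followed by the absolute target.

    The jump reads its destination from the 8 bytes right after it, so the
    stub is position-independent and clobbers no registers.
    """
    return b"\xff\x25" + struct.pack("<i", 0) + struct.pack("<Q", target)


# Usage: a call site in freshly loaded code and a target > 2 GiB away.
site, far_target = 0x1000, 0x7FF6_1234_5678
if not fits_rel32(site, far_target):
    stub = make_trampoline(far_target)
    # A real loader would place `stub` within rel32 reach of `site` and
    # point the call's displacement at the stub instead of the target.
    print(stub.hex())
```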

> I also do not understand why only static libraries need "compile/link
> pass" -- they at least don't need a compile pass, as they contain
> compiled .o files, and they indeed need a link pass, but that's also
> true for a single big .o file generated by -r, no? After all, in order
> to link against a .a file, I think you need to pull out a .o file from
> a .a and do whatever you need to do to link a single big .o file.

Don't quite understand this.
The idea is that when creating a package you should *at the very least*
provide a static library a client can statically link against. You may
optionally create a shared library for a client to link against, but to
do so you have to *recompile* the whole package, because the code has to
be compiled differently (this is how GHC works); you can't simply link
all your existing object code (what you've produced the static library
from) into this shared library. But if you want to provide a single
prelinked *.o file (for GHC runtime linker consumption), you don't need
to perform any extra compile step: you simply link all your object files
(exactly those which went into the package's static library) into this
*.o file with 'ld -r'.

> IIUC, GHC is faster when handling .a files compared to a prelinked big
> .o file, even if they contain the same binary code/data. But it sounds
> like an artifact of the current implementation of GHC, because, in
> theory, there's no reason the former is much inefficient than the
> latter. If that's the case, doesn't it make more sense to improve GHC?

No. The GHC **runtime** linker is much slower when handling *.a files
(and this is exactly the culprit of this whole story) than when linking
an already prelinked big *.o file, since it goes through the whole
archive and links each object module separately, doing all resolutions,
relocations and trampolines.

There is, perhaps, some confusion about what the GHC *runtime* linker
is. The GHC runtime linker comes into play when GHC is used
interactively, or when GHC encounters code which it has to execute at
compile time (Template Haskell/quasiquotations). Thus the GHC compiler
must link some external code during its own run time.

HTH.

Cheers,
Kyra


Rui Ueyama via llvm-dev

Oct 10, 2017, 4:01:41 PM
to kyra, llvm-dev
On Tue, Oct 10, 2017 at 12:41 PM, kyra <ky...@mail.ru> wrote:
> No. GHC **runtime** linker is much slower when handling *.a files (and
> this is exactly the culprit of this whole story) since it goes through
> the whole archive and links each object module separately doing all
> resolutions and relocations and trampolines, than when linking already
> prelinked big *.o file.

Looks like I still do not understand why a .a can be much slower than a prelinked .o. As far as I understand, "ld -r" doesn't reduce amount of data that much. It doesn't reduce the number of relocations, as relocations in input object files are basically passed through to the output. It doesn't reduce the number of symbols that much, as the combined object file contains a union of all symbols appeared in the input files. So, I think the amount of data in a .a is essentially the same as a prelinked .o. I wonder what can make a difference in speed.

Reid Kleckner via llvm-dev

Oct 10, 2017, 5:20:49 PM
to Rui Ueyama, llvm-dev
On Tue, Oct 10, 2017 at 1:01 PM, Rui Ueyama via llvm-dev <llvm...@lists.llvm.org> wrote:
> Looks like I still do not understand why a .a can be much slower than
> a prelinked .o. As far as I understand, "ld -r" doesn't reduce amount
> of data that much. It doesn't reduce the number of relocations, as
> relocations in input object files are basically passed through to the
> output. It doesn't reduce the number of symbols that much, as the
> combined object file contains a union of all symbols appeared in the
> input files. So, I think the amount of data in a .a is essentially the
> same as a prelinked .o. I wonder what can make a difference in speed.

I can't speak for Haskell, but ld -r can be useful for speeding up C++ links, because it acts as a pre-merging step for duplicate comdats. Consider a library that uses many instantiations of the same template with the same type. An archive will contain many copies of the template instantiation, but a relocatable object file will only contain one.
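Here is a toy illustration of that pre-merging. It assumes each section is a COMDAT group keyed by its leader symbol, and that any copy may be kept (as with `IMAGE_COMDAT_SELECT_ANY`). `merge_comdats` is a hypothetical name. The keys are illustrative strings, not real mangled names.

```python
def merge_comdats(objects: list[list[tuple[str, bytes]]]) -> dict[str, bytes]:
    """Keep one copy per COMDAT key (first seen wins, like 'select any')."""
    kept: dict[str, bytes] = {}
    for sections in objects:
        for key, data in sections:
            kept.setdefault(key, data)  # later duplicates are discarded
    return kept


# Usage: three objects that each instantiated the same template.
# Keys are illustrative strings, not real mangled names.
objects = [
    [("std::max<int>", b"A"), ("foo", b"F")],
    [("std::max<int>", b"A"), ("bar", b"B")],
    [("std::max<int>", b"A")],
]
in_archive = sum(len(sections) for sections in objects)
merged = merge_comdats(objects)
print(in_archive, len(merged))  # 5 3
```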

kyra via llvm-dev

Oct 10, 2017, 5:21:54 PM
to Rui Ueyama, llvm-dev
On 10/10/2017 11:01 PM, Rui Ueyama wrote:
> Looks like I still do not understand why a .a can be much slower than
> a prelinked .o. As far as I understand, "ld -r" doesn't reduce amount
> of data that much. It doesn't reduce the number of relocations, as
> relocations in input object files are basically passed through to the
> output. It doesn't reduce the number of symbols that much, as the
> combined object file contains a union of all symbols appeared in the
> input files. So, I think the amount of data in a .a is essentially the
> same as a prelinked .o. I wonder what can make a difference in speed.
Ah, good point.

Only now have I realized that my perception of link times was formed
when the '-split-sections' option didn't exist yet. The corresponding
option was '-split-objs', and a typical package's static library
contained thousands of object modules.

For example:
The latest official GHC 8.2.1 release's "base" package static library,
built with '-split-objs', contains 25631 object modules. The static
library size is 28MB; the prelinked object file size is 15MB.
My own custom-built GHC (ghc-8.3.20170619) "base" package static
library, built with '-split-sections' (instead of '-split-objs'),
contains only 228 object modules. The static library size is 22MB; the
prelinked object file size is 15MB.

Thus, when working with '-split-sections' libraries, we perhaps won't
see such big differences in link times (remember we mean the GHC runtime
linker here) between these libraries and their prelinked object
counterparts.
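The figures above can be worked through as simple arithmetic; the dictionary layout is only for illustration. '-split-sections' cuts the number of archive members by roughly 112x. Over the same change, the archive-to-prelinked size ratio drops only from about 1.87 to 1.47. If the runtime linker's cost on an archive is dominated by per-member overhead, the member count is the number that matters. That is an assumption, not something measured in this thread.

```python
# Figures for the GHC "base" package, as quoted in the message above.
split_objs = {"members": 25631, "lib_mb": 28, "prelinked_mb": 15}
split_sections = {"members": 228, "lib_mb": 22, "prelinked_mb": 15}

member_ratio = split_objs["members"] / split_sections["members"]
size_ratio_objs = split_objs["lib_mb"] / split_objs["prelinked_mb"]
size_ratio_sections = split_sections["lib_mb"] / split_sections["prelinked_mb"]

print(f"{member_ratio:.0f}x fewer archive members with -split-sections")
print(f"archive/prelinked size: {size_ratio_objs:.2f} -> {size_ratio_sections:.2f}")
```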

Thus, perhaps, having the '-r' option in COFF LLD is becoming much less
important than I thought before.
