[LLVMdev] [lld] Undefined symbols postprocessing

Denis Protivensky

Feb 18, 2015, 4:41:11 AM

to llvm-dev

Hi everyone,

In lld, I need to conditionally add symbols (like _GLOBAL_OFFSET_TABLE_)
during static linking because they may be used by relocations
(R_ARM_TLS_IE32) or by some other stuff like STT_GNU_IFUNC symbols.
The problem is that symbols are currently added in a declarative way, by
specifying them in an ExecutableWriter::addDefaultAtoms() override.
At that stage, there's no way to determine if additional symbols are
required.
But libraries providing optimizations like STT_GNU_IFUNC
(glibc, for example) expect the GOT symbol to be defined, so the linking
process fails in Resolver::resolve() if the symbol is not found.

I propose to add the ability to ignore undefined symbols during initial
resolution, and then postprocess only those undefines for the second time
after the pass manager execution.

Technically, this shouldn't be a problem:
- there will be a new option in the linking context that should signal
that the postprocessing of undefined symbols should be performed.
- if postprocessing option is set, newly added symbols will be collected
in the MergedFile returned by the Resolver, and then only those new symbols
will take part in the resolution process very similar to what
Resolver::resolve() does.
- available implementations will not break and keep working without use of
postprocessing feature.
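
A minimal sketch of the proposed flow, written as a toy model and not as lld code. All names here (`ToyAtom`, `ToyPass`, `linkWithPostprocessing`) are hypothetical; the real Resolver, PassManager and MergedFile are considerably more involved. The point is only the ordering: tolerate undefines first, let the passes add atoms, then check again.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model, not lld code: an atom defines one symbol and references others.
struct ToyAtom {
  std::string name;
  std::vector<std::string> refs;
};

struct ToyLinkResult {
  std::vector<ToyAtom> atoms;
  std::set<std::string> deferred;  // undefines tolerated after the first resolution
  std::set<std::string> undefined; // undefines still left after the passes
};

// A pass may append atoms, e.g. a GOT symbol that a relocation turned out to need.
using ToyPass = std::function<void(std::vector<ToyAtom> &)>;

// Collect every referenced name that no atom defines.
inline std::set<std::string> findUndefined(const std::vector<ToyAtom> &atoms) {
  std::set<std::string> defined, undef;
  for (const ToyAtom &a : atoms)
    defined.insert(a.name);
  for (const ToyAtom &a : atoms)
    for (const std::string &r : a.refs)
      if (!defined.count(r))
        undef.insert(r);
  return undef;
}

inline ToyLinkResult linkWithPostprocessing(std::vector<ToyAtom> inputs,
                                            const std::vector<ToyPass> &passes) {
  ToyLinkResult result;
  result.atoms = std::move(inputs);
  // Initial resolution: record undefines instead of failing on them.
  result.deferred = findUndefined(result.atoms);
  // Pass manager run: passes decide which extra symbols are really required.
  for (const ToyPass &pass : passes)
    pass(result.atoms);
  // Postprocessing: whatever is still undefined now is a genuine error.
  result.undefined = findUndefined(result.atoms);
  return result;
}
```

In this model a pass that appends a `_GLOBAL_OFFSET_TABLE_` atom clears that name from `undefined`, while a truly missing symbol stays there and can be reported.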

So my proposal is to move from the declarative style towards imperative
and more flexible approach. Of course, there's a downside as the code
loses some of its regularity and becomes more volatile, but in the end -
we have tests to cover such things and ensure everything works as expected.

Any ideas?

- Denis Protivensky.

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Joerg Sonnenberger

Feb 18, 2015, 8:50:28 AM

to llv...@cs.uiuc.edu

On Wed, Feb 18, 2015 at 01:38:15AM -0800, Denis Protivensky wrote:
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.

Correct, this is actually quite a bit more fundamental. If you check
various test cases, you will find symbol table pollution with unused
items like __tls_get_addr.

> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.

Do you want to do that before or after dead code elimination?

Joerg

Shankar Easwaran

Feb 18, 2015, 11:43:38 AM

to Denis Protivensky, llvm-dev

On 2/18/2015 3:38 AM, Denis Protivensky wrote:
> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE)
> during
> static linking because they may be used by relocations (R_ARM_TLS_IE32) or
> by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking
> process
> fails in Resolver::resolve() if the symbol is not found.
>
> I propose to add the ability to ignore undefined symbols during initial
> resolution, and then postprocess only those undefines for the second time
> after the pass manager execution.

I came across this same problem, and was planning on adding a
notifyUndefinedSymbol hook to the LinkingContext. If the linker wants to
add a defined symbol and coalesce it with the undefined one, that would
then be possible.

Do you think this will work for your case too?


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation

Denis Protivensky

Feb 19, 2015, 5:03:40 AM

to Shankar Easwaran, Joerg Sonnenberger, llvm-dev

Joerg:

>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
>
> Do you want to do that before or after dead code elimination?

I think dead code elimination should be performed after all possible object code modifications done by lld. Therefore, it should be done after undefines' postprocessing as well.

Shankar:

>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
>
> I came across this same problem, and was planning on adding a
> notifyUndefinedSymbol to the LinkingContext, if the linker wants to add
> a defined symbol and coalesce it, it would be possible.
>
> Do you think this will work for your case too ?

With this option, I don't see:
- how to postpone the processing of, and reaction to, undefines. If the callback is called from within Resolver::resolve(), you have to react to it immediately, because otherwise the code will still fail in Resolver::resolve().
- how to know within the callback body whether a symbol is needed. The need for any symbol is determined in some other place, so I would have to keep some sort of indication (boolean flags, whatever) to know which symbols are really needed.
- the exact interface of the notifyUndefinedSymbol callback. If it receives the `StringRef` name of the undefined symbol, what should the reaction be? Should it return new symbols to add back to the caller as `const Atom*`?

Thanks,
Denis.

Shankar Easwaran

Feb 19, 2015, 10:08:11 AM

to Denis Protivensky, Joerg Sonnenberger, llvm-dev

On 2/19/2015 3:58 AM, Denis Protivensky wrote:
> Joerg:
>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
> Do you want to do that before or after dead code elimination?
> I think dead code elimination should be performed after all possible object code modifications done by lld. Therefore, it should be done after undefines' postprocessing as well.

The GNU linker does dead code elimination before undefines are reported.
So if a function is not called and it has an undefined reference, that
reference would not be reported as an undef.

>
> Shankar:
>> I propose to add the ability to ignore undefined symbols during initial
>> resolution, and then postprocess only those undefines for the second time
>> after the pass manager execution.
> I came across this same problem, and was planning on adding a
> notifyUndefinedSymbol to the LinkingContext, if the linker wants to add
> a defined symbol and coalesce it, it would be possible.
>
> Do you think this will work for your case too ?
> With this option, I don't see:
> - how to postpone processing and reaction on undefines. If the callback is called from within Resolver::resolve(), you should react on it immediately, because otherwise the code will still fail in Resolver::resolve().
> - how to know if a symbol is needed within the callback body. The need of any symbol is determined in some other place. So I need to keep a sort of indication (boolean flags, whatever) to know which symbols are really needed.
> - the exact interface of notifyUndefinedSymbol callback. If it receives `StringRef` name of the undefined symbol, what reaction should be? Should it return new symbols to add back to the caller as `const Atom*`?

notifyUndefinedSymbol will allow the context to coalesce the undefined
atom with a defined atom.

Atom *notifyUndefinedSymbol(StringRef name) could be the interface.
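
A rough sketch of how such a hook could be wired up, assuming the signature suggested above. This is a toy model: `ToyAtom`, `ToyLinkingContext`, `ToyARMContext` and `resolveUndefined` are hypothetical names, and `std::string` stands in for `StringRef` so that the example is self-contained.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy stand-in for an lld atom.
struct ToyAtom {
  std::string name;
  bool linkerDefined;
};

class ToyLinkingContext {
public:
  virtual ~ToyLinkingContext() = default;
  // Return a defined atom to coalesce with the undefined one, or nullptr
  // when the context has nothing to offer for this name.
  virtual ToyAtom *notifyUndefinedSymbol(const std::string & /*name*/) {
    return nullptr;
  }
};

// Hypothetical target context that knows how to provide the GOT symbol.
class ToyARMContext : public ToyLinkingContext {
public:
  ToyAtom *notifyUndefinedSymbol(const std::string &name) override {
    if (name != "_GLOBAL_OFFSET_TABLE_")
      return nullptr;
    if (!_got)
      _got = std::make_unique<ToyAtom>(ToyAtom{name, true});
    return _got.get();
  }

private:
  std::unique_ptr<ToyAtom> _got; // the context owns the atom it hands out
};

// Resolver side: before reporting an undefine, give the context a chance.
inline bool resolveUndefined(ToyLinkingContext &ctx, const std::string &name,
                             std::map<std::string, ToyAtom *> &symtab) {
  if (symtab.count(name))
    return true;
  if (ToyAtom *atom = ctx.notifyUndefinedSymbol(name)) {
    symtab[name] = atom;
    return true;
  }
  return false; // still undefined; the caller reports the error
}
```

Note that the hook fires only for names that something actually references, so the symbol is created on demand, but the decision is still made at resolution time.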

> Thanks,
> Denis.

Shankar Easwaran

Feb 19, 2015, 12:17:43 PM

to Denis Protivensky, Joerg Sonnenberger, llvm-dev

+ Nick

Rui Ueyama

Feb 19, 2015, 3:51:18 PM

to Denis Protivensky, llvm-dev

On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dproti...@accesssoftek.com> wrote:

> Hi everyone,
>
> In lld, I need to conditionally add symbols (like GLOBAL_OFFSET_TABLE)
> during
> static linking because they may be used by relocations (R_ARM_TLS_IE32) or
> by some other stuff like STT_GNU_IFUNC symbols.
> The problem is that now symbols are added in a declarative way by
> specifying in ExecutableWriter::addDefaultAtoms() override.
> At that stage, there's no way to determine if additional symbols are
> required.
> But libraries providing optimizations like STT_GNU_IFUNC
> (glibc, for example) expect the GOT symbol to be defined, so the linking
> process
> fails in Resolver::resolve() if the symbol is not found.

I don't know if this is directly applicable to your problem, but for PE/COFF I needed to add symbols conditionally. If you have a function func and there's a reference to __imp_func, the linker needs to create data containing the address of func as the content of __imp_func. It's rarely used, so I wanted to create the __imp_ atom only when there's an unresolved reference to that symbol.

What I did at the time was to define a (virtual) library file which dynamically creates an atom. The virtual library file is added at the end of the input file list, and if the core linker looks it up for a symbol starting with __imp_, the library creates an object file containing the symbol on the fly and returns it.

My experience is that it worked but might have been too tricky. If this trick is directly applicable to your problem, you may want to do that. If not, I'm perhaps okay with your suggestion (although I haven't thought about it hard yet).
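
The virtual library idea can be sketched roughly as follows. This is a toy model and not the PE/COFF code in lld; `ToyVirtualArchive`, `find` and `createdCount` are hypothetical names, and the "object file" is reduced to a single atom that records which function it points at.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a data atom holding the address of `target`.
struct ToyAtom {
  std::string name;
  std::string target;
};

// Placed last in the input list; consulted only for still-unresolved names.
class ToyVirtualArchive {
public:
  explicit ToyVirtualArchive(std::set<std::string> definedFunctions)
      : _functions(std::move(definedFunctions)) {}

  // Create the __imp_ atom on the fly, but only for functions that exist.
  const ToyAtom *find(const std::string &name) {
    const std::string prefix = "__imp_";
    if (name.compare(0, prefix.size(), prefix) != 0)
      return nullptr;
    std::string func = name.substr(prefix.size());
    if (!_functions.count(func))
      return nullptr;
    _created.push_back(std::make_unique<ToyAtom>(ToyAtom{name, func}));
    return _created.back().get();
  }

  std::size_t createdCount() const { return _created.size(); }

private:
  std::set<std::string> _functions;
  std::vector<std::unique_ptr<ToyAtom>> _created;
};
```

Nothing is created unless the core linker asks for a matching name, which is what keeps unused symbols out of the output. The sketch assumes the linker asks at most once per name; a real implementation would need to cache.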

Thanks

Joerg Sonnenberger

Feb 19, 2015, 7:52:42 PM

to llvm-dev

On Thu, Feb 19, 2015 at 01:58:59AM -0800, Denis Protivensky wrote:
> > Do you want to do that before or after dead code elimination?
> I think dead code elimination should be performed after all possible
> object code modifications done by lld. Therefore, it should be done
> after undefines' postprocessing as well.

How do you then make sure to not export redundant symbols? Consider
_GLOBAL_OFFSET_TABLE_ -- if the only user is in a dead function, it
should not be in the symbol table. Same for __tls_get_addr.
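
To make the ordering concern concrete, here is a small sketch under the assumption that liveness is computed first and only references from live atoms count. It is a toy model, not lld's dead-strip code; `ToyGraph`, `liveNames` and `neededUndefines` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy reference graph: defined atom name -> names it references.
// A name that is referenced but has no entry is undefined.
using ToyGraph = std::map<std::string, std::vector<std::string>>;

// Mark every name reachable from the root (e.g. the entry point).
inline std::set<std::string> liveNames(const ToyGraph &g,
                                       const std::string &root) {
  std::set<std::string> live;
  std::vector<std::string> work{root};
  while (!work.empty()) {
    std::string n = work.back();
    work.pop_back();
    if (!live.insert(n).second)
      continue; // already visited
    auto it = g.find(n);
    if (it != g.end())
      for (const std::string &r : it->second)
        work.push_back(r);
  }
  return live;
}

// Undefined names that live code actually needs; references that occur
// only in dead atoms never show up here.
inline std::set<std::string> neededUndefines(const ToyGraph &g,
                                             const std::string &root) {
  std::set<std::string> needed;
  for (const std::string &n : liveNames(g, root))
    if (!g.count(n))
      needed.insert(n);
  return needed;
}
```

With this ordering, a `_GLOBAL_OFFSET_TABLE_` reference that appears only in a dead function does not force the symbol into the output.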

Joerg

Denis Protivensky

Feb 20, 2015, 1:42:45 AM

to Shankar Easwaran, llvm-dev

Shankar,

Okay, I guessed the correct interface.
But what about the moment at which the function is called?
If it's called from Resolver::resolve(), it doesn't make any difference to me as I cannot determine the need of specific symbols at that time.

- Denis.

Denis Protivensky

Feb 20, 2015, 2:02:18 AM

to Joerg Sonnenberger, Rui Ueyama, llvm-dev

Joerg:

> How do you then make sure to not export redundant symbols? Consider
> _GLOBAL_OFFSET_TABLE_ -- if the only user is in a dead function, it
> should not be in the symbol table. Same for __tls_get_addr.

I agree that dead code elimination needs additional consideration, but my problem is that lld pollutes the symbol table inserting symbols unconditionally. I'd want to find a solution to this problem first as it generates even more redundant symbols right now.

Rui:

> I don't know if this is directly applicable to your problem, but for PE/COFF I needed to add symbols conditionally. If you have a function func and if there's a reference to __imp_func, linker needs to create a data containing the address of func as __imp_func content. It's rarely used, so I wanted to create the __imp_ atom only when there's an unresolved reference to that symbol.
>
> What I did at that moment is to define a (virtual) library file which dynamically creates an atom. The virtual library file is added at end of the input file list, and if the core linker looks it up for a symbol starting __imp_, the library creates an object file containing the symbol on the fly and returns it.
>
> My experience of doing that is that worked but might have been too tricky. If this trick is directly applicable to your problem, you may want to do that. If not, I'm perhaps okay with your suggestion (although I didn't think about that hard yet.)

Looks like your trick won't work for me, because the virtual library you add is parsed in the Resolver::resolve() method where I don't have enough knowledge whether to add specific symbols or not. My problem is that I can only do it in the relocation pass (or some other pass if needed), which goes after symbol resolution.

Thanks,
Denis.

Rui Ueyama

Feb 20, 2015, 2:23:24 PM

to Denis Protivensky, llvm-dev

On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dproti...@accesssoftek.com> wrote:

I'm fine with the basic idea of allowing undefined symbols in the first resolver pass. A few questions about the implementation.

- How do you know which atom is newly added and which is not? Once an atom is added to a MutableFile, there's no easy way to recognize that, I guess.

- Does the second resolver pass need to be run after all other passes? Why don't you run the resolver once, and then call some externally-given function (from the resolver) to get a list of atoms that need to be added to the result, and then resolve again, all inside the resolver?

Denis Protivensky

Feb 23, 2015, 3:30:07 AM

to Rui Ueyama, llvm-dev

Rui, see inline.

> - How do you know which atom is newly added and which is not? Once an atom is added to a MutableFile, there's no easy way to recognize that, I guess.

The Resolver returns a Resolver::MergedFile as the result of the call to resolve(), and we can override its addAtom method to put newly added atoms into a separate collection, which may then be examined for undefines.

> - Does the second resolver pass need to be run after all other passes? Why don't you run the resolver once, and then call some externally-given function (from the resolver) to get a list of atoms that needs to be added to the result, and then resolve again, all inside the resolver?

Since we have a chance to determine newly added atoms after resolution, I don't see why we should complicate the process with external functions and additional call dependencies. It can all be done by adding a second resolve()-like function call in Driver::link() after the PassManager run.
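
The addAtom override could look roughly like the sketch below. This is a toy model under stated assumptions: `ToyMutableFile` and `ToyMergedFile` only imitate the shape of the lld classes, and `startTracking` and `newUndefines` are hypothetical helpers that do not exist in lld.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an lld atom; `defined` is false for an undefined atom.
struct ToyAtom {
  std::string name;
  bool defined;
};

class ToyMutableFile {
public:
  virtual ~ToyMutableFile() = default;
  virtual void addAtom(const ToyAtom &atom) { _atoms.push_back(atom); }
  const std::vector<ToyAtom> &atoms() const { return _atoms; }

private:
  std::vector<ToyAtom> _atoms;
};

// Remembers which atoms were added after the initial resolution finished.
class ToyMergedFile : public ToyMutableFile {
public:
  // Called once the resolver is done; later additions come from passes.
  void startTracking() { _tracking = true; }

  void addAtom(const ToyAtom &atom) override {
    ToyMutableFile::addAtom(atom);
    if (_tracking)
      _added.push_back(atom);
  }

  // Names of undefined atoms among the newly added ones; only these
  // would take part in the second, resolve()-like step.
  std::vector<std::string> newUndefines() const {
    std::vector<std::string> names;
    for (const ToyAtom &a : _added)
      if (!a.defined)
        names.push_back(a.name);
    return names;
  }

private:
  bool _tracking = false;
  std::vector<ToyAtom> _added;
};
```

The second step then has a well-defined, small input: just the atoms that passes added, instead of the whole merged file.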

Rui Ueyama

Feb 23, 2015, 1:47:52 PM

to Denis Protivensky, llvm-dev

If we run the second pass immediately after the resolver runs, without even returning to the caller of the resolver, the resolver looks like a single-pass process from outside, although internally it iterates twice. It makes the post-processing passes simpler if you can get all atoms that need to be written to the file just by calling the resolver once, instead of getting an incomplete set.

I guess you need to run some passes after the second resolver invocation. For example, you need to call the OrderPass to reorder newly added atoms to the desired places. If the resolver resolves everything in one invocation, you wouldn't need to think about that either.
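
As a rough illustration of the "single pass from outside" variant, the sketch below resolves once, asks an externally given function for extra atoms, and resolves again before returning. It is a toy model; `ToyProvider`, `ToyResolveResult` and this `resolve` function are hypothetical and are not the lld Resolver API.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an atom: defines `name`, references `refs`.
struct ToyAtom {
  std::string name;
  std::vector<std::string> refs;
};

struct ToyResolveResult {
  std::vector<ToyAtom> atoms;
  std::set<std::string> undefined;
};

// Externally given function: receives the current undefines and returns
// the atoms that should be added to the result.
using ToyProvider =
    std::function<std::vector<ToyAtom>(const std::set<std::string> &)>;

inline std::set<std::string> findUndefined(const std::vector<ToyAtom> &atoms) {
  std::set<std::string> defined, undef;
  for (const ToyAtom &a : atoms)
    defined.insert(a.name);
  for (const ToyAtom &a : atoms)
    for (const std::string &r : a.refs)
      if (!defined.count(r))
        undef.insert(r);
  return undef;
}

// Callers see one call; the two iterations stay inside the resolver.
inline ToyResolveResult resolve(std::vector<ToyAtom> atoms,
                                const ToyProvider &extra) {
  std::set<std::string> undef = findUndefined(atoms);
  if (!undef.empty()) {
    for (ToyAtom &a : extra(undef))
      atoms.push_back(std::move(a));
    undef = findUndefined(atoms); // second, internal iteration
  }
  return {std::move(atoms), std::move(undef)};
}
```

Later passes, such as an ordering pass, then always see the complete set of atoms.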

Shankar Easwaran

Feb 23, 2015, 2:45:51 PM

to Denis Protivensky, Joerg Sonnenberger, Rui Ueyama, llvm-dev

Not sure why you want to call the Resolver again; wouldn't this API
suffice? You can have a new API in the symbol table class, called from
the resolver, with the list of undefined symbols.

Shankar Easwaran

Michael Spencer

Feb 23, 2015, 5:58:55 PM

to Denis Protivensky, llvm-dev

On Thu, Feb 19, 2015 at 10:40 PM, Denis Protivensky
<dproti...@accesssoftek.com> wrote:
> Shankar,
>
> Okay, I guessed the correct interface.
> But what about the moment at which the function is called?
> If it's called from Resolver::resolve(), it doesn't make any difference to
> me as I cannot determine the need of specific symbols at that time.
>
> - Denis.

None of the symbols we are looking up require the full resolver, and
they are all special linker symbols. I propose two things.

1. Provide a hook as per what Shankar suggested for the resolver. User
references to linker defined symbols such as _GLOBAL_OFFSET_TABLE_ get
created and possibly deadstripped here. The linking context owns the
atom.
2. The ELFLinkingContext gains
`Atom *getOrCreateLinkerDefinedAtom(StringRef);`. This can be used in passes
to get the symbols. The hook in (1) would call this to create the
atoms.

This gives a single place where linker defined atoms are actually
created, and allows correct deadstripping and object file references
without doing multiple resolver passes.
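
The two points above might fit together as in the following sketch. It is a toy model: `ToyELFLinkingContext` is hypothetical, `std::string` stands in for `StringRef`, and the set of names treated as linker defined is an assumption made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a linker-defined atom.
struct ToyAtom {
  std::string name;
};

class ToyELFLinkingContext {
public:
  // Single place where linker-defined atoms are created; the context owns
  // them. Passes call this directly when they discover they need a symbol.
  ToyAtom *getOrCreateLinkerDefinedAtom(const std::string &name) {
    auto it = _atoms.find(name);
    if (it == _atoms.end())
      it = _atoms.emplace(name, std::make_unique<ToyAtom>(ToyAtom{name})).first;
    return it->second.get();
  }

  // Resolver hook from (1): user references to linker-defined symbols are
  // forwarded to the same factory; everything else is left undefined.
  ToyAtom *notifyUndefinedSymbol(const std::string &name) {
    return isLinkerDefined(name) ? getOrCreateLinkerDefinedAtom(name) : nullptr;
  }

  std::size_t linkerDefinedCount() const { return _atoms.size(); }

private:
  // Assumption for this sketch: only the GOT symbol is linker defined.
  static bool isLinkerDefined(const std::string &name) {
    return name == "_GLOBAL_OFFSET_TABLE_";
  }

  std::map<std::string, std::unique_ptr<ToyAtom>> _atoms;
};
```

Because both the resolver hook and the passes go through the same function, a symbol requested from both sides is created once and refers to the same atom.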

- Michael Spencer

Nick Kledzik

Feb 23, 2015, 11:20:30 PM

to Michael Spencer, Denis Protivensky, llvm-dev

On Feb 23, 2015, at 2:52 PM, Michael Spencer <bigch...@gmail.com> wrote:

> On Thu, Feb 19, 2015 at 10:40 PM, Denis Protivensky
> <dproti...@accesssoftek.com> wrote:
>> Shankar,
>>
>> Okay, I guessed the correct interface.
>> But what about the moment at which the function is called?
>> If it's called from Resolver::resolve(), it doesn't make any difference to
>> me as I cannot determine the need of specific symbols at that time.
>>
>> - Denis.
>
> None of the symbols we are looking up require the full resolver, and
> they are all special linker symbols. I propose two things.
>
> 1. Provide a hook as per what Shankar suggested for the resolver. User
> references to linker defined symbols such as _GLOBAL_OFFSET_TABLE_ get
> created and possibly deadstripped here. The linking context owns the
> atom.
> 2. The ELFLinkingContext gains <Atom
> *getOrCreateLinkerDefinedAtom(StringRef);>. This can be used in passes
> to get the symbols. The hook in (1) would call this to create the
> atoms.
>
> This gives a single place where linker defined atoms are actually
> created, and allows correct deadstripping and object file references
> without doing multiple resolver passes.

As Rui showed, we already have this abstraction. The linking context just adds a magic ArchiveFile. When queried for any “linker defined symbol”, the magic ArchiveFile instantiates the atoms needed.

This is how mach-o handles linker defined symbols like __dso_handle.

-Nick

Denis Protivensky

Feb 25, 2015, 7:44:34 AM

to Nick Kledzik, llvm-dev

Okay, I understand that you're proposing to add all undefined symbols during the resolution step, rather than trying to collect extra symbols during execution and then checking whether some undefines are left (as I originally planned).
This sounds reasonable, as in any case we must have all undefines resolved in order to continue the linking process.

Concerning the implementation, why not add this virtual archive file to the OutputELFWriter (or even to ExecutableWriter), since we already have a method to add specific files to the linking process?
We may then expose a simple interface to the descendants of the writers to give them a chance to handle undefines.

Also, do we need this special symbol handling for any cases other than static linking of the executable?

- Denis.

Shankar Easwaran

Feb 25, 2015, 9:29:20 AM

to Denis Protivensky, Nick Kledzik, llvm-dev

Adding it to the OutputELFWriter sounds good.

Denis Protivensky

Mar 2, 2015, 6:51:59 AM

to Shankar Easwaran, llvm-dev

Shankar,

Back when we started the discussion, you mentioned that you also wanted to take care of this case.
Will you do that, or should I work on it?

- Denis.

Shankar Easwaran

Mar 2, 2015, 11:50:05 AM

to Denis Protivensky, llvm-dev

Denis,

Go ahead, as you have already got a consensus on the current design.

Shankar Easwaran
