mark function symbols following different call abi


szabol...@arm.com

Jan 15, 2019, 9:27:10 AM
to Generic System V Application Binary Interface
I wonder if a bit of the st_other field in a symbol table entry may
be reserved for marking the symbol not to be resolved lazily
(a.k.a bind now).

This would require changing the gABI. The rationale is to solve a
problem in the generic ABI that may affect multiple targets so the
solution is consistent.
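
A minimal sketch of what such a marking could look like. The flag name and its bit position are assumptions made only for illustration (no ABI defines them); the only part taken from the gABI is that the low two bits of st_other hold the symbol visibility.

```c
#include <stdint.h>

/* The gABI assigns only the low two bits of st_other (symbol visibility). */
#define TOY_ST_VISIBILITY(other) ((other) & 0x3)

/* Hypothetical flag, NOT part of any ABI: the bit position is an
   assumption made only for this illustration. */
#define TOY_STO_BIND_NOW 0x80

/* Returns nonzero if the symbol asks not to be resolved lazily. */
static int toy_sym_is_bind_now(uint8_t st_other)
{
    return (st_other & TOY_STO_BIND_NOW) != 0;
}

/* Sets the hypothetical flag and leaves the visibility bits untouched. */
static uint8_t toy_sym_mark_bind_now(uint8_t st_other)
{
    return (uint8_t)(st_other | TOY_STO_BIND_NOW);
}
```

A static linker that understood such a bit could propagate it from relocatable objects and pick a non-lazy relocation for the marked symbols.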

If that is not possible then I wonder what's the recommended way to
mark symbols in a psABI.

The problem:

As processor architectures continue to evolve there is a need to
introduce multiple call abis or call abi extensions (e.g. because of
a new SIMD instruction set extension).

The compiler can generate code appropriately in the caller and callee
based on the type of the function, however the dynamic linker does
not know the call abi of a symbol and may clobber registers that it
shouldn't at the first call during on demand symbol resolution with
lazy binding. Fixing the dynamic linker to handle all call abis may
be possible, but can be expensive or introduce backward compatibility
problems (e.g. increase stack usage breaking existing binaries).
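
The failure mode can be illustrated with a toy model. This is only a sketch: the register counts, the function names and the "resolver" are invented for the example and do not correspond to any real dynamic linker.

```c
#include <string.h>

enum { TOY_NREGS = 16, TOY_BASE_ARG_REGS = 8 };

/* Toy resolver: preserves only the registers that the base call ABI uses
   for arguments and treats every other register as scratch. */
static void toy_lazy_resolver(unsigned long regs[TOY_NREGS])
{
    unsigned long saved[TOY_BASE_ARG_REGS];
    int i;

    memcpy(saved, regs, sizeof saved);   /* save base-ABI argument registers */
    for (i = 0; i < TOY_NREGS; i++)
        regs[i] = 0xdeadbeefUL;          /* symbol lookup clobbers everything */
    memcpy(regs, saved, sizeof saved);   /* restore only what was saved */
}

/* Returns how many of the first nargs argument registers survive a first
   call that goes through the toy resolver. */
static int toy_args_preserved(int nargs)
{
    unsigned long regs[TOY_NREGS];
    int i, ok = 0;

    for (i = 0; i < TOY_NREGS; i++)
        regs[i] = (unsigned long)i + 1;
    toy_lazy_resolver(regs);
    for (i = 0; i < nargs; i++)
        ok += regs[i] == (unsigned long)i + 1;
    return ok;
}
```

A callee following the base ABI (8 argument registers in this model) sees all its arguments, while a callee of an extended ABI that passes 12 arguments in registers would find 4 of them clobbered on the first, lazily bound call.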

There might be other tools (ltrace?) that need to know about call
abis, so ideally symbols that don't follow the base PCS would be
marked in some way. (But full type information is not needed as in
DWARF.)

Such markings belong to the psABI of the target, but it should be
tied to the symbol table and currently the available mechanisms are
limited:
- symbol table (very few processor specific parts).
- new section referring to symbol indexes (generic tools like strip
need to be aware how to handle such section).
- encode symbol information somewhere that passes through tools
(e.g. symbol names, may cause troubles as it does not follow the
generic ELF logic).
- use relocations (either each relocation needs a new variant which
is not scalable or one new relocation is needed per call abi where
only the symbol index is relevant, existing tools know how to deal
with relocations, but may need intrusive changes e.g. if the new
relocations disturb linker relaxations).

Updating the dynamic linker on a system is more difficult in practice
than updating the compiler or linker so ideally a backward compatible
solution would not require a dynamic linker change.

This is possible e.g. if lazy binding can be disabled on a per symbol
basis in the static linker. But that still requires a marking in
relocatable ELF objects for those symbols. (So the marking can be
propagated from the compiler to the static linker.)

(Disabling lazy binding for entire ELF modules can cause too much
overhead or compatibility problems.)

I have a psABI solution that involves a new static relocation, but a
generic solution may be cleaner.

Thoughts?

Ali Bahrami

Jan 15, 2019, 4:17:27 PM
to gener...@googlegroups.com
On 01/15/19 07:16, szabol...@arm.com wrote:
> I wonder if a bit of the st_other field in a symbol table entry may
> be reserved for marking the symbol not to be resolved lazily
> (a.k.a bind now).
>
> This would require changing the gABI. The rationale is to solve a
> problem in the generic ABI that may affect multiple targets so the
> solution is consistent.
>
> If that is not possible then i wonder what's the recommended way to
> mark symbols in a psABI.
>
> The problem:
>
> As processor architectures continue to evolve there is a need to
> introduce multiple call abis or call abi extensions (e.g. because of
> a new SIMD instruction set extension).
>
> The compiler can generate code appropriately in the caller and callee
> based on the type of the function, however the dynamic linker does
> not know the call abi of a symbol and may clobber registers that it
> shouldn't at the first call during on demand symbol resolution with
> lazy binding. Fixing the dynamic linker to handle all call abis may
> be possible, but can be expensive or introduce backward compatibility
> problems (e.g. increase stack usage breaking existing binaries).
...
> Updating dynamic linker on a system is more difficult in practice
> than updating the compiler or linker so ideally a backward compatible
> solution would not require a dynamic linker change.

I guess I don't see how you make something like this
work without some runtime linker change. I'm also skeptical
that tweaks needed for one platform would always generalize
to another. There's not a lot of room in st_other, and this
sounds pretty platform dependent. I'd need to see a detailed
description of how it would really work before I was convinced
that it was simple and stable (not continually evolving) enough
to really belong in the gABI. I'll be interested to hear what
others think.

One way to add platform specific features like this is to
introduce a new platform specific section type, using the
sh_link field to point at the related symbol table, and
probably with a related platform specific dynamic section
tag that lets the runtime linker find that section. In Solaris,
we have an OS-specific (ELFOSABI_SOLARIS) section named
.SUNW_syminfo that works that way, which might serve as an
example:

https://docs.oracle.com/cd/E37838_01/html/E36783/chapter7-17.html#scrolltoc

The runtime linker finds this section via the DT_SYMINFO element
in the .dynamic section. (Yes, this would better have been named
DT_SUNW_SYMINFO, but it predates the gABI). This section has
one element per symbol, in the same positions as the symbol
table. The runtime linker combines the information from both
sections to form its view of each symbol. .SUNW_syminfo was
originally invented to support direct bindings, but as it has
a flags field with sufficient room for expansion, we've since
given it other uses as well.
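
The parallel-table idea can be sketched as follows. The types, the flag and the lookup function are simplified stand-ins invented for this example; they are not the Solaris Syminfo layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for a symbol table entry and for an entry of a
   section that parallels the symbol table. Both are hypothetical. */
struct toy_sym  { const char *name; };
struct toy_info { uint16_t flags; };

#define TOY_INFO_BIND_NOW 0x0001   /* hypothetical flag */

/* The info table has one element per symbol, at the same index, so the
   two tables are combined simply by sharing the index.
   Returns 1 or 0 for a known symbol and -1 if the name is not found. */
static int toy_lookup_bind_now(const struct toy_sym *syms,
                               const struct toy_info *info,
                               size_t nsyms, const char *name)
{
    size_t i;

    for (i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return (info[i].flags & TOY_INFO_BIND_NOW) != 0;
    return -1;
}
```

The cost of this design is that every tool that reorders or drops symbols has to keep the second table in step with the first.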

Another, simpler, example of a section that parallels a
symbol table is SHT_SUNW_versym (known as SHT_GNU_versym
in the ELFOSABI_GNU world).

If the issue is really about the runtime linker saving
and restoring the right registers across PLTs, without
encountering excess overhead, then I wonder if you can't
solve it in some other manner, outside of ELF? For instance,
sparcv9 PLTs can require saving and restoring the floating
point registers, which has undeniable overhead, but the
hardware sets the FPRS register when floating point is
actually used, and the runtime linker uses that to skip
the floating point registers in code that's not using them.
On the other hand, for X86, we just bite the bullet and save
SIMD related registers that may or may not be in use, rather
than adding complexity.

- Ali

Florian Weimer

Jan 15, 2019, 4:43:42 PM
to szabol...@arm.com, Generic System V Application Binary Interface
* szabolcs nagy:

> If that is not possible then i wonder what's the recommended way to
> mark symbols in a psABI.

POWER ELFv2 uses something in st_other to mark localentry functions:

  /* Check that optimized plt call stubs for localentry:0 functions
     are not being satisfied by a non-zero localentry symbol.  */
  if (map->l_info[DT_PPC64(OPT)]
      && (map->l_info[DT_PPC64(OPT)]->d_un.d_val & PPC64_OPT_LOCALENTRY) != 0
      && refsym->st_info == ELFW(ST_INFO) (STB_GLOBAL, STT_FUNC)
      && (STO_PPC64_LOCAL_MASK & refsym->st_other) == 0
      && (STO_PPC64_LOCAL_MASK & sym->st_other) != 0)
    _dl_error_localentry (map, refsym);
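
For readers unfamiliar with that encoding, here is a small sketch of how the three st_other bits are decoded. The bit position and the offset formula are written from memory of the definitions in glibc's <elf.h> and are redefined under TOY_ names so the example builds on any host; treat them as an assumption and check the ELFv2 ABI before relying on them.

```c
/* Three-bit local entry field in the top bits of st_other (ELFv2). */
#define TOY_STO_PPC64_LOCAL_BIT  5
#define TOY_STO_PPC64_LOCAL_MASK (7 << TOY_STO_PPC64_LOCAL_BIT)

/* Distance in bytes from the global to the local entry point. */
static unsigned toy_ppc64_local_entry_offset(unsigned char st_other)
{
    unsigned field = (st_other & TOY_STO_PPC64_LOCAL_MASK)
                     >> TOY_STO_PPC64_LOCAL_BIT;
    return ((1u << field) >> 2) << 2;
}

/* Mirrors the last two conditions of the check quoted above: the reference
   was compiled as localentry:0 but the definition is not. */
static int toy_localentry_mismatch(unsigned char ref_other,
                                   unsigned char def_other)
{
    return (ref_other & TOY_STO_PPC64_LOCAL_MASK) == 0
           && (def_other & TOY_STO_PPC64_LOCAL_MASK) != 0;
}
```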

> The problem:
>
> As processor architectures continue to evolve there is a need to
> introduce multiple call abis or call abi extensions (e.g. because of
> a new SIMD instruction set extension).

An extremely low-tech solution would involve an attribute similar to
__attribute__ ((noplt)) which also mangles the name differently, so that
you get a linker failure if you try to call the function with the wrong
prototype.

> The compiler can generate code appropriately in the caller and callee
> based on the type of the function, however the dynamic linker does
> not know the call abi of a symbol and may clobber registers that it
> shouldn't at the first call during on demand symbol resolution with
> lazy binding. Fixing the dynamic linker to handle all call abis may
> be possible, but can be expensive or introduce backward compatibility
> problems (e.g. increase stack usage breaking existing binaries).

It is very hard to support truly arbitrary ABIs for lazy binding. You
need kernel support for that. An arbitrary ABI might use the stack
pointer register for something other than a stack, for example. (Such
things actually exist on i386; MLton is an example.) In such cases, you
cannot perform the symbol binding from the same process.

> There might be other tools (ltrace?) that need to know about call
> abis, so ideally symbols that don't follow the base PCS would be
> marked in some way. (But full type information is not needed as in
> DWARF.)

Tools can always save too much of the register file to stay compatible
with all ABIs.

Thanks,
Florian

Cary Coutant

Jan 15, 2019, 5:52:38 PM
to Generic System V Application Binary Interface
You could use a separate relocation as you suggest, or you could use a
processor-specific symbol type for functions with the special calling
conventions. HP-UX, in fact, uses STT_PARISC_MILLI for "millicode"
routines, which had non-standard calling conventions. (HP's ELF spec
reads: "Millicode routines are identified separately from standard
functions so that the linker can build export stubs for standard
functions but not for millicode routines.")

I think your use case matches the intended purpose of the symbol type
field. STT_FUNC designates a symbol as the entry point of a function
that can be assumed to follow the "normal" calling conventions for the
platform, enabling certain operations such as PLT redirection and lazy
binding. If you don't want to enable those, you could simply use
STT_OBJECT or STT_NOTYPE; if you want to enable similar, but different
operations, then you should use a new symbol type. (It is unfortunate
that there are so few values reserved between STT_LOOS and STT_HIPROC,
but there should be enough for you.)
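
A sketch of the symbol-type approach. The accessors and the numeric ranges follow the gABI definitions of st_info; the TOY_ prefix and the type TOY_STT_PROC_FUNC_NOW are hypothetical and used only to illustrate taking one value from the processor-specific range.

```c
/* Accessors for st_info, following the gABI definitions. */
#define TOY_ST_BIND(i)    ((i) >> 4)
#define TOY_ST_TYPE(i)    ((i) & 0xf)
#define TOY_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

enum {
    TOY_STB_GLOBAL = 1,
    TOY_STT_FUNC   = 2,
    TOY_STT_LOOS   = 10,   /* 10..12: OS-specific types */
    TOY_STT_HIOS   = 12,
    TOY_STT_LOPROC = 13,   /* 13..15: processor-specific types */
    TOY_STT_HIPROC = 15,
    /* Hypothetical type for functions that must not be bound lazily. */
    TOY_STT_PROC_FUNC_NOW = TOY_STT_LOPROC
};

static int toy_is_proc_specific_type(unsigned char st_info)
{
    unsigned t = TOY_ST_TYPE(st_info);
    return t >= TOY_STT_LOPROC && t <= TOY_STT_HIPROC;
}

/* In this model only plain STT_FUNC symbols are eligible for PLT
   redirection with lazy binding. */
static int toy_may_bind_lazily(unsigned char st_info)
{
    return TOY_ST_TYPE(st_info) == TOY_STT_FUNC;
}
```

The drawback raised later in the thread is visible here: every place in a tool that tests for STT_FUNC would need to learn about the new type as well.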

I'd be extremely reluctant to endorse the use of st_other bits for
this purpose. For one thing, I suspect that enough platforms are using
those unassigned bits for processor-specific meanings that they are,
for all practical purposes, unavailable to the gABI.

-cary

Szabolcs Nagy

Jan 16, 2019, 5:26:44 AM
to gener...@googlegroups.com, Cary Coutant, nd
On 15/01/2019 22:52, Cary Coutant wrote:
> You could use a separate relocation as you suggest, or you could use a
> processor-specific symbol type for functions with the special calling
> conventions. HP-UX, in fact, uses STT_PARISC_MILLI for "millicode"
> routines, which had non-standard calling conventions. (HP's ELF spec
> reads: "Millicode routines are identified separately from standard
> functions so that the linker can build export stubs for standard
> functions but not for millicode routines.")
>
> I think your use case matches the intended purpose of the symbol type
> field. STT_FUNC designates a symbol as the entry point of a function
> that can be assumed to follow the "normal" calling conventions for the
> platform, enabling certain operations such as PLT redirection and lazy
> binding. If you don't want to enable those, you could simply use
> STT_OBJECT or STT_NOTYPE; if you want to enable similar, but different
> operations, then you should use a new symbol type. (It is unfortunate
> that there are so few values reserved between STT_LOOS and STT_HIPROC,
> but there should be enough for you.)

that example is helpful.

i was considering using

STT_AARCH64_FUNC_NOW
STT_AARCH64_GNU_IFUNC_NOW

my concern was that on gnu systems there is already STT_GNU_IFUNC
which also needs such a value and any similar future extension.
(there are 3 OS specific values, one of which is used on GNU
and 3 processor specific values which we don't use yet).

this is risky in case of future extensions even if we don't try to
mark all different call abis, only base vs non-base PCS functions.

and both linker and dynamic linker code handles STT_{GNU_I}FUNC
specially in many cases most of which will need some change so
i thought adding such STT_ would require big changes in tools,
but if there is precedent that probably helps.

>
> I'd be extremely reluctant to endorse the use of st_other bits for
> this purpose. For one thing, I suspect that enough platforms are using
> those unassigned bits for processor-specific meanings that they are,
> for all practical purposes, unavailable to the gABI.

i see.. then probably the gABI should reserve some of those
bits to os/psABI use?

thanks.

Szabolcs Nagy

Jan 16, 2019, 6:04:12 AM
to gener...@googlegroups.com, Ali Bahrami, nd
a plt stub references a pltgot entry which has a jumpslot
relocation pointing to it.

they are also in a particular layout so at runtime the jumpslot
relocation can be found knowing the pltgot entry (needed for
symbol lookup).

the jumpslot relocations are lazy resolved because they are
in the DT_JMPREL, DT_PLTRELSZ range.

so to make a symbol bind now all we need is to change the
plt stub so it references some other got entry which has
another jumpslot relocation but this time it's outside of
the DT_JMPREL, DT_PLTRELSZ list, so it will be resolved at
load time.
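
a sketch of the rule this relies on, as a dynamic linker might apply it. the struct and function names are hypothetical; only DT_JMPREL and DT_PLTRELSZ are real dynamic tags, and a real loader has more conditions than this model.

```c
#include <stdint.h>

/* Hypothetical summary of the two dynamic entries that delimit the
   relocations eligible for lazy processing. */
struct toy_dyninfo {
    uintptr_t jmprel;    /* DT_JMPREL: address of the first PLT relocation */
    uintptr_t pltrelsz;  /* DT_PLTRELSZ: total size of them in bytes */
};

/* Returns nonzero if the relocation record at reloc_addr may be deferred
   until the first call. Anything outside the DT_JMPREL range, or any
   relocation when the module is bind now, is processed at load time. */
static int toy_reloc_is_lazy(const struct toy_dyninfo *d,
                             uintptr_t reloc_addr, int bind_now)
{
    if (bind_now)
        return 0;
    return reloc_addr >= d->jmprel
           && reloc_addr - d->jmprel < d->pltrelsz;
}
```

moving the jumpslot relocation of one symbol out of that range is what turns it into a load time relocation without any change to the loader.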

unfortunately the original pltgot entry and jumpslot relocation
have to be kept, not to disturb the layout for other plts, so
this is wasteful, but should work backward compatibly on all
standard conforming dynamic linkers. (objdump and other tools
that try to display or process plt stubs may find the original
jumpslot relocation instead of the new one but that's fine.)

(it's a bit of a hack as the bind now requirement is not
described in the elf format, but forced out by changing
the plt stub and its relocation)

>
> One way to add platform specific features like this is to
> introduce a new platform specific section type, using the
> sh_link field to point at the related symbol table, and
> probably with a related platform specific dynamic section
> tag that lets the runtime linker find that section. In Solaris,
> we have an OS-specific (ELFOSABI_SOLARIS) section named
> .SUNW_syminfo that works that way, which might serve as an
> example:
>
>     https://docs.oracle.com/cd/E37838_01/html/E36783/chapter7-17.html#scrolltoc
>
> The runtime linker finds this section via the DT_SYMINFO element
> in the .dynamic section. (Yes, this would better have been named
> DT_SUNW_SYMINFO, but it predates the gABI). This section has
> one element per symbol, in the same positions as the symbol
> table. The runtime linker combines the information from both
> sections to form its view of each symbol. .SUNW_syminfo was
> originally invented to support direct bindings, but as it has
> a flags field with sufficient room for expansion, we've since
> given it other uses as well.

yeah this is probably the most future proof solution,
but it requires more engineering than i hoped for
(i assume several tools need to know about such section
to be able to update the symbol indexes in case the
symbol table changes).

>
> Another, simpler, example of a section that parallels a
> symbol table is SHT_SUNW_versym (known as SHT_GNU_versym
> in the ELFOSABI_GNU world).
>
> If the issue is really about the runtime linker saving
> and restoring the right registers across PLTs, without
> encountering excess overhead, then I wonder if you can't
> solve it in some other manner, outside of ELF? For instance,
> sparcv9 PLTS can require saving and restoring the floating
> point registers, which has undeniable overhead, but the
> hardware sets the FPRS register when floating point is
> actually used, and the runtime linker uses that to skip
> the floating point registers in code that's not using them.
> On the other hand, for X86, we just bite the bullet and save
> SIMD related registers that may or may not be in use, rather
> than adding complexity.

unfortunately the architecture does not have a way to
efficiently tell the dynamic linker if some cpu extension
is in use.

i believe x86 has xsave which allows saving register state
such that it knows about what extensions are in use, however
it's still not optimal: can cause stack overflow during
lazy binding in existing binaries when they are run on
systems with bigger register state (the kind of problem
i want to avoid: on aarch64 sve register state can be very
big, depends on the underlying hw implementation).
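
to give a sense of scale, here is a rough size calculation. it assumes the architectural sve register file (32 vector registers, 16 predicate registers and the first-fault register, with predicates one eighth the width of a vector) and ignores alignment and any other state a resolver would save; the function name is made up for the example.

```c
/* vl_bits: sve vector length in bits, a multiple of 128 between
   128 and 2048. returns the raw size of the sve register state in bytes. */
static unsigned long toy_sve_state_bytes(unsigned vl_bits)
{
    unsigned long z = vl_bits / 8;    /* one vector register, in bytes */
    unsigned long p = vl_bits / 64;   /* one predicate register, in bytes */

    return 32 * z + 16 * p + p;       /* z0-z31, p0-p15, ffr */
}
```

with 128-bit vectors this is 546 bytes, with 2048-bit vectors 8736 bytes, so a resolver that saved everything would need stack space that depends on the hardware the binary happens to run on.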

thanks.

>
> - Ali
>

Szabolcs Nagy

Jan 16, 2019, 6:34:16 AM
to gener...@googlegroups.com, Florian Weimer, nd
On 15/01/2019 21:43, Florian Weimer wrote:
> * szabolcs nagy:
>
>> If that is not possible then i wonder what's the recommended way to
>> mark symbols in a psABI.
>
> POWER ELFv2 uses something in st_other to mark localentry functions:
>
> /* Check that optimized plt call stubs for localentry:0 functions
> are not being satisfied by a non-zero localentry symbol. */
> if (map->l_info[DT_PPC64(OPT)]
> && (map->l_info[DT_PPC64(OPT)]->d_un.d_val & PPC64_OPT_LOCALENTRY) != 0
> && refsym->st_info == ELFW(ST_INFO) (STB_GLOBAL, STT_FUNC)
> && (STO_PPC64_LOCAL_MASK & refsym->st_other) == 0
> && (STO_PPC64_LOCAL_MASK & sym->st_other) != 0)
> _dl_error_localentry (map, refsym);
>
>> The problem:
>>
>> As processor architectures continue to evolve there is a need to
>> introduce multiple call abis or call abi extensions (e.g. because of
>> a new SIMD instruction set extension).
>
> An extremely low-tech solution would involve an attribute similar to
> __attribute__ ((noplt)) which also mangles the name differently, so that
> you get a linker failure if you try to call the function with the wrong
> prototype.

yes, i tried noplt, but i could not make non-PIC code efficient
on aarch64: currently if the address of a function is taken
that introduces a PLT entry in a (non-PIC) executable, and to
avoid that GOT relocs are needed, and i didn't see a way to
optimize calls via GOT entries away in the linker in case of
static linking.

since i want to use this for simd math function calls that are
performance critical, i'd prefer a solution that does not
regress statically linked performance.

>
>> The compiler can generate code appropriately in the caller and callee
>> based on the type of the function, however the dynamic linker does
>> not know the call abi of a symbol and may clobber registers that it
>> shouldn't at the first call during on demand symbol resolution with
>> lazy binding. Fixing the dynamic linker to handle all call abis may
>> be possible, but can be expensive or introduce backward compatibility
>> problems (e.g. increase stack usage breaking existing binaries).
>
> It is very hard to support truly arbitrary ABIs for lazy binding. You
> need kernel support for that. An arbitrary ABI might use the stack
> pointer register for something else than a stack, for example. (Such
> things actually exist on i386; MLton is an example.) In such cases, you
> cannot perform the symbol binding from the same process.

ok. i'm not concerned with arbitrary abis, just sane ones
that can interoperate reasonably (i think on x86_64 icc
supports call abis that are currently not supported in the
glibc dynamic linker, but could be in principle, so the user
either has to link statically or use -z now to use those).

Szabolcs Nagy

Jan 16, 2019, 9:25:46 AM
to gener...@googlegroups.com, Florian Weimer, nd
On 16/01/2019 11:34, Szabolcs Nagy wrote:
> On 15/01/2019 21:43, Florian Weimer wrote:
>> An extremely low-tech solution would involve an attribute similar to
>> __attribute__ ((noplt)) which also mangles the name differently, so that
>> you get a linker failure if you try to call the function with the wrong
>> prototype.
>
> yes, i tried noplt, but i could not make non-PIC code efficient
> on aarch64: currently if the address of a function is taken
> that introduces a PLT entry in a (non-PIC) executable, and to
> avoid that GOT relocs are needed, and i didn't see a way to
> optimize calls via GOT entries away in the linker in case of
> static linking.
>
> since i want to use this for simd math function calls that are
> performance critical, i'd prefer a solution that does not
> regresses static linked performance.

ah sorry, the problem was not just performance,
but that the linker needs to know in case the address
of a function is taken that it should not create a
plt and use that as the address (but emit dynamic
relocs as with -shared linking of pic code).

so this approach already needs a way to propagate per
symbol information from the compiler to the linker
via relocatable objects.

Cary Coutant

Jan 16, 2019, 3:06:06 PM
to Szabolcs Nagy, gener...@googlegroups.com, nd
> i was considering using
>
> STT_AARCH64_FUNC_NOW
> STT_AARCH64_GNU_IFUNC_NOW
>
> my concern was that on gnu systems there is already STT_GNU_IFUNC
> which also needs such a value and any similar future extension.
> (there are 3 OS specific values, one of which is used on GNU
> and 3 processor specific values which we don't use yet).
>
> this is risky in case of future extensions even if we don't try to
> mark all different call abis, only base vs non-base PCS functions.
>
> and both linker and dynamic linker code handles STT_{GNU_I}FUNC
> specially in many cases most of which will need some change so
> i thought adding such STT_ would require big changes in tools,
> but if there is precedent that probably helps.

I wouldn't worry about exhausting the supply of reserved values --
with that logic, no one would ever use any of the ones that are still
available. Such extensions are rare, and we'll come up with something
when and if we need to.

One thing to consider -- do you really need to support the combination
of IFUNC and your new symbol type?

-cary

Szabolcs Nagy

Jan 18, 2019, 2:32:19 PM
to gener...@googlegroups.com, Florian Weimer, nd
On 15/01/2019 21:43, Florian Weimer wrote:
> * szabolcs nagy:
>
>> If that is not possible then i wonder what's the recommended way to
>> mark symbols in a psABI.
>
> POWER ELFv2 uses something in st_other to mark localentry functions:
>
> /* Check that optimized plt call stubs for localentry:0 functions
> are not being satisfied by a non-zero localentry symbol. */
> if (map->l_info[DT_PPC64(OPT)]
> && (map->l_info[DT_PPC64(OPT)]->d_un.d_val & PPC64_OPT_LOCALENTRY) != 0
> && refsym->st_info == ELFW(ST_INFO) (STB_GLOBAL, STT_FUNC)
> && (STO_PPC64_LOCAL_MASK & refsym->st_other) == 0
> && (STO_PPC64_LOCAL_MASK & sym->st_other) != 0)
> _dl_error_localentry (map, refsym);

if some processor targets already snatched st_other values,
then i might do the same to solve my issue on aarch64.
(but i still think marking non-base-pcs calls is useful
more generically, at least on gnu os level.)

i'm looking at STT_ symbol type and R_ static reloc based
solutions too, but st_other may cause the least amount of
problems even though it's not strictly gABI conformant.

for a gnu-gabi solution, probably a new syminfo like section
is the cleanest, but it would require designing a lot of
details about how to update/merge/interpret such a section.

H.J. Lu

Jan 18, 2019, 2:53:24 PM
to Generic System V Application Binary Interface, Florian Weimer, nd
On x86, we are doing our best to avoid lazy binding
for -fno-plt and noplt attribute, even when function
address is used. Can you try it with GCC 8 on x86?

--
H.J.

Florian Weimer

Jan 18, 2019, 2:59:26 PM
to H.J. Lu, Generic System V Application Binary Interface, nd
* H. J. Lu:
I think the x86 approach will not work for AArch64 because the PLT and
non-PLT call sites are too different (and there is no support for
PC-relative addressing with sufficiently large displacements).

Thanks,
Florian

Szabolcs Nagy

Jan 21, 2019, 5:53:19 AM
to gener...@googlegroups.com, Florian Weimer, H.J. Lu, nd
noplt does not avoid lazy binding on x86 either.

simple calls work (on aarch64 that already fails, the closest
we can get is to use GOT indirect calls for noplt, which is:

- address computation of the GOT entry,
- load of the function pointer,
- call or jump,

and to be able to relax this in the linker into a direct call
when the address is known at link time, we would have to
always keep these instructions together which means slower
code e.g. for calls in a loop. on x86 this is a single insn).

however taking the address of a noplt function still does not
work without further changes, i don't see how the linker can
avoid plt in the funcptrs case below:

void f_default(void);
void f_noplt(void) __attribute__((noplt));

void (*p_default)(void) = f_default;
void (*p_noplt)(void) = f_noplt;

void g(void *);

int ptrs(void)
{
    g((void*)p_default);
    g((void*)p_noplt); // ??
    return 0;
}

int funcptrs(void)
{
    g((void*)f_default);
    g((void*)f_noplt); // needs plt for address
    return 0;
}

int calls(void)
{
    f_default();
    f_noplt(); // works without plt
    return 0;
}


with -O -fno-PIC compiles to

ptrs:
        subq $8, %rsp
        movq p_default(%rip), %rdi
        call g
        movq p_noplt(%rip), %rdi // lazy unless R_X86_64_64
        call g
        movl $0, %eax
        addq $8, %rsp
        ret
funcptrs:
        subq $8, %rsp
        movl $f_default, %edi
        call g
        movl $f_noplt, %edi // needs an address (PLT)
        call g
        movl $0, %eax
        addq $8, %rsp
        ret
calls:
        subq $8, %rsp
        call f_default
        call *f_noplt@GOTPCREL(%rip) // ok, non-lazy
        movl $0, %eax
        addq $8, %rsp
        ret
p_noplt:
        .quad f_noplt
p_default:
        .quad f_default

Florian Weimer

Jan 21, 2019, 6:12:54 AM
to Szabolcs Nagy, gener...@googlegroups.com, H.J. Lu, nd
* Szabolcs Nagy:

> however taking the address of a noplt function still does not
> work without further changes, i don't see how the linker can
> avoid plt in the funcptrs case below:

You need to look at the final link, not the relocatable object file. My
understanding is that address-significant function references are never
lazily bound on x86, with or without PLT.

Thanks,
Florian

H.J. Lu

Jan 21, 2019, 8:25:30 AM
to Florian Weimer, Szabolcs Nagy, gener...@googlegroups.com, nd
That is correct. See "Linker Optimization" chapter in x86-64 psABI.

--
H.J.

Szabolcs Nagy

Jan 21, 2019, 10:59:23 AM
to H.J. Lu, Florian Weimer, nd, gener...@googlegroups.com
it only works for direct calls or if the compiler
can optimize a call via funcptr into a direct call
(e.g. compiler can see p=&func when compiling p()).

i don't need to look at the final link: if both the
definition and the indirect call via funcptr are in
an extern module, then the linker cannot possibly
fix this up: it needs to put an absolute address in
the (non-pic) code and that will be the PLT address.

but here is a complete example with single stepping
gdb into the lazy resolver:

$ cat lib.c
__attribute__((noplt)) void f_noplt(void) {}
void g(void p(void)) {p();}
$ cat main.c
void f_noplt(void) __attribute__((noplt));
void (*p_noplt)(void) = f_noplt;
void g(void (*)(void));

int main()
{
    g(p_noplt); // lazy: linker sets p_noplt to PLT address
    g(f_noplt); // lazy: linker sets mov immediate to PLT address
    f_noplt(); // non-lazy
    return 0;
}
$ gcc -c -O -fPIC -o lib.o lib.c
$ gcc -shared -o lib.so lib.o
$ gcc -c -O -fno-PIC -o main.o main.c
$ gcc -no-pie -o main main.o ./lib.so
$ readelf -aW main | grep f_noplt
0000000000600ff0 0000000500000006 R_X86_64_GLOB_DAT 0000000000400530 f_noplt + 0
0000000000601018 0000000500000007 R_X86_64_JUMP_SLOT 0000000000400530 f_noplt + 0
5: 0000000000400530 0 FUNC GLOBAL DEFAULT UND f_noplt
49: 0000000000400530 0 FUNC GLOBAL DEFAULT UND f_noplt
$ objdump -rdw main |grep -A9 '<main>:'
0000000000400657 <main>:
400657: 48 83 ec 08 sub $0x8,%rsp
40065b: 48 8b 3d de 09 20 00 mov 0x2009de(%rip),%rdi # 601040 <p_noplt>
400662: e8 e9 fe ff ff callq 400550 <g@plt>
400667: bf 30 05 40 00 mov $0x400530,%edi
40066c: e8 df fe ff ff callq 400550 <g@plt>
400671: ff 15 79 09 20 00 callq *0x200979(%rip) # 600ff0 <f_noplt>
400677: b8 00 00 00 00 mov $0x0,%eax
40067c: 48 83 c4 08 add $0x8,%rsp
400680: c3 retq
$ gdb --quiet ./main
Reading symbols from ./main...(no debugging symbols found)...done.
(gdb) b g
Breakpoint 1 at 0x400550
(gdb) r
Starting program: /data/A/noplt/main

Breakpoint 1, 0x00007ffff7bd85ec in g () from ./lib.so
(gdb) si
0x00007ffff7bd85f0 in g () from ./lib.so
(gdb)
0x0000000000400530 in f_noplt@plt ()
(gdb)
0x0000000000400536 in f_noplt@plt ()
(gdb)
0x000000000040053b in f_noplt@plt ()
(gdb)
0x0000000000400520 in ?? ()
(gdb)
0x0000000000400526 in ?? ()
(gdb)
0x00007ffff7def200 in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb)

H.J. Lu

Jan 21, 2019, 11:53:24 AM
to Szabolcs Nagy, Florian Weimer, nd, gener...@googlegroups.com
ix86_force_load_from_GOT_p failed to check noplt attribute:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954

I am testing the enclosed patch.

--
H.J.
---
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8abff99cc62..9861db74ca2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15189,10 +15189,13 @@ ix86_force_load_from_GOT_p (rtx x)
 {
   return ((TARGET_64BIT || HAVE_AS_IX86_GOT32X)
           && !TARGET_PECOFF && !TARGET_MACHO
-          && !flag_plt && !flag_pic
+          && !flag_pic
           && ix86_cmodel != CM_LARGE
           && GET_CODE (x) == SYMBOL_REF
           && SYMBOL_REF_FUNCTION_P (x)
+          && (!flag_plt
+              || lookup_attribute ("noplt",
+                                   DECL_ATTRIBUTES (SYMBOL_REF_DECL (x))))
           && !SYMBOL_REF_LOCAL_P (x));
 }

Szabolcs Nagy

Jan 21, 2019, 1:44:13 PM
to H.J. Lu, nd, Florian Weimer, gener...@googlegroups.com
On 21/01/2019 16:52, H.J. Lu wrote:
> ix86_force_load_from_GOT_p failed to check noplt attribute:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954
>
> I am testing the enclosed patch.

if &f_noplt is compiled like with -fPIC then it will go via
a GOT entry, which i think works (because linker will emit
a R_*_GLOB_DAT dynamic reloc instead of filling it with the
PLT address).

but p_noplt (global pointer initialized to &f_noplt) still
does not work: the static linker cannot tell the difference
between p_noplt and p_default in:

.data
p_noplt:
.quad f_noplt
p_default:
.quad f_default

so either both get a R_*_64 reloc (like -pie linking)
or both get the PLT address (current -no-pie linking i
think, which means lazy binding).

on x86_64 always using a dynamic reloc here may not be too
bad (but still requires linker changes that will affect
code which does not use noplt), on aarch64 GOT accesses
cannot be relaxed into efficient code for static linking,
so i wasn't looking into this further.
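
the choice the static linker faces for a data word initialized with a function address can be written as a small decision function. this is only a model of the discussion: the enum, the function and the per-symbol "marked" input are hypothetical, and real linkers take more conditions into account.

```c
enum toy_ref_result {
    TOY_USE_PLT_ADDRESS,     /* word is filled with the PLT address: lazy */
    TOY_EMIT_DYNAMIC_RELOC   /* word gets an R_*_64 style dynamic reloc */
};

/* is_pie: the output is position independent.
   marked_bind_now: hypothetical per-symbol marking that reaches the
   linker through the relocatable object. */
static enum toy_ref_result toy_resolve_data_ref(int is_pie,
                                                int marked_bind_now)
{
    if (is_pie || marked_bind_now)
        return TOY_EMIT_DYNAMIC_RELOC;
    return TOY_USE_PLT_ADDRESS;
}
```

without the marking the second input is always 0, so in a -no-pie link p_noplt and p_default necessarily get the same treatment.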

H.J. Lu

Jan 21, 2019, 2:18:53 PM
to Szabolcs Nagy, nd, Florian Weimer, gener...@googlegroups.com
On Mon, Jan 21, 2019 at 10:44 AM Szabolcs Nagy <Szabol...@arm.com> wrote:
>
> On 21/01/2019 16:52, H.J. Lu wrote:
> > ix86_force_load_from_GOT_p failed to check noplt attribute:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954
> >
> > I am testing the enclosed patch.
>
> if &f_noplt is compiled like with -fPIC then it will go via
> a GOT entry, which i think works (because linker will emit
> a R_*_GLOB_DAT dynamic reloc instead of filling it with the
> PLT address).
>
> but p_noplt (global pointer initialized to &f_noplt) still
> does not work: the static linker cannot tell the difference
> between p_noplt and p_default in:
>
> .data
> p_noplt:
> .quad f_noplt
> p_default:
> .quad f_default
>
> so either both get a R_*_64 reloc (like -pie linking)
> or both get the PLT address (current -no-pie linking i
> think, which means lazy binding).

On x86, it means PLT, but not lazy binding, since there should be no
R_*_JUMP_SLOT.

> on x86_64 always using a dynamic reloc here may not be too
> bad (but still requires linker changes that will affect
> code which does not use noplt), on aarch64 GOT accesses
> cannot be relaxed into efficient code for static linking,
> so i wasn't looking into this further.
>
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 8abff99cc62..9861db74ca2 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -15189,10 +15189,13 @@ ix86_force_load_from_GOT_p (rtx x)
> > {
> > return ((TARGET_64BIT || HAVE_AS_IX86_GOT32X)
> > && !TARGET_PECOFF && !TARGET_MACHO
> > - && !flag_plt && !flag_pic
> > + && !flag_pic
> > && ix86_cmodel != CM_LARGE
> > && GET_CODE (x) == SYMBOL_REF
> > && SYMBOL_REF_FUNCTION_P (x)
> > + && (!flag_plt
> > + || lookup_attribute ("noplt",
> > + DECL_ATTRIBUTES (SYMBOL_REF_DECL (x))))
> > && !SYMBOL_REF_LOCAL_P (x));
> > }
>


--
H.J.

Szabolcs Nagy

Jan 22, 2019, 6:32:00 AM
to H.J. Lu, nd, Florian Weimer, gener...@googlegroups.com
On 21/01/2019 19:18, H.J. Lu wrote:
> On Mon, Jan 21, 2019 at 10:44 AM Szabolcs Nagy <Szabol...@arm.com> wrote:
>> On 21/01/2019 16:52, H.J. Lu wrote:
>>> ix86_force_load_from_GOT_p failed to check noplt attribute:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954
>>>
>>> I am testing the enclosed patch.
>>
>> if &f_noplt is compiled like with -fPIC then it will go via
>> a GOT entry, which i think works (because linker will emit
>> a R_*_GLOB_DAT dynamic reloc instead of filling it with the
>> PLT address).
>>
>> but p_noplt (global pointer initialized to &f_noplt) still
>> does not work: the static linker cannot tell the difference
>> between p_noplt and p_default in:
>>
>> .data
>> p_noplt:
>> .quad f_noplt
>> p_default:
>> .quad f_default
>>
>> so either both get a R_*_64 reloc (like -pie linking)
>> or both get the PLT address (current -no-pie linking i
>> think, which means lazy binding).
>
> On x86, it means PLT, but not lazy binding, since there should be no
> R_*_JUMP_SLOT.

it seems p_noplt does not create a PLT on x86_64
with the bfd linker.

p_noplt gets an R_*_64 dynamic reloc already if there
is no PLT otherwise, so with your gcc patch noplt
disables lazy binding.

but e.g. x86_64 ld.gold behaves as i expected above:
p_noplt forces a PLT creation and gets initialized
to the PLT address. this is the linker behaviour on
aarch64, which means disabling lazy binding requires
a linker change; a compiler fix is not enough.

(but we went off topic)