RFC: Dynamic tags for Procedure Linkage Table

H.J. Lu

Apr 21, 2023, 12:39:30 PM
to x86-64-abi
Define dynamic tags for Procedure Linkage Table (PLT):

#define DT_X86_64_PLT (DT_LOPROC + 0)
#define DT_X86_64_PLTSZ (DT_LOPROC + 1)
#define DT_X86_64_PLTENT (DT_LOPROC + 2)

DT_X86_64_PLT: The address of PLT.
DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.

Since the r_addend field of R_X86_64_JUMP_SLOT relocation is unused,
it is repurposed to store the offset of the indirect branch in the corresponding
PLT entry. Together, they can be used to manipulate PLT entries.
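
As a rough illustration of how a consumer might pick up these tags, the sketch below walks a dynamic array and records the three values. The types `dyn_entry` and `plt_info` and the function `read_plt_tags` are hypothetical stand-ins (a simplified mirror of `Elf64_Dyn`), not code from glibc or binutils. The tag values are the ones from this first draft; a later message in the thread changes DT_X86_64_PLTENT to DT_LOPROC + 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the constants normally taken from <elf.h>. */
#define DT_NULL   0
#define DT_LOPROC 0x70000000

/* Tag values as proposed in this first draft. */
#define DT_X86_64_PLT    (DT_LOPROC + 0)
#define DT_X86_64_PLTSZ  (DT_LOPROC + 1)
#define DT_X86_64_PLTENT (DT_LOPROC + 2)

/* Simplified mirror of Elf64_Dyn: the d_un union is collapsed to one field. */
typedef struct {
    int64_t  d_tag;
    uint64_t d_val;
} dyn_entry;

/* Hypothetical container for whatever the scan finds. */
typedef struct {
    uint64_t plt, pltsz, pltent;
    int have_plt, have_pltsz, have_pltent;
} plt_info;

/* Walk the array up to the DT_NULL terminator and collect the PLT tags. */
static plt_info read_plt_tags(const dyn_entry *dyn)
{
    plt_info info = {0};
    assert(dyn != NULL);
    for (; dyn->d_tag != DT_NULL; ++dyn) {
        switch (dyn->d_tag) {
        case DT_X86_64_PLT:
            info.plt = dyn->d_val;
            info.have_plt = 1;
            break;
        case DT_X86_64_PLTSZ:
            info.pltsz = dyn->d_val;
            info.have_pltsz = 1;
            break;
        case DT_X86_64_PLTENT:
            info.pltent = dyn->d_val;
            info.have_pltent = 1;
            break;
        default:
            break; /* unrelated or unknown tags are skipped */
        }
    }
    return info;
}
```

The `have_*` flags are kept separate from the values so that a missing tag can be told apart from a tag whose value is zero.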

--
H.J.

Florian Weimer

Apr 21, 2023, 12:55:24 PM
to H.J. Lu, x86-64-abi
* H. J. Lu:
DT_X86_64_PLTENT is optional, right? There's no fundamental requirement
for PLT entries to have the same size.

The possible transformations are limited because for some applications,
it's required to preserve the accuracy of unwinding information. And
you need some sort of commitment from the toolchain that it won't
generate any jumps into the middle of a PLT entry.

Thanks,
Florian

Carlos O'Donell

Apr 21, 2023, 1:12:49 PM
to H.J. Lu, x86-64-abi
On 4/21/23 12:38, H.J. Lu wrote:
> Define dynamic tags for Procedure Linkage Table (PLT):
>
> #define DT_X86_64_PLT (DT_LOPROC + 0)
> #define DT_X86_64_PLTSZ (DT_LOPROC + 1)
> #define DT_X86_64_PLTENT (DT_LOPROC + 2)
>
> DT_X86_64_PLT: The address of PLT.
> DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
> DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.
>
> Since the r_addend field of R_X86_64_JUMP_SLOT relocation is unused,

... that needs additional documentation adjustment too right?

> it is repurposed to store the offset of the indirect branch in the corresponding
> PLT entry. Together, they can be used to manipulate PLT entries

What kind of manipulations are you thinking about?

--
Cheers,
Carlos.

H.J. Lu

Apr 21, 2023, 1:39:59 PM
to Florian Weimer, x86-64-abi
On Fri, Apr 21, 2023 at 9:55 AM Florian Weimer <fwe...@redhat.com> wrote:
>
> * H. J. Lu:
>
> > Define dynamic tags for Procedure Linkage Table (PLT):
> >
> > #define DT_X86_64_PLT (DT_LOPROC + 0)
> > #define DT_X86_64_PLTSZ (DT_LOPROC + 1)
> > #define DT_X86_64_PLTENT (DT_LOPROC + 2)
> >
> > DT_X86_64_PLT: The address of PLT.
> > DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
> > DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.
> >
> > Since the r_addend field of R_X86_64_JUMP_SLOT relocation is unused,
> > it is repurposed to store the offset of the indirect branch in the
> > corresponding PLT entry. Together, they can be used to manipulate PLT
> > entries
>
> DT_X86_64_PLTENT is optional, right? There's no fundamental requirement
> for PLT entries have the same size.

In order to rewrite the PLT entry, we need to know the size of each PLT entry
to avoid PLT entry overflow. If PLT entries don't have the same size, this tag
should be omitted.

> The possible transformations are limited because for some applications,
> it's required to preserve the accuracy of unwinding information. And
> you need some sort of commitment from the toolchain that it won't
> generate any jumps into the middle of a PLT entry.
>

True. The only valid entry point of the PLT entry should be the start of
the PLT entry.
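
One way to read the role of the entry size: with a uniform size, an address can be mapped to an entry index, and anything that does not land on an entry start can be rejected. The helper below is a hypothetical sketch (`plt_entry_index` is not an existing API) and assumes the region described by the tags holds nothing but uniform entries.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the index of the PLT entry that starts at `addr`, or -1 if `addr`
 * lies outside the PLT or points into the middle of an entry.
 * plt, pltsz and pltent correspond to the values of the three proposed tags.
 */
static int64_t plt_entry_index(uint64_t plt, uint64_t pltsz,
                               uint64_t pltent, uint64_t addr)
{
    assert(pltent != 0);
    if (addr < plt || addr - plt >= pltsz)
        return -1;                  /* outside the PLT */
    uint64_t off = addr - plt;
    if (off % pltent != 0)
        return -1;                  /* not the first byte of an entry */
    return (int64_t)(off / pltent);
}
```

With a PLT at 0x1050 of size 0x30 and 16-byte entries, 0x1060 maps to index 1, while 0x1054 is rejected because it is not an entry start.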

--
H.J.

H.J. Lu

Apr 21, 2023, 1:41:56 PM
to Carlos O'Donell, x86-64-abi
On Fri, Apr 21, 2023 at 10:12 AM Carlos O'Donell <car...@redhat.com> wrote:
>
> On 4/21/23 12:38, H.J. Lu wrote:
> > Define dynamic tags for Procedure Linkage Table (PLT):
> >
> > #define DT_X86_64_PLT (DT_LOPROC + 0)
> > #define DT_X86_64_PLTSZ (DT_LOPROC + 1)
> > #define DT_X86_64_PLTENT (DT_LOPROC + 2)
> >
> > DT_X86_64_PLT: The address of PLT.
> > DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
> > DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.
> >
> > Since the r_addend field of R_X86_64_JUMP_SLOT relocation is unused,
>
> ... that needs additional documentation adjustment too right?

Correct.

> > it is repurposed to store the offset of the indirect branch in the corresponding
> > PLT entry. Together, they can be used to manipulate PLT entries
>
> What kind of manipulations are you thinking about?
>

When lazy binding is disabled, ld.so may rewrite the indirect branch with
the direct branch if possible.


--
H.J.

Carlos O'Donell

Apr 21, 2023, 3:33:19 PM
to H.J. Lu, x86-64-abi
There are a couple of combinations that could be in effect at this point.

To which PLT does this apply: ".plt" or ".plt.sec"?

Did we have the ".plt.sec" documentation in place already?

--
Cheers,
Carlos.

H.J. Lu

Apr 21, 2023, 4:45:13 PM
to Carlos O'Donell, x86-64-abi
These dynamic tags apply to PLT with branches to external
functions via GOT. For CET enabled applications, it applies
to

Disassembly of section .plt.sec:

0000000000001050 <_dl_catch_exception@plt>:
1050: f3 0f 1e fa endbr64
1054: ff 25 5e 2f 03 00 jmp *0x32f5e(%rip) # 33fb8 <_dl_catch_exception@@GLIBC_PRIVATE+0x1afe8>
105a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)

not

Disassembly of section .plt:

0000000000001000 <.plt>:
....
1010: f3 0f 1e fa endbr64
1014: 68 00 00 00 00 push $0x0
1019: e9 e2 ff ff ff jmp 1000 <GLIBC_2.2.5+0x1000>
101e: 66 90 xchg %ax,%ax


--
H.J.

Fangrui Song

Apr 21, 2023, 4:57:23 PM
to H.J. Lu, Carlos O'Donell, x86-64-abi
On Fri, Apr 21, 2023 at 1:45 PM H.J. Lu <hjl....@gmail.com> wrote:
>
> On Fri, Apr 21, 2023 at 12:33 PM Carlos O'Donell <car...@redhat.com> wrote:
> >
> > On 4/21/23 13:41, H.J. Lu wrote:
> > > On Fri, Apr 21, 2023 at 10:12 AM Carlos O'Donell <car...@redhat.com> wrote:
> > >>
> > >> On 4/21/23 12:38, H.J. Lu wrote:
> > >>> Define dynamic tags for Procedure Linkage Table (PLT):
> > >>>
> > >>> #define DT_X86_64_PLT (DT_LOPROC + 0)
> > >>> #define DT_X86_64_PLTSZ (DT_LOPROC + 1)
> > >>> #define DT_X86_64_PLTENT (DT_LOPROC + 2)
> > >>>
> > >>> DT_X86_64_PLT: The address of PLT.
> > >>> DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
> > >>> DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.
> > >>>
> > >>> Since the r_addend field of R_X86_64_JUMP_SLOT relocation is unused,

(I'd suggest writing some rationale, but thankfully the replies have
mentioned PLT rewriting for rtld.)

Seems fine. There is a precedent on PowerPC64.
DT_PPC64_GLINK points to 32 bytes before the first lazy symbol
resolution stub (b 0x10000280 <__glink_PLTresolve>), in the middle of
__glink_PLTresolve.
Do you mean that with eager binding (lazy binding disabled), rtld may
rewrite the .plt code sequence to change indirect branches to direct
branches (perhaps for the main executable and immediately loaded shared
objects)?
Does that mean .plt is writable by rtld?


--
宋方睿

Jan Beulich

Apr 24, 2023, 2:50:38 AM
to H.J. Lu, x86-64-abi
On 21.04.2023 18:38, H.J. Lu wrote:
> Define dynamic tags for Procedure Linkage Table (PLT):
>
> #define DT_X86_64_PLT (DT_LOPROC + 0)
> #define DT_X86_64_PLTSZ (DT_LOPROC + 1)
> #define DT_X86_64_PLTENT (DT_LOPROC + 2)

I guess the last one wants to be (DT_LOPROC + 3), to satisfy the even/odd
purposing of entries that the gABI specifies for entries outside of the
"two special compatibility ranges"?
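
The even/odd convention referred to here pairs even tag values with d_ptr (an address) and odd values with d_val (an integer). A small sketch of that reading, with `dyn_kind` and `dyn_kind_by_parity` as hypothetical names; whether the convention binds processor-specific tags at all is a question for the gABI text itself, and the sketch takes no position on it.

```c
#include <assert.h>
#include <stdint.h>

#define DT_LOPROC 0x70000000

typedef enum { DYN_USES_PTR, DYN_USES_VAL } dyn_kind;

/* Apply the parity convention: even tag -> d_ptr, odd tag -> d_val. */
static dyn_kind dyn_kind_by_parity(int64_t tag)
{
    assert(tag >= 0);
    return (tag % 2 == 0) ? DYN_USES_PTR : DYN_USES_VAL;
}
```

Under this reading DT_LOPROC + 0 (an address) and DT_LOPROC + 1 (a size) fit, DT_LOPROC + 2 would be classed as an address although it holds a size, and DT_LOPROC + 3 would again be classed as an integer.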

Jan

Michael Matz

Apr 24, 2023, 10:44:38 AM
to H.J. Lu, x86-64-abi
Hello,

On Fri, 21 Apr 2023, H.J. Lu wrote:

> Define dynamic tags for Procedure Linkage Table (PLT):
>
> #define DT_X86_64_PLT (DT_LOPROC + 0)
> #define DT_X86_64_PLTSZ (DT_LOPROC + 1)
> #define DT_X86_64_PLTENT (DT_LOPROC + 2)
>
> DT_X86_64_PLT: The address of PLT.
> DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
> DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.

Since the introduction of multiple PLT formats "the address of PLT" is
ambiguous. So you need to specify which. I see that replies in this
thread tried to clarify, but please put it into the full text of any
proposal. (And heed the remark from Jan about non-pointer dynamic tags to
have odd values)

> Since the r_addend field of R_X86_64_JUMP_SLOT relocation is unused, it
> is repurposed to store the offset of the indirect branch in the
> corresponding PLT entry. Together, they can be used to manipulate PLT
> entries

Woah, wait! Adding dynamic tags is one thing. Repurposing r_addend of
relocations is a completely different thing. Please don't mix the two in
one proposal. Or, if you want to have both in the same proposal then it
needs to be _much_ more complete. E.g. it would need to contain some
indication of which consumers were surveyed in order to see what effect
such change would have. I can easily imagine some loaders to error out on
r_addend being non-zero, for instance. And allowing manipulation of PLT
itself also seems to warrant a little more discussion than just mentioning
it in passing in half a sentence. (which PLT? Is it therefore moved into
the RELRO segment, or even into the RW segment? What are the security
implications, perhaps especially for .plt.sec?) And so on.


Ciao,
Michael.

H.J. Lu

Apr 24, 2023, 6:54:53 PM
to Fangrui Song, Carlos O'Donell, x86-64-abi
Yes.

> Does that mean .plt is writable by rtld?
>

Not exactly. ld.so will change PLT back to readonly after PLT rewrite.


--
H.J.

H.J. Lu

Apr 24, 2023, 6:56:55 PM
to Jan Beulich, x86-64-abi
On Sun, Apr 23, 2023 at 11:50 PM Jan Beulich <jbeu...@suse.com> wrote:
>
> On 21.04.2023 18:38, H.J. Lu wrote:
> > Define dynamic tags for Procedure Linkage Table (PLT):
> >
> > #define DT_X86_64_PLT (DT_LOPROC + 0)
> > #define DT_X86_64_PLTSZ (DT_LOPROC + 1)
> > #define DT_X86_64_PLTENT (DT_LOPROC + 2)
>
> I guess the last one wants to be (DT_LOPROC + 3), to satisfy the even/odd

You are right. It should be

#define DT_X86_64_PLTENT (DT_LOPROC + 3)

> purposing of entries that the gABI specifies for entries outside of the
> "two special compatibility ranges"?
>
> Jan
>
> > DT_X86_64_PLT: The address of PLT.
> > DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
> > DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.
> >
> > Since the r_addend field of R_X86_64_JUMP_SLOT relocation is unused,
> > it is repurposed to store the offset of the indirect branch in the corresponding
> > PLT entry. Together, they can be used to manipulate PLT entries
> >
>

Thanks.

--
H.J.

H.J. Lu

Apr 24, 2023, 7:00:54 PM
to Michael Matz, x86-64-abi
True. The older versions of glibc add r_addend to the target address.

> itself also seems to warrant a little more discussion than just mentioning
> it in passing in half a sentence. (which PLT? Is it therefore moved into
> the RELRO segment, or even into the RW segment? What are the security
> implications, perhaps especially for .plt.sec?) And so on.
>

I will incorporate them into my proposal.

Thanks.

--
H.J.

Michael Matz

Apr 26, 2023, 9:30:25 AM
to H.J. Lu, Fangrui Song, Carlos O'Donell, x86-64-abi
Hey,

On Mon, 24 Apr 2023, H.J. Lu wrote:

> > Does that mean .plt is writable by rtld?
> >
>
> Not exactly. ld.so will change PLT back to readonly after PLT rewrite.

Unsharing all PLT pages in its course :-/ You will also have problems with
processes that are started under certain seccomp filters. E.g. systemd's
MemoryDenyWriteExecute=true will install a filter that EACCESes each
mprotect that wants to make something PROT_EXEC (IOW: you can't ever make
anything executable after the fact, only via the initial mmap, but that
won't be writable then, as the same filter also denies
PROT_EXEC|PROT_WRITE mappings).

Obviously ld.so can be prepared for this and simply not do the rewriting
then. But ... is this all really such a wonderful optimization to be
worth the trouble?


Ciao,
Michael.

Florian Weimer

Apr 26, 2023, 11:04:51 AM
to Michael Matz, H.J. Lu, Fangrui Song, Carlos O'Donell, x86-64-abi
* Michael Matz:

> Hey,
>
> On Mon, 24 Apr 2023, H.J. Lu wrote:
>
>> > Does that mean .plt is writable by rtld?
>> >
>>
>> Not exactly. ld.so will change PLT back to readonly after PLT rewrite.
>
> Unsharing all PLT pages in its course :-/ You will also have problems with
> processes that are started under certain seccomp filters. E.g. systemd's
> MemoryDenyWriteExecute=true will install a filter that EACCESes each
> mprotect that wants to make something PROT_EXEC (IOW: you can't ever make
> anything executable after the fact, only via the initial mmap, but that
> won't be writable then, as the same filter also denies
> PROT_EXEC|PROT_WRITE mappings).

It's possible to do the rewriting on the side and put it in place with
mremap *after* the problematic mprotect call (which may fail), so at
least ld.so can deal with the failure gracefully and proceed without the
optimization (unlike real text relocations).
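
A minimal Linux-only sketch of this "rewrite on the side" idea, under stated assumptions: `replace_via_side_copy` is a hypothetical helper, `target` is page-aligned and readable, `len` is a multiple of the page size, and `final_prot` stands for the protection the real code would want (PROT_READ | PROT_EXEC in the ld.so case). This is not how glibc does or would do it; it only shows the ordering, where the call that may be denied happens before the original mapping is touched.

```c
#define _GNU_SOURCE             /* for mremap() and MREMAP_FIXED */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Build a patched copy of [target, target+len) in a fresh anonymous mapping,
 * give it its final protection, and only then move it over the original.
 * Returns 0 on success, -1 if the optimization had to be skipped; in the
 * -1 case the original mapping is left untouched.
 */
static int replace_via_side_copy(void *target, size_t len, size_t off,
                                 const uint8_t *bytes, size_t n,
                                 int final_prot)
{
    assert(off <= len && n <= len - off);

    void *side = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (side == MAP_FAILED)
        return -1;

    memcpy(side, target, len);                 /* start from the original */
    memcpy((uint8_t *)side + off, bytes, n);   /* apply the rewrite */

    /* The call a filter may deny; failing here costs nothing. */
    if (mprotect(side, len, final_prot) != 0) {
        munmap(side, len);
        return -1;
    }

    /* Move the finished copy over the original mapping. */
    if (mremap(side, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, target)
        == MAP_FAILED) {
        munmap(side, len);
        return -1;
    }
    return 0;
}
```

Because the copy is anonymous memory, the replaced range is no longer backed by the file, so this variant still unshares the pages.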

There's the memfd_create loophole (which allows creating code at run
time even if mprotect is blocked and all writable file systems are
mounted noexec), but I think it's not worth exploring that because it
seems it's going away in practice.

> Obviously ld.so can be prepared for this and simply not do the rewriting
> then. But ... is this all really such a wonderful optimization to be
> worth the trouble?

I think so. I've been told that people are using Wasm today to get
direct jumps.

Thanks,
Florian

H.J. Lu

Apr 26, 2023, 11:57:42 AM
to Michael Matz, Fangrui Song, Carlos O'Donell, x86-64-abi
On Wed, Apr 26, 2023 at 6:30 AM Michael Matz <ma...@suse.de> wrote:
>
> Hey,
>
> On Mon, 24 Apr 2023, H.J. Lu wrote:
>
> > > Does that mean .plt is writable by rtld?
> > >
> >
> > Not exactly. ld.so will change PLT back to readonly after PLT rewrite.
>
> Unsharing all PLT pages in its course :-/ You will also have problems with
> processes that are started under certain seccomp filters. E.g. systemd's
> MemoryDenyWriteExecute=true will install a filter that EACCESes each
> mprotect that wants to make something PROT_EXEC (IOW: you can't ever make
> anything executable after the fact, only via the initial mmap, but that
> won't be writable then, as the same filter also denies
> PROT_EXEC|PROT_WRITE mappings).

There won't be PROT_EXEC|PROT_WRITE. ld.so will change PLT
to PROT_WRITE before rewriting and change it back to PROT_EXEC.
In any case, it should be an opt-in feature.

> Obviously ld.so can be prepared for this and simply not do the rewriting
> then. But ... is this all really such a wonderful optimization to be
> worth the trouble?
>
>
> Ciao,
> Michael.



--
H.J.

Michael Matz

Apr 26, 2023, 12:16:36 PM
to H.J. Lu, Fangrui Song, Carlos O'Donell, x86-64-abi
Hello,

On Wed, 26 Apr 2023, H.J. Lu wrote:

> > > > Does that mean .plt is writable by rtld?
> > > >
> > >
> > > Not exactly. ld.so will change PLT back to readonly after PLT rewrite.
> >
> > Unsharing all PLT pages in its course :-/ You will also have problems with
> > processes that are started under certain seccomp filters. E.g. systemd's
> > MemoryDenyWriteExecute=true will install a filter that EACCESes each
> > mprotect that wants to make something PROT_EXEC (IOW: you can't ever make
> > anything executable after the fact, only via the initial mmap, but that
> > won't be writable then, as the same filter also denies
> > PROT_EXEC|PROT_WRITE mappings).
>
> There won't be PROT_EXEC|PROT_WRITE. ld.so will change PLT
> to PROT_WRITE before rewriting and change it back to PROT_EXEC.

I know that this is your idea. I'm saying that this won't work. The
"mprotect (area, PROT_READ|PROT_EXEC)" will fail for such processes.
_Any_ mprotect that has PROT_EXEC set will fail. (As Florian says there
are loop-holes around this, but these are somewhat fishy, and either needs
access to a writable filesystem that's not mounted noexec, or similar
things).

> In any case, it should be an opt-in feature.

It has to be, yes.


Ciao,
Michael.

Michael Matz

Apr 26, 2023, 12:26:57 PM
to Florian Weimer, H.J. Lu, Fangrui Song, Carlos O'Donell, x86-64-abi
Hello,

On Wed, 26 Apr 2023, Florian Weimer wrote:

> > anything executable after the fact, only via the initial mmap, but that
> > won't be writable then, as the same filter also denies
> > PROT_EXEC|PROT_WRITE mappings).
>
> It's possible to do the rewriting on the side and put it in place with
> mremap *after* the problematic mprotect call (which may fail), so at
> least ld.so can deal with the failure gracefully and proceed without the
> optimization (unlike real text relocations).

Ah yes, that might work.

> There's the memfd_create loophole (which allows to create code at run
> time even if mprotect is blocked and all writable file systems are
> mounted noexec), but I think it's not worth exploring that because it
> seems it's going away in practice.

Agreed.

> > Obviously ld.so can be prepared for this and simply not do the rewriting
> > then. But ... is this all really such a wonderful optimization to be
> > worth the trouble?
>
> I think so. I've been told that people are using Wasm today to get
> direct jumps.

Uhm, with measurement data, or with like "Wasm is c00L!!" ? I mean, we
are talking about changing the psABI, I would like to have something more
than hearsay about the newezt bl1ng as justification. I myself can't
remember having seen PLT slot code (or their memory access, or the
implied branch-target miss) in any performance measurements ever. Not
even in microbenchmarks, though I assume one could construct one for this
specific case.


Ciao,
Michael.

Florian Weimer

Apr 28, 2023, 5:17:34 AM
to Michael Matz, H.J. Lu, Fangrui Song, Carlos O'Donell, x86-64-abi
* Michael Matz:

>> > Obviously ld.so can be prepared for this and simply not do the rewriting
>> > then. But ... is this all really such a wonderful optimization to be
>> > worth the trouble?
>>
>> I think so. I've been told that people are using Wasm today to get
>> direct jumps.
>
> Uhm, with measurement data, or with like "Wasm is c00L!!" ? I mean, we
> are talking about changing the psABI, I would like to have something more
> than hearsay about the newezt bl1ng as justification. I myself can't
> remember having seen PLT slot code (or their memory access, or the
> implied branch-target miss) in any performance measurements ever. Not
> even in microbenchmarks, though I assume one could construct one for this
> specific case.

They had measurement data, I think. And I guess Wasm isn't so cool if
you have to provide production support for it.

Thanks,
Florian

H.J. Lu

Apr 28, 2023, 11:45:07 AM
to Florian Weimer, Michael Matz, Fangrui Song, Carlos O'Donell, x86-64-abi
In some production environments, indirect branch in PLT is the bottleneck.


--
H.J.

H.J. Lu

May 1, 2023, 2:40:30 PM
to x86-64-abi
Procedure Linkage Table (PLT) is used to call external functions defined
in executables or shared libraries. For a PLT that satisfies the following
conditions:

1. PLT entries transfer control to external functions via indirect branch
over the corresponding entry in Global Offset Table (GOT) referenced by
R_X86_64_JUMP_SLOT relocation. A PLT entry may look like

    jmp *foo@GOTPCREL(%rip)
    pushq $foo_relocation_index
    jmp .PLT0

or

    endbr64
    jmp *foo@GOTPCREL(%rip)
    nop

2. All such PLT entries have the same layout.
3. The only entry point of a PLT entry is the first byte of the entry.
4. All entries have the same size and are aligned to the entry size.
5. When resolving R_X86_64_JUMP_SLOT relocation to update the GOT entry,
dynamic linker ignores the r_addend field of R_X86_64_JUMP_SLOT
relocation. Starting from glibc 2.33 branch, r_addend is ignored by

f8587a6189 x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT

define dynamic tags:

#define DT_X86_64_PLT (DT_LOPROC + 0)
#define DT_X86_64_PLTSZ (DT_LOPROC + 1)
#define DT_X86_64_PLTENT (DT_LOPROC + 3)

DT_X86_64_PLT: The address of PLT.
DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.

These PLT dynamic tags don't require any special PLT layout treatment.

When the r_addend field of R_X86_64_JUMP_SLOT relocation is ignored, it
is repurposed to store the offset, in bytes, of the indirect branch in
the corresponding PLT entry.

If dynamic linker is allowed to

1. Change PLT to writable and non-executable.
2. Update PLT entries.
3. Change PLT back to read-only and executable.

then, by scanning R_X86_64_JUMP_SLOT relocations to locate the corresponding
PLT and GOT entries, the dynamic linker may change the indirect branch in PLT
entries to a direct branch when

1. Lazy binding is disabled.
2. The PLT entry can accommodate 32-bit direct branch.
3. The indirect branch address stored in the GOT entry can be reached
by direct branch.


--
H.J.

Jan Beulich

May 2, 2023, 4:54:38 AM
to H.J. Lu, x86-64-abi
Iirc the question was raised before, but isn't addressed here: The storing
into r_addend is done by the static linker aiui. How does the static linker
know whether the dynamic linker is going to ignore r_addend (in its normal
meaning; as per below it'll again not be ignored)? While the presence of
the new dynamic tags can indicate to an aware dynamic linker that r_addend
has different meaning, that presence is entirely meaningless to unaware
dynamic linkers. Yet at the very least one would expect that such binaries
would fail to load (and not e.g. crash or misbehave later) with unaware
dynamic linkers (according to my reading, unknown DT_* are simply skipped).

Jan

Michael Matz

May 2, 2023, 8:47:09 AM
to H.J. Lu, Florian Weimer, Fangrui Song, Carlos O'Donell, x86-64-abi
Hello,
If it's that bad (and that all is still hear-say), then I bet many cents
that there are many _much_ lower hanging fruits than changing the PLT code
to improve performance of whatever it's doing: if the indirect branch in
the PLT is a performance bottleneck that means function calls to external
library code dominate _everything_ you're doing, which in turn means
you're not doing much. Better code design would usually avoid doing
nothing with many calls.

But it's a side-discussion anyway, I'm not particularly against
documenting non-standard PLT layouts (and adding some support for such to
the psABI). I do wonder if we should document the specific layout of the
(non-standard) PLT at all, though, it's normally an internal
implementation detail that only low-level tools like ltrace would need to
know.


Ciao,
Michael.

Fangrui Song

May 2, 2023, 2:05:35 PM
to H.J. Lu, Michael Matz, Florian Weimer, Carlos O'Donell, x86-64-abi
I am wondering whether the PLT call overhead is mainly present in
functions such as memcpy/memset/strcpy. These C library functions are
the primary reason why -fno-plt is advantageous for certain workloads.

I have some notes on
https://maskray.me/blog/2021-09-19-all-about-procedure-linkage-table#fno-plt

> By specifying -fno-plt, the user is making a trade-off on a function
> call which is not known to be bound locally at compile time: whether it
> resolves to the same component. For some benchmarks, especially those
> where libc functions are a bottleneck, there may be a performance win.
> This is however deceiving, as statically linking some libc functions
> will hide the performance win.

> Instead of using a global -fno-plt, the function attribute
> __attribute__((noplt)) can be added to individual declarations for
> fine-grained control. The compiler generates GOT indirection for calls
> to annotated functions.

With -fno-plt and __attribute__((noplt)), is there still a strong need
to add DT_X86_64_PLT* dynamic tags?

To use the proposed rtld optimization, the page containing .plt needs to
be remapped. The divergence between PT_LOAD program headers and memory
mappings may require tools to adapt. For example, if a user uses
hugepages to optimize text sections, the code will need to recognize the
.text part (large), not the memory-backed .plt part (small).

H.J. Lu

May 5, 2023, 12:59:54 PM
to Jan Beulich, x86-64-abi
Since the dynamic linker may not ignore r_addend in R_X86_64_JUMP_SLOT,
this feature should be enabled by a linker option. Developers must know
whether the dynamic linker works correctly with non-zero r_addend.

> Jan
>
> > If dynamic linker is allowed to
> >
> > 1. Change PLT to writable and non-executable.
> > 2. Update PLT entries.
> > 3. Change PLT back to read-only and executable.
> >
> > scanning R_X86_64_JUMP_SLOT relocations to locate the corresponding PLT
> > and GOT entries, dynamic linker may change indirect branch in PLT entries
> > to direct branch when
> >
> > 1. Lazy binding is disabled.
> > 2. The PLT entry can accommodate 32-bit direct branch.
> > 3. The indirect branch address stored in the GOT entry can be reached
> > by direct branch.
> >
> >
>


--
H.J.

H.J. Lu

May 5, 2023, 1:04:40 PM
to Michael Matz, Florian Weimer, Fangrui Song, Carlos O'Donell, x86-64-abi
In some cases, PLT can't be avoided.

> But it's a side-discussion anyway, I'm not particularly against
> documenting non-standard PLT layouts (and adding some support for such to
> the psABI). I do wonder if we should document the specific layout of the
> (non-standard) PLT at all, though, it's normally an internal
> implementation detail that only low-level tools like ltrace would need to
> know.

The two PLT layouts are examples. Any PLT layout which satisfies the listed
conditions should work.

Thanks.

--
H.J.

H.J. Lu

May 5, 2023, 1:44:05 PM
to Fangrui Song, Michael Matz, Florian Weimer, Carlos O'Donell, x86-64-abi
-fno-plt should also improve performance. But in some cases, PLT
is needed. Otherwise, -fno-plt can be default.

> To use the proposed rtld optimization, the page containing .plt needs to
> be remapped. The divergence between PT_LOAD program headers and memory

No need to remap PLT. mprotect can be used.

> mappings may require tools to adapt. For example, if a user uses
> hugepages to optimize text sections, the code will need to recognize the
> .text part (large), not the memory-backed .plt part (small).



--
H.J.

Carlos O'Donell

May 16, 2023, 3:15:11 PM
to H.J. Lu, Jan Beulich, x86-64-abi
So the feature is backwards incompatible?

An old loader cannot ignore the new feature and operate correctly?

The last case of this I can remember was the treatment of SHN_ABS-relative
symbols for MIPS in 2018.

From glibc NEWS:

* The GNU C Library now has correct support for ABSOLUTE symbols
  (SHN_ABS-relative symbols). Previously such ABSOLUTE symbols were
  relocated incorrectly or in some cases discarded. The GNU linker can
  make use of the newer semantics, but it must communicate it to the
  dynamic loader by setting the ELF file's identification (EI_ABIVERSION
  field) to indicate such support is required.

Would we need an equivalent EI_ABIVERSION change?

The linker has to tell the loader in some way that the ABI has changed.

>> Jan
>>
>>> If dynamic linker is allowed to
>>>
>>> 1. Change PLT to writable and non-executable.
>>> 2. Update PLT entries.
>>> 3. Change PLT back to read-only and executable.
>>>
>>> scanning R_X86_64_JUMP_SLOT relocations to locate the corresponding PLT
>>> and GOT entries, dynamic linker may change indirect branch in PLT entries
>>> to direct branch when
>>>
>>> 1. Lazy binding is disabled.
>>> 2. The PLT entry can accommodate 32-bit direct branch.
>>> 3. The indirect branch address stored in the GOT entry can be reached
>>> by direct branch.
>>>
>>>
>>
>
>

--
Cheers,
Carlos.

H.J. Lu

May 16, 2023, 7:10:17 PM
to Carlos O'Donell, Jan Beulich, x86-64-abi
The x86-64 psABI specifies that R_X86_64_JUMP_SLOT should be resolved
to the symbol value. Since glibc versions older than glibc 2.34 add r_addend
for R_X86_64_JUMP_SLOT, they aren't compatible with R_X86_64_JUMP_SLOT
with non-zero r_addend.
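
The incompatibility can be shown with two toy resolvers. Both functions are hypothetical illustrations of the behaviours described in this thread, not excerpts from any glibc version.

```c
#include <assert.h>
#include <stdint.h>

/* psABI behaviour: the GOT slot receives the symbol value; addend ignored. */
static uint64_t resolve_ignoring_addend(uint64_t sym_value, int64_t r_addend)
{
    (void)r_addend;
    return sym_value;
}

/* Older behaviour as described above: the addend is added to the target. */
static uint64_t resolve_adding_addend(uint64_t sym_value, int64_t r_addend)
{
    return sym_value + (uint64_t)r_addend;
}

/* With r_addend reused as a PLT branch offset, the two disagree. */
static void demo_addend_mismatch(void)
{
    uint64_t sym = 0x401000;   /* made-up symbol address */
    int64_t branch_off = 4;    /* made-up offset stored in r_addend */

    assert(resolve_ignoring_addend(sym, branch_off) == sym);
    assert(resolve_adding_addend(sym, branch_off) == sym + 4);
}
```

A loader of the second kind would send calls 4 bytes past the function's entry point, which is why the binary needs some way to refuse to run under it.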

> The last case of this I can remember was the treatment of SHN_ABS-relative
> symbols for MIPS in 2018.
>
> From glibc NEWS:
>
> * The GNU C Library now has correct support for ABSOLUTE symbols
>   (SHN_ABS-relative symbols). Previously such ABSOLUTE symbols were
>   relocated incorrectly or in some cases discarded. The GNU linker can
>   make use of the newer semantics, but it must communicate it to the
>   dynamic loader by setting the ELF file's identification (EI_ABIVERSION
>   field) to indicate such support is required.
>
> Would we need an equivalent EI_ABIVERSION change?

We should bump EI_ABIVERSION with this change.

>
> The linker has to tell the loader in some way that the ABI has changed.

True.

>
> >> Jan
> >>
> >>> If dynamic linker is allowed to
> >>>
> >>> 1. Change PLT to writable and non-executable.
> >>> 2. Update PLT entries.
> >>> 3. Change PLT back to read-only and executable.
> >>>
> >>> scanning R_X86_64_JUMP_SLOT relocations to locate the corresponding PLT
> >>> and GOT entries, dynamic linker may change indirect branch in PLT entries
> >>> to direct branch when
> >>>
> >>> 1. Lazy binding is disabled.
> >>> 2. The PLT entry can accommodate 32-bit direct branch.
> >>> 3. The indirect branch address stored in the GOT entry can be reached
> >>> by direct branch.
> >>>
> >>>
> >>
> >
> >
>
> --
> Cheers,
> Carlos.
>

Thanks.

--
H.J.

H.J. Lu

May 19, 2023, 6:54:43 PM
to x86-6...@googlegroups.com
Procedure Linkage Table (PLT) is used to call external functions defined
in executables or shared libraries. For a PLT that satisfies the following
conditions:

1. PLT entries transfer control to external functions via indirect branch
over the corresponding entry in Global Offset Table (GOT) referenced by
R_X86_64_JUMP_SLOT relocation. A PLT entry may look like

    jmp *foo@GOTPCREL(%rip)
    pushq $foo_relocation_index
    jmp .PLT0

or

    endbr64
    jmp *foo@GOTPCREL(%rip)
    nop

2. All such PLT entries have the same layout.
3. The only entry point of a PLT entry is the first byte of the entry.
4. All entries have the same size and are aligned to the entry size.

define dynamic tags:

#define DT_X86_64_PLT (DT_LOPROC + 0)
#define DT_X86_64_PLTSZ (DT_LOPROC + 1)
#define DT_X86_64_PLTENT (DT_LOPROC + 3)

DT_X86_64_PLT: The address of PLT.
DT_X86_64_PLTSZ: The total size, in bytes, of PLT.
DT_X86_64_PLTENT: The size, in bytes, of a PLT entry.

These PLT dynamic tags don't require any special PLT layout treatment.

Since the r_addend field of R_X86_64_JUMP_SLOT relocation is ignored, it
is repurposed to store the offset, in bytes, of the indirect branch in
the corresponding PLT entry.

If dynamic linker is allowed to

1. Change PLT to writable and non-executable.
2. Update PLT entries.
3. Change PLT back to read-only and executable.

then, by scanning R_X86_64_JUMP_SLOT relocations to locate the corresponding
PLT and GOT entries, the dynamic linker may change the indirect branch in PLT
entries to a direct branch when

1. Lazy binding is disabled.
2. The PLT entry can accommodate 32-bit direct branch.
3. The indirect branch address stored in the GOT entry can be reached
by direct branch.

The x86-64 psABI specifies that R_X86_64_JUMP_SLOT should be resolved
to the symbol value. But dynamic linkers in glibc versions older than
2.33 don't ignore r_addend. Non-zero r_addend in R_X86_64_JUMP_SLOT
relocation is incompatible with such dynamic linkers.

When loading an ELF binary, dynamic linker in glibc checks the EI_OSABI
and EI_ABIVERSION fields in its ELF header. It won't load the binary if
its EI_OSABI field == ELFOSABI_GNU and its EI_ABIVERSION field >=
LIBC_ABI_MAX which is defined as

enum
{
  LIBC_ABI_DEFAULT = 0,
  LIBC_ABI_UNIQUE,
  LIBC_ABI_IFUNC,
  LIBC_ABI_ABSOLUTE,
  LIBC_ABI_MAX
};

When r_addend in an R_X86_64_JUMP_SLOT relocation is non-zero, the linker
should set the EI_OSABI field to ELFOSABI_GNU and the EI_ABIVERSION field to
LIBC_ABI_PLT, defined as

enum
{
  LIBC_ABI_DEFAULT = 0,
  LIBC_ABI_UNIQUE,
  LIBC_ABI_IFUNC,
  LIBC_ABI_ABSOLUTE,
  LIBC_ABI_PLT,
  LIBC_ABI_MAX
};

Dynamic linker in glibc should be updated to allow ELF binaries with
EI_OSABI == ELFOSABI_GNU and EI_ABIVERSION == LIBC_ABI_PLT.


H.J.

H.J. Lu

unread,
May 22, 2023, 12:49:10 PM5/22/23
to x86-6...@googlegroups.com
This doesn't work on executables which are loaded by kernel. We can
add a version dependency of glibc 2.33 similar to DT_RELR.

> enum
> {
> LIBC_ABI_DEFAULT = 0,
> LIBC_ABI_UNIQUE,
> LIBC_ABI_IFUNC,
> LIBC_ABI_ABSOLUTE,
> LIBC_ABI_MAX
> };
>
> When non-zero r_addend in R_X86_64_JUMP_SLOT relocation, linker should
> set the EI_OSABI field to ELFOSABI_GNU and the EI_ABIVERSION field to
> LIBC_ABI_PLT define as
>
> enum
> {
> LIBC_ABI_DEFAULT = 0,
> LIBC_ABI_UNIQUE,
> LIBC_ABI_IFUNC,
> LIBC_ABI_ABSOLUTE,
> LIBC_ABI_PLT,
> LIBC_ABI_MAX
> };
>
> Dynamic linker in glibc should be updated to allow ELF binaries with
> EI_OSABI == ELFOSABI_GNU and EI_ABIVERSION == LIBC_ABI_PLT.
>
>
> H.J.



--
H.J.

Carlos O'Donell

unread,
May 25, 2023, 7:39:17 AM5/25/23
to H.J. Lu, x86-6...@googlegroups.com
OK. Similar to PT_GNU_RELRO operations.

> scanning R_X86_64_JUMP_SLOT relocations to locate the corresponding PLT
> and GOT entries, dynamic linker may change indirect branch in PLT entries
> to direct branch when
>
> 1. Lazy binding is disabled.
> 2. The PLT entry can accommodate 32-bit direct branch.
> 3. The indirect branch address stored in the GOT entry can be reached
> by direct branch.
>
> The x86-64 psABI specifies that R_X86_64_JUMP_SLOT should be resolved
> to the symbol value. But dynamic linkers in glibc versions older than
> 2.33 don't ignore r_addend. Non-zero r_addend in R_X86_64_JUMP_SLOT
> relocation is incompatible with such dynamic linkers.

OK. Agreed.

>
> When loading an ELF binary, dynamic linker in glibc checks the EI_OSABI
> and EI_ABIVERSION fields in its ELF header. It won't load the binary if
> its EI_OSABI field == ELFOSABI_GNU and its EI_ABIVERSION field >=
> LIBC_ABI_MAX which is defined as
>
> enum
> {
> LIBC_ABI_DEFAULT = 0,
> LIBC_ABI_UNIQUE,
> LIBC_ABI_IFUNC,
> LIBC_ABI_ABSOLUTE,
> LIBC_ABI_MAX
> };
>
> When non-zero r_addend in R_X86_64_JUMP_SLOT relocation, linker should
> set the EI_OSABI field to ELFOSABI_GNU and the EI_ABIVERSION field to
> LIBC_ABI_PLT define as
>
> enum
> {
> LIBC_ABI_DEFAULT = 0,
> LIBC_ABI_UNIQUE,
> LIBC_ABI_IFUNC,
> LIBC_ABI_ABSOLUTE,
> LIBC_ABI_PLT,

Agreed. This solves the backwards incompatibility.

> LIBC_ABI_MAX
> };
>
> Dynamic linker in glibc should be updated to allow ELF binaries with
> EI_OSABI == ELFOSABI_GNU and EI_ABIVERSION == LIBC_ABI_PLT.

Agreed.

>
>
> H.J.
>

--
Cheers,
Carlos.

Carlos O'Donell

unread,
May 25, 2023, 7:42:20 AM5/25/23
to H.J. Lu, x86-6...@googlegroups.com
That is an implementation detail.

The kernel maps the binary, and invokes the dynamic loader.

The dynamic loader should be doing whatever checks it needs to do to ensure compatibility.

Isn't it a glibc bug that the loader doesn't check EI_ABIVERSION for a kernel loaded binary?

>> enum
>> {
>> LIBC_ABI_DEFAULT = 0,
>> LIBC_ABI_UNIQUE,
>> LIBC_ABI_IFUNC,
>> LIBC_ABI_ABSOLUTE,
>> LIBC_ABI_MAX
>> };
>>
>> When non-zero r_addend in R_X86_64_JUMP_SLOT relocation, linker should
>> set the EI_OSABI field to ELFOSABI_GNU and the EI_ABIVERSION field to
>> LIBC_ABI_PLT define as
>>
>> enum
>> {
>> LIBC_ABI_DEFAULT = 0,
>> LIBC_ABI_UNIQUE,
>> LIBC_ABI_IFUNC,
>> LIBC_ABI_ABSOLUTE,
>> LIBC_ABI_PLT,
>> LIBC_ABI_MAX
>> };
>>
>> Dynamic linker in glibc should be updated to allow ELF binaries with
>> EI_OSABI == ELFOSABI_GNU and EI_ABIVERSION == LIBC_ABI_PLT.
>>
>>
>> H.J.
>
>
>

--
Cheers,
Carlos.

H.J. Lu

unread,
May 25, 2023, 2:45:49 PM5/25/23
to Carlos O'Donell, x86-6...@googlegroups.com
We can't change existing ld.so to check EI_ABIVERSION for a kernel
loaded binary. For DT_RELR, linker adds the GLIBC_ABI_DT_RELR
version dependency so that DT_RELR will fail to run with existing
ld.so which doesn't provide GLIBC_ABI_DT_RELR. We should do the
same for R_X86_64_JUMP_SLOT with non-zero r_addend.

> >> enum
> >> {
> >> LIBC_ABI_DEFAULT = 0,
> >> LIBC_ABI_UNIQUE,
> >> LIBC_ABI_IFUNC,
> >> LIBC_ABI_ABSOLUTE,
> >> LIBC_ABI_MAX
> >> };
> >>
> >> When non-zero r_addend in R_X86_64_JUMP_SLOT relocation, linker should
> >> set the EI_OSABI field to ELFOSABI_GNU and the EI_ABIVERSION field to
> >> LIBC_ABI_PLT define as
> >>
> >> enum
> >> {
> >> LIBC_ABI_DEFAULT = 0,
> >> LIBC_ABI_UNIQUE,
> >> LIBC_ABI_IFUNC,
> >> LIBC_ABI_ABSOLUTE,
> >> LIBC_ABI_PLT,
> >> LIBC_ABI_MAX
> >> };
> >>
> >> Dynamic linker in glibc should be updated to allow ELF binaries with
> >> EI_OSABI == ELFOSABI_GNU and EI_ABIVERSION == LIBC_ABI_PLT.
> >>
> >>
> >> H.J.
> >
> >
> >
>
> --
> Cheers,
> Carlos.
>


--
H.J.

Fangrui Song

unread,
May 25, 2023, 3:36:14 PM5/25/23
to H.J. Lu, Carlos O'Donell, x86-6...@googlegroups.com
Could you elaborate on the meaning of the offset? Perhaps an example will help.
Is the offset used to avoid some disassembly work for the rtld PLT optimization?

>> >> If dynamic linker is allowed to
>> >>
>> >> 1. Change PLT to writable and non-executable.
>> >> 2. Update PLT entries.
>> >> 3. Change PLT back to read-only and executable.
>> >>
>> >> scanning R_X86_64_JUMP_SLOT relocations to locate the corresponding PLT
>> >> and GOT entries, dynamic linker may change indirect branch in PLT entries
>> >> to direct branch when
>> >>
>> >> 1. Lazy binding is disabled.
>> >> 2. The PLT entry can accommodate 32-bit direct branch.
>> >> 3. The indirect branch address stored in the GOT entry can be reached
>> >> by direct branch.
>> >>
>> >> The x86-64 psABI specifies that R_X86_64_JUMP_SLOT should be resolved
>> >> to the symbol value. But dynamic linkers in glibc versions older than
>> >> 2.33 don't ignore r_addend. Non-zero r_addend in R_X86_64_JUMP_SLOT
>> >> relocation is incompatible with such dynamic linkers.
>> >>
>> >> When loading an ELF binary, dynamic linker in glibc checks the EI_OSABI
>> >> and EI_ABIVERSION fields in its ELF header. It won't load the binary if
>> >> its EI_OSABI field == ELFOSABI_GNU and its EI_ABIVERSION field >=
>> >> LIBC_ABI_MAX which is defined as

This scheme will optimize PLT among immediately loaded shared objects. Their
distances are usually within the range of 32-bit direct branches. The executable
is usually located very far away from the shared objects.

The .plt.got optimization should be dropped, then?

cat > a.c <<e
void combined0(); void combined1();
void foo0(); void foo1();
unsigned long var;
void _start() {
var = (unsigned long)combined0 + (unsigned long)combined1;
combined0(); combined1();
foo0(); foo1();
}
e
cat > b.s <<e
.globl foo0, foo1, combined0, combined1
foo0: foo1: combined0: combined1:
e
gcc -fuse-ld=bfd -shared b.s -o b.so
gcc -fuse-ld=bfd -pie -nostdlib -fpie a.c b.so -o a

.plt and .plt.got have different PLT entries, nullifying this rtld PLT rewriting optimization.

>> > This doesn't work on executables which are loaded by kernel. We can
>> > add a version dependency of glibc 2.33 similar to DT_RELR.
>>
>> That is an implementation detail.
>>
>> The kernel maps the binary, and invokes the dynamic loader.
>>
>> The dynamic loader should be doing whatever checks it needs to do to ensure compatibility.
>>
>> Isn't a glibc bug that the loader doesn't check EI_ABIVERSION for a kernel loaded binary?
>
>We can't change existing ld.so to check EI_ABIVERSION for a kernel
>loaded binary. For DT_RELR, linker adds the GLIBC_ABI_DT_RELR
>version dependency so that DT_RELR will fail to run with existing
>ld.so which doesn't provide GLIBC_ABI_DT_RELR. We should do the
>same for R_X86_64_JUMP_SLOT with non-zero r_addend.

If the r_addend is repurposed, I agree that a symbol version similar to GLIBC_ABI_DT_RELR will work.
glibc reports an error if a Verneed is not defined.
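
To illustrate the idea (this is not glibc's code), a version requirement can be matched by comparing the standard System V ELF hash of the version name, as stored in Verneed/Verdef records, and then the name itself. The function version_is_provided and the flat array of provided names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard System V ELF hash, the function used for version name
   hashes in the version sections.  */
static uint32_t
elf_hash (const char *name)
{
  uint32_t h = 0;

  assert (name != NULL);
  for (const unsigned char *p = (const unsigned char *) name;
       *p != '\0'; ++p)
    {
      uint32_t g;

      h = (h << 4) + *p;
      g = h & 0xf0000000u;
      if (g != 0)
        h ^= g >> 24;
      h &= ~g;
    }
  return h;
}

/* Return true if REQUIRED appears among the N names in PROVIDED.
   A loader would report an error when this is false.  */
static bool
version_is_provided (const char *required,
                     const char *const *provided, size_t n)
{
  uint32_t want = elf_hash (required);

  for (size_t i = 0; i < n; ++i)
    if (elf_hash (provided[i]) == want
        && strcmp (provided[i], required) == 0)
      return true;
  return false;
}
```

An ld.so that does not define the required version name fails the lookup, which is the behavior the version dependency relies on.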

>> >> enum
>> >> {
>> >> LIBC_ABI_DEFAULT = 0,
>> >> LIBC_ABI_UNIQUE,
>> >> LIBC_ABI_IFUNC,
>> >> LIBC_ABI_ABSOLUTE,
>> >> LIBC_ABI_MAX
>> >> };
>> >>
>> >> When non-zero r_addend in R_X86_64_JUMP_SLOT relocation, linker should
>> >> set the EI_OSABI field to ELFOSABI_GNU and the EI_ABIVERSION field to
>> >> LIBC_ABI_PLT define as
>> >>
>> >> enum
>> >> {
>> >> LIBC_ABI_DEFAULT = 0,
>> >> LIBC_ABI_UNIQUE,
>> >> LIBC_ABI_IFUNC,
>> >> LIBC_ABI_ABSOLUTE,
>> >> LIBC_ABI_PLT,
>> >> LIBC_ABI_MAX
>> >> };
>> >>
>> >> Dynamic linker in glibc should be updated to allow ELF binaries with
>> >> EI_OSABI == ELFOSABI_GNU and EI_ABIVERSION == LIBC_ABI_PLT.
>> >>
>> >>
>> >> H.J.
>> >
>> >
>> >
>>
>> --
>> Cheers,
>> Carlos.
>>
>
>
>--
>H.J.
>

H.J. Lu

unread,
May 25, 2023, 5:46:10 PM5/25/23
to Fangrui Song, Carlos O'Donell, x86-6...@googlegroups.com
Good point.

Thanks.
--
H.J.

H.J. Lu

unread,
May 26, 2023, 1:14:46 PM5/26/23
to x86-6...@googlegroups.com
If the dynamic linker is allowed to

1. Change PLT to writable and non-executable.
2. Update PLT entries.
3. Change PLT back to read-only and executable.

then, by scanning R_X86_64_JUMP_SLOT relocations to locate the
corresponding PLT and GOT entries, it may change the indirect branch in
PLT entries to a direct branch when

1. Lazy binding is disabled.
2. The PLT entry can accommodate a 32-bit direct branch.
3. The indirect branch address stored in the GOT entry can be reached
by a direct branch.

The x86-64 psABI specifies that R_X86_64_JUMP_SLOT should be resolved
to the symbol value. But dynamic linkers in glibc versions older than
2.33 don't ignore r_addend. Non-zero r_addend in R_X86_64_JUMP_SLOT
relocation is incompatible with such dynamic linkers.

Since executables are loaded by the kernel and existing glibc doesn't
check the ELF header in executables, the linker should add a glibc 2.33
version dependency to executables and shared libraries with non-zero
r_addend in R_X86_64_JUMP_SLOT.


H.J.