On Sat, Feb 26, 2022 at 7:19 PM Rui Ueyama via Binutils <binu...@sourceware.org> wrote:
>
> Hello,
>
> I'd like to propose an alternative instruction sequence for the Intel
> CET-enabled PLT section. Compared to the existing one, the new scheme is
> simpler and more compact (16 bytes per PLT entry instead of 32), and it does
> not require a separate second PLT section (.plt.sec).
>
> Here is the proposed code sequence:
>
> PLT0:
>
> f3 0f 1e fa // endbr64
> 41 53 // push %r11
> ff 35 00 00 00 00 // push GOT[1]
> ff 25 00 00 00 00 // jmp *GOT[2]
> 0f 1f 40 00 // nop
> 0f 1f 40 00 // nop
> 0f 1f 40 00 // nop
> 66 90 // nop
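To make the PLT0 byte layout concrete, here is a minimal C sketch of how a linker might emit it. It assumes `got` is the address of GOT[0] in .got.plt, so GOT[1] and GOT[2] sit 8 and 16 bytes after it, and it patches the two zeroed rel32 fields with RIP-relative displacements (target minus the address of the next instruction). The helper names `write_plt0`, `rel32` and `put32le` are hypothetical and are not taken from mold or any other linker.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order, as x86-64 encodes it. */
static void put32le(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

/* RIP-relative displacement: target minus the address of the next
   instruction. It must fit in a signed 32-bit field. */
static uint32_t rel32(uint64_t target, uint64_t next_insn) {
  int64_t d = (int64_t)(target - next_insn);
  assert(d >= INT32_MIN && d <= INT32_MAX);
  return (uint32_t)d;
}

/* Fill buf with the proposed 32-byte PLT0 (hypothetical helper).
   plt0: address of PLT0; got: address of GOT[0]. */
static void write_plt0(uint8_t buf[32], uint64_t plt0, uint64_t got) {
  static const uint8_t insn[32] = {
      0xf3, 0x0f, 0x1e, 0xfa,             /* endbr64 */
      0x41, 0x53,                         /* push %r11 */
      0xff, 0x35, 0x00, 0x00, 0x00, 0x00, /* push GOT[1](%rip) */
      0xff, 0x25, 0x00, 0x00, 0x00, 0x00, /* jmp *GOT[2](%rip) */
      0x0f, 0x1f, 0x40, 0x00,             /* nop */
      0x0f, 0x1f, 0x40, 0x00,             /* nop */
      0x0f, 0x1f, 0x40, 0x00,             /* nop */
      0x66, 0x90,                         /* nop */
  };
  memcpy(buf, insn, sizeof insn);
  put32le(buf + 8, rel32(got + 8, plt0 + 12));   /* push ends at offset 12 */
  put32le(buf + 14, rel32(got + 16, plt0 + 18)); /* jmp ends at offset 18 */
}
```

For example, with PLT0 at 0x1000 and GOT[0] at 0x3000, the push displacement is 0x3008 - 0x100c = 0x1ffc and the jmp displacement is 0x3010 - 0x1012 = 0x1ffe.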
>
> PLTn:
>
> f3 0f 1e fa // endbr64
> 41 bb 00 00 00 00 // mov $namen_reloc_index, %r11d
> ff 25 00 00 00 00 // jmp *GOT[namen_index]
All PLT calls will have an extra MOV.
> GOT[namen_index] is initialized to PLT0 for all PLT entries, so that when a
> PLT entry is called for the first time, control is passed to PLT0 to call
> the resolver function.
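A minimal C sketch of the PLTn side, under the same assumptions as a hypothetical linker: `write_pltn` fills in the relocation index as the imm32 of the `mov` at offset 6 and the RIP-relative displacement of the `jmp` at offset 12 (the entry ends at offset 16), and `init_got_slot` stores PLT0's link-time address in the entry's GOT slot so that the first call falls through to PLT0. All names here are illustrative assumptions, not an existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian byte order, as x86-64 encodes it. */
static void put32le(uint8_t *p, uint32_t v) {
  p[0] = (uint8_t)v;
  p[1] = (uint8_t)(v >> 8);
  p[2] = (uint8_t)(v >> 16);
  p[3] = (uint8_t)(v >> 24);
}

/* RIP-relative displacement; must fit in a signed 32-bit field. */
static uint32_t rel32(uint64_t target, uint64_t next_insn) {
  int64_t d = (int64_t)(target - next_insn);
  assert(d >= INT32_MIN && d <= INT32_MAX);
  return (uint32_t)d;
}

/* Fill buf with one proposed 16-byte PLTn entry (hypothetical helper).
   entry:     address of this PLT entry
   got_slot:  address of GOT[namen_index]
   reloc_idx: index of this entry's relocation in .rela.plt */
static void write_pltn(uint8_t buf[16], uint64_t entry, uint64_t got_slot,
                       uint32_t reloc_idx) {
  static const uint8_t insn[16] = {
      0xf3, 0x0f, 0x1e, 0xfa,             /* endbr64 */
      0x41, 0xbb, 0x00, 0x00, 0x00, 0x00, /* mov $reloc_idx, %r11d */
      0xff, 0x25, 0x00, 0x00, 0x00, 0x00, /* jmp *GOT[namen_index](%rip) */
  };
  memcpy(buf, insn, sizeof insn);
  put32le(buf + 6, reloc_idx);                    /* imm32 of the mov */
  put32le(buf + 12, rel32(got_slot, entry + 16)); /* jmp ends at offset 16 */
}

/* Lazy binding: before the first call the GOT slot holds PLT0's address,
   so the jmp above lands in PLT0, which then calls the resolver. */
static void init_got_slot(uint8_t slot[8], uint64_t plt0) {
  put32le(slot, (uint32_t)plt0);
  put32le(slot + 4, (uint32_t)(plt0 >> 32));
}
```

Because PLT0 itself starts with `endbr64`, it is a valid target for the indirect `jmp` under IBT, which is what lets the GOT slot point straight at it.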
>
> It uses %r11 as a scratch register. The x86-64 psABI explicitly allows PLT
> entries to clobber this register (*1), and the resolver function
> (_dl_runtime_resolve) already clobbers it.
>
> (*1) x86-64 psABI p.24 footnote 17: "Note that %r11 is neither required to be
> preserved, nor is it used to pass arguments. Making this register available as
> scratch register means that code in the PLT need not spill any registers when
> computing the address to which control needs to be transferred."
>
> FYI, this is the current CET-enabled PLT:
>
> PLT0:
>
> ff 35 00 00 00 00 // push GOT[1]
> f2 ff 25 e3 2f 00 00 // bnd jmp *GOT[2]
> 0f 1f 00 // nop
>
> PLTn in .plt:
>
> f3 0f 1e fa // endbr64
> 68 00 00 00 00 // push $namen_reloc_index
> f2 e9 e1 ff ff ff // bnd jmpq PLT0
> 90 // nop
>
> PLTn in .plt.sec:
>
> f3 0f 1e fa // endbr64
> f2 ff 25 ad 2f 00 00 // bnd jmpq *GOT[namen_index]
> 0f 1f 44 00 00 // nop
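One way to see why the new PLT0 can keep using the existing resolver is a toy C model of the stack; it is a sketch with hypothetical helper names, not ld.so code. In the standard x86-64 lazy PLT, PLT0 pushes GOT[1] (which glibc's ld.so fills with the link_map pointer). In both schemes, the resolver therefore finds GOT[1] at (%rsp), the relocation index at 8(%rsp) and the caller's return address at 16(%rsp). The model assumes indices below 2^31, where the sign extension of `push $imm32` and the zero extension of a 32-bit `mov` agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the machine stack: slot[top - 1] is what (%rsp) points to. */
struct stack {
  uint64_t slot[8];
  size_t top;
};

static void push(struct stack *s, uint64_t v) {
  assert(s->top < 8);
  s->slot[s->top++] = v;
}

/* n = 0 reads (%rsp), n = 1 reads 8(%rsp), and so on. */
static uint64_t peek(const struct stack *s, size_t n) {
  assert(n < s->top);
  return s->slot[s->top - 1 - n];
}

/* Existing scheme: PLTn does "push $idx" (imm32, sign-extended to 64 bits),
   then PLT0 does "push GOT[1]". */
static void existing_lazy_path(struct stack *s, uint32_t idx, uint64_t got1) {
  push(s, (uint64_t)(int64_t)(int32_t)idx);
  push(s, got1);
}

/* Proposed scheme: PLTn does "mov $idx, %r11d" (zero-extends into %r11),
   then PLT0 does "push %r11" followed by "push GOT[1]". */
static void proposed_lazy_path(struct stack *s, uint32_t idx, uint64_t got1) {
  uint64_t r11 = idx;
  push(s, r11);
  push(s, got1);
}
```

Under this model, the only difference between the two paths is where the index is staged (an immediate push versus %r11), which is why the proposal relies on %r11 being a permitted scratch register.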
>
> In the proposed format, PLT0 is 32 bytes long and each entry is 16 bytes. In
> the existing format, PLT0 is 16 bytes and each entry is 32 bytes. A binary
> usually has many PLT entries but only one header, so in practice the
> proposed format is almost 50% smaller than the existing one.
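The "almost 50%" figure follows directly from the byte counts in the listings above. This small C sketch (hypothetical function names) sums the instruction lengths, checks the header and entry sizes at compile time, and computes the total PLT size for n entries in each scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction lengths copied from the listings above. */
enum {
  NEW_PLT0 = 4 + 2 + 6 + 6 + 4 + 4 + 4 + 2, /* proposed header */
  NEW_PLTN = 4 + 6 + 6,                     /* proposed entry */
  OLD_PLT0 = 6 + 7 + 3,                     /* existing header */
  OLD_PLTN = (4 + 5 + 6 + 1) + (4 + 7 + 5), /* .plt entry + .plt.sec entry */
};

static_assert(NEW_PLT0 == 32 && NEW_PLTN == 16, "proposed: 32 + 16n");
static_assert(OLD_PLT0 == 16 && OLD_PLTN == 32, "existing: 16 + 32n");

static size_t proposed_size(size_t n) { return NEW_PLT0 + NEW_PLTN * n; }
static size_t existing_size(size_t n) { return OLD_PLT0 + OLD_PLTN * n; }

/* Fraction of PLT bytes saved; it approaches 0.5 as n grows. */
static double savings(size_t n) {
  return 1.0 - (double)proposed_size(n) / (double)existing_size(n);
}
```

With one entry the two layouts are the same size (48 bytes each); with 1,000 entries they are 16,032 versus 32,016 bytes, a saving of about 49.9%.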
Does it have any impact on performance? .plt.sec can be placed
in a different page from .plt.
> The proposed PLT does not use jump instructions with the BND prefix, as Intel
> MPX has been deprecated.
>
> I have already implemented the proposed scheme in my linker
> (https://github.com/rui314/mold), and it appears to be working fine.
>
> Any thoughts?
I'd like to see visible performance improvements or new features in
a new PLT layout.
I've CCed the x86-64 psABI mailing list.
--
H.J.