RFC: Update x86 psABIs to support IBT


H.J. Lu

Jun 20, 2017, 12:38:09 PM
to IA32 System V Application Binary Interface, x86-6...@googlegroups.com, gnu-...@sourceware.org
On Tue, Jun 13, 2017 at 12:11 PM, H.J. Lu <hjl....@gmail.com> wrote:
> To support ENDBR in Intel Control-flow Enforcement Technology (CET)
> instructions:
>
> https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
>
> following changes to i386 psABI are required.

Here is the updated extension for both the i386 and x86-64 psABIs to
support IBT. I will post a binutils patch later.

Any comments?

--
H.J.
---
To support indirect branch tracking (IBT) in Intel Control-flow Enforcement
Technology (CET) instructions:

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

following changes to x86 psABI are required.

To the program properties, add

#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002

#define GNU_PROPERTY_X86_FEATURE_1_IBT (1U << 0)

to indicate that all executable sections are compatible with IBT when the
ENDBR instruction is inserted at:

a. All function entries whose addresses may be taken.
b. All branch targets whose addresses have been taken.
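As a rough illustration of how a consumer might test the new bit, the C sketch below scans the descriptor of a GNU property note for GNU_PROPERTY_X86_FEATURE_1_AND. The layout it assumes (pr_type, pr_datasz, then pr_data padded to 8 bytes in 64-bit objects) is the generic GNU program property format; the function name x86_feature_1_and is hypothetical, and the code assumes a little-endian host reading x86 objects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002
#define GNU_PROPERTY_X86_FEATURE_1_IBT (1U << 0)

/* Scan the descriptor of an NT_GNU_PROPERTY_TYPE_0 note from an ELFCLASS64
   object (assumed layout: pr_type, pr_datasz, pr_data padded to 8 bytes).
   Fields are read in host byte order, so a little-endian host is assumed.
   Returns the GNU_PROPERTY_X86_FEATURE_1_AND bitmask, or 0 if absent. */
static uint32_t x86_feature_1_and(const unsigned char *desc, size_t descsz)
{
    size_t off = 0;

    while (descsz - off >= 8) {
        uint32_t pr_type, pr_datasz;
        memcpy(&pr_type, desc + off, 4);
        memcpy(&pr_datasz, desc + off + 4, 4);
        off += 8;

        if (pr_datasz > descsz - off)
            break;                              /* truncated property */
        size_t padded = ((size_t)pr_datasz + 7) & ~(size_t)7;
        if (padded > descsz - off)
            break;                              /* missing padding */

        if (pr_type == GNU_PROPERTY_X86_FEATURE_1_AND && pr_datasz == 4) {
            uint32_t bits;
            memcpy(&bits, desc + off, 4);
            return bits;
        }
        off += padded;                          /* skip to next property */
    }
    return 0;
}
```

A caller would then check `x86_feature_1_and(desc, descsz) & GNU_PROPERTY_X86_FEATURE_1_IBT`.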

GNU_PROPERTY_X86_FEATURE_1_IBT is set on output only if it is set on
all relocatable inputs, which means that the C library must be compiled
with an IBT-enabled compiler.
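The AND rule can be sketched as a fold over the linker inputs. The struct input_object and merge_feature_1_and names are hypothetical and not taken from binutils; an input that carries no property at all is treated as having every bit clear.

```c
#include <stddef.h>
#include <stdint.h>

#define GNU_PROPERTY_X86_FEATURE_1_IBT (1U << 0)

/* Hypothetical view of one relocatable input: has_prop is 0 when the
   object carries no GNU_PROPERTY_X86_FEATURE_1_AND property. */
struct input_object {
    int has_prop;
    uint32_t feature_1_and;
};

/* A bit survives only if every input sets it; a missing property
   counts as 0 and therefore clears every bit in the output. */
static uint32_t merge_feature_1_and(const struct input_object *in, size_t n)
{
    uint32_t out = ~0u;

    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        out &= in[i].has_prop ? in[i].feature_1_and : 0;
    return out;
}
```

A single input without the property, such as one C library object built without IBT support, is enough to clear GNU_PROPERTY_X86_FEATURE_1_IBT in the output.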

The following changes are made to the Procedure Linkage Table (PLT) to
enable IBT:

1. For 64-bit x86-64, the PLT is changed to:

PLT0: push GOT[1]
bnd jmp *GOT[2]
nop
...
PLTn: endbr64
push namen_reloc_index
bnd jmp PLT0

together with the second PLT section:

PLTn: endbr64
bnd jmp *GOT[namen_index]
nop

The BND prefix is also added so that the IBT-enabled PLT is compatible with MPX.
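To make the 16-byte layout concrete, here is a sketch that fills in the two x86-64 entries for one symbol. The opcodes are the standard encodings (endbr64 is F3 0F 1E FA, F2 is the BND prefix, 68 is push imm32, E9 is jmp rel32, FF 25 is jmp through a RIP-relative memory operand); the choice of a 5-byte nopl as padding, the helper names, and the addresses passed in are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

#define PLT_ENTRY_SIZE 16

/* Store v in x86 (little-endian) byte order. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = v & 0xff; p[1] = (v >> 8) & 0xff; p[2] = (v >> 16) & 0xff; p[3] = v >> 24;
}

/* First-PLT entry: endbr64; push reloc_index; bnd jmp PLT0; nop */
static void make_lazy_plt_entry(unsigned char e[PLT_ENTRY_SIZE], uint64_t entry_addr,
                                uint64_t plt0_addr, uint32_t reloc_index)
{
    static const unsigned char tmpl[PLT_ENTRY_SIZE] = {
        0xf3, 0x0f, 0x1e, 0xfa,        /* endbr64 */
        0x68, 0, 0, 0, 0,              /* push $imm32 */
        0xf2, 0xe9, 0, 0, 0, 0,        /* bnd jmp rel32 */
        0x90                           /* nop */
    };
    memcpy(e, tmpl, PLT_ENTRY_SIZE);
    put_le32(e + 5, reloc_index);
    /* rel32 is relative to the end of the jmp, at entry + 15. */
    put_le32(e + 11, (uint32_t)(plt0_addr - (entry_addr + 15)));
}

/* Second-PLT entry: endbr64; bnd jmp *GOT[n](%rip); 5-byte nop */
static void make_sec_plt_entry(unsigned char e[PLT_ENTRY_SIZE], uint64_t entry_addr,
                               uint64_t got_slot_addr)
{
    static const unsigned char tmpl[PLT_ENTRY_SIZE] = {
        0xf3, 0x0f, 0x1e, 0xfa,        /* endbr64 */
        0xf2, 0xff, 0x25, 0, 0, 0, 0,  /* bnd jmp *disp32(%rip) */
        0x0f, 0x1f, 0x44, 0x00, 0x00   /* nopl 0x0(%rax,%rax,1) */
    };
    memcpy(e, tmpl, PLT_ENTRY_SIZE);
    /* RIP-relative displacement is measured from the end of the jmp, at entry + 11. */
    put_le32(e + 7, (uint32_t)(got_slot_addr - (entry_addr + 11)));
}
```

Both entries fit exactly in 16 bytes, which is what lets this scheme keep the traditional x86-64 PLT entry size.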

2. For 32-bit x86-64 (x32) and i386, the PLT is changed to:

PLT0: push GOT[1]
jmp *GOT[2]
nop
...
PLTn: endbr64 # endbr32 for i386.
push namen_reloc_index
jmp PLT0

together with the second PLT section:

PLTn: endbr64 # endbr32 for i386.
jmp *GOT[namen_index]
nop

The BND prefix isn't used since MPX isn't supported on x32, and BND registers
aren't used for parameter passing on i386.
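The per-ABI differences in these entries come down to the ENDBR flavor and whether the BND prefix is present. A small sketch, with a hypothetical enum x86_abi: x32 code runs in 64-bit mode, so it uses endbr64 like x86-64, and only i386 uses endbr32 (F3 0F 1E FB).

```c
#include <string.h>

/* Hypothetical ABI selector for illustration. */
enum x86_abi { ABI_X86_64, ABI_X32, ABI_I386 };

/* ENDBR64 = F3 0F 1E FA, ENDBR32 = F3 0F 1E FB. */
static void endbr_bytes(enum x86_abi abi, unsigned char out[4])
{
    static const unsigned char endbr64[4] = {0xf3, 0x0f, 0x1e, 0xfa};
    static const unsigned char endbr32[4] = {0xf3, 0x0f, 0x1e, 0xfb};

    /* x32 executes in 64-bit mode, so only i386 needs ENDBR32. */
    memcpy(out, abi == ABI_I386 ? endbr32 : endbr64, 4);
}

/* Only the 64-bit x86-64 PLT carries the BND (F2) prefix. */
static int plt_uses_bnd_prefix(enum x86_abi abi)
{
    return abi == ABI_X86_64;
}
```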

The GOT is an array of addresses. Initially, GOT[namen_index] holds the
address of the ENDBR instruction of the corresponding entry in the first
PLT section. The function namen is called via the ENDBR instruction of
its entry in the second PLT section. At run time, GOT[namen_index] is
updated to the actual address of namen.
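The GOT update can be modeled in plain C, with a function pointer standing in for GOT[namen_index]. This simulates the control flow only and is not loader code: lazy_plt_namen stands in for the first-PLT entry, PLT0 and the dynamic resolver together, and all names are hypothetical. In the real layout, each target of these indirect jumps starts with ENDBR, which is what keeps the chain valid under IBT.

```c
typedef int (*fn_t)(int);

/* Stands in for the real definition of namen in some shared object. */
static int real_namen(int x) { return x * 2; }

static fn_t got_namen;       /* models GOT[namen_index] */
static int resolver_calls;   /* how many times the slow path has run */

/* Models first-PLT entry + PLT0 + resolver: resolve the symbol,
   patch the GOT slot, then continue to the real target. */
static int lazy_plt_namen(int x)
{
    resolver_calls++;
    got_namen = real_namen;  /* GOT[namen_index] := actual address of namen */
    return got_namen(x);
}

/* Models the second-PLT entry: an indirect jump through the GOT slot. */
static int plt_sec_namen(int x) { return got_namen(x); }

/* At load time the slot points back into the first PLT. */
static void init_got(void) { got_namen = lazy_plt_namen; }
```

After `init_got()`, the first call to `plt_sec_namen` goes through the slow path once; later calls jump straight to `real_namen`.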

Load-time processing

On an IBT-capable processor, the following steps should be taken:

1. When loading an executable, if GNU_PROPERTY_X86_FEATURE_1_IBT is
set on the executable, enable IBT.
2. If IBT is enabled, when loading a shared object without
GNU_PROPERTY_X86_FEATURE_1_IBT:
a. If legacy interwork is allowed, mark all pages in executable
PT_LOAD segments in the legacy code page bitmap. Failure to allocate
the legacy code page bitmap causes an error.
b. If legacy interwork isn't allowed, loading the shared object fails
with an error.
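The load-time steps can be sketched as follows. This assumes the legacy code page bitmap holds one bit per 4 KiB page of linear address space and, for simplicity, that the buffer passed in covers every address being marked; the function names and struct exec_segment are hypothetical, and a NULL bitmap models a failed allocation.

```c
#include <stddef.h>
#include <stdint.h>

#define GNU_PROPERTY_X86_FEATURE_1_IBT (1U << 0)
#define LEGACY_PAGE_SHIFT 12  /* assumption: one bitmap bit per 4 KiB page */

enum load_result { LOAD_OK, LOAD_ERROR };

/* Hypothetical description of one executable PT_LOAD segment. */
struct exec_segment { uint64_t vaddr, memsz; };

/* Step 1: enable IBT only on an IBT-capable CPU for a marked executable. */
static int should_enable_ibt(int cpu_has_ibt, uint32_t exe_feature_1_and)
{
    return cpu_has_ibt && (exe_feature_1_and & GNU_PROPERTY_X86_FEATURE_1_IBT) != 0;
}

/* Set the bit of every page overlapping [vaddr, vaddr + memsz). */
static void mark_legacy_pages(unsigned char *bitmap, const struct exec_segment *seg)
{
    if (seg->memsz == 0)
        return;
    uint64_t first = seg->vaddr >> LEGACY_PAGE_SHIFT;
    uint64_t last = (seg->vaddr + seg->memsz - 1) >> LEGACY_PAGE_SHIFT;
    for (uint64_t pg = first; pg <= last; pg++)
        bitmap[pg >> 3] |= (unsigned char)(1u << (pg & 7));
}

/* Step 2: load a shared object; bitmap == NULL models allocation failure. */
static enum load_result load_shared_object(int ibt_enabled, uint32_t so_feature_1_and,
                                           int legacy_interwork_allowed,
                                           unsigned char *bitmap,
                                           const struct exec_segment *segs, size_t nsegs)
{
    if (!ibt_enabled || (so_feature_1_and & GNU_PROPERTY_X86_FEATURE_1_IBT))
        return LOAD_OK;       /* IBT off, or the object is IBT-compatible */
    if (!legacy_interwork_allowed)
        return LOAD_ERROR;    /* step 2b */
    if (bitmap == NULL)
        return LOAD_ERROR;    /* step 2a: bitmap allocation failed */
    for (size_t i = 0; i < nsegs; i++)
        mark_legacy_pages(bitmap, &segs[i]);
    return LOAD_OK;           /* step 2a */
}
```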

H.J. Lu

Feb 19, 2019, 11:35:54 PM
to IA32 System V Application Binary Interface, x86-6...@googlegroups.com, gnu-...@sourceware.org
There are 2 reasons for this 2-PLT scheme:

1. Provide compatibility with other tools that have a hardcoded limit of 16
bytes for an x86 PLT entry.
2. Improve code cache locality: since most of the instructions in .plt would be
executed only the first time a symbol is resolved they would waste space in
the cache and, by having a .plt.sec, only instructions that are often executed
would be cached.

--
H.J.

H.J. Lu

Feb 20, 2019, 10:01:14 PM
to Rui Ueyama, IA32 System V Application Binary Interface, x86-6...@googlegroups.com, gnu-...@sourceware.org
On Wed, Feb 20, 2019 at 4:30 PM Rui Ueyama <ru...@google.com> wrote:
>
> Hi H.J.Lu,
>
> I'm replying because I was wondering why the 2-PLT scheme was chosen to support Intel CET.
> I don't think that the 2-PLT scheme actually provides compatibility with existing tools. The new PLT uses different instructions, and the usage of the .plt section has changed as well. IIUC, foo@PLT now resolves to its entry in the second PLT instead of the first, regular PLT.
>
> I know that some existing tools even crash if we change the PLT entry size, so keeping the PLT entry size would at least keep them from crashing. But I'd think compatibility means more than that.
>

We are doing the best we can.

>> 2. Improve code cache locality: since most of the instructions in .plt would be
>> executed only the first time a symbol is resolved they would waste space in
>> the cache and, by having a .plt.sec, only instructions that are often executed
>> would be cached.
>
>
> To me, this is a much more convincing answer than keeping compatibility. The PLT section could be hot, and separating hot code from relatively cold code could have a performance impact. But do you know how large the impact is? I wonder if there's a measurable difference if you simply extend the PLT entry size to 32 bytes.
>

We don't have such data.

FWIW, we introduced the 2-PLT scheme for MPX. This isn't a new thing in
the x86-64 psABI.


--
H.J.

H.J. Lu

Feb 21, 2019, 2:22:44 PM
to Rui Ueyama, IA32 System V Application Binary Interface, x86-6...@googlegroups.com, gnu-...@sourceware.org
On Thu, Feb 21, 2019 at 11:18 AM Rui Ueyama <ru...@google.com> wrote:
> Then it could be a premature optimization. The single-PLT scheme would be undeniably much simpler, so unless it is shown not to work, we probably shouldn't have split the PLT into two, no?
>

Simpler to implement, yes, but we designed it with performance in mind.
We implemented it many years ago, starting with MPX. It shouldn't
be changed just because it is "hard" to implement.

--
H.J.

H.J. Lu

Feb 21, 2019, 6:09:27 PM
to Rui Ueyama, IA32 System V Application Binary Interface, x86-6...@googlegroups.com, gnu-...@sourceware.org
On Thu, Feb 21, 2019 at 2:30 PM Rui Ueyama <ru...@google.com> wrote:
> I can see that the 2-PLT scheme performs better in theory. That being said, I'm not convinced that the design is better in practice if the expected advantage was never measured.

In our design, we went with what is better in theory. We may not see
performance differences in practice in most cases, but in some cases
the PLT section can be quite large:

libLLVM-7.0.1.so:
  [Nr] Name      Type      Address          Off    Size   ES Flg Lk Inf Al
  [11] .plt      PROGBITS  0000000000658020 658020 043bd0 10  AX  0   0 16
  [12] .plt.sec  PROGBITS  000000000069bbf0 69bbf0 043bc0 10  AX  0   0 16
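The section sizes above can be decoded with a little arithmetic, written here as C constant expressions so it can be checked: both sections use 16-byte entries, and .plt is exactly one entry (PLT0) larger than .plt.sec.

```c
/* Section sizes and entry size copied from the readelf output. */
enum {
    PLT_SIZE     = 0x043bd0,  /* .plt */
    PLT_SEC_SIZE = 0x043bc0,  /* .plt.sec */
    PLT_ENTSIZE  = 0x10       /* 16-byte entries in both sections */
};

/* .plt holds PLT0 plus one lazy entry per symbol: 277456 / 16 = 17341. */
static const unsigned plt_entries = PLT_SIZE / PLT_ENTSIZE;

/* .plt.sec holds one entry per symbol: 277440 / 16 = 17340. */
static const unsigned plt_sec_entries = PLT_SEC_SIZE / PLT_ENTSIZE;

/* The two sizes differ by exactly one 16-byte entry, PLT0. */
static const unsigned plt0_bytes = PLT_SIZE - PLT_SEC_SIZE;
```

So this library has 17340 PLT symbols; under lazy binding, once they are resolved, calls go only through the 277440-byte (about 271 KiB) .plt.sec.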

> I don't think I'm requesting a change to the spec, at least at the moment. What I'm trying to do is understand the rationale behind a choice in the spec before implementing it in our linker, lld. Even if there's no evidence that the 2-PLT scheme performs better than the 1-PLT scheme, we might still want to implement it as the spec says, considering the cost of breaking ABI compatibility. But if we take that route, we'd like to document that fact as-is.

Sure. We'd like to get as much feedback and input as we can when we propose
ABI changes. We encourage you to participate in future discussions.

--
H.J.