Alternative CET ABI

Florian Weimer

Jul 30, 2020, 12:02:04 PM

to oss-se...@lists.openwall.com, x86-6...@googlegroups.com, kernel-h...@lists.openwall.com, Szabolcs Nagy

CET (and Arm BTI) restrict targets for indirect jumps and calls to
landing pads which start with a specially-formatted NOP instruction
dedicated to this purpose (endbr64 in the x86-64 case).

The traditional way of implementing ELF on top of this is to have every
global function start with that NOP, and also use these NOPs in PLT
stubs in the main program (which may provide the canonical address of
functions, i.e. their address may be taken).

The downside of this approach is that all functions in the process
become available for execution, whether they are used in the original
program or not. (In principle, control flow integrity provides
reasonably efficient ways to counteract that, by keeping track of symbol
resolution and verifying flags at the start of critical functions, but
we do not have automated support for that today, and there are some open
issues about complex call graphs.)

CET has a NOTRACK prefix for indirect jumps and calls. It asserts that
the jump target address is trusted and disables the control flow
integrity check. It is expected to be used with jump tables and the
like, in conjunction with RELRO (so that the address has been loaded
from read-only memory).

I think this also provides support for a completely different ABI, where
global functions are not automatically addressable. It depends on
BIND_NOW and RELRO, for a read-only GOT.

First of all, it needs new relocation types that tell the static link
editor which symbol references are address-significant. Generally,
function addresses which end up in RELRO data only are not
address-significant if they are used immediately in call instructions
(without indirection of any form through writable memory). This means
that direct calls do not have address significance. For vtables, it
depends on how they are used; their function addresses probably need to
be treated conservatively as address-significant (because the vtable
pointer is in writable memory; at least for C++ vtables, the address of
a virtual member function is not significant).
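
As a rough source-level illustration of the distinction (a sketch only;
the function names are made up, and which relocation the compiler would
emit for each reference is an assumption, since the proposal does not
name the new relocation types):

```c
#include <assert.h>

typedef int (*op_fn)(int);

/* Only ever called directly: under the proposal, no reference to this
   function would be address-significant. */
int twice(int x) { return 2 * x; }

/* Its address is stored in writable memory below, so that reference
   would be address-significant. */
int negate(int x) { return -x; }

/* Writable function pointer: whoever can overwrite it controls the
   target of the indirect call in run(). */
op_fn current_op = negate;

int run(int x)
{
    int doubled = twice(x);      /* direct call */
    return current_op(doubled);  /* indirect call through writable memory */
}

/* Small self-check of the example. */
void run_demo(void)
{
    assert(run(3) == -6);        /* negate(twice(3)) */
}
```

Here only negate would need an addressable entry point; twice could stay
unreachable through indirect calls.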

Functions no longer start with the ENDBR64 prefix. Instead, the link
editor produces a PLT entry with an ENDBR64 prefix if it detects any
address-significant relocation for it. The PLT entry performs a NOTRACK
jump to the target address. This assumes that the target address is
subject to RELRO, of course, so that redirection is not possible.
Without address-significant relocations, the link editor produces a PLT
entry without the ENDBR64 prefix (but still with the NOTRACK jump), or
perhaps no PLT entry at all.
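
The decision described above can be modelled as a small function. This
is a toy sketch, not code from any real link editor: the enum names and
the struct are hypothetical, and treating "never referenced" as the
case that gets no PLT entry is an assumption about the "perhaps no PLT
entry at all" remark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical relocation classes; the proposal calls for new
   relocation types but does not name them. */
enum reloc_kind { RELOC_CALL_ONLY, RELOC_ADDRESS_SIGNIFICANT };

enum plt_kind {
    PLT_NONE,           /* assumption: symbol is never referenced */
    PLT_NOTRACK_ONLY,   /* NOTRACK jump, no ENDBR64: not an indirect target */
    PLT_ENDBR_NOTRACK   /* ENDBR64 followed by a NOTRACK jump */
};

struct reloc {
    const char *symbol;
    enum reloc_kind kind;
};

/* Pick the PLT flavour for symbol from all relocations seen at link time. */
enum plt_kind choose_plt(const struct reloc *relocs, size_t count,
                         const char *symbol)
{
    enum plt_kind result = PLT_NONE;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(relocs[i].symbol, symbol) != 0)
            continue;
        if (relocs[i].kind == RELOC_ADDRESS_SIGNIFICANT)
            return PLT_ENDBR_NOTRACK;  /* one such reference is enough */
        result = PLT_NOTRACK_ONLY;
    }
    return result;
}

/* Small self-check of the example. */
void choose_plt_demo(void)
{
    static const struct reloc relocs[] = {
        { "system",  RELOC_CALL_ONLY },
        { "compare", RELOC_CALL_ONLY },
        { "compare", RELOC_ADDRESS_SIGNIFICANT },
    };
    size_t n = sizeof relocs / sizeof relocs[0];

    assert(choose_plt(relocs, n, "system")  == PLT_NOTRACK_ONLY);
    assert(choose_plt(relocs, n, "compare") == PLT_ENDBR_NOTRACK);
    assert(choose_plt(relocs, n, "unused")  == PLT_NONE);
}
```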

The net effect is that only functions which have their address taken in
the original program can be called through indirect function calls. For
example, this means that the system function in libc is usually dormant,
and cannot be reached, even if an attacker can cause the process to call
arbitrary functions with an arbitrary string argument. The reason is
that the system function lacks the ENDBR64 prefix, and all PLT entries
calling it also lack it.

dlopen'ing a shared object which has an address-significant relocation
against a function is not a problem under this model. Either there
already was an address-significant relocation before, then the function
already has a canonical address, and that can be used. Or there was
not, then the just-loaded PLT entry (which has an ENDBR64 prefix)
provides the canonical address of the function.
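
The two cases amount to a lookup-or-register step. A minimal sketch of
that logic follows; the table layout and all names are hypothetical,
and a real dynamic loader would tie this to its symbol lookup rather
than keep a separate fixed-size table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CANONICAL 16

struct canonical_entry {
    const char *symbol;
    const void *address;  /* address of an ENDBR64-enabled PLT entry */
};

struct canonical_table {
    struct canonical_entry entries[MAX_CANONICAL];
    size_t count;
};

/* Return the canonical address for symbol. The first address-significant
   reference registers candidate (the PLT entry of the object being
   loaded); later references get the address chosen earlier.
   Returns NULL if the table is full. */
const void *canonical_address(struct canonical_table *table,
                              const char *symbol, const void *candidate)
{
    for (size_t i = 0; i < table->count; i++)
        if (strcmp(table->entries[i].symbol, symbol) == 0)
            return table->entries[i].address;
    if (table->count == MAX_CANONICAL)
        return NULL;
    table->entries[table->count].symbol = symbol;
    table->entries[table->count].address = candidate;
    table->count++;
    return candidate;
}

/* Small self-check of the example. */
void canonical_address_demo(void)
{
    static char plt_in_b, plt_in_c;  /* stand-ins for two PLT entries */
    struct canonical_table table = { .count = 0 };

    /* First object to take the address: its PLT entry becomes canonical. */
    assert(canonical_address(&table, "func1", &plt_in_b) == &plt_in_b);
    /* Second object: reuses the existing canonical address. */
    assert(canonical_address(&table, "func1", &plt_in_c) == &plt_in_b);
}
```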

To support dlsym, each global function definition would have a separate
ENDBR64-enabled PLT/GOT slot for that, with the GOT slot only filled in
at the time of the dlsym call (with mprotect calls around that, and with
some hand-waving required so that these can never fail). This is probably the
most awkward part about all this. Alternatively, these stubs could also
be generated at run time, from a pre-computed code page.
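
The "mprotect calls around that" step could look roughly like the
following sketch. It uses an anonymous page as a stand-in for a
read-only GOT; fill_slot and the other names are hypothetical, and the
sketch simply reports an mprotect failure instead of showing how such
failures could be ruled out.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef void (*func_ptr)(void);

/* Fill one slot of a normally read-only, page-aligned table.
   Returns 0 on success, -1 if either mprotect call fails. */
int fill_slot(func_ptr *table, size_t table_bytes, size_t index,
              func_ptr target)
{
    if (mprotect(table, table_bytes, PROT_READ | PROT_WRITE) != 0)
        return -1;
    table[index] = target;
    return mprotect(table, table_bytes, PROT_READ);
}

static void sample_target(void) { }

/* Small self-check of the example. */
void fill_slot_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    func_ptr *table = mmap(NULL, page, PROT_READ,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(table != MAP_FAILED);

    assert(fill_slot(table, page, 0, sample_target) == 0);
    assert(table[0] == sample_target);  /* page is readable again */
    munmap(table, page);
}
```

If the second mprotect call fails, the table stays writable, which is
one reason the failure case needs care in a real design.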

Obviously, it is too late for that now for x86-64, but maybe someone
else gets a chance to try this.

Thanks,
Florian

Florian Weimer

Jul 30, 2020, 12:54:59 PM

to Jann Horn, oss-se...@lists.openwall.com, x86-6...@googlegroups.com, Kernel Hardening, Szabolcs Nagy

* Jann Horn:

> On Thu, Jul 30, 2020 at 6:02 PM Florian Weimer <fwe...@redhat.com> wrote:
>> Functions no longer start with the ENDBR64 prefix. Instead, the link
>> editor produces a PLT entry with an ENDBR64 prefix if it detects any
>> address-significant relocation for it. The PLT entry performs a NOTRACK
>> jump to the target address. This assumes that the target address is
>> subject to RELRO, of course, so that redirection is not possible.
>> Without address-significant relocations, the link editor produces a PLT
>> entry without the ENDBR64 prefix (but still with the NOTRACK jump), or
>> perhaps no PLT entry at all.
>
> How would this interact with function pointer comparisons? As in, if
> library A exports a function func1 without referencing it, and
> libraries B and C both take references to func1, would they end up
> with different function pointers (pointing to their respective PLT
> entries)?

Same as today. ELF already deals with this by picking one canonical
function address per process.

Some targets already need PLTs for inter-DSO calls, so the problem is
not new. It happens even on x86 because the main program can refer to
its PLT stubs without run-time relocations, so those determine the
canonical address of those functions, and not the actual implementation
in a shared object.

> Would this mean that the behavior of a program that compares
> function pointers obtained through different shared libraries might
> change?

Hopefully not, because that would break things quite horribly (as it's
sometimes possible to observe if the RTLD_DEEPBIND flag is used).

Both the canonicalization and the fact that, in order to observe the
function pointer, you need to take its address should take care of this.

> I guess you could maybe canonicalize function pointers somehow, but
> that'd probably at least break dlclose(), right?

Ahh, dlclose. I think in this case, my idea to generate a PLT stub
locally in the address-generating DSO will not work because the
canonical address must survive dlclose if it refers to another DSO.
There are two ways to deal with this: do not unload the PLT stub until
the target DSO is also unloaded (but make sure that the DSO can be
reloaded at a different address; probably not worth the complexity),
or use the dlsym hack I sketched for regular symbol binding as well.
Even more room for experiments, I guess.

Thanks,
Florian

H.J. Lu

Jul 30, 2020, 1:14:47 PM

to Florian Weimer, Jann Horn, oss-se...@lists.openwall.com, x86-64-abi, Kernel Hardening, Szabolcs Nagy

FWIW, we can introduce a different CET PLT as long as it is compatible
with the past, current and future binaries.

--
H.J.

mas...@google.com

Dec 22, 2022, 5:29:11 PM

to X86-64 System V Application Binary Interface

(I clicked "Reply all" on groups.google.com/g/x86-64-abi/c/iQWEW-iW8DQ, but I suspect other mailing lists will not get a copy.)

Multiple folks are interested in a fine-grained scheme which skips most non-address-taken functions.

-fsanitize=kcfi is the new CFI scheme used in the Linux kernel (replacing -fsanitize=cfi). There is a recent discussion about how to skip non-address-taken functions: https://github.com/ClangBuiltLinux/linux/issues/1737

In llvm-project, there is some LTO complexity (I documented a bit in https://maskray.me/blog/2022-12-18-control-flow-integrity). I think GCC will face similar issues.

---

Another note: some large x86-64 executables are facing relocation overflow pressure but don't want to switch to a medium code model.

Range extension thunks may be a future direction. If we don't conservatively mark local-linkage functions, we need to use NOTRACK.
