Alternative CET ABI

Florian Weimer

Jul 30, 2020, 12:02:04 PM
to oss-se...@lists.openwall.com, x86-6...@googlegroups.com, kernel-h...@lists.openwall.com, Szabolcs Nagy
CET (and Arm BTI) restrict targets for indirect jumps and calls to
landing pads which start with a specially formatted NOP instruction
dedicated to this purpose (endbr64 in the x86-64 case).

The traditional way of implementing ELF on top of this is to have every
global function start with that NOP, and also use these NOPs in PLT
stubs in the main program (which may provide the canonical address of
functions, i.e. their address may be taken).

The downside of this approach is that all functions in the process
become available for execution, whether they are used in the original
program or not. (In principle, control flow integrity provides
reasonably efficient ways to counteract that, by keeping track of symbol
resolution and verifying flags at the start of critical functions, but
we do not have automated support for that today, and there are some open
issues about complex call graphs.)

CET has a NOTRACK prefix for indirect jumps and calls. It asserts that
the jump target address is trusted and disables the control flow
integrity check. It is expected to be used with jump tables and the
like, in conjunction with RELRO (so that the address has been loaded
from read-only memory).

I think this also provides support for a completely different ABI, where
global functions are not automatically addressable. It depends on
BIND_NOW and RELRO, for a read-only GOT.

First of all, it needs new relocation types that tell the static link
editor which symbol references are address-significant. Generally,
function addresses which end up in RELRO data only are not
address-significant if they are used immediately in call instructions
(without indirection of any form through writable memory). This means
that direct calls do not have address significance. For vtables, it
depends on how they are used; their function addresses probably need to
be treated conservatively as address-significant (because the vtable
pointer is in writable memory; at least for C++ vtables, the address of
a virtual member function is not significant).
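
As a rough illustration of the distinction drawn above, the following C
sketch shows a direct call, a call through a read-only table, and a call
through a writable function pointer. All names are made up for the
example, and where a compiler actually places `handlers` depends on the
toolchain and on whether the code is position-independent. It only
illustrates the classification; it does not correspond to any existing
relocation type.

```c
#include <assert.h>
#include <stddef.h>

int twice(int x) { return 2 * x; }
int negate(int x) { return -x; }

/* Read-only table of function addresses. In a PIE or shared object this
   typically ends up in RELRO data. Under the proposal, entries that are
   only loaded and called immediately would not have to count as
   address-significant. */
static int (*const handlers[])(int) = { twice, negate };

/* Writable function pointer: the address is stored in memory that can be
   modified at run time, so the reference is address-significant. */
static int (*callback)(int) = NULL;

/* Direct call: no function address is materialized as data at all. */
int call_direct(int x)
{
    return twice(x);
}

/* Indirect call through the read-only table. */
int call_table(size_t i, int x)
{
    assert(i < sizeof handlers / sizeof handlers[0]);
    return handlers[i](x);
}

/* Stores an address in writable memory, then calls through it. */
int call_via_callback(int x)
{
    callback = negate;   /* address-significant reference to negate */
    return callback(x);
}
```

In this sketch, only `negate` would need a landing pad, because it is the
only function whose address reaches writable memory.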

Functions no longer start with the ENDBR64 prefix. Instead, the link
editor produces a PLT entry with an ENDBR64 prefix if it detects any
address-significant relocation for it. The PLT entry performs a NOTRACK
jump to the target address. This assumes that the target address is
subject to RELRO, of course, so that redirection is not possible.
Without address-significant relocations, the link editor produces a PLT
entry without the ENDBR64 prefix (but still with the NOTRACK jump), or
perhaps no PLT entry at all.
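
The link editor's choice described in this paragraph can be sketched as
a small decision function. This is only a model: `struct reloc`,
`enum plt_kind` and `choose_plt` are hypothetical names, and returning
"no PLT entry" when there are no references at all is an assumption (the
text leaves open when a PLT entry may be omitted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one relocation against a function symbol. */
struct reloc {
    bool address_significant;  /* would come from the new relocation types */
};

enum plt_kind {
    PLT_NONE,         /* assumption: no reference, nothing to emit */
    PLT_PLAIN,        /* no ENDBR64, NOTRACK jump to the target */
    PLT_LANDING_PAD   /* ENDBR64 followed by a NOTRACK jump */
};

/* Pick the PLT entry kind for one function from its relocations. */
enum plt_kind choose_plt(const struct reloc *relocs, size_t n)
{
    assert(relocs != NULL || n == 0);
    if (n == 0)
        return PLT_NONE;
    for (size_t i = 0; i < n; i++) {
        /* A single address-significant reference is enough to require
           a landing pad, since the PLT entry is the canonical address. */
        if (relocs[i].address_significant)
            return PLT_LANDING_PAD;
    }
    return PLT_PLAIN;
}
```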

The net effect is that only functions which have their address taken in
the original program can be called through indirect function calls. For
example, this means that the system function in libc is usually dormant,
and cannot be reached, even if an attacker can cause the process to call
arbitrary functions with an arbitrary string argument. The reason is
that the system function lacks the ENDBR64 prefix, and all PLT entries
calling it also lack it.

dlopen'ing a shared object which has an address-significant relocation
against a function is not a problem under this model. Either there
already was an address-significant relocation before, in which case the
function already has a canonical address, and that can be used. Or there
was not, in which case the just-loaded PLT entry (which has an ENDBR64
prefix) provides the canonical address of the function.

To support dlsym, each global function definition would have a separate
ENDBR64-enabled PLT/GOT slot for that, with the GOT slot only filled in
at the time of the dlsym call (with mprotect calls around that, and with
some hand-waving required to ensure these can never fail). This is probably the
most awkward part about all this. Alternatively, these stubs could also
be generated at run time, from a pre-computed code page.
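
A minimal sketch of the "fill the slot at dlsym time" step, assuming a
POSIX system with anonymous mmap. One page stands in for the read-only
GOT slots; `slots_init`, `slot_fill`, `slot_get` and `sample_target` are
hypothetical names, not part of any dynamic loader. Unlike the scheme
described above, this model simply reports mprotect failures to the
caller, and it ignores the window during which other threads could
write to the page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef void (*func_ptr)(void);

static func_ptr *slots;    /* page of slots, read-only except while filling */
static size_t page_size;

/* Stand-in for a function whose address is requested via dlsym. */
void sample_target(void) {}

/* Map one zero-filled, read-only page of slots. Returns 0 on success. */
int slots_init(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    if (sz <= 0)
        return -1;
    void *p = mmap(NULL, (size_t)sz, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    slots = p;
    page_size = (size_t)sz;
    return 0;
}

/* Fill slot i on first use: unprotect, store, protect again. */
int slot_fill(size_t i, func_ptr target)
{
    assert(slots != NULL && i < page_size / sizeof *slots);
    if (slots[i] != NULL)
        return 0;                       /* already filled earlier */
    if (mprotect(slots, page_size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    slots[i] = target;
    return mprotect(slots, page_size, PROT_READ);
}

/* Read a slot; NULL means the address was never requested. */
func_ptr slot_get(size_t i)
{
    assert(slots != NULL && i < page_size / sizeof *slots);
    return slots[i];
}
```

In the real scheme, the ENDBR64-enabled stub would jump through the slot
with NOTRACK; here the slot is only read back to show that it stays
empty until the first request.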

Obviously, it is too late for that now for x86-64, but maybe someone
else gets a chance to try this.

Thanks,
Florian

Florian Weimer

Jul 30, 2020, 12:54:59 PM
to Jann Horn, oss-se...@lists.openwall.com, x86-6...@googlegroups.com, Kernel Hardening, Szabolcs Nagy
* Jann Horn:

> On Thu, Jul 30, 2020 at 6:02 PM Florian Weimer <fwe...@redhat.com> wrote:
>> Functions no longer start with the ENDBR64 prefix. Instead, the link
>> editor produces a PLT entry with an ENDBR64 prefix if it detects any
>> address-significant relocation for it. The PLT entry performs a NOTRACK
>> jump to the target address. This assumes that the target address is
>> subject to RELRO, of course, so that redirection is not possible.
>> Without address-significant relocations, the link editor produces a PLT
>> entry without the ENDBR64 prefix (but still with the NOTRACK jump), or
>> perhaps no PLT entry at all.
>
> How would this interact with function pointer comparisons? As in, if
> library A exports a function func1 without referencing it, and
> libraries B and C both take references to func1, would they end up
> with different function pointers (pointing to their respective PLT
> entries)?

Same as today. ELF already deals with this by picking one canonical
function address per process.

Some targets already need PLTs for inter-DSO calls, so the problem is
not new. It happens even on x86 because the main program can refer to
its PLT stubs without run-time relocations, so those determine the
canonical address of those functions, and not the actual implementation
in a shared object.

> Would this mean that the behavior of a program that compares
> function pointers obtained through different shared libraries might
> change?

Hopefully not, because that would break things quite horribly (as it's
sometimes possible to observe if the RTLD_DEEPBIND flag is used).

Both the canonicalization and the fact that, in order to observe the
function pointer, you need to take its address should take care of this.

> I guess you could maybe canonicalize function pointers somehow, but
> that'd probably at least break dlclose(), right?

Ahh, dlclose. I think in this case, my idea to generate a PLT stub
locally in the address-generating DSO will not work because the
canonical address must survive dlclose if it refers to another DSO.
There are two ways to deal with this: do not unload the PLT stub until
the target DSO is also unloaded (but make sure that the DSO can be
reloaded at a different address; probably not worth the complexity),
or use the dlsym hack I sketched for regular symbol binding as well.
Even more room for experiments, I guess.

Thanks,
Florian

H.J. Lu

Jul 30, 2020, 1:14:47 PM
to Florian Weimer, Jann Horn, oss-se...@lists.openwall.com, x86-64-abi, Kernel Hardening, Szabolcs Nagy
FWIW, we can introduce a different CET PLT as long as it is compatible
with the past, current and future binaries.

--
H.J.

mas...@google.com

Dec 22, 2022, 5:29:11 PM
to X86-64 System V Application Binary Interface
(I clicked "Reply all" on groups.google.com/g/x86-64-abi/c/iQWEW-iW8DQ , but I suspect other mailing lists will not get a copy).

Multiple folks are interested in a fine-grained scheme which skips most non-address-taken functions.

-fsanitize=kcfi is the new CFI scheme used in the Linux kernel (replacing -fsanitize=cfi). There is a recent discussion about how to skip non-address-taken functions: https://github.com/ClangBuiltLinux/linux/issues/1737

In llvm-project, there is some LTO complexity (I documented a bit in https://maskray.me/blog/2022-12-18-control-flow-integrity). I think GCC will face similar issues.

---

Another note: some large x86-64 executables are facing relocation overflow pressure but don't want to switch to a medium code model.
Range extension thunks may be a future direction. If we don't conservatively mark local linkage functions, we need to use NOTRACK.

Stephen Röttger

Sep 23, 2024, 4:25:36 AM
to X86-64 System V Application Binary Interface
I would like to raise this proposal again. In my opinion, we need something like this for IBT to be effective.

Just last weekend, there was a CET bypass challenge at a CTF competition where players had to exploit a stack buffer overflow.
It gave them control over the stack memory and a single function pointer call.
This was only possible to solve because of all the unnecessary endbr64 instructions in the process, and the competing teams found multiple different solutions.

If there's no better solution, couldn't the whole DSO just be kept alive?
I.e. looking at the glibc code (mark_nodelete() in dl-lookup.c), there seem to be other cases where libraries can't get unloaded anymore.

> To support dlsym, each global function definition would have a separate
> ENDBR64-enabled PLT/GOT slot for that, with the GOT slot only filled in
> at the time of the dlsym call (with mprotect calls around that, with
> some hand-waving required these can never fail). This is probably the
> most awkward part about all this. Alternatively, these stubs could also
> be generated at run time, from a pre-computed code page.

If such entries would be needed for all functions, could this be the mechanism for the regular case too?
I.e. the runtime loader could fill these in at process startup time?

Florian Weimer

Sep 23, 2024, 6:21:10 AM
to 'Stephen Röttger' via X86-64 System V Application Binary Interface, Stephen Röttger
* 'Stephen Röttger' via X86-64 System V Application Binary Interface:

>> Ahh, dlclose. I think in this case, my idea to generate a PLT stub
>> locally in the address-generating DSO will not work because the
>> canonical address must survive dlclose if it refers to another DSO.
>> There are two ways to deal with this: do not unload the PLT stub until
>> the target DSO is also unloaded (but make sure that the DSO can be
>> reloaded at a different address; probably not worth the complexity),
>> or use the dlsym hack I sketched for regular symbol binding as well.
>> Even more room for experiments, I guess.
>
> If there's no better solution, couldn't the whole DSO just be kept
> alive? I.e. looking at the glibc code (mark_nodelete() in
> dl-lookup.c), there seem to be other cases where libraries can't get
> unloaded anymore.

We can still unload them (call their destructors etc.), but we can't
unmap some of the code and data in it.

>> To support dlsym, each global function definition would have a separate
>> ENDBR64-enabled PLT/GOT slot for that, with the GOT slot only filled in
>> at the time of the dlsym call (with mprotect calls around that, with
>> some hand-waving required these can never fail). This is probably the
>> most awkward part about all this. Alternatively, these stubs could also
>> be generated at run time, from a pre-computed code page.
>
> If such entries would be needed for all functions, could this be the
> mechanism for the regular case too? I.e. the runtime loader could
> fill these in at process startup time?

The security folks generally frown upon run-time code generation
(although we could make this one work without code generation because
the trampolines are so regular). I'm not sure that run-time generation
of trampolines for functions where it is known at compile time that
they have their address taken is a significant simplification.
Preserving the information which relocations against function symbols
are address-significant and which aren't probably needs some
cross-toolchain work.

Thanks,
Florian

Stephen Röttger

Sep 23, 2024, 7:56:31 AM
to Florian Weimer, 'Stephen Röttger' via X86-64 System V Application Binary Interface
If this looked like the PLT, it should work without run-time code generation.
So for example, libc would have an ibt.plt with an entry for every exported function:
```
system:
  endbr64
  notrack jmp [system.got.ibt]
```
and then the system.got.ibt value would be NULL by default and only filled in if `system` is address-taken.
The same approach should work for dlsym() and it should be compatible with dlclose since the entries live in the target library.
 


Stephen Röttger

Oct 2, 2024, 11:13:22 AM
to Florian Weimer, 'Stephen Röttger' via X86-64 System V Application Binary Interface
> If this would look like the plt, it should work without run-time code generation.
> So for example the libc would have an ibt.plt with an entry for every exported function:
> ```
> system:
>   endbr64
>   notrack jmp [system.got.ibt]
> ```
> and then the system.got.ibt value would be NULL by default and only filled in if `system` is address-taken.
> The same approach should work for dlsym() and it should be compatible with dlclose since the entries live in the target library.


> Ahh, dlclose. I think in this case, my idea to generate a PLT stub
> locally in the address-generating DSO will not work because the
> canonical address must survive dlclose if it refers to another DSO.
> There are two ways to deal with this: do not unload the PLT stub until
> the target DSO is also unloaded (but make sure that the DSO can be
> reloaded at a different address; probably not worth the complexity),
> or use the dlsym hack I sketched for regular symbol binding as well.
> Even more room for experiments, I guess.

After re-reading your previous reply more carefully, I think this is
similar to what you were proposing here as an option, right?

I believe going this way would allow us to build this in a completely
backwards-compatible way.
So considering the following setup:
* the link editor adds PLT entries for all global functions defined in the DSO
* the PLT entries use endbr and notrack
* the symbols are rewritten to point to the PLT entries instead
* the GOT entries are filled in with relative relocations
For an ld.so that is not aware of the new format, this should just
work and support the current IBT ABI. The only difference is that the
function symbols now point to PLT entries that contain the endbr instruction.

An ld.so that is aware of it can now improve the security properties
by removing most of the GOT entries.
They can be removed if:
* there’s no relocation to it at all
* all relocations are not address-significant and used with notrack,
at which point the loader can resolve it to the actual function

> First of all, it needs new relocation types that tell the static link editor which symbol references are address-significant.

Can JUMP_SLOT relocations be address-significant? Maybe this plus a
bit in the ELF saying JUMP_SLOT entries are used with notrack would
work?