Proposal to add two new IBT-enabled PLT formats


Rui Ueyama

Jun 3, 2019, 9:34:26 AM
to x86-6...@googlegroups.com
Hello,

This is a proposal to define two new PLT formats for the Intel IBT.
The aim of this proposal is to provide an option to use a simplified
ABI and to reduce the amount of space occupied by the PLT by half.

Background:
Intel introduced a new extension called Intel CET (Control-flow
Enforcement Technology) to their x86/x86-64 instruction sets to
mitigate ROP or JOP attacks. Note that there are no actual processors
supporting CET on the market yet; only the spec is available at the
moment. CET introduces new instructions, ENDBR{32,64}, to explicitly
mark the locations that indirect branch instructions (such as
register-indirect calls or jumps) can jump to. If CET is enabled, a
processor raises an exception if an indirect branch target doesn't start
with an endbr instruction. This makes it very difficult to transfer
control to the middle of a function and use it as a "gadget",
preventing ROP or JOP. This feature is called IBT (Indirect Branch
Tracking).

With IBT, functions need to start with endbr, because otherwise you
can't call them via function pointers. PLT entries, which are
generated by linkers, act as entry points to (possibly)
externally-defined functions, so they must start with endbr for
the same reason. We can't simply add an endbr at the beginning of
each PLT entry, because a PLT entry is 16 bytes long on x86/x86-64
and has no spare bytes.
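
To make the "no spare bytes" point concrete, here is a rough sketch that
adds up the instruction lengths of a classic lazy PLT entry on x86-64.
The lengths are the standard encodings; the variable names are made up
for illustration and are not taken from any linker.

```python
# x86-64 instruction lengths in bytes (standard encodings).
ENDBR64 = 4      # f3 0f 1e fa
JMP_GOT = 6      # ff 25 <disp32>   jmp *sym@GOTPLT(%rip)
PUSH_IMM32 = 5   # 68 <imm32>       push the relocation index
JMP_REL32 = 5    # e9 <rel32>       jmp to PLT0

PLT_ENTRY_SIZE = 16

# Classic entry: indirect jump, then the lazy-binding stub.
classic_entry = JMP_GOT + PUSH_IMM32 + JMP_REL32
# Same entry with a single endbr64 prepended.
ibt_entry = ENDBR64 + classic_entry

print(classic_entry)                # bytes used by the classic entry
print(ibt_entry)                    # bytes needed once endbr64 is added
print(ibt_entry <= PLT_ENTRY_SIZE)  # does it still fit in 16 bytes?
```

The classic entry already fills its 16-byte slot exactly, so even one
endbr64 pushes it over the limit.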

Note that a PLT entry can be divided into two parts. The first part is
an indirect jump instruction that reads an address from GOTPLT. The
second part is an instruction sequence to resolve an external symbol
name and set its address to a corresponding GOTPLT entry, so that
subsequent PLT calls can jump straight to their corresponding
functions.


Current status and my observations:
The current x86 psABI defines a PLT entry format for IBT. The scheme
defined in the psABI is the so-called 2-PLT scheme, as it uses two
separate PLT sections. The ".plt.sec" section contains the first parts
of PLT entries (i.e. indirect jump instructions that read jump targets
from GOTPLT). The ".plt" section contains the second parts (i.e. the
code to resolve external names lazily).

The 2-PLT scheme is not a straightforward extension to the existing
PLT format, as it would have been more straightforward to keep using
the single PLT section and extend its entry size from 16 bytes to 32
bytes.

The justification for the 2-PLT scheme is better memory locality. The
"second" PLT part runs only once for each PLT entry, so it shouldn't
be hot, while the "first" part runs every time you call an external
function. In theory it makes sense, but it looks like nobody has
demonstrated the benefit. I wrote a microbenchmark that hammers PLT
entries to compare 16-byte and 32-byte PLT entries and didn't find any
difference. So it looks like it is difficult to measure a difference
between the 2-PLT scheme and a single PLT with 32-byte entries, while
the latter is much simpler from the perspective of the ABI.

The second thing I noticed is that there's room for improvement
in terms of PLT size by not emitting the "second" part of the PLT. You
can disable lazy PLT symbol resolution by passing the "-z now" option
to a linker. If the option is given, the linker sets a flag in the
dynamic section of the output file to tell the loader to resolve all
symbols in GOTPLT at load time. If the flag is set, only the "first"
part of a PLT entry is executed, as all GOTPLT entries are pre-filled
by the loader before control is transferred to the user program. So,
if the flag is set, the "second" PLT part is redundant.
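
For reference, the load-time binding request is recorded in the dynamic
section as DT_BIND_NOW, DF_BIND_NOW (in DT_FLAGS) or DF_1_NOW (in
DT_FLAGS_1). Below is a minimal sketch of how a tool might test for it,
assuming a little-endian ELF64 ".dynamic" image already read into memory.
The helper names is_bind_now and make_dynamic are hypothetical; only the
tag and flag values come from the ELF definitions.

```python
import struct

# Tag and flag values as defined for ELF dynamic sections.
DT_NULL = 0
DT_BIND_NOW = 24
DT_FLAGS = 30
DT_FLAGS_1 = 0x6FFFFFFB
DF_BIND_NOW = 0x8
DF_1_NOW = 0x1

def is_bind_now(dynamic: bytes) -> bool:
    """Return True if a little-endian ELF64 .dynamic image asks for eager binding."""
    # Each Elf64_Dyn is a signed 64-bit tag followed by a 64-bit value.
    for off in range(0, len(dynamic) - 15, 16):
        tag, val = struct.unpack_from("<qQ", dynamic, off)
        if tag == DT_NULL:
            break
        if tag == DT_BIND_NOW:
            return True
        if tag == DT_FLAGS and val & DF_BIND_NOW:
            return True
        if tag == DT_FLAGS_1 and val & DF_1_NOW:
            return True
    return False

def make_dynamic(entries):
    """Pack (tag, value) pairs plus a DT_NULL terminator (illustration only)."""
    return b"".join(struct.pack("<qQ", t, v) for t, v in entries + [(DT_NULL, 0)])

lazy = make_dynamic([(DT_FLAGS, 0)])
eager = make_dynamic([(DT_FLAGS, DF_BIND_NOW), (DT_FLAGS_1, DF_1_NOW)])
print(is_bind_now(lazy), is_bind_now(eager))
```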

Many security-aware applications pass "-z now" to reduce the number
of moving parts that can affect the control flow. Therefore, the
IBT-aware PLT with "-z now" is not an uncommon combination. It may
even be a more common combination than the IBT-aware PLT without "-z
now".


Proposal:
Given the above observations, I'd like to propose the following two
new IBT-enabled PLT formats.

1. 32 bytes IBT-enabled PLT

This is a straightforward extension to the existing non-IBT-enabled
PLT so that each PLT entry is not 16 bytes long but 32 bytes long.

  endbr
  jmp *external_func_name@GOT
  endbr
  pushq relocation_offset
  jmp plt0

The above instruction sequence is just a concatenation of the first
and the second PLT of the 2-PLT scheme. In this proposed scheme, the
above instruction sequence is written to .plt, and there's no .plt.sec
section.
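
As a rough sketch of what such a 32-byte entry could look like in bytes
on x86-64, the snippet below encodes the five instructions above. Several
things here are assumptions made for illustration and are not part of the
proposal: the lazy half is placed at offset 16, unused bytes are filled
with single-byte NOPs, and the function name make_plt32_entry and the
example addresses are invented.

```python
import struct

ENDBR64 = bytes.fromhex("f30f1efa")
NOP = b"\x90"

def make_plt32_entry(entry_addr, gotplt_slot, reloc_index, plt0_addr):
    """Encode one hypothetical 32-byte IBT PLT entry located at entry_addr."""
    # First half: endbr64; jmp *gotplt_slot(%rip).
    # RIP-relative displacements are measured from the end of the instruction.
    jmp_end = entry_addr + 4 + 6
    hot = ENDBR64 + b"\xff\x25" + struct.pack("<i", gotplt_slot - jmp_end)
    hot = hot.ljust(16, NOP)

    # Second half: endbr64; push $reloc_index; jmp plt0.
    jmp_end = entry_addr + 16 + 4 + 5 + 5
    cold = (ENDBR64
            + b"\x68" + struct.pack("<I", reloc_index)
            + b"\xe9" + struct.pack("<i", plt0_addr - jmp_end))
    cold = cold.ljust(16, NOP)
    return hot + cold

# Made-up addresses, for illustration only.
entry = make_plt32_entry(entry_addr=0x1020, gotplt_slot=0x4018,
                         reloc_index=0, plt0_addr=0x1000)
print(len(entry))
print(entry.hex())
```

With this layout the GOTPLT slot would initially point at entry_addr + 16,
so that the first call falls through to the lazy-binding half.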

2. 16 bytes IBT-enabled PLT for "-z now"

In this scheme, ".plt.sec" in the 2-PLT scheme is renamed ".plt", and
".plt" in the 2-PLT scheme is simply omitted, as we don't need any
code for lazy PLT resolution.

Any comments?

Thanks,
Rui

Florian Weimer

Jun 3, 2019, 9:41:43 AM
to 'Rui Ueyama' via X86-64 System V Application Binary Interface, Rui Ueyama
* via:

> Many security-aware applications passes "-z now" to reduce the number
> of moving parts that can affect the control flow. Therefore, the
> IBT-aware PLT with "-z now" is not an uncommon combination.

Isn't the second PLT needed to implement auditing, even with BIND_NOW?

Thanks,
Florian

Rui Ueyama

Jun 3, 2019, 8:46:27 PM
to Florian Weimer, 'Rui Ueyama' via X86-64 System V Application Binary Interface
Could you elaborate?

I haven't used auditing, but it looks like the hook for auditing lazy
PLT binding is implemented in the loader's function to resolve dynamic
symbols. Does that depend on the actual PLT code sequence?

H.J. Lu

Jun 4, 2019, 3:11:18 PM
to Rui Ueyama, x86-64-abi
PLT with 16-byte entry needs fewer pages:

1. A single page can contain more PLT entries.
2. A single page may contain both branch instruction to PLT entry
as well as the PLT entry.

I created a micro benchmark to compare:

1. A single page contains both branch instruction to PLT entry as well as
the PLT entry.
2. Branch instruction to PLT entry and the PLT entry are in different pages.

https://gitlab.com/x86-benchmarks/microbenchmark/tree/pltsize

A single page is slightly faster.

The performance impact may not be visible in a single program. But it
can improve overall system performance if processes require fewer pages.

--
H.J.

Ali Bahrami

Jun 4, 2019, 3:45:59 PM
to x86-6...@googlegroups.com
On 6/4/19 1:10 PM, H.J. Lu wrote:
> Its performance impact may not be visible to a single program. But they
> can improve overall system performance if processes require fewer pages.

Just a generic observation from a bystander who probably won't
have to implement this...

Micro benchmarks often reveal differences that you can't feel
in real world systems. The added complexity on the other hand,
is a tax for everyone. You'd have to have a lot of objects that
each had an enormous number of PLTs for this to be measurable.
Can you show significant speedups on a real world system, and
are those speedups large enough to be interesting on a
realistic workload?

I'm possibly off the mark, but in most applications, the
number of pages used for PLTs amounts to a tiny rounding
error on the number of pages in the process. A 4096-byte
page can hold 128 32-byte PLTs. Most objects have a fairly
modest number of exported symbols, and likely don't use many
pages for PLTs. How useful is it to cut that small number
in half? Conversely, a smaller number of objects may have
a gigantic number of PLTs, but are likely not mapped into many
processes. Again, how interesting is it to worry about how
many pages these outliers use for PLTs, when the number
is probably just a tiny fraction of their overall memory use?

- Ali

H.J. Lu

Jun 4, 2019, 6:05:54 PM
to Ali Bahrami, x86-64-abi
I took a look at /usr/lib on my machine. There are 651 ELF files.
50 of them have PLT > 4K. Some of them are

./systemd/systemd-networkd: large PLT: 0x0017f0
./systemd/systemd: large PLT: 0x0040c0
./systemd/libsystemd-shared-239.so: large PLT: 0x007e00
./systemd/systemd-udevd: large PLT: 0x0019e0
./systemd/systemd-logind: large PLT: 0x001250
./systemd/systemd-resolved: large PLT: 0x0016d0
./polkit-1/polkitd: large PLT: 0x001750

Under /usr/lib64, there are 759 files with PLT size > 4KB and
3802 files with PLT size <= 4KB.

These have 16-byte PLT entries. They will double in size with
32-byte entries. That may not sound like much, but it can have
a visible impact on system performance.
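
To put the hex sizes above into entries and pages, here is a small
back-of-the-envelope sketch. It assumes 4096-byte pages, a PLT that
starts on a page boundary, and that the reported size is simply a
multiple of the 16-byte entry size (so the PLT0 header counts as one
slot). The sizes are the ones listed above; everything else is
illustrative.

```python
PAGE = 4096

# PLT sizes in bytes, copied from the listing above.
plt_sizes = {
    "systemd-networkd": 0x0017F0,
    "systemd": 0x0040C0,
    "libsystemd-shared-239.so": 0x007E00,
    "systemd-udevd": 0x0019E0,
    "systemd-logind": 0x001250,
    "systemd-resolved": 0x0016D0,
    "polkitd": 0x001750,
}

def pages(nbytes):
    """Number of pages needed to hold nbytes (ceiling division)."""
    return -(-nbytes // PAGE)

for name, size in plt_sizes.items():
    slots = size // 16  # 16-byte slots, including the PLT0 header
    print(f"{name}: {slots} slots, "
          f"{pages(size)} page(s) at 16 bytes, {pages(slots * 32)} at 32 bytes")
```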

--
H.J.

Ali Bahrami

Jun 4, 2019, 8:30:27 PM
to x86-6...@googlegroups.com
On 6/4/19 4:05 PM, H.J. Lu wrote:
> I took a look at /usr/lib on my machine. There are 651 ELF files.
> 50 of them have PLT > 4K. Some of them are
>
> ./systemd/systemd-networkd: large PLT: 0x0017f0
> ./systemd/systemd: large PLT: 0x0040c0
> ./systemd/libsystemd-shared-239.so: large PLT: 0x007e00
> ./systemd/systemd-udevd: large PLT: 0x0019e0
> ./systemd/systemd-logind: large PLT: 0x001250
> ./systemd/systemd-resolved: large PLT: 0x0016d0
> ./polkit-1/polkitd: large PLT: 0x001750
>
> Under /usr/lib64, there are 759 files with PLT size > 4KB and
> 3802 files with PLT size <= 4KB.
>
> These have 16-byte PLT entries. They will double in size with
> 32-byte entries. They may not sound much. But they can have
> visible impact on system performance.


Thanks for the numbers. I still don't see how this can give
a visible impact. Please humor me a bit, while I explain.

This is a bit like Amdahl's law. The effect from reducing
PLT size is limited by the overall contribution that the PLT
represents. As a percentage of memory used, PLTs are
tiny. You're proposing to cut a tiny number in half. The
impact of that will also be tiny.

To put some numbers to it, the 64-bit libc on my desktop has
a PLT of 0x350 bytes, and the text/data mappings altogether
come to roughly 2.7MB. Ignoring the impact of stack and heap,
that comes to .03%. With stack and heap, it's less. If you
cut that in half, it just doesn't mean much.

You have a lot of objects on disk. Let's make a back of
the envelope estimation, assuming that all of them are mapped
simultaneously, and that all 4561 objects have a 16K PLT.
That's 72976K (73M).

PLTs on x86 are readonly, and shared from the text segment for
all processes, so you only pay once for every object no matter
how many processes are using it. Hence, that 73M isn't going
to go up with the number of processes, and can only happen
if all 4561 objects are in simultaneous use.

73M sounds like a lot, but my cheap PC has 64GB, so it
really isn't a big number, relatively speaking. We'll
never get there though, based on the above libc example,
since the 99.97% of things that are not PLT will crush
the system long before we get anywhere near it.

A problem with this calculation is that it assumes
that all of those 4561 objects will ever be mapped
simultaneously. That's really unlikely. I'll guess that
far more likely is that typical systems have 10's, or low
hundreds, of current objects. I think that 73MB estimate
is probably an order of magnitude too high. 7.3 MB is
.01% of the 64GB on my machine. Even on a 1GB machine,
it's only .73%, not that such a machine could load any
useful subset of those objects.
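
The arithmetic above can be reproduced with a few lines. This sketch uses
decimal units (1K = 1000 bytes), which is what makes 72976K round to 73M;
the variable names are only for illustration.

```python
KB = 1000          # decimal units, matching "72976K (73M)" above
MB = 1000 * KB
GB = 1000 * MB

objects = 759 + 3802             # files counted under /usr/lib64
worst_case = objects * 16 * KB   # every object mapped, each with a 16K PLT
likely = worst_case / 10         # "an order of magnitude too high"

libc_plt = 0x350                 # bytes
libc_mapped = 2.7 * MB           # text/data mappings of libc

print(objects)
print(worst_case // KB)
print(f"{libc_plt / libc_mapped:.2%}")   # PLT share of the libc mappings
print(f"{likely / (64 * GB):.2%}")       # share of a 64GB machine
print(f"{likely / (1 * GB):.2%}")        # share of a 1GB machine
```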

Sorry, but I still don't see how this can have a visible
impact. I vote for simpler PLTs. [*]

- Ali

[*] You're within your rights to ignore my vote, as I
don't work on the GNU toolchain. I think these
observations are generally true though.

Xiang1 Zhang

Jun 4, 2019, 8:47:51 PM
to X86-64 System V Application Binary Interface
The most important point to me is that we currently tend to use GNU tools (instead of LLVM's) to handle programs built by LLVM. If LLVM does not keep the same PLT scheme as GNU, using these tools, e.g. gdb, may be risky.

Roland McGrath

Jun 4, 2019, 9:35:46 PM
to Rui Ueyama, Florian Weimer, 'Rui Ueyama' via X86-64 System V Application Binary Interface
Audit doesn't depend on the PLT implementation details. But it does depend on all objects having full PLT entries regardless of BIND_NOW.  When auditing is enabled at runtime the dynamic linker makes every PLT call take the "lazy path" through the dynamic linker and the GOT slot is never pointed directly at the target so the short-circuit PLT path is never taken.  On systems that don't support auditing (AFAIK only glibc's and Solaris's dynamic linkers do), or perhaps for particular modules even for glibc/Solaris systems, it would be worthwhile to reduce the PLT sizes by eliminating the lazy support portions altogether when in BIND_NOW mode (which is the default for some systems such as Fuchsia) regardless of IBT support (and even for other machines, where the lazy PLT schemes are all pretty similar).  Smaller PLTs will improve locality, i-cache pressure, binary size.  With -z now and no audit support, the majority of all PLT code is dead code today.  So I think it would be worthwhile.  But for compatibility with existing expectations it would have to be a separate switch rather than implicit with -z now.  Such systems can also use -fno-plt code generation to eliminate PLT entries entirely, but that has its own cans of worms and even when BIND_NOW behavior is the only behavior available (as on Fuchsia today) we're likely to have PLT entries for a long time to come.

There is some risk in changing PLT implementation details wrt tools like debuggers.  But I don't think we should let that hold us back.  Nothing should be hard-coding the size or contents of PLT entries.  The size is already indicated by sh_entsize on the generated sections, which debuggers and the like can use for purposes such as synthetic foo@plt symbols.  The GNU linkers generate CFI for the PLTs so there's no need to hard-code assumptions about PLT implementation for unwinding purposes (I'm not sure off hand if lld does this yet, but if not it should be made to rather than taking this as a reason that the PLT format can't be changed).  If there are other cases of things relying on PLT implementation details, we should get those fixed and find better ways to do whatever they are doing.
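
As an illustration of relying on sh_entsize rather than a hard-coded
entry size, here is a minimal sketch that derives the number of PLT slots
from a raw ELF64 section header. It assumes a little-endian Elf64_Shdr
already read into memory; the helper name entry_count and the example
header values are hypothetical.

```python
import struct

# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset,
#             sh_size, sh_link, sh_info, sh_addralign, sh_entsize
SHDR64 = struct.Struct("<IIQQQQIIQQ")  # 64 bytes, little-endian

def entry_count(shdr: bytes) -> int:
    """Number of fixed-size entries in a section, taken from its header."""
    (_name, _type, _flags, _addr, _offset,
     size, _link, _info, _align, entsize) = SHDR64.unpack(shdr)
    if entsize == 0:
        raise ValueError("section has no fixed entry size")
    return size // entsize

# Made-up header for a 0x17f0-byte .plt with 16-byte entries.
SHT_PROGBITS = 1
SHF_ALLOC_EXEC = 0x2 | 0x4
hdr = SHDR64.pack(0, SHT_PROGBITS, SHF_ALLOC_EXEC,
                  0x1000, 0x1000, 0x17F0, 0, 0, 16, 16)
print(entry_count(hdr))
```

A tool written this way would keep working if the entry size changed
from 16 to 32 bytes, as long as the linker sets sh_entsize accordingly.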

--
You received this message because you are subscribed to the Google Groups "X86-64 System V Application Binary Interface" group.
To unsubscribe from this group and stop receiving emails from it, send an email to x86-64-abi+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/x86-64-abi/CAJENXgsjJZF47Q2ai1JTQwdOGP60Vq%3D2AwtKegk_7%3De19ZtqYQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

H.J. Lu

Jun 5, 2019, 11:17:36 AM
to Ali Bahrami, x86-64-abi
The choices are

1. A single 32-byte PLT entry. The first 16-byte half is only used
one time and the second half is used every time.
2. 2 PLT tables with 16-byte entries. The first table is only used
one time and the second table is used every time.

The total memory sizes of the 2 choices are the same. The impact
is on i-cache, not RAM. My processor has a 32 KB i-cache. Cutting
the hot PLT size in half improves i-cache utilization.
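
A rough way to quantify this is to count how many cache lines the hot
halves of a run of PLT entries touch under each layout. The sketch
assumes 64-byte cache lines, a table aligned to a cache line, and that
every entry is hot, which is the best case for the argument. The function
name hot_lines is made up for illustration.

```python
CACHE_LINE = 64        # bytes; assumed, typical for x86
ICACHE = 32 * 1024     # bytes; the 32 KB figure from above

def hot_lines(n_entries, entry_size, hot_bytes=16):
    """Count cache lines touched by the hot part of n consecutive entries."""
    lines = set()
    for i in range(n_entries):
        start = i * entry_size
        for b in range(start, start + hot_bytes):
            lines.add(b // CACHE_LINE)
    return len(lines)

two_plt = hot_lines(256, 16)   # separate hot table with 16-byte entries
one_plt = hot_lines(256, 32)   # hot half interleaved in 32-byte entries

print(two_plt, one_plt)
print(f"{two_plt * CACHE_LINE / ICACHE:.1%} vs "
      f"{one_plt * CACHE_LINE / ICACHE:.1%} of the i-cache")
```

The count does not depend on which half of the 32-byte entry is the hot
one, as long as it is the same half in every entry.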

--
H.J.

Ali Bahrami

Jun 5, 2019, 11:41:22 AM
to x86-6...@googlegroups.com
On 6/5/19 9:16 AM, H.J. Lu wrote:
> The choices are
>
> 1. A single 32-byte PLT entry. The first 16 half is only used
> one time and the second half is used every time.
> 2. 2 PLT tables with 16-byte entry. The first table is only used
> one time and the second table is used every time.
>
> The total memory sizes of 2 choices are the same. The impact
> is on i-cache, not RAM. My processor has 32 KB i-cache. Cut
> hot PLT size by half improves i-cache utilization.
>

Perhaps, if the i-cache is dominated by PLT contention.
Is it, and if so, is it significant compared to other
things contending for the same cache? If so, how have
we not felt it in the previous decades, on much smaller
and less capable hardware? Isn't that the definition of
not visible?

I guess I've made my point by now, so I'll bow out of
this and leave it to others to voice their opinions.

Thanks.

- Ali

H.J. Lu

Jun 5, 2019, 11:59:38 AM
to Ali Bahrami, x86-64-abi
On Wed, Jun 5, 2019 at 8:41 AM Ali Bahrami <Ali.B...@oracle.com> wrote:
>
> On 6/5/19 9:16 AM, H.J. Lu wrote:
> > The choices are
> >
> > 1. A single 32-byte PLT entry. The first 16 half is only used
> > one time and the second half is used every time.
> > 2. 2 PLT tables with 16-byte entry. The first table is only used
> > one time and the second table is used every time.
> >
> > The total memory sizes of 2 choices are the same. The impact
> > is on i-cache, not RAM. My processor has 32 KB i-cache. Cut
> > hot PLT size by half improves i-cache utilization.
> >
>
> Perhaps, if the i-cache is dominated by PLT contention.
> Is it, and if so, is it significant compared to other
> things contending for the same cache? If so, how have
> we not felt it in the previous decades, on much smaller
> and less capable hardware? Isn't that the definition of
> not visible?

When we don't have a choice, there is nothing we can do.
When we do have a choice, we should not make CET PLT
i-cache utilization worse than non-CET PLT. With the current
2 PLT scheme, CET PLT i-cache utilization is similar to non-CET
PLT.

--
H.J.

Ali Bahrami

Jun 5, 2019, 12:12:49 PM
to H.J. Lu, x86-64-abi
On 6/5/19 9:59 AM, H.J. Lu wrote:
> When we don't have a choice, there is nothing we can do.

Not so --- there is always choice. There's no way that
we would have collectively tolerated something that wasn't
working well, for 30+ years.

This stuff all dates from absolutely ancient and limited
hardware, the i386 and such. It was apparently not a problem
then, and I'm struggling to see how it is now, on real
workloads, rather than in micro-benchmarks.

- Ali

Michael Matz

Jun 5, 2019, 12:18:51 PM
to Rui Ueyama, x86-6...@googlegroups.com
Hi,

On Mon, 3 Jun 2019, 'Rui Ueyama' via X86-64 System V Application Binary Interface wrote:

> Proposal:
> Given the above observations, I'd like to propose the following two
> new IBT-enabled PLT formats.
>
> 1. 32 bytes IBT-enabled PLT
>
> This is a straightforward extension to the existing non-IBT-enabled
> PLT so that each PLT entry is not 16 bytes long but 32 bytes long.
>
> endbr
> jump *external_func_name@GOT
> endbr
> pushq relocation_offset
> jmp plt0
>
> The above instruction sequence is just a concatenation of the first
> and the second PLT of the 2-PLT scheme. In this proposed scheme, the
> above instruction sequence is written to .plt, and there's no .plt.sec
> section.

That doesn't seem to be any improvement over the 2-plt scheme, except
perhaps for being a more obvious extension. But given that the more
complicated scheme is already defined and implemented (and so, if it
brings any actual improvements over the less complicated scheme doesn't
matter) I don't see value in adding _another_ PLT scheme. If we'd start
from a clean slate, perhaps, but now? Please, no. We don't just change
PLT layouts because we can.

(FWIW, I do think the 2-PLT scheme is overengineered and didn't like the
original design for lack of proof of speed advantages; but alas, now we
have it, so we should stick with it if there are no wondrous advantages
with still newer schemes; and the bar should be high).

> 2. 16 bytes IBT-enabled PLT for "-z now"
>
> In this scheme, ".plt.sec" in the 2-PLT scheme is renamed ".plt", and
> ".plt" in the 2-PLT scheme is simply omitted, as we don't need any
> code for lazy PLT resolution.
>
> Any comments?

Well, yes, that seems an obvious extension, but as Roland says, not under
-z now, but controlled by a different option.


Ciao,
Michael.

Michael Matz

Jun 5, 2019, 12:20:40 PM
to Ali Bahrami, H.J. Lu, x86-64-abi
Hi,
As I wrote elsewhere, the 2-PLT scheme is overengineered and I agree with
you that the simpler scheme would have been just as good speedwise, and
be, well ... simpler. But we have it now, and adding still another option
that isn't clearly better doesn't serve anyone.


Ciao,
Michael.

Carlos O'Donell

Jul 11, 2019, 11:20:18 AM
to X86-64 System V Application Binary Interface
I agree that adding yet another PLT scheme doesn't serve the tools
community as a whole.

If we add another scheme it must be meaningfully better in some way.

I *like* Rui's recommended "2. 16 bytes IBT-enabled PLT for "-z now"", but
this conflates the issues of removing audit support and disabling lazy
binding. I agree with Roland McGrath that a new option, other than "-z now",
should be used to indicate the removal of audit support. Audit is important
for a key set of users and customers, so compatibility is important there.
While the compiler option "-fno-plt" makes the call site do a direct GOT call,
that relocation can still be used to synthesize unused PLT slots for auditing,
and so "-fno-plt" does not directly indicate "no audit support."

While the 2-PLT scheme might be overengineered, and that's a matter of
personal taste and subjectivity, it conceptually has the right set of features
required for a solution.

Lastly, all of us are responsible for working together to design and discuss
current and future additions to the ABI, and that includes making time to
discuss these issues as they come up on this list.

Do we have a consensus on Rui's proposal?

Cheers,
Carlos.

Annita Zhang

Oct 9, 2019, 10:34:34 PM
to X86-64 System V Application Binary Interface

This CET ABI issue was raised during the glibc discussion at GCC Cauldron. The conclusion was that the CET ABI would not be changed. That is not because it is the better option, but because it was finalized 2 years ago (in 2017) and the implementation has already been included in Linux distributions. There's no obvious reason to change it without a clearly better solution.

On the other hand, Intel wants to support CET in LLVM lld anyway, but we don't want to force lld developers to do what they don't want to do. So now the ball is in lld's court. If lld would like to follow the CET ABI, we're willing to submit the corresponding lld patches. If not, we will submit lld patches supporting 32-byte PLTs. The disadvantage of the latter is that the ABI would be incompatible between GCC and LLVM for CET. It may cause issues in tools that depend on PLT size and layout, but we don't know of any examples yet. By our estimation, the risk may be low.

I'm looking forward to hearing your opinions and moving this forward.


Thanks,

Annita

Michael Matz

Oct 10, 2019, 8:43:41 AM
to Annita Zhang, X86-64 System V Application Binary Interface
Hi,

On Wed, 9 Oct 2019, Annita Zhang wrote:

> This CET ABI issue was raised during glibc discussion in GCC Cauldron.
> The conclusion was CET ABI wouldn’t be changed. It’s not because it’s a
> better option, but because it’s been finalized for 2 years (in 2017).
> And the implementation has been included in Linux distributions already.
> There's no obvious reason to change it w/o a pretty better solution.
>
> On the other hand, Intel wants to support it in LLVM lld anyway. But we
> don’t want to force lld developers to do what they don’t want. So now
> the ball is at lld side. If lld would like to follow the CET ABI, we’re
> willing to submit the corresponding lld patches. If not, we will submit
> the lld patches supporting 32-byte PLTs. The disadvantage of the latter
> is ABI is incompatible between GCC and LLVM for CET. It may cause
> potential issues in some tools which has dependence on PLT size and
> layout. But we don’t have known samples yet. From our estimation, the
> risk may be low.

If you're willing to change the CET ABI for llvm, then it simply means
that the claim that we can't change the PLT format because "it’s been
finalized for 2 years" is wrong. That implies we can change it to
something nice also on the binutils side, for everyone. So, is that the
case?


Ciao,
Michael.

Annita Zhang

Oct 10, 2019, 11:35:36 AM
to X86-64 System V Application Binary Interface


On Thursday, October 10, 2019 at 8:43:41 PM UTC+8, Michael Matz wrote:
> Hi,
>
> On Wed, 9 Oct 2019, Annita Zhang wrote:
>
> > This CET ABI issue was raised during glibc discussion in GCC Cauldron.
> > The conclusion was CET ABI wouldn’t be changed. It’s not because it’s a
> > better option, but because it’s been finalized for 2 years (in 2017).
> > And the implementation has been included in Linux distributions already.
> > There's no obvious reason to change it w/o a pretty better solution.
> >
> > On the other hand, Intel wants to support it in LLVM lld anyway. But we
> > don’t want to force lld developers to do what they don’t want. So now
> > the ball is at lld side. If lld would like to follow the CET ABI, we’re
> > willing to submit the corresponding lld patches. If not, we will submit
> > the lld patches supporting 32-byte PLTs. The disadvantage of the latter
> > is ABI is incompatible between GCC and LLVM for CET. It may cause
> > potential issues in some tools which has dependence on PLT size and
> > layout. But we don’t have known samples yet. From our estimation, the
> > risk may be low.
>
> If you're willing to change the CET ABI for llvm, then it simply means
> that the claim that we can't change the PLT format because "it’s been
> finalized for 2 years" is wrong. That implies we can change it to
> something nice also on the binutils side, for everyone. So, is that the
> case?
>
>
> Ciao,
> Michael.

No, the conclusion was that the CET ABI wouldn't be changed for LLVM, and binutils won't change either.
But Intel would like to support CET in lld anyway. So if lld wants to use 32-byte PLTs, we are OK with supporting that in a way that is incompatible with the current ABI. We know it may cause issues, but we don't want to be stuck here; we need to move forward.

Thanks,
Annita

Carlos O'Donell

Oct 10, 2019, 5:32:40 PM
to Annita Zhang, X86-64 System V Application Binary Interface
Please consider that you are contributing to future ABI problems by
enabling llvm lld to have a non-standard CET ABI. It would be *better*
IMO if llvm lld never had CET support in the first place, rather than
a non-conforming CET implementation.

I urge caution here with respect to ABIs and point at the littered
past created by 32-bit ARM OABI, and 32-bit/64-bit MIPS, and the problems
this has caused their software ecosystems and the hardships faced by
those software developers over the years as ABIs were not well controlled.

Adding a non-conforming Intel CET ABI via a different PLT should at a minimum
provide a way to detect at runtime which PLT is in use to allow developer
tooling like gotcha [1] or ltrace [2] to adapt. If you don't consider these consequences
then we are not fully considering the cost of the alternate ABI. You now
need to mark it up, detect it, issue warnings at the static linker, issue
fatal errors in the dynamic loader (if required) etc.

While it might seem like you are enabling all of the ecosystem to support
Intel CET, doing so at the price of ABI deviation will cause the ecosystem
integration costs to increase and put at risk the value of those hardware
deployments with CET.

--
Cheers,
Carlos.

[1] LLNL's gotcha - https://github.com/LLNL/GOTCHA
[2] ltrace - https://gitlab.com/cespedes/ltrace

Xiang1 Zhang

Oct 11, 2019, 1:45:42 AM
to X86-64 System V Application Binary Interface
Hi, let's keep this simple:
1. The two 16-byte PLTs (CET ABI) and the single 32-byte PLT (what LLD suggested at first) both work well. (Works well, works well, works well; important things are worth saying 3 times.)
2. The key difference is just that the single 32-byte PLT is easier to implement.
3. Intel is willing to implement either of them if LLD lets us in.

That is very clear; I just hope we will not be blocked by it.

BR
THS

Luo, Yuanke

Oct 11, 2019, 2:15:46 AM
to Carlos O'Donell, Annita Zhang, X86-64 System V Application Binary Interface
From a compatibility point of view, it may be better for the lld community to adopt the 2-PLT scheme, given that it has been in place for 2 years. But it is also acceptable if lld adopts the 1-PLT solution. Actually, we have encountered several ABI compatibility issues in llvm for 32-bit x86; those issues have lasted for a couple of years and may still exist in the future.
So, we'd like to hear from Rui and MaskRay.
Hi Rui and MaskRay,
Can you decide which solution lld should choose?

Thanks
Yuanke