gABI's role in microarchitecture targeting

Gregory Szorc

Jan 28, 2022, 10:32:09 PM
to Generic System V Application Binary Interface
Machine architectures often aren't set in stone: they evolve over time. For example, x86-64 has gained SSE3/4, AVX/AVX2/AVX-512, AES-NI, and more since it was initially defined in 1999.

These machine architecture extensions are commonly referred to as "microarchitectures" and today gABI is largely ignorant about their existence and handling. Since I couldn't find any prior conversations, I'd like to start one around gABI's role in microarchitecture targeting.

# Why Does It Matter

Some gABI-defined machine types can be highly variable. For example, `e_machine == EM_X86_64` encompasses dozens of x86-64 microarchitectures spanning multiple decades.

For highly variable machine types like x86-64, this can cause problems. For the most part, today:

* Linkers will happily link a binary that requires a modern instruction or microarchitecture extension even when an older microarchitecture is being targeted. This defers an error from link time to run time, i.e. it allows developers to more easily ship microarchitecture targeting bugs that could have been caught at link time.

* Loaders will execute a binary as long as its ELF header metadata is deemed compatible (often simple comparisons on `e_machine` and similar fields). But as soon as an instruction not supported by the current machine is executed, a hardware fault/trap can occur. This often results in immediate program termination - if not a hardware crash - and/or a generic error message like `Illegal instruction`.

This current behavior is sub-optimal for a few reasons:

a. It doesn't follow the principle of failing fast - both at link and load time.
b. (Deferred) error messages are often generic and not easily actionable by end-users. Debugging the root cause requires technical skills not possessed by the average user and may even require arcane knowledge of compilers and packaging mechanisms.
c. The lack of stronger microarchitecture compatibility mechanisms at the object file level arguably biases ecosystems to default to targeting older microarchitectures since this is safer, more reliable, and therefore easier to support.

(If you buy into "c," it follows that for widely deployed architectures like x86-64 the industry norm of targeting the ~original microarchitecture level can sacrifice a lot of performance. This performance inefficiency can translate to a lot of wasted money and power!)

I'm far from a domain expert in this space, but my naive thinking is this: if I could wave a magic wand and transform the software ecosystem, object files would be able to self-describe and/or validate their compatibility with microarchitecture levels. Consumers could then implement stronger validation around machine compatibility, and the end-user experience around microarchitecture mis-targeting would improve.

Imagine a loader failing fast with e.g. `unable to load foo.so because it requires AVX-512 and the current machine doesn't support this x86-64 instruction set extension` versus an `Illegal instruction` that may occur "randomly" during program execution. Imagine getting a more descriptive error message right away instead of having to use a debugger to figure out who issued the faulty instruction. Or imagine this scenario not happening in the first place because the linker was smart enough to realize that a binary required e.g. AVX-512 and refused to link because the original x86-64 microarchitecture level was being targeted.

I feel like there's significant potential to improve the ergonomics around microarchitecture targeting by teaching object files to better identify and/or validate their microarchitecture compatibility. If adopted at scale, this could enable widely deployed and highly varied ecosystems like x86-64 to adopt newer microarchitectures sooner, with fewer risks than they face today. If nothing else, it should translate to better error messages and save countless hours debugging microarchitecture targeting bugs.

# Requests for Comments

I think microarchitecture ignorance is a major shortcoming of existing object file design: it degrades user experience outcomes and biases ecosystems towards conservative targeting decisions, which in turn sacrifices performance and efficiency.

What I'm not sure about is what role gABI has in better supporting microarchitectures, if any. But since gABI is arguably the most impactful place to "solve" this problem, I figured I'd start the conversation here and hear what people think.

The sections below detail a few independent ideas to shore up microarchitecture handling. These ideas aren't fully baked and I'm not an expert in this space or the group dynamics involved, so please pardon the ignorance.

# Expression of microarchitecture in ELF header

If the ELF header expressed microarchitecture levels, linkers, loaders, and similar tools in this space could continue to utilize a simple approach for screening binaries for compatibility.

Concretely, additional `e_machine` values could be defined for well-defined microarchitecture levels. For example, the x86-64 psABI defined three additional microarchitecture levels in 2020 [1]: x86-64-v2, x86-64-v3, and x86-64-v4. Could those warrant `EM_X86_64_V2`, `EM_X86_64_V3`, and `EM_X86_64_V4`?
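
To illustrate, loaders could keep today's simple screening logic. A minimal sketch, with hypothetical constant values that no standard has assigned:

```
#include <elf.h>      // EM_X86_64
#include <stdbool.h>
#include <stdint.h>

// Hypothetical e_machine values -- not assigned by the gABI or any psABI.
#define EM_X86_64_V2 0x1001
#define EM_X86_64_V3 0x1002
#define EM_X86_64_V4 0x1003

// A host at microarchitecture level N can run binaries targeting any
// level <= N, so the check stays a trivial comparison.
static bool machine_compatible(uint16_t e_machine, unsigned host_level)
{
    switch (e_machine) {
    case EM_X86_64:    return host_level >= 1; // baseline x86-64
    case EM_X86_64_V2: return host_level >= 2;
    case EM_X86_64_V3: return host_level >= 3;
    case EM_X86_64_V4: return host_level >= 4;
    default:           return false; // not an x86-64 object at all
    }
}
```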

Obviously introducing new `e_machine` values would be backwards incompatible and shouldn't be done lightly. And there seems to be precedent for not handing out new values for machine variants. But is that the right decision, especially for high impact machine types like x86-64?

Pros:

* Simple implementation.
* Works similarly as existing mechanisms.
* Linkers, loaders, etc. are able to detect microarchitecture mis-targeting just as they can detect machine or OS/ABI mismatches today.

Cons:

* Backwards incompatible / disruption.
* Likely limited to expression of microarchitecture levels (like `x86-64-v3`) instead of granular "machine features."
* Proliferation of `e_machine` values?
* Error messages marginally better but not as granular as they could be. e.g. you get "because not x86-64-v4" instead of "because no AVX-512."

# Special Section for Machine-Level Compatibility

A generic idea with many variants is to define an ELF section for expressing/testing machine-level compatibility.

Let's propose a fictional `.machine_compat` section. Its purpose is to define a mechanism for declaring and/or determining machine-level compatibility. It can help answer the question "does this object, which targets `e_machine` type X, support running on sub-machine type X.Y?"

Pros:

* Complementary to metadata in ELF header. e.g. old loaders ignore the section; new loaders get enhanced functionality.
* Potential for much more power and expression.

Cons:

* Possible data loss or corruption when using old/unaware linkers.
* More complexity for gABI. Would it be better left to machines or operating systems that need it to define it?

## Generic Data-Based Capabilities and Requirements

Our fictional `.machine_compat` section could contain some gABI defined data structures for generically expressing machine capabilities and requirements around them.

Imagine gABI defined `.machine_compat` as an array of the following data structure:

```
enum SubMachineUsage {
    SM_USAGE_REQUIRED,
    SM_USAGE_OPTIONAL,
    // ...
};

typedef struct {
    uint32_t submachine_feature; // Enumeration describing a feature within a machine.
    uint32_t usage;              // One of the SubMachineUsage values, stored fixed-width.
} SubMachineRequirement;
```

Essentially, this allows an object file to self-declare a dependence on a given feature of its target machine and the nature of that dependency. For x86-64, one could imagine logically expressing "requires microarchitecture level x86-64-v3" or "requires CPUID feature AVX2" or "optionally uses AVX-512."
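
A consumer could then walk this array and fail fast. A minimal sketch of the loader side, assuming a hypothetical `host_has_feature()` query that reports unrecognized feature values as unsupported:

```
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical host-side query; unknown feature values return false.
bool host_has_feature(uint32_t submachine_feature);

static bool machine_compat_ok(const SubMachineRequirement *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].usage != SM_USAGE_REQUIRED)
            continue; // optional features never block loading
        if (!host_has_feature(reqs[i].submachine_feature)) {
            fprintf(stderr, "cannot load: required machine feature %u "
                            "is not supported by this host\n",
                    (unsigned)reqs[i].submachine_feature);
            return false;
        }
    }
    return true;
}
```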

Pros:

* Super generic. Applicable to all machine types.
* Static data structures are readable from all machine types.
* Presence of unknown values in known data structures enables failing fast. e.g. if a linker or loader knows how to evaluate sub-machine requirements from these data structures but doesn't recognize a required machine-level feature with value 42, it can infer that it isn't capable of handling that binary.
* Error messages can be as granular as the definition of machine-level features. e.g. if you expose AVX-512 as a feature, you could emit an AVX-512-specific error message.
* Enforcement is optional and could be skipped if desired, since the data lives in a supplemental ELF section.

Cons:

* Is it possible to even define a generic capabilities mechanism that is appropriate for all machine types? If you start defining things like CPU/memory topologies as machine-level features, simple enumerations may not be sufficient.
* Related to the above, the type system may evolve to be sufficiently complex that linkers, loaders, etc. may not want to take on the complexity of evaluating compatibility, undermining the utility of this data.
* Use in the wild may be limited to the few popular machine types that need it. Is definition in gABI warranted?
* Unclear who acts as registrar for sub-machine enumerations. Does gABI take this on so all values are centrally defined? Or do you defer to each machine owner for defining values? Maybe gABI carves out ranges reserved for different machine types as their owners request them?

## Just Define a Named Section

gABI may not want to be in the business of even attempting to define how our fictional `.machine_compat` should behave on each machine type! Perhaps a unified machine capabilities mechanism - no matter how generic - is just too complicated given how varied machines and operating systems are in the wild. In this case, each machine type is free to define the format of its own `.machine_compat` section, independent of all others.

In the case of x86-64, we could imagine the contents of this `.machine_compat` section as an array of structs defining bitmask comparisons for CPUID queries. Something like:

```
typedef struct {
    uint32_t eax_query;         // Value to set in EAX before issuing the CPUID instruction.
    uint32_t eax_required_mask; // AND bitmask that must be fully set in EAX after the query.
    uint32_t ebx_required_mask; // ... in EBX.
    uint32_t ecx_required_mask; // ... in ECX.
    uint32_t edx_required_mask; // ... in EDX.
} CPUID_compat;
```

(This isn't sufficient for production use.)
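
Still, to sketch how a loader might evaluate such an array, here's a check using `__get_cpuid` from GCC/Clang's `<cpuid.h>` (it ignores CPUID subleaves, one of the reasons the struct above isn't production-ready):

```
#include <cpuid.h>   // __get_cpuid, provided by GCC and Clang
#include <stdbool.h>
#include <stdint.h>

static bool cpuid_entry_ok(const CPUID_compat *c)
{
    unsigned int eax, ebx, ecx, edx;

    // __get_cpuid returns 0 if the requested leaf is unsupported.
    if (!__get_cpuid(c->eax_query, &eax, &ebx, &ecx, &edx))
        return false;

    return (eax & c->eax_required_mask) == c->eax_required_mask &&
           (ebx & c->ebx_required_mask) == c->ebx_required_mask &&
           (ecx & c->ecx_required_mask) == c->ecx_required_mask &&
           (edx & c->edx_required_mask) == c->edx_required_mask;
}
```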

Pros:

* Super flexible. Each machine type defines its own rules. gABI just says what section it is in.
* Keeps things simple for gABI.

Cons:

* Possibly no common format between machine types, making it harder for tools to interpret section content across machines. Does defining a section with undefined content even make sense?

# Conclusion

These are just some half-baked ideas my (again: naive, non-subject-matter-expert) brain could come up with. I'm honestly not sure whether gABI wants to get into the business of attempting to "regulate" machine-level compatibility checking. But when you consider that the multi-billion dollar x86-64 ecosystem generally has its microarchitecture targeting running ~20 years behind current hardware, and that there's a compelling argument that limitations in current object file design contribute to that outcome, I thought I'd raise the topic.

I realize that gABI is only part of a much larger ecosystem and that any progress on this idea likely requires buy-in from other groups. I'm largely naive of the dynamics and groups at play here but would welcome being enlightened if it helps move this idea forward.

Should gABI get in the business of defining machine-level targeting and microarchitecture compatibility?

If so, what could/should that look like and what are potential next steps?

If not, could someone please direct me to appropriate forums to engage the x86-64 and/or GNU/Linux communities on this? (I perceive this group to have the most to gain from any effort to remove roadblocks standing in the way of modernizing microarchitecture targeting.)

And thank you for all you do to help maintain core standards used on billions of devices! I hope I'm not wasting your valuable time with my ideas.

Humbly,

Gregory

[1] https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/9

Roland McGrath

Jan 28, 2022, 11:57:45 PM
to gener...@googlegroups.com
GNU tools have already adopted their chosen solution for this in the `SHT_GNU_ATTRIBUTES` section and related "property" note protocols.  There has been a great deal of discussion on the gnu-...@sourceware.org mailing list (http://sourceware.org/mailman/listinfo/gnu-gabi) about the details.  While I am not a particular fan of the direction of complexity this design has taken, inventing an equivalent thing separately at this point seems like a dubious plan.  A new formalization in the shared ELF format should at the very least take all the experience that went into the GNU solution into account and explain why its differences from that solution are preferable.

Ali Bahrami

Jan 31, 2022, 11:11:34 AM
to gener...@googlegroups.com
On 1/28/22 9:57 PM, 'Roland McGrath' via Generic System V Application Binary Interface wrote:
> GNU tools have already adopted their chosen solution for this in the `SHT_GNU_ATTRIBUTES` section and related "property" note protocols. There has been a great deal of discussion on the
> gnu-...@sourceware.org mailing list (http://sourceware.org/mailman/listinfo/gnu-gabi) about the details. While I am
> not a particular fan of the direction of complexity this design has taken, inventing an equivalent thing separately at this point seems like a dubious plan. A new formalization in the shared ELF
> format should at the very least take all the experience that went into the GNU solution into account and explain why its differences from that solution are preferable.

Similarly, Solaris has had an answer to this
for a couple of decades now. Solaris objects embed
capability masks in a special section, which the runtime
linker interprets against the abilities that the kernel
advertises for the running system:

% isainfo -v
64-bit amd64 applications
prfchw efs rdrand pclmulqdq aes movbe sse4_2 sse4_1 ssse3
popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc
cx8 tsc fpu
32-bit i386 applications
prfchw efs rdrand pclmulqdq aes movbe sse4_2 sse4_1 ssse3
popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc
fpu

(that's an old server in my house, hence the non-cutting edge
capabilities shown).

This forms the basis for rejecting objects that can't run
on the system (object capabilities), and also for providing
multiple implementations of a given function, and automatically
having ld.so.1 pick the best one for the running system
at runtime (symbol capabilities).

ABI details:

https://docs.oracle.com/cd/E37838_01/html/E36783/chapter7-28.html#scrolltoc
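
The object-capabilities data is essentially an array of capability entries; the 64-bit form documented there looks roughly like:

```
typedef struct {
        Elf64_Xword c_tag;         /* e.g. CA_SUNW_HW_1 (hardware capabilities) */
        union {
                Elf64_Xword c_val; /* capability bitmask for this tag */
                Elf64_Addr  c_ptr;
        } c_un;
} Elf64_Cap;
```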

Higher Level Documentation:

https://docs.oracle.com/cd/E37838_01/html/E36783/man-cp.html#scrolltoc

It might have been nice if the gABI and the psABIs had
defined a generic framework for this 20 years ago, but in
the intervening years the OSABIs have filled the vacuum.

- Ali

Carlos O'Donell

Jan 31, 2022, 5:34:04 PM
to gener...@googlegroups.com, Ali Bahrami
Agreed :-)

For GNU ABI we have a combination of:

- "Object Attributes" .gnu.attributes and SHT_GNU_ATTRIBUTES (13 years old)
- for static link time processing.
- Early discussions:
https://gcc.gnu.org/legacy-ml/gcc/2007-06/msg00654.html

- "Program Properties" .note and NT_GNU_PROPERTY_TYPE_0 (4 years old)
- for dynamic loader processing.
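
Roughly, each property in an NT_GNU_PROPERTY_TYPE_0 note is a self-describing type/size/payload record:

```
struct gnu_property {
    uint32_t pr_type;   /* e.g. GNU_PROPERTY_X86_ISA_1_NEEDED */
    uint32_t pr_datasz; /* size in bytes of the payload that follows */
    /* unsigned char pr_data[pr_datasz], padded to 8-byte alignment
       on ELFCLASS64 (4-byte on ELFCLASS32) */
};
```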

Lastly, through annobin [1] you can inject full object markup if you want
to do something more detailed offline. We consume that via annocheck,
which gives belt-and-suspenders checking for application builds that
want to enforce coverage more detailed than Object Attributes
and Program Properties can provide. The markup is still evolving, though.

--
Cheers,
Carlos.

[1] https://sourceware.org/annobin/
