Clarify function ABI of vector types (__m256 and __m512) when features aren't available

connor horman

Aug 15, 2023, 6:41:31 AM
to X86-64 System V Application Binary Interface
The current specification passes vector types in a manner that indicates the whole value is passed in one vector register:
> If the class is SSEUP, it is passed in the next eightbyte of the last used vector register.

This does not pose an issue for __m128, as xmm registers (presumed to exist) are 16 bytes wide by default. However, __m256 and __m512 do not have an abi when passed without the avx and avx512 (or avx10-512 now I presume) features respectively.

This has led to all kinds of differences between compilers in how such functions are called, both when the feature is globally disabled (via -mno-avx) and when the feature is globally disabled but locally enabled. These differences can be observed at https://godbolt.org/z/55ac9sWKE.

Notable differences:
* Parameter passing agrees when the feature is disabled both globally and locally: gcc and clang both pass on the stack.
* Returning when disabled globally and locally differs: gcc returns in memory using rdi. clang returns in two vector registers, xmm0 and xmm1, for __m256 (four, xmm0 through xmm3, for __m512).
* When the feature is enabled locally only: gcc passes and returns correctly according to the abi. clang still passes on the stack but returns properly.

(Note: on the stack here means offset relative to the start of the parameter passing area - neither passes in memory via replacement with a pointer in any case)

The Sys-V abi should clarify (at least) one of the following:
* That code which passes __m256 and __m512 without the required feature available explicitly does not conform to the abi, and compilers should warn or error when code attempts to do so (gcc does warn here, but clang does not),
* What the abi of these types is when the features are globally disabled, and when they are enabled per-function.
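
For readers without access to the link, the scenario can be sketched roughly as follows. This is not the contents of the Compiler Explorer link; the names `v8sf`, `LOCAL_AVX`, `add_default`, `add_local_avx` and `lane_of` are hypothetical, and a generic 32-byte vector stands in for `__m256` so that no intrinsics header is needed. Compiling with and without -mavx and comparing the generated assembly should show the differences listed above.

```c
#include <assert.h>

/* 32-byte generic vector; assumed stand-in for __m256 (GCC/Clang extension). */
typedef float v8sf __attribute__((vector_size(32)));

/* Enable AVX for a single function only; expands to nothing off x86-64. */
#if defined(__x86_64__)
#define LOCAL_AVX __attribute__((target("avx")))
#else
#define LOCAL_AVX
#endif

/* Compiled with whatever the global flags say (e.g. -mno-avx). */
v8sf add_default(v8sf a, v8sf b)
{
    return a + b;
}

/* Same signature, but AVX is enabled locally for this function. */
LOCAL_AVX v8sf add_local_avx(v8sf a, v8sf b)
{
    return a + b;
}

/* Reads one lane through a pointer, so no vector is passed by value here. */
float lane_of(const v8sf *v, int i)
{
    assert(i >= 0 && i < 8);
    return (*v)[i];
}
```

gcc may emit its ABI-change warning for the by-value vector parameters when AVX is disabled; that warning is the one mentioned in the first bullet above.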

Michael Matz

Aug 15, 2023, 8:28:06 AM
to connor horman, X86-64 System V Application Binary Interface
Hello,

On Tue, 15 Aug 2023, connor horman wrote:

> The current specification passes vector types in a manner that indicates
> the whole value is passed in one vector register
> > If the class is SSEUP, it is passed in the next eightbyte of the last
> used vector register.

The psABI actually says this:

If the class is SSEUP, the \eightbyte is passed in the next
available \eightbyte chunk of the last used vector register.

Note the word "available". Without avx (or avx512f for __m512) there's no
next available eightbyte for the higher SSEUP parts, at which point this
wording comes into play:

If there are no registers available for any \eightbyte of an
argument, the whole argument is passed on the stack. If registers have
already been assigned for some \eightbytes of such an argument, the
assignments get reverted.
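
As a rough illustration of the two quoted rules, the following C sketch models how a vector argument's eightbytes are assigned. It is a simplified model under stated assumptions, not compiler code: only the SSE and SSEUP classes are handled, and the names `eb_class` and `vector_arg_in_regs` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Eightbyte classes, restricted to the two that matter for vector types. */
enum eb_class { EB_SSE, EB_SSEUP };

/*
 * classes/n:       classification of the argument's eightbytes
 * reg_eightbytes:  eightbytes per vector register
 *                  (2 = SSE only, 4 = AVX, 8 = AVX-512)
 * free_regs:       vector registers not yet assigned
 * Returns true if every eightbyte finds a place in a register, false if
 * the assignments are reverted and the whole argument goes on the stack.
 */
bool vector_arg_in_regs(const enum eb_class *classes, int n,
                        int reg_eightbytes, int free_regs)
{
    int used_in_current = 0;  /* eightbytes filled in the last used register */
    bool have_reg = false;

    assert(n > 0 && reg_eightbytes > 0 && free_regs >= 0);

    for (int i = 0; i < n; i++) {
        if (classes[i] == EB_SSE) {
            if (free_regs == 0)
                return false;           /* no vector register left */
            free_regs--;
            have_reg = true;
            used_in_current = 1;
        } else {                        /* EB_SSEUP */
            if (!have_reg || used_in_current == reg_eightbytes)
                return false;           /* no next *available* eightbyte */
            used_in_current++;
        }
    }
    return true;
}
```

With `reg_eightbytes` set to 2, a `__m256` (SSE followed by three SSEUP) fails at its third eightbyte and falls back to the stack; with 4 it fits in one register.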

> * Returning when disabled globally and locally differs: gcc returns in
> memory using rdi. clang returns in two vector (or four for __mm512)
> registers, xmm0 and xmm1 (xmm0 through xmm3 for __m512).

Returning of values is defined in terms of eightbyte classification in the
same way as argument passing. In particular __m256 has four eightbytes of
type SSE and three times SSEUP and the same language for how to assign
SSEUP parts. The section about returning values misses the wording that
when there are no available regs for an eightbyte that it should be
returned via memory, but that seems obvious.

So, the above is simply a clang bug.

> * When feature is enabled locally only: gcc passes and returns correctly
> according to abi. clang still passes on the stack but returns properly
>
> The Sys-V abi should clarify (at least) one of the following:
> * That code which passes __m256 and __m512 without the required feature
> available explicitly does not conform to the abi, and compilers should warn
> or error when code attempts to do so (gcc does warn here, but clang does
> not),

Actually it does conform to the psABI. But a warning is very sensible as
the ABI changes for passing/returning __m256/512 depending on available
ISA.

> * What the abi of these types are when the features are globally disabled,
> and when they are enabled per-function.

It does say so, see above. Passing and returning in memory.


Ciao,
Michael.

connor horman

Aug 15, 2023, 8:58:22 AM
to Michael Matz, X86-64 System V Application Binary Interface
Alright, that's reasonable. So I guess this is a bug to file against llvm.

connor horman

Aug 15, 2023, 9:07:42 AM
to Michael Matz, X86-64 System V Application Binary Interface
I do wonder if there is an argument of whether `ymm0` is "Available" when using `__attribute__((target))`, though. That may be something the psABI *should* clarify.

Phoebe Wang

Aug 15, 2023, 11:13:44 AM
to Michael Matz, connor horman, X86-64 System V Application Binary Interface
I'm proposing we should make the ABI constant no matter whether the target supports it or not, that is, we should always emit YMM/ZMM for __m256/512. See https://groups.google.com/g/x86-64-abi/c/vQcfj--osKs
Although it may result in a run time crash, that would still be better than unexpected results when users interlink among modules built with different target features.

Michael Matz <ma...@suse.de> wrote on Tue, Aug 15, 2023 at 20:28:

connor horman

Aug 15, 2023, 11:16:30 AM
to Phoebe Wang, Michael Matz, X86-64 System V Application Binary Interface
That would be better IMO: although I'd prefer a compile-time error instead of a runtime error.

I believe VEX/EVEX are also some of the few instructions that used to be interpreted incorrectly (not as a #UD) on some old cpus, so there would be more issues than just getting SIGILL. IDK if P4 is old enough to see those issues, though, so x86-64 may be immune.

Michael Matz

Aug 15, 2023, 11:22:22 AM
to Phoebe Wang, connor horman, X86-64 System V Application Binary Interface
Hello,

On Tue, 15 Aug 2023, Phoebe Wang wrote:

> I'm proposing we should make ABI constant no matter if the target supports
> it or not, that is, we should always emit YMM/ZMM for __m256/512. See
> https://groups.google.com/g/x86-64-abi/c/vQcfj--osKs

If consistency over ISA capabilities is desired then the only way would be
to always pass on stack. There's nothing special about these types (in
comparison to, say, "struct {double member[4];}") that would justify
forbidding their use when compiling without -mavx/avx512f.

> Although it may result in run time crash, it'd still be better for
> unexpected results when users interlink among modules built by
> different target features.

If you want to forbid the types when YMM/ZMM aren't available a compile
time error (i.e. promote the existing warning of GCC to an error) would be
enough.

But the psABI is now like it is, and for m256 it is like that since many
years (and m512 isn't that much better in this respect). We can't really
change that anymore. Interoperability between ISAs should have been the
concern back when introducing them and their ABI, it's too late now, I
fear.


Ciao,
Michael.

connor horman

Aug 15, 2023, 11:32:29 AM
to Michael Matz, Phoebe Wang, X86-64 System V Application Binary Interface
I'm not sure it's too late for the psABI to say, for example, "The use of __m256 and __m512 in parameters and return values is deprecated when ymm or zmm registers (respectively) are unavailable. Compilers compliant with this ABI should issue a warning or error when translating such programs."

It is definitely unfortunate that this was not caught earlier in development. However, it's certainly a learning experience for new features that get ABI support (__tile8192, whenever that gets added by Intel, could for example either always pass in memory or require AMX to be available) or for other abis (I've written an ABI for an ISA that I developed, and it specs that vector types always pass in integer registers and should be moved into vector registers by the callee, sidestepping the issue).

connor horman

Aug 15, 2023, 11:44:28 AM
to Michael Matz, Phoebe Wang, X86-64 System V Application Binary Interface
Also, on the nothing special side:
`struct foo{double member[4];};` consists of 4 SSE eightbytes, rather than 1 SSE and 3 SSEUP eightbytes. Since it's more than 2 eightbytes and not 1 SSE followed by any number of SSEUP, it gets passed in memory.
This may be an argument as to whether or not `__m256` should have been treated specially in the first place, but `struct foo{double member[2];};` and `__m128` also differ in this way - `foo` gets passed in the low eightbytes of 2 xmm registers, whereas `__m128` is passed in 1.

I think the classification of `__m256` and `__m512` are sensible, and the handling of the SSEUP eightbytes w/o the feature isn't. Both gcc and clang refuse to compile code using `__m128` (or normal floats) with sse disabled (https://godbolt.org/z/hK68f3oMj), so I don't think it's a stretch to expect `__m256` and `__m512` to behave the same.
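
To make the comparison above concrete, here is a small C sketch of the relevant post-merger rule. It is a simplified model that only knows the SSE and SSEUP classes and assumes the vector registers are wide enough; the names `eb_class`, `passed_in_memory` and `vector_regs_needed` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Eightbyte classes, restricted to the two that appear in these examples. */
enum eb_class { EB_SSE, EB_SSEUP };

/*
 * Post-merger rule: an argument larger than two eightbytes goes to memory
 * unless its first eightbyte is SSE and every other eightbyte is SSEUP.
 */
bool passed_in_memory(const enum eb_class *c, int n)
{
    assert(n > 0);
    if (n <= 2)
        return false;
    if (c[0] != EB_SSE)
        return true;
    for (int i = 1; i < n; i++) {
        if (c[i] != EB_SSEUP)
            return true;
    }
    return false;
}

/* Each SSE eightbyte starts a new vector register; SSEUP extends the last. */
int vector_regs_needed(const enum eb_class *c, int n)
{
    int regs = 0;
    for (int i = 0; i < n; i++) {
        if (c[i] == EB_SSE)
            regs++;
    }
    return regs;
}
```

Under this model, `struct foo{double member[4];}` (four SSE eightbytes) is passed in memory, `__m256` is not, and the two-double struct needs two registers where `__m128` needs one.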

Jan Beulich

Aug 15, 2023, 12:16:20 PM
to Phoebe Wang, connor horman, X86-64 System V Application Binary Interface, Michael Matz
On 15.08.2023 17:13, Phoebe Wang wrote:
> I'm proposing we should make ABI constant no matter if the target supports
> it or not, that is, we should always emit YMM/ZMM for __m256/512. See
> https://groups.google.com/g/x86-64-abi/c/vQcfj--osKs
> Although it may result in run time crash, it'd still be better for
> unexpected results when users interlink among modules built by
> different target features.

But that would require making __m256 / __m512 special, because it is not
reasonable to forbid (crash on) passing values of types declared using
the vector_size attribute.

Jan

Phoebe Wang

Aug 16, 2023, 2:28:47 AM
to Jan Beulich, connor horman, X86-64 System V Application Binary Interface, Michael Matz
Jan Beulich <jbeu...@suse.com> wrote on Wed, Aug 16, 2023 at 00:16:
Actually, GCC has made __m256/__m512 special. Clang doesn't distinguish them so far.

Jan Beulich

Aug 16, 2023, 2:56:26 AM
to Phoebe Wang, connor horman, X86-64 System V Application Binary Interface, Michael Matz
Have they? I don't see anything special in

typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__));
typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__));
typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__));

I can use exactly the same constructs to produce custom types, which
- afaict - are indistinguishable to the compiler.
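
A minimal sketch of that point, assuming GCC or Clang vector extensions: two typedefs written with the same attributes as the header line above can be assigned to each other without a cast. The type names `header_like_m512` and `custom_v16sf` and the helper `lane_via_custom_type` are hypothetical and do not come from any header.

```c
#include <assert.h>

/* Same shape as the header typedef quoted above, under a made-up name. */
typedef float header_like_m512 __attribute__ ((__vector_size__ (64), __may_alias__));

/* A user-defined type built from exactly the same constructs. */
typedef float custom_v16sf __attribute__ ((__vector_size__ (64), __may_alias__));

/*
 * Copies a header-style vector into the custom type without a cast and
 * reads one lane back. The vector is taken by pointer so that no 64-byte
 * value is passed by value in this sketch.
 */
float lane_via_custom_type(const header_like_m512 *in, int lane)
{
    assert(lane >= 0 && lane < 16);
    custom_v16sf copy = *in;   /* accepted only if the types are compatible */
    return copy[lane];
}
```

That the plain assignment compiles at all is the observation: the compiler sees one 64-byte float vector type behind both names.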

Jan

Phoebe Wang

Aug 16, 2023, 6:48:55 AM
to Jan Beulich, connor horman, X86-64 System V Application Binary Interface, Michael Matz
Jan Beulich <jbeu...@suse.com> wrote on Wed, Aug 16, 2023 at 14:56:

Jan Beulich

Aug 16, 2023, 7:32:38 AM
to Phoebe Wang, connor horman, X86-64 System V Application Binary Interface, Michael Matz
Nope. Try putting bar() first and foo() second. All that happens is that you
get the warning only once per CU.

Jan

Phoebe Wang

Aug 16, 2023, 12:19:06 PM
to Jan Beulich, connor horman, X86-64 System V Application Binary Interface, Michael Matz
Jan Beulich <jbeu...@suse.com> wrote on Wed, Aug 16, 2023 at 19:32:
I'm ashamed of the mistake, and thanks for correcting my misconception!