clang vs gcc

Victor Khimenko

Dec 5, 2024, 6:23:44 PM
to x86-64-abi
This code generates incompatible code when compiled with gcc and clang (https://godbolt.org/z/nzE4e4jPo):

using UInt64x2 = unsigned long long __attribute__((__vector_size__(16), may_alias));

template<int id>
struct XMM1 {
    UInt64x2 x;
};

struct XMM2 : XMM1<0>, XMM1<1> {
};

XMM2 foo() {
    XMM2 result;
    ((XMM1<0>*)&result)->x = UInt64x2{1, 2};
    ((XMM1<1>*)&result)->x = UInt64x2{3, 4};
    return result;
}

Is this normal, or is it a violation of the x86-64 ABI? If it is a violation, which compiler violates it?

Michael Matz

Dec 11, 2024, 10:22:23 AM
to Victor Khimenko, x86-64-abi
clang violates the psABI. The return type above (XMM2) is not the type
__m256 and is larger than two eightbytes. It is an aggregate whose
individual eightbytes initially have classes SSE,SSEUP,SSE,SSEUP, which
then become SSE,SSE,SSE,SSE. Alternatively they remain
SSE,SSEUP,SSE,SSEUP, but either way the type is larger than two eightbytes
and, while the first eightbyte is SSE, not all the others are SSEUP.
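
The rule applied above can be sketched in a few lines of C++. This is only
an illustration of that single post-merger check, not the full psABI
classification algorithm: the enum Cls and the function passed_in_memory
are hypothetical names, only the SSE and SSEUP classes are modelled, and
the other psABI rules (for example the overall size limit) are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Only the two classes discussed in this thread (the psABI defines more).
enum class Cls { SSE, SSEUP };

// Hypothetical helper: returns true when this one rule forces class MEMORY.
// An aggregate larger than two eightbytes may stay in a register only if
// its classes are exactly SSE, SSEUP, SSEUP, ...
bool passed_in_memory(const std::vector<Cls>& eightbytes) {
    if (eightbytes.size() <= 2) return false;       // rule does not apply
    if (eightbytes[0] != Cls::SSE) return true;     // first must be SSE
    for (std::size_t i = 1; i < eightbytes.size(); ++i)
        if (eightbytes[i] != Cls::SSEUP) return true;  // rest must be SSEUP
    return false;
}
```

Under these assumptions, both candidate classifications of XMM2
(SSE,SSEUP,SSE,SSEUP and SSE,SSE,SSE,SSE) make passed_in_memory return
true, while the __m256 pattern SSE,SSEUP,SSEUP,SSEUP does not.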

Hence it gets class MEMORY and therefore is returned by invisible
reference (caller passes address of return slot via %rdi and expects
callee to fill that).
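
To make the "invisible reference" concrete, here is a rough hand-written
equivalent of what a MEMORY-class return amounts to. It is a sketch, not
compiler output: foo_lowered is a hypothetical name, the explicit pointer
parameter stands in for the address the caller passes in %rdi, and the
returned pointer mirrors the psABI convention of handing that address back
in %rax. It relies on the GCC/clang vector extension used in the original
post.

```cpp
#include <cassert>

using UInt64x2 = unsigned long long __attribute__((__vector_size__(16), may_alias));

template <int id>
struct XMM1 {
    UInt64x2 x;
};

struct XMM2 : XMM1<0>, XMM1<1> {
};

// 32 bytes means four eightbytes, i.e. more than two.
static_assert(sizeof(XMM2) == 32, "XMM2 spans four eightbytes");

// Hypothetical lowered form of `XMM2 foo()`: the caller supplies the
// return slot, the callee fills it and returns the same address.
XMM2* foo_lowered(XMM2* ret) {
    static_cast<XMM1<0>*>(ret)->x = UInt64x2{1, 2};
    static_cast<XMM1<1>*>(ret)->x = UInt64x2{3, 4};
    return ret;
}
```

A caller would reserve an XMM2 object itself and call foo_lowered(&slot);
no vector register is involved in handing the value back.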

clang seems to regard type 'XMM2' to be equal to __m256, but that's not
the case. A test like this shows other problems as well:

% cat x.cc
#include <x86intrin.h>
__m128 get2 () { __m128 r = (__m128){5,6}; return r; }
__m256 get4 () { __m256 r = (__m256){7,8,9,10}; return r; }

Depending on whether this is compiled with or without -march=x86-64-v3,
the "processor supports type __m256" or it does not, and so get4 should
return either via %ymm0 or via invisible reference. That is also what GCC
does. clang instead returns via %ymm0 (with the option, which is correct)
or, without the option, does this:

_Z4get4v:                               # @_Z4get4v
        movaps  .LCPI2_0(%rip), %xmm0   # xmm0 = [7.0E+0,8.0E+0,9.0E+0,1.0E+1]
        xorps   %xmm1, %xmm1
        retq
.Lfunc_end2:

which is completely wrong. (%xmm1 would be used as return register for
types like 'complex double', which __m256 also isn't)

The current wording in the psABI, specifically the requirement that
arguments/return type be precisely type __m256 (or __m512 for avx512
processors), and not merely any random type that one could envision to
somewhat fit into a __m256, is so that the ABI isn't changed for user
defined types merely by compiling things with different options.

The above XMM2 type is such a normal user-defined type, and so its use as
a parameter or return type must not influence the ABI depending on
compile options.

(The types __m256 and __m512, and only these, have this problem to a much
lesser extent, as they were only introduced with the ymm/zmm registers and
hence their use on pre-AVX machines is very limited. GCC does warn
about using these types for parameter passing/returning when compiling for
non-AVX machines.)


Ciao,
Michael.