[RFC] AVX10: Unify 512-bit ABI for both AVX10-256 and AVX10-512 targets

Phoebe Wang

Aug 11, 2023, 6:19:03 AM
to x86-64-abi
The initial discussion can be found here. I am reposting it here to make it a formal discussion.

Background

Passing or returning a 512-bit vector type, e.g., __m512, on a non-512-bit target breaks the ABI: https://godbolt.org/z/M57bWf3nY
Compilers warn about such uses, but still generate code with an altered ABI.
This leads to unexpected run-time failures when objects built for AVX2 targets are linked with objects built for AVX512 targets. The linker cannot detect the risk in advance, which puts users at high risk whenever they use 512-bit vector types on non-512-bit targets.
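
To make the failure mode concrete, here is a minimal sketch of the kind of function that triggers it. It is only an illustration under assumptions: it uses the GCC/Clang vector_size extension as a stand-in for __m512 so that it builds without <immintrin.h>, and the names v16f and sum_lanes are hypothetical. Built for x86-64 without AVX512F, GCC and Clang typically emit a -Wpsabi warning for it and pass v differently than a build with AVX512F enabled would.

```c
#include <assert.h>

/* Hypothetical stand-in for __m512: 16 floats, 64 bytes (GCC/Clang extension). */
typedef float v16f __attribute__((vector_size(64)));

static_assert(sizeof(v16f) == 64, "v16f must be exactly 512 bits");

/* By-value 512-bit parameter: on x86-64, how `v` reaches this function
   depends on whether AVX512F is enabled for the translation unit. */
float sum_lanes(v16f v)
{
    float total = 0.0f;
    for (int i = 0; i < 16; ++i)
        total += v[i];
    return total;
}
```

Within a single translation unit the caller and callee agree, so the code works. The risk appears only when the caller and sum_lanes live in objects built with different target settings.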

The problem has existed for many years and is not limited to 512-bit vectors. But it becomes more serious for the 512-bit vector ABI on future AVX10 targets [spec, technical paper], because AVX10-256 is the general setting for binaries that can run on both AVX10-256 and AVX10-512. In the future, it will be common for binaries compiled for AVX10-256 to be linked with natively built binaries on AVX10-512 targets.

To avoid these potentially undetectable linking catastrophes, we should improve the ABI by unifying it across AVX10-256 and AVX10-512 targets. Here are the proposals to solve it.

Proposals

Proposal 1: Promote the target attribute from AVX10-256 to AVX10-512 for any function that passes or returns vectors of 512 bits or more.
Problem: The binary cannot run on an AVX10-256-only target.
Reason:
When users pass or return a 512-bit vector, they should be aware that the code becomes target dependent. Users should be taught not to do this on 256-bit targets, and that unexpected things will happen if they insist.
In fact, ICC and MSVC have already chosen to promote based on the arguments: https://godbolt.org/z/vcrf9qW5z I think that if the compiler has to choose between two misbehaviors, a wrong result or a crash due to an illegal instruction, the latter is definitely better than the former.
In this way, we can also declare that x86-64-v5 inherits from x86-64-v4 and interoperates with previous versions.

Proposal 2: Abort compilation when the user tries to pass or return 512-bit vectors.
Reason: This turns a potential run-time crash into a compile-time error.

Proposal 3: Change the ABI for 512-bit vectors so that they are always passed and returned in memory.
Reason: We expect AVX10-256 to be the universal configuration, and in most scenarios 512-bit vectors won't bring performance improvements. So we can sacrifice a little 512-bit performance to achieve interoperability between AVX10-256 and AVX10-512. This way, there won't be any run-time issues in the future either.
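
At the source level, the effect of Proposal 3 resembles what users can already do by hand today: route the 512-bit value through memory. The sketch below is an assumption about how that looks in user code, not the proposed compiler change itself, and the names vec512 and add512 are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>

/* Hypothetical 512-bit payload kept in memory. Its layout does not
   depend on which vector extensions are enabled. */
typedef struct {
    alignas(64) float lane[16];
} vec512;

static_assert(sizeof(vec512) == 64, "vec512 must be exactly 512 bits");

/* Operands and result travel by pointer, so the calling convention
   never depends on ZMM register availability. */
void add512(vec512 *out, const vec512 *a, const vec512 *b)
{
    for (int i = 0; i < 16; ++i)
        out->lane[i] = a->lane[i] + b->lane[i];
}
```

The cost is the one the proposal names: an extra store and load around each call, in exchange for a calling convention that is the same whatever the target setting.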

Summary

My preference: proposal 1 is better than proposal 2, and proposal 3 is the last choice because the 512-bit ABI on 512-bit targets is already widely used.