Re: Intel AVX10.1 Compiler Design and Support


Hongtao Liu

Aug 8, 2023, 10:15:11 PM
to Joseph Myers, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper
On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <craz...@gmail.com> wrote:
>
> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <jos...@codesourcery.com> wrote:
> >
> > Do you have any comments on the interaction of AVX10 with the
> > micro-architecture levels defined in the ABI (and supported with
> > glibc-hwcaps directories in glibc)? Given that the levels are cumulative,
> > should we take it that any future levels will be ones supporting 512-bit
> > vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
> > AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
> > that only support 256-bit vector width will be considered to match the
> > x86-64-v3 micro-architecture level but not any higher level?
> This is actually something we really want to discuss in the community.
> Our proposal for x86-64-v5 is AVX10.2-256 (implying AVX10.1-256) + APX.
> One big reason is that Intel E-cores will only support AVX10 at 256-bit;
> if we want to use x86-64-v5 across server and client, it's better to
> default to 256-bit.
+ ABI and LLVM folked for this topic.
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao

Hongtao Liu

Aug 8, 2023, 10:18:47 PM
to Joseph Myers, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper
s/folked/folks/

> > >
> > > --
> > > Joseph S. Myers
> > > jos...@codesourcery.com
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



--
BR,
Hongtao

Phoebe Wang

Aug 9, 2023, 12:01:56 AM
to Hongtao Liu, Joseph Myers, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper

I have some proposals about unifying ABI on AVX10 for both 256-bit and 512-bit.

 

Proposal 1: Promote the attribute from AVX10-256 to AVX10-512 for any function that passes or returns vectors of 512 bits or more.

Problem: The binary cannot run on an AVX10-256-only target.

Reason:

When users try to pass/return a 512-bit vector, they should be aware that the code becomes target dependent. Users should be taught not to do this on 256-bit targets, and that unexpected things will happen if they insist.

Actually, ICC and MSVC have already chosen to promote for arguments (see https://godbolt.org/z/vcrf9qW5z). I think that if the compiler has to choose between two misbehaviors, a wrong result or a crash due to an illegal instruction, the latter is definitely better than the former.

In this way, we can also declare that x86-64-v5 inherits from x86-64-v4 and interoperates with previous versions.

 

Proposal 2: Abort compilation when the user tries to pass/return 512-bit vectors.

Reason: This turns a possible run-time crash into a compile-time error.

 

Proposal 3: Change the ABI for 512-bit vectors so that they are always passed/returned in memory.

Reason: We expect AVX10-256 to be a universal configuration, and in most scenarios 512-bit vectors won't bring performance improvements. So we can sacrifice a little 512-bit performance to achieve interoperability between AVX10-256 and AVX10-512. In this way, there won't be any runtime issues in the future either.
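
[Editorial sketch] The idea behind Proposal 3 can be illustrated at the source level. This is only a minimal sketch under assumptions: hyp_vec512f, hyp_add_mem and hyp_add_mem_selfcheck are hypothetical names, the 512-bit type is written with the GCC/Clang generic vector extension so that it builds without any AVX-512 flags, and an explicit pointer interface stands in for what the proposal would make the compiler do implicitly. It is not the psABI rule itself.

```c
#include <assert.h>

/* 512-bit vector of 16 floats (hypothetical name), written with the
   GCC/Clang generic vector extension. */
typedef float hyp_vec512f __attribute__ ((vector_size (64)));

/* By-value form, shown only as a comment. How its arguments travel depends
   on whether 512-bit registers are enabled for the translation unit:
     hyp_vec512f hyp_add_byval (hyp_vec512f a, hyp_vec512f b);  */

/* Memory form: operands and result live in caller-provided storage, so the
   interface does not depend on the vector width each side was built for. */
void hyp_add_mem (hyp_vec512f *out, const hyp_vec512f *a,
                  const hyp_vec512f *b)
{
  *out = *a + *b;
}

/* Self-check: fills two vectors, adds them through memory, and returns 1
   once every lane has been verified. */
int hyp_add_mem_selfcheck (void)
{
  hyp_vec512f a = { 0 }, b = { 0 }, r = { 0 };
  for (int i = 0; i < 16; i++)
    {
      a[i] = (float) i;
      b[i] = 1.0f;
    }
  hyp_add_mem (&r, &a, &b);
  for (int i = 0; i < 16; i++)
    assert (r[i] == (float) i + 1.0f);
  return 1;
}
```

The cost mentioned in the proposal shows up here as the extra loads and stores around the call; the benefit is that a caller and a callee built with different vector-width settings still agree on where the data is.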

 

Thanks

Phoebe


On Wed, Aug 9, 2023 at 10:18, Hongtao Liu <craz...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "X86-64 System V Application Binary Interface" group.
To unsubscribe from this group and stop receiving emails from it, send an email to x86-64-abi+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/x86-64-abi/CAMZc-bzj5971PJ4UN2aB4LB-9nj4q_fRiykT9My3syohGLbZrw%40mail.gmail.com.

Jan Beulich

Aug 9, 2023, 3:17:59 AM
to Hongtao Liu, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers
On 09.08.2023 04:14, Hongtao Liu wrote:
> On Wed, Aug 9, 2023 at 9:21 AM Hongtao Liu <craz...@gmail.com> wrote:
>>
>> On Wed, Aug 9, 2023 at 3:55 AM Joseph Myers <jos...@codesourcery.com> wrote:
>>>
>>> Do you have any comments on the interaction of AVX10 with the
>>> micro-architecture levels defined in the ABI (and supported with
>>> glibc-hwcaps directories in glibc)? Given that the levels are cumulative,
>>> should we take it that any future levels will be ones supporting 512-bit
>>> vector width for AVX10 (because x86-64-v4 requires the current AVX512F,
>>> AVX512BW, AVX512CD, AVX512DQ and AVX512VL) - and so any future processors
>>> that only support 256-bit vector width will be considered to match the
>>> x86-64-v3 micro-architecture level but not any higher level?
>> This is actually something we really want to discuss in the community.
>> Our proposal for x86-64-v5 is AVX10.2-256 (implying AVX10.1-256) + APX.
>> One big reason is that Intel E-cores will only support AVX10 at 256-bit;
>> if we want to use x86-64-v5 across server and client, it's better to
>> default to 256-bit.

Aiui these ABI levels were intended to be incremental, i.e. higher versions
would include everything earlier ones cover. Without such a guarantee, how
would you propose compatibility checks to be implemented in a way
applicable both forwards and backwards? If a new level is wanted here, then
I guess it could only be something like v3.5.
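
[Editorial sketch] The compatibility check in question can be modeled as a subset test on feature sets. The following is a deliberately coarse sketch: the capability bits and level definitions are hypothetical groupings made up for illustration (they are not CPUID flags or psABI text), with the 512-bit vector requirement of x86-64-v4 folded into a single bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, coarse capability bits; not real CPUID flags. */
enum
{
  HYP_V3_BASELINE = 1u << 0, /* everything x86-64-v3 requires */
  HYP_AVX512_512  = 1u << 1, /* AVX512F/BW/CD/DQ/VL with 512-bit vectors */
  HYP_AVX10_256   = 1u << 2, /* AVX10 at 256-bit vector width */
  HYP_APX         = 1u << 3
};

/* Levels expressed as sets of the bits above. */
#define HYP_LEVEL_V3          (HYP_V3_BASELINE)
#define HYP_LEVEL_V4          (HYP_LEVEL_V3 | HYP_AVX512_512)
#define HYP_LEVEL_V5_PROPOSED (HYP_LEVEL_V3 | HYP_AVX10_256 | HYP_APX)

/* Returns 1 if every feature required by `lower` is also required by
   `higher`, i.e. the two levels form an ascending pair. */
int hyp_level_includes (uint32_t higher, uint32_t lower)
{
  return (higher & lower) == lower;
}
```

In this model v4 includes v3 and the proposed 256-bit v5 includes v3, but the proposed v5 does not include v4, which is the break in the chain being discussed.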

Jan

Hongtao Liu

Aug 9, 2023, 3:38:24 AM
to Jan Beulich, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers
Is there much software whose implementation is based on this assumption?
At least in GCC, it's not a big problem, we can adjust code for the
new micro-architecture level.
> applicable both forwards and backwards? If a new level is wanted here, then
> I guess it could only be something like v3.5.
But if we use avx10.1 as v3.5, it's still not a subset of
x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not in
x86-64-v4), so there will still be a divergence.
Then the 256-bit part of x86-64-v4 as v3.5? That's too weird to me.

Our main proposal is to make AVX10.x a new micro-architecture level
with a 256-bit default; either v3.5 or v5 would be acceptable if it's
just a matter of the name.
>
> Jan



--
BR,
Hongtao

Jan Beulich

Aug 9, 2023, 4:04:09 AM
to Hongtao Liu, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers
Hmm, yes. But something will end up being odd in any event. Versions no
longer being integral values is kind of indicating a "branch", i.e. v4
not being a successor. Maybe v3.1 would be better, for it to then have
possible successors v3.2, v3.3, etc. Of course it would be possible to
"merge" branches back then, into e.g. v5 covering AVX10.2/512 (and
thus fully covering everything that's in v4).

Jan

Florian Weimer

Aug 9, 2023, 4:14:38 AM
to Richard Biener via Gcc-patches, Phoebe Wang, Richard Biener, Hongtao Liu, Joseph Myers, Haochen Jiang, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper
* Richard Biener via Gcc-patches:

> I don’t think we can realistically change the ABI. If we could
> passing them in two 256bit registers would be possible as well.
>
> Note I fully expect Intel to turn around and implement 512 bits on a
> 256 bit data path on the E cores in 5 years. And it will take at
> least that time for AVX10 to take off (look at AVX512 for this and how
> they cautiously chose to include bf16 to cut off Zen4). So IMHO we
> shouldn’t worry at all and just wait and see for AVX42 to arrive.

Yes, the direction is a bit unclear. In retrospect, we could have
defined x86-64-v4 to use 256 bit vector width, so it could eventually be
compatible with AVX10; it's also what current Intel CPUs prefer (and
past, with the exception of the Xeon Phi line). But in the meantime,
AMD has started to ship CPUs that seem to prefer 512 bit vectors,
despite having a double pumped implementation. (Disclaimer: All CPU
preferences inferred from current compiler tuning defaults, not actual
experiments. 8-/)

To me, this looks like we may have defined x86-64-v4 prematurely, and
this suggests we should wait a bit to see where things are heading.

Thanks,
Florian

Hongtao Liu

Aug 9, 2023, 4:24:56 AM
to Florian Weimer, Richard Biener via Gcc-patches, Phoebe Wang, Richard Biener, Joseph Myers, Haochen Jiang, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper
On Wed, Aug 9, 2023 at 4:14 PM Florian Weimer <fwe...@redhat.com> wrote:
>
> * Richard Biener via Gcc-patches:
>
> > I don’t think we can realistically change the ABI. If we could
> > passing them in two 256bit registers would be possible as well.
> >
> > Note I fully expect Intel to turn around and implement 512 bits on a
> > 256 bit data path on the E cores in 5 years. And it will take at
> > least that time for AVX10 to take off (look at AVX512 for this and how
> > they cautiously chose to include bf16 to cut off Zen4). So IMHO we
> > shouldn’t worry at all and just wait and see for AVX42 to arrive.
>
> Yes, the direction is a bit unclear. In retrospect, we could have
> defined x86-64-v4 to use 256 bit vector width, so it could eventually be
> compatible with AVX10; it's also what current Intel CPUs prefer (and
NOTE: avx10.x-256 also inhibits the use of 64-bit kmask, which is
supposed to be used only by zmm instructions.
But in theory, the 64-bit kmask intrinsics can be used standalone,
i.e. kshift/kand/kor.
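
[Editorial sketch] To make "standalone" concrete: mask arithmetic of this kind never touches a vector register. The sketch below uses plain uint64_t as a portable stand-in for a 64-bit mask; hyp_mask64, hyp_kand, hyp_kor and hyp_kshiftl are hypothetical names that only mimic the and/or/shift operations named above and are not the compiler intrinsics themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a 64-bit mask value (hypothetical type name). */
typedef uint64_t hyp_mask64;

/* Bitwise AND of two masks. */
hyp_mask64 hyp_kand (hyp_mask64 a, hyp_mask64 b)
{
  return a & b;
}

/* Bitwise OR of two masks. */
hyp_mask64 hyp_kor (hyp_mask64 a, hyp_mask64 b)
{
  return a | b;
}

/* Logical left shift. Counts above 63 yield zero here, which also avoids
   undefined behaviour for over-wide shifts in C. */
hyp_mask64 hyp_kshiftl (hyp_mask64 a, unsigned count)
{
  return count > 63 ? 0 : a << count;
}
```

Code like this can set and combine bits above bit 31 without any 512-bit vector operation, which is why forbidding 64-bit masks on a 256-bit target can affect code that never uses zmm registers.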
> past, with the exception of the Xeon Phi line). But in the meantime,
> AMD has started to ship CPUs that seem to prefer 512 bit vectors,
> despite having a double pumped implementation. (Disclaimer: All CPU
> preferences inferred from current compiler tuning defaults, not actual
> experiments. 8-/)
>
> To me, this looks like we may have defined x86-64-v4 prematurely, and
> this suggests we should wait a bit to see where things are heading.
>
> Thanks,
> Florian
>


--
BR,
Hongtao

Florian Weimer

Aug 9, 2023, 5:15:42 AM
to Hongtao Liu, Jan Beulich, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers
* Hongtao Liu:

> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich <jbeu...@suse.com> wrote:
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way

Correct, this was the intent. But it's mostly to foster adoption and
make it easier for developers to pick the variants that they want to
target custom builds. If it's an ascending chain, the trade-offs are
simpler.

> Is there much software whose implementation is based on this assumption?
> At least in GCC, it's not a big problem, we can adjust code for the
> new micro-architecture level.

The glibc framework can deal with alternate choices in principle,
although I'd prefer not to go there for the reasons indicated.

>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.

> But if we use avx10.1 as v3.5, it's still not a subset of
> x86-64-v4 (avx10.1 contains avx512fp16, avx512bf16, etc., which are not in
> x86-64-v4), so there will still be a divergence.
> Then the 256-bit part of x86-64-v4 as v3.5? That's too weird to me.

The question is whether you want to mandate the 16-bit floating point
extensions. You might get better adoption if you stay compatible with
shipping CPUs. Furthermore, the 256-bit tuning apparently benefits
current Intel CPUs, even though they can do 512-bit vectors.

(The thread subject is a bit misleading for this sub-topic, by the way.)

Thanks,
Florian

Hongtao Liu

Aug 9, 2023, 6:15:50 AM
to Florian Weimer, Jan Beulich, Haochen Jiang, gcc-p...@gcc.gnu.org, ubi...@gmail.com, hongt...@intel.com, Zhang, Annita, phoeb...@intel.com, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers
Not only 16-bit floating point; the whole picture of AVX512->AVX10 is in
Figure 1-1. Intel® AVX-512 Feature Flags Across Intel® Xeon® Processor
Generations vs. Intel® AVX10
and Figure 1-2. Intel® ISA Families and Features
at https://cdrdv2.intel.com/v1/dl/getContent/784343 (this link is a
direct download of a PDF).



>
> (The thread subject is a bit misleading for this sub-topic, by the way.)
>
> Thanks,
> Florian
>


--
BR,
Hongtao

Michael Matz

Aug 9, 2023, 9:54:27 AM
to Zhang, Annita, Florian Weimer, Hongtao Liu, Beulich, Jan, Jiang, Haochen, gcc-p...@gcc.gnu.org, ubi...@gmail.com, Liu, Hongtao, Wang, Phoebe, x86-64-abi, llvm-dev, Craig Topper, Joseph Myers
Hello,

On Wed, 9 Aug 2023, Zhang, Annita via Gcc-patches wrote:

> > The question is whether you want to mandate the 16-bit floating point
> > extensions. You might get better adoption if you stay compatible with shipping
> > CPUs. Furthermore, the 256-bit tuning apparently benefits current Intel CPUs,
> > even though they can do 512-bit vectors.
> >
> > (The thread subject is a bit misleading for this sub-topic, by the way.)
> >
> > Thanks,
> > Florian
>
> Since 256-bit and 512-bit diverge starting with AVX10.1 and will continue to
> do so in future AVX10 versions, I think it's hard to keep a single version
> number that covers both and increases monotonically. Hence I'd like to
> suggest x86-64-v5 for 512-bit and x86-64-v5-256 for 256-bit, and so on.

The raison d'etre for the x86-64-vX scheme is to make life sensible as a
distributor. That goal can only be achieved if this scheme contains only
a few components that have a simple relationship. That basically means:
one dimension only. If you now add a second dimension (with and without
-512) we have to add another one if Intel (or whomever else) next does a
marketing stunt for feature "foobar" and end up with x86-64-v6,
x86-64-v6-512, x86-64-v6-1024, x86-64-v6-foobar, x86-64-v6-512-foobar,
x86-64-v6-1024-foobar.

In short: no.

It isn't the right time anyway to assign meaning to x86-64-v5, as it
wasn't the right time for assigning x86-64-v4 (as we now see). These are
supposed to reflect generally useful feature sets actually shipped in
generally available CPUs in the market, and be vendor independent. As
such it's much too early to define v5 based purely on text documents.


Ciao,
Michael.

Joseph Myers

Aug 9, 2023, 4:43:09 PM
to Wang, Phoebe, Hongtao Liu, Jiang, Haochen, gcc-p...@gcc.gnu.org, ubi...@gmail.com, Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper
On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:

> Proposal 3: Change the ABI of 512-bit vector and always be
> passed/returned from memory.

Changing ABIs like that for existing code that has worked for some time on
existing hardware is a bad idea.

At this point it seems appropriate to remind people of another ABI
consideration for vector extensions. glibc's libmvec defines vector
versions of various functions, including AVX512 ones (of course those
function versions only work on hardware with the relevant instructions).
glibc's headers use both _Pragma ("omp declare simd notinbranch") and
__attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler
including those headers, what function variants are available in glibc.

Existing glibc versions need to continue to work with new compiler
versions. That is, it's part of the ABI, which must remain stable,
exactly which function versions the above pragma and attribute imply are
available - and of course the details of how those functions versions take
arguments / return results are also part of the ABI (it would be OK for a
new compiler to choose not to use some of those vector versions, but not
to start calling them with a different ABI).

Maybe you'll want to add new vector function versions, with different
interfaces, to libmvec in future. If so, you need a *different* pragma or
attribute to declare to the compiler that the libmvec version using that
pragma or attribute has the additional functions - so new compilers using
the existing header will not try to generate calls to new function
versions that don't exist in that glibc version (but new compilers using a
new header version from new glibc will see the new pragma or attribute and
so be able to generate the relevant calls to new functions). And once
you've defined the ABI for such a new pragma or attribute, that itself
then becomes a stable interface - so if you end up with vector extensions
involving yet another set of interfaces, they need another corresponding
new pragma / attribute for libmvec to declare to the compiler that the new
interfaces exist.
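
[Editorial sketch] The two declaration forms named above can be shown on a made-up function. This is a sketch under assumptions: hyp_scale, hyp_scale_array and HYP_DECL_SIMD are hypothetical names, not glibc symbols, and the macro selection is only loosely modeled on how a header might choose between the pragma and the attribute. It illustrates how a header advertises vector variants to the compiler; it says nothing about which variants any given glibc actually ships.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a way to tell the compiler that vector variants of a function exist.
   With OpenMP enabled, use the pragma; otherwise use the GCC attribute;
   otherwise declare nothing special. */
#if defined _OPENMP && _OPENMP >= 201307
# define HYP_DECL_SIMD _Pragma ("omp declare simd notinbranch")
#elif defined __GNUC__ && !defined __clang__ && __GNUC__ >= 6
# define HYP_DECL_SIMD __attribute__ ((__simd__ ("notinbranch")))
#else
# define HYP_DECL_SIMD
#endif

/* Header-style declaration carrying the vector-variant promise. */
HYP_DECL_SIMD double hyp_scale (double x);

/* Scalar definition; a compiler honouring the declaration above may also
   emit or call vector variants of it. */
double hyp_scale (double x)
{
  return 2.0 * x + 1.0;
}

/* A loop the compiler is allowed to vectorize using those variants. */
void hyp_scale_array (double *dst, const double *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = hyp_scale (src[i]);
}
```

The stability concern in the message is about what the declaration promises: once a header ships with a given pragma or attribute, the set of vector variants it implies, and how they take arguments, cannot change underneath it.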

Phoebe Wang

Aug 10, 2023, 8:36:22 AM
to Joseph Myers, Wang, Phoebe, Hongtao Liu, Jiang, Haochen, gcc-p...@gcc.gnu.org, ubi...@gmail.com, Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper
>  Changing ABIs like that for existing code that has worked for some time on
>  existing hardware is a bad idea.

I agree, so Proposal 3 is the last choice.

The goal of the proposals is to solve the ABI incompatibility between AVX10-256 and AVX10-512 when passing/returning 512-bit vectors. So we are discussing the default ABI rather than other vector variants.

If you believe that changing the 512-bit ABI (the 512-bit version) is a bad idea, how about Proposals 1 and 2? I don't want to call the non-512-bit version an ABI because it doesn't provide interoperability between 256-bit and 512-bit targets. Besides, LLVM also behaves differently from GCC on non-512-bit targets. It is a good time to solve the problem together if we make the 512-bit ABI consistent and target independent. WDYT?

Thanks
Phoebe

On Thu, Aug 10, 2023 at 04:43, Joseph Myers <jos...@codesourcery.com> wrote:

Phoebe Wang

Aug 10, 2023, 9:12:22 AM
to Richard Biener, Joseph Myers, Wang, Phoebe, Hongtao Liu, Jiang, Haochen, gcc-p...@gcc.gnu.org, ubi...@gmail.com, Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper
>  The psABI should have some simple rule covering all of the above I think.

The fact that the psABI has a rule for this case doesn't mean the rule is a well-defined ABI in practice. A well-defined ABI should guarantee that code is 1) interlinkable across different compile options within the same compiler; 2) interlinkable across different compilers. The non-512-bit version fails on both counts.

1) is more important than 2) and becomes more critical on AVX10 targets, because we expect AVX10-256 to be a general setting for binaries that can run on both AVX10-256 and AVX10-512. It would be common for binaries compiled for AVX10-256 to be linked with natively built binaries on AVX10-512 targets.

Both 1) and 2) show the problem with the current rule in the psABI. So I think the psABI should be updated to solve them.

Thanks
Phoebe

On Thu, Aug 10, 2023 at 20:46, Richard Biener <richard....@gmail.com> wrote:
On Thu, Aug 10, 2023 at 2:37 PM Phoebe Wang via Gcc-patches
<gcc-p...@gcc.gnu.org> wrote:
>
> > Changing ABIs like that for existing code that has worked for some time on
> > existing hardware is a bad idea.
>
> I agree, so Proposal 3 is the last choice.
>
> The goal of the proposals is to solve the ABI incompatibility between
> AVX10-256 and AVX10-512 when passing/returning 512-bit vectors. So we are
> discussing the default ABI rather than other vector variants.
>
> If you believe that changing the 512-bit ABI (the 512-bit version) is a bad
> idea, how about Proposals 1 and 2? I don't want to call the non-512-bit
> version an ABI because it doesn't provide interoperability between 256-bit
> and 512-bit targets. Besides, LLVM also behaves differently from GCC on
> non-512-bit targets. It is a good time to solve the problem together if we
> make the 512-bit ABI consistent and target independent. WDYT?

Isn't this situation similar to the undefined ABI for passing generic
vectors (via __attribute__((vector_size))) that do not map to vectors supported
by the current ISA?  There are cases like vector<2> char or vector<1> double
to consider, for example, that would fit in the low part of a supported vector
register and, as in the AVX512 case, vectors that are larger than any supported
vector register.
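
[Editorial sketch] The cases listed above can be written down with the generic vector extension. This is a minimal sketch: the typedef names and hyp_fits_in_register are hypothetical, and the helper only compares sizes. It says nothing about how any compiler actually passes these types, which is exactly the part that lacks a simple rule.

```c
#include <assert.h>
#include <stddef.h>

/* Generic vectors of unusual sizes (hypothetical typedef names). */
typedef char   hyp_v2qi  __attribute__ ((vector_size (2)));  /* vector<2> char */
typedef double hyp_v1df  __attribute__ ((vector_size (8)));  /* vector<1> double */
typedef float  hyp_v16sf __attribute__ ((vector_size (64))); /* 512 bits */

/* Returns 1 if a vector of `vec_bytes` bytes fits in (possibly only the low
   part of) the widest vector register of `widest_reg_bytes` bytes, and 0 if
   it is larger than any supported register. */
int hyp_fits_in_register (size_t vec_bytes, size_t widest_reg_bytes)
{
  return vec_bytes <= widest_reg_bytes;
}
```

With a 32-byte widest register, as on a 256-bit-only target, the first two types fall in the low-part case and the 64-byte type falls in the too-wide case.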

The psABI should have some simple rule covering all of the above I think.

Richard.

Jan Beulich

Aug 10, 2023, 9:31:02 AM
to Phoebe Wang, Joseph Myers, Wang, Phoebe, Hongtao Liu, Jiang, Haochen, gcc-p...@gcc.gnu.org, ubi...@gmail.com, Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper, Richard Biener
On 10.08.2023 15:12, Phoebe Wang wrote:
>> The psABI should have some simple rule covering all of the above I think.
>
> The fact that the psABI has a rule for this case doesn't mean the rule is a
> well-defined ABI in practice. A well-defined ABI should guarantee that code
> is 1) interlinkable across different compile options within the same
> compiler; 2) interlinkable across different compilers. The non-512-bit
> version fails on both counts.
>
> 1) is more important than 2) and becomes more critical on AVX10 targets,
> because we expect AVX10-256 to be a general setting for binaries that can run
> on both AVX10-256 and AVX10-512. It would be common for binaries compiled
> for AVX10-256 to be linked with natively built binaries on AVX10-512 targets.

But you're only describing a pre-existing problem here afaict. Code compiled
with -mavx512f passing __m512 type data to a function compiled with only,
say, -mavx2 won't interoperate properly either. What's worse, imo the psABI
doesn't sufficiently define what __m256 etc actually are. After all these
aren't types defined by the C standard (as opposed to at least most other
types in the respective table there), and you can't really make assumptions
like "this is what certain compilers think this is".

Jan

Joseph Myers

Aug 10, 2023, 6:16:50 PM
to Richard Biener, Phoebe Wang, Wang, Phoebe, Hongtao Liu, Jiang, Haochen, gcc-p...@gcc.gnu.org, ubi...@gmail.com, Liu, Hongtao, Zhang, Annita, x86-64-abi, llvm-dev, Craig Topper
On Thu, 10 Aug 2023, Richard Biener via Gcc-patches wrote:

> Isn't this situation similar to the undefined ABI for passing generic
> vectors (via __attribute__((vector_size))) that do not map to vectors supported
> by the current ISA? There are cases like vector<2> char or vector<1> double
> to consider, for example, that would fit in the low part of a supported vector
> register and, as in the AVX512 case, vectors that are larger than any supported
> vector register.

Note there is a difference in some cases (I don't know if this is relevant
for x86) between "vectors supported by the current ISA" and "vectors whose
ABI, for ISAs that do support them, can be implemented using the current
ISA".

Specifically, when working on the VFP AAPCS variant for 32-bit Arm, I made
sure that generic vectors had the same ABI on all processors supporting
VFP, whether or not the vector parts of the instruction set were supported
on the chosen processor. On 32-bit Arm that's possible because vector
registers are the same as floating-point registers (and even the
single-precision-only VFP variant has suitable load and store
instructions).

Of course if your ABI for some kinds of vectors uses registers not
supported on all processors, and on the processors that do support those
registers you use that ABI for corresponding generic vectors, then you
won't be able to be compatible with that ABI for those generic vectors on
processors without those registers.