Vector ABI

575 views
Skip to first unread message

Andrew Senkevich

unread,
Apr 7, 2015, 8:53:22 AM4/7/15
to x86-6...@googlegroups.com, Joseph S. Myers, Jakub Jelinek, Xinmin Tian, Zamyatin, Igor, Ostanevich Sergey, Enkovich Ilya, Bataev Alexey
Hi,

here is new chapter describing Vector ABI for OpenMP.

Goal of this patch is to have documented Vector ABI to achieve
compatibility between compiler implementations.
Compatibility between compilers and libraries also covered by this Vector ABI.

Content is based on Intel Vector ABI and current GCC implementation.
It was agreed what ICC will have compatibility mode to follow this
Vector ABI.

Any comments are welcome!

==============================
Vector Function Application Binary Interface Specification for OpenMP

1. Vector Function ABI Overview

Vector Function ABI provides ABI for vector functions generated by
compiler supporting SIMD constructs of OpenMP 4.0 [1].
The use of a SIMD construct for a function declaration or definition
enables the creation of vector versions of the function from the
scalar version of the function that can be used to process multiple
instances concurrently in a single invocation in a vector context
(e.g. vectorized loops).
Vector Function ABI defines a set of rules that the caller and the
callee functions must obey.
These rules consist of:
* Calling convention
* Vector length (the number of concurrent scalar invocations to be
processed per invocation of the vector function)
* Mapping from element data types to vector data types
* Ordering of vector arguments
* Vector function masking
* Vector function name mangling
* Compiler generated variants of vector function

Vector Function ABI makes possible to know exact list of available
vector function implementations provided by some library based on
OpenMP pragma found in the function`s prototype in library headers.

2. Vector Function ABI

2.1. Calling Convention

The vector functions should use calling convention described in
section 3.2 Function Calling Sequence of original AMD64 ABI document.

2.2. Vector Length

Every vector variant of a SIMD-enabled function has a vector length (VLEN).
If OpenMP clause "simdlen" is used, the VLEN is the value of the
argument of that clause. The VLEN value must be power of 2.
In other case the notion of the function`s "characteristic data type"
(CDT) is used to compute the vector length.
CDT is defined in the following order:
a) For non-void function, the CDT is the return type.
b) If the function has any non-uniform, non-linear parameters,
then the CDT is the type of the first such parameter.
c) If the CDT determined by a) or b) above is struct, union, or
class type which is pass-by-value (except for the type that maps to
the built-in complex data type), the characteristic data type is int.
d) If none of the above three cases is applicable, the CDT is int.
e) For Intel(R) Xeon(TM) Phi(TM) native and offload compilation,
if the resulting characteristic data type is 8-bit or 16-bit integer
data type, the characteristic data type is int.
The VLEN is then determined based on the CDT and the size of vector
register of that ISA for which current vector version is generated.
The VLEN is computed using the formula below:

VLEN = sizeof(vector_register) / sizeof(CDT),
where vector register size specified in section 3.2.1 Registers and
the Stack Frame of original AMD64 ABI document.

For example, if ISA is SSE and CDT of the function is "int", the VLEN is 4.

2.3. Element Data Type to Vector Data Type Mapping

The vector data types for parameters are selected depending on ISA,
vector length, data type of original parameter, and parameter
specification.
For uniform and linear parameters (detailed description could be found
in [1]), the original data type is preserved.
For vector parameters, vector data types are selected by the compiler.
The mapping from element data type to vector data type is described as
below.
* The bit size of vector data type of parameter is computed as:
size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8
For instance, for SSE version of vector function with parameter data type "int":
If VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128 (bits), which
means one argument of type __m128 to be passed.
* If the size_of_vector_data_type is greater than the width of the
vector register, multiple vector registers are selected and the
parameter will be passed in multiple vector registers.
For instance, for SSE version of vector function with parameter data
type "int": If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256
(bits), so the vector data type is __m256, which means 2 arguments of
type __m128 are to be passed.

2.4. Ordering of Vector Arguments

When a parameter in the original data type results in one argument in
the vector function, the ordering rule is a simple one to one match
with the original argument order.
For example, when the original argument list is (int a, float b, int
c), VLEN is 4, the ISA is SSE, and all a, b, and c are classified
vector parameters, the vector function argument list becomes (__m128
vec_a, __m128 vec_b, __m128 vec_c).
There are cases where a single parameter in the original data type
results in the multiple arguments in the vector function. Those
addition second and subsequent arguments are inserted in the argument
list right after the corresponding first argument, not appended to the
end of the argument list of the vector function. For example, the
original argument list is (int a, float b, int c), VLEN is 8, the ISA
is SSE, and all a, b, and c are classified as vector parameters, the
vector function argument list becomes (__m128 vec_a1, __m128 vec_a2,
__m128 vec_b1, __m128 vec_b2, __m128 vec_c1, __m128 vec_c2).

2.5. Masking of Vector Function

Masked vector function variant used for invocation in conditional
statement (please refer to [1] for detailed information) additionally
takes an implicit mask argument, which disables processing of some of
the vector lanes. For masked vector functions, the additional "mask"
parameters are required.
Each element of "mask" parameters has the data type of the CDT (see
Section 2.2). The number of mask parameters is the same as number of
parameters required to pass the vector of CDT for the given vector
length. The value of a mask parameter must be either bit patterns of
all ones or all zeros for each element.
For the MIC target, the mask parameters are collection of 1-bit masks
in unsigned integers. The total number of mask bits is equal to VLEN.
The number of mask parameters is equal to the number of parameters for
the vector of characteristic data type. The mask bits occupy the least
significant bits of unsigned integer. For example, if the
characteristic data type is double and VLEN is 16, there are 16 mask
bits stored in two unsigned integers.
For each element of the vector, if the corresponding mask value is
zero, the return value associated to that element is zero. Mask
parameters are passed after all other parameters in the same order of
parameters that they are apply to.

2.6. Vector Function Name Mangling

The name mangling of the generated vector function based on vector
annotation is important part of Vector ABI. It allows the caller and
the callee functions to be compiled in separate files or compilation
units. Using the function prototype in header files to communicate
vector function annotation information, the compiler can perform
function matching while vectorizing code at call sites.

The vector function name is mangled as the concatenation of the following items:

<vector_prefix> <isa> <mask> <vlen> <vparameters> '_' <original_name>

The descriptions of each item are:
* <vector_prefix>
string "_ZGV"

* <original_name>
name of scalar function, including C++ and Fortran mangling

* <isa>
'b' // SSE
| 'c' // AVX
| 'd' // AVX2
| 'e' // MIC

* <mask>
'M' // masked version
| 'N' // unmasked version

* <vlen>
decimal-number

* <vparameters>
/* empty */
<vparameter> <opt-align> <vparameters>
o <vparameter>
(please refer to [1] for information about parameter types used below)
's' decimal-number // linear parameter, variable stride ,
// decimal number is the position # of
// stride argument, which starts from 0
| 'l' <number> // linear parameter, constant stride
| 'u' // uniform parameter
| 'v' // vector parameter
o <number>
[n] non-negative decimal integer // n indicates negative
o <opt-align>
/* empty */
| 'a' non-negative-decimal-number

Please refer to section 2.7 Compiler generated variants of vector
function for examples of vector function mangling.

2.7. Compiler generated variants of vector function

Compiler's architecture selection flag has no impact on ISA selection
for the generated vector variants.
Vector variants should be generated by compiler for SSE, AVX, AVX2
ISAs, both masked and unmasked versions for each ISA (if one of them
is not specified with according clause).
Compiler implementations must not generate calls to version of other
ISAs unless some non-standard pragma or clause is used to declare
those other versions are available.

Example 1.
#pragma omp declare simd uniform(q) aligned(q:16) linear(k:1)
float foo(float *q, float x, int k)
{
q[k] = q[k] + x;
return q[k];
}

List of generated function names (or list of symbols provided by
library with the same pragma in "foo" prototype):

1) _ZGVbN4ua16vl_foo (SSE ISA, unmasked version)
2) _ZGVbM4ua16vl_foo (SSE ISA, masked version)
3) _ZGVcN8ua16vl_foo (AVX ISA, unmasked version)
4) _ZGVcM8ua16vl_foo (AVX ISA, masked version)
5) _ZGVdN8ua16vl_foo (AVX2 ISA, unmasked version)
6) _ZGVdM8ua16vl_foo (AVX2 ISA, masked version)

Where the "foo" is the original mangled function name, "_ZGV" is the
prefix of the vector function name, "b" indicates the SSE ISA, "c"
indicates the AVX ISA, "d" indicates the AVX2 ISA, "N" indicates that
this is a unmasked version, "M" indicates that this is a masked
version, "4" is the vector length for SSE ISA, "8" is the vector
length for AVX and AVX2 ISA, "ua16" indicates uniform(q) and
align(a:32), "v" indicates second argument x is vector argument, "l"
indicates linear(k:1) - k is a linear variable whose stride is 1.

Example 2.
#pragma omp declare simd notinbranch
double foo(double x)
{
return x*x;
}

List of generated function names (or list of symbols provided by
library with the same pragma in "foo" prototype):

1) _ZGVbN2v_foo (SSE ISA)
2) _ZGVcN4v_foo (AVX ISA)
3) _ZGVdN4v_foo (AVX2 ISA)

Where the "foo" is the original mangled function name, "_ZGV" is the
prefix of the vector function name, "b" indicates the SSE ISA, "c"
indicates the AVX ISA, "d" indicates the AVX2 ISA, "N" indicates that
this is a unmasked version, "2" is the vector length for SSE ISA, "4"
is the vector length for AVX and AVX2 ISA, "v" indicates single
argument x is vector argument.

3. References

[1] OpenMP 4.0 Specification
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf

==============================


--
Andrew

Andrew Senkevich

unread,
Apr 17, 2015, 9:55:59 AM4/17/15
to x86-6...@googlegroups.com, Joseph S. Myers, Jakub Jelinek, Xinmin Tian, Zamyatin, Igor, Ostanevich Sergey, Enkovich Ilya, Bataev Alexey
Hi,

here is the patch updated after discussion with Xinmin.

=====================================================
Below we mean under SSE ISA all of SSE2/SSE3/SSSE3/SSE4 ISAs.
vector parameters, the vector function argument list becomes (__m128i
vec_a, __m128 vec_b, __m128i vec_c).
There are cases where a single parameter in the original data type
results in the multiple arguments in the vector function. Those
addition second and subsequent arguments are inserted in the argument
list right after the corresponding first argument, not appended to the
end of the argument list of the vector function. For example, the
original argument list is (int a, float b, int c), VLEN is 8, the ISA
is SSE, and all a, b, and c are classified as vector parameters, the
vector function argument list becomes (__m128i vec_a1, __m128i vec_a2,
__m128 vec_b1, __m128 vec_b2, __m128i vec_c1, __m128i vec_c2).
Vector variants should be generated by compiler for SSE, AVX, AVX2,
AVX512 ISAs, both masked and unmasked versions for each ISA (if one of
them is not specified with according clause).
Compiler implementations must not generate calls to version of other
ISAs unless some non-standard pragma or clause is used to declare
those other versions are available.

Example 1.
#pragma omp declare simd uniform(q) aligned(q:16) linear(k:1)
float foo(float *q, float x, int k)
{
q[k] = q[k] + x;
return q[k];
}

Below is the list of generated function names or list of symbols
provided by library with the same pragma in "foo" prototype.
AVX512 version provided by library optionally, if library was built
with AVX512 ISA support.

1) _ZGVbN4ua16vl_foo (SSE ISA, unmasked version)
2) _ZGVbM4ua16vl_foo (SSE ISA, masked version)
3) _ZGVcN8ua16vl_foo (AVX ISA, unmasked version)
4) _ZGVcM8ua16vl_foo (AVX ISA, masked version)
5) _ZGVdN8ua16vl_foo (AVX2 ISA, unmasked version)
6) _ZGVdM8ua16vl_foo (AVX2 ISA, masked version)
7) _ZGVeN16ua16vl_foo (AVX512 ISA, unmasked version)
8) _ZGVeM16ua16vl_foo (AVX512 ISA, masked version)

Where the "foo" is the original mangled function name, "_ZGV" is the
prefix of the vector function name, "b" indicates the SSE ISA, "c"
indicates the AVX ISA, "d" indicates the AVX2 ISA, "N" indicates that
this is a unmasked version, "M" indicates that this is a masked
version, "4" is the vector length for SSE ISA, "8" is the vector
length for AVX and AVX2 ISA, "ua16" indicates uniform(q) and
align(a:32), "v" indicates second argument x is vector argument, "l"
indicates linear(k:1) - k is a linear variable whose stride is 1.

Example 2.
#pragma omp declare simd notinbranch
double foo(double x)
{
return x*x;
}

Below is the list of generated function names or list of symbols
provided by library with the same pragma in "foo" prototype.
AVX512 version provided by library optionally, if library was built
with AVX512 ISA support.

1) _ZGVbN2v_foo (SSE ISA)
2) _ZGVcN4v_foo (AVX ISA)
3) _ZGVdN4v_foo (AVX2 ISA)
3) _ZGVeN8v_foo (AVX512 ISA)

Where the "foo" is the original mangled function name, "_ZGV" is the
prefix of the vector function name, "b" indicates the SSE ISA, "c"
indicates the AVX ISA, "d" indicates the AVX2 ISA, "N" indicates that
this is a unmasked version, "2" is the vector length for SSE ISA, "4"
is the vector length for AVX and AVX2 ISA, "v" indicates single
argument x is vector argument.

3. References

[1] OpenMP 4.0 Specification
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf

[2] Intel Vector ABI
https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf

=====================================================

Is it OK for trunk?



WBR,
Andrew

H.J. Lu

unread,
Apr 17, 2015, 10:46:29 AM4/17/15
to Andrew Senkevich, x86-6...@googlegroups.com, Joseph S. Myers, Jakub Jelinek, Xinmin Tian, Zamyatin, Igor, Ostanevich Sergey, Enkovich Ilya, Bataev Alexey
On Fri, Apr 17, 2015 at 6:55 AM, Andrew Senkevich
<andrew.n....@gmail.com> wrote:
> Hi,
>
> here is the patch updated after discussion with Xinmin.
>
> =====================================================
>
> Vector Function Application Binary Interface Specification for OpenMP
>
> 1. Vector Function ABI Overview
>
>
> 3. References
>
> [1] OpenMP 4.0 Specification
> http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf
>
> [2] Intel Vector ABI
> https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf
>
> =====================================================
>
> Is it OK for trunk?
>

x86-64 psABI covers low level information, like calling
convention, relocation, exception handling,...

Vector ABI looks more look like vector API, like libm
interface. The same vector API can apply to different
psABIs, like Linux, Windows, MacOS, ...


--
H.J.

Joseph Myers

unread,
Apr 23, 2015, 2:30:39 PM4/23/15
to Andrew Senkevich, x86-6...@googlegroups.com, Jakub Jelinek, Xinmin Tian, Zamyatin, Igor, Ostanevich Sergey, Enkovich Ilya, Bataev Alexey
On Fri, 17 Apr 2015, Andrew Senkevich wrote:

> AVX512 version provided by library optionally, if library was built
> with AVX512 ISA support.

This is not a useful statement for an ABI document. The ABI needs to
specify what a library user can rely on, based on the header contents with
the declaration of the function with vector versions, not anything that a
library might or might not provide but that can't be determined based on
header contents.

--
Joseph S. Myers
jos...@codesourcery.com

Andrew Senkevich

unread,
Apr 23, 2015, 4:23:34 PM4/23/15
to Joseph Myers, x86-6...@googlegroups.com, Jakub Jelinek, Xinmin Tian, Zamyatin, Igor, Ostanevich Sergey, Enkovich Ilya, Bataev Alexey
>> AVX512 version provided by library optionally, if library was built
>> with AVX512 ISA support.
>
> This is not a useful statement for an ABI document. The ABI needs to
> specify what a library user can rely on, based on the header contents with
> the declaration of the function with vector versions, not anything that a
> library might or might not provide but that can't be determined based on
> header contents.

Ok, I will remove it.
AVX512 versions will be provided unconditionally, as wrapper to AVX2
implementation if built with no AVX512 support.


--
WBR,
Andrew

Andrew Senkevich

unread,
Apr 23, 2015, 10:30:12 PM4/23/15
to Joseph Myers, Xinmin Tian, Jakub Jelinek, Bataev Alexey, Ostanevich Sergey, Enkovich Ilya, x86-6...@googlegroups.com, Zamyatin, Igor

But implementation as wrapper to AVX2 version also required build supporting AVX512. It require update of minimal required for Glibc build version of gcc to 4.9 and binutils to 2.24 and it is hard to get community approve for such change I think.
Another point is GCC 5.1 and below doesn't generate AVX512 versions.
So we need to return to previous state without AVX512 version in the list of generated functions.
And for AVX512 versions we need to have some additional statement in declaration.

--
Andrew

Andrew Senkevich

unread,
Apr 24, 2015, 10:51:16 AM4/24/15
to Jakub Jelinek, Joseph Myers, Xinmin Tian, Bataev Alexey, Ostanevich Sergey, Enkovich Ilya, x86-6...@googlegroups.com, Zamyatin, Igor
Jakub,

thank you for you input. Having AVX512 version as wrapper to AVX2
emitting AVX512F instructions using .byte we can include AVX512
version to the list of functions provided by library unconditionally.

Here is updated version.
| 'e' // AVX512
1) _ZGVbN4ua16vl_foo (SSE ISA, unmasked version)
2) _ZGVbM4ua16vl_foo (SSE ISA, masked version)
3) _ZGVcN8ua16vl_foo (AVX ISA, unmasked version)
4) _ZGVcM8ua16vl_foo (AVX ISA, masked version)
5) _ZGVdN8ua16vl_foo (AVX2 ISA, unmasked version)
6) _ZGVdM8ua16vl_foo (AVX2 ISA, masked version)
7) _ZGVeN16ua16vl_foo (AVX512 ISA, unmasked version)
8) _ZGVeM16ua16vl_foo (AVX512 ISA, masked version)

Where the "foo" is the original mangled function name, "_ZGV" is the
prefix of the vector function name, "b" indicates the SSE ISA, "c"
indicates the AVX ISA, "d" indicates the AVX2 ISA, "N" indicates that
this is a unmasked version, "M" indicates that this is a masked
version, "4" is the vector length for SSE ISA, "8" is the vector
length for AVX and AVX2 ISA, "ua16" indicates uniform(q) and
align(a:32), "v" indicates second argument x is vector argument, "l"
indicates linear(k:1) - k is a linear variable whose stride is 1.

Example 2.
#pragma omp declare simd notinbranch
double foo(double x)
{
return x*x;
}

Below is the list of generated function names or list of symbols
provided by library with the same pragma in "foo" prototype.

1) _ZGVbN2v_foo (SSE ISA)
2) _ZGVcN4v_foo (AVX ISA)
3) _ZGVdN4v_foo (AVX2 ISA)
3) _ZGVeN8v_foo (AVX512 ISA)

Where the "foo" is the original mangled function name, "_ZGV" is the
prefix of the vector function name, "b" indicates the SSE ISA, "c"
indicates the AVX ISA, "d" indicates the AVX2 ISA, "N" indicates that
this is a unmasked version, "2" is the vector length for SSE ISA, "4"
is the vector length for AVX and AVX2 ISA, "v" indicates single
argument x is vector argument.

3. References

[1] OpenMP 4.0 Specification
http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf

[2] Intel Vector ABI
https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf

===============================================================================


--
WBR,
Andrew
Reply all
Reply to author
Forward
0 new messages