Marco,
The compiler may vectorize if generating code optimised for a given platform.
A distro provided Open MPI is likely to be optimised only for "common" architectures (e.g. no AVX512 on x86 - SSE only? - and no SVE on aarch64)
Cheers,
Gilles
Gilles,
Thank you for your response. I understand that distro-provided
OpenMPI binaries are typically built for broad compatibility,
often targeting only baseline instruction sets.
For x86, this makes sense—if OpenMPI is compiled with a target
instruction set like `x86-64-v2` (no AVX), the `configure.m4`
script for the AVX component first attempts to compile AVX code
directly. If that fails, it retries with the necessary
vectorization flags (e.g., `-mavx512f`, etc.). If successful,
these flags are applied, ensuring that vectorized functions are
included in the AVX component. At runtime, OpenMPI detects CPU
capabilities (via CPUID) and uses the AVX functions when
available, even if vectorization wasn’t explicitly enabled by the
package maintainers - assuming I correctly understood the
compilation process of the OP components.
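
The two-stage probe described above can be sketched in plain shell. This is only an illustration of the idea, not the actual `configure.m4` logic: the function name `probe_simd_flags` and its argument order are hypothetical, and the compiler command is passed in as a parameter.

```shell
# probe_simd_flags COMPILER SOURCE_FILE [FLAG...]
# Hypothetical helper, not taken from the Open MPI build system.
# Stage 1: try to compile SOURCE_FILE with no extra flags.
# Stage 2: if that fails, retry with the candidate FLAGs.
# Prints the extra flags that were needed (empty if none) and
# returns 1 if the source cannot be compiled either way.
probe_simd_flags() {
    cc_cmd=$1
    src=$2
    shift 2
    if $cc_cmd -c "$src" -o /dev/null 2>/dev/null; then
        echo ""          # compiles as-is: no extra flags required
        return 0
    fi
    if $cc_cmd "$@" -c "$src" -o /dev/null 2>/dev/null; then
        echo "$*"        # compiles only with the candidate flags
        return 0
    fi
    return 1             # component would have to be skipped
}
```

A call such as `probe_simd_flags gcc avx512_test.c -mavx512f` (with `avx512_test.c` being any small file that uses AVX-512 intrinsics) would print `-mavx512f` when the baseline flags alone are not sufficient.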
What I find unclear is why the AArch64 component follows a
different approach. During configuration, it only checks whether
the compiler can compile NEON or SVE without additional flags. If
not, the corresponding intrinsic functions are omitted entirely.
This means that if the distro compilation settings don’t allow
NEON or SVE, OpenMPI won’t include the optimized functions, and
processors with these vector units won’t benefit. Conversely, if
NEON or SVE is allowed, the base OPs will likely be
auto-vectorized, reducing the performance gap between the base and
intrinsic implementations.
Is there a specific reason for this difference in handling SIMD
support between x86 and AArch64 in OpenMPI or am I wrong about the
configuration process?
Cheers,
Marco
Hi George,
Thank you for your response and clarification.
I am working on integrating the same flag-checking mechanism used
in the AVX component into the AArch64 component. However, I have
encountered an issue.
On x86, the GCC compiler provides dedicated command-line switches
for SIMD instruction sets, such as `-mavx` (GCC x86 Options).
These options are independent of the `-march` configuration within
the `CFLAGS` variable, allowing the AVX component to append `-mavx`
without modifying `-march`.
In contrast, for AArch64, there does not appear to be an
equivalent standalone switch for enabling SVE (GCC AArch64
Options). Instead, SVE is enabled by appending `+sve` directly to
the `-march` parameter, unless it is already implicitly included
(e.g., with `armv9-a` or later).
To address this, I attempted to modify the `-march` parameter
within `CFLAGS` as follows:

    AS_IF([echo "$CFLAGS" | grep -qv -- '\+sve'],
          [modified_cflags="`echo $CFLAGS | sed 's/\(-march=[^ ]*\)/\1+sve/'`"])
While this is not an optimal solution, I wanted to explore how
far this approach would take me. For testing, I appended `+sve`
to the end of the `-march=armxxx` string and verified whether the
modified flag combination enabled SVE code compilation. The
configuration process completed successfully, but an issue arose
during the compilation of OpenMPI.
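
For reference, the `-march` rewrite can be tried outside of configure with a small shell function. This is a minimal sketch under stated assumptions: `add_sve_to_march` is a hypothetical name, only the first `-march=` option is touched, and architectures that already imply SVE (such as `armv9-a`) are not special-cased.

```shell
# add_sve_to_march "FLAGS"
# Hypothetical helper: prints FLAGS with +sve appended to the first
# -march= value. FLAGS is printed unchanged if it has no -march=
# option or if some -march= value already contains +sve (this also
# covers +sve2, since the pattern matches any extension that
# starts with +sve).
add_sve_to_march() {
    flags=$1
    case $flags in
        *-march=*) ;;
        *) printf '%s\n' "$flags"; return 0 ;;   # nothing to rewrite
    esac
    # In a basic regular expression '+' is a literal character, so
    # the address below selects lines whose -march value already has
    # +sve; the '!' applies the substitution only to the other lines.
    printf '%s\n' "$flags" |
        sed -e '/-march=[^ ]*+sve/!s/\(-march=[^ ]*\)/\1+sve/'
}
```

For example, `add_sve_to_march "-O2 -march=armv8.2-a -g"` prints `-O2 -march=armv8.2-a+sve -g`.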
In `Makefile.am`, I integrated the new `CFLAGS` value in the same
manner as the AVX component:

    liblocal_ops_sve_la_CFLAGS = @MCA_BUILD_OP_SVE_FLAGS@
However, this only adds the contents of `@MCA_BUILD_OP_SVE_FLAGS@`
(which includes `+sve`) to the existing `CFLAGS` variable instead
of replacing it. As a result, the final compiler command line
contains two `-march` options. Since `liblocal_ops_sve_la_CFLAGS`
is placed before the original `CFLAGS`, the compiler honours only
the second, unmodified `-march` value, effectively ignoring the
appended `+sve` extension. My modifications to the build system
are available in my branch
(https://github.com/vogma/ompi/tree/sve_op_build_update).
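
The override can be illustrated with a few lines of shell that mimic the "last `-march` wins" behaviour described above. The helper name `effective_march` is hypothetical, and the function only models the ordering of the options; it does not invoke a compiler.

```shell
# effective_march FLAG...
# Hypothetical helper: prints the value of the last -march= option
# among the given flags, i.e. the one the compiler ends up using
# when the same option is given more than once. Prints an empty
# line if no -march= option is present.
effective_march() {
    result=
    for flag in "$@"; do
        case $flag in
            -march=*) result=${flag#-march=} ;;   # later values replace earlier ones
        esac
    done
    printf '%s\n' "$result"
}
```

With the per-target flags first and the original `CFLAGS` last, `effective_march -march=armv8-a+sve -O2 -march=armv8-a` prints `armv8-a`, so the `+sve` extension is lost.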
Given that this experimental workaround is not functioning as
intended, and since GCC does not provide dedicated command-line
options like `-msve` or `-msve2`, it seems unlikely that the
AVX-based approach can be replicated for AArch64. I am open to
testing any ideas anyone might have and will gladly submit a pull
request to get any working changes reviewed.
Marco