Marco,
The compiler may vectorize if generating code optimised for a given platform.
A distro-provided Open MPI is likely to be optimised only for "common" architectures (e.g. no AVX512 on x86 (SSE only?) and no SVE on aarch64).
Cheers,
Gilles
Gilles,
      
Thank you for your response. I understand that distro-provided
Open MPI binaries are typically built for broad compatibility,
often targeting only baseline instruction sets.
      
For x86, this makes sense. If Open MPI is compiled for a target
instruction set like `x86-64-v2` (no AVX), the `configure.m4`
script for the AVX component first attempts to compile AVX code
without extra flags. If that fails, it retries with the necessary
vectorization flags (e.g., `-mavx512f`). If the retry succeeds,
these flags are applied to the AVX component, so the vectorized
functions are still built. At runtime, Open MPI detects the CPU
capabilities (via CPUID) and uses the AVX functions when
available, even if vectorization wasn't explicitly enabled by the
package maintainers. This assumes I have correctly understood how
the op components are compiled.
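
The two-step probe described above can be sketched in plain shell roughly as follows. This is only an illustration of the idea, not the actual `configure.m4` logic: `try_compile`, `probe_simd_flags` and the `TRY` hook are hypothetical names, and the sketch assumes a C compiler reachable as `$CC` (defaulting to `cc`).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the compiler accepts the C source in $1
# when invoked with the extra flags in $2.
try_compile() {
    tc_dir=$(mktemp -d) || return 1
    printf '%s\n' "$1" > "$tc_dir/t.c"
    # $2 is left unquoted on purpose so several flags are split into words.
    ${CC:-cc} $2 -c "$tc_dir/t.c" -o "$tc_dir/t.o" >/dev/null 2>&1
    tc_rc=$?
    rm -rf "$tc_dir"
    return $tc_rc
}

# Print the extra flags needed to compile the source in $1:
#   - an empty line if it compiles as-is,
#   - the candidate flags in $2 if it only compiles with them,
#   - nothing (and a non-zero status) if neither attempt works.
# TRY is a hypothetical hook naming the probe function to call.
probe_simd_flags() {
    if "${TRY:-try_compile}" "$1" ""; then
        echo ""
    elif "${TRY:-try_compile}" "$1" "$2"; then
        echo "$2"
    else
        return 1
    fi
}
```

A caller would pass a small intrinsic-based test program as the first argument and something like `-mavx512f` as the second. The `TRY` hook exists only so the decision logic can be exercised without a real compiler.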
      
What I find unclear is why the AArch64 component follows a
different approach. During configuration, it only checks whether
the compiler can compile NEON or SVE code without additional
flags. If not, the corresponding intrinsic functions are omitted
entirely. This means that if the distro's compilation settings
don't enable NEON or SVE, Open MPI won't include the optimized
functions, and processors with these vector units won't benefit.
Conversely, if NEON or SVE is enabled, the base ops will likely be
auto-vectorized, reducing the performance gap between the base and
intrinsic implementations.
      
Is there a specific reason for this difference in handling SIMD
support between x86 and AArch64 in Open MPI, or am I wrong about
the configuration process?
    
Cheers,
Marco
    
Hi George,
Thank you for your response and clarification.
I am working on integrating the same flag-checking mechanism used
in the AVX component into the AArch64 component. However, I have
encountered an issue.

On x86, GCC provides dedicated command-line switches for SIMD
instruction sets, such as -mavx (see "x86 Options" in the GCC
manual). These options are independent of the -march setting in
the CFLAGS variable, allowing the AVX component to append -mavx
without modifying -march.

In contrast, for AArch64 there does not appear to be an equivalent
standalone switch for enabling SVE (see "AArch64 Options" in the
GCC manual). Instead, SVE is enabled by appending +sve directly to
the -march value, unless it is already implicitly included (e.g.,
with armv9-a or later).

To address this, I attempted to modify the -march parameter within
CFLAGS as follows:

    AS_IF([echo "$CFLAGS" | grep -qv -- '\+sve'],
          [modified_cflags="`echo $CFLAGS | sed 's/\(-march=[^ ]*\)/\1+sve/'`"])

While this is not an optimal solution, I wanted to explore how far
this approach would take me. For testing, I appended +sve to the
end of the -march=armxxx string and verified whether the modified
flag combination enabled SVE code compilation. The configuration
process completed successfully, but an issue arose during the
compilation of Open MPI.
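
To see what the substitution does outside of configure, the same sed expression can be wrapped in a small shell function. This is a minimal sketch; `add_sve_to_march` is a hypothetical name and the sample flag strings are made up for illustration.

```shell
#!/bin/sh
# Append +sve to the first -march=... token unless the flags already
# mention +sve. Flags without any -march are printed unchanged.
add_sve_to_march() {
    case $1 in
        *+sve*) printf '%s\n' "$1" ;;
        *)      printf '%s\n' "$1" | sed 's/\(-march=[^ ]*\)/\1+sve/' ;;
    esac
}

add_sve_to_march "-O2 -march=armv8.2-a"    # prints: -O2 -march=armv8.2-a+sve
add_sve_to_march "-O2 -march=armv8-a+sve"  # prints: -O2 -march=armv8-a+sve
add_sve_to_march "-O2"                     # prints: -O2
```

The last call shows a limitation of the approach: when CFLAGS carries no -march at all, the substitution is a no-op, so the result still does not request SVE.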
In Makefile.am, I integrated the new CFLAGS value in the same
manner as the AVX component:

    liblocal_ops_sve_la_CFLAGS = @MCA_BUILD_OP_SVE_FLAGS@

However, this only adds the contents of @MCA_BUILD_OP_SVE_FLAGS@
(which includes +sve) to the existing CFLAGS variable instead of
replacing it. As a result, the final compiler command line
contains two -march options. Since liblocal_ops_sve_la_CFLAGS is
placed before the original CFLAGS, the compiler honors only the
second, unmodified -march value, effectively ignoring the one with
+sve appended. My modifications to the build system are available
in my branch
(https://github.com/vogma/ompi/tree/sve_op_build_update).
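
The ordering problem can be reproduced without a compiler. The sketch below assumes, as described above, that the last -march on the command line is the one that takes effect; `effective_march` is a hypothetical helper and the flag values are examples only.

```shell
#!/bin/sh
# Print the value of the last -march=... token in the flag string $1,
# i.e. the one assumed to win when the option is repeated.
effective_march() {
    em_last=""
    for em_tok in $1; do
        case $em_tok in
            -march=*) em_last=${em_tok#-march=} ;;
        esac
    done
    printf '%s\n' "$em_last"
}

target_cflags="-march=armv8.2-a+sve"   # per-target flags (example value)
user_cflags="-O2 -march=armv8.2-a"     # distro-provided CFLAGS (example value)

# Per-target flags come first, user CFLAGS last:
effective_march "$target_cflags $user_cflags"   # prints: armv8.2-a
```

With this ordering the +sve variant is always overridden, whatever the per-target variable contains.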
    
Given that this experimental workaround is not functioning as
intended, and since GCC does not provide dedicated command-line
options like -msve or -msve2, it seems unlikely that the AVX-based
approach can be replicated for AArch64. I am open to testing any
ideas anyone might have and will gladly submit a pull request to
get any working changes reviewed.
Marco