Hi everyone.
I recently installed Open MPI 5.1.0a1 by cloning the master branch:
--------------------------------------------------------------------------------------------------------------------
git clone --recursive -b master https://github.com/open-mpi/ompi.git
--------------------------------------------------------------------------------------------------------------------
I used an all-internal configuration (hwloc, libevent, PMIx) so that I could test my previously working codes on a clean installation:
--------------------------------------------------------------------------------------------------------------------
./autogen.pl
./configure --prefix=$HOME/OMPI --disable-man-pages
make all install
make check
make clean
--------------------------------------------------------------------------------------------------------------------
Up to this point, everything works fine.
--------------------------------------------------------------------------------------------------------------------
Open MPI configuration:
-----------------------
Version: 5.1.0a1
Build MPI C bindings: yes
Build MPI Fortran bindings: mpif.h, use mpi, use mpi_f08
Build MPI Java bindings (experimental): no
Build Open SHMEM support: false (no spml)
Debug build: no
Platform file: (none)
Miscellaneous
-----------------------
CUDA support: no
Fault Tolerance support: mpi
hwloc: internal
libevent: internal
pmix: internal
prrte: internal
Threading Package: pthreads
Atomics
-----------------------
OMPI: BUILTIN_C11
Transports
-----------------------
Cisco usNIC: no
Cray uGNI (Gemini/Aries): no
Intel Omnipath (PSM2): no
Open UCX: no
OpenFabrics OFI Libfabric: no
Portals4: no
Shared memory/copy in+copy out: yes
Shared memory/Linux CMA: yes
Shared memory/Linux KNEM: no
Shared memory/XPMEM: no
TCP: yes
OMPIO File Systems
-----------------------
DDN Infinite Memory Engine: no
Generic Unix FS: yes
IBM Spectrum Scale/GPFS: no
Lustre: no
PVFS2/OrangeFS: no
--------------------------------------------------------------------------------------------------------------------
If I run my tests without killing any process, all processes finish correctly. But if I kill one of them (with SIGKILL), all processes stop and the following message is shown:
--------------------------------------------------------------------------------------------------------------------
[daniel-lap:29408] [grpcomm_bmg_module.c:259] PMIx Error: UNPACK-INADEQUATE-SPACE
--------------------------------------------------------------------------------------------------------------------
So, despite having enabled fault tolerance, the failure is not caught when the process is killed.
Do you know what this error means?
Did I miss something with my installation process?
Should I post this error on the OMPI mailing list?
Thanks a lot for your help.
EXTRA INFO
--------------------------------------------------------------------------
The command line I use to compile is:
mpicc -g -O3 test.c -o test -lm
The command line I use to execute is:
mpiexec --np 4 --machinefile hostfile --mca btl_base_verbose 100 --map-by node:oversubscribe --with-ft mpi --enable-recovery --mca mpi_ft_detector_thread true ./test myArgs
My machine is:
Linux daniel-lap 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
--------------------------------------------------------------------------
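The failure injection described above can be sketched as follows. This is only an illustrative stand-in, not the exact procedure used for the run: `sleep` processes play the role of the four ranks so that the snippet runs without MPI, and the variable names (`pids`, `victim`, `survivors`) are hypothetical. It shows that a SIGKILLed process ends with status 128 + 9 and cannot react itself, so the failure has to be noticed by the remaining processes or by the runtime.

```shell
#!/bin/sh
# Stand-ins for the 4 ranks; in a real run these would be the ./test processes.
pids=""
for i in 1 2 3 4; do
    sleep 30 &
    pids="$pids $!"
done

# Take the first stand-in as the victim and kill it with SIGKILL.
set -- $pids
victim=$1
kill -KILL "$victim"

# Reap the victim and record its status (128 + signal number for a killed process).
victim_status=0
wait "$victim" || victim_status=$?

# Count how many of the stand-ins are still alive after the kill.
survivors=0
for p in "$@"; do
    if kill -0 "$p" 2>/dev/null; then
        survivors=$((survivors + 1))
    fi
done
echo "victim exit status: $victim_status, survivors: $survivors"

# Clean up the remaining stand-ins.
for p in "$@"; do
    kill "$p" 2>/dev/null || true
done
wait
```

In the real job, the PID to kill would be that of one `./test` process on the node (found, for example, with `ps`), and the expectation with fault tolerance enabled is that the surviving ranks keep running.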