To achieve portability of Singularity images containing OpenMPI distributed applications, he suggested mixing different OpenMPI versions with different external PMIx builds to check interoperability across versions when using the Singularity MPI hybrid approach (see his response in the thread).
I ran some experiments and would like to share my results with you and discuss the conclusions.
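For context, the hybrid approach under discussion launches the ranks from the host-side mpirun, while the application binary lives inside the container; a minimal sketch of such a launch (the image name "ring.img" and the binary path are hypothetical):

```shell
# Hybrid model sketch: the host's mpirun spawns one container
# instance per rank; the host OpenMPI/PMIx then talks to the
# container's OpenMPI/PMIx, which is where version mixing matters.
# "ring.img" and "/usr/local/bin/ring" are hypothetical names.
mpirun -np 4 singularity exec ring.img /usr/local/bin/ring
```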
First of all, I'm going to describe the environment (see attached scripts).
- I performed this test at CESGA FinisTerrae II cluster (https://www.cesga.es/en/infraestructuras/computacion/FinisTerrae2).
- The compiler used was GCC/6.3.0, and I had to compile some external dependencies to be linked against by PMIx and OpenMPI:
- hwloc/1.11.5
- libevent/2.0.22
- PMIx versions used in these experiments: 1.2.1, 1.2.2 and 1.2.3.
- I configured PMIx with the following options:
- ./configure --with-hwloc= --with-munge-libdir= --with-platform=optimized --with-libevent=
- OpenMPI versions used in these experiments:
- I configured OpenMPI (both container and host) with the following options:
- ./configure --with-hwloc= --enable-shared --with-slurm --enable-mpi-thread-multiple --with-verbs-libdir= --enable-mpirun-prefix-by-default --disable-dlopen --with-pmix= --with-libevent= --with-knem
- Version 2.1.1 was compiled with the flag --disable-pmix-dstore
- I used the well-known "Ring" OpenMPI application.
- I used mpirun as the process manager.
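When mixing versions like this, it is easy to end up with a build that silently picked up a different PMIx than intended, so I find it useful to verify what each OpenMPI build was actually linked against; a sketch, assuming OpenMPI's standard ompi_info tool is on the PATH (run it both on the host and inside the container and compare):

```shell
# Show which PMIx components this OpenMPI build was configured with
ompi_info | grep -i pmix

# Since the builds use --disable-dlopen, the PMIx and libevent
# libraries should be directly linked and visible to ldd
ldd $(which mpirun) | grep -Ei 'pmix|event'
```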
What I expected from Ralph's previous response was full cross-version compatibility using any OpenMPI >= 2.0.0 linked against PMIx 1.2.X, both inside the container and on the host.
In general, the results were not as good as expected, but they are promising.
- The worst result: OpenMPI 2.X versions need exactly the same OpenMPI version inside and outside the container, although I can mix PMIx 1.2.1, 1.2.2 and 1.2.3.
- The best result: if OpenMPI 3.0.0_rc3 is present inside or outside the container, it seems to work when mixed with any other OpenMPI >= 2.X version, and also when mixing PMIx 1.2.1, 1.2.2 and 1.2.3. Some notes* on this result:
- OpenMPI 2.0.0 with PMIx 1.2.2 and 1.2.3 (inside and outside the container) never worked.
- After getting the expected output from the "Ring" app, I randomly get a SEGFAULT if OpenMPI 3.0.0_rcX is involved.
- As Ralph said, PMIx 1.2.X and 2.0.X are not interoperable.
- I was not able to compile OpenMPI 2.1.0 with an external PMIx.
I can conclude that PMIx 1.2.1, 1.2.2 and 1.2.3 are interoperable, but only OpenMPI 3.0.0_rc3 can, in general, work* with other versions of OpenMPI (>= 2.X).
Going back to Ralph Castain's mail in this thread, I would expect full interoperability support across different PMIx versions (> 1.2) through PMIx > 2.1 (not yet released).
Some questions about these experiments and conclusions:
- What do you think about these results? Do you have any suggestions? Am I missing something?
- Are these results aligned with your expectations?
- I know that PMIx 2.1 is being developed, but is any version already available to test? How can I get it?
- Is the SEGFAULT I get with OpenMPI 3.0.0_rcX something already tracked?
BR,
Víctor