OpenMPI/PMIx interoperable versions


Victor

Aug 28, 2017, 2:55:48 AM
to pmix
Dear PMIx team,

this continues an already open discussion on the Singularity and OpenMPI mailing lists, related to a previous mail from Ralph Castain in that thread:

https://groups.google.com/a/lbl.gov/forum/#!topic/singularity/lQ6sWCWhIWY

In order to achieve portability of Singularity images containing OpenMPI distributed applications, he suggested mixing several OpenMPI versions with several external PMIx versions to check interoperability across versions when using the Singularity hybrid MPI approach (see his response in the thread).

I ran some experiments and would like to share my results with you and discuss the conclusions.

First of all, I'm going to describe the environment (see attached scripts).
  • I performed this test on the CESGA FinisTerrae II cluster (https://www.cesga.es/en/infraestructuras/computacion/FinisTerrae2).
  • The compiler used is GCC 6.3.0, and I had to compile some external dependencies to be linked from PMIx or OpenMPI:
    • hwloc/1.11.5
    • libevent/2.0.22
  • PMIx versions used in these experiments:
    • 1.2.1
    • 1.2.2
    • 1.2.3
    • 2.0.0
  • I configured PMIx with the following options:
    • ./configure --with-hwloc= --with-munge-libdir= --with-platform=optimized --with-libevent=
  • OpenMPI versions used in these experiments:
    • 2.0.X
    • 2.1.1
    • 3.0.0_rcX
  • I configured OpenMPI (both container and host) with the following options:
    • ./configure --with-hwloc= --enable-shared --with-slurm --enable-mpi-thread-multiple --with-verbs-libdir= --enable-mpirun-prefix-by-default --disable-dlopen --with-pmix= --with-libevent= --with-knem
    • Version 2.1.1 was compiled with the flag --disable-pmix-dstore
  • I used the well-known "Ring" OpenMPI example application.
  • I used mpirun as the process manager.
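As a rough sketch, the build sequence for one PMIx/OpenMPI pair looked along these lines. All install prefixes and dependency paths below are placeholders, not the actual ones used on FinisTerrae II, and the site-specific flags (munge, verbs, knem) are omitted for brevity:

```shell
set -e

DEPS=$HOME/deps                 # hypothetical prefix where hwloc/libevent were installed
PMIX_PREFIX=$HOME/pmix-1.2.3    # hypothetical PMIx install prefix
OMPI_PREFIX=$HOME/ompi-3.0.0    # hypothetical OpenMPI install prefix

# PMIx, configured as in the experiments (paths filled in with the placeholders):
cd pmix-1.2.3
./configure --prefix=$PMIX_PREFIX \
            --with-hwloc=$DEPS \
            --with-libevent=$DEPS \
            --with-platform=optimized
make -j8 install
cd ..

# OpenMPI linked against the external PMIx built above:
cd openmpi-3.0.0
./configure --prefix=$OMPI_PREFIX \
            --with-hwloc=$DEPS \
            --with-libevent=$DEPS \
            --with-pmix=$PMIX_PREFIX \
            --enable-shared --with-slurm \
            --enable-mpi-thread-multiple \
            --enable-mpirun-prefix-by-default \
            --disable-dlopen
make -j8 install
```

The same sequence was repeated for each PMIx/OpenMPI version combination in the matrix.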

Based on Ralph's previous response, what I expected was full cross-version compatibility using any OpenMPI >= 2.0.0 linked against PMIx 1.2.X, both inside the container and on the host.


In general, the results obtained were not as good as expected, but promising.
  • The worst part: my results show that the OpenMPI 2.X versions need exactly the same OpenMPI version inside and outside the container, although I can mix PMIx 1.2.1, 1.2.2, and 1.2.3.
  • The best part: if OpenMPI 3.0.0_rc3 is present inside or outside the container, it seems to work when mixing any other OpenMPI >= 2.X version, and also when mixing PMIx 1.2.1, 1.2.2, and 1.2.3. Some notes* on this result:
    • OpenMPI 2.0.0 with PMIx 1.2.2 and 1.2.3 (inside and outside the container) never worked.
    • After getting the expected output from the "Ring" app, I randomly get a SEGFAULT if OpenMPI 3.0.0_rcX is involved.
  • As Ralph said, PMIx 1.2.X and 2.0.X are not interoperable.
  • I was not able to compile OpenMPI 2.1.0 with an external PMIx.

I can conclude that PMIx 1.2.1, 1.2.2, and 1.2.3 are interoperable, but only OpenMPI 3.0.0_rc3 can, in general, work* with other OpenMPI versions (> 2).
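For reference, each cell of the test matrix was exercised with a launch along these lines (the Singularity hybrid model: the host mpirun starts one container instance per rank, and the MPI runtime inside the image wires up to the host via PMIx). The image name and binary path are placeholders:

```shell
# Host OpenMPI on PATH (hypothetical prefix from one test case):
export PATH=$HOME/ompi-3.0.0/bin:$PATH

# mpirun on the host launches the containerized Ring binary;
# ring.img and the binary path inside it are placeholders.
mpirun -np 4 singularity exec ring.img /usr/local/bin/ring
```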

Going back to Ralph Castain's mail in this thread, I would expect full interoperability support across different PMIx versions (> 1.2) through PMIx > 2.1 (not yet released).

Some questions about these experiments and conclusions:

  • What do you think about these results? Do you have any suggestions? Am I missing something?
  • Are these results aligned with your expectations?
  • I know that PMIx 2.1 is under development, but is any version already available for testing? How can I get it?
  • Is the SEGFAULT I get with OpenMPI 3.0.0_rcX something already tracked?

BR,

Víctor
container_bootstrap.def
host_install.sh

r...@open-mpi.org

Aug 28, 2017, 9:50:58 AM
to pmix
Hi Victor

I don’t see anything surprising in your results. We should be interoperable within a major release series, but all bets are off beyond that point pending release of cross-version support. I’m not sure why the OpenMPI 2.x series isn’t compatible between 2.0 and 2.1, but that is likely to be something in the OMPI integration code and not in PMIx as all OMPI 2.x releases are based on PMIx v1.2.2 (the upcoming OMPI 2.1.2 revved up to PMIx v1.2.3, but that’s just a bug fix release). I know the OMPI crew doesn’t check for cross-operation - probably something they should start doing if they plan to support that model (I don’t think it is something they’ve ever discussed, but I’ll raise it to them).

We are still working on the cross-version support for PMIx v2.1. Current target is to release it sometime in Sept, but the actual date will be driven by completion of the cross-version work. It’s still too early for someone else to start testing it, I’m afraid, but I will post something to this list when it gets to that point.

Ralph


Victor

Aug 29, 2017, 8:59:24 AM
to pmix, r...@open-mpi.org
Hi Ralph,

thanks for the quick response.

I'm waiting for a response from the OpenMPI side about the issues related to them.

I'm excited about the upcoming PMIx 2.1 release. I will keep an eye on it and will share my interoperability experience in this thread.

Another question related to the Singularity hybrid approach and OpenMPI/PMIx interoperability... Will PMIx 2.1 be interoperable with PMI-1 and/or PMI-2?

Thanks, and congratulations on your great work!

BR,
Víctor.

r...@open-mpi.org

Aug 29, 2017, 6:03:01 PM
to pm...@googlegroups.com
Hi Victor

Spoke with the OMPI folks today, and there is interest in working towards interoperability (at least within a major release). I doubt they’ll be able to go back to the OMPI 2.x series and do anything there, but they are about to release 3.0, and so perhaps that can be made to work going forward. We will be working with them to better understand the issues.

PMIx has supported PMI-1 and 2 since the first release, but maybe not in the way you mean. Remember that PMI was never a standardized library - MPICH had one implementation, SLURM another, etc. None of them were compatible with each other except at the API level.

So if you have something in a container that is linked against somebody’s PMI library, then you are stuck - you’d have to be using the exact same library on the outside. If, however, you replace their PMI library with PMIx in the container, then we will “translate” the PMI-1/2 calls into PMIx and be able to support PMIx-based systems outside the container.
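A minimal sketch of that substitution, assuming PMIx's PMI-1/PMI-2 compatibility shims (libpmi.so / libpmi2.so) are installed in the container; the paths here are hypothetical:

```shell
# Inside the container: make the dynamic linker resolve PMIx's
# PMI-1/PMI-2 shim libraries ahead of any vendor libpmi, so that
# PMI_* calls from the application are translated into PMIx.
PMIX_LIB=/opt/pmix/lib   # hypothetical PMIx install location

export LD_LIBRARY_PATH=$PMIX_LIB:$LD_LIBRARY_PATH
```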

HTH
Ralph


Victor

Aug 30, 2017, 4:56:04 AM
to pmix, r...@open-mpi.org
Hi Ralph,

thank you for the explanation. You are helping me and I'm learning a lot!

OK, I understand that PMIx is able to understand PMI-1/2, but the opposite is not true. This means that a host with Slurm and PMI-1/2 (through srun) should be able to manage processes when launching a Singularity container with OpenMPI and PMIx.

I performed exactly this test, and it doesn't work. Only OpenMPI 3.0.0_rc3 seems to return an understandable message, instead of the "ompi_mpi_init: ompi_rte_init failed" error I get with OpenMPI 2.X.

To detail the versions of the components involved:
  - Host Slurm version: slurm 14.11.10-Bull.1.0 (PMIx not supported)
  - Host PMI v2
  - Container OpenMPI v3.0.0_rc3
  - Container PMIx

The message I get when running with srun in this context is:

--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
...
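For completeness, the launch that produces this message was along these lines (the image name and binary path are placeholders):

```shell
# Direct launch via Slurm's PMI-2 plugin on the host, with
# OpenMPI + PMIx inside the container. ring.img is a placeholder.
srun --mpi=pmi2 -n 4 singularity exec ring.img /usr/local/bin/ring
```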

Do you think this is an issue on the OpenMPI side?

BR,
Víctor.

r...@open-mpi.org

Aug 30, 2017, 10:30:00 AM
to pmix
You have to configure OMPI to use the Slurm PMI libraries - it won’t do so by default. You also then need to include those PMI libraries in your container.
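A hedged sketch of that configuration: build the OpenMPI that goes inside the container against Slurm's PMI library (the --with-pmi flag the error message refers to), then carry that library into the image. All paths below are placeholders for the site's Slurm installation:

```shell
SLURM_PREFIX=/usr/local/slurm   # hypothetical; where Slurm's libpmi2.so lives

cd openmpi-3.0.0
./configure --prefix=/opt/ompi \
            --with-slurm \
            --with-pmi=$SLURM_PREFIX   # enable Slurm PMI-1/PMI-2 support
make -j8 install

# The container must also include the matching Slurm PMI library
# so the linked application can resolve it at run time:
cp $SLURM_PREFIX/lib/libpmi2.so* /opt/ompi/lib/
```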

