MPI on Singularity 3.8 issue


Andrew Reid

Apr 7, 2022, 9:21:19 PM4/7/22
to Singularity Community Edition
Hi all --

(Pardon the markdown formatting, this was originally composed for StackOverflow, but I think it's better here.)

I have a requirement to run a fairly complicated MPI-aware executable, whose dependencies are reasonably easily satisfied on Debian 10, but I need to run it on a Debian 9 host system.

I have seen the instructions in the [Singularity docs][1], and I think I want a variant of the "hybrid" model where the container MPI is provided by Debian packages, but I am having issues.

My general solution is to build a Debian 10 Singularity container to grab the dependencies, and run that on the Debian 9 host. The containerized application consists of a bunch of MPI-aware Debian dependencies (python-mpi4py, libhdf5-openmpi, and some others) and then a build of the actual application from source. This dependency structure makes it hard to build MPI from source in the container, as recommended in the docs -- I would then have to also build all the MPI-aware dependencies, including Python, and the whole motivation for containerization (being able to use Debian 10 packages to satisfy dependencies) would go away.
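For concreteness, the container build is along these lines (a minimal sketch -- the package names are illustrative, not the application's real dependency set, and the actual application build is elided):

```
Bootstrap: docker
From: debian:10

%post
    apt-get update
    # Debian 10's packaged OpenMPI stack (OpenMPI 3.1.3) plus MPI-aware deps
    apt-get install -y openmpi-bin libopenmpi3 python-mpi4py libhdf5-openmpi-dev
    # ... then build the actual application from source against these ...
```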

My proposed scheme is a kind of "reverse hybrid" approach, where I build an OpenMPI on the Debian 9 host that matches the version packaged by Debian 10 (OpenMPI 3.1.3, as it happens), and then do the `mpirun` command on the host using this source-built MPI.
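The host-side build is roughly the following (a sketch -- the install prefix is a placeholder, and this shows only the baseline configure before any pmix-related options):

```
# On the Debian 9 host: build an OpenMPI matching Debian 10's packaged version
wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.3.tar.bz2
tar xf openmpi-3.1.3.tar.bz2
cd openmpi-3.1.3
./configure --prefix=/path/to/v3.1.3
make -j$(nproc) && make install
```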

In the end, the scheme looks like this:

`deb9$ /path/to/v3.1.3/mpirun -np <n> /path/to/singularity exec /path/to/container /path/to/container/executable`

The first naive attempt to do this with a very basic `cpi` MPI example gets some errors:

> [host:pid] PMIX ERROR: NOT-FOUND in file server/pmix_server_ops.c at line 1865
> [host:pid] PMIX ERROR: NOT-FOUND in file server/pmix_server_ops.c at line 1865

A bit of digging revealed that the Debian-10-packaged OpenMPI 3.1.3 depends on this "pmix" library, which is not present on the Debian 9 host system.

OpenMPI 3.1.3 has some config options that looked promising, specifically `--with-pmix=internal` and `--enable-install-pmix`.
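That is, a host-side configure along these lines (a sketch; the prefix is a placeholder):

```
./configure --prefix=/path/to/v3.1.3 --with-pmix=internal --enable-install-pmix
```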

I did that on the host, but I am still getting the errors.

If I continue down this path, I guess my next step would be to build the pmix dependency for OpenMPI 3.1.3 on the host.

But, I'm concerned that maybe my whole "reverse hybrid" approach is just wrong-headed and doomed to fail, given that the documentation doesn't seem to support it?

Also, the whole business is time-limited: Debian 9 reaches end-of-life in June, so I'm looking for an easy/fast way out until the OS upgrades happen.

The host's OpenMPI 3.1.3 build itself passes basic sanity checks, I can build and run `cpi.c` across multiple ranks with no errors, so either it doesn't need pmix, or it does and has it.

Basic container operations (`/path/to/singularity exec /path/to/container ls /usr/bin`) also seem to work fine.

The Singularity installation is a locally-installed (i.e. not Debian-packaged) Singularity CE v3.8.

I have seen related issues on StackOverflow -- question [56298351][2] looks related but is unresolved, and question [65671771][3] is also unresolved, and might be conflated with a directory permissions problem.

  [1]: https://sylabs.io/guides/3.8/user-guide/mpi.html
  [2]: https://stackoverflow.com/questions/56298351/using-mpi-communication-with-containerized-applications
  [3]: https://stackoverflow.com/questions/65671771

David Trudgian

Apr 8, 2022, 10:42:02 AM4/8/22
to Singularity Community Edition
Hi Andrew,

I would probably consider matching the host MPI stack to the container MPI stack to still be a 'hybrid' approach. Theoretically there shouldn't really be any difference between matching the container to the host versus the host to the container. Practically, though, what you are trying to do -- match a custom OpenMPI build to a distro OpenMPI build -- is a bit more difficult. Distro MPI builds tend to depend on numerous libraries and may build OpenMPI with a configuration that would not commonly be used when building from source. Additionally, as OpenMPI, PMIx, etc. have released newer versions, compatibility issues have often disappeared. It's generally easier to get newer 4.x versions of OpenMPI talking host <-> container even with some mismatch in how they were built. Debian 10/buster's OpenMPI 3.1.3 is from October 2018 and, unfortunately, doesn't benefit from some of these improvements.

Here's the OpenMPI packaging for Debian buster - https://salsa.debian.org/hpc-team/openmpi/-/tree/debian/buster

If you look in the `debian` directory you'll see that they apply patches, and in the `rules` file you can see the complex set of options passed to `./configure`.

For maximum chance of success, what I would suggest is to identify the version of the pmix library that the Debian 10 package builds with, then build that on your host, and build OpenMPI against it.
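One way to inspect exactly how Debian built it (a sketch -- on a Debian 10 machine with deb-src lines enabled, `apt-get source openmpi` would get you the same tree):

```
git clone -b debian/buster https://salsa.debian.org/hpc-team/openmpi.git
cd openmpi
cat debian/rules      # the ./configure options Debian uses
ls debian/patches/    # patches applied on top of upstream 3.1.3
```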

From https://packages.debian.org/buster/libopenmpi3 it appears that Debian 10 is using libpmix2 at version 3.1.2.

Also, I note there that Debian 10's OpenMPI depends on libfabric. If you don't build OpenMPI with libfabric on the Debian 9 host, you may find further issues after you get through the pmix issue. These could perhaps be worked around by setting transport options, depending on exactly how Debian's OpenMPI is built.

Sorry this isn't a quick answer, but hopefully the pointers are useful.

DT

Andrew Reid

Apr 8, 2022, 11:54:39 AM4/8/22
to Singularity Community Edition
Thanks, that's useful, mostly in prodding me to move farther down the road I am already on. I hadn't thought of cracking open the .deb for OpenMPI 3.1.3, but you're right that that's an excellent source of clues.

I did kind of think I was satisfying the pmix dependency on the host when I selected `--with-pmix=internal` on the host build, so I still don't really understand what's up there, but you've encouraged me to continue investigating the host MPI configuration space. Hopefully the dependency chain isn't too long!

David Trudgian

Apr 8, 2022, 12:08:57 PM4/8/22
to Singularity Community Edition
The internal pmix with OpenMPI 3.1.3 appears to be v2.1.4 -- see the commit message at https://github.com/open-mpi/ompi/tree/v3.1.3/opal/mca/pmix

Debian buster is using v3.1.2: https://packages.debian.org/buster/libpmi2-pmix

... so there's quite a gap there, even if compatibility across pmix versions is a goal -- it's worth ruling out.

Good luck!

Andrew Reid

Apr 14, 2022, 5:36:23 PM4/14/22
to Singularity Community Edition
Success, or at least, running the cpi example without errors!

I did end up building v3.1.2 of libpmix. This can be done on Debian 9 with the package-provided libevent, and seems to work. Then, when building OpenMPI 3.1.3, there are a couple of additional glitches: some "#define" directives that were deprecated in v2.x of libpmix are gone from v3.x, but the unpatched OpenMPI 3.1.3 source tree I am using still expects them, so some minor surgery is indicated. I chose to remove the relevant cases from a switch statement in ./opal/mca/pmix/ext2x/ext2x.c, which appears to be an error-reporting switch. Also, you have to point OpenMPI to the system-provided libevent directory (which is of course "/usr", it's system-provided, after all) or the configure step fails.
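For the record, the host-side build sequence was along these lines (a sketch -- the prefixes and the PMIx tarball URL are illustrative, and the ext2x.c surgery described above still has to be done by hand before building OpenMPI):

```
# Build PMIx 3.1.2 against Debian 9's packaged libevent
wget https://github.com/openpmix/openpmix/releases/download/v3.1.2/pmix-3.1.2.tar.bz2
tar xf pmix-3.1.2.tar.bz2 && cd pmix-3.1.2
./configure --prefix=/path/to/pmix-3.1.2 --with-libevent=/usr
make -j$(nproc) && make install
cd ..

# Build OpenMPI 3.1.3 against that PMIx (and the same system libevent)
tar xf openmpi-3.1.3.tar.bz2 && cd openmpi-3.1.3
./configure --prefix=/path/to/v3.1.3 \
    --with-pmix=/path/to/pmix-3.1.2 --with-libevent=/usr
make -j$(nproc) && make install
```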

Then things run, but with fabric errors. These can be cured with MCA parameter settings, so the eventual run is:

> deb9$ /path/to/host/3.1.3/mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include "verbs;ofi_rxm" /path/to/deb10/container.sif /path/to/executable

This runs without errors, and for the cpi example (yes, I built an entire container just for cpi) it gets the right answer.

I don't really understand all the MCA parameter settings, but I think the upshot is that the default config does not use UCX, and modern OpenMPIs deprecate OpenIB, so the effect is to undo the site-specific config and force the data transfers onto the fabric in a UCX-compatible way. I think I found them in an earlier StackOverflow search on a related "new MPI on old OS" issue that did not involve containers.
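As an aside, the same settings can be supplied as environment variables instead of `-mca` flags -- standard OpenMPI behavior is that any MCA parameter `x` can be set via `OMPI_MCA_x` -- which keeps the mpirun line shorter (shown here with the `pml`/`ofi_rxm` spellings, which I believe are the intended framework and provider names):

```shell
# Equivalent to passing the -mca flags on the mpirun command line
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=ofi
export OMPI_MCA_mtl_ofi_provider_include="verbs;ofi_rxm"
```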

Anyways, the next step is to test it with the user's actual MPI application, but I think this will tide us over until the Deb9 systems get upgraded.

Andrew Reid

Apr 14, 2022, 5:39:00 PM4/14/22
to Singularity Community Edition
Ugh, just in case someone is actually relying on this, I left off the "exec" in my command line -- the thing that works is:

> deb9$ /path/to/host/3.1.3/mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include "verbs;ofi_rxm" /path/to/singularity exec /path/to/deb10/container.sif /path/to/executable
  -- A.