Hi all --
(Pardon the markdown formatting, this was originally composed for StackOverflow, but I think it's better here.)
I have a requirement to run a fairly complicated MPI-aware executable, whose dependencies are reasonably easily satisfied on Debian 10, but I need to run it on a Debian 9 host system.
I have seen the instructions on the [Singularity docs][1], and I think I want a variant of the "hybrid" model where the container MPI is provided by Debian packages, but I am having issues.
My general solution is to build a Debian 10 Singularity container to grab the dependencies, and run that on the Debian 9 host. The containerized application consists of a bunch of MPI-aware Debian dependencies (python-mpi4py, libhdf5-openmpi, and some others) and then a build of the actual application from source. This reliance on packaged MPI-aware dependencies makes it hard to build MPI from source in the container, as recommended in the docs -- I would then have to also build all the MPI-aware dependencies, including Python, and the whole motivation for containerization (being able to use Debian 10 packages to satisfy dependencies) goes away.
My proposed scheme is a kind of "reverse hybrid" approach, where I build an OpenMPI on the Debian 9 host that matches the version packaged by Debian 10 (OpenMPI 3.1.3, as it happens), and then run `mpirun` on the host using this source-built MPI.
In the end, the scheme looks like this:
`deb9$ /path/to/v3.1.3/mpirun -np <n> /path/to/singularity exec /path/to/container /path/to/container/executable`
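Since the scheme depends on the host and container OpenMPI versions matching, here is a minimal sketch of how the two could be compared. It assumes the container also provides an `mpirun` binary (for example from Debian's `openmpi-bin` package, which may or may not be installed there) and that `mpirun --version` prints a first line of the form `mpirun (Open MPI) 3.1.3`. The function names `ompi_version` and `same_ompi_version` are made up for illustration, and the paths are the same placeholders as in the command above.

```shell
#!/bin/sh
# Extract the Open MPI version from `mpirun --version` output read on stdin.
# Assumes a line of the form "mpirun (Open MPI) 3.1.3".
ompi_version() {
    sed -n 's/^mpirun (Open MPI) \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Compare two version strings; succeed only if both are non-empty and equal.
same_ompi_version() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "MISMATCH: host='$1' container='$2'"
        return 1
    fi
}

# Usage (placeholder paths; assumes mpirun exists inside the container):
#   host_v=$(/path/to/v3.1.3/mpirun --version | ompi_version)
#   cont_v=$(/path/to/singularity exec /path/to/container mpirun --version | ompi_version)
#   same_ompi_version "$host_v" "$cont_v"
```

A matching version string would not by itself rule out differences in how the two builds were configured, so this is only a first check.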
The first naive attempt to do this with a very basic `cpi` MPI example gets some errors:
`[host:pid] PMIX ERROR: NOT-FOUND in file server/pmix_server_ops.c at line 1865
[host:pid] PMIX ERROR: NOT-FOUND in file server/pmix_server_ops.c at line 1865`
A bit of digging revealed that the Debian-10 packaged OpenMPI 3.1.3 has a dependency, this "pmix" library, which is not present on the Debian 9 host system.
OpenMPI 3.1.3 has some config options that looked promising, specifically `--with-pmix=internal` and `--enable-install-pmix`.
I did that on the host, but I am still getting the errors.
If I continue down this path, I guess my next step would be to build the pmix dependency for OpenMPI 3.1.3 on the host.
But, I'm concerned that maybe my whole "reverse hybrid" approach is just wrong-headed and doomed to fail, given that the documentation doesn't seem to support it?
Also, the whole business is time-limited: Debian 9 reaches end-of-life in June, so I'm looking for an easy/fast way out until the OS upgrades happen.
The host's OpenMPI 3.1.3 build itself passes basic sanity checks: I can build and run `cpi.c` across multiple ranks with no errors, so either it doesn't need pmix, or it does and has it.
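To find out which of those two cases applies, one option might be to ask each OpenMPI installation which PMIx components it was built with. The sketch below filters `ompi_info` output for its `MCA pmix:` lines. It assumes `ompi_info` was installed alongside the host `mpirun` (the `/path/to/v3.1.3/ompi_info` location is a guess by analogy) and is also available inside the container. The helper name `pmix_components` is hypothetical.

```shell
#!/bin/sh
# Print the names of the PMIx MCA components listed by `ompi_info` (stdin).
# Assumes lines of the form:
#   MCA pmix: <name> (MCA v..., API v..., Component v...)
pmix_components() {
    awk '$1 == "MCA" && $2 == "pmix:" { print $3 }'
}

# Usage (placeholder paths; assumes ompi_info exists on both sides):
#   /path/to/v3.1.3/ompi_info | pmix_components
#   /path/to/singularity exec /path/to/container ompi_info | pmix_components
```

If the two lists differ, that might point to the host and container builds using different PMIx implementations, though I have not confirmed that this is what causes the errors above.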
Basic container operations (`/path/to/singularity exec /path/to/container ls /usr/bin`) also seem to work fine.
The Singularity set-up is a locally-installed (i.e. not Debian-packaged) set-up of Singularity CE v3.8.
I have seen related issues on StackOverflow -- question [56298351][2] looks related but is unresolved, and question [65671771][3] is also unresolved, and might be conflated with a directory permissions problem.
[1]: https://sylabs.io/guides/3.8/user-guide/mpi.html
[2]: https://stackoverflow.com/questions/56298351/using-mpi-communication-with-containerized-applications