As I recall, OpenMPI needs a host list with one entry per line, rather than a single line separated by spaces. See:
[root@holy7c26401 ~]# echo $SLURM_JOB_NODELIST
holy7c[26401-26405]
[root@holy7c26401 ~]# scontrol show hostnames $SLURM_JOB_NODELIST
holy7c26401
holy7c26402
holy7c26403
holy7c26404
holy7c26405
[root@holy7c26401 ~]# list=$(scontrol show hostname $SLURM_NODELIST)
[root@holy7c26401 ~]# echo $list
holy7c26401 holy7c26402 holy7c26403 holy7c26404 holy7c26405
The first form would be fine for OpenMPI (though usually you also need slots=numranks on each entry, where numranks is the number of ranks per host you are trying to set up). The second form I don't think would be interpreted properly. So you will need to make sure the list is passed in a manner OpenMPI can read. I usually just have it dump to a file and then read that file in, rather than holding the list in an environment variable.
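As a rough sketch of that dump-to-file approach (the node count, the slots=4, and ./my_app are placeholders for whatever your job actually uses):

#!/bin/bash
#SBATCH -N 5
#SBATCH --ntasks-per-node=4
# Write an OpenMPI hostfile: one host per line, with a slots= count per host
for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do
    echo "$host slots=4"
done > hostfile.$SLURM_JOB_ID
# Point mpirun at the file instead of an environment variable
mpirun --hostfile hostfile.$SLURM_JOB_ID -np 20 ./my_app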
-Paul Edmon-
Normally MPI will just pick up the host list from Slurm itself.
You only need to build MPI against Slurm and it will grab it
automatically. Typically this is transparent to the user, and you
shouldn't need to pass a host list at all. See:
https://slurm.schedmd.com/mpi_guide.html
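For instance, with a Slurm-aware OpenMPI build the job script can be as simple as the sketch below (./my_app is a placeholder, and --mpi=pmix assumes the PMIx plugin is available on your cluster):

#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
# srun launches the ranks directly; no hostfile or -np is needed
srun --mpi=pmix ./my_app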
If you do need the list, the canonical way to get it is the scontrol
show hostnames command run against $SLURM_JOB_NODELIST
(https://slurm.schedmd.com/scontrol.html#OPT_hostnames). That will
give you the list of hosts your job is set to run on.
-Paul Edmon-
Certainly a strange setup. I would probably talk with whoever is
providing MPI for you and ask them to build it against Slurm
properly, since in order to get correct process binding you
definitely want it integrated with Slurm via either PMI2 or PMIx.
If you just use the bare host list, your ranks may not end up
properly bound to the specific cores they are supposed to be
allocated. So definitely proceed with caution and validate that
your ranks are being laid out properly, as you will be relying on
mpirun/mpiexec to bootstrap rather than the scheduler.
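Two quick checks for that validation step (the hostfile name and ./my_app are placeholders carried over from the earlier sketch): srun --mpi=list reports which PMI plugins your Slurm build actually offers, and OpenMPI's --report-bindings flag prints the core binding of every rank at launch:

# See which PMI interfaces this Slurm installation supports
srun --mpi=list
# Have OpenMPI report where each rank gets bound
mpirun --report-bindings --hostfile hostfile.$SLURM_JOB_ID -np 20 ./my_app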
-Paul Edmon-
Ah, that's even more fun. I know with Singularity you can launch MPI applications by calling MPI outside of the container and then having it link to the internal version: https://docs.sylabs.io/guides/3.3/user-guide/mpi.html Not sure about Docker, though.
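For reference, that hybrid model boils down to calling the host-side mpirun and letting each rank execute the containerized binary; a minimal sketch, where container.sif and ./my_app are placeholders:

# Host MPI launches the ranks; each rank runs the app inside the container
mpirun -np 8 singularity exec container.sif ./my_app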
-Paul Edmon-