This is a bit of a strange one, and we're not exactly sure where to start with the issue, so I'll try to relay the information that I think is relevant.
We're trying to create a container which runs a specific piece of software (CodeAster) in parallel using MPI.
When installed locally (not in a container) it's called as follows:
</PATH/TO/EXECUTABLE> </PATH/TO/RUN/PARAMETERS>
That's it: the run parameters file (the 'export' file) contains all the information about memory allocation, number of processors, etc. At no point do we need to call mpirun.
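For context, the relevant part of that export file looks roughly like this (the values and the study file below are just illustrative, not our real ones; the point is that mpi_nbcpu sets the number of MPI processes and mpi_nbnoeud the number of nodes, so as_run decides by itself how to spawn them):

P actions make_etude
P memory_limit 4096
P mpi_nbcpu 4
P mpi_nbnoeud 1
P ncpus 1
F comm /mnt/study.comm D 1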
We've built a Singularity container on our local VM for testing. This is how we run it:
singularity exec --contain --bind /path/to/export/file/:/mnt,/home/username/flasheur /home/username/code_aster_latest.sif /home/aster/aster/bin/as_run /mnt/export
This seems to work fine in parallel (single node, multiple cores within the VM). Note that there is still no use of 'mpirun'.
When we move this across to our HPC system (which uses Slurm as its scheduler), the same command doesn't work. We've also tested it with 'mpirun' at the start, and many other variations on this, but nothing so far works.
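To give a concrete idea, one representative variation we've tried is a batch script along these lines (the resource numbers and the module line here are just placeholders, not exactly what we used):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00

# module name depends on the cluster setup
module load singularity

# same command as on the VM (also tried prefixing it with mpirun / srun)
singularity exec --contain --bind /path/to/export/file/:/mnt,/home/username/flasheur /home/username/code_aster_latest.sif /home/aster/aster/bin/as_run /mnt/export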
To rule out the possibility that MPI/Singularity/something else isn't set up properly on the cluster, we created an mpi_hello_world container to test (two, actually: one with a Python script and one with a C program). Both of these work, but both would require calling mpirun when run locally, so we just stick mpirun on the front of the singularity command and it works.
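For reference, this is roughly how we launch the hello-world containers on the cluster (image and script names are placeholders):

mpirun -np 4 singularity exec /home/username/mpi_hello_c.sif /opt/mpi_hello_world
mpirun -np 4 singularity exec /home/username/mpi_hello_py.sif python3 /opt/mpi_hello_world.py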
So it seems to be an issue with the way the CodeAster executable expects to be launched, i.e. without an explicit mpirun.
Any suggestions gratefully appreciated.
Thanks