running singularity containers with mpirun or srun


Steve Mehlberg

Jul 18, 2018, 2:55:27 PM
to singularity
I keep getting questions asking whether, when I do mpirun -n 4 singularity exec image (or the similar srun -n 4), I am running in 4 different container instances of the same image or just 1.
How do I answer this question?  I found this:

"I think you are misunderstanding the basic nature of the Singularity “container”. It’s just a file system overlay. So “sharing” a container is no different than running on a node where the procs all see the same file system. Thus, having multiple containers that are identical makes no sense - it’s all the same file system."

Still not sure how to answer the question.  When I use instance.start and then mpirun -n 4 singularity exec instance://image, how is that different from the previous run (exec image)?  I can see that there is a PID linked to the container when I run singularity instance.list.

Can someone explain to me how it works so I can answer these questions?

Thanks,
-Steve

Priedhorsky, Reid

Jul 18, 2018, 3:42:41 PM
to singu...@lbl.gov

"I think you are misunderstanding the basic nature of the Singularity “container”. It’s just a file system overlay. So “sharing” a container is no different than running on a node where the procs all see the same file system. Thus, having multiple containers that are identical makes no sense - it’s all the same file system.”

If your app uses “cross-memory attach” (CMA) to communicate between processes, and the containers are in separate user namespaces (I forget what the default is with Singularity, but it can use them), then this paragraph is incomplete.

Open MPI does this for more than one message transport, by default since 2.0 IIRC. I’m sure it crops up elsewhere too.

CMA refers to the system calls process_vm_readv(2) and process_vm_writev(2), which transfer memory directly between processes. This saves a copy compared to POSIX or SysV shared memory segments. The problem is that these system calls are not permitted between processes in different user namespaces, even if those processes see the same file system tree underneath.

Here’s a somewhat longer explanation, written for Charliecloud but applicable for any container implementation: https://hpc.github.io/charliecloud/faq.html#communication-between-ranks-on-the-same-node-fails

HTH,
Reid