This might seem a strange use case, but bear with me. Our HPC cluster uses containerised compute nodes to suballocate the compute resources. These virtual nodes are assigned CPU and memory limits. The base Docker image contains an SSSD setup that binds to our AD, so users run their jobs with their own credentials. The virtual nodes are in a Slurm cluster. All of this has worked well for us.
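For context, each virtual node is started roughly along these lines; this is a simplified sketch, and the image name, container name, and limits are placeholders, not our real values:

```shell
# Hypothetical launch of one virtual compute node. The real
# invocation also wires up networking, bind mounts, SSSD config, etc.
docker run -d \
  --name virtual-node-1 \
  --cpus 8 \
  --memory 32g \
  ourregistry/centos7-slurm-sssd:latest
```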
Recently, there was interest in running containerised workloads. I can easily spawn the workflow containers from the virtual nodes on the host Docker engine with the same resource limits (and, since these run as children of the worker node containers, it dovetails nicely with Slurm's view of things), but, naturally, all the workflow file access would then be as root, which is unworkable. I therefore thought of running the containers with Singularity. Singularity seems happy to run inside the (CentOS 7-based) virtual worker node container, nicely inherits the resource limits, and file access is as the invoking user, which is great. However, this only seems to work if the virtual node that Singularity is launched in happens to be the Docker container with the highest PID number (i.e. the most recently spawned one). If it is an earlier-launched container, Singularity fails halfway through with the error "ERROR : Failed to unshare root file system: Operation not permitted".
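The failing step is the unshare() call Singularity makes to create its mount namespace, so the same operation can be attempted directly inside an affected virtual node as a quick diagnostic. This is an illustrative check, not part of my setup:

```shell
# util-linux's unshare can attempt a mount namespace on its own.
# If this prints "Operation not permitted" inside the affected
# container, the restriction lies with the container's ability to
# create a mount namespace rather than with Singularity itself.
unshare --mount true && echo "mount namespace OK"
```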
If I run the command in debug mode, I can see where the behaviour diverges (last-launched container versus an earlier-launched container):
• The first difference is that Singularity running in the last container says "Overlay seems supported by kernel", whereas in an earlier container it says "Overlay seems not supported by kernel".
• The second difference is that Singularity running in an earlier container never reaches "Create mount namespace".
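The two traces were captured and compared roughly like this (the image name and log file paths are illustrative):

```shell
# Run with Singularity's global debug flag on the last-launched node
# and on an earlier-launched node, capturing stderr from each:
singularity -d shell workflow.sif 2>last.log
singularity -d shell workflow.sif 2>earlier.log

# Side-by-side comparison of the two debug traces:
diff -y last.log earlier.log
```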
Here's an edited diff -y to illustrate (last-launched container on the left, earlier-launched container on the right):
VERBOSE Set messagelevel to: 5 VERBOSE Set messagelevel to: 5
DEBUG PIPE_EXEC_FD value: 7 DEBUG PIPE_EXEC_FD value: 7
VERBOSE Container runtime VERBOSE Container runtime
VERBOSE Check if we are running as setuid VERBOSE Check if we are running as setuid
DEBUG Overlay seems supported by kernel | DEBUG Overlay seems not supported by kernel
DEBUG Drop privileges DEBUG Drop privileges
DEBUG Read json configuration from pipe DEBUG Read json configuration from pipe
DEBUG Set child signal mask DEBUG Set child signal mask
DEBUG Create socketpair for smaster communication chann DEBUG Create socketpair for smaster communication chann
DEBUG Wait C and JSON runtime configuration from sconta DEBUG Wait C and JSON runtime configuration from sconta
DEBUG Set parent death signal to 9 DEBUG Set parent death signal to 9
VERBOSE Spawn scontainer stage 1 VERBOSE Spawn scontainer stage 1
VERBOSE Get root privileges VERBOSE Get root privileges
DEBUG Set parent death signal to 9 DEBUG Set parent death signal to 9
DEBUG Entering in scontainer stage 1 DEBUG Entering in scontainer stage 1
DEBUG Set parent death signal to 9 DEBUG Set parent death signal to 9
VERBOSE Execute scontainer stage 1 VERBOSE Execute scontainer stage 1
DEBUG Entering scontainer stage 1 DEBUG Entering scontainer stage 1
DEBUG Entering image format intializer DEBUG Entering image format intializer
DEBUG Check for image format sif DEBUG Check for image format sif
DEBUG Receiving configuration from scontainer stage 1 DEBUG Receiving configuration from scontainer stage 1
DEBUG Wait completion of scontainer stage1 DEBUG Wait completion of scontainer stage1
VERBOSE Get root privileges VERBOSE Get root privileges
VERBOSE Create mount namespace | ERROR Failed to unshare root file system: Operation not
DEBUG Create RPC socketpair for communication between sc | srun: error: slurmd4xsacnodez1000: task 0: Exited with exit c
VERBOSE Spawn smaster process <
DEBUG Set parent death signal to 9 <
VERBOSE Spawn scontainer stage 2 <
VERBOSE Create mount namespace <
VERBOSE Spawn RPC server <
VERBOSE Execute smaster process <
I am using Singularity 3.0.3, installed from an RPM built following the instructions in the Installation section of the website.