Singularity In Docker "Failed to unshare root file system: Operation not permitted"


Sam Agnew

Feb 13, 2019, 1:29:53 AM
to singularity
This might seem a strange use case, but bear with me. Our HPC cluster uses containerised compute nodes to suballocate the compute resources. These virtual nodes are assigned CPU and memory limits. The base Docker image contains an SSSD setup that binds to our AD so users run their jobs with their own credentials. The virtual nodes are in a Slurm cluster. All of this has worked well for us.

Recently, there was interest in running containerised workloads. I can easily spawn the workflow containers from the virtual nodes on the host Docker engine with the same resource limits (and since these run as children of the worker node containers, that usefully dovetails with Slurm's view of things) but, naturally, all the workflow file access would be as root, which is unworkable. I therefore thought of running the containers with Singularity. Singularity seems happy to run inside the (CentOS 7-based) virtual worker node container and nicely inherits the resource limits. The file access is as the user, which is great. However, this only seems to work if the virtual node that Singularity launches in happens to be the Docker container with the highest PID number (the most recently spawned). If it is an earlier-launched container, Singularity fails halfway through with the error "ERROR  : Failed to unshare root file system: Operation not permitted".

If I run the command in debug mode I can see where the behaviour diverges (last container versus earlier-launched container):
• The first difference is that Singularity running in the last container says "Overlay seems supported by kernel", but in an earlier container it says "Overlay seems not supported by kernel".
• The second difference is that Singularity running in an earlier container never reaches "Create mount namespace".

Here's an edited diff -y to illustrate, with the last container on the left and an earlier container on the right:
VERBOSE    Set messagelevel to: 5                               VERBOSE    Set messagelevel to: 5
DEBUG      PIPE_EXEC_FD value: 7                                DEBUG      PIPE_EXEC_FD value: 7
VERBOSE    Container runtime                                    VERBOSE    Container runtime
VERBOSE    Check if we are running as setuid                    VERBOSE    Check if we are running as setuid
DEBUG      Overlay seems supported by kernel                  | DEBUG      Overlay seems not supported by kernel
DEBUG      Drop privileges                                      DEBUG      Drop privileges
DEBUG       Read json configuration from pipe                   DEBUG       Read json configuration from pipe
DEBUG       Set child signal mask                               DEBUG       Set child signal mask
DEBUG       Create socketpair for smaster communication chann   DEBUG       Create socketpair for smaster communication chann
DEBUG       Wait C and JSON runtime configuration from sconta   DEBUG       Wait C and JSON runtime configuration from sconta
DEBUG       Set parent death signal to 9                        DEBUG       Set parent death signal to 9
VERBOSE     Spawn scontainer stage 1                            VERBOSE     Spawn scontainer stage 1
VERBOSE     Get root privileges                                 VERBOSE     Get root privileges
DEBUG      Set parent death signal to 9                         DEBUG      Set parent death signal to 9
DEBUG      Entering in scontainer stage 1                       DEBUG      Entering in scontainer stage 1
DEBUG       Set parent death signal to 9                        DEBUG       Set parent death signal to 9
VERBOSE    Execute scontainer stage 1                           VERBOSE    Execute scontainer stage 1
DEBUG      Entering scontainer stage 1                          DEBUG      Entering scontainer stage 1
DEBUG      Entering image format intializer                     DEBUG      Entering image format intializer
DEBUG      Check for image format sif                           DEBUG      Check for image format sif
DEBUG       Receiving configuration from scontainer stage 1     DEBUG       Receiving configuration from scontainer stage 1
DEBUG       Wait completion of scontainer stage1                DEBUG       Wait completion of scontainer stage1
VERBOSE     Get root privileges                                 VERBOSE     Get root privileges
VERBOSE    Create mount namespace                             | ERROR      Failed to unshare root file system: Operation not 
DEBUG      Create RPC socketpair for communication between sc | srun: error: slurmd4xsacnodez1000: task 0: Exited with exit c
VERBOSE    Spawn smaster process                              <
DEBUG      Set parent death signal to 9                       <
VERBOSE    Spawn scontainer stage 2                           <
VERBOSE    Create mount namespace                             <
VERBOSE    Spawn RPC server                                   <
VERBOSE    Execute smaster process                            <

The closest thing to a related report that Google turned up was this post (which I wasn't completely able to follow): https://github.com/sylabs/singularity/issues/2397

I was using 3.0.3, built as an RPM following the instructions in the Installation section of the website.

Thanks!

Sam

Sam Agnew

Feb 13, 2019, 2:13:51 AM
to singularity
Just to confirm: the result is the same with Singularity 3.1.0-rc2.

Sam

David Dykstra

Feb 13, 2019, 2:32:14 PM
to Sam Agnew, singu...@lbl.gov
Sam,

I don't know why you might see a difference between your last PID and
others, but the failure behavior you see, with debug message "Overlay
seems not supported by kernel" and error message "Failed to unshare root
file system: Operation not permitted" is exactly the behavior I see if I
run singularity inside a docker container that was created without the
--privileged option. Maybe that's a clue. I use --privileged to make
it work, although I'm pretty sure I also made it work once with a
smaller set of capabilities.

Dave
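
One way to follow up on the --privileged clue is to compare, from inside each virtual node, which capabilities the container is allowed to hold. The sketch below is only a diagnostic under a few assumptions: it relies on the documented requirement that creating a mount namespace with unshare(2) needs CAP_SYS_ADMIN, and on the fact that Docker does not grant that capability unless the container is started with --privileged or an explicit --cap-add. It reads the capability bounding set (CapBnd) because that is the ceiling a setuid-root helper can reach, whichever user launches it. The helper names are invented for illustration and are not part of Singularity, and the /proc/filesystems check is not claimed to be the test Singularity itself performs for overlay.

```python
# Diagnostic sketch: run inside a working and a failing virtual node and compare.
# Function names here are hypothetical, chosen for this example only.

CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in <linux/capability.h>


def parse_cap_bnd(status_text):
    """Return the capability bounding set from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapBnd line found")


def has_capability(mask, bit):
    """True if the given capability bit is set in the mask."""
    return bool(mask & (1 << bit))


def overlay_listed(filesystems_text):
    """True if 'overlay' appears as a filesystem type in /proc/filesystems text."""
    for line in filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "overlay":
            return True
    return False


def main():
    try:
        with open("/proc/self/status") as f:
            mask = parse_cap_bnd(f.read())
        with open("/proc/filesystems") as f:
            overlay = overlay_listed(f.read())
    except (OSError, ValueError) as exc:
        print("could not read /proc:", exc)
        return
    print("CapBnd: %016x" % mask)
    print("CAP_SYS_ADMIN in bounding set:", has_capability(mask, CAP_SYS_ADMIN))
    print("overlay listed in /proc/filesystems:", overlay)


if __name__ == "__main__":
    main()
```

If the two nodes report different results, the container that lacks CAP_SYS_ADMIN is the likely candidate for the "Operation not permitted" failure. If they report the same bounding set, the difference probably lies elsewhere (for example a seccomp or AppArmor profile), which this sketch does not inspect. Note also that /proc/filesystems only lists overlay once the kernel module is loaded.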

