running docker on slurm compute images: permission denied


kuber

Sep 19, 2020, 6:15:19 PM
to google-cloud-slurm-discuss
i am trying to run a docker image on the slurm compute nodes, so i modified the "custom-compute-install" script to install docker and set it up to run.

in particular, i added my slurm user (that is, the 'whoami' user printed on the compute machine) to the "docker" group so i can run containers.
on the slurm login machine and on the template slurm image this works: i can run commands in the installed docker image, like:

<prompt># docker run <my-image> <my-command>

in fact, if i run "groups" on the login/compute-image instances, it correctly prints that the user belongs to the "docker" group.

however, when i schedule a job with sbatch, the user does not seem to be in the "docker" group, and i get the error:
"got permission denied while trying to connect to the Docker daemon socket".

this is my "custom-compute-install" script:

******************************************************
#!/bin/bash

# install docker
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

# add docker to startup and start it
systemctl enable docker
systemctl restart docker

# add cluster user to the docker group
usermod -aG docker <my-cluster-user>

# pull the image that will be run on the cluster instances
docker pull <my-docker-image>
******************************************************

this is my sbatch script (compute.sh) that invokes the computation:

******************************************************
#!/bin/bash
#
#SBATCH --job-name=compute
#SBATCH --output=out_%j.txt
#SBATCH --nodes=1

srun docker run <my-image> <my-command>
******************************************************

and this is how the job is scheduled:

******************************************************
<prompt># sbatch compute.sh
******************************************************

the job does run, but it fails with the permission denied error mentioned above.

what am i doing wrong?
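Not part of the original post: a minimal diagnostic job that prints the groups a batch job actually runs with, next to the group that owns the Docker socket on that node. It is a sketch that assumes Docker's default socket path `/var/run/docker.sock`; the job name, output file name and the `report_docker_access` function are hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name=docker-check
#SBATCH --output=check_%j.txt
#SBATCH --nodes=1

# Hypothetical diagnostic job: submit it with sbatch and compare its output
# with "id" run in an interactive shell on a node of the same kind.
# Assumes Docker's default socket path.
DOCKER_SOCK=/var/run/docker.sock

report_docker_access() {
  echo "host: $(uname -n)"
  echo "groups: $(id -nG)"   # group names the job actually runs with
  echo "gids: $(id -G)"      # the same groups as numeric IDs
  if [ -S "$DOCKER_SOCK" ]; then
    # group name and numeric GID that own the Docker socket on this node
    stat -c 'socket group: %G (gid %g)' "$DOCKER_SOCK"
  else
    echo "socket: not found at $DOCKER_SOCK"
  fi
}

report_docker_access
```

If the interactive shell lists docker but the job's numeric GIDs do not include the socket's GID, the next thing worth checking is whether the docker GID is the same on every instance.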

Robert Moulton

Sep 19, 2020, 10:19:02 PM
to kuber, google-cloud-slurm-discuss
is the same docker group in place on the controller instance?

--


You received this message because you are subscribed to the Google Groups "google-cloud-slurm-discuss" group.


To unsubscribe from this group and stop receiving emails from it, send an email to google-cloud-slurm-...@googlegroups.com.


To view this discussion on the web visit https://groups.google.com/d/msgid/google-cloud-slurm-discuss/c643e2e7-f3ec-4faf-bbb8-436ce20d1ffbn%40googlegroups.com.


Marco Di Benedetto

Sep 20, 2020, 7:46:47 AM
to Robert Moulton, google-cloud-slurm-discuss
the controller starts with no docker group, so i followed your advice and added it with the "custom-controller-install" script, like this:

************************************************************************
#!/bin/bash

groupadd docker
usermod -aG docker <my-cluster-user>
************************************************************************

and successfully verified the group was actually added.

however, the permission denied error still occurs; nothing changed.


--
Marco Di Benedetto, Ph.D.

Co-Founder at Transform and Lighting S.r.l.

Researcher at Italian National Research Council (CNR)

Robert Moulton

Sep 20, 2020, 1:20:05 PM
to Marco Di Benedetto, google-cloud-slurm-discuss
hm ... are the numeric docker group IDs the same on controller and
compute instances?
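One way to answer this across instances is a small script that collects the numeric GID of the "docker" group from each host and flags any mismatch. This is a sketch, not something from the thread: the host names in the example are hypothetical placeholders, and it assumes passwordless ssh from the machine where it runs.

```shell
#!/bin/bash
# Sketch: compare the numeric GID of the "docker" group across nodes.
# Host names are hypothetical placeholders; assumes passwordless ssh.

# Print "host gid" for each host (gid is empty if the group is missing).
collect_gids() {
  local h
  for h in "$@"; do
    printf '%s %s\n' "$h" "$(ssh "$h" getent group docker | cut -d: -f3)"
  done
}

# Read "host gid" lines on stdin, echo them, and return non-zero if any
# GID is missing or differs from the first one seen.
check_gids() {
  local host gid first="" status=0
  while read -r host gid; do
    echo "$host: ${gid:-<no docker group>}"
    if [ -z "$gid" ]; then
      status=1
    elif [ -z "$first" ]; then
      first=$gid
    elif [ "$gid" != "$first" ]; then
      status=1
    fi
  done
  return "$status"
}

# Example (hypothetical host names):
# collect_gids my-login0 my-controller my-compute-0 | check_gids || echo "docker GID mismatch"
```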

Marco Di Benedetto

Sep 20, 2020, 4:32:49 PM
to Robert Moulton, google-cloud-slurm-discuss
i logged on to each of the three instances (login, controller, compute) after cluster creation, and made sure the same docker group was created with the same group id.

unfortunately, nothing changed... :(

m.


Ward Harold

Sep 20, 2020, 5:40:29 PM
to kuber, google-cloud-slurm-discuss
Singularity is the path of least resistance to doing containers on a Slurm cluster. There is an example configuration here: https://github.com/SchedMD/slurm-gcp/tree/master/tf/examples/singularity

If there are Docker containers you want or need to use, Singularity will pull them and convert them to Singularity containers on the fly: https://sylabs.io/guides/3.0/user-guide/singularity_and_docker.html



--
... WkH
Ward Harold | Technical Curriculum Developer | w...@google.com | 512-751-9198

Ward Harold

Sep 20, 2020, 5:43:33 PM
to kuber, google-cloud-slurm-discuss

Robert Moulton

Sep 20, 2020, 9:39:10 PM
to Ward Harold, kuber, google-cloud-slurm-discuss
admittedly I have very limited slurm experience, but I'm wondering if
the use of srun within a script submitted via sbatch is causing the
docker permission issue. Unless there's a valid reason to include it,
I'd try without srun:

docker run <my-image> <my-command>


Marco Di Benedetto

Sep 21, 2020, 7:23:27 AM
to Robert Moulton, Ward Harold, google-cloud-slurm-discuss
surely it's due to my limited experience with cloud/container management, but setting up my research project is getting far too complicated for a single person.
of course you get more control this way, but there are other cloud providers that let you set up an hpc cluster with custom docker images with very little effort.
before switching, i may give singularity a try once i understand how it works; in the meantime, thank you all for your kind support.

m.

Joseph Schoonover

Sep 21, 2020, 12:33:03 PM
to google-cloud-slurm-discuss
Hey Marco,
Have you tried restarting the controller daemon after defining the docker group on the controller? e.g.

sudo service slurmctld restart


Could you share more about your research project and the needs of independent researchers like yourself?

Marco Di Benedetto

Sep 23, 2020, 9:47:20 AM
to Joseph Schoonover, google-cloud-slurm-discuss
apparently, having the same docker group id solved the problem!
the key point was to set the same group id on every instance at install time, so putting everything in the custom install scripts seemed the best option.

here are the modifications:


custom compute script:
***********************************************************
#!/bin/bash

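# install docker
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

# the docker package creates the docker group with whatever gid it picks:
# force the same gid used in the custom controller script
groupmod -g <some-unused_id> docker
usermod -aG docker <my-cluster-user>

# enable docker at boot and (re)start it
systemctl enable docker
systemctl restart docker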

# pull image
docker pull <my-docker-image>
***********************************************************


custom controller script:
***********************************************************
#!/bin/bash

# same gid as in the custom compute script
groupadd -g <some-unused_id> docker

usermod -aG docker <my-cluster-user>
***********************************************************

assigning a custom group id "by hand" is certainly a hack rather than a proper solution, but it works.
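The same idea can be packaged as one helper usable in both install scripts: it creates the group with a fixed GID when it is missing (as on the controller) and renumbers it when the Docker install already created it (as on the compute image). This is a sketch under assumptions: `DOCKER_GID=1500` is a hypothetical value that must be unused on every image, and the `run`/`ensure_group_gid` functions and the `DRY_RUN` switch are illustrative additions, not part of the scripts above.

```shell
#!/bin/bash
# Sketch: give a group a fixed numeric GID, creating it if missing.
# DOCKER_GID is a hypothetical value; choose one unused on all images.
# DRY_RUN=1 prints the commands instead of running them (added for testing).

DOCKER_GID=1500

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

ensure_group_gid() {
  local name=$1 gid=$2 current
  current=$(getent group "$name" | cut -d: -f3)
  if [ -z "$current" ]; then
    run groupadd -g "$gid" "$name"   # missing, e.g. on the controller
  elif [ "$current" != "$gid" ]; then
    run groupmod -g "$gid" "$name"   # e.g. created by the docker package with another GID
  fi                                 # otherwise already correct: nothing to do
}

# Use in both custom install scripts (as root), e.g.:
# ensure_group_gid docker "$DOCKER_GID"
# usermod -aG docker <my-cluster-user>
```

Note that `groupmod` does not change the group ownership of existing files, so a Docker socket created before the change keeps the old numeric GID until it is recreated, for example when compute nodes boot from the new image.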

thank you all very much,
m.
