Docker breaks on latest M119 image - dial unix /var/run/docker.sock: connect: permission denied.

Jaan Lı

Mar 26, 2024, 5:31:11 PM
to google-dl-platform
Hi all,

I'm running into a bug while deploying Ray clusters to train large language models for our work with hospitals.

Specifically, on the latest M119 image, projects/deeplearning-platform-release/global/images/pytorch-latest-cu121-v20240319-ubuntu-2004-py310, I get the following error:

> docker run -it --gpus all rayproject/ray:latest-py310-cu121

docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create": dial unix /var/run/docker.sock: connect: permission denied.

Is the Cloud Deep Learning Environments team aware of this?

I also filed a related bug on the Ray project: https://github.com/ray-project/ray/issues/44308

(The same docker container runs fine on our AWS deployments, but we need GCP as well for certain cloud products.)

Thank you so much for any pointers in debugging this!
Jaan

Ali Rafiq

Mar 27, 2024, 4:10:49 PM
to google-dl-platform
Hi Jaan,

I got a similar error while trying to run a different image. Here is the solution that worked for me:
  1. Create the docker group if it does not already exist: sudo groupadd docker

  2. Add your user to the docker group: sudo usermod -aG docker ${USER}

  3. Loosen the permissions on docker.sock: sudo chmod 666 /var/run/docker.sock
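
To check whether the steps above have taken effect in the current shell, something like the sketch below may help. It is a read-only diagnostic written under a few assumptions: `has_group` is a hypothetical helper name (not part of Docker), and the socket path is the one from the error message. Group changes made with usermod normally apply only to new login sessions, so a shell opened before step 2 may still report that it is not in the group.

```shell
#!/bin/sh
# Read-only check: changes nothing and does not need sudo.

# has_group GROUP "LIST": succeeds if GROUP is one of the
# space-separated names in LIST (hypothetical helper).
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

SOCK=/var/run/docker.sock   # path taken from the error message

# id -nG prints the group names of the current process.
if has_group docker "$(id -nG)"; then
    echo "this shell is in the docker group"
else
    echo "this shell is NOT in the docker group"
fi

# Show owner, group, and mode of the socket if it exists.
if [ -S "$SOCK" ]; then
    ls -l "$SOCK"
else
    echo "$SOCK not found"
fi
```

If the group is missing here, logging out and back in (or running newgrp docker) is worth trying before the chmod in step 3, since mode 666 lets any local user talk to the Docker daemon.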


Regards
Ali Rafiq