Hi all,
I'm running into a bug while trying to deploy Ray clusters for training large language models for our work with hospitals.
Specifically, on the latest M119 image, projects/deeplearning-platform-release/global/images/pytorch-latest-cu121-v20240319-ubuntu-2004-py310, running the Ray container as my regular (non-root) user gives the following error:
> docker run -it --gpus all rayproject/ray:latest-py310-cu121
docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create": dial unix /var/run/docker.sock: connect: permission denied.
Is the Cloud Deep Learning Environments team aware of this?
(The same docker container runs fine on our AWS deployments, but we need GCP as well for certain cloud products.)
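In case it helps anyone reproduce or narrow this down, here is a minimal sketch of a check for whether the current user can reach the Docker socket. It assumes the default socket path from the error above. The helper name `can_use_socket` is made up for illustration, and I have not confirmed how the image sets up socket permissions.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker): reports whether the current
# user has read and write access to the given path.
can_use_socket() {
    sock="$1"
    if [ -r "$sock" ] && [ -w "$sock" ]; then
        echo "accessible"
    else
        echo "not accessible"
    fi
}

# Socket path taken from the error message above.
SOCK=/var/run/docker.sock

# Show the socket's owner, group, and mode, if it exists.
ls -l "$SOCK" 2>/dev/null || echo "socket not found: $SOCK"

# List the groups the current user belongs to.
id -nG

# Report whether this user can read and write the socket.
can_use_socket "$SOCK"
```

If the socket turns out to be owned by a group that is missing from the `id -nG` output, that would be consistent with the permission error. Running the same `docker run` command with `sudo` might be one way to confirm, though I have not verified that this is the cause on this image.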
Thank you so much for any pointers in debugging this!
Jaan