Deep Learning VM - no pre-installed NVIDIA drivers?


Eric H

Dec 6, 2019, 3:58:16 PM
to google-dl-platform
Hi, I'm trying to run some models on a Deep Learning VM with TensorFlow 1.x. I noticed things were very slow and tensorflow-gpu did not find any GPUs. It looks like the NVIDIA drivers are not pre-installed?

```
ehulburd@tensorflow-2-vm:~$ nvidia-smi
-bash: nvidia-smi: command not found
ehulburd@tensorflow-2-vm:~$ sudo /opt/deeplearning/install-driver.sh
sudo: /opt/deeplearning/install-driver.sh: command not found
```

Yikes. This was after explicitly requesting that the image be created with NVIDIA drivers installed. Has anyone else experienced this problem, and is there a quick solution other than installing the drivers manually?

Thanks,

Eric
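
The two checks in the transcript above can be scripted so they are easy to repeat on a fresh instance. This is only a minimal sketch: `check_gpu_stack` is a hypothetical helper name, and the installer path is the one quoted in this thread, not something verified against every image.

```shell
#!/bin/sh
# Sketch: report whether the NVIDIA tooling mentioned in this thread is present.
# check_gpu_stack is a hypothetical helper, not something shipped with the image.

check_gpu_stack() {
  # Installer path as quoted in this thread; pass another path to override it.
  installer="${1:-/opt/deeplearning/install-driver.sh}"

  # nvidia-smi is only on PATH once the driver has been installed.
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi: found"
  else
    echo "nvidia-smi: missing"
  fi

  # The installer script has to exist and be executable before sudo can run it.
  if [ -x "$installer" ]; then
    echo "installer: found ($installer)"
  else
    echo "installer: missing ($installer)"
  fi
}

check_gpu_stack
```

If both lines report "missing", the problem is probably not a skipped driver install but something about the image itself, since neither the tool nor the installer is there.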

Eric H

Dec 6, 2019, 4:30:07 PM
to google-dl-platform
I had to drop my instance and then re-create it through the command line (this mimics the default settings used when deploying through the Marketplace UI):

gcloud compute instances create my-image-name --zone=us-west1-b --image-project=deeplearning-platform-release --image-family=tf-latest-gpu --maintenance-policy=TERMINATE --accelerator="type=nvidia-tesla-k80,count=1" --metadata="install-nvidia-driver=True" --machine-type=n1-highmem-8

The drivers were still not installed, so I ran this from within the VM:

sudo /opt/deeplearning/install-driver.sh
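
The one-line create command above is easier to audit when each flag sits on its own line and the command can be previewed before it runs. The following is a sketch under assumptions: `create_dlvm` and the `RUN` switch are hypothetical helpers introduced here, while the flags and values are copied from the command in this post.

```shell
#!/bin/sh
# Sketch: print (or, with RUN=1, execute) the create command quoted in this post.
# create_dlvm and RUN are hypothetical; the gcloud flags come from the thread.

create_dlvm() {
  name="$1"                     # instance name
  family="${2:-tf-latest-gpu}"  # image family; defaults to the one in the post

  # Rebuild the argument list with one flag per line.
  set -- gcloud compute instances create "$name" \
    --zone=us-west1-b \
    --image-project=deeplearning-platform-release \
    --image-family="$family" \
    --maintenance-policy=TERMINATE \
    --accelerator=type=nvidia-tesla-k80,count=1 \
    --metadata=install-nvidia-driver=True \
    --machine-type=n1-highmem-8

  if [ "${RUN:-0}" = "1" ]; then
    "$@"        # actually create the instance
  else
    echo "$*"   # dry run: only show what would be executed
  fi
}
```

Calling `create_dlvm my-image-name` prints the command for review, and `RUN=1 create_dlvm my-image-name` executes it. Printing first makes it easy to confirm the image family and accelerator before anything is billed.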

Viacheslav Kovalevskyi

Dec 6, 2019, 4:39:17 PM
to Eric H, Yun Lu, google-dl-platform

--
You received this message because you are subscribed to the Google Groups "google-dl-platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-dl-platf...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-dl-platform/19e47e64-a432-4871-91f5-ce511f6c83b9%40googlegroups.com.


--
WBR,

Viacheslav Kovalevskyi

Want to know more about DSET team? http://go/dsetreadme

Yun Lu

Dec 6, 2019, 4:58:03 PM
to google-dl-platform
Hi Eric,

I'm on call for the notebook team. I started an instance with the gcloud command you gave, and the NVIDIA driver was installed successfully (see the attached log).

sudo /opt/deeplearning/install-driver.sh also works on my instance.


My guess is that you somehow fetched the wrong Deep Learning VM image.

Can you SSH into your machine and send us the header you see in the SSH terminal?

Below is an example header from my instance.


======================================
Welcome to the Google Deep Learning VM
======================================

Version: tf-gpu.1-15.m39
Based on: Debian GNU/Linux 9.11 (stretch) (GNU/Linux 4.9.0-11-amd64 x86_64\n)

Resources:
 * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
 * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm

To reinstall Nvidia driver (if needed) run:
sudo /opt/deeplearning/install-driver.sh
TensorFlow comes pre-installed with this image. To install TensorFlow binaries in a virtualenv (or conda env),
please use the binaries that are pre-built for this image. You can find the binaries at
/opt/deeplearning/binaries/tensorflow/
If you need to install a different version of Tensorflow manually, use the common Deep Learning image with the
right version of CUDA

Linux my-image-name 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20) x86_64
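
The `Version:` line of the banner is the quickest thing to compare. As a rough sketch, the helper below reads banner text on standard input and reports the image flavor. It assumes the token after `tf-` marks the flavor, which is inferred only from the two banners posted in this thread (`tf-gpu.1-15.m39` and `tf-cpu.1-15.m38`); `dlvm_flavor` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: classify a Deep Learning VM login banner as gpu, cpu, or unknown.
# dlvm_flavor is hypothetical; the naming pattern is inferred from this thread.

dlvm_flavor() {
  # Keep only the value of the first "Version:" line, e.g. "tf-gpu.1-15.m39".
  version=$(sed -n 's/^Version:[[:space:]]*//p' | head -n 1)

  case "$version" in
    *-gpu.*) echo gpu ;;
    *-cpu.*) echo cpu ;;
    *)       echo unknown ;;
  esac
}
```

For example, after saving the SSH banner to a file (here a hypothetical `banner.txt`), run `dlvm_flavor < banner.txt`. A result of `cpu` on an instance that has a GPU attached would point to the wrong image rather than a failed driver install.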





Screen Shot 2019-12-06 at 1.47.28 PM.png

Eric Hulburd

Dec 6, 2019, 7:31:52 PM
to Yun Lu, google-dl-platform
You are right, I had the CPU version. What I don't understand is why I got it when I had selected a GPU while creating the instance from the UI. I've attached a screenshot of my settings (of course, I now have a VM running the way I want, so the GPU quota limit is expected).

Eric



======================================
Welcome to the Google Deep Learning VM
======================================

Version: tf-cpu.1-15.m38

Based on: Debian GNU/Linux 9.11 (stretch) (GNU/Linux 4.9.0-11-amd64 x86_64\n)

Resources:
 * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
 * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
 * Google Group: https://groups.google.com/forum/#!forum/google-dl-platform

To reinstall Nvidia driver (if needed) run:
sudo /opt/deeplearning/install-driver.sh
TensorFlow comes pre-installed with this image. To install TensorFlow binaries in a virtualenv (or conda env),
please use the binaries that are pre-built for this image. You can find the binaries at
/opt/deeplearning/binaries/tensorflow/
If you need to install a different version of Tensorflow manually, use the common Deep Learning image with the
right version of CUDA

Linux tensorflow-2-vm 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOL

Screen Shot 2019-12-06 at 4.30.54 PM (2).png