Deprecation of the VirtualMachine.nvidia_driver_version field


Tim Jennison

Sep 21, 2020, 10:46:25 AM
to GCP Life Sciences Discuss, gcp-life-scie...@googlegroups.com

Release 85 of Container-Optimized OS (the operating system used by the Cloud Life Sciences API) offers a new method of driver installation using cos-extensions. This method offers better security and up-to-date drivers, and it decreases startup time by several minutes. However, the new method does not support the same fine-grained version selection as the existing method. As a result, we are planning to deprecate the VirtualMachine.nvidia_driver_version field of the Cloud Life Sciences RunPipeline method shortly after the release of COS 85 to the stable track.


In preparation for this transition, starting September 28th, the nvidia_driver_version field will be silently ignored, so that existing pipelines which specify a version will continue to work.  Our metrics indicate very limited usage of this field so this change is not anticipated to affect any users adversely.


Thanks
Tim

mboo...@google.com

Dec 2, 2020, 6:04:04 PM
to GCP Life Sciences Discuss
We had a question about this, which came up on a dsub issue.
We were expecting the nvidia driver version to be 450.51.06, but instead it is 440.64.00.
Is this a misunderstanding on our part or has the move to pick up the COS-installed version just not happened yet?

Thanks!

Paul Grosu

Dec 2, 2020, 7:17:23 PM
to GCP Life Sciences Discuss
Hi Matt,

They're there; you can download the driver manually using the following link from the US location:


The whole list of links for the 450 versions is here:

$ curl https://storage.googleapis.com/nvidia-drivers-us-public/ | sed -e s/\</\\n\</g | grep 450.51.06 | cut -f2 -d\> | grep -v sha256

nvidia-cos-project/73/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_73-11647-600-0.cos
nvidia-cos-project/73/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_73-11647-656-0.cos
nvidia-cos-project/77/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_77-12371-1064-0.cos
nvidia-cos-project/77/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_77-12371-1072-0.cos
nvidia-cos-project/77/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_77-12371-1073-0.cos
nvidia-cos-project/77/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_77-12371-1079-0.cos
nvidia-cos-project/77/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_77-12371-1088-0.cos
nvidia-cos-project/77/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_77-12371-1096-0.cos
nvidia-cos-project/77/tesla/450_00/450.51.06/NVIDIA-Linux-x86_64-450.51.06_77-12371-326-0.cos
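
The one-liner above can also be wrapped in a small reusable function. This is only a sketch: the name list_cos_drivers is made up for illustration, and it assumes the bucket listing keeps its current XML shape, with each object path inside a <Key> element and checksum files carrying "sha256" in their names.

```shell
# list_cos_drivers VERSION
# Reads a bucket XML listing on stdin and prints the object paths that
# contain VERSION, skipping the sha256 checksum files.
# (Hypothetical helper; assumes paths appear as <Key>...</Key>.)
list_cos_drivers() {
  version="$1"
  tr '<' '\n' |            # put every XML tag on its own line
    grep '^Key>' |         # keep only the opening Key tags
    cut -d'>' -f2 |        # drop the "Key>" prefix, leaving the path
    grep -F -- "$version" |
    grep -v 'sha256'
}

# Example use (needs network access):
#   curl -s https://storage.googleapis.com/nvidia-drivers-us-public/ | list_cos_drivers 450.51.06
```

Using tr instead of a sed newline substitution avoids relying on GNU sed's handling of \n in the replacement text.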

Hope it helps,
Paul

Tim Jennison

Dec 3, 2020, 7:46:54 AM
to mboo...@google.com, GCP Life Sciences Discuss
Hi Matt,
No, this isn't a misunderstanding on your part. We encountered some stability issues when using cos-extensions and so have been waiting to migrate until they are resolved. In the meantime, the version used by the Life Sciences API sometimes lags behind the default COS version. Our release next week will bring it back in sync (450.51.06).

Thanks
Tim


mboo...@google.com

Dec 10, 2020, 7:21:42 PM
to GCP Life Sciences Discuss
Thank you, Tim. I want to make sure I understand how this will ultimately work from an end-user perspective, notably considering reproducibility of pipelines:

- At time t0, selecting an Accelerator.type will result in a specific driver (determined by the COS release) being available; my docker image needs to have software that works with that version
- At time t1 (t1 > t0), selecting an Accelerator.type could result in a newer driver (unlikely to ever be older); my docker image (unchanged from time t0) should generally still work (as newer drivers are likely to provide backward compatibility); I might need to upgrade the software in my docker image.

If there were a case that the new driver no longer supports the software in my old docker image, could I use an older VirtualMachine.bootImage?

The real key point though is that I shouldn't expect to be able to select the nvidia driver version.
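
Since the driver version cannot be pinned, one option is for the container to check which driver it actually got before doing any work. A minimal sketch, assuming nvidia-smi is available inside the container and that sort supports -V (as in GNU coreutils); the function names and the versions used are illustrative only:

```shell
# version_ge A B: succeeds when version A >= version B (needs sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# require_driver MIN [ACTUAL]: fail fast if the driver is older than MIN.
# ACTUAL defaults to whatever nvidia-smi reports for the first GPU.
# (Hypothetical helper, not part of any API.)
require_driver() {
  min="$1"
  actual="${2:-$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)}"
  if version_ge "$actual" "$min"; then
    echo "driver $actual OK (need >= $min)"
  else
    echo "driver $actual is older than required $min" >&2
    return 1
  fi
}

# Example use at the top of a pipeline command:
#   require_driver 440.64.00 || exit 1
```

This only guards against an older driver than expected; it does not help if a newer driver drops support for older software.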

Thanks,

-Matt

Paul Grosu

Dec 11, 2020, 4:43:27 AM
to GCP Life Sciences Discuss
Hi Matt,

If you rely only on the image to get the right GPU driver version, I think the boot version is fixed to a single COS version (85) for some time, until around March 2021. If you look at the GPU COS requirements, the image must be version 85 or higher and LTS:


This milestone release was labeled stable on Sep 01, 2020:


The next release will be introduced six months later, with four months in development and two in stabilization before production. So it will basically be March 2021 before you get the next one, based on the following:


Given the above and the Pipeline API documents, I think the bootImage field will probably only accept the two given strings, as there is currently only one active 85 stable-LTS image and one active 85 dev-LTS image, based on the following:


There are more releases if you search with the following command, but some are non-standard:

gcloud compute images list --project cos-cloud --no-standard-images

Hope it helps,
Paul

Tim Jennison

Dec 11, 2020, 8:20:29 AM
to Matt Bookman, GCP Life Sciences Discuss
Yes Matt, your understanding is correct. In general, I wouldn't expect newer drivers to break older software for years, if ever.

COS may maintain a relatively stable driver version on their stable image track, but they may also update it regularly to fix GPU bugs along with their other fixes.

As Paul mentioned, we don't allow arbitrary COS boot images, only the latest image in each family.
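
To see which image a family currently resolves to, gcloud can look it up directly. A sketch, assuming the gcloud CLI is installed and authenticated; latest_cos_image is a hypothetical helper, and cos-85-lts is used only as an example family name:

```shell
# latest_cos_image FAMILY
# Prints the name of the newest image in the given cos-cloud image family.
# (Hypothetical helper around the gcloud CLI.)
latest_cos_image() {
  family="$1"
  gcloud compute images describe-from-family "$family" \
    --project cos-cloud \
    --format='value(name)'
}

# Example use (the family name is an assumption):
#   latest_cos_image cos-85-lts
```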

Thanks
Tim

Paul Grosu

Dec 11, 2020, 10:40:54 AM
to GCP Life Sciences Discuss
And Matt, if you are experiencing issues across NVIDIA driver versions, just post them here and we can assist with the troubleshooting.

~p