Recipe for Installing NVIDIA Drivers & CUDA in VNFS Image?


Dominic Daninger

unread,
Sep 22, 2017, 1:47:36 PM9/22/17
to Warewulf
Is anyone aware of a recipe for installing NVIDIA drivers and CUDA in a RHEL/CentOS 7.x VNFS image?

The compute nodes will have the GPU cards and no GPU card in the head node.

Thanks
Dom

Lew Robbins

unread,
Sep 22, 2017, 1:58:50 PM9/22/17
to ware...@lbl.gov
Hello Dom,

There may be more elegant ways, but we took the path of least resistance.
On the GPU nodes, we provision a /etc/rc3.d/S98cuda file, which is nothing more than:

sh /share/src/cuda/8.0.44/NVIDIA-Linux-x86_64-367.44.run --kernel-source-path /usr/src/kernels/`uname -r` --no-questions --ui=none --accept-license
/usr/bin/nvidia-smi -c 3

Depending on the version, the options may need to be modified a bit.
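A fleshed-out version of that rc script might look like the sketch below. The shebang, variable, and comments are my additions, not Lew's exact file; the installer path and driver version are the ones from his command and would differ per site.

```shell
#!/bin/sh
# /etc/rc3.d/S98cuda -- sketch of a boot-time driver build (assumed layout;
# adjust the installer path and version to your site)
DRIVER=/share/src/cuda/8.0.44/NVIDIA-Linux-x86_64-367.44.run

# Build and install the kernel module against the running kernel;
# requires the matching kernel-devel tree in the image
sh "$DRIVER" \
    --kernel-source-path /usr/src/kernels/$(uname -r) \
    --ui=none --no-questions --accept-license

# Set the default GPU compute mode (3 = EXCLUSIVE_PROCESS)
/usr/bin/nvidia-smi -c 3
```

Note this rebuilds the driver on every boot, so the nodes need the kernel headers and a compiler in the image (or on shared storage), which is the trade-off of this path-of-least-resistance approach.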

Thanks,
Lew

--
You received this message because you are subscribed to the Google Groups "Warewulf" group.
To unsubscribe from this group and stop receiving emails from it, send an email to warewulf+u...@lbl.gov.
To post to this group, send email to ware...@lbl.gov.
To view this discussion on the web visit https://groups.google.com/a/lbl.gov/d/msgid/warewulf/80573fa2-76f7-4bc8-b77b-0c2367e93c3c%40lbl.gov.
For more options, visit https://groups.google.com/a/lbl.gov/d/optout.

Jason Stover

unread,
Sep 22, 2017, 2:00:10 PM9/22/17
to ware...@lbl.gov
Hi Dominic,

See the OHPC thread for my answer. ;)

https://groups.io/g/OpenHPC-users/topic/6060569

-J

Ryan Novosielski

unread,
Sep 22, 2017, 2:08:40 PM9/22/17
to Warewulf
I compile these manually (extract the driver package and run the make yourself from the appropriate location) and copy them to the image. Pretty easy -- it's just a few files that you need to copy:

[root@pascal001 3.10.0-229.20.1.el7.x86_64]# find /lib/modules/3.10.0-229.20.1.el7.x86_64 -name "nvidia*.ko"
/lib/modules/3.10.0-229.20.1.el7.x86_64/extra/nvidia-drm.ko
/lib/modules/3.10.0-229.20.1.el7.x86_64/extra/nvidia-modeset.ko
/lib/modules/3.10.0-229.20.1.el7.x86_64/extra/nvidia-uvm.ko
/lib/modules/3.10.0-229.20.1.el7.x86_64/extra/nvidia.ko

That may get you started, but if not, I have to do this again to update my drivers, so I can write it up.
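Until that write-up exists, the manual route described above might be sketched as follows. The kernel version and chroot path are examples taken from the listing above, and the exact `make` invocation can vary between driver releases:

```shell
# Unpack the standalone driver installer without running it
sh NVIDIA-Linux-x86_64-367.44.run --extract-only
cd NVIDIA-Linux-x86_64-367.44/kernel

# Build the modules against the kernel the nodes will boot
make SYSSRC=/usr/src/kernels/3.10.0-229.20.1.el7.x86_64 module

# Copy the resulting .ko files into the VNFS chroot and refresh
# the module dependency maps there
mkdir -p /var/chroots/gpu/lib/modules/3.10.0-229.20.1.el7.x86_64/extra
cp nvidia*.ko /var/chroots/gpu/lib/modules/3.10.0-229.20.1.el7.x86_64/extra/
chroot /var/chroots/gpu depmod -a 3.10.0-229.20.1.el7.x86_64
```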

Dominic Daninger

unread,
Sep 22, 2017, 3:48:04 PM9/22/17
to ware...@lbl.gov
Thanks, we will give some of these ideas a try.

Dom

david.a...@sjsu.edu

unread,
Feb 18, 2018, 4:23:45 PM2/18/18
to Warewulf, novo...@rutgers.edu
Hi Ryan,

Could you please help me with some more detailed steps for this approach? I've extracted the .run files, but I'm not sure how to build the drivers or which files to copy where. I'm installing an OpenHPC setup on a set of compute and GPU nodes. I got the compute nodes working but am stuck on the GPU nodes: my head node does not have a GPU, and I am unsure how to install the NVIDIA drivers on the GPU nodes. The approach above of running the install at provisioning time failed because WW excludes the kernel source from the image (it would bloat it by several hundred MB) and GCC 4.6 was not the default (7.x is). Your help would be greatly appreciated.

Thanks,
David



Aurelien Bouteiller

unread,
Feb 26, 2018, 6:15:42 PM2/26/18
to Warewulf
On our cluster we apply the following methodology for both the CUDA and MIC drivers:

1. Run the same kernel as the deployed image on the preparation node (i.e. the node that holds the bootstrap images)
2. Install the MIC/NVIDIA drivers on the preparation node
3. Edit /etc/warewulf/bootstrap.conf to add nvidia-* and mic* to the bootstrap patterns
4. Build the bootstrap image
5. Install the CUDA/MIC libraries in the VNFS

At this point, the bootstrap itself contains the drivers.
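Steps 3-5 above might look roughly like this in Warewulf 3 commands. The chroot path, the exact `drivers +=` pattern line, and the CUDA package name are my assumptions, not a tested recipe:

```shell
# 3. Add the driver modules to the bootstrap patterns: append to the
#    "drivers +=" line in /etc/warewulf/bootstrap.conf, e.g.:
#    drivers += extra/nvidia-*, extra/mic*

# 4. Rebuild the bootstrap image for the running kernel
wwbootstrap $(uname -r)

# 5. Install the CUDA userspace libraries into the VNFS chroot,
#    then rebuild the VNFS image
yum -y --installroot=/var/chroots/gpu install cuda-libraries-8-0
wwvnfs --chroot /var/chroots/gpu
```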

David Anastasiu

unread,
Feb 26, 2018, 6:45:25 PM2/26/18
to ware...@lbl.gov
I ended up doing something similar. My initial problem was that the driver install program checks for the presence of the GPU, which we do not have on the head node. However, we were able to use the RPM package from NVIDIA in the GPU image chroot, which installed the drivers even though the GPU was not present.

A separate issue which I have not solved yet is installing CUDA and frameworks for use on the GPU nodes. I am looking to use Spack to build modules for potentially different versions of the frameworks (TensorFlow, Caffe, etc.). If anyone has a recipe for using Spack with modules (step-by-step instructions), I would greatly appreciate the help. I am not an admin, and figuring this out from scratch will take me some time between my other projects and teaching (I am faculty).
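Not a full recipe, but the basic Spack-plus-modules flow is roughly the following. The package name and module path are my assumptions (check `spack list` for what's actually packaged), and the `spack module` subcommand syntax has changed between Spack versions:

```shell
# Build a framework with Spack (names come from Spack's package index)
spack install py-tensorflow

# Regenerate TCL module files for everything Spack has installed
spack module tcl refresh -y

# Make the generated modules visible to the `module` command
module use "$(spack location -r)/share/spack/modules/$(spack arch)"
module avail
```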

--
David C. Anastasiu
Assistant Professor
Department of Computer Engineering 
San José State University

Lab: www.davidanastasiu.net
office: ENG 179
phone: (408) 924-2938

Ryan Novosielski

unread,
Feb 26, 2018, 9:00:22 PM2/26/18
to david.a...@sjsu.edu, Warewulf
Sorry for the delay. Been a busy few days. 

I actually don’t do this anymore, instead using the CUDA yum repository provided by NVIDIA to provide the drivers. It’s a bit of a pain in the neck to install an older driver, I’ve found — for example, we use the 375 stream instead of 384 and it wants to install the latest if you allow it to decide — but can be done by manually installing the one you want and the dependencies for it. It does make the image larger than a compute node image, but doesn’t pull in a lot of unnecessary stuff so far as I could tell. Note we don’t install the CUDA software itself in the image as we expect it to change from time to time, and it isn’t required. 
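As a sketch of that pinning, the commands below install the repo definition and then a specific driver stream into the GPU chroot. The repo RPM filename and the `cuda-drivers` package/version glob are illustrative and change between releases; check the repo index for the real names:

```shell
# Add NVIDIA's CUDA yum repo inside the GPU-node chroot
yum -y --installroot=/var/chroots/gpu install \
    http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-8.0.61-1.x86_64.rpm

# Install the specific driver stream rather than letting yum pick the latest
yum -y --installroot=/var/chroots/gpu install 'cuda-drivers-375*'

# Rebuild the VNFS so the nodes pick up the change
wwvnfs --chroot /var/chroots/gpu
```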

When I did do the below, it involved extracting the CUDA package, and then extracting the driver package within (there’s some sort of --extract-only flag or similar), and doing a “make” in the appropriate directory (it was pretty clear where if you’re familiar with compiling software — I’d have to check). 

--
____
|| \\UTGERS,       |---------------------------*O*---------------------------
||_// the State     |         Ryan Novosielski - novo...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ     | Office of Advanced Research Computing - MSB C630, Newark
    `'

Darin Schmidt

unread,
Feb 28, 2018, 12:48:29 PM2/28/18
to Warewulf, novo...@rutgers.edu
If you want to create a module for it, you can just copy an older one and change its paths. For example, mine are located at /app/modules/devel/cuda/8.0; if I wanted to make one for CUDA 9.0, I'd copy 8.0, name it 9.0, and modify its internals from:

#%Module
prepend-path PATH /app/cuda/cuda8.0/bin
prepend-path LD_LIBRARY_PATH /app/cuda/cuda8.0/lib64:/app/cuda/cuda8.0/lib

to

#%Module
prepend-path PATH /app/cuda/cuda9.0/bin
prepend-path LD_LIBRARY_PATH /app/cuda/cuda9.0/lib64:/app/cuda/cuda9.0/lib


AS LONG AS you keep the same directory structure when you install the new CUDA. My /app directory is shared, so I only have to install it once.
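The copy-and-edit step above can be done in one `sed` pass. This sketch mirrors the /app layout described here; run it from a scratch directory and adjust paths to your own modules tree:

```shell
# Recreate the existing CUDA 8.0 modulefile (stand-in for the real tree)
mkdir -p modules/devel/cuda
cat > modules/devel/cuda/8.0 <<'EOF'
#%Module
prepend-path PATH /app/cuda/cuda8.0/bin
prepend-path LD_LIBRARY_PATH /app/cuda/cuda8.0/lib64:/app/cuda/cuda8.0/lib
EOF

# Copy it to 9.0, rewriting every version-specific path in one pass
sed 's/cuda8\.0/cuda9.0/g' modules/devel/cuda/8.0 > modules/devel/cuda/9.0
```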
David


Ryan Novosielski

unread,
Feb 28, 2018, 12:54:51 PM2/28/18
to Warewulf
If you are using Lmod, you can do one better and use the exact same module file everywhere, letting the myModuleVersion() placeholder fill in the number based on the filename. I understand some sites keep a single file and symlink it into place wherever it's needed.

Here's mine, as an example. (I see I have an inconsistency: the module name is cuda while the directory in the modules tree is CUDA, otherwise myModuleName() would appear below too.) I either wrote this by hand or edited an example. (Now that I notice it, I also should change the second /opt/sw/packages to "tree" or not bother defining it; the stuff you notice on a second glance.)

help(
[[
This module loads the environment for the CUDA parallel computing platform.
]])

whatis("Description: CUDA: Parallel computing platform and programming model invented by NVIDIA")
whatis("URL: http://www.nvidia.com/object/cuda_home_new.html")

local tree = "/opt/sw/packages"
local base = pathJoin("/opt/sw/packages", myModuleName(), myModuleVersion())

setenv("CUDA_HOME", base)
prepend_path("PATH", pathJoin(base, "bin"))
prepend_path("LD_LIBRARY_PATH", pathJoin(base, "lib64"))
prepend_path("MANPATH", pathJoin(base, "doc/man"))

-- Set up MODULEPATH for packages built with this version of CUDA
local mroot = os.getenv("MODULEPATH_ROOT")
local mdir = pathJoin(mroot, "CUDA", myModuleVersion())
prepend_path("MODULEPATH", mdir)