Upgrading CUDA without reboot

Daniel Povey

Jan 14, 2017, 2:30:10 PM
to kaldi-developers
Does anyone have experience upgrading the CUDA toolkit without
rebooting the machine?
We need to upgrade the CUDA toolkit on our grid because some people
are using TensorFlow and they now support only the 8.0 toolkit, but
it's a hassle to have to reboot.

I've seen that doing
sudo rmmod nvidia
and then something like
nvidia-smi
will unload and then reload the kernel module, but I'm not sure if
this works when you upgrade the toolkit.

BTW, we don't use the Debian package; we download the installer
directly from NVIDIA.

Dan

Peter Smit

Jan 15, 2017, 3:51:08 AM
to kaldi-developers
There is no CUDA kernel module, so I guess you are talking about the NVIDIA GPU driver itself? You can first check whether the current NVIDIA GPU driver is actually supported by CUDA 8.0. Replacing the NVIDIA driver itself can indeed be done without a reboot, with "sudo rmmod nvidia" and "sudo nvidia-smi". Either way, you should make sure that no CUDA processes are running at the time.
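
The two checks mentioned above (is the installed driver new enough, and is anything still using the GPU) could be sketched roughly as follows. The helper names are hypothetical, and the minimum driver version is passed in as a parameter because the right value has to come from the release notes of the toolkit being installed; the 367.48 used in the usage line is an assumption for CUDA 8.0.44, not something stated in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nothing here is run automatically.

# version_at_least HAVE NEED
# Succeeds if version string HAVE is >= NEED (uses GNU `sort -V`).
version_at_least() {
    local have="$1" need="$2"
    # If NEED sorts first (or the two are equal), HAVE is new enough.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]
}

# Prints the version of the currently loaded driver (needs a working nvidia-smi).
installed_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1
}

# Lists compute processes currently using a GPU; empty output means none.
cuda_processes() {
    nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader
}
```

For example, `version_at_least "$(installed_driver_version)" 367.48 && echo "driver is new enough"`, and `cuda_processes` should print nothing before any module is unloaded.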

Contrary to popular belief, it is perfectly possible to install multiple compilers and CUDA versions on the same machine. If you don't have anything like that for your cluster yet, I suggest you look at EasyBuild and Lmod, which let you have different toolchains installed next to each other. We have this on our HPC cluster and it works very well.

--
You received this message because you are subscribed to the Google Groups "kaldi-developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-develope...@googlegroups.com.
To post to this group, send email to kaldi-de...@googlegroups.com.
Visit this group at https://groups.google.com/group/kaldi-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-developers/CAEWAuyQCKjg%3DfKyBTE18TW0kmK%2BS_%3DeS24PRRcrMEFiKSvCUOg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

vinc...@yahoo.com

Feb 17, 2017, 6:29:23 AM
to kaldi-developers, dpo...@gmail.com

Just FYI.

I never managed to change the driver version without rebooting. Maybe it works for others, but not for me. [I am talking about the driver itself, e.g. 367, 375, ...]

However, I was able to run multiple versions of the CUDA toolkit on my machine.

What needs to be done is as follows:


Install, for instance, 8.0.44 in /usr/local/cuda-8.0.44 and 8.0.61 in /usr/local/cuda-8.0.61,

then make /usr/local/cuda a symlink to one of those directories.

You also need to check the library path.

I found it easier to use /etc/ld.so.conf.d: change the cuda.conf in it to point to /usr/local/cuda,
then run sudo ldconfig.
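
As a rough sketch of the steps above: the two function names are hypothetical, and the `lib64` subdirectory written into cuda.conf is an assumption about where the toolkit keeps its shared libraries on 64-bit Linux. Both functions take the link and config paths as optional arguments so they can be tried out in a scratch directory first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; against the real paths they need to run as root.

# switch_cuda TOOLKIT_DIR [LINK]
# Points LINK (default /usr/local/cuda) at the given toolkit directory.
switch_cuda() {
    local target="$1" link="${2:-/usr/local/cuda}"
    if [ ! -d "$target" ]; then
        echo "no such toolkit directory: $target" >&2
        return 1
    fi
    # -s: symbolic, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file, not a destination
    ln -sfn "$target" "$link"
}

# write_cuda_ldconf [LINK] [CONF]
# Makes the dynamic-linker config refer to the symlink, not a fixed version.
write_cuda_ldconf() {
    local link="${1:-/usr/local/cuda}"
    local conf="${2:-/etc/ld.so.conf.d/cuda.conf}"
    printf '%s\n' "$link/lib64" > "$conf"   # lib64 is an assumption
}
```

Switching versions would then be something like `switch_cuda /usr/local/cuda-8.0.61` followed by `sudo ldconfig`. Because cuda.conf refers to the symlink, it only has to be written once.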


The only issue you may have is that CUDA 8 may require a newer driver than the one you currently have.

Daniel Povey

Feb 17, 2017, 12:27:06 PM
to kaldi-developers
We managed to upgrade the drivers without a reboot.  I think what we did, after upgrading the CUDA package, was
sudo rmmod nvidia-uvm
sudo rmmod nvidia
and then we ran
nvidia-smi
to reload the kernel module.
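
A slightly more defensive version of that sequence might look like the sketch below. The function names are hypothetical. The thread only mentions nvidia-uvm and nvidia; the `nvidia_drm` and `nvidia_modeset` entries are an assumption about what else may be loaded on some driver versions. Modules are removed dependants first, with the core `nvidia` module last.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nothing here is run automatically.

# Reads `lsmod` output on stdin and prints the loaded NVIDIA modules in an
# order that can be unloaded: dependants first, core `nvidia` module last.
nvidia_unload_order() {
    local loaded m
    # Skip the lsmod header line and keep only the module-name column.
    loaded="$(awk 'NR > 1 { print $1 }')"
    for m in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        if printf '%s\n' "$loaded" | grep -qx "$m"; then
            echo "$m"
        fi
    done
}

# Unloads whatever NVIDIA modules are loaded, then runs nvidia-smi, which
# the thread reports will load the (newly installed) module again.
reload_nvidia() {
    local m
    for m in $(lsmod | nvidia_unload_order); do
        # rmmod fails if the module is still in use; stop rather than go on.
        sudo rmmod "$m" || return 1
    done
    nvidia-smi
}
```

Running `lsmod | nvidia_unload_order` on its own shows what would be unloaded without touching anything.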


--
visit http://kaldi-asr.org/forums.html to find out how to join.