Kaldi can't get CUDA context in EXCLUSIVE_PROCESS mode


kim.y...@gmail.com

Jun 14, 2016, 10:48:43 PM
to kaldi-help
The recent release of the CUDA 8.0 library does not support the EXCLUSIVE_THREAD compute mode... 

The problem is that when I run the Kaldi recipe on my single machine in EXCLUSIVE_PROCESS mode, 
the following error message appears: 

====== (start log message) ======

nnet3-train --print-interval=10 --momentum=0.5 --max-param-change=2.0 --optimization.min-deriv-time=0 'nnet3-am-copy --raw=true --learning-rate=0.0009 exp/nnet3/lstm_bidirectional_sp/0.mdl - |' 'ark,bg:nnet3-copy-egs --left-context=42 --right-context=42 ark:exp/nnet3/lstm_bidirectional_sp/egs/egs.1.ark ark:- | nnet3-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:-| nnet3-merge-egs --minibatch-size=50 --measure-output-frames=false --discard-partial-minibatches=true ark:- ark:- |' exp/nnet3/lstm_bidirectional_sp/1.1.raw 
WARNING (nnet3-train:SelectGpuId():cu-device.cc:137) Will try again to get a GPU after 20 seconds.
Wed Jun 15 11:45:39 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 27%   35C    P8     6W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
| 27%   36C    P8     5W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    Off  | 0000:82:00.0     Off |                  N/A |
| 27%   34C    P8     6W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    Off  | 0000:83:00.0     Off |                  N/A |
| 27%   29C    P8     6W / 180W |     10MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
LOG (nnet3-train:SelectGpuId():cu-device.cc:146) num-gpus=4. Device 0: all CUDA-capable devices are busy or unavailable.  Device 1: all CUDA-capable devices are busy or unavailable.  Device 2: all CUDA-capable devices are busy or unavailable.  Device 3: all CUDA-capable devices are busy or unavailable.  
ERROR (nnet3-train:SelectGpuId():cu-device.cc:147) Failed to create CUDA context, no more unused GPUs? 
ERROR (nnet3-train:SelectGpuId():cu-device.cc:147) Failed to create CUDA context, no more unused GPUs? 

====== (end log message) ======

Does Kaldi not support running in EXCLUSIVE_PROCESS mode? 

Hoping for your help^^ 

Daniel Povey

Jun 14, 2016, 10:55:58 PM
to kaldi-help
It doesn't make a difference whether it's in exclusive-thread or
exclusive-process mode.
It might be a permissions issue, sometimes you have to do
sudo chmod a+rwx /dev/nvidia*
It's also possible that there are processes on those GPUs which are
not being listed because it's a "cheap" GPU.. not sure if NVidia still
does that.
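
For example, to check whether it is a permissions problem (device names can vary a bit between setups):

# the NVIDIA device nodes should be readable and writable by the user running the job
ls -l /dev/nvidia*
# if they are not, the chmod above should fix it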

Dan

kim.y...@gmail.com

Jun 14, 2016, 11:59:02 PM
to kaldi-help, dpo...@gmail.com
Hi Dan, 
Your first suggestion of changing the permissions gave me the same error message, 
so it does not seem to be a permission issue...
 
As for your second guess about a cheap GPU, is there a direct way to check for that? 
Actually I am testing on NVIDIA's GTX 1080, which is the latest released product. 
If there is an un-listed process and the CUDA 8.0 library can't support EXCLUSIVE_PROCESS mode, 
that would be a bug in the CUDA 8.0 library!!! 

Thanks for your interest~

On Wednesday, June 15, 2016 at 11:55:58 AM UTC+9, Dan Povey wrote:

kim.y...@gmail.com

Jun 15, 2016, 12:14:20 AM
to kaldi-help, dpo...@gmail.com, kim.y...@gmail.com
Is there any way to check whether there are un-listed processes in EXCLUSIVE_PROCESS mode 
that would prevent us from getting a CUDA context? 

On Wednesday, June 15, 2016 at 12:59:02 PM UTC+9, kim.y...@gmail.com wrote:

Daniel Povey

Jun 15, 2016, 12:19:18 AM
to kim.y...@gmail.com, kaldi-help
Actually I don't think it's un-listed processes... it's printing the
header for the processes, so it must be prepared to list them
if they were there.
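
If you want to double-check that nothing is holding the devices open, fuser (if it's installed on your machine) will list any processes that have the device files open:

# prints the PID and command of any process that currently has an NVIDIA device open
sudo fuser -v /dev/nvidia*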

Searching Google for
"all CUDA-capable devices are busy or unavailable."
reveals that there are various reasons why this can happen, and they
all seem to be connected to driver problems. Investigate whether
other code that uses CUDA works with your GPU setup, and try rebooting
and uninstalling and reinstalling the drivers and the CUDA toolkit.
Dan

won...@gridspace.com

Jun 23, 2016, 6:57:45 PM
to kaldi-help, kim.y...@gmail.com, dpo...@gmail.com
I ran into the same issue after my computing infrastructure was upgraded to CUDA 8.0.

EXCLUSIVE_THREAD was deprecated in CUDA 7.5 and is now unsupported in CUDA 8.0.


Has anyone successfully run nnet2 (or nnet3) training in EXCLUSIVE_PROCESS compute mode?

Daniel Povey

Jun 23, 2016, 7:20:50 PM
to won...@gridspace.com, kaldi-help, Kim Young-Ik
EXCLUSIVE_PROCESS is fine. I'll change the documentation to say this.
Dan

Wonkyum Lee

Jun 23, 2016, 9:18:44 PM
to kaldi-help, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
I am using run.pl ...

SelectGpuId() does not seem to work properly with the EXCLUSIVE_PROCESS option. 


nnet3-train --print-interval=10 --momentum=0.0 --max-param-change=2.0 'nnet3-am-copy --raw=true --learning-rate=0.00143947980056 exp/nnet3/tdnn/1963.mdl - |' 'ark,bg:nnet3-copy-egs --frame=7 --left-context=16 --right-context=12 ark:exp/nnet3/tdnn/egs/egs.80.ark ark:- | nnet3-shuffle-egs --buffer-size=5000 --srand=1963 ark:- ark:-| nnet3-merge-egs --minibatch-size=512 --measure-output-frames=false --discard-partial-minibatches=true ark:- ark:- |' exp/nnet3/tdnn/1964.1.raw
WARNING (nnet3-train:SelectGpuId():cu-device.cc:137) Will try again to get a GPU after 20 seconds.
Thu Jun 23 18:06:13 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:06:00.0     Off |                    0 |
| N/A   28C    P8    26W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:07:00.0     Off |                    0 |
| N/A   25C    P8    28W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:0A:00.0     Off |                    0 |
| N/A   24C    P8    26W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:0B:00.0     Off |                    0 |
| N/A   26C    P8    28W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 0000:85:00.0     Off |                    0 |
| N/A   26C    P8    25W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 0000:86:00.0     Off |                    0 |
| N/A   31C    P8    29W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 0000:89:00.0     Off |                    0 |
| N/A   24C    P8    25W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 0000:8A:00.0     Off |                    0 |
| N/A   27C    P8    29W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
LOG (nnet3-train:SelectGpuId():cu-device.cc:146) num-gpus=8. Device 0: all CUDA-capable devices are busy or unavailable.  Device 1: all CUDA-capable devices are busy or unavailable.  Device 2: all CUDA-capable devices are busy or unavailable.  Device 3: all CUDA-capable devices are busy or unavailable.  Device 4: all CUDA-capable devices are busy or unavailable.  Device 5: all CUDA-capable devices are busy or unavailable.  Device 6: all CUDA-capable devices are busy or unavailable.  Device 7: all CUDA-capable devices are busy or unavailable.
ERROR (nnet3-train:SelectGpuId():cu-device.cc:147) Failed to create CUDA context, no more unused GPUs?

[ Stack-Trace: ]
nnet3-train() [0x7ae280]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::CuDevice::SelectGpuId(std::string)
main
__libc_start_main
nnet3-train() [0x591c59]

Daniel Povey

Jun 23, 2016, 10:09:58 PM
to Wonkyum Lee, kaldi-help, Kim Young-Ik
I doubt this is related to the EXCLUSIVE_PROCESS option. If you just
upgraded your CUDA toolkit to 8.0, it *may* be possible that your code
is out of date and needs to be compiled with a more recent compute
capability (I doubt it, but it's possible)-- you could update your
code, "make clean" in cudamatrix/, and recompile.
Dan

Wonkyum Lee

Jun 23, 2016, 11:46:59 PM
to kaldi-help, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
Thanks for the response, Dan. 

I removed the already-compiled Kaldi and recompiled it with a reasonable compute capability. However, the issue still remains the same. 

It seems that there is a problem getting the CUDA context. Let me look into it more.

Jan Trmal

Jun 23, 2016, 11:49:16 PM
to kaldi-help, won...@gridspace.com, kim.y...@gmail.com, Dan Povey
Wonkyum, did you try to run some of the examples from the SDK? Did those work?
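
For instance, deviceQuery is a quick one to try (the path below assumes the default CUDA 8.0 install location):

# build and run the deviceQuery sample as a basic sanity check
cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
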
y.

Daniel Povey

Jun 23, 2016, 11:57:10 PM
to Jan Trmal, kaldi-help, Wonkyum Lee, Kim Young-Ik
Hm. I don't know whether Kim Young-Ik found that reinstalling the
CUDA toolkit solved his problem.
The fact that he is the only one who had the same problem, and he also
was using the toolkit version 8.0, makes me think it might be a
problem with the 8.0 version of the toolkit that happens under certain
circumstances. I doubt that there is a problem in Kaldi code or
configuration that could make it incompatible with version 8.0, but
we'll find out.

Dan

Wonkyum Lee

Jun 24, 2016, 12:15:37 AM
to kaldi-help, jtr...@gmail.com, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
Thanks Dan and Yenda.

It seems that it is a CUDA 8.0 and driver problem. When I work in a CUDA 7.5 environment, both exclusive_thread and exclusive_process work fine. 

With CUDA 8.0 and its driver (367.27), it did not pass any of the SDK samples. I will talk to the Nvidia people. 

Thanks,
Wonkyum

Wonkyum Lee

Jun 24, 2016, 12:16:31 AM
to kaldi-help, jtr...@gmail.com, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
I meant that with CUDA 8.0 and its driver (367.27), the "exclusive_process" option did not pass any of the SDK samples.
Thanks,
Wonkyum

김영익

Jun 24, 2016, 12:22:42 AM
to Wonkyum Lee, kaldi-help, jtr...@gmail.com, dpo...@gmail.com
Sorry for the late response... 

I also think there is some problem in the CUDA 8.0 library when running in EXCLUSIVE_PROCESS mode. 

For my testing of the NVIDIA GTX 1080 devices, I have been running the CUDA 8.0 library in DEFAULT mode. 

To use my 4 GPU cards when running "run.pl", I used the trick of sleeping about 3 seconds before starting each process^^ There was no problem running in DEFAULT mode with CUDA 8.0. 
You can use this trick to use many GPUs~ 
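
Roughly like this (the sleep length and the wrapper script are just what I use for illustration, not something that ships with Kaldi):

# put all the cards into DEFAULT compute mode (0 = DEFAULT)
sudo nvidia-smi -c 0
# stagger the job starts so each training process has a few seconds to grab
# its GPU before the next one looks for a free device
for n in 0 1 2 3; do
  sleep 3
  ./train_one_job.sh $n &   # hypothetical wrapper around the actual nnet3-train command
done
wait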

Good-Luck~ 

--
Young-Ik Kim, Ph.D. / Department Head

NCSOFT Corporation

12, Daewangpangyo-ro 644beon-gil, Bundang-gu, Seongnam-si, Gyeonggi-do, 463-400, Korea

tel: 02-6201-8741   mobile: 010-4400-5643   e-mail: you...@ncsoft.com

Wonkyum Lee

Jun 24, 2016, 12:30:53 AM
to 김영익, kaldi-help, jtr...@gmail.com, dpo...@gmail.com
Thanks, Young Ik. 
That sounds like a reasonable quick fix. 

Wonkyum Lee

Jun 24, 2016, 2:10:43 PM
to kaldi-help, won...@gridspace.com, jtr...@gmail.com, dpo...@gmail.com, kim.y...@gmail.com
Nvidia is aware of this issue. They fixed this in the r361 branch yesterday. It works now. 
Thanks Dan and Yenda for having looked at this problem. I appreciate it.

Xiang Li

Jun 29, 2016, 10:57:18 PM
to kaldi-help, kim.y...@gmail.com
Instead of setting exclusive mode, setting CUDA_VISIBLE_DEVICES is an alternative.
I think it's a better method under some circumstances, for example when the number of 
GPUs is limited and you have other jobs to run on them.
But for Kaldi, some parallel GPU jobs are started by run.pl and others are started in the 
background by the shell (for example in train_multisplice_accel2.sh),
so setting CUDA_VISIBLE_DEVICES globally is not that straightforward.
It may work by adding the environment variable to the script that starts the GPU parallel jobs in the 
background, and using a version of run.pl that sets CUDA_VISIBLE_DEVICES when needed.
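
A rough sketch of that idea (this wrapper is hypothetical, not an existing Kaldi script):

#!/bin/bash
# run_on_gpu.sh (hypothetical): map a job number to one GPU and run the rest
# of the command line with only that device visible
job=$1; shift
export CUDA_VISIBLE_DEVICES=$(( (job - 1) % 4 ))   # assumes 4 GPUs, numbered 0-3
exec "$@"

Prefixing the GPU command with that wrapper (and passing the job number through) would pin each parallel job to its own card.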



Xiang Li

Jun 29, 2016, 11:01:19 PM
to kaldi-help
And for distributed training across multiple nodes, using Slurm,
which can automatically set CUDA_VISIBLE_DEVICES, makes it quite convenient
to run the scripts without setting the GPUs to exclusive mode.
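
For example (assuming the cluster's gres.conf has the GPUs configured):

# ask Slurm for one GPU; Slurm then exports CUDA_VISIBLE_DEVICES for the job
srun --gres=gpu:1 bash -c 'echo "allocated GPU(s): $CUDA_VISIBLE_DEVICES"'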

On Thursday, June 30, 2016 at 10:57:18 AM UTC+8, Xiang Li wrote:

Daniel Povey

Jun 30, 2016, 2:04:00 AM
to kaldi-help
Kaldi does not require you to use GPU exclusive mode; it just strongly
recommends it. If CUDA_VISIBLE_DEVICES is being correctly set, then
only the chosen device will be visible and Kaldi will choose the
desired device. So there is nothing to prevent you from running the
training in this way if that's what you want.
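
For reference, the compute mode can be switched per card with nvidia-smi (3 is EXCLUSIVE_PROCESS, 0 is DEFAULT):

# set all GPUs to exclusive-process mode, which is what Kaldi recommends
sudo nvidia-smi -c 3
# or set just one card, e.g. GPU 0
sudo nvidia-smi -i 0 -c 3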

Dan