Kaldi can't get CUDA context in EXCLUSIVE_PROCESS mode


kim.y...@gmail.com

Jun 14, 2016, 10:48:43 PM
to kaldi-help
The recent release of the CUDA 8.0 library does not support the EXCLUSIVE_THREAD compute mode... 

The problem is that when I run the Kaldi recipe on my single machine in EXCLUSIVE_PROCESS mode, 
the following error message appears: 

====== (start log message) ======

nnet3-train --print-interval=10 --momentum=0.5 --max-param-change=2.0 --optimization.min-deriv-time=0 'nnet3-am-copy --raw=true --learning-rate=0.0009 exp/nnet3/lstm_bidirectional_sp/0.mdl - |' 'ark,bg:nnet3-copy-egs --left-context=42 --right-context=42 ark:exp/nnet3/lstm_bidirectional_sp/egs/egs.1.ark ark:- | nnet3-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:-| nnet3-merge-egs --minibatch-size=50 --measure-output-frames=false --discard-partial-minibatches=true ark:- ark:- |' exp/nnet3/lstm_bidirectional_sp/1.1.raw 
WARNING (nnet3-train:SelectGpuId():cu-device.cc:137) Will try again to get a GPU after 20 seconds.
Wed Jun 15 11:45:39 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
| 27%   35C    P8     6W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:03:00.0     Off |                  N/A |
| 27%   36C    P8     5W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    Off  | 0000:82:00.0     Off |                  N/A |
| 27%   34C    P8     6W / 180W |      2MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    Off  | 0000:83:00.0     Off |                  N/A |
| 27%   29C    P8     6W / 180W |     10MiB /  8113MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
LOG (nnet3-train:SelectGpuId():cu-device.cc:146) num-gpus=4. Device 0: all CUDA-capable devices are busy or unavailable.  Device 1: all CUDA-capable devices are busy or unavailable.  Device 2: all CUDA-capable devices are busy or unavailable.  Device 3: all CUDA-capable devices are busy or unavailable.  
ERROR (nnet3-train:SelectGpuId():cu-device.cc:147) Failed to create CUDA context, no more unused GPUs? 
ERROR (nnet3-train:SelectGpuId():cu-device.cc:147) Failed to create CUDA context, no more unused GPUs? 

====== (end log message) ======

Does Kaldi not support running in EXCLUSIVE_PROCESS mode? 

Hoping for your help^^ 

Daniel Povey

Jun 14, 2016, 10:55:58 PM
to kaldi-help
It doesn't make a difference whether it's in exclusive-thread or
exclusive-process mode.
It might be a permissions issue, sometimes you have to do
sudo chmod a+rwx /dev/nvidia*
It's also possible that there are processes on those GPUs which are
not being listed because it's a "cheap" GPU.. not sure if NVidia still
does that.
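
For example, to check whether it is a permissions problem (device names can vary a bit between setups):

# the NVIDIA device nodes should be readable and writable by the user running the job
ls -l /dev/nvidia*
# if they are not, the chmod above should fix it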

Dan

kim.y...@gmail.com

Jun 14, 2016, 11:59:02 PM
to kaldi-help, dpo...@gmail.com
Hi Dan, 
Your first suggestion of changing the permissions gave me the same error message, 
so it does not seem to be a permission issue...
 
As for your second guess about a cheap GPU, is there a direct way to check for that? 
Actually I am testing on NVIDIA's GTX 1080, which is the latest released product. 
If there is an un-listed process and the CUDA 8.0 library can't support EXCLUSIVE_PROCESS mode, 
that would be a bug in the CUDA 8.0 library!!! 

Thanks for your interest~

On Wednesday, June 15, 2016 at 11:55:58 AM UTC+9, Dan Povey wrote:

kim.y...@gmail.com

Jun 15, 2016, 12:14:20 AM
to kaldi-help, dpo...@gmail.com, kim.y...@gmail.com
Is there any way to check whether there are un-listed processes in EXCLUSIVE_PROCESS mode 
that would prevent us from getting a CUDA context? 

On Wednesday, June 15, 2016 at 12:59:02 PM UTC+9, kim.y...@gmail.com wrote:

Daniel Povey

Jun 15, 2016, 12:19:18 AM
to kim.y...@gmail.com, kaldi-help
Actually I don't think it's un-listed processes... it's printing the
header for the processes, so it must be prepared to list them
if they were there.
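
If you want to double-check that nothing is holding the devices open, fuser (if it's installed on your machine) will list any processes that have the device files open:

# prints the PID and command of any process that currently has an NVIDIA device open
sudo fuser -v /dev/nvidia*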

Searching Google for
"all CUDA-capable devices are busy or unavailable."
reveals that there are various reasons why this can happen, and they
all seem to be connected to driver problems. Investigate whether
other code that uses CUDA works with your GPU setup, and try rebooting
and uninstalling and reinstalling the drivers and the CUDA toolkit.
Dan

won...@gridspace.com

Jun 23, 2016, 6:57:45 PM
to kaldi-help, kim.y...@gmail.com, dpo...@gmail.com
I ran into the same issue after my computing infrastructure was upgraded to CUDA 8.0.

EXCLUSIVE_THREAD was deprecated in CUDA 7.5 and is now unsupported in CUDA 8.0.


Has anyone successfully run nnet2 (or nnet3) training in EXCLUSIVE_PROCESS compute mode?

Daniel Povey

Jun 23, 2016, 7:20:50 PM
to won...@gridspace.com, kaldi-help, Kim Young-Ik
EXCLUSIVE_PROCESS is fine. I'll change the documentation to say this.
Dan

Wonkyum Lee

Jun 23, 2016, 9:18:44 PM
to kaldi-help, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
I am using run.pl ...

SelectGpuId() does not seem to work properly with the EXCLUSIVE_PROCESS option. 


nnet3-train --print-interval=10 --momentum=0.0 --max-param-change=2.0 'nnet3-am-copy --raw=true --learning-rate=0.00143947980056 exp/nnet3/tdnn/1963.mdl - |' 'ark,bg:nnet3-copy-egs --frame=7 --left-context=16 --right-context=12 ark:exp/nnet3/tdnn/egs/egs.80.ark ark:- | nnet3-shuffle-egs --buffer-size=5000 --srand=1963 ark:- ark:-| nnet3-merge-egs --minibatch-size=512 --measure-output-frames=false --discard-partial-minibatches=true ark:- ark:- |' exp/nnet3/tdnn/1964.1.raw
WARNING (nnet3-train:SelectGpuId():cu-device.cc:137) Will try again to get a GPU after 20 seconds.
Thu Jun 23 18:06:13 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27                 Driver Version: 367.27                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:06:00.0     Off |                    0 |
| N/A   28C    P8    26W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:07:00.0     Off |                    0 |
| N/A   25C    P8    28W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:0A:00.0     Off |                    0 |
| N/A   24C    P8    26W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:0B:00.0     Off |                    0 |
| N/A   26C    P8    28W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 0000:85:00.0     Off |                    0 |
| N/A   26C    P8    25W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 0000:86:00.0     Off |                    0 |
| N/A   31C    P8    29W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 0000:89:00.0     Off |                    0 |
| N/A   24C    P8    25W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 0000:8A:00.0     Off |                    0 |
| N/A   27C    P8    29W / 149W |      2MiB / 11439MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
LOG (nnet3-train:SelectGpuId():cu-device.cc:146) num-gpus=8. Device 0: all CUDA-capable devices are busy or unavailable.  Device 1: all CUDA-capable devices are busy or unavailable.  Device 2: all CUDA-capable devices are busy or unavailable.  Device 3: all CUDA-capable devices are busy or unavailable.  Device 4: all CUDA-capable devices are busy or unavailable.  Device 5: all CUDA-capable devices are busy or unavailable.  Device 6: all CUDA-capable devices are busy or unavailable.  Device 7: all CUDA-capable devices are busy or unavailable.
ERROR (nnet3-train:SelectGpuId():cu-device.cc:147) Failed to create CUDA context, no more unused GPUs?

[ Stack-Trace: ]
nnet3-train() [0x7ae280]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::CuDevice::SelectGpuId(std::string)
main
__libc_start_main
nnet3-train() [0x591c59]

Daniel Povey

Jun 23, 2016, 10:09:58 PM
to Wonkyum Lee, kaldi-help, Kim Young-Ik
I doubt this is related to the EXCLUSIVE_PROCESS option. If you just
upgraded your CUDA toolkit to 8.0, it *may* be possible that your code
is out of date and needs to be compiled with a more recent compute
capability (I doubt it, but it's possible)-- you could update your
code, "make clean" in cudamatrix/, and recompile.
Dan

Wonkyum Lee

Jun 23, 2016, 11:46:59 PM
to kaldi-help, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
Thanks for the response, Dan. 

I removed the already-compiled Kaldi and recompiled it with a reasonable compute capability. However, the issue still remains the same. 

It seems that there is a problem getting the CUDA context. Let me look into it more.

Jan Trmal

Jun 23, 2016, 11:49:16 PM
to kaldi-help, won...@gridspace.com, kim.y...@gmail.com, Dan Povey
Wonkyum, did you try to run some of the examples from the SDK? Did those work?
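
For instance, deviceQuery is a quick one to try (the path below assumes the default CUDA 8.0 install location):

# build and run the deviceQuery sample as a basic sanity check
cd /usr/local/cuda-8.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
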
y.

Daniel Povey

Jun 23, 2016, 11:57:10 PM
to Jan Trmal, kaldi-help, Wonkyum Lee, Kim Young-Ik
Hm. I don't know whether Kim Young-Ik found that reinstalling the
CUDA toolkit solved his problem.
The fact that he is the only one who had the same problem, and he also
was using the toolkit version 8.0, makes me think it might be a
problem with the 8.0 version of the toolkit that happens under certain
circumstances. I doubt that there is a problem in Kaldi code or
configuration that could make it incompatible with version 8.0, but
we'll find out.

Dan

Wonkyum Lee

Jun 24, 2016, 12:15:37 AM
to kaldi-help, jtr...@gmail.com, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
Thanks Dan and Yenda.

It seems that it is a CUDA 8.0 and driver problem. When I work in a CUDA 7.5 environment, both exclusive_thread and exclusive_process work fine. 

With CUDA 8.0 and its driver (367.27), it did not pass any of the SDK samples. I will talk to the Nvidia people. 

Thanks,
Wonkyum

Wonkyum Lee

Jun 24, 2016, 12:16:31 AM
to kaldi-help, jtr...@gmail.com, won...@gridspace.com, kim.y...@gmail.com, dpo...@gmail.com
I meant that with CUDA 8.0 and its driver (367.27), the "exclusive_process" option did not pass any of the SDK samples.
Thanks,
Wonkyum

김영익

Jun 24, 2016, 12:22:42 AM
to Wonkyum Lee, kaldi-help, jtr...@gmail.com, dpo...@gmail.com
Sorry for the late response... 

I also think there is some problem in the CUDA 8.0 library when running in EXCLUSIVE_PROCESS mode. 

For my testing of the NVIDIA GTX 1080 devices, I have been running the CUDA 8.0 library in DEFAULT mode. 

To use my 4 GPU cards when running "run.pl", I used the trick of sleeping about 3 seconds before starting each process^^ There was no problem running in DEFAULT mode with CUDA 8.0. 
You can use this trick to use many GPUs~ 
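
Roughly like this (the sleep length and the wrapper script are just what I use for illustration, not something that ships with Kaldi):

# put all the cards into DEFAULT compute mode (0 = DEFAULT)
sudo nvidia-smi -c 0
# stagger the job starts so each training process has a few seconds to grab
# its GPU before the next one looks for a free device
for n in 0 1 2 3; do
  sleep 3
  ./train_one_job.sh $n &   # hypothetical wrapper around the actual nnet3-train command
done
wait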

Good-Luck~ 

--
Young-Ik Kim, Ph.D. / Department Head

NCSOFT Corporation

12, Daewangpangyo-ro 644beon-gil, Bundang-gu, Seongnam-si, Gyeonggi-do, 463-400, Korea

tel: 02-6201-8741   mobile: 010-4400-5643   e-mail: you...@ncsoft.com

Wonkyum Lee

Jun 24, 2016, 12:30:53 AM
to 김영익, kaldi-help, jtr...@gmail.com, dpo...@gmail.com
Thanks, Young Ik. 
That sounds like a reasonable quick fix. 

Wonkyum Lee

Jun 24, 2016, 2:10:43 PM
to kaldi-help, won...@gridspace.com, jtr...@gmail.com, dpo...@gmail.com, kim.y...@gmail.com
Nvidia is aware of this issue. They fixed this in the r361 branch yesterday. It works now. 
Thanks Dan and Yenda for having looked at this problem. I appreciate it.

Xiang Li

Jun 29, 2016, 10:57:18 PM
to kaldi-help, kim.y...@gmail.com
Instead of setting exclusive mode, setting CUDA_VISIBLE_DEVICES is an alternative.
I think it's a better method under some circumstances, for example when the number of 
GPUs is limited and you have other jobs to run on them.
But for Kaldi, some parallel GPU jobs are started by run.pl and others are started in the 
background by the shell (for example in train_multisplice_accel2.sh),
so setting CUDA_VISIBLE_DEVICES globally is not that straightforward.
It may work by adding the environment variable to the script that starts the GPU parallel jobs in the 
background, and using a version of run.pl that sets CUDA_VISIBLE_DEVICES when needed.
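
A rough sketch of that idea (this wrapper is hypothetical, not an existing Kaldi script):

#!/bin/bash
# run_on_gpu.sh (hypothetical): map a job number to one GPU and run the rest
# of the command line with only that device visible
job=$1; shift
export CUDA_VISIBLE_DEVICES=$(( (job - 1) % 4 ))   # assumes 4 GPUs, numbered 0-3
exec "$@"

Prefixing the GPU command with that wrapper (and passing the job number through) would pin each parallel job to its own card.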



Xiang Li

Jun 29, 2016, 11:01:19 PM
to kaldi-help
And for distributed training across multiple nodes, using Slurm,
which can automatically set CUDA_VISIBLE_DEVICES, makes it quite convenient
to run the scripts without setting the GPUs to exclusive mode.
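
For example (assuming the cluster's gres.conf has the GPUs configured):

# ask Slurm for one GPU; Slurm then exports CUDA_VISIBLE_DEVICES for the job
srun --gres=gpu:1 bash -c 'echo "allocated GPU(s): $CUDA_VISIBLE_DEVICES"'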

On Thursday, June 30, 2016 at 10:57:18 AM UTC+8, Xiang Li wrote:

Daniel Povey

Jun 30, 2016, 2:04:00 AM
to kaldi-help
Kaldi does not require you to use GPU exclusive mode; it just strongly
recommends it. If CUDA_VISIBLE_DEVICES is being correctly set, then
only the chosen device will be visible and Kaldi will choose the
desired device. So there is nothing to prevent you from running the
training in this way if that's what you want.
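
For reference, the compute mode can be switched per card with nvidia-smi (3 is EXCLUSIVE_PROCESS, 0 is DEFAULT):

# set all GPUs to exclusive-process mode, which is what Kaldi recommends
sudo nvidia-smi -c 3
# or set just one card, e.g. GPU 0
sudo nvidia-smi -i 0 -c 3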

Dan