No CUDA GPU detected! no CUDA-capable device is detected

Ryan Sharif

Jul 3, 2018, 5:43:33 PM
to kaldi-help

I'm running into a bit of trouble getting multi-GPU training working with a non-GridEngine install:

nnet3-chain-train --use-gpu=wait \
  --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 \
  --write-cache=exp/chain/tdnn1g_sp/cache.1 --xent-regularize=0.1 \
  --print-interval=10 --momentum=0.0 --max-param-change=1.41421356237 \
  --backstitch-training-scale=0.0 --backstitch-training-interval=1 \
  --l2-regularize-factor=0.5 --srand=0 \
  "nnet3-am-copy --raw=true --learning-rate=0.002 --scale=1.0 exp/chain/tdnn1g_sp/0.mdl - | nnet3-copy --edits='set-dropout-proportion name=* proportion=0.0' - - |" \
  exp/chain/tdnn1g_sp/den.fst \
  'ark,bg:nnet3-chain-copy-egs --frame-shift=1 ark:exp/chain/tdnn1g_sp/egs/cegs.1.ark ark:- | nnet3-chain-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:- | nnet3-chain-merge-egs --minibatch-size=128,64,32 ark:- ark:- |' \
  exp/chain/tdnn1g_sp/1.1.raw
ERROR (nnet3-chain-train[5.4.192~1-8ce3a]:SelectGpuId():cu-device.cc:134)
No CUDA GPU detected!, diagnostics: cudaError_t 38 : "no CUDA-capable device is detected", in cu-device.cc:134

The above error occurs when I explicitly set CUDA_VISIBLE_DEVICES=2 (the number of video cards in the system). If I don't set this variable, nvidia-smi reports that only a single card is in use. I've also tried several settings (passed as sketched below):

--use-gpu=wait
--trainer.optimization.num-jobs-initial=2
--trainer.optimization.num-jobs-final=2
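
For reference, these get passed through the recipe's training wrapper, something like this (just a sketch with most options omitted; the exact flags depend on the recipe version):

steps/nnet3/chain/train.py \
  --use-gpu=wait \
  --trainer.optimization.num-jobs-initial=2 \
  --trainer.optimization.num-jobs-final=2 \
  --dir=exp/chain/tdnn1g_sp   # other recipe options omitted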

For context, I'm trying this on a freshly pulled copy of Kaldi from GitHub using the mini_librispeech recipe.

Lastly, as a sanity check, I rebooted the machine and ran make test as well as the CUDA tests, e.g., cu-array-test, with no issues.
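
(Specifically, something along these lines, from the Kaldi source root:)

cd src && make test                 # full test suite
cd cudamatrix && ./cu-array-test    # one of the CUDA-specific tests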

Relevant hardware/software info:
OS: Ubuntu 18.04
Video Cards: Titan XP, GeForce GTX 1080
nvidia-smi reports driver 390.67, CUDA 9.1.85

I appreciate any help.

Daniel Povey

Jul 3, 2018, 5:51:24 PM
to kaldi-help
CUDA_VISIBLE_DEVICES is not the number of devices; it's a
comma-separated list of the device ids you want to be visible. E.g.
0,1 would mean both. As for why nvidia-smi is not picking up the other
device: I don't know, but messing with CUDA_VISIBLE_DEVICES
certainly won't help you.
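
For example, assuming the two cards enumerate as devices 0 and 1:

# expose both cards (ids, not a count):
export CUDA_VISIBLE_DEVICES=0,1
# or restrict a job to just the second card:
export CUDA_VISIBLE_DEVICES=1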

Ryan Sharif

Jul 5, 2018, 5:02:28 PM
to kaldi-help
I misunderstood what the variable represented. Explicitly setting this variable to the ids that nvidia-smi reports for the cards, i.e., 
CUDA_VISIBLE_DEVICES=0,1
got my setup to work with both cards. Thank you, Dan.
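
For anyone who hits this later, a quick way to confirm both cards are visible before launching training:

nvidia-smi -L    # lists each GPU with its device id, e.g.
                 # GPU 0: TITAN Xp (UUID: ...)
                 # GPU 1: GeForce GTX 1080 (UUID: ...)
export CUDA_VISIBLE_DEVICES=0,1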