CUBLAS_STATUS_NOT_INITIALIZED Error

Prajwal Rao

Mar 6, 2018, 9:33:58 AM

to kaldi-help

Hi all,

I have been trying to run the LibriSpeech example with some GSM data.

Although my CUDA installation (version 8.0) is working all right, nnet-train-simple is throwing an error:



# nnet-shuffle-egs --buffer-size=5000 --srand=0 ark:exp/nnet5a_clean_100_gpu/egs/egs.1.0.ark ark:- | nnet-train-simple --minibatch-size=256 --srand=0 exp/nnet5a_clean_100_gpu/0.mdl ark:- exp/nnet5a_clean_100_gpu/1.1.mdl 
# Started at Tue Mar  6 19:47:59 IST 2018
#
nnet-shuffle-egs --buffer-size=5000 --srand=0 ark:exp/nnet5a_clean_100_gpu/egs/egs.1.0.ark ark:- 
nnet-train-simple --minibatch-size=256 --srand=0 exp/nnet5a_clean_100_gpu/0.mdl ark:- exp/nnet5a_clean_100_gpu/1.1.mdl 
WARNING (nnet-train-simple[5.2.119~123-807dc]:SelectGpuId():cu-device.cc:182) Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG (nnet-train-simple[5.2.119~123-807dc]:SelectGpuIdAuto():cu-device.cc:300) Selecting from 1 GPUs
LOG (nnet-train-simple[5.2.119~123-807dc]:SelectGpuIdAuto():cu-device.cc:315) cudaSetDevice(0): Quadro P5000 free:15373M, used:892M, total:16265M, free/total:0.945127
LOG (nnet-train-simple[5.2.119~123-807dc]:SelectGpuIdAuto():cu-device.cc:364) Trying to select device: 0 (automatically), mem_ratio: 0.945127
LOG (nnet-train-simple[5.2.119~123-807dc]:SelectGpuIdAuto():cu-device.cc:383) Success selecting device 0 free mem ratio: 0.945127
ERROR (nnet-train-simple[5.2.119~123-807dc]:FinalizeActiveGpu():cu-device.cc:217) cublasStatus_t 1 : "CUBLAS_STATUS_NOT_INITIALIZED" returned from 'cublasCreate(&handle_)'


[ Stack-Trace: ]


kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::CuDevice::FinalizeActiveGpu()
kaldi::CuDevice::SelectGpuId(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start




bash: line 1:  8284 Broken pipe             nnet-shuffle-egs --buffer-size=5000 --srand=0 ark:exp/nnet5a_clean_100_gpu/egs/egs.1.0.ark ark:-
      8285 Segmentation fault      (core dumped) | nnet-train-simple --minibatch-size=256 --srand=0 exp/nnet5a_clean_100_gpu/0.mdl ark:- exp/nnet5a_clean_100_gpu/1.1.mdl
# Accounting: time=2 threads=1
# Ended (code 139) at Tue Mar  6 19:48:01 IST 2018, elapsed time 2 seconds

Any suggestions?

Thanks in advance.

Regards,

Prajwal

Daniel Povey

Mar 6, 2018, 12:20:37 PM

to kaldi-help

First try running the tests in cudamatrix/.

If they fail, it is likely either:

- You have an incompatible version of cublas on your path... do something like `ldd ./cu-vector-test` to figure out which one.

- There is some issue of write permission to your directory ~/.nv/. Yenda reported problems one time when that was on NFS, due to some kind of permission issue. You can do e.g. `strace ./cu-vector-test` to see whether, before it dies, it attempts to access some subdirectory of ~/.nv/.

Also, that error can occasionally arise randomly (non-repeatably), when running multiple jobs over NFS, due to contention for locking a file in ~/.nv/. That is due to a driver bug whose fix is going to be released soon. But since it happened on the first iteration, that won't be your problem.
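The two checks above can be sketched as a small script. This is only an illustration, not part of Kaldi: the `BIN` variable and the helper names `filter_cuda_libs` and `nv_dir_status` are made up for this example, and the script assumes you run it from the directory holding a compiled cudamatrix test binary.

```shell
#!/usr/bin/env bash
# Sketch of the two checks suggested above (library resolution, ~/.nv access).
# BIN is an assumption: any compiled Kaldi cudamatrix test binary will do.
BIN="${BIN:-./cu-vector-test}"

# Reads `ldd` output on stdin and keeps only the CUDA runtime libraries.
filter_cuda_libs() {
  grep -E 'libcu(blas|dart|rand|sparse)' || true
}

# Prints "missing", "writable" or "not-writable" for the given directory.
nv_dir_status() {
  local dir="$1"
  if [ ! -e "$dir" ]; then
    echo "missing"
  elif [ -w "$dir" ]; then
    echo "writable"
  else
    echo "not-writable"
  fi
}

if [ -x "$BIN" ]; then
  # Which cuBLAS/cudart does the binary actually resolve at run time?
  ldd "$BIN" | filter_cuda_libs
  # Does the binary touch ~/.nv/ before it dies? (only if strace is installed)
  if command -v strace >/dev/null 2>&1; then
    strace -f -e trace=file "$BIN" 2>&1 | grep -F '/.nv/' || true
  fi
fi

echo "$HOME/.nv: $(nv_dir_status "$HOME/.nv")"
```

If the `ldd` lines point at a CUDA directory other than the one Kaldi was configured with, or the `strace` output shows a failed open under `~/.nv/`, that narrows down which of the two causes applies.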

Dan

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/ae6f4d65-e485-4a64-817b-92d2f8fb523a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joe

Jan 12, 2019, 10:27:04 PM

to kaldi-help

Hi Dan,

I encountered the same problem with cuda-9.0 and kaldi branch 5.4:

$ ./cu-device-test
LOG ([5.4.271~1-e50bd]:SelectGpuId():cu-device.cc:127) Manually selected to compute on CPU. 
......
LOG ([5.4.271~1-e50bd]:TestCuMatrixResize():cu-device-test.cc:76) For CuMatrix::Resize<double>, for size_multiple = 16, speed was 790.643 gigaflops.

LOG ([5.4.271~1-e50bd]:SelectGpuId():cu-device.cc:197) CUDA setup operating under Compute Exclusive Mode.
ERROR ([5.4.271~1-e50bd]:FinalizeActiveGpu():cu-device.cc:245) cublasStatus_t 1 : "CUBLAS_STATUS_NOT_INITIALIZED" returned from 'cublasCreate(&cublas_handle_)'


[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::FatalMessageLogger::~FatalMessageLogger()
kaldi::CuDevice::FinalizeActiveGpu()
kaldi::CuDevice::SelectGpuId(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
main
__libc_start_main
_start

terminate called after throwing an instance of 'std::runtime_error'
  what():
Aborted (core dumped)

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

$ ldd ./cu-device-test
        linux-vdso.so.1 =>  (0x00007ffdd41b2000)
        libcblas.so.3 => /usr/lib/libcblas.so.3 (0x00007fef76056000)
        liblapack_atlas.so.3 => /usr/lib/liblapack_atlas.so.3 (0x00007fef75dfa000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fef75bdd000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fef759d9000)
        libcublas.so.9.0 => /usr/local/cuda-9.0/lib64/libcublas.so.9.0 (0x00007fef725a3000)
        libcusparse.so.9.0 => /usr/local/cuda-9.0/lib64/libcusparse.so.9.0 (0x00007fef6ee3d000)
        libcudart.so.9.0 => /usr/local/cuda-9.0/lib64/libcudart.so.9.0 (0x00007fef6ebd0000)
        libcurand.so.9.0 => /usr/local/cuda-9.0/lib64/libcurand.so.9.0 (0x00007fef6ac6c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fef6a8ea000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fef6a5e1000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fef6a3cb000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fef6a001000)
        libatlas.so.3 => /usr/lib/libatlas.so.3 (0x00007fef69a63000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fef69738000)
        libf77blas.so.3 => /usr/lib/libf77blas.so.3 (0x00007fef69518000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fef76278000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fef69310000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fef690d1000)

The permissions on ~/.nv are OK.

The Kaldi source is the latest commit of the 5.4 branch, and I compiled Kaldi with:

./configure --cudatk-dir=/usr/local/cuda-9.0
make

Could you please give me any suggestions?

Thank you.

Cheers,

Joe


Daniel Povey

Jan 12, 2019, 10:51:29 PM

to kaldi-help

Check what I asked the original poster to check, i.e. whether the directory

~/.nv/

has write permissions.

I haven't seen this error for a while and may not remember exactly how to debug it.


Daniel Povey

Jan 12, 2019, 10:53:16 PM

to kaldi-help

Also possibly doing

export CUDA_CACHE_DISABLE=1

in your profile or the path.sh might help.
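For background, `CUDA_CACHE_DISABLE=1` turns off CUDA's just-in-time compilation cache, which on Linux lives under `~/.nv/ComputeCache` by default, so with it set the process should not need to lock or write files there. A minimal sketch of trying the workaround once and then persisting it follows; the helper name `add_cache_disable` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: persist the CUDA_CACHE_DISABLE workaround without duplicating it.
# add_cache_disable is a hypothetical helper, not a Kaldi script.

# Appends the export line to the given file unless it is already present.
add_cache_disable() {
  local file="$1"
  local line='export CUDA_CACHE_DISABLE=1'
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# One-off trial before changing any file (run by hand):
#   CUDA_CACHE_DISABLE=1 ./cu-device-test
```

If the one-off trial passes, `add_cache_disable ./path.sh` (from the recipe directory) or `add_cache_disable ~/.bashrc` would make it permanent; which file is appropriate depends on how your jobs are launched.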

Joe

Jan 13, 2019, 9:09:10 AM

to kaldi-help

Thank you Dan, I tried CUDA_CACHE_DISABLE=1 and it worked! (although I don't understand the reason)

Amazing!

gaoxing...@163.com

Mar 18, 2019, 10:22:38 AM

to dpovey, kaldi-help

Maybe I didn't make it clear. I mean that at the G level or the FSA level, one could artificially construct an <eps> transition parallel to the spoken-noise symbol, so that if there is no noise in the speech it can be skipped, and if there is, it can be detected.

On Sun, Mar 17, 2019 at 10:38 PM Daniel Povey <dpo...@gmail.com> wrote:
It is possible to work out from the normal alignments etc. whether there was silence there.
Search in run.sh for get_prons.sh and dict_dir_add_pronprobs.sh to see how they do it.

On Sun, Mar 17, 2019 at 10:30 PM gaoxing...@163.com <gaoxing...@163.com> wrote:

Also, I know that optional silence actually works on this principle. I would like to know which program implements it.

gaoxing...@163.com

From: Daniel Povey
Date: 2019-03-18 10:16
To: gaoxing...@163.com
Subject: Re: I have some question about alignment,thanks
You probably don't really want the #0 there, although in some circumstances it wouldn't matter because it would be deleted later anyway.

On Sun, Mar 17, 2019 at 10:13 PM gaoxing...@163.com <gaoxing...@163.com> wrote:
Hi, Dan

I want to get an alternative alignment graph, and I designed the following structure.

I want to skip a special word when that word does not really exist in the waveform.

But I found that the resulting alignments always traverse the "word" arc and never the "#0" arc.

Is something wrong?

Thanks.

[Attached image: Catch(03-18-10-2(03-18-22-21-32).jpg]

Daniel Povey

Mar 18, 2019, 11:35:16 AM

to kaldi-help

Well there are two separate issues: what to do in test time versus training. In training time you only have a linear transcript, at least using the normal scripts, so you have to do it in L.fst.

I don't see how what you propose really differs from the normal way we do optional silence, except you want the symbol to be displayed. It is quite possible to display it from the regular decoding, because the information is not lost. E.g. use the --silence-label option to lattice-align-words. That only works, though, if you have a single symbol for all your optional-silences.

Dan


gaoxing...@163.com

Mar 18, 2019, 9:58:40 PM

to kaldi-help

Thank you very much for your patient explanation.

I understand what you mean. In fact, I want two optional symbols, not just one, in both the testing and training phases. The two are silence and spoken noise.

Both can be tuned by weights assigned manually in advance, resulting in different decoding paths.

This method can segment speech more precisely.

Thanks.

Xinglong

gaoxing...@163.com


Daniel Povey

Mar 18, 2019, 10:21:16 PM

to kaldi-help

Unless you have specific supervision information to train them separately, I doubt very much that you would get any benefit out of it. Even if you do have supervision for noise etc., it's usually best just to map it to silence.

You could certainly try to change the make_lexicon_* scripts to support adding a second optional silence, but it would require an understanding of FST determinization issues and disambiguation symbols.

Dan


gaoxing...@163.com

Mar 18, 2019, 10:44:10 PM

to kaldi-help

Thank you very much.

Can I implement this strategy in the test phase?

I want to add an <eps> arc when constructing the decoding network.

This <eps> arc has a weight assigned in advance, and it is parallel to <spoken noise>. In this way, if noise exists, the path goes through <spoken noise>; otherwise it is skipped directly.

I've tried to do that.

However, I found that the decoding path always goes through the <spoken noise> arc.

gaoxing...@163.com


Daniel Povey

Mar 18, 2019, 10:54:47 PM

to kaldi-help

I suggest searching for hbka.pdf (chapter by Mohri) on FST-based decoding graph construction and reading it, to understand disambiguation symbols.

I doubt you will be able to do this, and I don't have time to go through the individual steps.


gaoxing...@163.com

Mar 18, 2019, 11:07:16 PM

to kaldi-help

Okay, thanks.

I think this would be much easier to implement with a tree-based decoder.

And <eps> cannot be inserted in transcripts or the ARPA file, or the result will not be determinizable.

All symbols should have physical meanings.

gaoxing...@163.com


Daniel Povey

Mar 18, 2019, 11:15:23 PM

to kaldi-help

that's not correct; read the thing I pointed you to.


gaoxing...@163.com

May 6, 2019, 1:52:51 AM

to kaldi-help

# nnet3-train --use-gpu=wait --read-cache=exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/cache.6 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --l2-regularize-factor=0.333333333333 --backstitch-training-interval=1 --srand=6 "nnet3-copy --learning-rate=0.00504944042974 --scale=1.0 exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/6.mdl - |" "ark,bg:nnet3-copy-egs --frame=5              ark:exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/egs/egs.21.ark ark:- |             nnet3-shuffle-egs --buffer-size=5000             --srand=6 ark:- ark:- |              nnet3-merge-egs --minibatch-size=512 ark:- ark:- |" exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/7.3.raw
# Started at Mon May  6 01:38:20 EDT 2019
#
nnet3-train --use-gpu=wait --read-cache=exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/cache.6 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --l2-regularize-factor=0.333333333333 --backstitch-training-interval=1 --srand=6 'nnet3-copy --learning-rate=0.00504944042974 --scale=1.0 exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/6.mdl - |' 'ark,bg:nnet3-copy-egs --frame=5              ark:exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/egs/egs.21.ark ark:- |             nnet3-shuffle-egs --buffer-size=5000             --srand=6 ark:- ark:- |              nnet3-merge-egs --minibatch-size=512 ark:- ark:- |' exp/nnet3/1000h_segmented_6w_other_offline_1024_deeper/7.3.raw
WARNING (nnet3-train[5.5]:SelectGpuId():cu-device.cc:207) Waited 0 seconds before creating CUDA context
LOG (nnet3-train[5.5]:SelectGpuId():cu-device.cc:216) CUDA setup operating under Compute Exclusive Mode.
ERROR (nnet3-train[5.5]:FinalizeActiveGpu():cu-device.cc:264) cublasStatus_t 1 : "CUBLAS_STATUS_NOT_INITIALIZED" returned from 'cublasCreate(&cublas_handle_)'

[ Stack-Trace: ]
kaldi::MessageLogger::LogMessage() const
kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)
kaldi::CuDevice::FinalizeActiveGpu()
kaldi::CuDevice::SelectGpuId(std::string)
main
__libc_start_main
nnet3-train() [0x497319]

kaldi::KaldiFatalError
# Accounting: time=1 threads=1
# Ended (code 255) at Mon May 6 01:38:21 EDT 2019, elapsed time 1 seconds

gaoxing...@163.com



Daniel Povey

May 6, 2019, 12:29:13 PM

to kaldi-help

Try adding

export CUDA_CACHE_DISABLE=1

to your path.sh.

That error can happen when there is difficulty locking something in the ~/.nv/ directory.


seiten kaku

Feb 19, 2020, 11:47:14 AM

to kaldi-help

I encountered the same problem after downgrading CUDA from 10.2 to 10.1.

Following Dan's suggestion, I ran 'ldd ./cu-vector-test' and found that libcublas.so and libcublasLt.so were still linked to their 10.2.xxx versions, so I modified the links and it worked.
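A check like the one described above can be sketched as follows. It only resolves the symlinks behind each CUDA library that `ldd` reports and prints the real file, so you can judge whether a stale version is being picked up; the helper name `resolve_cuda_libs` is made up for this example, and it assumes GNU `readlink -f` is available.

```shell
#!/usr/bin/env bash
# Sketch: show the real file behind every CUDA library a binary resolves.
# resolve_cuda_libs is a hypothetical helper; BIN is an assumption.
BIN="${BIN:-./cu-vector-test}"

# Reads `ldd` output on stdin. For lines such as
#   libcublas.so.10 => /path/libcublas.so.10 (0x...)
# prints "libcublas.so.10 -> /real/path/after/following/symlinks".
# Libraries reported as "not found" are skipped.
resolve_cuda_libs() {
  awk '$1 ~ /^libcu/ && $2 == "=>" && $3 ~ /^\// { print $1, $3 }' |
  while read -r soname path; do
    echo "$soname -> $(readlink -f "$path")"
  done
}

if [ -x "$BIN" ]; then
  ldd "$BIN" | resolve_cuda_libs
fi
```

How to repair a stale link (re-pointing the symlink, adjusting `LD_LIBRARY_PATH`, or reinstalling the toolkit) depends on how CUDA was installed on the machine.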

On Tuesday, March 6, 2018 at 10:33:58 PM UTC+8, Prajwal Rao wrote:
