After updating CUDA from 6.5 to 7.5, error loading shared library


林建廷

Nov 23, 2015, 5:02:35 AM
to kaldi-help
After I updated CUDA from 6.5 to 7.5, I recompiled and rebuilt Kaldi.
But when I ran "train.sh" for DNN training, it failed with the following errors:

[...]
# NN-INITIALIZATION
Getting input/output dims :
feat-to-dim 'ark:copy-feats scp:exp/dnn-919-2048_1800-cv-lr0006-hv07-dr00/train.scp ark:- | nnet-forward exp/dnn-919-2048_1800-cv-lr0006-hv07-dr00/final.feature_transform ark:- ark:- |' -
nnet-forward: error while loading shared libraries: libcublas.so.6.5: cannot open shared object file: No such file or directory
copy-feats scp:exp/dnn-919-2048_1800-cv-lr0006-hv07-dr00/train.scp ark:-
ERROR (feat-to-dim:main():feat-to-dim.cc:58) Could not read any features (empty archive?)
WARNING (feat-to-dim:Close():kaldi-io.cc:446) Pipe copy-feats scp:exp/dnn-919-2048_1800-cv-lr0006-hv07-dr00/train.scp ark:- | nnet-forward exp/dnn-919-2048_1800-cv-lr0006-hv07-dr00/final.feature_transform ark:- ark:- | had nonzero return status 32512
ERROR (feat-to-dim:main():feat-to-dim.cc:58) Could not read any features (empty archive?)
[...]

Why does it still try to load the old library (libcublas.so.6.5) instead of the new one (libcublas.so.7.5)?
Could anyone help me? Thanks.

Daniel Povey

Nov 23, 2015, 3:20:05 PM
to kaldi-help
Possibly it's a question of things not being recompiled after you changed your CUDA installation.  'make' does not always perfectly track system-level dependencies like that.  Doing 'make clean' in the cudamatrix/ directory and recompiling there, and then doing 'make' in src/, might help.
Of course, in all cases use a flag like '-j 8' with 'make', to keep it reasonably fast.
Dan
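
The rebuild order suggested above could be sketched as a small shell helper. The function name `rebuild_after_cuda_update` and its arguments are hypothetical, introduced only for illustration; the checkout location is an assumption you would replace with your own.

```shell
# Sketch: rebuild cudamatrix first, then the rest of src/.
# rebuild_after_cuda_update is a hypothetical helper name, not part of Kaldi.
rebuild_after_cuda_update() {
  kaldi_root="$1"
  jobs="${2:-8}"
  # Refuse to run if the expected source layout is missing.
  if [ ! -d "$kaldi_root/src/cudamatrix" ]; then
    echo "no src/cudamatrix directory under: $kaldi_root" >&2
    return 1
  fi
  # Force the CUDA wrappers to be rebuilt against the new toolkit,
  # then rebuild everything in src/ that depends on them.
  make -C "$kaldi_root/src/cudamatrix" clean &&
    make -C "$kaldi_root/src/cudamatrix" -j "$jobs" &&
    make -C "$kaldi_root/src" -j "$jobs"
}

# Example call (not run automatically; the path is an assumption):
#   rebuild_after_cuda_update "$HOME/kaldi" 8
```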



林建廷

Nov 23, 2015, 10:17:09 PM
to kaldi-help, dpo...@gmail.com
Thank you for the reply.
I followed your steps:
`make clean` in "src/cudamatrix", then `make`.
Then `make ext` in "src/" to build the online extensions.
But the error still occurred.

I found that I can run "nnet-forward" directly and it prints its usage message.
So it seems that the command works correctly, right?
I also ran `ldd nnet-forward` in "/src/nnetbin" and got the following:
      
        linux-vdso.so.1 =>  (0x00007ffd1268f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa562aaf000)
        libkaldi-nnet.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-nnet.so (0x00007fa562809000)
        libkaldi-cudamatrix.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-cudamatrix.so (0x00007fa5622b7000)
        libkaldi-lat.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-lat.so (0x00007fa561af8000)
        libkaldi-hmm.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-hmm.so (0x00007fa5615d7000)
        libkaldi-tree.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-tree.so (0x00007fa5612ae000)
        libkaldi-matrix.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-matrix.so (0x00007fa560f84000)
        libkaldi-util.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-util.so (0x00007fa560d12000)
        libkaldi-base.so => /usr/local/kaldi-trunk_test/src/lib/libkaldi-base.so (0x00007fa560b09000)
        libfst.so.1 => /usr/local/kaldi-trunk_test/tools/openfst/lib/libfst.so.1 (0x00007fa56066a000)
        /usr/local/kaldi-trunk_new/tools/ATLAS/build/install/lib/libsatlas.so (0x00007fa55fd67000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa55fa61000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa55f843000)
        libcublas.so.7.5 => /usr/local/cuda/lib64/libcublas.so.7.5 (0x00007fa55df64000)
        libcudart.so.7.5 => /usr/local/cuda/lib64/libcudart.so.7.5 (0x00007fa55dd06000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa55d9f9000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa55d7e2000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa55d41d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa562cb3000)
        libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007fa55d0fc000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa55cef4000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007fa55ccb5000)

There is no dependency on "libcublas.so.6.5", only on "libcublas.so.7.5".
So I don't know why it still tries to load the 6.5 one.

Thank you again :)
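
The `ldd` check above can be narrowed to just the CUDA libraries, which makes a stale version easier to spot. This is a minimal sketch: the helper name `filter_cuda_libs` and the list of library names are assumptions for illustration. A library that the loader cannot locate appears in `ldd` output as `=> not found`.

```shell
# Keep only the CUDA library lines from `ldd` output read on stdin.
# filter_cuda_libs is a hypothetical helper name; the library list is
# illustrative and may need extending for your build.
filter_cuda_libs() {
  # `|| true` keeps the exit status zero when nothing matches.
  grep -E 'libcu(blas|dart|fft|rand|sparse|solver)\.so' || true
}

# Example call (not run automatically):
#   ldd "$(command -v nnet-forward)" | filter_cuda_libs
```

Running it on the path returned by `command -v` (rather than on a binary in a directory you picked by hand) checks the same copy that the shell would execute.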





Daniel Povey

Nov 23, 2015, 10:17:22 PM
to kaldi-help
Possibly you have another version of Kaldi on your path in the location where you are running it.  You can do `which nnet-forward` to work out which version of the program you are picking up. 
If you can find someone local who is more experienced (in computers in general, not with Kaldi), they should be able to resolve the problem for you.
Dan



paul89...@speech.cm.nctu.edu.tw

Nov 24, 2015, 1:28:48 AM
to kaldi-help, dpo...@gmail.com
Thank you, I have resolved this problem.
When I run `which nnet-forward` directly on the command line, it returns "kaldi-trunk_test/src/nnetbin/nnet-forward".
"kaldi-trunk_test" is the version I recompiled.
But when I added `which nnet-forward` to "train.sh" to see which "nnet-forward" the script picks up,
I found that the path inside the script is "kaldi_trunk/src/nnetbin/nnet-forward" ("kaldi_trunk" is the original version from before the CUDA update).
Although I don't know why, I recompiled Kaldi again and named it "kaldi_trunk" instead of "kaldi-trunk_test".
Since then the problem has not occurred.



Xingyu Na

Nov 24, 2015, 1:40:19 AM
to kaldi...@googlegroups.com
train.sh takes its PATH from path.sh. Did you check that?

Xingyu
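
One way to see the mismatch described in this thread is to resolve the program once in the interactive shell and once with path.sh sourced. The sketch below is only an illustration: `resolve_with_path_sh` is a hypothetical helper name, and it assumes path.sh can be sourced without side effects other than setting environment variables.

```shell
# Print which copy of a program would be picked up once the given
# path.sh has been sourced. The work happens in a subshell, so the
# caller's own PATH is left untouched.
# resolve_with_path_sh is a hypothetical helper name, not part of Kaldi.
resolve_with_path_sh() {
  path_sh="$1"
  prog="$2"
  (
    . "$path_sh" >/dev/null 2>&1 || exit 1
    command -v "$prog"
  )
}

# Example calls (not run automatically), from the recipe directory:
#   command -v nnet-forward                         # interactive shell
#   resolve_with_path_sh ./path.sh nnet-forward     # what the scripts see
```

If the two paths differ, the scripts are running a different build from the one tested by hand. Call it from the recipe directory with `./path.sh`, since path.sh files often build PATH relative to the current directory.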

paul89...@speech.cm.nctu.edu.tw

Nov 24, 2015, 2:11:12 AM
to kaldi-help
Yes, you are right.
Sorry for my carelessness.



Sage Khan

Jul 27, 2022, 1:03:22 AM
to kaldi-help
I had a similar issue.
The error was as follows:

steps/online/nnet2/train_ivector_extractor.sh --cmd run.pl --nj 10 --num-processes 2 data/train_100h_sp_hires exp_gv_100h/nnet3_8000_160000/diag_ubm exp_gv_100h/nnet3_8000_160000/extractor
steps/online/nnet2/train_ivector_extractor.sh: doing Gaussian selection and posterior computation
Accumulating stats (pass 0)
Summing accs (pass 0)
Updating model (pass 0)
Accumulating stats (pass 1)
Summing accs (pass 1)
Updating model (pass 1)
Accumulating stats (pass 2)
Summing accs (pass 2)
Updating model (pass 2)
Accumulating stats (pass 3)
Summing accs (pass 3)
Updating model (pass 3)
Accumulating stats (pass 4)
Summing accs (pass 4)
Updating model (pass 4)
Accumulating stats (pass 5)
Summing accs (pass 5)
Updating model (pass 5)
Accumulating stats (pass 6)
Summing accs (pass 6)
Updating model (pass 6)
Accumulating stats (pass 7)
Summing accs (pass 7)
Updating model (pass 7)
Accumulating stats (pass 8)
Summing accs (pass 8)
Updating model (pass 8)
Accumulating stats (pass 9)
Summing accs (pass 9)
Updating model (pass 9)
local/chain/Run_ivector.sh: extracting iVectors for training data
utils/data/modify_speaker_info.sh: copied data from data/train_100h_sp_hires to exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/train_100h_sp_hires_max2, number of speakers changed from 117 to 10095
utils/validate_data_dir.sh: Successfully validated data-directory exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/train_100h_sp_hires_max2
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 60 exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/train_100h_sp_hires_max2 exp_gv_100h/nnet3_8000_160000/extractor exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
run.pl: 60 / 60 failed, log is in exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/log/extract_ivectors.*.log

$ cat exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/log/extract_ivectors.*.log >> ivector-error.txt 

The issue turned out to be that Kaldi could not find CUDA, probably because I updated CUDA after compiling Kaldi. As a result, steps/online/nnet2/extract_ivectors_online.sh could not run. The log showed:
ivector-extract-online2: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory
This means it cannot find your CUDA library. Try running it from your command line:
. ./path.sh
ivector-extract-online2

I went back to KALDI_ROOT/src and ran the build again. You can run a plain ./configure, or ./configure --shared --use-cuda --cudatk-dir=/usr/local/cuda ... Then run make clean, make depend, and make.

Also make sure the GPU is set to exclusive compute mode rather than default mode, using nvidia-smi.
To check:
nvidia-smi  --query | grep 'Compute Mode'

To change:
sudo nvidia-smi -c 3

This fixed the issue

Regards
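
The reconfigure-and-rebuild sequence from this post could be wrapped as follows. This is a sketch under stated assumptions: `reconfigure_and_rebuild` is a hypothetical helper name, the configure flags are the ones quoted in the post, and the default CUDA location /usr/local/cuda is taken from the post and may differ on your machine.

```shell
# Sketch: re-run configure with CUDA enabled, then do a full rebuild.
# reconfigure_and_rebuild is a hypothetical helper name, not part of Kaldi.
reconfigure_and_rebuild() {
  kaldi_root="$1"
  cuda_dir="${2:-/usr/local/cuda}"
  jobs="${3:-8}"
  # Refuse to run if there is no configure script where one is expected.
  if [ ! -x "$kaldi_root/src/configure" ]; then
    echo "no executable src/configure under: $kaldi_root" >&2
    return 1
  fi
  (
    # The subshell keeps the caller's working directory unchanged.
    cd "$kaldi_root/src" &&
      ./configure --shared --use-cuda --cudatk-dir="$cuda_dir" &&
      make clean &&
      make depend -j "$jobs" &&
      make -j "$jobs"
  )
}

# Example call (not run automatically; the path is an assumption):
#   reconfigure_and_rebuild "$HOME/kaldi" /usr/local/cuda 8
```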