ivector-extract-online2: error while loading shared libraries: libcudart.so.10.1:

Sage Khan
Jul 27, 2022, 1:07:24 AM
to kaldi-help
NOTE: THIS IS A SOLVED ISSUE I WANTED TO SHARE:

I had an issue extracting iVectors with the nnet2 online scripts.
The error was as follows:

steps/online/nnet2/train_ivector_extractor.sh --cmd run.pl --nj 10 --num-processes 2 data/train_100h_sp_hires exp_gv_100h/nnet3_8000_160000/diag_ubm exp_gv_100h/nnet3_8000_160000/extractor
steps/online/nnet2/train_ivector_extractor.sh: doing Gaussian selection and posterior computation
Accumulating stats (pass 0)
Summing accs (pass 0)
Updating model (pass 0)
Accumulating stats (pass 1)
Summing accs (pass 1)
Updating model (pass 1)
Accumulating stats (pass 2)
Summing accs (pass 2)
Updating model (pass 2)
Accumulating stats (pass 3)
Summing accs (pass 3)
Updating model (pass 3)
Accumulating stats (pass 4)
Summing accs (pass 4)
Updating model (pass 4)
Accumulating stats (pass 5)
Summing accs (pass 5)
Updating model (pass 5)
Accumulating stats (pass 6)
Summing accs (pass 6)
Updating model (pass 6)
Accumulating stats (pass 7)
Summing accs (pass 7)
Updating model (pass 7)
Accumulating stats (pass 8)
Summing accs (pass 8)
Updating model (pass 8)
Accumulating stats (pass 9)
Summing accs (pass 9)
Updating model (pass 9)
local/chain/Run_ivector.sh: extracting iVectors for training data
utils/data/modify_speaker_info.sh: copied data from data/train_100h_sp_hires to exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/train_100h_sp_hires_max2, number of speakers changed from 117 to 10095
utils/validate_data_dir.sh: Successfully validated data-directory exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/train_100h_sp_hires_max2
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 60 exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/train_100h_sp_hires_max2 exp_gv_100h/nnet3_8000_160000/extractor exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
run.pl: 60 / 60 failed, log is in exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/log/extract_ivectors.*.log

$ cat exp_gv_100h/nnet3_8000_160000/ivectors_train_100h_sp_hires/log/extract_ivectors.*.log >> ivector-error.txt 

The issue turned out to be that Kaldi could not find the CUDA runtime library. I probably updated CUDA after compiling Kaldi, so steps/online/nnet2/extract_ivectors_online.sh was failing. The logs showed:

ivector-extract-online2: error while loading shared libraries: libcudart.so.10.1: cannot open shared object file: No such file or directory

In other words, the binary cannot find the CUDA library it was linked against. To reproduce this, try running it from your command line:

. ./path.sh
ivector-extract-online2
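A quick way to see every library the binary fails to resolve, rather than only the first one the loader complains about, is to filter the output of ldd. The following is a minimal sketch, not part of Kaldi: the helper name missing_libs is made up for illustration, and it assumes ivector-extract-online2 is on PATH after sourcing path.sh.

```shell
#!/bin/sh
# Hypothetical helper (not a Kaldi tool): reads `ldd` output on stdin and
# prints the name of every shared library reported as "not found".
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Run the check only if the Kaldi binary is on PATH (i.e. after `. ./path.sh`).
if command -v ivector-extract-online2 >/dev/null 2>&1; then
  ldd "$(command -v ivector-extract-online2)" | missing_libs
fi
```

If this prints libcudart.so.10.1, the binary was linked against a CUDA version that is no longer where the loader looks. An empty result means all shared libraries resolve.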

I went back to KALDI_ROOT/src and rebuilt Kaldi. You can run a plain ./configure, or you can run ./configure --shared --use-cuda --cudatk-dir=/usr/local/cuda ... Then run make clean, make depend, and make.

Also ensure the GPU is set to exclusive compute mode instead of default mode, using nvidia-smi.
To check:
nvidia-smi  --query | grep 'Compute Mode'

To change:
sudo nvidia-smi -c 3
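On a machine with several GPUs, the check above can be scripted. This is only a sketch: the helper name all_exclusive is made up, and it assumes that nvidia-smi --query-gpu=compute_mode --format=csv,noheader prints one mode name per GPU and that mode 3 is reported as Exclusive_Process. Verify both against your driver version.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if stdin has at least one line and
# every line is "Exclusive_Process" (surrounding whitespace is ignored).
all_exclusive() {
  awk '{
         gsub(/^[ \t]+|[ \t\r]+$/, "")
         n++
         if ($0 != "Exclusive_Process") bad = 1
       }
       END { if (bad || n == 0) exit 1 }'
}

# Query the compute mode of each GPU, if nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  if nvidia-smi --query-gpu=compute_mode --format=csv,noheader | all_exclusive; then
    echo "all GPUs are in exclusive-process mode"
  else
    echo "at least one GPU is not in exclusive-process mode (see: sudo nvidia-smi -c 3)"
  fi
fi
```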

(Found this in the CUDA Matrix documentation on kaldi-asr.org.)

This fixed the issue.

One more thing: we add the following to path.sh in the recipe:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/<path>/<to>/<cuda->/lib64
# example: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/cm/local/apps/cuda/libs/current/lib64
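Since path.sh gets sourced by many scripts, a plain append can add the same directory to LD_LIBRARY_PATH over and over, and leaves a leading colon when the variable starts out empty. A possible variant that avoids both is sketched below. The function name append_ld_path is made up for illustration, and the directory in the usage comment is just the example path from above.

```shell
# Hypothetical helper for path.sh: append a directory to LD_LIBRARY_PATH
# only if it is not already listed, without leaving a stray leading colon.
append_ld_path() {
  dir=$1
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) ;;  # already present, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
  esac
  export LD_LIBRARY_PATH
}

# Usage (replace with your own CUDA lib64 directory):
#   append_ld_path /cm/local/apps/cuda/libs/current/lib64
```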

Regards