Error: run 'configure' on a machine that has the CUDA compiler 'nvcc' available.


sreeji...@gmail.com

Oct 16, 2015, 9:18:56 AM
to kaldi-help


Hi
I am trying to train a deep neural network acoustic model using CUDA. CUDA is installed on the machine and the sample CUDA programs execute fine. However, when I try to train the model using Kaldi, it gives the error below.


*******************************************************************************************************************************
steps/nnet/pretrain_dbn.sh --rbm-iter 1 data-fmllr-tri3/train exp/dnn4_pretrain-dbn
# INFO
steps/nnet/pretrain_dbn.sh : Pre-training Deep Belief Network as a stack of RBMs
dir       : exp/dnn4_pretrain-dbn 
Train-set : data-fmllr-tri3/train 

### IS CUDA GPU AVAILABLE? 'LadyGaga-linux' ###
### CUDA WAS NOT COMPILED IN! ###
To support CUDA, you must run 'configure' on a machine that has the CUDA compiler 'nvcc' available.
# Accounting: time=0 threads=1
# Ended (code 1) at Thu Oct 15 15:01:59 ADT 2015, elapsed time 0 seconds
# steps/nnet/pretrain_dbn.sh --rbm-iter 1 data-fmllr-tri3/train exp/dnn4_pretrain-dbn 
# Started at Fri Oct 16 10:06:53 ADT 2015
#
steps/nnet/pretrain_dbn.sh --rbm-iter 1 data-fmllr-tri3/train exp/dnn4_pretrain-dbn
# INFO
steps/nnet/pretrain_dbn.sh : Pre-training Deep Belief Network as a stack of RBMs
dir       : exp/dnn4_pretrain-dbn 
Train-set : data-fmllr-tri3/train 
*******************************************************************************************************************************

I have run the configure script and tried again. But no success. Could someone help me please?



A screenshot of executing 'configure' and a sample CUDA program is given below.



But even after configuring, the issue persists.



Thanks,
Sreejith

Daniel Povey

Oct 16, 2015, 1:52:01 PM
to kaldi-help
It's not enough to run 'configure'; you have to run 'make' after that.  I think we were assuming the reader would infer this.
Dan


--
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jan Trmal

Oct 16, 2015, 1:55:31 PM
to kaldi-help
Another possibility is that (for some reason) the program "cuda-gpu-available" is not on the path or hasn't been compiled.
There is also an option to pretrain_dbn.sh, "--skip-cuda-check true", which you can try.
y.
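For reference, the flag Yenda mentions would be passed like this, reusing the invocation from the first post (a hypothetical sketch; note that skipping the check means training falls back to CPU and will be very slow):

```shell
# Hypothetical invocation: bypass the GPU check in pretrain_dbn.sh
steps/nnet/pretrain_dbn.sh --skip-cuda-check true --rbm-iter 1 \
  data-fmllr-tri3/train exp/dnn4_pretrain-dbn
```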

Jesus_Is_Lord

Dec 29, 2017, 11:28:46 AM
to kaldi-help
I have quite a similar issue, with this log:
/home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/steps/nnet/pretrain_dbn.sh : Pre-training Deep Belief Network as a stack of RBMs

dir       : /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn 

Train-set : /home/user/1111/kaldi/data-fmllr/kaldi/System1/train '10875'


LOG ([5.3.24~1-c948]:main():cuda-gpu-available.cc:49) 


### IS CUDA GPU AVAILABLE? 'marius' ###

### CUDA WAS NOT COMPILED IN! ###

To support CUDA, you must run 'configure' on a machine that has the CUDA compiler 'nvcc' available.
# Accounting: time=1 threads=1

# Ended (code 1) at Fri Dec 29 16:04:53 CET 2017, elapsed time 1 seconds

queue.pl: Error submitting jobs to queue (return status was 32512)

queue log file is /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log, command was qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* -o /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log -l gpu=1 -q g.q   /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.sh >>/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log 2>&1

Output of qsub was: sh: 1: qsub: not found


I've set export cuda_cmd="queue.pl --gpu 1". Do you have any idea? Thanks in advance!

Daniel Povey

Dec 29, 2017, 2:53:47 PM
to kaldi-help
Well, I don't recommend running that part of the script if you don't have an NVidia GPU (and you'd know if you had one; they are bulky)... because it will be extremely slow.

The error about qsub is because you are using queue.pl to parallelize (probably it's set that way in cmd.sh by default); if you don't have GridEngine installed you should use run.pl instead.  But again, don't run that part of the script, it would be super slow without a GPU.
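The switch Dan describes is typically a one-line change per command; a sketch assuming the recipe's usual cmd.sh layout (variable names may differ per recipe):

```shell
# cmd.sh (assumed layout): use local execution instead of GridEngine
export train_cmd=run.pl
export decode_cmd=run.pl
export cuda_cmd=run.pl
```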


--
Go to http://kaldi-asr.org/forums.html find out how to join

Jesus_Is_Lord

Dec 30, 2017, 5:14:39 PM
to kaldi-help
now I'm using run.sh, but have this error:

ln: failed to create symbolic link 'links/??-?.?': File exists

ln: failed to create symbolic link 'links/??-??.?': File exists

wsj_data_prep.sh: Spot check of command line arguments failed

Command line arguments must be absolute pathnames to WSJ directories

with names like 11-13.1.

Note: if you have old-style WSJ distribution,

local/cstr_wsj_data_prep.sh may work instead, see run.sh for example.


Daniel Povey

Dec 30, 2017, 6:03:58 PM
to kaldi-help
You probably don't even have the WSJ data.


Jesus_Is_Lord

Dec 30, 2017, 6:19:55 PM
to kaldi-help
No, I don't think I have it. Instead I'm trying to install GridEngine and use queue.pl. Which one do you think is easier?

Jesus_Is_Lord

Dec 31, 2017, 8:30:20 PM
to kaldi-help
I've installed and configured GridEngine, but still have this error:

queue.pl: Error submitting jobs to queue (return status was 256)

queue log file is /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log, command was qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* -o /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log    /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.sh >>/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log 2>&1

Output of qsub was: Unable to run job: job rejected: your user id 0 is lower than minimum user id 1000 of cluster configuration

warning: root's job is not allowed to run in any queue

Exiting.



Daniel Povey

Dec 31, 2017, 8:32:47 PM
to kaldi-help
you can probably do
qconf -mconf
and edit min_uid from 1000 to 0.
You may have to edit min_gid as well.



Jesus_Is_Lord

Jan 2, 2018, 6:46:11 AM
to kaldi-help
Thanks a lot! I edited both min_uid and min_gid from 1000 to 0, and the script seems to have moved one step further, but I still have this error:

queue.pl: Error submitting jobs to queue (return status was 256)

queue log file is /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log, command was qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* -o /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log    /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.sh >>/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/q/pretrain_dbn.log 2>&1

Output of qsub was: Unable to run job: warning: root's job is not allowed to run in any queue

Your job 9 ("pretrain_dbn.sh") has been submitted

Exiting.

Jesus_Is_Lord

Jan 2, 2018, 7:07:57 AM
to kaldi-help
Moreover, qstat shows: root@marius:/home/user/1111/kaldi/kaldi-script# qstat

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 

-----------------------------------------------------------------------------------------------------------------

      8 0.75000 pretrain_d root         qw    01/02/2018 12:47:47                                    1        

      9 0.25000 pretrain_d root         qw    01/02/2018 12:53:33                                    1        

Jan Trmal

Jan 2, 2018, 7:26:37 AM
to kaldi-help
I think you need to stop trying to run the tasks as root at this point and run them as a normal user.
y.


Jesus_Is_Lord

Jan 2, 2018, 7:36:48 AM
to kaldi-help
I've already tried that option (running as a normal user); unfortunately, it complains about a file writing permission/privilege issue.

Jan Trmal

Jan 2, 2018, 7:39:53 AM
to kaldi-help
Then you have to fix the access permissions in the directory.
y.

To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.

To post to this group, send email to kaldi...@googlegroups.com.

Jesus_Is_Lord

Jan 2, 2018, 8:30:12 AM
to kaldi-help
Fixed a permission issue, but now have this log. (I tried to add the user 'yonasd' through qconf -as yonasd, but it says: can't resolve hostname "yonasd".)


queue.pl: Error submitting jobs to queue (return status was 256)

queue log file is /home/user/1111/kaldi/data-fmllr/kaldi/System1/test/q/make_fmllr_feats.log, command was qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64* -o /home/user/1111/kaldi/data-fmllr/kaldi/System1/test/q/make_fmllr_feats.log   -t 1:2 /home/user/1111/kaldi/data-fmllr/kaldi/System1/test/q/make_fmllr_feats.sh >>/home/user/1111/kaldi/data-fmllr/kaldi/System1/test/q/make_fmllr_feats.log 2>&1

Output of qsub was: Unable to run job: warning: yonasd's job is not allowed to run in any queue

Your job-array 12.1-2:1 ("make_fmllr_feats.sh") has been submitted

Exiting.

Daniel Povey

Jan 2, 2018, 3:18:37 PM
to kaldi-help
You'd need to add the user at the UNIX level before adding it to GridEngine (qconf -am).  I don't think it would be a good use of our time to help you much further, as your questions mostly don't touch core Kaldi issues-- you need someone with more UNIX experience.


Jesus_Is_Lord

Jan 3, 2018, 1:27:09 PM
to kaldi-help
Dear Dan,
Sorry for taking your time; the Unix-support people aren't around due to the holiday. The user 'yonasd' is already in the sudo group. I also added it to the admin group and then ran qconf -am, but I still have the same error. qconf -sm gives:
yonasd@marius:~/1111/kaldi/kaldi-script$  qconf -sm

root

sgeadmin

yonasd


Please could you help me? thanks in advance!

Daniel Povey

Jan 3, 2018, 3:34:55 PM
to kaldi-help
I actually was wrong about the qconf -am command.  It adds the user as a manager, which is not necessary.  All users can run in the queue by default (assuming they are above the minimum userid).
Probably you are submitting the job as root.  You would need to change user (su) to the user you want to run as, but before that you'd have to chown the entire directory tree to that user, or else you'd get permission problems.
But if you have further questions, wait for your sysadmin; I have a lot to do.



Jesus_Is_Lord

Jan 4, 2018, 7:05:01 PM
to kaldi-help
Do you have any idea about:

Job 32 (-l arch=*64*) cannot run in queue "all.q@marius" because job requests unknown resource (arch)

Daniel Povey

Jan 4, 2018, 7:07:41 PM
to kaldi-help
Delete
 -l arch=*64*
wherever you find it in utils/queue.pl
and it should work.
Dan



Jesus_Is_Lord

Jan 4, 2018, 7:43:50 PM
to kaldi-help
It executes only one script and then hangs (seemingly while executing the next script) at this step: /home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/utils/validate_data_dir.sh. Here is the log:

/home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/steps/nnet/make_fmllr_feats.sh --nj 2 --cmd /home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/utils/queue.pl --transform-dir /home/user/1111/kaldi/exp/System1/tri3b/decode /home/user/1111/kaldi/data-fmllr/kaldi/System1/test data/test /home/user/1111/kaldi/exp/System1/tri3b /home/user/1111/kaldi/data-fmllr/kaldi/System1/test/log /home/user/1111/kaldi/data-fmllr/kaldi/System1/test/data

/home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/steps/nnet/make_fmllr_feats.sh: feature type is lda_fmllr

/home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/utils/copy_data_dir.sh: copied data from data/test to /home/user/1111/kaldi/data-fmllr/kaldi/System1/test

Checking /home/user/1111/kaldi/data-fmllr/kaldi/System1/test/text ...

--> reading /home/user/1111/kaldi/data-fmllr/kaldi/System1/test/text

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

/home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/utils/validate_data_dir.sh: Successfully validated data-directory /home/user/1111/kaldi/data-fmllr/kaldi/System1/test


I checked it with htop, but it seems to be doing nothing.

Daniel Povey

Jan 4, 2018, 9:04:16 PM
to kaldi-help
Probably an issue in your GridEngine setup; you can do `qstat` to see pending jobs and `qstat -j <job-id>` to see why it's pending.
But try to figure that out yourself.


Jesus_Is_Lord

Jan 5, 2018, 11:10:00 AM
to kaldi-help
As troubleshooting GridEngine is taking much time, I'm planning to run my script (run_dnn.sh) on Amazon EC2 to take advantage of its GPUs. Do you think that is doable, from your experience?

Daniel Povey

Jan 5, 2018, 3:33:02 PM
to kaldi-help
Sure, that might work.
In any case, GridEngine is not an alternative to using a GPU; if the script requires a GPU then installing GridEngine is not going to help you run it.
Dan



Jesus_Is_Lord

Jan 11, 2018, 6:48:14 PM
to kaldi-help
I'm googling what this problem means: bash: line 1: /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn_dnn/log/train_nnet.log: Text file busy. Maybe you have a quick answer. Thanks!

Daniel Povey

Jan 11, 2018, 6:51:08 PM
to kaldi-help
Probably you are trying to modify that file while also trying to execute it-- which doesn't make sense, as it is a log file.
you may have to do
lsof /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn_dnn/log/train_nnet.log
and kill any processes that are trying to execute it.  



Jesus_Is_Lord

Jan 11, 2018, 7:11:58 PM
to kaldi-help
Sorry, that didn't help. Just in case, I'm trying to put it in context:

yonasd@marius:~/1111/kaldi/kaldi-script$ ./05_run_dnn.sh 

Training .................

# /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn_dnn/log/train_nnet.log /home/user/1111/kaldi-trunk/kaldi/egs/wsj/s5/steps/nnet/train.sh --feature-transform /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/final.feature_transform --dbn /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/6.dbn --hid-layers 0 --learn-rate 0.008 /home/user/1111/kaldi/data-fmllr/kaldi/System1//home/user/1111/kaldi/data/train_tr90 /home/user/1111/kaldi/data-fmllr/kaldi/System1//home/user/1111/kaldi/data/train_cv10 /home/user/1111/kaldi/lang /home/user/1111/kaldi/exp/System1/tri3b_ali /home/user/1111/kaldi/exp/System1/tri3b_ali /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn_dnn 

# Started at Fri Jan 12 01:21:46 CET 2018

#

bash: line 1: /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn_dnn/log/train_nnet.log: Text file busy



Daniel Povey

Jan 11, 2018, 7:19:56 PM
to kaldi-help
Probably there was supposed to be run.pl at the beginning of that line and it got removed, causing the log file to be treated as something that needs to be executed.



Jesus_Is_Lord

Nov 14, 2018, 4:28:01 PM
to kaldi-help
I've tried to configure Kaldi after installing a GPU card (GT730) along with the nvidia drivers, nvidia-cuda-dev, and nvidia-cuda-toolkit.
'./configure' looks OK, as it gives me:


Configuring ...

Checking compiler clang++-3.8 ...

Checking OpenFst library in /home/yonasd/1111/kaldi-trunk/kaldi/tools/openfst ...

Doing OS specific configurations ...

On Linux: Checking for linear algebra header files ...

Using ATLAS as the linear algebra library.

Successfully configured for Debian/Ubuntu Linux [dynamic libraries] with ATLASLIBS =/usr/lib/libatlas.so.3  /usr/lib/libf77blas.so.3 /usr/lib/libcblas.so.3  /usr/lib/liblapack_atlas.so.3

Using CUDA toolkit /usr/ (nvcc compiler and runtime libraries)

Info: configuring Kaldi not to link with Speex (don't worry, it's only needed if you

intend to use 'compress-uncompress-speex', which is very unlikely)

SUCCESS

To compile: make clean -j; make depend -j; make -j

 ... or e.g. -j 10, instead of -j, to use a specified number of CPUs


Whereas, 'make clean -j 8; make depend -j 8; make -j 8' returns the following errors:

/home/yonasd/1111/kaldi-trunk/kaldi/tools/openfst/include/fst/fst.h:744: undefined reference to `fst::FstHeader::Write(std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'

clang: error: linker command failed with exit code 1 (use -v to see invocation)

<builtin>: recipe for target 'lattice-to-nbest' failed

make[2]: *** [lattice-to-nbest] Error 1

clang: error: linker command failed with exit code 1 (use -v to see invocation)

<builtin>: recipe for target 'lattice-prune' failed

make[2]: *** [lattice-prune] Error 1

make[2]: Leaving directory '/home/yonasd/1111/kaldi-trunk/kaldi/src/latbin'

Makefile:142: recipe for target 'latbin' failed

make[1]: *** [latbin] Error 2

make[1]: Leaving directory '/home/yonasd/1111/kaldi-trunk/kaldi/src'

Makefile:35: recipe for target 'all' failed

make: *** [all] Error 2


Please do you have any idea? Thanks in advance!

Jan Trmal

Nov 14, 2018, 4:32:17 PM
to kaldi...@googlegroups.com
This error message usually means that you compiled OpenFst (by accident) using a different compiler or with a differently set -stdlib parameter.
I cannot help more from just looking at what you wrote.
y.


Daniel Povey

Nov 14, 2018, 4:32:39 PM
to kaldi...@googlegroups.com
Likely the default system compiler changed after you installed OpenFst, leading to those linking errors.  If so, you should just remove the directory where OpenFst is installed (subdir of tools with name starting with 'openfst'), and do 'make' in tools/ again.



Jesus_Is_Lord

Nov 14, 2018, 6:42:56 PM
to kaldi-help
thanks, that fixed the errors!

Jesus_Is_Lord

Nov 14, 2018, 8:51:01 PM
to kaldi-help
Now the CUDA setup is operating under Compute Exclusive Mode during DNN training. But,

# PRE-TRAINING RBM LAYER 1
# initializing '/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/1.rbm.init'
# pretraining '/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/1.rbm' (input gauss, lrate 0.01, iters 28)

is taking more than 1:30 hours (not finished yet). I really don't understand why it's still slow. Should I further configure CUDA to speed up the training?

Daniel Povey

Nov 14, 2018, 8:55:20 PM
to kaldi...@googlegroups.com
I don't know how long that's expected to take; I don't really use the nnet1 scripts.  You would definitely have to run ./configure in src/, and recompile Kaldi, to make use of CUDA.  It would be clear from the log messages if you were using CUDA.


Jan Trmal

Nov 14, 2018, 8:57:19 PM
to kaldi...@googlegroups.com
Also, nvidia-smi might show you something, albeit on gaming cards I'm not sure how detailed the info would be.

Jesus_Is_Lord

Nov 14, 2018, 9:16:53 PM
to kaldi-help
I already did that after re-compiling OpenFst. CUDA seems to be detected, and Kaldi has started using it. I also have this log info:
### HURRAY, WE GOT A CUDA GPU FOR COMPUTATION!!! ##

### Testing CUDA setup with a small computation (setup = cuda-toolkit + gpu-driver + kaldi):
### Test OK!

# PREPARING FEATURES
copy-feats --compress=true scp:/home/user/1111/kaldi/data-fmllr/kaldi/System1/train/feats.scp ark,scp:/tmp/kaldi.kdwD/train.ark,/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/train_sorted.scp
LOG (copy-feats[5.3.24~1392-c948]:main():copy-feats.cc:143) Copied 10875 feature matrices.
# 'apply-cmvn' not used,
feat-to-dim 'ark:copy-feats scp:/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/train.scp ark:- |' -
copy-feats scp:/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/train.scp ark:-
WARNING (feat-to-dim[5.3.24~1392-c948]:Close():kaldi-io.cc:512) Pipe copy-feats scp:/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/train.scp ark:- | had nonzero return status 36096
# feature dim : 40 (input of 'feature_transform')
+ default 'feature_transform_proto' with splice +/-5 frames
nnet-initialize --binary=false /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/splice5.proto /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/tr_splice5.nnet
VLOG[1] (nnet-initialize[5.3.24~1392-c948]:Init():nnet-nnet.cc:314) <Splice> <InputDim> 40 <OutputDim> 440 <BuildVector> -5:5 </BuildVector>
LOG (nnet-initialize[5.3.24~1392-c948]:main():nnet-initialize.cc:63) Written initialized model to /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/tr_splice5.nnet
# compute normalization stats from 10k sentences
compute-cmvn-stats ark:- /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/cmvn-g.stats
nnet-forward --print-args=true --use-gpu=yes /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/tr_splice5.nnet 'ark:copy-feats scp:/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/train.scp.10k ark:- |' ark:-
LOG (nnet-forward[5.3.24~1392-c948]:SelectGpuId():cu-device.cc:178) CUDA setup operating under Compute Exclusive Mode.
LOG (nnet-forward[5.3.24~1392-c948]:FinalizeActiveGpu():cu-device.cc:234) The active GPU is [0]: GeForce GT 730    free:1936M, used:65M, total:2001M, free/total:0.967462 version 3.5
copy-feats scp:/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/train.scp.10k ark:-
LOG (copy-feats[5.3.24~1392-c948]:main():copy-feats.cc:143) Copied 10000 feature matrices.
LOG (nnet-forward[5.3.24~1392-c948]:main():nnet-forward.cc:192) Done 10000 files in 0.767585min, (fps 143295)
LOG (compute-cmvn-stats[5.3.24~1392-c948]:main():compute-cmvn-stats.cc:168) Wrote global CMVN stats to /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/cmvn-g.stats
LOG (compute-cmvn-stats[5.3.24~1392-c948]:main():compute-cmvn-stats.cc:171) Done accumulating CMVN stats for 10000 utterances; 0 had errors.
# + normalization of NN-input at '/home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/tr_splice5_cmvn-g.nnet'
LOG (nnet-concat[5.3.24~1392-c948]:main():nnet-concat.cc:53) Reading /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/tr_splice5.nnet
LOG (nnet-concat[5.3.24~1392-c948]:main():nnet-concat.cc:65) Concatenating cmvn-to-nnet /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/cmvn-g.stats -|
cmvn-to-nnet /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/cmvn-g.stats -
LOG (cmvn-to-nnet[5.3.24~1392-c948]:main():cmvn-to-nnet.cc:114) Written cmvn in 'nnet1' model to: -
LOG (nnet-concat[5.3.24~1392-c948]:main():nnet-concat.cc:82) Written model to /home/user/1111/kaldi/exp/System1/dnn4b_pretrain-dbn/tr_splice5_cmvn-g.nnet

but the change doesn't seem to speed up the training compared to the non-GPU setting, though I'm not sure how much time it should take.

Jesus_Is_Lord

Nov 14, 2018, 9:20:44 PM
to kaldi-help
@Yenda, this is what nvidia-smi is telling me:
yonasd@marius:~$ nvidia-smi
Thu Nov 15 01:04:11 2018      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   95C    P0    N/A /  N/A |    328MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

Daniel Povey

Nov 14, 2018, 9:28:30 PM
to kaldi...@googlegroups.com
I don't know how long it should take either-- it depends how much data you have.  It's certainly not super fast.   It should be a lot faster on CPU though.

Daniel Povey

Nov 14, 2018, 9:28:39 PM
to kaldi...@googlegroups.com
I mean it should be a lot faster than on CPU.

Jan Trmal

Nov 14, 2018, 9:57:25 PM
to kaldi...@googlegroups.com
OK, that doesn't say much. I guess you could check the temperature under normal conditions and then while running the pretraining. If the temperature doesn't change significantly, then you are probably not using the GPU.
Also make sure you did 'make clean && make'; otherwise you could end up with some programs using the GPU and some not (I think). But I'm speculating on this one -- it depends on too many things.
y.


joseph.an...@gmail.com

Nov 15, 2018, 5:30:44 AM
to kaldi-help
The GT 730 is a pretty slow card, and you only have 2GB of DDR3 memory. It may even be slower than current-gen CPUs with AVX512. I remember running some TensorFlow code, and it was far slower on a GT730 than with TensorFlow compiled with AVX2 support. I suggest you try with at least a GTX 1060 or a 1050 Ti.

Regards,
Anand

Jesus_Is_Lord

Jan 15, 2019, 12:52:53 AM
to kaldi-help
Hi Yenda,
I'm trying to build multilingual ASR models with Amharic and Swahili datasets using the Babel scripts. While the Swahili corpus is available at //catalog.ldc.upenn.edu/LDC2017S05, I couldn't find the Amharic one (IARPA-babel307b-v1.0b) in the LDC, and it also seems unavailable in other places like IARPA Babel. Could you point me to places where it's available? In the meantime, I'm trying to adapt the datasets available at http://www.openslr.org/25/ to the Babel multilingual scripts, but it seems time-consuming to get them to work, as these scripts were written for the Babel datasets.

thanks in advance!

Daniel Povey

Jan 15, 2019, 1:05:22 AM
to kaldi-help
They may not have publicly released all the datasets that were used in the evaluations.
