How to set up the correct options to run two GPUs with Kaldi nnet2


brucesp...@gmail.com

Jul 26, 2016, 8:14:20 PM
to kaldi-help

My hardware is:

Two Tesla M40 GPUs, each with 24GB of memory

Server: 64 CPUs, 129GB of memory

The involved script is:

https://github.com/kaldi-asr/kaldi/blob/master/egs/sre10/v1/local/dnn/train_multisplice_accel2.sh

 

The following runs successfully (it takes 6 days):

nvidia-smi -c 0

local/dnn/train_multisplice_accel2.sh --stage -10 --feat-type raw --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2" --num-epochs 6 --num-hidden-layers 6 --num-jobs-initial 3 --num-jobs-final 18 --num-threads 1 --minibatch-size 512 --parallel-opts "-l gpu=1" --mix-up 10500 --initial-effective-lrate 0.0015 --final-effective-lrate 0.00015 --cmd run.pl --egs-dir  --pnorm-input-dim 3500 --pnorm-output-dim 350 data/train_hires_asr data/lang exp/tri5a exp/nnet2_online/nnet_ms_a

 

I want to speed this up without sacrificing system performance, so I will try the following:

nvidia-smi -c 1

local/dnn/train_multisplice_accel2.sh --stage -10 --feat-type raw --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2" --num-epochs 6 --num-hidden-layers 6 --num-jobs-initial 3 --num-jobs-final 18 --num-threads 2 --minibatch-size 512 --parallel-opts "-l gpu=2" --mix-up 10500 --initial-effective-lrate 0.0015 --final-effective-lrate 0.00015 --cmd run.pl --egs-dir  --pnorm-input-dim 3500 --pnorm-output-dim 350 data/train_hires_asr data/lang exp/tri5a exp/nnet2_online/nnet_ms_a
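(For context on the nvidia-smi lines above: the -c flag sets the GPU compute mode. The numeric values below are how I understand the modes on drivers of that era; verify against `nvidia-smi -h` for your driver version, since newer drivers drop mode 1.)

```shell
# Compute-mode values for nvidia-smi -c (assumed; check your driver):
#   0 = DEFAULT            multiple processes may share each GPU
#   1 = EXCLUSIVE_THREAD   one context per GPU (later deprecated)
#   2 = PROHIBITED         no compute allowed on the GPU
#   3 = EXCLUSIVE_PROCESS  one process per GPU
# Kaldi's GPU setup generally recommends an exclusive mode so that each
# training job claims its own free GPU rather than piling onto one:
nvidia-smi -c 3
```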

 

Is the above modification right?

To make my question easier to answer:

I plan to replace the following:

--num-threads 1  --parallel-opts "-l gpu=1"

with

--num-threads 2  --parallel-opts "-l gpu=1"


Should I also replace the following:
--num-jobs-initial 3 --num-jobs-final 18
with

--num-jobs-initial 2 --num-jobs-final 2 (according to: https://groups.google.com/forum/#!searchin/kaldi-help/two$20gpu/kaldi-help/ZXBTNFZOb7k/iwOOA1fXAgAJ )

 

Thanks for all your help.

 

Bruce

Daniel Povey

Jul 26, 2016, 9:04:55 PM
to kaldi-help
No, the "-l gpu=1" option just tells it how to reserve a single GPU
via GridEngine (assuming you're using that). You shouldn't change
that.

You could try reducing the --num-jobs-initial and --num-jobs-final
both to 2 (which is how many GPUs you have) and reducing the number of
epochs from 6 to, say, 3 or 4. That would save a little time (due to
fewer epochs) and the performance might be about the same.
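Concretely, that change might look something like this (a sketch only, reusing the flags from your original command; epoch count of 4 is just one of the suggested values):

```shell
local/dnn/train_multisplice_accel2.sh --stage -10 --feat-type raw \
  --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2" \
  --num-epochs 4 --num-hidden-layers 6 \
  --num-jobs-initial 2 --num-jobs-final 2 \
  --num-threads 1 --minibatch-size 512 --parallel-opts "-l gpu=1" \
  --mix-up 10500 --initial-effective-lrate 0.0015 --final-effective-lrate 0.00015 \
  --cmd run.pl --pnorm-input-dim 3500 --pnorm-output-dim 350 \
  data/train_hires_asr data/lang exp/tri5a exp/nnet2_online/nnet_ms_a
```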

Dan
> --
> You received this message because you are subscribed to the Google Groups
> "kaldi-help" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kaldi-help+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Vijayaditya Peddinti

Jul 26, 2016, 9:05:26 PM
to kaldi-help
Even if you want to use multiple GPUs, num-threads has to be 1, as this value determines whether the GPU code path is used:

if [ $num_threads -eq 1 ]; then
  parallel_suffix="-simple" # this enables us to use GPU code if
                            # we have just one thread.
  parallel_train_opts=
  if ! cuda-compiled; then
    echo "$0: WARNING: you are running with one thread but you have not compiled"
    echo "   for CUDA.  You may be running a setup optimized for GPUs.  If you have"
    echo "   GPUs and have nvcc installed, go to src/ and do ./configure; make"
  fi
else
  parallel_suffix="-parallel"
  parallel_train_opts="--num-threads=$num_threads"
fi
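In other words (a minimal standalone sketch of that branch; the suffix is then used to pick the trainer binary, which I believe is nnet-train-simple vs. nnet-train-parallel in nnet2):

```shell
#!/bin/sh
# Standalone sketch of the branch above: with one thread the "-simple"
# suffix selects the GPU-capable code path; with more threads the
# "-parallel" (multi-threaded CPU) path is used instead.
num_threads=1

if [ "$num_threads" -eq 1 ]; then
  parallel_suffix="-simple"
  parallel_train_opts=""
else
  parallel_suffix="-parallel"
  parallel_train_opts="--num-threads=$num_threads"
fi

# The update stage calls a trainer binary named with this suffix:
echo "nnet-train${parallel_suffix}"
```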


As you have just 2 GPUs, you can use --num-jobs-initial 2 --num-jobs-final 2. This has the added benefit that optimization is better with a smaller number of jobs (see http://arxiv-web3.library.cornell.edu/abs/1410.7455). However, if you do not have sufficient data you might have to reduce the number of model parameters, as better optimization can in some cases lead to overfitting.

--Vijay


Bruce Park

Jul 27, 2016, 12:20:35 AM
to kaldi...@googlegroups.com
Daniel and Vijay,
Thank you so much for your timely support and detailed explanation.
That solves all my concerns.

Bruce
