How to set up the correct options to run two GPUs with Kaldi nnet2


brucesp...@gmail.com

Jul 26, 2016, 8:14:20 PM
to kaldi-help

My hardware is:

Two Tesla M40 GPUs, each with 24GB of memory

Server: 64 CPUs, 129GB of memory

The involved script is:

https://github.com/kaldi-asr/kaldi/blob/master/egs/sre10/v1/local/dnn/train_multisplice_accel2.sh

 

The following runs successfully (it takes 6 days):

nvidia-smi -c 0

local/dnn/train_multisplice_accel2.sh --stage -10 --feat-type raw --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2" --num-epochs 6 --num-hidden-layers 6 --num-jobs-initial 3 --num-jobs-final 18 --num-threads 1 --minibatch-size 512 --parallel-opts "-l gpu=1" --mix-up 10500 --initial-effective-lrate 0.0015 --final-effective-lrate 0.00015 --cmd run.pl --egs-dir  --pnorm-input-dim 3500 --pnorm-output-dim 350 data/train_hires_asr data/lang exp/tri5a exp/nnet2_online/nnet_ms_a

 

I want to speed this up without sacrificing system performance, so I will try the following:

nvidia-smi -c 1

local/dnn/train_multisplice_accel2.sh --stage -10 --feat-type raw --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2" --num-epochs 6 --num-hidden-layers 6 --num-jobs-initial 3 --num-jobs-final 18 --num-threads 2 --minibatch-size 512 --parallel-opts "-l gpu=2" --mix-up 10500 --initial-effective-lrate 0.0015 --final-effective-lrate 0.00015 --cmd run.pl --egs-dir  --pnorm-input-dim 3500 --pnorm-output-dim 350 data/train_hires_asr data/lang exp/tri5a exp/nnet2_online/nnet_ms_a
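(For context on the nvidia-smi lines above: the -c flag sets the GPU compute mode. The numeric values below are how I understand the modes on drivers of that era; verify against `nvidia-smi -h` for your driver version, since newer drivers drop mode 1.)

```shell
# Compute-mode values for nvidia-smi -c (assumed; check your driver):
#   0 = DEFAULT            multiple processes may share each GPU
#   1 = EXCLUSIVE_THREAD   one context per GPU (later deprecated)
#   2 = PROHIBITED         no compute allowed on the GPU
#   3 = EXCLUSIVE_PROCESS  one process per GPU
# Kaldi's GPU setup generally recommends an exclusive mode so that each
# training job claims its own free GPU rather than piling onto one:
nvidia-smi -c 3
```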

 

Is the above modification right?

To make my question easier to answer:

I plan to replace the following:

--num-threads 1  --parallel-opts "-l gpu=1"

with

--num-threads 2  --parallel-opts "-l gpu=1"


Should I also replace the following:
--num-jobs-initial 3 --num-jobs-final 18
with

--num-jobs-initial 2 --num-jobs-final 2 (according to: https://groups.google.com/forum/#!searchin/kaldi-help/two$20gpu/kaldi-help/ZXBTNFZOb7k/iwOOA1fXAgAJ )

 

Thanks for all your help.

 

Bruce

Daniel Povey

Jul 26, 2016, 9:04:55 PM
to kaldi-help
No, the "-l gpu=1" option just tells it how to reserve a single GPU
via GridEngine (assuming you're using that). You shouldn't change
that.

You could try reducing the --num-jobs-initial and --num-jobs-final
both to 2 (which is how many GPUs you have) and reducing the number of
epochs from 6 to, say, 3 or 4. That would save a little time (due to
fewer epochs) and the performance might be about the same.
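Concretely, that change might look something like this (a sketch only, reusing the flags from your original command; epoch count of 4 is just one of the suggested values):

```shell
local/dnn/train_multisplice_accel2.sh --stage -10 --feat-type raw \
  --splice-indexes "layer0/-2:-1:0:1:2 layer1/-1:2 layer3/-3:3 layer4/-7:2" \
  --num-epochs 4 --num-hidden-layers 6 \
  --num-jobs-initial 2 --num-jobs-final 2 \
  --num-threads 1 --minibatch-size 512 --parallel-opts "-l gpu=1" \
  --mix-up 10500 --initial-effective-lrate 0.0015 --final-effective-lrate 0.00015 \
  --cmd run.pl --pnorm-input-dim 3500 --pnorm-output-dim 350 \
  data/train_hires_asr data/lang exp/tri5a exp/nnet2_online/nnet_ms_a
```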

Dan
> --
> You received this message because you are subscribed to the Google Groups
> "kaldi-help" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kaldi-help+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Vijayaditya Peddinti

Jul 26, 2016, 9:05:26 PM
to kaldi-help
Even if you want to use multiple GPUs, num-threads has to be 1, as this value determines whether the GPU code path is used:

if [ $num_threads -eq 1 ]; then
  parallel_suffix="-simple" # this enables us to use GPU code if
                            # we have just one thread.
  parallel_train_opts=
  if ! cuda-compiled; then
    echo "$0: WARNING: you are running with one thread but you have not compiled"
    echo "   for CUDA.  You may be running a setup optimized for GPUs.  If you have"
    echo "   GPUs and have nvcc installed, go to src/ and do ./configure; make"
  fi
else
  parallel_suffix="-parallel"
  parallel_train_opts="--num-threads=$num_threads"
fi
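In other words (a minimal standalone sketch of that branch; the suffix is then used to pick the trainer binary, which I believe is nnet-train-simple vs. nnet-train-parallel in nnet2):

```shell
#!/bin/sh
# Standalone sketch of the branch above: with one thread the "-simple"
# suffix selects the GPU-capable code path; with more threads the
# "-parallel" (multi-threaded CPU) path is used instead.
num_threads=1

if [ "$num_threads" -eq 1 ]; then
  parallel_suffix="-simple"
  parallel_train_opts=""
else
  parallel_suffix="-parallel"
  parallel_train_opts="--num-threads=$num_threads"
fi

# The update stage calls a trainer binary named with this suffix:
echo "nnet-train${parallel_suffix}"
```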


As you have just 2 GPUs, you can use --num-jobs-initial 2 --num-jobs-final 2. This has the added benefit that optimization is better with a smaller number of jobs (see http://arxiv-web3.library.cornell.edu/abs/1410.7455). However, if you do not have sufficient data you might have to reduce the number of model parameters, as better optimization can in some cases lead to overfitting.

--Vijay


Bruce Park

Jul 27, 2016, 12:20:35 AM
to kaldi...@googlegroups.com
Daniel and Vijay,
Thank you so much for your timely support and detailed explanation.
That solves all my concerns.

Bruce
