Dan,
Thanks for the feedback! Sounds like this may not go far.
Here's another angle: before I knew about CUDA_VISIBLE_DEVICES, when I ran training, it ran multiple jobs on one GPU, and I was limited by GPU memory (the GPUs were in the default compute mode, which allows multiple processes to share a GPU). I could run 3 jobs at once within 8 GB of memory. With one job per GPU in parallel, I can now do 4, and of course 4 > 3.
Is there a configuration that allows running both "wide" (across multiple GPUs) and "deep" (multiple processes per GPU), so you could run, say, 12 jobs at once, 3 per GPU? I tried setting CUDA_VISIBLE_DEVICES with the default compute mode and got an out-of-memory error; it looked like the jobs never knew to spill over to the next GPU.
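For what it's worth, here's a sketch of how I imagine the "wide and deep" launch could work: pin each job to one physical GPU via CUDA_VISIBLE_DEVICES, and let the default compute mode handle the 3 jobs sharing that GPU. The `echo` is a placeholder for the real training command, and the 4-GPU / 3-job counts are just my setup:

```shell
#!/bin/sh
# Hypothetical launcher: 4 GPUs x 3 jobs each = 12 jobs total.
# CUDA_VISIBLE_DEVICES restricts each job to one physical GPU,
# so jobs never try to spill onto a neighboring GPU; the default
# compute mode then lets the 3 jobs on each GPU share its memory.
for gpu in 0 1 2 3; do
  for slot in 1 2 3; do
    # Placeholder: substitute the actual training command here.
    CUDA_VISIBLE_DEVICES=$gpu echo "launch job $slot on GPU $gpu" &
  done
done
wait
```

Memory is still the binding constraint per GPU, of course: 3 jobs in 8 GB means each job has to stay under roughly 2.7 GB.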
Thanks!
Charles