Controlling GPU memory usage when using nnet3


Adam Nitzan

Apr 19, 2021, 2:23:37 AM
to kaldi-help
When running LibriSpeech training with nnet3 on GPUs with different memory capacities, the process always uses half of the available memory, even though I didn't change any batch-size parameter, etc.
Is there some dynamic allocation of the batch size according to the GPU memory size?
I want to utilize the memory better and tried changing different parameters in the run_tdnn script, like num-chunk-per-minibatch, but the memory utilization didn't change.
Would appreciate any help.
Thanks

Daniel Povey

Apr 19, 2021, 3:53:03 AM
to kaldi-help
It's the --cuda-memory-proportion option, but it's not exposed unless the binary calls RegisterCuAllocatorOptions()
before doing po.Read(argc, argv);
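
For reference, here is a minimal sketch of where that registration call would sit in a binary's main(); the headers and surrounding details are illustrative, not copied from an actual Kaldi binary:

    #include "util/parse-options.h"
    #include "cudamatrix/cu-allocator.h"

    int main(int argc, char *argv[]) {
      using namespace kaldi;
      // The usage string is just a placeholder for this sketch.
      const char *usage = "Some nnet3 training binary.\n";
      ParseOptions po(usage);

      // Registering the CUDA allocator options here is what exposes
      // --cuda-memory-proportion (along with the other allocator options)
      // on the command line; it has to happen before po.Read().
      RegisterCuAllocatorOptions(&po);

      po.Read(argc, argv);

      // ... the rest of the binary proceeds as usual ...
      return 0;
    }

A binary built this way would then accept e.g. --cuda-memory-proportion=0.2 to change the fraction of GPU memory grabbed initially.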




Adam Nitzan

Apr 19, 2021, 4:38:28 AM
to kaldi...@googlegroups.com
Thanks Daniel.
Is there any reason this is configured by default to 50%?
I didn't see any significant spikes in GPU memory usage during training, so I think I could safely raise this to 80-90% without the risk of crashing by exceeding the GPU memory.


Daniel Povey

Apr 19, 2021, 5:03:22 AM
to kaldi-help
That's just the initial amount it uses; it grabs more as it needs to.
The only real problem with allocating too small an amount is fragmenting the memory into small pieces,
which could in principle cause a problem with very large allocations (e.g. if no single region has enough memory).
So IMO you might want to decrease that, but likely not increase it.



Adam Nitzan

Apr 26, 2021, 6:33:48 AM
to kaldi...@googlegroups.com
From your answer I understand that the parameter is related to some initial memory allocation.
I was trying to ask about something else. Let me first clarify my premise and see if it makes sense
in the Kaldi context.

I am used to training models in TensorFlow and PyTorch.
When training with a GPU I usually see that the GPU cycle time is the same whether I use 20% of the memory
or 95% of the memory, so I would usually increase my batch size until I am close to full utilization of
the GPU memory, to maximize my throughput (I usually update the params once every few mini-batches).
Does this make sense in the Kaldi framework as well?
If so, my question was about how to achieve this.

(I am running Kaldi training using the standard LibriSpeech chain recipe:
kaldi/egs/librispeech/s5/local/chain/tuning/run_tdnn_1b.sh
During training I observe a constant usage of 50% of the GPU memory; I don't see the training process
grabbing more GPU memory along the way.)


Adam Nitzan

Apr 26, 2021, 6:47:00 AM
to kaldi...@googlegroups.com
Had a typo. I am using: kaldi/egs/librispeech/s5/local/chain/tuning/run_tdnn_1d.sh

Daniel Povey

Apr 26, 2021, 7:09:41 AM
to kaldi-help
You can try increasing the batch size, but I doubt that will help, as we have pretty much already tuned that in the recipes.
Kaldi is quite optimized in terms of memory usage during training, and the models we use tend to be on the smaller side, so we're
not really limited by batch size.

Adam Nitzan

Apr 26, 2021, 7:44:44 AM
to kaldi...@googlegroups.com
Thanks.
I think I'd like to play with it, even just to get a sense of the behaviour.
Which parameter controls the batch size?

I tried changing num-chunk-per-minibatch, but didn't see any change in GPU memory usage.
No other parameter exposed in the run_tdnn script seems to fit.
(--minibatch-size in nnet3-chain-merge-egs?)

Daniel Povey

Apr 26, 2021, 8:45:23 AM
to kaldi-help
Yes, num-chunk-per-minibatch.
You wouldn't see an effect on GPU memory until it was using more than the initially allocated amount.
Possibly a log message would be printed at the end of each run, saying the max amount of memory that was ever allocated.


Adam Nitzan

Apr 26, 2021, 8:51:33 AM
to kaldi...@googlegroups.com
Thanks, that makes sense.
Will try.


