Controlling GPU memory usage when using nnet3


Adam Nitzan

Apr 19, 2021, 2:23:37 AM
to kaldi-help
When running LibriSpeech training with nnet3 on GPUs with different memory capacities, the process always uses half of the available memory, even though I didn't change any batch-size parameter, etc.
Is there some dynamic allocation of the batch size according to the GPU memory size?
I want to utilize the memory better and tried changing different parameters in the run_tdnn script, like num-chunk-per-minibatch, but the memory utilization didn't change.
Would appreciate any help.
Thanks

Daniel Povey

Apr 19, 2021, 3:53:03 AM
to kaldi-help
It's the --cuda-memory-proportion option, but it's not exposed unless the binary calls RegisterCuAllocatorOptions()
before doing po.Read(argc, argv);
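
For reference, here is a minimal sketch of where that registration call would sit in a binary's main(); the headers and surrounding details are illustrative, not copied from an actual Kaldi binary:

    #include "util/parse-options.h"
    #include "cudamatrix/cu-allocator.h"

    int main(int argc, char *argv[]) {
      using namespace kaldi;
      // The usage string is just a placeholder for this sketch.
      const char *usage = "Some nnet3 training binary.\n";
      ParseOptions po(usage);

      // Registering the CUDA allocator options here is what exposes
      // --cuda-memory-proportion (along with the other allocator options)
      // on the command line; it has to happen before po.Read().
      RegisterCuAllocatorOptions(&po);

      po.Read(argc, argv);

      // ... the rest of the binary proceeds as usual ...
      return 0;
    }

A binary built this way would then accept e.g. --cuda-memory-proportion=0.2 to change the fraction of GPU memory grabbed initially.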




Adam Nitzan

Apr 19, 2021, 4:38:28 AM
to kaldi...@googlegroups.com
Thanks Daniel.
Is there any reason this is configured by default to 50%?
I didn't see any significant spikes in GPU memory usage during training, so I think I could safely raise this to 80-90% without the risk of crashing by exceeding the GPU memory.


Daniel Povey

Apr 19, 2021, 5:03:22 AM
to kaldi-help
That's just the initial amount it uses; it grabs more as it needs to.
The only real problem with allocating too small an amount is fragmenting the memory into small pieces,
which could in principle cause a problem with very large allocations (e.g. if no single region has enough memory).
So IMO you might want to decrease that, but likely not increase it.



Adam Nitzan

Apr 26, 2021, 6:33:48 AM
to kaldi...@googlegroups.com
From your answer I understand that the parameter is related to some initial memory allocation.
I was trying to ask about something else. Let me first clarify my premise and see if it makes sense
in the Kaldi context.

I am used to training models in TensorFlow and PyTorch.
When training with a GPU I usually see that the GPU cycle time is the same whether I use 20% of the memory
or 95% of the memory, so I would usually increase my batch size until I am close to full utilization of
the GPU memory, to maximize my throughput (I usually update the params once every few mini-batches).
Does this make sense in the Kaldi framework as well?
If so, my question was about how to achieve this.

(I am running Kaldi training using the standard LibriSpeech chain recipe:
kaldi/egs/librispeech/s5/local/chain/tuning/run_tdnn_1b.sh
During training I observe a constant usage of 50% of the GPU memory; I don't see the training process
grabbing more GPU memory along the way.)


Adam Nitzan

Apr 26, 2021, 6:47:00 AM
to kaldi...@googlegroups.com
Had a typo. I am using: kaldi/egs/librispeech/s5/local/chain/tuning/run_tdnn_1d.sh

Daniel Povey

Apr 26, 2021, 7:09:41 AM
to kaldi-help
You can try increasing the batch size, but I doubt that will help, as we have pretty much already tuned that in the recipes.
Kaldi is quite optimized in terms of memory usage during training, and the models we use tend to be on the smaller side, so we're
not really limited by batch size.

Adam Nitzan

Apr 26, 2021, 7:44:44 AM
to kaldi...@googlegroups.com
Thanks.
I think I'd like to play with it, even just to get a sense of the behaviour.
Which parameter controls the batch size?

I tried changing num-chunk-per-minibatch, but didn't see any change in GPU memory usage.
No other parameter exposed in the run_tdnn script seems to fit.
(--minibatch-size in nnet3-chain-merge-egs?)

Daniel Povey

Apr 26, 2021, 8:45:23 AM
to kaldi-help
Yes, num-chunk-per-minibatch.
You wouldn't see an effect on GPU memory until it was using more than the initially allocated amount.
Possibly a log message would be printed at the end of each run, saying the max amount of memory that was ever allocated.


Adam Nitzan

Apr 26, 2021, 8:51:33 AM
to kaldi...@googlegroups.com
Thanks, that makes sense.
Will try.


