GPU out of memory error


Jaskaran Singh Puri

Apr 30, 2019, 12:19:25 AM
to kaldi-help
I'm training an nnet3 model, but I'm getting a "GPU out of memory" error. The Nvidia GPU I have has 16 GB of memory, yet Kaldi still fails to allocate 3 GB when it needs it.

It also says to run the GPU in exclusive mode, which I cannot do because I don't have root permissions. Is there another way around this? I've already reduced the minibatch size from 128 to 32.
Should I reduce it further?

Please guide me.

Daniel Povey

Apr 30, 2019, 12:21:04 AM
to kaldi-help
You may be able to get around it by reducing the number of jobs (e.g. --num-jobs-initial and --num-jobs-final) to no more than the number of GPUs you have (e.g. 1).  The results may change though.
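For illustration, a minimal sketch of how that might be passed when calling the chain training script directly (option names here are as I recall them from steps/nnet3/chain/train.py and may differ in your version; most run_tdnn.sh-style recipes expose them as shell variables instead):

steps/nnet3/chain/train.py \
  --trainer.optimization.num-jobs-initial 1 \
  --trainer.optimization.num-jobs-final 1 \
  ...   # the rest of your usual training options, unchanged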



Jaskaran Singh Puri

Apr 30, 2019, 12:23:55 AM
to kaldi-help
Thanks, but are you saying it may have a significant impact on the WER?



Daniel Povey

Apr 30, 2019, 12:27:37 AM
to kaldi-help
It may have some impact as it could affect the tuning.  You should probably reduce the number of epochs by about 25% or so.
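For example (option name as I recall it; recipes usually set this through a num_epochs variable that ends up as --trainer.num-epochs): if you were training with 4 epochs, try 3.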



Jaskaran Singh Puri

Jun 22, 2019, 11:55:26 PM
to kaldi-help
So I'm running into this issue again. Kaldi is trying to allocate around 16 GB on the GPU, whereas I can see 32 GB of free memory in the GPU logs. I have both num-jobs parameters set to 1, and the batch size is 128.

I still can't run the GPU in exclusive mode. What could be a possible workaround here? Do I have to increase the number of GPUs, or reduce the batch size?

Daniel Povey

Jun 23, 2019, 11:20:14 AM
to kaldi-help
Sounds to me like you are not accurately describing the problem.  You should always show a screen paste.



Jaskaran Singh Puri

Jun 24, 2019, 2:51:29 AM
to kaldi-help
nnet3-chain-train --use-gpu=yes --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --read-cache=/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/cache.2722 --write-cache=/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/cache.2723 --xent-regularize=0.1 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --backstitch-training-interval=1 --l2-regularize-factor=1.0 --srand=2722 'nnet3-am-copy --raw=true --learning-rate=0.0002972304523027908 --scale=1.0 /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/2722.mdl - |' /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/den.fst 'ark,bg:nnet3-chain-copy-egs                          --frame-shift=2                         ark:/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/egs/cegs.427.ark ark:- |                         nnet3-chain-shuffle-egs --buffer-size=5000                         --srand=2722 ark:- ark:- | nnet3-chain-merge-egs                         --minibatch-size=64 ark:- ark:- |' /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/2723.1.raw
WARNING (nnet3-chain-train[5.5]:SelectGpuId():cu-device.cc:211) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:331) Selecting from 1 GPUs
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:346) cudaSetDevice(0): Tesla V100-SXM2-32GB free:32162M, used:318M, total:32480M, free/total:0.990198
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:393) Trying to select device: 0 (automatically), mem_ratio: 0.990198
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:412) Success selecting device 0 free mem ratio: 0.990198
LOG (nnet3-chain-train[5.5]:FinalizeActiveGpu():cu-device.cc:266) The active GPU is [0]: Tesla V100-SXM2-32GB free:31968M, used:512M, total:32480M, free/total:0.984225 version 7.0
nnet3-am-copy --raw=true --learning-rate=0.0002972304523027908 --scale=1.0 /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/2722.mdl -
LOG (nnet3-chain-train[5.5]:PrintMemoryUsage():cu-allocator.cc:368) Memory usage: 0/0 bytes currently allocated/total-held; 0/0 blocks currently allocated/free; largest free/allocated block sizes are 0/0; time taken total/cudaMalloc is 0/0.503543, synchronized the GPU 0 times out of 0 frees; device memory info: free:31968M, used:512M, total:32480M, free/total:0.984225maximum allocated: 0current allocated: 0
ERROR (nnet3-chain-train[5.5]:AllocateNewRegion():cu-allocator.cc:519) Failed to allocate a memory region of 16761487360 bytes.  Possibly this is due to sharing the GPU.  Try switching the GPUs to exclusive mode (nvidia-smi -c 3) and using the option --use-gpu=wait to scripts like steps/nnet3/chain/train.py.  Memory info: free:31968M, used:512M, total:32480M, free/total:0.984225


So I'm running this training with a batch size of 64, reduced from 128, and have the final-jobs parameter set to 1, i.e. the same as the number of GPUs.
I can't run this in exclusive mode due to the lack of root permissions. Is there any workaround for this? Should I keep reducing the batch size further?

It stopped at the 2700th iteration with a batch size of 64, and at the 700th iteration with a batch size of 128.

Please guide me.

Daniel Povey

Jun 24, 2019, 11:25:10 AM
to kaldi-help
It says it's trying to allocate 16G on a GPU with 32G of memory (which is a lot!), and essentially no memory on the GPU is currently being used (only 512M, which is likely reserved for system usage).  This doesn't really make sense.  I suspect a driver bug.
You could probably make it work just by starting again at that same iteration with the --stage option.  Likely the error is not repeatable.

Dan
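For illustration, resuming might look roughly like this; the script name and the --stage value are placeholders, and 2722 is just the iteration from the log above:

local/chain/run_tdnn.sh --stage 12 --train-stage 2722

In most recipes --train-stage is forwarded to train.py as its --stage option, so training picks up again at that iteration.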



Justin Luitjens

Jun 24, 2019, 11:28:06 AM
to kaldi...@googlegroups.com
Maybe the allocation is too large?  Can you try lowering the GPU memory proportion?

For example:
 --cuda-memory-proportion=0.1
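For instance, appended to the nnet3-chain-train command from the log above (assuming your Kaldi build registers the allocator options on that binary):

nnet3-chain-train --cuda-memory-proportion=0.1 --use-gpu=yes ...   # rest of the options as before

The failed request in your log (16761487360 bytes out of roughly 32 GB free) is about half of the free memory, so a smaller proportion should shrink the region the allocator asks for up front.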

Jaskaran Singh Puri

Jun 24, 2019, 12:50:50 PM
to kaldi-help
But this happened twice, as mentioned above.

Daniel Povey

Jun 24, 2019, 12:52:19 PM
to kaldi-help
Yes but not repeatably.  Likely driver or hardware issue.  Not Kaldi related, most likely.


Justin Luitjens

Jun 24, 2019, 1:40:06 PM
to kaldi...@googlegroups.com
Can you modify the error output to also output the error string?

i.e. in cu-allocator.cc add the line below:

 if (e != cudaSuccess) {
    PrintMemoryUsage();
    KALDI_ERR << "Failed to allocate memory.  CUDA error is " << cudaGetErrorString(e);  // ADD THIS LINE
    if (!CuDevice::Instantiate().IsComputeExclusive()) {
      KALDI_ERR << "Failed to allocate a memory region of " << region_size

                << " bytes.  Possibly this is due to sharing the GPU.  Try "
                << "switching the GPUs to exclusive mode (nvidia-smi -c 3) and using "
                << "the option --use-gpu=wait to scripts like "
                << "steps/nnet3/chain/train.py.  Memory info: "
                << mem_info;



Justin Luitjens

Jun 24, 2019, 10:51:17 PM
to kaldi...@googlegroups.com
A change to add the error string output was just merged in.  Please update and reproduce and provide the error message.

Jaskaran Singh Puri

Jun 30, 2019, 8:46:01 AM
to kaldi-help
The image has to be compiled again, right? I don't see the error message that I added to the .cc file anywhere in the train.xx.log files.



Justin Luitjens

Jun 30, 2019, 9:00:54 AM
to kaldi...@googlegroups.com
Yes, get the latest source and recompile.
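A rough sketch of what that usually involves (the path and -j value are placeholders; if you build inside a container image, the image itself has to be rebuilt or the source recompiled inside it):

cd /path/to/kaldi
git pull
cd src
make depend -j 8
make -j 8          # or run 'make clean' first if you want a fully clean build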


Jaskaran Singh Puri

Jul 14, 2019, 4:13:20 AM
to kaldi-help
I'm still getting the same error, and the line at https://github.com/kaldi-asr/kaldi/blob/master/src/cudamatrix/cu-allocator.cc, line 525:

<< " CUDA error: '" << cudaGetErrorString(e) << "'";

was not printed in my log file.

Justin Luitjens

Jul 14, 2019, 8:17:27 AM
to kaldi...@googlegroups.com
Are you sure you have the latest source?  If so, make a clean build.  If that doesn’t work, please include the full output.

Sent from my iPhone

Jaskaran Singh Puri

Jul 14, 2019, 1:12:28 PM
to kaldi-help
Yes, I got it compiled a couple of days ago.

The following is the log:


# nnet3-chain-train --use-gpu=yes --verbose=1 --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --read-cache=/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/cache.780 --write-cache=/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/cache.781 --xent-regularize=0.1 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --backstitch-training-interval=1 --l2-regularize-factor=1.0 --srand=780 "nnet3-am-copy --raw=true --learning-rate=0.0007063383326394145 --scale=1.0 /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/780.mdl - |" /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/den.fst "ark,bg:nnet3-chain-copy-egs                          --frame-shift=1                         ark:/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/egs/cegs.207.ark ark:- |                         nnet3-chain-shuffle-egs --buffer-size=5000                         --srand=780 ark:- ark:- | nnet3-chain-merge-egs                         --minibatch-size=128 ark:- ark:- |" /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/781.1.raw
# Started at Wed Jul 10 20:23:34 UTC 2019
#
nnet3-chain-train --use-gpu=yes --verbose=1 --apply-deriv-weights=False --l2-regularize=5e-05 --leaky-hmm-coefficient=0.1 --read-cache=/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/cache.780 --write-cache=/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/cache.781 --xent-regularize=0.1 --print-interval=10 --momentum=0.0 --max-param-change=2.0 --backstitch-training-scale=0.0 --backstitch-training-interval=1 --l2-regularize-factor=1.0 --srand=780 'nnet3-am-copy --raw=true --learning-rate=0.0007063383326394145 --scale=1.0 /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/780.mdl - |' /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/den.fst 'ark,bg:nnet3-chain-copy-egs --frame-shift=1 ark:/notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/egs/cegs.207.ark ark:- | nnet3-chain-shuffle-egs --buffer-size=5000 --srand=780 ark:- ark:- | nnet3-chain-merge-egs --minibatch-size=128 ark:- ark:- |' /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/781.1.raw
WARNING (nnet3-chain-train[5.5]:SelectGpuId():cu-device.cc:221) Not in compute-exclusive mode.  Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:349) Selecting from 1 GPUs
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:364) cudaSetDevice(0): Tesla V100-SXM2-16GB free:15812M, used:318M, total:16130M, free/total:0.980263
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:411) Trying to select device: 0 (automatically), mem_ratio: 0.980263
LOG (nnet3-chain-train[5.5]:SelectGpuIdAuto():cu-device.cc:430) Success selecting device 0 free mem ratio: 0.980263
LOG (nnet3-chain-train[5.5]:FinalizeActiveGpu():cu-device.cc:284) The active GPU is [0]: Tesla V100-SXM2-16GB free:15646M, used:484M, total:16130M, free/total:0.969971 version 7.0
nnet3-am-copy --raw=true --learning-rate=0.0007063383326394145 --scale=1.0 /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/780.mdl -
LOG (nnet3-chain-train[5.5]:PrintMemoryUsage():cu-allocator.cc:368) Memory usage: 0/0 bytes currently allocated/total-held; 0/0 blocks currently allocated/free; largest free/allocated block sizes are 0/0; time taken total/cudaMalloc is 0/0.283798, synchronized the GPU 0 times out of 0 frees; device memory info: free:15646M, used:484M, total:16130M, free/total:0.969971maximum allocated: 0current allocated: 0
ERROR (nnet3-chain-train[5.5]:AllocateNewRegion():cu-allocator.cc:519) Failed to allocate a memory region of 8204058624 bytes.  Possibly this is due to sharing the GPU.  Try switching the GPUs to exclusive mode (nvidia-smi -c 3) and using the option --use-gpu=wait to scripts like steps/nnet3/chain/train.py.  Memory info: free:15646M, used:484M, total:16130M, free/total:0.969971


[ Stack-Trace: ]
kaldi::MessageLogger::LogMessage() const
kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)
kaldi::CuMemoryAllocator::AllocateNewRegion(unsigned long)
kaldi::CuMemoryAllocator::MallocPitch(unsigned long, unsigned long, unsigned long*)
kaldi::CuMatrix<float>::Resize(int, int, kaldi::MatrixResizeType, kaldi::MatrixStrideType)
kaldi::CuMatrix<float>::Swap(kaldi::Matrix<float>*)
kaldi::CuMatrix<float>::Read(std::istream&, bool)
kaldi::nnet3::FixedAffineComponent::Read(std::istream&, bool)
kaldi::nnet3::Component::ReadNew(std::istream&, bool)
kaldi::nnet3::Nnet::Read(std::istream&, bool)
main
__libc_start_main
_start


WARNING (nnet3-chain-train[5.5]:Close():kaldi-io.cc:515) Pipe nnet3-am-copy --raw=true --learning-rate=0.0007063383326394145 --scale=1.0 /notebooks/jpuri/training_v3/chain_300k/exp/chain/tdnn_7b/780.mdl - | had nonzero return status 13
kaldi::KaldiFatalError
# Accounting: time=3 threads=1
# Ended (code 255) at Wed Jul 10 20:23:37 UTC 2019, elapsed time 3 seconds



Daniel Povey

Jul 14, 2019, 1:53:42 PM
to kaldi-help
If it happens only occasionally, it could be that two jobs are simultaneously trying to allocate memory, like a race condition.  Setting the GPU to exclusive mode and running train.py with --use-gpu=wait would fix it; reducing --cuda-memory-proportion to, say, 0.25 might help too.  In any case you can restart from where it failed, using the --stage and --train-stage options to the run_xxx.sh script (or the --stage option to train.py).
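For concreteness, a hedged sketch of the exclusive-mode route (the nvidia-smi step needs root, which is exactly the constraint here, so it would have to be done by whoever administers the machine; pass --use-gpu=wait wherever your recipe forwards options to train.py):

nvidia-smi -c 3                                           # once, as root: put the GPU in compute-exclusive mode
steps/nnet3/chain/train.py --use-gpu=wait --stage 780 ... # resume at the failed iteration, waiting for a free GPU

The 780 here is just the iteration from the last log; use whatever iteration your run actually failed at.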

