WARNING (nnet3-latgen-faster:CheckMemoryUsage():determinize-lattice-pruned.cc:327) Did not reach requested beam in determinize-lattice: size exceeds maximum 10000000 bytes;
Can we change this memory limit, and how can we decode large files in parallel?
--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/d8c327fe-47aa-4467-aca2-7596b15f7c75%40googlegroups.com.
stdbuf -o 0 $KALDI_ROOT/src/$path/$decoder $cuda_flags \
  --frame-subsampling-factor=3 \
  --config="/notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/conf/online.conf" \
  --frames-per-chunk=264 --file-limit=-1 --max-mem=10000000 \
  --beam=30 --lattice-beam=6.0 --acoustic-scale=1.0 \
  --determinize-lattice=true --max-active=10000 \
  --word-symbol-table="/notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/graph_pp/words.txt" \
  /notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/final.mdl \
  /notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/graph_pp/HCLG.fst \
  "scp:/notebooks/jpuri/corning_gpu/wav.scp" \
  "ark:|gzip -c > /notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/decode_gpu/lat.$decoder.corning_gpu_60.gz" \
  2>&1 | tee gpu.log
/opt/kaldi/src/cudadecoderbin/batched-wav-nnet3-cuda \
  --cuda-use-tensor-cores=true --iterations=1 \
  --main-q-capacity=40000 --aux-q-capacity=500000 \
  --cuda-memory-proportion=.5 --max-batch-size=200 \
  --cuda-control-threads=2 --batch-drain-size=15 --cuda-worker-threads=54 \
  --frame-subsampling-factor=3 \
  --config=/notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/conf/online.conf \
  --frames-per-chunk=264 --file-limit=-1 --max-mem=10000000 \
  --beam=30 --lattice-beam=6.0 --acoustic-scale=1.0 \
  --determinize-lattice=true --max-active=10000 \
  --word-symbol-table=/notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/graph_pp/words.txt \
  /notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/final.mdl \
  /notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/graph_pp/HCLG.fst \
  scp:/notebooks/jpuri/corning_gpu/wav.scp \
  'ark:|gzip -c > /notebooks/jpuri/aspire_original/s5/exp/tdnn_7b_chain_online/decode_gpu/lat.batched-wav-nnet3-cuda.corning_gpu_60.gz'
WARNING (batched-wav-nnet3-cuda[5.5]:SelectGpuId():cu-device.cc:221) Not in compute-exclusive mode. Suggestion: use 'nvidia-smi -c 3' to set compute exclusive mode
LOG (batched-wav-nnet3-cuda[5.5]:SelectGpuIdAuto():cu-device.cc:349) Selecting from 1 GPUs
LOG (batched-wav-nnet3-cuda[5.5]:SelectGpuIdAuto():cu-device.cc:364) cudaSetDevice(0): Tesla V100-SXM2-16GB free:15756M, used:374M, total:16130M, free/total:0.976791
LOG (batched-wav-nnet3-cuda[5.5]:SelectGpuIdAuto():cu-device.cc:411) Trying to select device: 0 (automatically), mem_ratio: 0.976791
LOG (batched-wav-nnet3-cuda[5.5]:SelectGpuIdAuto():cu-device.cc:430) Success selecting device 0 free mem ratio: 0.976791
LOG (batched-wav-nnet3-cuda[5.5]:FinalizeActiveGpu():cu-device.cc:284) The active GPU is [0]: Tesla V100-SXM2-16GB free:15592M, used:538M, total:16130M, free/total:0.966624 version 7.0
LOG (batched-wav-nnet3-cuda[5.5]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (batched-wav-nnet3-cuda[5.5]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (batched-wav-nnet3-cuda[5.5]:Collapse():nnet-utils.cc:1378) Added 1 components, removed 2
LOG (batched-wav-nnet3-cuda[5.5]:Initialize():batched-threaded-nnet3-cuda-pipeline.cc:32) BatchedThreadedNnet3CudaPipeline Initialize with 2 control threads, 54 worker threads and batch size 200
LOG (batched-wav-nnet3-cuda[5.5]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (batched-wav-nnet3-cuda[5.5]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (batched-wav-nnet3-cuda[5.5]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 4087349248 bytes; current memory info is: free:7794M, used:8336M, total:16130M, free/total:0.483192
LOG (batched-wav-nnet3-cuda[5.5]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 2043674624 bytes; current memory info is: free:3896M, used:12234M, total:16130M, free/total:0.241538
LOG (batched-wav-nnet3-cuda[5.5]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 1021313024 bytes; current memory info is: free:1946M, used:14184M, total:16130M, free/total:0.120649
LOG (batched-wav-nnet3-cuda[5.5]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 510656512 bytes; current memory info is: free:972M, used:15158M, total:16130M, free/total:0.0602663
LOG (batched-wav-nnet3-cuda[5.5]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 390070272 bytes; current memory info is: free:484M, used:15646M, total:16130M, free/total:0.030013
LOG (batched-wav-nnet3-cuda[5.5]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 201326592 bytes; current memory info is: free:112M, used:16018M, total:16130M, free/total:0.00695112
LOG (batched-wav-nnet3-cuda[5.5]:PrintMemoryUsage():cu-allocator.cc:368) Memory usage: 15033003008/16228810752 bytes currently allocated/total-held; 1962/25 blocks currently allocated/free; largest free/allocated block sizes are 800000000/389283840; time taken total/cudaMalloc is 0.0389941/0.0269516, synchronized the GPU 60 times out of 773 frees; device memory info: free:112M, used:16018M, total:16130M, free/total:0.00695112
maximum allocated: 15270767616
current allocated: 15033003008
ERROR (batched-wav-nnet3-cuda[5.5]:AllocateNewRegion():cu-allocator.cc:519) Failed to allocate a memory region of 201326592 bytes. Possibly this is due to sharing the GPU. Try switching the GPUs to exclusive mode (nvidia-smi -c 3) and using the option --use-gpu=wait to scripts like steps/nnet3/chain/train.py. Memory info: free:112M, used:16018M, total:16130M, free/total:0.00695112
[ Stack-Trace: ]
kaldi::MessageLogger::LogMessage() const
kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)
kaldi::CuMemoryAllocator::AllocateNewRegion(unsigned long)
kaldi::CuMemoryAllocator::MallocPitch(unsigned long, unsigned long, unsigned long*)
kaldi::CuMatrix<float>::Resize(int, int, kaldi::MatrixResizeType, kaldi::MatrixStrideType)
kaldi::nnet3::NnetComputer::ExecuteCommand()
kaldi::nnet3::NnetComputer::Run()
kaldi::nnet3::NnetBatchComputer::Compute(bool)
kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchNnet(kaldi::nnet3::NnetBatchComputer&, int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&)
kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)
std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()
clone
WARNING (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:436) Printing some background info since error was detected
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:437) matrix m1(37504, 40), m2(128, 100), m3(36992, 220), m4(36992, 1024), m5(12288, 4096), m6(12288, 3072), m7(12032, 3072), m8(11776, 3072), m9(11520, 3072), m10(11264, 1024), m11(11264, 1024), m12(11264, 8629), m13(11264, 8629)
# The following show how matrices correspond to network-nodes and
# cindex-ids. Format is: matrix = <node-id>.[value|deriv][ <list-of-cindex-ids> ]
# where a cindex-id is written as (n,t[,x]) but ranges of t values are compressed
# so we write (n, tfirst:tlast).
m1 == value: input[(0,-17:275), (1,-17:275), (2,-17:275), (3,-17:275), (4,-17:275), (5,-17:275), (6,-17:275), (7,-17:2 ... ,-17:275), (122,-17:275), (123,-17:275), (124,-17:275), (125,-17:275), (126,-17:275), (127,-17:275)]
m2 == value: ivector[(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0), (10,0), (11,0), (12,0), (13,0 ... , (117,0), (118,0), (119,0), (120,0), (121,0), (122,0), (123,0), (124,0), (125,0), (126,0), (127,0)]
m3 == value: Tdnn_0_affine_input[(0,-16), (1,-16), (2,-16), (3,-16), (4,-16), (5,-16), (6,-16), (7,-16), (8,-16), (9,-16), (10,-16), ... , (119,272), (120,272), (121,272), (122,272), (123,272), (124,272), (125,272), (126,272), (127,272)]
m4 == value: Tdnn_0_affine[(0,-16), (1,-16), (2,-16), (3,-16), (4,-16), (5,-16), (6,-16), (7,-16), (8,-16), (9,-16), (10,-16), ... , (119,272), (120,272), (121,272), (122,272), (123,272), (124,272), (125,272), (126,272), (127,272)]
m5 == value: Tdnn_1_affine_input[(0,-15), (1,-15), (2,-15), (3,-15), (4,-15), (5,-15), (6,-15), (7,-15), (8,-15), (9,-15), (10,-15), ... , (119,270), (120,270), (121,270), (122,270), (123,270), (124,270), (125,270), (126,270), (127,270)]
m6 == value: Tdnn_2_affine_input[(0,-12), (1,-12), (2,-12), (3,-12), (4,-12), (5,-12), (6,-12), (7,-12), (8,-12), (9,-12), (10,-12), ... 648), (123,-2147483648), (124,-2147483648), (125,-2147483648), (126,-2147483648), (127,-2147483648)]
m7 == value: Tdnn_3_affine_input[(0,-9), (1,-9), (2,-9), (3,-9), (4,-9), (5,-9), (6,-9), (7,-9), (8,-9), (9,-9), (10,-9), (11,-9), ( ... 648), (123,-2147483648), (124,-2147483648), (125,-2147483648), (126,-2147483648), (127,-2147483648)]
m8 == value: Tdnn_4_affine_input[(0,-6), (1,-6), (2,-6), (3,-6), (4,-6), (5,-6), (6,-6), (7,-6), (8,-6), (9,-6), (10,-6), (11,-6), ( ... 648), (123,-2147483648), (124,-2147483648), (125,-2147483648), (126,-2147483648), (127,-2147483648)]
m9 == value: Tdnn_5_affine_input[(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0), (10,0), (11,0), (12,0), (13,0 ... 648), (123,-2147483648), (124,-2147483648), (125,-2147483648), (126,-2147483648), (127,-2147483648)]
m10 == value: Tdnn_5_affine[(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0), (10,0), (11,0), (12,0), (13,0 ... , (119,261), (120,261), (121,261), (122,261), (123,261), (124,261), (125,261), (126,261), (127,261)]
m11 == value: Tdnn_pre_final_chain_affine[(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0), (10,0), (11,0), (12,0), (13,0 ... , (119,261), (120,261), (121,261), (122,261), (123,261), (124,261), (125,261), (126,261), (127,261)]
m12 == value: Final_affine[(0,0), (1,0), (2,0), (3,0), (4,0), (5,0), (6,0), (7,0), (8,0), (9,0), (10,0), (11,0), (12,0), (13,0 ... , (119,261), (120,261), (121,261), (122,261), (123,261), (124,261), (125,261), (126,261), (127,261)]
m13 == value: output[(0,0), (0,3), (0,6), (0,9), (0,12), (0,15), (0,18), (0,21), (0,24), (0,27), (0,30), (0,33), (0,36), ... , (127,237), (127,240), (127,243), (127,246), (127,249), (127,252), (127,255), (127,258), (127,261)]
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c0: m1 = user input [for node: 'input']
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c1: m2 = user input [for node: 'ivector']
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c2: [no-op-permanent]
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c3: m3 = undefined(36992,220)
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c4: m3(0:36991, 0:39).CopyRows(1, m1[0, 293, 586, 879, 1172, 1465, 1758, 2051, 2344, 2637, 2930, 322 ... 33690, 33983, 34276, 34569, 34862, 35155, 35448, 35741, 36034, 36327, 36620, 36913, 37206, 37499])
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c5: m3(0:36991, 40:79).CopyRows(1, m1[1, 294, 587, 880, 1173, 1466, 1759, 2052, 2345, 2638, 2931, 32 ... 33691, 33984, 34277, 34570, 34863, 35156, 35449, 35742, 36035, 36328, 36621, 36914, 37207, 37500])
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c6: m3(0:36991, 80:119).CopyRows(1, m1[2, 295, 588, 881, 1174, 1467, 1760, 2053, 2346, 2639, 2932, 3 ... 33692, 33985, 34278, 34571, 34864, 35157, 35450, 35743, 36036, 36329, 36622, 36915, 37208, 37501])
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c7: m1 = []
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c8: m3(0:36991, 120:219).CopyRows(1, m2[0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:12 ... 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127, 0:127])
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c9: m2 = []
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c10: m4 = undefined(36992,1024)
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c11: L0_fixaffine.Tdnn_0_affine.Propagate(NULL, m3, &m4)
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c12: m3 = []
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c13: Tdnn_0_relu.Propagate(NULL, m4, &m4)
LOG (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:439) c14: Tdnn_0_renorm.Propagate(NULL, m4, &m4)
ERROR (batched-wav-nnet3-cuda[5.5]:ExecuteCommand():nnet-compute.cc:443) Error running command c15: m5 = undefined(12288,4096)
[ Stack-Trace: ]
kaldi::MessageLogger::LogMessage() const
kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)
kaldi::nnet3::NnetComputer::ExecuteCommand()
kaldi::nnet3::NnetComputer::Run()
kaldi::nnet3::NnetBatchComputer::Compute(bool)
kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ComputeBatchNnet(kaldi::nnet3::NnetBatchComputer&, int, std::vector<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*, std::allocator<kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::TaskState*> >&)
kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::ExecuteWorker(int)
std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline::*)(int)> (kaldi::cuda_decoder::BatchedThreadedNnet3CudaPipeline*, int)> >::_M_run()
clone
LOG (batched-wav-nnet3-cuda[5.5]:PrintMinibatchStats():nnet-batch-compute.cc:104) Minibatch stats: seconds-taken,frames-in:frames-out*minibatch-size=num-done(percent-full%) 0.44,293:88*128=15(96%) 0.00,293:88*16=1(68%)
LOG (batched-wav-nnet3-cuda[5.5]:PrintMinibatchStats():nnet-batch-compute.cc:105) Did 1868 tasks in 16 minibatches, taking 0.44438 seconds.
ERROR (batched-wav-nnet3-cuda[5.5]:~NnetBatchComputer():nnet-batch-compute.cc:119) Tasks are pending but object is being destroyed

It's not an error, it's a warning. It means that lattice generation produced a less-deep-than-normal lattice. It shouldn't be a problem; it won't affect the one-best path, which may be all you need.
Dan
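For context on the original warning: the 10000000-byte limit it mentions is the decoder's --max-mem option, which caps the memory used while determinizing each lattice; the same value appears in the commands earlier in the thread. If a deeper lattice is wanted anyway, the cap can be raised. This is an illustrative fragment, not a complete command (the 200 MB value is only an example; all other options stay as in the full commands quoted above):

```shell
# Raise the per-lattice determinization memory cap from 10 MB to e.g. 200 MB.
# When the cap is exceeded, determinization prunes harder instead of failing,
# which is what produces the "Did not reach requested beam" warning.
nnet3-latgen-faster --max-mem=200000000 ...
```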
On Tue, Jul 23, 2019 at 10:45 AM Jaskaran Singh Puri <jaskar...@gmail.com> wrote:
Facing the following error while decoding, in parallel, multiple files of more than 10 minutes each:
WARNING (nnet3-latgen-faster:CheckMemoryUsage():determinize-lattice-pruned.cc:327) Did not reach requested beam in determinize-lattice: size exceeds maximum 10000000 bytes;
Can we change this memory limit, and how can we decode large files in parallel?
Feature extraction currently has memory usage O(audio length), as does nnet3. I’d suggest segmenting the audio into pieces of about 30 seconds. However, I have noticed that extract-segments in an “ark:” command becomes a bottleneck, so do the segmenting as a preprocessing step.
If this doesn’t solve your issue, then we can look at other parameters (max batch size and cuda-control-threads being the two primary ones).
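A minimal sketch of that preprocessing step (not from the thread; the reco2dur/segments file names follow Kaldi conventions, and the 30-second chunk length follows the suggestion above): given recording durations, emit a Kaldi-style segments file that chops each recording into 30-second pieces.

```shell
# Assumed input: a reco2dur file with lines "<recording-id> <duration-seconds>".
printf 'rec1 65.0\nrec2 12.5\n' > reco2dur   # example data for illustration

# Emit "<utt-id> <recording-id> <start> <end>" lines, one per 30-second piece.
awk '{
  reco = $1; dur = $2 + 0; chunk = 30.0; n = 0;
  for (start = 0; start < dur; start += chunk) {
    end = (start + chunk < dur) ? start + chunk : dur;
    printf "%s-%04d %s %.2f %.2f\n", reco, n++, reco, start, end;
  }
}' reco2dur > segments
cat segments
```

The resulting segments file can then be used to cut the audio before decoding (e.g. with extract-segments, run once as a preprocessing step rather than inside an "ark:" pipe), so that no single utterance exceeds about 30 seconds.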
What are some scripts that can be used to segment large files when training a model is not an option? Scripts like segment_long_utt require an input model to be given, but this is for decoding new files only.
And how exactly is the output from this command to be used in my decoding?
Will the output file replace wav.scp?
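A hedged sketch of how the two files fit together (the entries below are made up for illustration): the segments file does not replace wav.scp. wav.scp keeps mapping recording-ids to audio, while segments maps each utterance-id to a (recording-id, start, end) span within that audio, and tools such as extract-segments read both.

```shell
# wav.scp: recording-id -> audio (paths are illustrative)
cat > wav.scp <<'EOF'
rec1 /data/audio/rec1.wav
EOF

# segments: utterance-id recording-id start-seconds end-seconds
cat > segments <<'EOF'
rec1-0000 rec1 0.00 30.00
rec1-0001 rec1 30.00 60.00
EOF

# Typical usage (requires Kaldi; shown as a comment only):
#   extract-segments scp:wav.scp segments ark:cut_audio.ark
```

With the standard data-directory scripts, placing a segments file alongside wav.scp in the data directory makes feature extraction operate on the short pieces rather than the whole recordings.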