utils/split_scp.pl: Argument list too long


Virendra Dhakhada

Jul 5, 2021, 2:17:20 AM
to kaldi-help
Hello,
We're training a Kaldi model for Indian English. While we've been able to train many models on smaller datasets of up to 30K audio clips, we're hitting a strange issue with a bigger dataset.

In the local/chain/tuning/run_tdnn_1j.sh script, we're hitting an issue at stage 16.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --frames-per-chunk 140 --nj 85989 --cmd run.pl --mem 16G --num-threads 4 --online-ivector-dir exp/nnet3/ivectors_test_hires exp/chain/tree_sp/graph_tgsmall data/test_hires exp/chain/tdnn1j_sp/decode_tgsmall_test

/opt/kaldi/egs/mini_librispeech/s5/utils/split_data.sh: line 111: utils/split_scp.pl: Argument list too long

local/chain/tuning/run_tdnn_1j.sh: there was a problem while decoding

Screenshot attached: Screenshot (37).png

Here are some specifications we're using:

  1. We're using the Docker image for the Kaldi setup:
     docker run -it -v /mnt/disk1/virendra/kaldi/:/opt/kaldi/ --gpus all kaldiasr/kaldi:gpu-latest
  2. We have only 1 GPU, CUDA v11.3 and nvcc v10.1.
  3. Our training and testing data are the same: 85989 clips in total, around 115 hours of data.
  4. We're using Vosk in production, so we're largely following the Build model for Vosk repo's scripts and advice for training-data formatting.

As mentioned earlier, we've been able to train many models using this setup and procedure, but with these 86K clips we're hitting this issue. I suspect that keeping the test data smaller might resolve it, but I'm not sure.

Can you please help us find the root cause of this error and resolve it?

Also attaching our custom run.sh for reference.
new_run.sh

Daniel Povey

Jul 5, 2021, 2:39:45 AM
to kaldi-help
You are hitting a Linux kernel limitation. There might not be an easy workaround.
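The limit in question is the kernel's cap on the total size of a command's argument list (ARG_MAX). A quick way to inspect it, and to estimate the space ~86k utterance-file arguments would need (the 60-byte average path length below is a hypothetical figure, not measured from this setup):

```shell
# Query the kernel's limit on total argv + environment size, in bytes.
limit=$(getconf ARG_MAX)

# Rough estimate of the argv size for ~86k utterance arguments,
# assuming an average of 60 bytes per argument (hypothetical figure).
nargs=85989
avg_len=60
needed=$((nargs * avg_len))

echo "ARG_MAX=$limit bytes, estimated argv size=$needed bytes"
```

On many stock Linux kernels ARG_MAX is around 2 MB, so tens of thousands of path arguments can blow past it, which is exactly the "Argument list too long" error above.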

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/19720491-aebf-4c1a-8eff-3e32c37ff72fn%40googlegroups.com.

Virendra Dhakhada

Jul 5, 2021, 2:48:38 AM
to kaldi-help
Okay, so two questions:
1. Where should we start in solving the Linux kernel limitation problem? Or would moving to a larger EC2 instance help?
2. If I reduce my test data size significantly, should that help?

Thanks, Dan, for the quick response. I'm grateful for the support you guys are providing for open-source software.

Daniel Povey

Jul 5, 2021, 2:51:57 AM
to kaldi-help
It's configured when the kernel is compiled.
The easy fix is to just use fewer jobs. There's no reason to use 85k jobs; just use 1000 or something.
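Concretely, the decode invocation from earlier in the thread with a reduced job count might look like this (100 is purely an illustrative value, not a tuned recommendation):

```shell
# Same decode call as in the original post, but with --nj cut down
# from 85989 to an illustrative 100.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
  --frames-per-chunk 140 --nj 100 \
  --cmd "run.pl --mem 16G --num-threads 4" \
  --online-ivector-dir exp/nnet3/ivectors_test_hires \
  exp/chain/tree_sp/graph_tgsmall data/test_hires \
  exp/chain/tdnn1j_sp/decode_tgsmall_test
```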


Mukesh Badgujar

May 12, 2022, 5:38:35 AM
to kaldi-help
Sir, I have the same problem, please check.

Please tell me how I can modify the number of jobs, because this command runs automatically when I start the model building with Kaldi.

Please see the screenshot:
P_20220512_150316_1.jpg

Jan Yenda Trmal

May 13, 2022, 8:00:43 AM
to kaldi-help
Just use a smaller number of jobs (a couple of hundred at most).
y.


Daniel Povey

May 13, 2022, 8:00:47 AM
to kaldi-help
You'll have to change the calling script then.  
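One way to do that, sketched on a throwaway file since the real script path depends on your setup (the filename and the replacement value 100 below are hypothetical):

```shell
# Demo on a throwaway file that mimics the decode call; in practice you
# would edit the actual calling script (path is setup-specific).
printf 'steps/nnet3/decode.sh --nj 85989 data/test_hires exp/decode\n' > demo_run.sh

# Rewrite the literal --nj value to something sane (100 is illustrative).
# The -i.bak form works with both GNU and BSD sed.
sed -i.bak 's/--nj [0-9][0-9]*/--nj 100/' demo_run.sh

cat demo_run.sh
rm -f demo_run.sh demo_run.sh.bak
```

This only works if the job count appears as a literal `--nj N` in the script; if it is computed from a variable, edit that variable's assignment instead.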
