utils/split_scp.pl: Argument list too long


Virendra Dhakhada

Jul 5, 2021, 2:17:20 AM
to kaldi-help
Hello,
We're training a Kaldi model for Indian English. While we've been able to train many models on smaller datasets of up to 30K audio clips, we're hitting a strange issue with a bigger dataset.

In the local/chain/tuning/run_tdnn_1j.sh script, we're hitting an issue at stage 16.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --frames-per-chunk 140 --nj 85989 --cmd run.pl --mem 16G --num-threads 4 --online-ivector-dir exp/nnet3/ivectors_test_hires exp/chain/tree_sp/graph_tgsmall data/test_hires exp/chain/tdnn1j_sp/decode_tgsmall_test

/opt/kaldi/egs/mini_librispeech/s5/utils/split_data.sh: line 111: utils/split_scp.pl: Argument list too long

local/chain/tuning/run_tdnn_1j.sh: there was a problem while decoding

Screenshot attached: Screenshot (37).png

Here are some specifications we're using:

  1. We're using the Docker image for the Kaldi setup:
     docker run -it -v /mnt/disk1/virendra/kaldi/:/opt/kaldi/ --gpus all kaldiasr/kaldi:gpu-latest
  2. We have only 1 GPU, CUDA v11.3 and nvcc v10.1.
  3. Our training and testing data are the same: 85989 clips in total, around 115 hours of data.
  4. We're using Vosk in production, so we're largely following the Build model for Vosk repo's scripts and advice for training-data formatting.

As mentioned earlier, we've been able to train many models using this setup and procedure, but with these 86K clips we're hitting this issue. I suspect that keeping the test data smaller might resolve it, but I'm not sure.

Can you please help us find the root cause of this error and resolve it?

Also attaching our custom run.sh for reference.
new_run.sh

Daniel Povey

Jul 5, 2021, 2:39:45 AM
to kaldi-help
You are hitting a Linux kernel limitation. There might not be an easy workaround.
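The limit in question is the kernel's cap on the total size of a command's argument list (ARG_MAX). A quick way to inspect it, and to estimate the space ~86k utterance-file arguments would need (the 60-byte average path length below is a hypothetical figure, not measured from this setup):

```shell
# Query the kernel's limit on total argv + environment size, in bytes.
limit=$(getconf ARG_MAX)

# Rough estimate of the argv size for ~86k utterance arguments,
# assuming an average of 60 bytes per argument (hypothetical figure).
nargs=85989
avg_len=60
needed=$((nargs * avg_len))

echo "ARG_MAX=$limit bytes, estimated argv size=$needed bytes"
```

On many stock Linux kernels ARG_MAX is around 2 MB, so tens of thousands of path arguments can blow past it, which is exactly the "Argument list too long" error above.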

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/19720491-aebf-4c1a-8eff-3e32c37ff72fn%40googlegroups.com.

Virendra Dhakhada

Jul 5, 2021, 2:48:38 AM
to kaldi-help
Okay, so two questions:
1. Where should we start in solving the Linux kernel limitation problem? Or would moving to a larger EC2 instance help?
2. If I reduce my test data size significantly, should that help?

Thanks, Dan, for the quick response. I'm grateful for the support you guys are providing for open-source software.

Daniel Povey

Jul 5, 2021, 2:51:57 AM
to kaldi-help
It's configured when the kernel is compiled.
The easy fix is to just use fewer jobs. There's no reason to use 85k jobs; just use 1000 or something.
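Concretely, the decode invocation from earlier in the thread with a reduced job count might look like this (100 is purely an illustrative value, not a tuned recommendation):

```shell
# Same decode call as in the original post, but with --nj cut down
# from 85989 to an illustrative 100.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
  --frames-per-chunk 140 --nj 100 \
  --cmd "run.pl --mem 16G --num-threads 4" \
  --online-ivector-dir exp/nnet3/ivectors_test_hires \
  exp/chain/tree_sp/graph_tgsmall data/test_hires \
  exp/chain/tdnn1j_sp/decode_tgsmall_test
```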


Mukesh Badgujar

May 12, 2022, 5:38:35 AM
to kaldi-help
Sir, I have the same problem, please check.

Please tell me how I can modify the number of jobs, because this command runs automatically when I start the model building with Kaldi.

Please see the screenshot:
P_20220512_150316_1.jpg

Jan Yenda Trmal

May 13, 2022, 8:00:43 AM
to kaldi-help
Just use a smaller number of jobs (a couple of hundred at most).
y.


Daniel Povey

May 13, 2022, 8:00:47 AM
to kaldi-help
You'll have to change the calling script then.  
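One way to do that, sketched on a throwaway file since the real script path depends on your setup (the filename and the replacement value 100 below are hypothetical):

```shell
# Demo on a throwaway file that mimics the decode call; in practice you
# would edit the actual calling script (path is setup-specific).
printf 'steps/nnet3/decode.sh --nj 85989 data/test_hires exp/decode\n' > demo_run.sh

# Rewrite the literal --nj value to something sane (100 is illustrative).
# The -i.bak form works with both GNU and BSD sed.
sed -i.bak 's/--nj [0-9][0-9]*/--nj 100/' demo_run.sh

cat demo_run.sh
rm -f demo_run.sh demo_run.sh.bak
```

This only works if the job count appears as a literal `--nj N` in the script; if it is computed from a variable, edit that variable's assignment instead.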
