Refusing to split data because number of speakers 9 is less than the number of output .scp files 12


Syed Uzair

Feb 26, 2022, 3:22:18 AM2/26/22
to kaldi...@googlegroups.com
Dear All,

I'm building an ASR system for the Punjabi language using my own data. Training runs correctly through the SGMM2 stage, but when the nnet training starts it gives the following error.

If anyone knows the solution, please let me know. Thanks in advance.

Succeeded creating CMVN stats for test_hires
fix_data_dir.sh: kept all 104 utterances.
fix_data_dir.sh: old files are kept in data/test_hires/.backup
steps/online/nnet2/train_diag_ubm.sh --cmd utils/run.pl --nj 6 --num-threads 6 --num-frames 200000 data/train_hires 256 exp/tri3 exp/nnet2_online/diag_ubm
steps/online/nnet2/train_diag_ubm.sh: Directory exp/nnet2_online/diag_ubm already exists. Backing up diagonal UBM in exp/nnet2_online/diag_ubm/backup.Vh1
steps/online/nnet2/train_diag_ubm.sh: initializing model from E-M in memory,
steps/online/nnet2/train_diag_ubm.sh: starting from 128 Gaussians, reaching 256;
steps/online/nnet2/train_diag_ubm.sh: for 20 iterations, using at most 200000 frames of data
Getting Gaussian-selection info
steps/online/nnet2/train_diag_ubm.sh: will train for 4 iterations, in parallel over
steps/online/nnet2/train_diag_ubm.sh: 6 machines, parallelized with 'utils/run.pl'
steps/online/nnet2/train_diag_ubm.sh: Training pass 0
steps/online/nnet2/train_diag_ubm.sh: Training pass 1
steps/online/nnet2/train_diag_ubm.sh: Training pass 2
steps/online/nnet2/train_diag_ubm.sh: Training pass 3
steps/online/nnet2/train_ivector_extractor.sh --cmd utils/run.pl --nj 6 --num-threads 3 --num-processes 2 --ivector-dim 50 --online-cmvn-iextractor false data/train_hires exp/nnet2_online/diag_ubm exp/nnet2_online/extractor
steps/online/nnet2/train_ivector_extractor.sh: Directory exp/nnet2_online/extractor already exists. Backing up iVector extractor in exp/nnet2_online/extractor/backup.xlc
utils/split_scp.pl: Refusing to split data because number of speakers 9 is less than the number of output .scp files 12
--
Syed Uzair
BS.c Computer Science
Gujranwala, Pakistan
mobile: +92 302 2224500
syeduzai...@gmail.com

Daniel Povey

Feb 26, 2022, 4:04:33 AM2/26/22
to kaldi-help
That's not really enough data to train an ivector extractor, but hopefully the scripts will still run...
You'll have to adjust the args to train_ivector_extractor, e.g. reduce --num-processes 2 to --num-processes 1.
Even then it may use too much memory; in that case you can reduce --nj from 6 to 2 or something like that.
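For reference, the refusal comes from utils/split_scp.pl: when given a speaker map it keeps each speaker's utterances together in one split, so it cannot produce more output .scp files than there are speakers. A minimal sketch of clamping --nj to the speaker count before calling the script (the toy spk2utt file below is fabricated to mirror the 9-speaker setup in the log above, not taken from the actual data):

```shell
# Fabricated spk2utt with 9 speakers, mirroring the "9 < 12" error in the log.
for i in 1 2 3 4 5 6 7 8 9; do echo "spk$i utt$i"; done > spk2utt

nspk=$(wc -l < spk2utt)   # one line per speaker
nj=12                     # requested number of parallel jobs

# split_scp.pl refuses when nj exceeds the number of speakers,
# so clamp nj first and pass the clamped value to the training script.
if [ "$nj" -gt "$nspk" ]; then nj=$nspk; fi
echo "nj=$nj"   # prints "nj=9"
```

With a real data directory you would read data/train_hires/spk2utt instead of the fabricated file.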


Syed Uzair

Feb 26, 2022, 4:05:44 AM2/26/22
to kaldi...@googlegroups.com

Gaurav Lotey

Feb 26, 2022, 4:07:06 AM2/26/22
to kaldi...@googlegroups.com
Hello Dan,
I am using the wsj recipe to train and decode, and I notice that my system's CPU is maxed out; I am using --nj 20. I have a 10th-gen i9, an RTX 2080, and 32 GB of RAM, but the GPU and RAM are barely used. Can I reduce the CPU usage somehow? My server can't handle any other tasks while Kaldi is running.

Daniel Povey

Feb 26, 2022, 4:19:57 AM2/26/22
to kaldi-help
The GPU can be used for nnet training later, but not for the GMM stages.
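One knob that does exist for the CPU-only GMM stages is --nj itself: fewer parallel jobs means fewer busy cores, at the cost of longer wall-clock time. A hedged sketch of picking nj so that a few cores stay free for other work (nproc is assumed available; the margin of 4 cores is an arbitrary choice, not a Kaldi recommendation):

```shell
# Choose --nj so ~4 cores stay free for other tasks (arbitrary margin).
total=$(nproc)
if [ "$total" -gt 4 ]; then
  nj=$((total - 4))
else
  nj=1   # on a small machine, fall back to a single job
fi
echo "Using --nj $nj of $total cores"
```

The chosen value would then be passed as --nj to the training scripts in place of 20.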

Gaurav Lotey

Feb 26, 2022, 5:13:14 AM2/26/22
to kaldi...@googlegroups.com
Thanks for the information. So it seems there's no way to reduce my CPU utilization, then.
