Allocating training subset examples takes too much time

Brij Mohan Lal Srivastava

Feb 14, 2021, 10:33:46 AM
to kaldi-help
Hello,

I am trying to train an x-vector model using custom 256-dim features. The command to create egs for training looks like this:

sid/nnet3/xvector/get_egs.sh --cmd "$train_cmd" \
  --nj 8 \
  --stage 0 \
  --frames-per-iter 50000000 \
  --frames-per-iter-diagnostic 100000 \
  --min-frames-per-chunk 200 \
  --max-frames-per-chunk 400 \
  --num-diagnostic-archives 3 \
  --num-repeats 35 \
  "$data" $egs_dir

The command proceeds to produce the following log on console:

sid/nnet3/xvector/get_egs.sh: Preparing train and validation lists
sid/nnet3/xvector/get_egs.sh: Producing 92 archives for training
sid/nnet3/xvector/get_egs.sh: Allocating training examples
sid/nnet3/xvector/get_egs.sh: Allocating training subset examples

And then it is stuck there for more than 20 hours. Even the log file says nothing. Basically, it is stuck at this line:


Can anybody please let me know what could be the issue and how to debug this?

Thanks,
Brij

Brij Mohan Lal Srivastava

Feb 16, 2021, 4:54:45 PM
to kaldi-help
The program is going into an infinite loop here:


The utt_len is always smaller than the length in the while loop specified in the above link.

This is the command that is executed:

$ sid/nnet3/xvector/allocate_egs.py --prefix train_subset --num-repeats=1 \
  --min-frames-per-chunk=200 --max-frames-per-chunk=400 \
  --randomize-chunk-length false --frames-per-iter=100000 \
  --num-archives=3 --num-jobs=1 \
  --utt2len-filename=exp/asv_models/asv_noise_baseline/egs/temp/utt2num_frames.train_subset \
  --utt2int-filename=exp/asv_models/asv_noise_baseline/egs/temp/utt2int.train_subset \
  --egs-dir=exp/asv_models/asv_noise_baseline/egs
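
One way to check whether utt_len can ever reach the requested chunk length is to count how many utterances in the utt2len file are at least as long as the chunk sizes on the command line (200 and 400 here). The sketch below is only a diagnostic aid, not part of Kaldi. It assumes the file holds one "<utt-id> <num-frames>" pair per line; the helper names and the sample data are made up for illustration.

```python
from typing import Dict, Iterable


def parse_utt2len(lines: Iterable[str]) -> Dict[str, int]:
    """Parse '<utt-id> <num-frames>' lines into a dict (assumed file format)."""
    utt2len = {}
    for line in lines:
        tokens = line.split()
        if len(tokens) != 2:
            continue  # skip blank or malformed lines
        utt2len[tokens[0]] = int(tokens[1])
    return utt2len


def count_usable(utt2len: Dict[str, int], chunk_len: int) -> int:
    """Count utterances that have at least chunk_len frames."""
    return sum(1 for n in utt2len.values() if n >= chunk_len)


# Made-up sample data; in practice, pass open(<utt2len file>) instead.
example = ["utt1 150", "utt2 380", "utt3 120"]
utt2len = parse_utt2len(example)
for chunk_len in (200, 400):
    print(chunk_len, count_usable(utt2len, chunk_len))
```

If the count for a given chunk length is zero, a loop that keeps drawing random utterances until one is long enough can never finish.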


Any ideas why this is happening?

Thanks,
Brij

Desh Raj

Feb 16, 2021, 5:07:54 PM
to kaldi...@googlegroups.com
I think we may need to revert this PR: https://github.com/kaldi-asr/kaldi/pull/4320. It was originally intended to fix https://github.com/kaldi-asr/kaldi/issues/4319 but it seems it often leads to infinite looping, as Dan had predicted.

- Desh

Brij Mohan Lal Srivastava

Feb 16, 2021, 5:50:24 PM
to kaldi-help
Thanks for pointing out the exact issue, Desh! I will make changes locally to proceed with my experiment for now.
One question: I see that the body of the get_random_utt function did not change to make use of the min_length parameter. How was it used in the function?

Thanks,
Brij

Desh Raj

Feb 16, 2021, 6:28:19 PM
to kaldi...@googlegroups.com
It wasn't being used at all.
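
For illustration only, here is a sketch of how a selection function could actually use a min_length argument: filter the speaker's utterances by length before drawing one. This is not the code in allocate_egs.py. The function name, its signature, and the sample data are hypothetical.

```python
import random
from typing import Dict, List, Optional


def get_random_utt_min_len(
    spk: str,
    spk2utt: Dict[str, List[str]],
    utt2len: Dict[str, int],
    min_length: int,
    rng: random.Random,
) -> Optional[str]:
    """Return a random utterance of spk with at least min_length frames.

    Returns None when the speaker has no utterance that long, so the
    caller can decide what to do instead of retrying forever.
    """
    candidates = [u for u in spk2utt[spk] if utt2len[u] >= min_length]
    if not candidates:
        return None
    return rng.choice(candidates)


# Made-up sample data for one speaker with two utterances.
rng = random.Random(0)
spk2utt = {"spk1": ["a", "b"]}
utt2len = {"a": 150, "b": 380}
print(get_random_utt_min_len("spk1", spk2utt, utt2len, 200, rng))  # b
print(get_random_utt_min_len("spk1", spk2utt, utt2len, 400, rng))  # None
```

Returning None rather than looping leaves the policy (skip the speaker, pick a shorter chunk, and so on) to the caller.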

Daniel Povey

Feb 17, 2021, 3:06:15 AM
to kaldi-help
Mm, I'd rather merge a fix that actually fixes the original issue as well as the looping, but I don't have time to create one myself.
Shouldn't be too hard though.


Brij Mohan Lal Srivastava

Feb 21, 2021, 12:03:11 PM
to kaldi-help
Dear Desh and Dan,

I made local changes to allocate_egs.py to revert to the old code, but I now hit the same issue as this one: https://github.com/kaldi-asr/kaldi/issues/4319
So I guess I need to find a better solution to continue the training.

Can you give me some idea what has to be done here so that I can implement it and bypass this issue?

Thanks,
Brij

Daniel Povey

Feb 22, 2021, 8:24:06 AM
to kaldi-help
It should be a question of thoroughly reading that script and figuring out its logic, and implementing a fix.  I don't have time to figure it out right now though.


Brij Mohan Lal Srivastava

Feb 26, 2021, 12:43:41 PM
to kaldi-help
Added a fix to break out of the infinite loop when the utterances are exhausted.
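
The fix itself is not shown in this thread. As a rough sketch of the general idea, a retry loop can draw utterances without replacement and stop once every utterance has been tried. The function name, the sample data, and the choice to return None are assumptions made for illustration, not the committed change.

```python
import random
from typing import Dict, List, Optional


def pick_utt_or_give_up(
    utts: List[str],
    utt2len: Dict[str, int],
    length: int,
    rng: random.Random,
) -> Optional[str]:
    """Try utterances in random order until one has at least `length` frames.

    Each utterance is tried at most once, so the loop always terminates.
    Returns None once all utterances are exhausted without a match.
    """
    remaining = list(utts)  # copy, so the caller's list is left untouched
    rng.shuffle(remaining)
    while remaining:
        utt = remaining.pop()
        if utt2len[utt] >= length:
            return utt
    return None  # utterances exhausted: break out instead of looping


# Made-up sample data: no utterance reaches 400 frames.
rng = random.Random(0)
utt2len = {"a": 150, "b": 380, "c": 120}
print(pick_utt_or_give_up(list(utt2len), utt2len, 200, rng))  # b
print(pick_utt_or_give_up(list(utt2len), utt2len, 400, rng))  # None
```

The caller still has to handle the None case, for example by skipping that chunk or by using a shorter chunk length.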

Thanks,
Brij
