get_egs chain model error (Supervision does not have expected length)

Truong Do

Nov 10, 2017, 6:26:49 AM
to kaldi-help
I've encountered an error while running `steps/nnet3/chain/get_egs.sh`. The get_egs.*.log files show the following warning:
```
WARNING (nnet3-chain-get-egs[5.2.187~1-a44d3]:LengthsMatch():nnet-example-utils.cc:562) Supervision does not have expected length for utterance sp1.1-b1018-utt-015497: expected length = (743 + 3 - 1) / 3 = 248, got: 273 (note: --frame-subsampling-factor=3)
```

There are 15 log files and 5 of them have the above warning on all utterances, so they return -1.

It seems that the number of input frames and the number of output frames do not match (http://kaldi-asr.org/doc/nnet3-discriminative-get-egs_8cc_source.html#l00053), but I don't know why.
I've checked the alignment and lattice generation log files, but the particular utterance above was aligned without any warnings or errors.

I would greatly appreciate it if anyone could give me a clue as to why this error occurred.

Truong Do

Nov 10, 2017, 6:32:34 AM
to kaldi-help
The odd thing is that the expected length, (743 + 3 - 1) / 3 = 248, is calculated from the number of input frames, while
the number of output frames, 273, is calculated from the alignments or lattices. So how can the expected length be smaller than the number of output frames?
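
The arithmetic in the warning can be reproduced directly. This is a minimal sketch, assuming the expected length is the integer ceiling of the feature frame count divided by the frame-subsampling factor, which is what `(743 + 3 - 1) / 3 = 248` in the log message shows. The function names are hypothetical helpers, not Kaldi tools.

```shell
# Expected number of supervision (output) frames for a given number of
# input feature frames: integer ceiling of num_frames / factor.
#   $1 = number of input frames, $2 = frame-subsampling factor
expected_supervision_length() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Inverse: the range of input frame counts that would yield a given
# supervision length.
#   $1 = supervision length, $2 = frame-subsampling factor
input_frames_for_supervision() {
  echo "$(( ($1 - 1) * $2 + 1 ))-$(( $1 * $2 ))"
}

expected_supervision_length 743 3    # prints 248
input_frames_for_supervision 273 3   # prints 817-819
```

Inverting the formula suggests that the supervision was built from features of 817 to 819 frames rather than 743. As an aside, 817 / 743 is about 1.10, the same as the speed factor in the `sp1.1` utterance name. That may be a coincidence, but it would be consistent with the supervision having been generated from a different-speed copy of the utterance.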

Daniel Povey

Nov 10, 2017, 3:26:43 PM
to kaldi-help
Is it possible that you used different feature configurations for the alignments versus the features you gave to the DNN training?  Normally we use mfcc.conf for the alignments and mfcc_hires.conf for the DNN's input.  If you changed the --frame-length or --frame-shift options in one but not the other, or messed with the segments file or wave files somehow, it might have this effect.

Dan


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/277fe1d6-59af-4d32-a4fa-2ea0ccec2fc3%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Truong Do

Nov 10, 2017, 4:28:12 PM
to kaldi-help
Thanks for your reply! I didn't change any of the default values in those feature configuration files.
The --frame-length and --frame-shift options are not set at all.
I also don't use a segments file, and I think the wave files are OK.

I have ~8M utterances in total, and only ~2M of them failed. The failed utterances are all speed-perturbed files.
The interesting part is that the noise-mixed utterances, which are copies of the speed-perturbed files with noise added, are OK.

I confirmed that, for the mismatched utterance above, the feature length of the speed-perturbed-only version and of the speed-perturbed + noise-mixed version is the same (743).
The alignments and lattices are generated on the speed-perturbed files and are copied to the noise-mixed utterances based
on the utt2uniq files.

I'm not sure where I could have gone wrong, because the program does not output any warning for the noise-mixed utterances,
only for the speed-perturbed ones.

Daniel Povey

Nov 10, 2017, 4:50:24 PM
to kaldi-help
Hm.
Definitely this would have been caused by inconsistencies in the num-frames of utterances extracted during two different calls to steps/make_mfcc.sh (one with regular resolution and one with high resolution).  You should check that the wav.scp files in both directories are the same.  You can use `feat-to-len scp:data/foo/feats.scp ark,t:lengths.txt` to get the num-frames for each utterance from the two different sources.  You'll have to look into the MFCC generation logs, and at the wav.scp files, to figure out why the lengths became different.
Dan
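
The comparison described above could be sketched as follows. `feat-to-len` is used exactly as in the message; the directory names `data/foo` and `data/foo_hires` and the helper `diff_lengths` are placeholders for illustration, not paths or tools from this thread. The helper takes two text files of `utt-id num-frames` lines and prints every utterance whose lengths differ.

```shell
# Print "utt-id len1 len2" for every utterance that appears in both files
# with different frame counts. Each input file holds "utt-id num-frames"
# lines, as written by feat-to-len with an ark,t: output.
#   $1 = first lengths file, $2 = second lengths file
diff_lengths() {
  awk 'NR == FNR { len[$1] = $2; next }
       ($1 in len) && len[$1] != $2 { print $1, len[$1], $2 }' "$1" "$2"
}

# Usage (requires Kaldi binaries on PATH; directories are placeholders):
#   feat-to-len scp:data/foo/feats.scp ark,t:lengths_lores.txt
#   feat-to-len scp:data/foo_hires/feats.scp ark,t:lengths_hires.txt
#   diff_lengths lengths_lores.txt lengths_hires.txt
```

Running it over the full lists, rather than spot-checking one utterance, shows whether the mismatch is confined to a subset of the data.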


Truong Do

Nov 10, 2017, 5:04:21 PM
to kaldi-help
Sorry for not mentioning this clearly in my previous reply, but I did check the feature length using feat-to-len on
the regular-resolution and high-resolution features for one particular utterance that has the warning.
I'm certain that the length is the same for both feature versions.

Daniel Povey

Nov 10, 2017, 5:07:40 PM
to kaldi-help
Then check the file times for your input alignments (ali.*.gz), maybe you generated some of them previously with an older version of the features that had a different length.

Dan
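
One rough way to run that check is to list the alignment archives that are not newer than the feature index they should have been generated from. The function name and the example paths are hypothetical; `find ... ! -newer` is standard `find` behaviour and compares modification times.

```shell
# List ali.*.gz files under an alignment directory whose modification time
# is not newer than the given reference file (e.g. feats.scp). These are
# candidates for having been generated from an earlier set of features.
#   $1 = alignment directory, $2 = reference file
stale_alignments() {
  find "$1" -name 'ali.*.gz' ! -newer "$2" | sort
}

# Usage (paths are placeholders):
#   stale_alignments exp/foo_ali data/foo/feats.scp
```

An empty result only says that the timestamps are in the expected order; it does not prove that the alignments were produced from those exact features.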


Truong Do

Nov 10, 2017, 5:08:54 PM
to kaldi-help
How is the "number of output frames" calculated? Is there any way I can check this from the alignments at the script level?

Daniel Povey

Nov 10, 2017, 5:10:36 PM
to kaldi-help
The formula that you showed before.
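
For a script-level check, one possible sketch assumes the alignments are the usual `ali.*.gz` archives holding one transition-id per frame. Kaldi's `copy-int-vector` can dump them as text, and the frame count is then the number of fields minus one. `exp/foo_ali` is a placeholder and `ali_lengths` is a hypothetical helper. In the failing command the supervision is derived from lattices (`lat.scp`), so this only checks the alignments.

```shell
# Read text-form alignments on stdin, one utterance per line in the form
# "utt-id tid1 tid2 ... tidN" (one transition-id per frame), and print
# "utt-id num-frames subsampled-length", where subsampled-length is the
# integer ceiling of num-frames / factor.
#   $1 = frame-subsampling factor
ali_lengths() {
  awk -v f="$1" '{ n = NF - 1; print $1, n, int((n + f - 1) / f) }'
}

# Usage (requires Kaldi binaries on PATH; the directory is a placeholder):
#   copy-int-vector "ark:gunzip -c exp/foo_ali/ali.1.gz |" ark,t:- | ali_lengths 3
```

The second column can be compared against the `feat-to-len` output for the same utterance, and the third against the "got" value in the warning.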

Truong Do

Nov 10, 2017, 5:57:48 PM
to kaldi-help
utils/filter_scp.pl data/train-all-big_sp_vol_mix_hires/split15/10/utt2spk \
      exp/chain_north_big/tdnn_lstm1d_sp/egs/lat.scp | \
      lattice-align-phones --replace-output-symbols=true \
      exp/chain_north_big/tri4b_train-all-big_sp_vol_mix_hires_lats/final.mdl scp:- ark:- |\
      chain-get-supervision --lattice-input=true --frame-subsampling-factor=3 \
      --right-tolerance=5 --left-tolerance=5 \
      exp/chain_north_big/tdnn_lstm1d_sp/tree exp/chain_north_big/tdnn_lstm1d_sp/0.trans_mdl \
      ark:- ark:- | \
      nnet3-chain-get-egs --srand=$[10+0] --left-context=51 --right-context=21 --num-frames=140,100,160 \
      --frame-subsampling-factor=3 --compress=true --left-context-initial=11 --right-context-final=21 \
      --num-frames-overlap=0 \
      "ark,s,cs:utils/filter_scp.pl --exclude exp/chain_north_big/tdnn_lstm1d_sp/egs/valid_uttlist data/train-all-big_sp_vol_mix_hires/split15/10/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train-all-big_sp_vol_mix_hires/split15/10/utt2spk scp:data/train-all-big_sp_vol_mix_hires/split15/10/cmvn.scp scp:- ark:- |" \
      ark,s,cs:- ark:-

The above command is the one that failed. By *input alignments*, are you referring to the lattice and alignment generation done by
steps/align_fmllr_lats.sh?

I have checked the input features used by that script and confirmed that they are the most up-to-date ones. In fact,
this is my first run with these features, so there is no older version of them.

Truong Do

Nov 10, 2017, 5:59:33 PM
to kaldi-help
And by the way, I'm using mfcc + pitch with "--paste-length-tolerance 6".

Daniel Povey

Nov 10, 2017, 6:01:46 PM
to kaldi-help
That may be relevant.  I suspect you started from some existing script but made a mistake somewhere when you changed it.  If you could show me what script you started with it might make it easier to guess where you went wrong.

Truong Do

Nov 10, 2017, 6:21:12 PM
to kaldi-help
I started from a script that I had run successfully on a ~400h subset. Then I changed it to run on the full data set, which is around 700h.

Truong Do

Nov 10, 2017, 6:23:28 PM
to kaldi-help
The original script is the wsj one, but I have changed it quite a lot for my own data set.

Daniel Povey

Nov 10, 2017, 6:26:31 PM
to kaldi-help
I can't help you further with this.  The length of the alignments is related to the length of the features they were generated with by that very simple formula, so it implies that when you generated the alignments, you had different-length features, maybe not the ones that you thought you were using.  This isn't rocket science.


Daniel Povey

Nov 10, 2017, 6:35:36 PM
to kaldi-help
Also I suspect that when you checked that the lengths of the features were the same, you didn't check all of them.  Maybe you just checked a few of them and just assumed the rest were the same.
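
To check every utterance instead of a handful, a summary per utterance-id prefix can help. This is a sketch under assumptions: both inputs are text files of `utt-id num-frames` lines (for example from `feat-to-len`, or frame counts derived from the alignments), and the prefix is taken to be the text before the first `-`, as in `sp1.1-b1018-utt-015497`. The helper name `mismatch_summary` is hypothetical.

```shell
# For two files of "utt-id num-frames" lines, print one line per
# utterance-id prefix (text before the first "-") in the form
# "prefix num-mismatched num-compared", sorted by prefix.
#   $1 = first lengths file, $2 = second lengths file
mismatch_summary() {
  awk 'NR == FNR { len[$1] = $2; next }
       $1 in len {
         split($1, parts, "-"); p = parts[1]
         total[p]++
         if (len[$1] != $2) bad[p]++
       }
       END { for (p in total) print p, bad[p] + 0, total[p] }' "$1" "$2" | sort
}

# Usage (file names are placeholders):
#   mismatch_summary lengths_feats.txt lengths_ali.txt
```

If the mismatches cluster under one prefix, that narrows the search to the step that produced that subset.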
