Importance of silences at utterance begin/end

theodor...@snips.ai

Jan 12, 2018, 5:26:09 AM

to kaldi-help

Hi there,

I've recently experienced issues training a TDNN-LSTM model on some automatically segmented training data.

I regularly get the following warning during GMM alignment of the training set:

analyze_phone_length_stats.py: WARNING: optional-silence <sil> is seen only 31.7546483711% of the time at utterance begin. This may not be optimal.
analyze_phone_length_stats.py: WARNING: optional-silence <sil> is seen only 60.7031898789% of the time at utterance end. This may not be optimal.

but I tend to ignore it since the subsequent nnet training usually gives fairly good results.
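For readers unfamiliar with the warning, here is a minimal Python sketch of the kind of statistic it reports: the percentage of utterances whose aligned phone sequence starts or ends with the optional-silence phone. This is not the code from analyze_phone_length_stats.py; the function name `boundary_silence_rates` and the input layout (a dict mapping utterance IDs to lists of phone symbols) are assumptions made for illustration.

```python
def boundary_silence_rates(phone_seqs, sil="<sil>"):
    """Return (begin_pct, end_pct): the percentage of utterances whose
    phone sequence begins / ends with the optional-silence phone.

    phone_seqs: dict mapping utterance ID -> list of phone symbols
                (hypothetical layout, e.g. built from alignment output).
    """
    total = begin = end = 0
    for phones in phone_seqs.values():
        if not phones:          # skip utterances with no aligned phones
            continue
        total += 1
        begin += phones[0] == sil
        end += phones[-1] == sil
    if total == 0:
        return 0.0, 0.0
    return 100.0 * begin / total, 100.0 * end / total


if __name__ == "__main__":
    toy = {
        "utt1": ["<sil>", "h", "ai", "<sil>"],
        "utt2": ["h", "ai", "<sil>"],
        "utt3": ["h", "ai"],
        "utt4": ["dh", "eh", "r"],
    }
    begin_pct, end_pct = boundary_silence_rates(toy)
    print(f"silence at begin: {begin_pct}%, at end: {end_pct}%")
```

With real data, phone names may carry word-position suffixes, so the comparison against `sil` would need adapting.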

Yet one of my latest training runs resulted in a reasonable bootstrapping triphone GMM (~30% WER) but a weird TDNN-LSTM (WER going down to 50%, then up to 100%, decoding silence only, despite nice, converging training curves in terms of log-prob).
The model is trained with the chain method. The acoustic score of the silence-only predictions was lower than that of the correct transcript, but close enough that the LM cost ended up giving silence-only sequences a far lower total cost.
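To make the effect concrete, here is a toy Python calculation. All numbers are made up for illustration and are not taken from the experiment; the only assumption about the decoder is that a hypothesis is ranked by graph (LM) cost plus scaled acoustic cost, with lower being better.

```python
def total_cost(acoustic_cost, graph_cost, acoustic_scale=1.0):
    """Combined decoding cost of one hypothesis (lower is better)."""
    return graph_cost + acoustic_scale * acoustic_cost


# Hypothetical costs (negated log-probabilities), invented for this example.
hyps = {
    # Acoustically better, but several words mean a higher LM cost.
    "correct transcript": {"acoustic_cost": 100.0, "graph_cost": 12.0},
    # Acoustically slightly worse, but almost free for the LM.
    "silence only": {"acoustic_cost": 103.0, "graph_cost": 1.0},
}

totals = {name: total_cost(**costs) for name, costs in hyps.items()}
best = min(totals, key=totals.get)

if __name__ == "__main__":
    for name, cost in totals.items():
        print(f"{name}: {cost}")
    print("decoder picks:", best)
```

In this toy setup the acoustic gap (3.0) is smaller than the LM gap (11.0), so the silence-only hypothesis wins even though it matches the audio less well.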

Turns out that, following the recommendation in this comment in analyze_alignments.py
""
The reason for this warning is that the tradition in speech recognition is
to supply a little silence at the beginning and end of utterances... up to
maybe half a second. If your database is not like this, you should know;
you may want to mess with the segmentation to add more silence
""
we added more silence and that fixed the issue.
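One way to add silence, sketched in Python below, is to widen each entry of a Kaldi-style segments file (lines of the form `<utt-id> <rec-id> <start> <end>`, times in seconds). This is only an illustration, not necessarily what was done here: the helper name `pad_segments`, the 0.25 s default padding, and the `durations` dict of recording lengths are all assumptions.

```python
def pad_segments(segment_lines, durations, pad=0.25):
    """Extend each segment by `pad` seconds on both sides.

    segment_lines: iterable of "<utt-id> <rec-id> <start> <end>" strings.
    durations:     dict mapping recording ID -> recording length in seconds
                   (hypothetical input, used to clamp the end time).
    """
    padded = []
    for line in segment_lines:
        utt, rec, start, end = line.split()
        new_start = max(0.0, float(start) - pad)           # not before 0
        new_end = min(durations[rec], float(end) + pad)    # not past the end
        padded.append(f"{utt} {rec} {new_start:.2f} {new_end:.2f}")
    return padded


if __name__ == "__main__":
    segments = ["utt1 rec1 0.10 2.00", "utt2 rec1 5.00 9.90"]
    for line in pad_segments(segments, {"rec1": 10.0}):
        print(line)
```

Note that this sketch does not check whether a padded segment now overlaps its neighbour's speech, which would need handling on densely segmented recordings.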

Having said that, I still have trouble understanding why it is preferable to have "a little silence at the beginning and end of utterances", and how it influences the training of the nnet.
Is there a reference paper or study that I could read?

Thanks,
Théo

Daniel Povey

Jan 12, 2018, 1:58:19 PM

to kaldi-help

If you don't have silences, the utterances may be cut off in the middle, and that's not good. Also, test data may have silences. There isn't a hard and fast rule.

Anyway, I doubt that the problems with your TDNN+LSTM were related to this. I think it's more likely there was an unrelated error such as a tree mismatch or wrong chunk-size or extra-{left,right}-context options.

However, it's possible that there was an issue about the silence. Do

grep optional exp/your-gmm-dir/log/analyze_alignments.log

One possibility is that your training data had very little silence in its alignments, leading to the den.fst having very low probability for silence; this would have tended to push up the acoustic probabilities of silence, which might have led to a lot of silence being recognized since the lexicon fixes the probability of silence at 0.5. But my feeling is if this were a likely thing, we'd have seen it before. Were you using a lexicon with pronunciation and silence probabilities (utils/dict_dir_add_pronprobs.sh)?

Dan

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/e38bf161-8c8a-4b61-9002-aa705591a263%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

theodor...@snips.ai

Jan 15, 2018, 5:15:29 AM

to kaldi-help

Fair enough, that makes sense. Silence made up about 6% of the alignments.
Running the same experiment with and without added silence, the model trained with silence worked fine and the one without broke.
And indeed we were using pronunciation/silence probabilities for GMM-HMM decoding but not for the nnet.
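For anyone wanting to check the same figure on their own data, here is a rough Python sketch of a frame-level silence fraction. It assumes the 6% is measured in frames, which may not be how the number above was obtained; the function name `silence_frame_fraction` and the input layout (per-utterance lists of `(phone, n_frames)` pairs) are hypothetical.

```python
def silence_frame_fraction(alignments, silence_phones=("<sil>",)):
    """Fraction of aligned frames assigned to silence phones.

    alignments: dict mapping utterance ID -> list of (phone, n_frames)
                pairs (hypothetical layout, e.g. derived from phone-level
                alignments with durations).
    """
    sil_frames = total_frames = 0
    for phones in alignments.values():
        for phone, n_frames in phones:
            total_frames += n_frames
            if phone in silence_phones:
                sil_frames += n_frames
    return sil_frames / total_frames if total_frames else 0.0


if __name__ == "__main__":
    toy = {
        "utt1": [("<sil>", 6), ("a", 44)],
        "utt2": [("b", 50)],
    }
    print(f"silence fraction: {silence_frame_fraction(toy):.2%}")
```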
Thanks for the hints :)
Cheers
Théodore
