Number of HMM topology states for the HMM-GMM model compared to the HMM-DNN model.


yair shachar

May 24, 2021, 3:06:22 AM
to kaldi-help

Hello everyone, this is my first post, so I would first like to thank you for maintaining and developing this toolkit and for the support that you offer.

My question is related to the number of states in the HMM topology and the number of states in the chain type topology created for the DNN models.

I am currently inspecting the mini_librispeech recipe.

1) Why is the convention to use 3 states for non-silence phones and 5 states for silence phones in the conventional HMM?
2) When inspecting the chain-type topology, I see that all phones (silence and non-silence) are given only one state. Why is this?
3) How would changing these numbers affect the results of the model?

Many thanks.

Daniel Povey

May 24, 2021, 3:39:58 AM
to kaldi-help
The chain model uses 3x subsampling of frames, so any longer topology would lead to problems modeling fast speech.
The difference between the silence and regular-phone topologies (5 vs. 3 states) is not really important.
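
To illustrate the subsampling argument, the arithmetic can be sketched as below. This is a minimal sketch, not Kaldi code: it assumes a 10 ms frame shift before subsampling and a strict left-to-right topology with no skip transitions, so every state must consume at least one frame. The function name `min_phone_duration_ms` is hypothetical.

```python
def min_phone_duration_ms(num_states, frame_shift_ms=10.0, subsampling_factor=1):
    """Shortest phone that a strict left-to-right HMM can represent.

    Assumptions (not taken from Kaldi itself): no skip transitions, so each
    state must emit at least one frame; frame_shift_ms is the shift before
    subsampling; subsampling_factor is how many input frames map to one
    output frame.
    """
    if num_states < 1 or subsampling_factor < 1:
        raise ValueError("num_states and subsampling_factor must be >= 1")
    effective_shift_ms = frame_shift_ms * subsampling_factor
    return num_states * effective_shift_ms


if __name__ == "__main__":
    cases = [
        ("3 states, no subsampling", 3, 1),
        ("3 states, 3x subsampling", 3, 3),
        ("1 state, 3x subsampling", 1, 3),
    ]
    for label, states, factor in cases:
        duration = min_phone_duration_ms(states, subsampling_factor=factor)
        print(f"{label}: minimum phone duration {duration:.0f} ms")
```

Under these assumptions, a 3-state topology needs at least 30 ms per phone without subsampling but 90 ms with 3x subsampling, while a topology that can be traversed in one frame keeps the minimum at 30 ms. Phones shorter than the minimum, as in fast speech, cannot be aligned, which is the problem described above.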


yair shachar

May 24, 2021, 4:28:54 AM
to kaldi-help
To make sure I understand your answer on the chain model topology: because the frames are cut into very short lengths, modeling them with 3 states, where each state has its own pdf, might give poor results because there is too little data in such a small frame? Or does a 3-state topology fail under 3x subsampling for a different reason?

Regarding my third question, and to check that I understand your answers: does this mean that it doesn't really matter how many states are used for silence and non-silence phones?
(For example, what would happen if I used 2 and 2, 5 and 5, or 5 and 10 states for sil/non-sil? Are these things I should consider?)

Thanks again.