Transition-state 4: phone = a hmm-state = 0 pdf = 3
Transition-id = 7 p = 0.713307 count of pdf = 390711 [self-loop]
Transition-id = 8 p = 0.286693 count of pdf = 390711 [0 -> 1]
Transition-state 5: phone = a hmm-state = 1 pdf = 4
Transition-id = 9 p = 0.594051 count of pdf = 275931 [self-loop]
Transition-id = 10 p = 0.405949 count of pdf = 275931 [1 -> 2]
Transition-state 6: phone = a hmm-state = 2 pdf = 5
Transition-id = 11 p = 0.594987 count of pdf = 276569 [self-loop]
Transition-id = 12 p = 0.405013 count of pdf = 276569 [2 -> 3]
--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/34e885c6-47e2-4c91-97df-7062af75dd65%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi, Dan and All:
Dan, thank you very much for the explanation.
The transition to the next phone is clear to me now.
But I have some further, more detailed questions about what "goes to the next phone" means.
I found that the transition probability [HmmState2 -> HmmState3] is very different from the probability of [phone=a -> phone=b], i.e. P(b|a).
More precisely, I think the transition-id of a triphone from HmmState=2 to HmmState=3 only tells us how the phone ends. In the H transducer, we need an arc labeled eps:eps to make the transition from the initial/final state to the start/loop state, and that start/loop state is shared by all of the triphone HMM FSTs (see V. Panayotov's blog and the function MakeLoopFst()). The probability of [HmmState=2 -> HmmState=3] only distinguishes leaving the state from taking the self-loop [HmmState=2 -> HmmState=2].
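To make the point about the self-loop concrete, here is a minimal sketch that uses only the probabilities from the dump above for phone "a". It assumes the usual reading of a single self-loop as a geometric duration model, so the expected number of frames spent in a state is 1 / (1 - p_self_loop). The names `states` and `expected_frames` are mine, not Kaldi's.

```python
import math

# Self-loop and forward probabilities copied from the transition-model dump
# for phone "a" (transition-states 4, 5 and 6).
states = {
    0: {"self_loop": 0.713307, "forward": 0.286693},
    1: {"self_loop": 0.594051, "forward": 0.405949},
    2: {"self_loop": 0.594987, "forward": 0.405013},
}


def expected_frames(p_self_loop):
    """Mean of the geometric duration distribution implied by one self-loop."""
    return 1.0 / (1.0 - p_self_loop)


durations = {}
for hmm_state, p in states.items():
    # The two arcs leaving a state should form a proper distribution.
    assert math.isclose(p["self_loop"] + p["forward"], 1.0, abs_tol=1e-6)
    durations[hmm_state] = expected_frames(p["self_loop"])
    print(f"hmm-state {hmm_state}: about {durations[hmm_state]:.2f} frames")

total_frames = sum(durations.values())
print(f"whole phone: about {total_frames:.2f} frames")
```

Read this way, the pair (self-loop, forward) only says how long the model tends to stay in a state. Nothing in it refers to which phone comes next.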
The conditional probability P(b|a) is an entirely different concept, one that a transition-id cannot represent.
So the transition [HmmState=2 -> HmmState=3] marks the end of a phone. Going from one phone to the next must involve two arcs: [HmmState=2 -> HmmState=3] and the arc labeled eps:eps.
For example, take three words: catch, catches and catched. The paths from the triphone k/ae/ch to the next triphone are shown in the Fig. below. It shows that the transition-id [2->3] cannot help distinguish between ae/ch/eps_0, ae/ch/i_0 and ae/ch/d_0; instead, we rely on the emission probabilities from the GMM or nnet to distinguish them.
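A small sketch of why the shared [2->3] arc cannot separate the three continuations: every competing path pays the same exit cost, so the ranking is decided by the acoustic scores alone. The forward probability is borrowed from the dump above (it belongs to phone "a" and is reused for "ch" purely for illustration), and the acoustic log-likelihoods are made-up numbers, not output from any model.

```python
import math

# Forward probability of [2 -> 3], copied from the dump for phone "a".
# Reusing it for the "ch" triphone is an assumption made for illustration.
log_p_exit = math.log(0.405013)

# Hypothetical acoustic log-likelihoods for the frames that follow "ch".
# These values are invented for the example.
acoustic_loglike = {
    "ae/ch/eps": -12.0,
    "ae/ch/i": -9.5,
    "ae/ch/d": -15.0,
}


def best(scores):
    """Return the hypothesis with the highest log score."""
    return max(scores, key=scores.get)


# The exit arc adds the same constant to every hypothesis.
with_exit = {name: score + log_p_exit for name, score in acoustic_loglike.items()}

best_without = best(acoustic_loglike)
best_with = best(with_exit)
print("best without exit cost:", best_without)
print("best with exit cost:   ", best_with)
```

Adding the same constant to every path leaves the winner unchanged, which is what I mean by the transition-id not helping to choose between the three triphones.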
My question is:
1) Is the transition model really effective in an ASR system, especially compared to the context-dependent phones or the GMM model? Does it really help reduce WER?
I think it is more like a kind of glue that holds the other parts together.
Thanks a lot for your help!