About Transition-id, need some more explanation

Zicheng Qiu

Nov 22, 2017, 12:35:15 PM

to kaldi-help

Hi, All:

It seems that Transition-ids can only identify transitions between intra-phone states. For example, there is a transition-id identifying [phone=a, hmm-state=0] --> [phone=a, hmm-state=1].

My question is: can transition-ids identify transitions between states of different phones? For example, is there a transition-id for [phone=a, hmm-state=2] --> [phone=b, hmm-state=0]?

The confusion comes from my misunderstanding of the doc "HMM topology and transition modeling". All transition-ids are defined in the transition model file (e.g. 40.mdl). The output of the "show-transitions" command shows that transition-ids only identify jumps between states within a phone.

However, in alignments, the sequence of transition-ids is used to identify both intra-phone and inter-phone state transitions. In any embedded training, there must be inter-phone state transitions.

Output of show-transitions (I can only find intra-phone state transitions here):

Transition-state 4: phone = a hmm-state = 0 pdf = 3
 Transition-id = 7 p = 0.713307 count of pdf = 390711 [self-loop]
 Transition-id = 8 p = 0.286693 count of pdf = 390711 [0 -> 1]
Transition-state 5: phone = a hmm-state = 1 pdf = 4
 Transition-id = 9 p = 0.594051 count of pdf = 275931 [self-loop]
 Transition-id = 10 p = 0.405949 count of pdf = 275931 [1 -> 2]
Transition-state 6: phone = a hmm-state = 2 pdf = 5
 Transition-id = 11 p = 0.594987 count of pdf = 276569 [self-loop]
 Transition-id = 12 p = 0.405013 count of pdf = 276569 [2 -> 3]

Thanks,

Quinn Qiu

Daniel Povey

Nov 22, 2017, 1:24:17 PM

to kaldi-help

The last transition of a phone is the one that goes to the next phone, e.g.:

Transition-id = 12 p = 0.405013 count of pdf = 276569 [2 -> 3]

goes to state 3 of this phone, which is the final nonemitting state, so effectively it goes to the next phone. Technically, it's any transition-id going to the non-emitting final state of the phone.
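To make the point above concrete, here is a minimal Python sketch that parses the show-transitions text quoted earlier in the thread and lists the transition-ids that enter the final non-emitting state. The regular expressions, the function names and the constant FINAL_STATE = 3 are my own assumptions for illustration (they match the 3-emitting-state topology in the quoted output); this is not Kaldi code.

```python
import re

# Text copied from the show-transitions output quoted in this thread.
SHOW_TRANSITIONS_OUTPUT = """\
Transition-state 4: phone = a hmm-state = 0 pdf = 3
 Transition-id = 7 p = 0.713307 count of pdf = 390711 [self-loop]
 Transition-id = 8 p = 0.286693 count of pdf = 390711 [0 -> 1]
Transition-state 5: phone = a hmm-state = 1 pdf = 4
 Transition-id = 9 p = 0.594051 count of pdf = 275931 [self-loop]
 Transition-id = 10 p = 0.405949 count of pdf = 275931 [1 -> 2]
Transition-state 6: phone = a hmm-state = 2 pdf = 5
 Transition-id = 11 p = 0.594987 count of pdf = 276569 [self-loop]
 Transition-id = 12 p = 0.405013 count of pdf = 276569 [2 -> 3]
"""

# Hypothetical patterns written for the lines above only.
STATE_RE = re.compile(
    r"Transition-state (\d+): phone = (\S+) hmm-state = (\d+) pdf = (\d+)")
TRANS_RE = re.compile(
    r"Transition-id = (\d+) p = ([\d.eE+-]+).*\[(self-loop|(\d+) -> (\d+))\]")

# Assumption: 3 emitting states (0, 1, 2), so state 3 is the final
# non-emitting state of the phone.
FINAL_STATE = 3


def parse_transitions(text):
    """Return one dict per transition-id found in the text."""
    transitions = []
    phone = src = None
    for line in text.splitlines():
        m = STATE_RE.search(line)
        if m:
            phone, src = m.group(2), int(m.group(3))
            continue
        m = TRANS_RE.search(line)
        if m and phone is not None:
            # A self-loop stays in the source state; otherwise read "src -> dst".
            dst = src if m.group(3) == "self-loop" else int(m.group(5))
            transitions.append({"id": int(m.group(1)), "phone": phone,
                                "src": src, "dst": dst,
                                "p": float(m.group(2))})
    return transitions


def phone_exit_ids(transitions, final_state=FINAL_STATE):
    """Transition-ids that enter the final state, i.e. that leave the phone."""
    return [t["id"] for t in transitions if t["dst"] == final_state]


transitions = parse_transitions(SHOW_TRANSITIONS_OUTPUT)
print(phone_exit_ids(transitions))
```

For a topology with a different number of states, the final_state argument would have to be changed accordingly.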

However, for search efficiency we do something called re-ordering, so that the self-loop comes after the forward transition. (We probably should have just gone ahead and adopted an arc-based rather than a state-based formalism instead of doing this, but that's the way it is.) Anyway, except for chain models, that distinction doesn't matter for training purposes, because the self-loop and the forward transition share the same pdf.
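A toy Python sketch of what re-ordering does to the order of transition-ids in an alignment, under my own simplifying assumptions: each state has exactly one self-loop id and one forward id (as in the show-transitions output quoted earlier), and the input is a well-formed non-reordered sequence. In Kaldi the re-ordering is applied when self-loops are added to the graph, not to alignments after the fact, so the function below is purely illustrative and hypothetical.

```python
# Ids taken from the show-transitions output quoted in this thread.
SELF_LOOP_IDS = {7, 9, 11}
FORWARD_IDS = {8, 10, 12}


def reorder(alignment):
    """Move each state's self-loops after the forward transition out of it."""
    out, pending = [], []
    for tid in alignment:
        if tid in SELF_LOOP_IDS:
            # Hold self-loops until the forward transition of this state is seen.
            pending.append(tid)
        elif tid in FORWARD_IDS:
            out.append(tid)
            out.extend(pending)
            pending = []
        else:
            raise ValueError(f"unknown transition-id {tid}")
    if pending:
        raise ValueError("alignment ends in the middle of a state")
    return out


# Non-reordered: self-loops first, then the forward transition of each state.
print(reorder([7, 7, 8, 9, 10, 11, 11, 11, 12]))
```

The number of frames and the pdfs seen are the same either way; only the position of the forward transition-id within each state's run changes.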

Dan

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/34e885c6-47e2-4c91-97df-7062af75dd65%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

QuinnQiu

Jan 20, 2018, 3:15:23 PM

to kaldi-help

Hi, Dan and All:

Dan, thank you very much for the explanation.

I am now clear about the transition to the next phone.

But I have some further questions about what "goes to the next phone" means in more detail.

I found the transition probability [HmmState2 -> HmmState3] is very different from the probability of [phone=a -> phone=b], or P(b|a).

More precisely, I think the transition-id of a triphone from HmmState=2 to HmmState=3 just tells us how the phone ends. In the H transducer, we need an arc labeled eps:eps to make the transition from the initial/final state to the start/loop state, and the start/loop state is shared by all triphone HMM FSTs (see V. Panayotov's blog and the function MakeLoopFst()). The probability of [HmmState=2 -> HmmState=3] only tells us how it differs from the self-loop [HmmState=2 -> HmmState=2].

The conditional probability P(b|a) is an entirely different concept, one that a transition-id can't represent.

So, the transition [HmmState=2 -> HmmState=3] marks the phone's ending. A transition from a phone to the next phone must include two arcs: [HmmState=2 -> HmmState=3] and the arc labeled eps:eps.
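A small Python sketch of the point about what these probabilities encode: within one transition-state the self-loop and forward probabilities sum to 1, so they describe how long the model stays in the state (a geometric duration), not which phone comes next. The numbers are copied from the show-transitions output quoted earlier; the function and variable names are hypothetical.

```python
# (p_self_loop, p_forward) for hmm-states 0, 1, 2 of phone "a",
# copied from the show-transitions output quoted in this thread.
STATE_PROBS = {
    0: (0.713307, 0.286693),
    1: (0.594051, 0.405949),
    2: (0.594987, 0.405013),
}


def expected_frames(p_self_loop):
    """Mean of a geometric duration: frames spent in a state before leaving."""
    return 1.0 / (1.0 - p_self_loop)


for state, (p_self, p_fwd) in STATE_PROBS.items():
    # The two transitions out of a state form one probability distribution.
    assert abs(p_self + p_fwd - 1.0) < 1e-6
    print(f"hmm-state {state}: expected frames = {expected_frames(p_self):.3f}")

# Expected duration of the whole phone under this simple left-to-right topology.
expected_phone_frames = sum(expected_frames(p) for p, _ in STATE_PROBS.values())
print(f"phone a: expected frames = {expected_phone_frames:.3f}")
```

Nothing here depends on the identity of the following phone, which is consistent with the observation that P(b|a) is not something a transition-id carries.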

For example, take three words: catch, catches and catched. Going from triphone k/ae/ch to the next triphone gives what is shown in the figure below. It shows that transition-id [2->3] can't help to distinguish between ae/ch/eps_0, ae/ch/i_0 and ae/ch/d_0; instead, we use the emission probabilities from the GMM or nnet to distinguish them.

My question is:

1) Is the transition model really effective in an ASR system, especially compared to the context-dependent phone or GMM model? Does it really help reduce WER?

I think it is more like a kind of glue that holds the other parts together.

Thanks a lot for your help!

Quinn


Daniel Povey

Jan 20, 2018, 3:22:04 PM

to kaldi-help

I don't have time to go through this in detail, but in response to the question about how important the transition modeling is, the answer is: not very important. Turning it off completely would lose you only around 0.1% absolute of performance in typical cases.


QuinnQiu

Jan 20, 2018, 3:39:34 PM

to kaldi-help

Hi, Dan:

Thank you for the number. It is very helpful.

Quinn
