In triphone clustering, do the pdf-id and the central-position phone have a one-to-one correspondence?


付嘉懿

Mar 27, 2018, 10:41:50 AM
to kaldi-help
Hi all,

I'm trying to understand the clustering process for triphone acoustic models. I have read the document "How decision trees are used in Kaldi" and the paper "Tree-based State Tying for High Accuracy Acoustic Modeling".

In my understanding, if I use a one-state HMM to train all the phone models (both silence and non-silence phones) and then use these models for triphone clustering, the tied-state approach degenerates into a model-based approach. After the decision tree is generated, each pdf-id corresponds to one triphone cluster, and all the triphones in that cluster have the same central phone.

In my experiments on Mandarin, I modified the shell script to get a one-state HMM and then ran the normal training process to get the decision tree. But when I looked at a pdf-id and its triphone cluster, I found that the triphones have different central phones. For example, the cluster for pdf-id 2045 contains triphones such as i3/s/uai4, er5/q/uai5 and ian1/r/uan2. In other words, a pdf-id does not correspond to a single central phone.

I am really confused. Why does this happen? Where did I go wrong?
I look forward to getting an answer as soon as possible.
Thank you!
 

Daniel Povey

Mar 27, 2018, 1:39:29 PM
to kaldi-help

OK, to rephrase your question: you expected that the set of triphones that map to any given pdf-id would all have the same central phone, but you found that this is not the case.
Firstly, the number of states in the HMM is not relevant here.

The central phones of the triphones that map to any given pdf-id should all be at the same "tree root", that is, they should all be on the same line of the roots.txt file.  If you created a roots.txt file where all the nonsilence phones were on the same line (e.g. because, in the dict dir that you provided to prepare_lang.sh, all the phones in nonsilence_phones.txt [IIRC] were on the same line), that would explain what you are seeing.

To avoid that, put the phones on different lines of nonsilence_phones.txt in the dict dir.

Dan

 

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/49656bea-d829-4c25-9314-8568d20f9fee%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

付嘉懿

Mar 28, 2018, 3:55:31 AM
to kaldi-help
Dan, thanks for your answer!

Following your suggestion, I checked my "nonsilence_phones.txt" and found that all the phones are already on separate lines in the dict dir. However, the problem remains.
Here is part of my "nonsilence_phones.txt" and the generated "roots.txt":
"nonsilence_phones.txt":
a1
a2
a3
a4
a5
aa
ai1
ai2
ai3
ai4
ai5
an1
an2
an3
...

"roots.txt":
shared split a1_B a1_E a1_I a1_S
shared split a2_B a2_E a2_I a2_S
shared split a3_B a3_E a3_I a3_S
shared split a4_B a4_E a4_I a4_S
shared split a5_B a5_E a5_I a5_S
shared split aa_B aa_E aa_I aa_S
shared split ai1_B ai1_E ai1_I ai1_S
shared split ai2_B ai2_E ai2_I ai2_S
shared split ai3_B ai3_E ai3_I ai3_S
shared split ai4_B ai4_E ai4_I ai4_S
shared split ai5_B ai5_E ai5_I ai5_S
shared split an1_B an1_E an1_I an1_S
...
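
The "same tree root" condition can be checked mechanically. The following is a minimal Python sketch, not part of Kaldi: the helper names (parse_roots, base_phone, mixed_roots) are made up for illustration, and it assumes every line looks like the ones above, i.e. two flag words followed by phone names with an optional _B/_E/_I/_S word-position suffix.

```python
import re

def parse_roots(text):
    """Return one list of phone names per usable line of roots.txt-style text."""
    roots = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines such as "..."
        # fields[0] is shared/not-shared, fields[1] is split/not-split
        roots.append(fields[2:])
    return roots

def base_phone(phone):
    """Strip a word-position suffix (_B, _E, _I or _S) if there is one."""
    return re.sub(r"_[BEIS]$", "", phone)

def mixed_roots(roots):
    """Return the roots (lines) that contain more than one base phone."""
    return [r for r in roots if len({base_phone(p) for p in r}) > 1]

sample = """\
shared split a1_B a1_E a1_I a1_S
shared split a2_B a2_E a2_I a2_S
shared split ai1_B ai1_E ai1_I ai1_S
"""
roots = parse_roots(sample)
print(len(roots), mixed_roots(roots))
```

For the excerpt shown here, no line mixes base phones, so on this evidence the roots file itself does not explain clusters with several central phones.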

I am also still confused about the number of HMM states.
If I use a 3-state HMM and set "shared split" for the non-silence phones in the "roots.txt" file, different HMM states can be split into different leaves.
So for some triphones, different pdf-classes return different pdf-ids.
For example, for the triphone ao4/iu5/van3, the C++ calls ctx_dep.Compute(triphone, 0), ctx_dep.Compute(triphone, 1) and ctx_dep.Compute(triphone, 2) return three different pdf-ids.
In my situation, I want pdf-classes 0, 1 and 2 to return the same pdf-id. Should I change "split" to "not-split"? But "not-split" causes an error in build-tree: "split-status.size != 0".

Thank you, Dan. Did I do something wrong?


Daniel Povey

Mar 28, 2018, 1:27:17 PM
to kaldi-help

I am a little confused now.  In your initial email you were talking about a one-state HMM, but now it looks like you are talking about a 3-state HMM but you want the pdfs to be shared.  If that's really what you want you can accomplish it by setting all the PdfClass values in the 'topo' file to zero.
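
As a rough illustration of that suggestion, here is a small Python sketch that rewrites every PdfClass value in topo-style text to zero. It is only a sketch under assumptions: the function name share_pdf_classes is hypothetical, the sample fragment and its transition probabilities are invented, and it assumes the states are written with plain <PdfClass> entries. The edit would presumably have to be made to the 'topo' file before any model is initialized from it.

```python
import re

def share_pdf_classes(topo_text):
    """Rewrite every '<PdfClass> N' entry so that N becomes 0."""
    # \g<1> keeps the tag and its whitespace; only the number is replaced.
    return re.sub(r"(<PdfClass>\s+)\d+", r"\g<1>0", topo_text)

# Invented 3-state fragment in the usual topo layout (values are illustrative).
sample_topo = """\
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
"""
shared = share_pdf_classes(sample_topo)
print(shared)
```

The state indices and transitions are left untouched, so the HMM keeps its three-state duration structure while all emitting states point at the same pdf-class.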



付嘉懿

Mar 28, 2018, 10:30:29 PM
to kaldi-help
Thanks, Dan!

Actually, I want to talk about the one-state HMM. I ran some experiments with 3 states because you said "the number of states in the HMM is not relevant here".
Anyway, this email is about the one-state HMM.

My problem now is that I have put all the phones on separate lines in nonsilence_phones.txt in the dict dir, and I also checked the generated roots.txt file, where the phones are likewise on separate lines.
In my previous email I showed part of my nonsilence_phones.txt and roots.txt, so you can see that I really did that.

Even so, the problem still exists: the triphones that map to a given pdf-id do not all have the same central phone.
I also found that not all pdf-ids have this problem; for a few pdf-ids, the triphones do have the same central phone.

So, Dan, is there anything else that could cause this problem? Or have I misunderstood something?

Thanks again !


Daniel Povey

Mar 28, 2018, 11:56:47 PM
to kaldi-help
> Although I did so, the problem still exists.  The set of triphones that map to any given pdf-id don't have the same central phone.
> And I found that not all pdf-ids have this problem; for a few pdf-ids, the triphones have the same central phone.

How do you know this?
I need details.

 

付嘉懿

Mar 29, 2018, 11:42:26 AM
to kaldi...@googlegroups.com
Here are the details of the process.

First, I trained a one-state HMM:
  1. I created a run.sh and ran the normal training process in this script.
  2. During data preparation, I added the options "--num_sil_states 1  --num_nonsil_states 1" when calling "utils/prepare_lang.sh".
  3. During training, I added the option "--pdf-class-list=0" in "train_deltas.sh" (cluster-phones --pdf-class-list=0).
I used the function "Compute(triphone, pdf-class, pdf-id)" of the class "ContextDependency" to check the number of HMM states.
When I change the pdf-class parameter, the function returns the same pdf-id, so I'm sure the HMM is one-state.

Then I ran the triphone clustering:
  1. I put all the phones on separate lines in the dict dir, as in the file from my previous emails.
  2. I ran the normal triphone clustering steps: acc-tree-stats, cluster-phones, compile-questions, build-tree. I used the default options and made no modifications except adding "--pdf-class-list=0" to the cluster-phones command.
After completing these two steps, I had the tree file and could map any triphone to its pdf-id.
After the first run of train_deltas.sh, num-pdfs is 2485 and the num-leaves option is 3200, so the number of leaves is sufficient.

Then I used this tree to get the central phone of each pdf-id, because I wanted to verify my guess that, when the HMM is one-state, the triphones that map to any given pdf-id all have the same central phone.

First, I iterated over all the phones and combined monophones into triphones.
E.g., with two phones a and b, there are eight triphones: a/a/a, a/a/b, a/b/a, a/b/b, b/a/a and so on.
I used this enumeration because I don't know how to get the triphone cluster of each pdf-id directly.
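
The enumeration described above can be sketched in a few lines of Python. This is only an illustration with phone names as strings; the function name all_triphones is made up, and a real run against the tree would use the integer phone ids from the lang directory rather than strings.

```python
from itertools import product

def all_triphones(phones):
    """Return every (left, centre, right) combination of the given phones."""
    return list(product(phones, repeat=3))

triphones = all_triphones(["a", "b"])
print(len(triphones))
print(["/".join(t) for t in triphones])
```

With N phones this yields N**3 triphones, so the list grows quickly for a full Mandarin phone set with tones and word-position variants.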

This enumeration gives me all possible triphones in my lexicon. Then I used the tree file and the function "Compute(triphone, pdf-class, pdf-id)" to get the pdf-id of each triphone.
Once I had the map from every possible triphone to its pdf-id, I found that many pdf-ids have more than one central phone.
E.g.: the cluster of pdf-id 0 contains the triphones i3/b/b and i3/ee/s.
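
The grouping step can be sketched as follows. This is a hypothetical Python illustration, not the code used in the experiment: the function names are invented, the mapping is a toy dictionary standing in for the results of the Compute() calls, and only the two triphones of pdf-id 0 come from this thread (the entries for pdf-id 1 are made up).

```python
from collections import defaultdict

def central_phones_by_pdf(triphone_to_pdf, central_position=1):
    """Collect, for each pdf-id, the set of phones seen at the central position."""
    by_pdf = defaultdict(set)
    for triphone, pdf_id in triphone_to_pdf.items():
        by_pdf[pdf_id].add(triphone[central_position])
    return dict(by_pdf)

def mixed_pdfs(triphone_to_pdf, central_position=1):
    """Return the pdf-ids whose triphones have more than one central phone."""
    groups = central_phones_by_pdf(triphone_to_pdf, central_position)
    return {pdf: sorted(c) for pdf, c in groups.items() if len(c) > 1}

toy = {
    ("i3", "b", "b"): 0,    # from the example above
    ("i3", "ee", "s"): 0,   # from the example above
    ("a1", "s", "a2"): 1,   # invented entry
    ("a3", "s", "a4"): 1,   # invented entry
}
print(mixed_pdfs(toy))
```

Note that central_position=1 assumes the phone vector is ordered left/centre/right and that the tree was built with the middle phone as the central one; if either assumption is off, the report will flag clusters that are in fact consistent.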

Dan, where did I go wrong? Or is there something I have overlooked?
I am really confused and very much hope to get some answers!

Thanks !



Daniel Povey

Mar 29, 2018, 1:33:37 PM
to kaldi-help
One possibility is that you were using '--central-position 0' so it treats the first phone as being central.

Or maybe there was a bug in your code somewhere.
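
One way to tell these cases apart is to count, for each context position, how many pdf-ids have a single phone at that position. The sketch below is a hypothetical Python diagnostic (the function name and the toy data are made up, apart from the two pdf-id 0 triphones quoted earlier in the thread). If position 0 is constant for nearly every pdf-id, that would be consistent with the first phone being treated as central; if no position is constant, a bug in the lookup code looks more likely.

```python
from collections import defaultdict

def constant_position_counts(triphone_to_pdf, width=3):
    """For each context position, count the pdf-ids whose triphones all share one phone there."""
    seen = defaultdict(lambda: [set() for _ in range(width)])
    for triphone, pdf_id in triphone_to_pdf.items():
        for pos, phone in enumerate(triphone):
            seen[pdf_id][pos].add(phone)
    return [sum(1 for sets in seen.values() if len(sets[pos]) == 1)
            for pos in range(width)]

toy = {
    ("i3", "b", "b"): 0,    # from the thread
    ("i3", "ee", "s"): 0,   # from the thread
    ("a1", "b", "s"): 1,    # invented entry
    ("a1", "q", "r"): 1,    # invented entry
}
print(constant_position_counts(toy))
```

In this toy data only position 0 is constant within each pdf-id, which is the pattern one would expect if the tree, or the code querying it, uses the first phone as the central one.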


Dan

