Context Dependency


Kina T.

Jan 10, 2018, 11:07:43 PM
to kaldi-help
Dear all,

I built context-independent GMM and DNN systems using Kaldi. For the context-independent GMM, I trained a monophone model and then the triphone stages (deltas, LDA+MLLT and SAT), passing --context-opts "--context-width=1 --central-position=0" as a training parameter to every triphone stage so that they become context-independent models. Finally, the context-independent SAT system obtained this way was given to the DNN as input to build the DNN acoustic model. Did I use the correct procedure? If not, is there another way to build context-independent models in Kaldi?
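To make the context setting above concrete (context width N=1, central position P=0), here is a conceptual sketch, not Kaldi code, of the phone window each position in a phone sequence is keyed on. With N=3, P=1 every phone is seen as (left, centre, right); with N=1, P=0 only the centre phone remains, so tree questions can only ask about the phone itself. Padding utterance edges with None is an illustrative assumption (Kaldi uses phone 0 for context outside the utterance), and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def context_windows(phones: List[str], width: int,
                    central: int) -> List[Tuple[Optional[str], ...]]:
    """Return the phone-context window seen by each phone in `phones`.

    `width` plays the role of N (context width) and `central` the role of
    P (central position); None marks context outside the utterance.
    """
    if not 0 <= central < width:
        raise ValueError("central position must satisfy 0 <= P < N")
    windows = []
    for i in range(len(phones)):
        window = []
        # Offsets run from P phones to the left to N-P-1 phones to the right.
        for offset in range(-central, width - central):
            j = i + offset
            window.append(phones[j] if 0 <= j < len(phones) else None)
        windows.append(tuple(window))
    return windows


phones = ["k", "ae", "t"]
print(context_windows(phones, width=3, central=1))  # triphone windows
# [(None, 'k', 'ae'), ('k', 'ae', 't'), ('ae', 't', None)]
print(context_windows(phones, width=1, central=0))  # context independent
# [('k',), ('ae',), ('t',)]
```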

With regards,

Daniel Povey

Jan 11, 2018, 1:10:12 AM
to kaldi-help
Yes, that would probably be the best way.


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/bddec0e8-e4f0-46f7-a1d7-ae94949f7013%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kina T.

Jan 11, 2018, 6:34:28 AM
to kaldi-help
Thanks Dan.

Kina T.

Jan 12, 2018, 1:27:12 AM
to kaldi-help
Hi Dan,
When I analysed the performance of my GMM and DNN systems, the GMM with LDA+MLLT features achieved 12.65% WER, and when I trained SAT on the LDA+MLLT alignments I got 13.94% WER, i.e. performance got worse with fMLLR speaker adaptation than with LDA+MLLT alone. For the DNN, using the SAT alignments as input I got 11.43% WER, and using the LDA+MLLT features I got 11%. My question is: the GMM and DNN systems built with fMLLR speaker adaptation perform worse than those with LDA+MLLT features, but I expected speaker adaptation to improve performance. What is the reason for this, and how can I try to solve it?
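To put the degradations reported above on a common scale, a small sketch computing relative WER change (positive means the adapted system is worse). The function name is hypothetical; the WER values are the ones quoted in the post.

```python
def relative_change(baseline_wer: float, new_wer: float) -> float:
    """Relative WER change in percent; positive means the new system is worse."""
    return 100.0 * (new_wer - baseline_wer) / baseline_wer


# WER values (in %) from the post above.
gmm = relative_change(12.65, 13.94)  # GMM: LDA+MLLT -> SAT
dnn = relative_change(11.00, 11.43)  # DNN: LDA+MLLT input -> SAT input
print(f"GMM: {gmm:+.1f}% relative, DNN: {dnn:+.1f}% relative")
# GMM: +10.2% relative, DNN: +3.9% relative
```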

With best regards,


Daniel Povey

Jan 12, 2018, 1:32:40 AM
to kaldi-help
Maybe your speakers had too little data each, or you had no speaker information and were effectively adapting per utterance. Per-utterance adaptation becomes a no-op for utterances under 5 seconds.
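One way to check whether adaptation is effectively per utterance is to look at the utterance-to-speaker map. If every line of utt2spk maps an utterance to itself, each "speaker" has exactly one utterance. The sketch below takes the file's lines as a list (reading the actual file, e.g. data/train/utt2spk, is left to the caller and that path is an assumption); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def read_utt2spk(lines: List[str]) -> Dict[str, str]:
    """Parse utt2spk lines of the form '<utterance-id> <speaker-id>'."""
    utt2spk: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"malformed utt2spk line: {line!r}")
        utt2spk[fields[0]] = fields[1]
    return utt2spk


def speaker_report(utt2spk: Dict[str, str]) -> dict:
    """Count speakers and utterances, plus the fraction of utterances that
    are their own 'speaker' (a sign of per-utterance adaptation)."""
    if not utt2spk:
        raise ValueError("empty utt2spk")
    counts = Counter(utt2spk.values())
    self_mapped = sum(1 for utt, spk in utt2spk.items() if utt == spk)
    return {
        "speakers": len(counts),
        "utterances": len(utt2spk),
        "min_utts_per_speaker": min(counts.values()),
        "self_mapped_fraction": self_mapped / len(utt2spk),
    }


good = ["spk1_u1 spk1", "spk1_u2 spk1", "spk2_u1 spk2"]
bad = ["u1 u1", "u2 u2", "u3 u3"]
print(speaker_report(read_utt2spk(good)))  # self_mapped_fraction 0.0
print(speaker_report(read_utt2spk(bad)))   # self_mapped_fraction 1.0
```

For the data described in this thread (120 speakers with 50 to 100 sentences each), a correct utt2spk would be expected to report 120 speakers and a self-mapped fraction of 0.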



Kina T.

Jan 12, 2018, 2:06:36 AM
to kaldi-help
Thanks.
I have 120 speakers: 100 of them have 100 sentences each and the remaining 20 have 50 sentences each.
Regarding the second issue, how can I check it?


Kina T.

Jan 12, 2018, 2:25:52 AM
to kaldi-help


During data preparation, the utterance IDs in the text file did not contain the speaker ID; maybe this is the problem. If I include the speaker ID in each utterance ID, will that improve performance? What do you recommend: should I go back and fix it, or should I use the current results?

Daniel Povey

Jan 12, 2018, 1:40:23 PM
to kaldi-help
Including the speaker-id as part of the utterance-id -- if that's what you mean-- should not make any difference (if it was a problem it would lead to a validation failure).
I suspect you have too many parameters.  If you show the output of steps/info/gmm_dir_info.pl on the directories I may see something.

Dan



Kina T.

Jan 12, 2018, 2:29:03 PM
to kaldi-help
Thanks Dan,

When I prepared utt2spk I used <utterance-id> <utterance-id>, not <utterance-id> <speaker-id>, so maybe this affects the speaker information. So changing utt2spk would have no impact at all?

The output of steps/info/gmm_dir_info.pl for the tri2b and tri3b directories:

exp/tri2b: nj=20 align prob=-47.04 over 26.04h [retry=0.5%,fail=0.1%] states=1672 gauss=25047 tree-impr=3.47 lda-sum=13.19 mllt:impr, logdet=1.07,1.77

exp/tri3b: nj=20 align prob=-47.19 over 26.04h [retry=0.4%, fail=0.1%] states=1972 gauss=35039 fmllr-impr=5.03 over 17.58h tree-imp=5.08
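Since the suspicion is too many parameters, one rough number to compare across directories is the average number of Gaussians per state. The sketch below pulls the numeric key=value fields out of a gmm_dir_info.pl line. The regex is an assumption based only on the two lines shown above, not on a documented output format, and the function name is hypothetical.

```python
import re
from typing import Dict


def parse_dir_info(line: str) -> Dict[str, float]:
    """Extract numeric key=value fields (e.g. states, gauss) from one line
    of steps/info/gmm_dir_info.pl output; the first occurrence of a key wins."""
    fields: Dict[str, float] = {}
    for key, value in re.findall(r"([A-Za-z-]+)=(-?\d+(?:\.\d+)?)", line):
        fields.setdefault(key, float(value))
    return fields


lines = [
    "exp/tri2b: nj=20 align prob=-47.04 over 26.04h [retry=0.5%,fail=0.1%] "
    "states=1672 gauss=25047 tree-impr=3.47 lda-sum=13.19 mllt:impr, logdet=1.07,1.77",
    "exp/tri3b: nj=20 align prob=-47.19 over 26.04h [retry=0.4%, fail=0.1%] "
    "states=1972 gauss=35039 fmllr-impr=5.03 over 17.58h tree-imp=5.08",
]
for line in lines:
    name = line.split(":", 1)[0]
    info = parse_dir_info(line)
    per_state = info["gauss"] / info["states"]
    print(f"{name}: states={int(info['states'])} gauss={int(info['gauss'])} "
          f"gauss/state={per_state:.1f}")
# exp/tri2b: states=1672 gauss=25047 gauss/state=15.0
# exp/tri3b: states=1972 gauss=35039 gauss/state=17.8
```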

With Best regards,

Daniel Povey

Jan 12, 2018, 2:36:58 PM
to kaldi-help
Oh, you should definitely use the <utterance-id> <speaker-id> in utt2spk.  That does make a difference.  I thought you were talking about something else.
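A minimal sketch of building a proper utt2spk and its inverse spk2utt, assuming (hypothetically) that each utterance ID begins with its speaker ID followed by an underscore; Kaldi's data-preparation conventions recommend making the speaker ID a prefix of the utterance ID so both files sort consistently. In a real recipe the inversion is normally done with utils/utt2spk_to_spk2utt.pl and the directory checked with utils/fix_data_dir.sh; the Python below is only illustrative and the function names are hypothetical.

```python
from typing import Dict, List


def make_utt2spk(utt_ids: List[str], sep: str = "_") -> List[str]:
    """Build sorted '<utterance-id> <speaker-id>' lines, assuming each
    utterance ID starts with its speaker ID followed by `sep`."""
    lines = []
    for utt in sorted(utt_ids):  # plain code-point order, like LC_ALL=C sort
        spk = utt.split(sep, 1)[0]
        lines.append(f"{utt} {spk}")
    return lines


def utt2spk_to_spk2utt(utt2spk_lines: List[str]) -> List[str]:
    """Invert utt2spk into spk2utt lines '<speaker-id> <utt-1> <utt-2> ...'."""
    spk2utts: Dict[str, List[str]] = {}
    for line in utt2spk_lines:
        utt, spk = line.split()
        spk2utts.setdefault(spk, []).append(utt)
    return [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk2utts.items())]


u2s = make_utt2spk(["spk002_0001", "spk001_0002", "spk001_0001"])
print(u2s)
# ['spk001_0001 spk001', 'spk001_0002 spk001', 'spk002_0001 spk002']
print(utt2spk_to_spk2utt(u2s))
# ['spk001 spk001_0001 spk001_0002', 'spk002 spk002_0001']
```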



Kina T.

Jan 12, 2018, 2:40:48 PM
to kaldi-help
Thanks, Dan.

Kina T.

Jan 21, 2018, 3:10:52 AM
to kaldi-help
Hi Dan,
I prepared the data again, this time with utt2spk in the form <utterance-id> <speaker-id>. Now the performance of all the triphone systems has improved, but the SAT system is still worse than the LDA+MLLT system. I tried tuning it with different numbers of leaves and Gaussians, but it did not improve. For example, I got 12.53%, 12.14%, 11.94% and 12.31% WER for tri1, tri2a, tri2b and tri3b respectively. The gmm_dir_info output is as follows:
exp/tri1: nj=20 align prob=-95.08 over 26.04h [retry=0.4%, fail=0.1%] states=1418 gauss=15031 tree-impr=3.76
exp/tri2a: nj=20 align prob=-94.59 over 26.04h[retry=0.3%, fail=0.1%] states=2104 gauss=25047 tree-impr=3.92
exp/tri2b: nj=20 align prob=-46.96 over 26.04h [retry=0.4%, fail=0.1%] states=2156 gauss=30046 tree-impr=3.28 lda-sum=14.24 mllt:impr, logdet=1.09,1.78
exp/tri3b: nj=20 align prob=-47.14 over 26.04h [retry=0.4%, fail=0.1%] states=2144 gauss=30030 tree-impr=3.61 over 17.54h tree-impr=4.80

With best regards,
thanks

Daniel Povey

Jan 21, 2018, 1:49:33 PM
to kaldi-help
It may just be random noise; perhaps your test set is not that big and you're seeing noise.
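To get a feel for how large test-set noise can be, a back-of-envelope sketch of an approximate 95% interval for a WER. It treats each reference word as an independent Bernoulli trial, which ignores correlation within utterances (and insertions), so real intervals are usually wider; a paired bootstrap over utterances is a more appropriate comparison between two systems. The test-set size of 5000 words is purely hypothetical, since the thread does not state it, and the function name is hypothetical.

```python
import math
from typing import Tuple


def wer_confidence_interval(wer_percent: float, num_words: int,
                            z: float = 1.96) -> Tuple[float, float]:
    """Rough normal-approximation interval (in %) for a WER measured on
    `num_words` reference words, assuming independent word errors."""
    if num_words <= 0 or not 0.0 <= wer_percent <= 100.0:
        raise ValueError("need num_words > 0 and 0 <= WER <= 100")
    p = wer_percent / 100.0
    se = math.sqrt(p * (1.0 - p) / num_words)  # binomial standard error
    half = 100.0 * z * se
    return wer_percent - half, wer_percent + half


# tri2b and tri3b WERs from this thread; 5000 words is a hypothetical size.
for name, wer in [("tri2b", 11.94), ("tri3b", 12.31)]:
    lo, hi = wer_confidence_interval(wer, num_words=5000)
    print(f"{name}: {wer:.2f}% (roughly {lo:.2f}% to {hi:.2f}%)")
```

Under this assumption each interval is close to plus or minus 0.9% absolute, so the two intervals overlap heavily and a 0.37% absolute gap alone would not be strong evidence of a real difference.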



Kina T.

Jan 21, 2018, 2:43:38 PM
to kaldi-help
Thanks, Dan.
This problem occurs with the context-dependent system, whereas in the context-independent case SAT performs better than the other systems.
What do you recommend?