WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 1 with no stats; corresponding phone list: 6 7 8 9 10


Sage Khan

Jul 25, 2022, 11:17:47 AM
to kaldi-help
Hello

While training on my own custom dataset, this warning appears every time, during mono, tri and DNN model training:

WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 1 with no stats; corresponding phone list: 6 7 8 9 10

The whole output is:

============================================================================
              tri3 : LDA + MLLT + SAT Training              
============================================================================
steps/align_si.sh --nj 15 --cmd run.pl --use-graphs true data/train_100h data/lang exp_gv_100h/tri2_6000_120000 exp_gv_100h/tri2_ali
steps/align_si.sh: feature type is lda
steps/align_si.sh: aligning data in data/train_100h using model from exp_gv_100h/tri2_6000_120000, putting alignments in exp_gv_100h/tri2_ali
steps/diagnostic/analyze_alignments.sh --cmd run.pl data/lang exp_gv_100h/tri2_ali
analyze_phone_length_stats.py: WARNING: optional-silence SIL is seen only 46.365802800119155% of the time at utterance begin.  This may not be optimal.
analyze_phone_length_stats.py: WARNING: optional-silence SIL is seen only 48.91980360065467% of the time at utterance end.  This may not be optimal.
steps/diagnostic/analyze_alignments.sh: see stats in exp_gv_100h/tri2_ali/log/analyze_alignments.log
steps/align_si.sh: done aligning data.
steps/train_sat.sh --cmd run.pl 8000 160000 data/train_100h data/lang exp_gv_100h/tri2_ali exp_gv_100h/tri3_8000_160000
steps/train_sat.sh: feature type is lda
steps/train_sat.sh: obtaining initial fMLLR transforms since not present in exp_gv_100h/tri2_ali
steps/train_sat.sh: Accumulating tree stats
steps/train_sat.sh: Getting questions for tree clustering.
steps/train_sat.sh: Building the tree
steps/train_sat.sh: Initializing the model
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 1 with no stats; corresponding phone list: 6 7 8 9 10
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 4 with no stats; corresponding phone list: 19 20 21 22
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 8 with no stats; corresponding phone list: 35 36 37 38
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 9 with no stats; corresponding phone list: 39 40 41 42
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 10 with no stats; corresponding phone list: 43 44 45 46
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 11 with no stats; corresponding phone list: 47 48 49 50
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 13 with no stats; corresponding phone list: 55 56 57 58
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 17 with no stats; corresponding phone list: 71 72 73 74
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 18 with no stats; corresponding phone list: 75 76 77 78
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 19 with no stats; corresponding phone list: 79 80 81 82
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 24 with no stats; corresponding phone list: 99 100 101 102
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 25 with no stats; corresponding phone list: 103 104 105 106
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 26 with no stats; corresponding phone list: 107 108 109 110
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 27 with no stats; corresponding phone list: 111 112 113 114
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 29 with no stats; corresponding phone list: 119 120 121 122
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 35 with no stats; corresponding phone list: 143 144 145 146
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 36 with no stats; corresponding phone list: 147 148 149 150
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 38 with no stats; corresponding phone list: 155 156 157 158
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 39 with no stats; corresponding phone list: 159 160 161 162
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 40 with no stats; corresponding phone list: 163 164 165 166
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 45 with no stats; corresponding phone list: 183 184 185 186
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 48 with no stats; corresponding phone list: 195 196 197 198
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 49 with no stats; corresponding phone list: 199 200 201 202
WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 55 with no stats; corresponding phone list: 223 224 225 226
This is a bad warning.
steps/train_sat.sh: Converting alignments from exp_gv_100h/tri2_ali to use current tree
steps/train_sat.sh: Compiling graphs of transcripts
Pass 1
Pass 2
Estimating fMLLR transforms
Pass 3
Pass 4
Estimating fMLLR transforms
Pass 5
Pass 6
Estimating fMLLR transforms
Pass 7
Pass 8
Pass 9
Pass 10
Aligning data
Pass 11

How do we fix this?

Regards 

Sage Khan

Jul 25, 2022, 11:25:38 AM
to kaldi-help
Update: I figured something out but haven't tested it yet. Re: WARNING (gmm-init-model[5.5]:InitAmGmm():gmm-init-model.cc:55) Tree has pdf-id 1 with no stats; corresponding phone list: 6 7 8 9 10

The tree is a file we find in exp/mono, exp/tri1, exp/tri2, exp/nnet3 and so on, the same directories that hold final.mdl (and, typically, a graph/ subdirectory with HCLG.fst).

The tree file (I can't view its contents for some reason) must contain pdf-ids 1, 4, 8, 9 and so on, each tied to a list of phone numbers; pdf-id 1, for example, corresponds to phones 6, 7, 8, 9 and 10. Each of the directories above has a phones.txt file, which I believe is derived from data/local/lang or data/local/dict (wherever we put the lexicon and the silence and nonsilence phones).

The phones.txt file in exp/mono, exp/tri* or exp/nnet3 looks like this:
$ cat {kaldi-root}/egs/myasr/s5/exp/tri3/phones.txt
<eps> 0
SIL 1
SIL_B 2
SIL_E 3
SIL_I 4
SIL_S 5
<oov> 6
<oov>_B 7
<oov>_E 8
<oov>_I 9
<oov>_S 10
EY_B 11
EY_E 12
EY_I 13
EY_S 14
AA_B 15
AA_E 16
AA_I 17
AA_S 18
A_B 19
A_E 20

Each phone is followed by a number; the format is <phone-name> <phone-number>.
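The listing above can be grouped mechanically to see why each warning names a whole run of numbers such as 6 7 8 9 10. Below is a minimal Python sketch, assuming phones.txt has exactly one `<phone-name> <phone-number>` pair per line and that the `_B`/`_E`/`_I`/`_S` suffixes are Kaldi's word-position markers. As far as I understand, the default lang preparation puts all position variants of a phone under one tree root, so they end up sharing pdfs. The function names are hypothetical helpers, not part of Kaldi.

```python
import re
from collections import defaultdict

# Word-position suffixes on position-dependent phones:
# _B (word-begin), _E (word-end), _I (word-internal), _S (singleton word).
POSITION_SUFFIX = re.compile(r"_[BEIS]$")


def parse_phones_txt(text):
    """Parse phones.txt content ('<phone-name> <phone-number>' per line) into {number: name}."""
    id_to_name = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, number = fields
        id_to_name[int(number)] = name
    return id_to_name


def group_by_base_phone(id_to_name):
    """Group phone numbers under their base phone by stripping the position suffix."""
    groups = defaultdict(list)
    for number, name in sorted(id_to_name.items()):
        if number == 0:
            continue  # <eps> (0) is a placeholder symbol, not a real phone
        groups[POSITION_SUFFIX.sub("", name)].append(number)
    return dict(groups)


if __name__ == "__main__":
    sample = """<eps> 0
SIL 1
SIL_B 2
SIL_E 3
SIL_I 4
SIL_S 5
<oov> 6
<oov>_B 7
<oov>_E 8
<oov>_I 9
<oov>_S 10
EY_B 11
EY_E 12"""
    for base, numbers in group_by_base_phone(parse_phones_txt(sample)).items():
        print(base, numbers)
```

On the listing above this groups 6 to 10 under `<oov>`. It also shows that SIL and `<oov>` have an unsuffixed variant in addition to the four suffixed ones, which matches the five-number list for pdf-id 1 versus four-number lists such as 19 20 21 22 elsewhere in the log.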

So when it says the tree has pdf-id 1 with no stats and corresponding phone list 6 7 8 9 10, it is referring to the list above, and the phones with no stats are:
<oov> 6
<oov>_B 7
<oov>_E 8
<oov>_I 9
<oov>_S 10
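Doing this lookup by hand for every warning gets tedious. Here is a small Python sketch that pulls each "no stats" warning out of a log and resolves its phone numbers against phones.txt. The regular expression is written against the exact warning text quoted in this thread and is an assumption about its format; the function names are hypothetical.

```python
import re

# Pattern for the gmm-init-model warning quoted in this thread.
NO_STATS_RE = re.compile(
    r"Tree has pdf-id (\d+) with no stats; corresponding phone list: ([\d ]+)"
)


def parse_phones_txt(text):
    """Map phone number -> phone name from '<phone-name> <phone-number>' lines."""
    id_to_name = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            id_to_name[int(fields[1])] = fields[0]
    return id_to_name


def explain_no_stats_warnings(log_text, phones_txt_text):
    """Return {pdf_id: [phone names]} for every 'no stats' warning in the log."""
    id_to_name = parse_phones_txt(phones_txt_text)
    explained = {}
    for match in NO_STATS_RE.finditer(log_text):
        pdf_id = int(match.group(1))
        numbers = [int(n) for n in match.group(2).split()]
        # Numbers missing from phones.txt stay visible instead of being dropped.
        explained[pdf_id] = [id_to_name.get(n, f"<unknown {n}>") for n in numbers]
    return explained
```

Fed the warnings above together with exp/tri3/phones.txt, it should map pdf-id 1 to `<oov>` and its four position variants.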

This is possibly because I defined both <oov> and SIL as silence phones, and both appear in the lexicon too. But throughout the code I've been using SIL or !SIL and never <oov>, so that could be the issue.

So I will have to get rid of these phones in the original phones.txt file in my lang folder, and then this should not occur again.
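One way to test this theory before retraining is to check which phones can be reached from the training transcripts at all. A phone that no transcript word can reach through the lexicon cannot collect stats, although being reachable does not guarantee stats either. The Python sketch below assumes a plain `lexicon.txt` (`<word> <phone> ...`, no probability column), a Kaldi-style `text` file (utterance id first), that unknown words are mapped to an OOV word (the default `<UNK>` here is a hypothetical placeholder; use whatever your data/lang/oov.txt says), and that the optional silence phone is always reachable. Function names are hypothetical.

```python
def phones_reachable_from_transcripts(lexicon_text, transcript_text, oov_word):
    """Collect the base phones used by any pronunciation of any transcript word.

    Words missing from the lexicon are counted as `oov_word`, mimicking how
    unknown words are mapped to the OOV symbol during data preparation.
    """
    pronunciations = {}
    for line in lexicon_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pronunciations.setdefault(fields[0], []).append(fields[1:])

    reachable = set()
    for line in transcript_text.splitlines():
        words = line.split()[1:]  # first field of a Kaldi text file is the utterance id
        for word in words:
            key = word if word in pronunciations else oov_word
            for pron in pronunciations.get(key, []):
                reachable.update(pron)
    return reachable


def phones_without_data(phone_inventory, lexicon_text, transcript_text,
                        oov_word="<UNK>", optional_silence="SIL"):
    """Return inventory phones that no transcript word can reach.

    The optional silence phone is excluded because it can be inserted
    between words without appearing in any transcript.
    """
    reachable = phones_reachable_from_transcripts(lexicon_text, transcript_text, oov_word)
    return sorted(set(phone_inventory) - reachable - {optional_silence})
```

If `<oov>` shows up in the result, no transcript word leads to it, which fits the warning for pdf-id 1.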

Please let me know if I have gone wrong somewhere. I took the time to write this up so that other newbies like me can learn it beforehand :D

Regards
KHAN

Jan Yenda Trmal

Jul 25, 2022, 11:55:16 AM
to kaldi-help
yes, good thinking. That is the "issue".
y.

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/d7e7d6d7-5deb-413b-aab3-d08ec5b7e84an%40googlegroups.com.

Sage Khan

Jul 25, 2022, 12:29:48 PM
to kaldi-help
Thanks :)

By the way, does this affect the overall accuracy or the training of the model?

KHAN

Jan Yenda Trmal

Jul 25, 2022, 12:34:54 PM
to kaldi-help
usually not much. the <oov> model is not very useful for detecting OOVs. It's mostly used for alignment and training.
but  this is something you have to measure for your dataset
y.

Sage Khan

Jul 25, 2022, 12:42:57 PM
to kaldi-help
So should the lexicon contain the entries "SIL SIL" and "!SIL SIL"? And silence_phones.txt should list SIL (and not <oov>), right?

Sage Khan

Jul 25, 2022, 1:07:15 PM
to kaldi-help
Also, how do we view the tree file in exp/mono, exp/tri* or exp/nnet3? I can't open it in a text editor. And how do we edit the tree file to remove the pdf-ids whose phones have no examples?

Jan Yenda Trmal

Jul 25, 2022, 9:13:22 PM
to kaldi-help
we do not edit trees in kaldi, they are regenerated quite often anyway.
You should edit the input data (the lexicon and data/local/dict/{,non}silence_phones.txt).
y
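After editing the dict files this way, a quick consistency check helps catch leftovers. One example is a lexicon entry that still points at `<oov>` after `<oov>` was removed from silence_phones.txt. The Python sketch below is only an illustration: it assumes a plain `lexicon.txt` without a probability column and treats every token on a silence_phones.txt / nonsilence_phones.txt line as a phone, since a line may group several phones. Kaldi's own utils/validate_dict_dir.pl runs its own checks; the function names here are hypothetical.

```python
def phones_in_lists(*list_texts):
    """Collect phones from silence_phones.txt / nonsilence_phones.txt contents.

    A line may hold several phones that share a tree root, so every
    whitespace-separated token counts.
    """
    phones = set()
    for text in list_texts:
        for line in text.splitlines():
            phones.update(line.split())
    return phones


def phones_in_lexicon(lexicon_text):
    """Collect every phone used by a '<word> <phone> <phone> ...' lexicon line."""
    phones = set()
    for line in lexicon_text.splitlines():
        phones.update(line.split()[1:])
    return phones


def check_dict_dir(silence_text, nonsilence_text, lexicon_text):
    """Compare the declared phone inventory against the phones the lexicon uses."""
    declared = phones_in_lists(silence_text, nonsilence_text)
    used = phones_in_lexicon(lexicon_text)
    return {
        "declared_but_unused": sorted(declared - used),
        "used_but_undeclared": sorted(used - declared),
    }
```

A phone in `declared_but_unused` will still get a tree root but can never collect stats. A phone in `used_but_undeclared` points to a lexicon entry that was missed during the edit.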

