GMM-HMM hyperparameter tuning questions ^_^


Maryam Shalaby

Jun 2, 2020, 9:55:02 AM
to kaldi-help

I have a few questions about hyperparameter tuning for the GMM-HMM stages (train_mono.sh/train_deltas.sh/train_lda_mllt.sh/train_sat.sh). My goal is to produce an accurate forced alignment on the training data:

(1) To save time, I'm currently training on a subset of the data (around 30 hours; the full set is around 120 hours). Will the hyperparameters need to change when I train on the larger set to get maximum accuracy, or will they stay the same?
Note: The data is from the same language (so the same phonetics), but with extra vocabulary and thus a different language model.

(2) If I were to use binary search to look for the parameter value that produces the highest accuracy, when should I stop? At a difference of 1 in totgauss? Or 50? Or 100? (I guess I'm asking about the minimum step size that will make a difference in accuracy.)

(3) Do you have suggestions for the percentage of data the dev set should be?

(4) I have noticed that sil is sometimes inserted between words or in the middle of a word when it is not actually needed, so it eats up some of the phones. Will changing boost-sil to less than 1 help with this, or should I do something else?
Note: my data probably has sil at the beginning and end of the audio files. Sometimes it occurs between words, sometimes not, and never in the middle of a word.

(5) Does higher accuracy mean better alignment, or is there another way to measure alignment accuracy?

Thank you so much!

dragon stone

Sep 6, 2021, 4:29:02 AM
to kaldi-help
As a newcomer to Kaldi, I have the same questions.

nshm...@gmail.com

Sep 6, 2021, 4:31:24 PM
to kaldi-help
> If I were to use binary search to look for the parameter that produces the highest accuracy, when should I stop? At 1 totgauss difference? Or 50? Or 100? (I guess I'm asking about the minimum step size that will make a difference in accuracy)

These days, when everything is done with neural networks, the GMM is only an intermediate step to speed up training. The accuracy of the GMM has very little effect on the accuracy of the final system. There is not much point in tuning GMM hyperparameters; you can take the default numbers from a recipe of similar size.


> Will changing the boost-sil to being less than 1 help this? 

Yes
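
To illustrate why a factor below 1 pushes frames away from silence, here is a rough Python sketch. It assumes the boost factor acts as a multiplicative scale on silence likelihoods, which is the same as adding log(boost) to the silence log-likelihood of each frame. The pdf names, the scores, and the value 0.75 are made up for illustration, and the per-frame argmax is a deliberate simplification: a real aligner runs Viterbi over the HMM with transition constraints.

```python
import math


def boost_silence_loglikes(loglikes, silence_ids, boost):
    """Add log(boost) to the log-likelihood of every silence pdf in each frame.

    loglikes: list of dicts mapping pdf name -> log-likelihood (one dict per frame).
    silence_ids: set of pdf names treated as silence (hypothetical names).
    boost: multiplicative factor on silence likelihoods; < 1 penalizes silence.
    """
    shift = math.log(boost)
    return [
        {pdf: ll + (shift if pdf in silence_ids else 0.0) for pdf, ll in frame.items()}
        for frame in loglikes
    ]


def best_pdf_per_frame(loglikes):
    """Pick the highest-scoring pdf in each frame (simplification: no HMM topology)."""
    return [max(frame, key=frame.get) for frame in loglikes]


# Made-up scores: frames 2 and 3 are borderline between silence and the phone "ah".
frames = [
    {"sil": -5.0, "ah": -9.0},
    {"sil": -6.0, "ah": -6.2},
    {"sil": -6.1, "ah": -6.3},
    {"sil": -9.0, "ah": -5.0},
]

baseline = best_pdf_per_frame(frames)
damped = best_pdf_per_frame(boost_silence_loglikes(frames, {"sil"}, boost=0.75))

print(baseline)  # ['sil', 'sil', 'sil', 'ah']
print(damped)    # ['sil', 'ah', 'ah', 'ah']
```

Only the borderline frames flip; frames where silence wins by a wide margin are unaffected, so a factor slightly below 1 should mostly remove spurious silence rather than real pauses.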


> Does higher accuracy mean better alignment? or is there another way to measure alignment accuracy?

There is no direct relation between alignment quality and accuracy. There are hand-annotated test sets to check the quality of alignment.
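
If you do have a hand-annotated set, one simple way to score an alignment is to compare boundary times directly. The sketch below is only an illustration: the (phone, start time) format, the toy timings, and the 20 ms tolerance are assumptions, not something Kaldi outputs in this form.

```python
def boundary_errors(reference, hypothesis):
    """Absolute start-time differences (seconds) between two alignments.

    Both arguments are lists of (phone, start_time) pairs and must contain
    the same phone sequence, as is the case for forced alignment of a known
    transcript.
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    if [p for p, _ in reference] != [p for p, _ in hypothesis]:
        raise ValueError("phone sequences differ; boundaries cannot be paired")
    return [abs(r - h) for (_, r), (_, h) in zip(reference, hypothesis)]


def boundary_accuracy(reference, hypothesis, tolerance=0.02):
    """Fraction of boundaries within `tolerance` seconds of the reference."""
    errors = boundary_errors(reference, hypothesis)
    return sum(e <= tolerance for e in errors) / len(errors)


def mean_boundary_error(reference, hypothesis):
    """Mean absolute boundary error in seconds."""
    errors = boundary_errors(reference, hypothesis)
    return sum(errors) / len(errors)


# Toy data: hand-annotated vs. automatically aligned phone start times.
manual = [("sil", 0.00), ("k", 0.31), ("ae", 0.38), ("t", 0.52)]
forced = [("sil", 0.00), ("k", 0.30), ("ae", 0.42), ("t", 0.53)]

print(boundary_accuracy(manual, forced))    # 0.75
print(mean_boundary_error(manual, forced))  # about 0.015
```

Scoring two GMM configurations this way on the same annotated utterances measures alignment quality directly, independent of recognition accuracy.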

Moreover, the whole idea of alignment doesn't apply well to continuous spontaneous speech, which doesn't really have boundaries between sounds: sounds evolve continuously, without any definite point where one replaces the other. It helps to reformulate your task so that it doesn't depend on boundaries between sounds. Many recent advances in speech system accuracy/quality have been achieved this way.