(1) To save time, I am currently training on a subset of the data (around 30 hours; the full target set is around 120 hours). Will the hyperparameters need to change when I train on the larger set to get the best accuracy, or will they stay the same?
Note: The data is from the same language (so the same phonetics), but with extra vocabulary and thus a different language model.
(2) If I were to use binary search to find the totgauss value that gives the highest accuracy, when should I stop? At a difference of 1 totgauss? Or 50? Or 100? (I guess I'm asking about the minimum step size that makes a difference in accuracy.)
(3) Do you have suggestions for the percentage of data the dev set should be?
(4) I have noticed that sometimes sil is inserted between words or in the middle of a word when it is not actually needed, so it eats up some of the phones. Will setting boost-sil to less than 1 help with this, or should I do something else?
Note: My data probably has sil at the beginning and end of each audio file, and sometimes between words, but never in the middle of a word.
(5) Does higher accuracy mean better alignment, or is there another way to measure alignment accuracy?
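To make question (2) concrete, here is a rough sketch of the kind of search I have in mind. It is not taken from any Kaldi script. The function name `search_totgauss`, the `evaluate` callback (which in practice would train a model with the given totgauss and return dev-set accuracy), and the toy accuracy curve are all hypothetical. It also assumes that accuracy has a single peak over the search range, which may not hold on real, noisy results.

```python
from typing import Callable, Dict


def search_totgauss(evaluate: Callable[[int], float],
                    lo: int, hi: int, min_step: int) -> int:
    """Narrow [lo, hi] around the totgauss value with the highest score.

    Assumes the score is unimodal (one peak) in totgauss on [lo, hi].
    Stops once the interval is no wider than 2 * min_step.
    """
    cache: Dict[int, float] = {}

    def score(n: int) -> float:
        # Each evaluation would be a full training run, so never repeat one.
        if n not in cache:
            cache[n] = evaluate(n)
        return cache[n]

    while hi - lo > 2 * min_step:
        mid = (lo + hi) // 2
        probe = mid + min_step
        if score(mid) < score(probe):
            lo = mid      # still climbing at mid: the peak lies to the right
        else:
            hi = probe    # flat or falling: the peak lies at or left of probe

    # Return the best of the three remaining candidate values.
    return max((lo, (lo + hi) // 2, hi), key=score)


def toy_accuracy(totgauss: int) -> float:
    # Made-up stand-in for "train and score"; peaks at totgauss = 4200.
    return 90.0 - ((totgauss - 4200) / 1000.0) ** 2


best = search_totgauss(toy_accuracy, lo=1000, hi=10000, min_step=50)
print(best)
```

Here `min_step` is exactly the value I am unsure about: with a step of 1 the comparison between `mid` and `probe` would probably be dominated by noise, so I assume it should be large enough that two runs differ by more than run-to-run variation.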
Thank you so much!