Hi all, I have two questions regarding the lexicon file that we create when training the acoustic model:
1) Multiple pronunciations for words:
If I have words that might have several pronunciations, for example:
the z eh
the z iy
the d eh
etc.,
and also words whose pronunciations were generated via g2p, then in the training data I do not know which of these pronunciations was actually said. Could this heavily affect the training, and is there a better way to deal with multiple pronunciations for words? Is it better to have only one pronunciation per word?
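To make the setup in question 1 concrete, here is a minimal Python sketch, assuming a plain-text lexicon with one `word phone1 phone2 ...` entry per line. The name `group_pronunciations` is hypothetical and not part of any toolkit, and the `cat` entry is made up for illustration; the sketch only lists which words carry more than one pronunciation.

```python
from collections import defaultdict

def group_pronunciations(lexicon_lines):
    """Map each word to its list of distinct pronunciations (hypothetical helper)."""
    prons = defaultdict(list)
    for line in lexicon_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without any phones
        word, phones = parts[0], tuple(parts[1:])
        if phones not in prons[word]:
            prons[word].append(phones)
    return dict(prons)

# The three "the" entries are from the example above; "cat" is an assumed extra entry.
lexicon = ["the z eh", "the z iy", "the d eh", "cat k ae t"]
grouped = group_pronunciations(lexicon)

# Words for which the training transcript alone does not say which variant was spoken.
multi = {w: p for w, p in grouped.items() if len(p) > 1}
print(multi)
```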
2) Learning OOV words: as part of the training, the 'spn' phoneme is also learned. From what I understand, this is done by choosing words that will not be present in the lexicon when training. Is there a smart way to choose these words, and/or a good rule of thumb for the percentage of words to leave out, so that they are treated as OOV words and the model learns spn well?
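One way to frame question 2 is as a count threshold: words seen fewer than some number of times in the training transcripts are left out of the lexicon. The Python sketch below assumes that heuristic purely for illustration. `pick_oov_candidates` and `min_count` are hypothetical names, the transcripts are made up, and the threshold is not a recommended value. It reports the fraction of training tokens that would become OOV, which is the percentage the question is about.

```python
from collections import Counter

def pick_oov_candidates(transcripts, min_count=2):
    """Return words seen fewer than min_count times and the token fraction they cover.

    Assumed heuristic for illustration only, not a toolkit recommendation.
    """
    counts = Counter(w for line in transcripts for w in line.split())
    total = sum(counts.values())
    oov = {w for w, c in counts.items() if c < min_count}
    oov_tokens = sum(counts[w] for w in oov)
    frac = oov_tokens / total if total else 0.0
    return oov, frac

# Made-up transcripts, one utterance per string.
transcripts = ["the cat sat", "the dog sat", "the cat ran"]
oov, frac = pick_oov_candidates(transcripts)
print(sorted(oov), round(frac, 3))
```

Raising `min_count` moves more words out of the lexicon, so the same function can be used to see how the OOV token fraction changes with the threshold.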
Many thanks