Hi,
I want to train a TDNN model with a large amount of data. The dataset is very large, approximately 30k hours after a couple of custom augmentations (the augmentation adds strong noise and heavy speech distortion to make the ASR robust to all kinds of acoustic environments).
To train the GMM-HMM model I have chosen the following configuration (a rough sketch of the commands I plan to run is included after the list):
1. subset data size: 500k utterances, approx. 800 hours selected at random.
But I'm not sure 800 hours of randomly selected data would be sufficient to train a good GMM-HMM model for alignment.
2. deltas training (train_deltas.sh):
leaves: 11500
gauss: 400000
3. LDA+MLLT training:
leaves: 11500
gauss: 800000
4. SAT training:
leaves: 11500
gauss: 1600000
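
For context, here is roughly how I plan to wire these stages together. This is just a sketch of a standard Kaldi GMM-HMM pipeline run from an egs-style recipe directory (with $train_cmd from cmd.sh); the directory names (data/train_full, data/train_800h, exp/mono, exp/tri1, etc.) are placeholders I made up, and I'm assuming a monophone model has already been trained for the first alignment pass:

# 1. take a ~500k-utterance (~800 hr) random subset for GMM-HMM training
utils/subset_data_dir.sh data/train_full 500000 data/train_800h

# 2. deltas training on top of monophone alignments
steps/align_si.sh --nj 40 --cmd "$train_cmd" \
  data/train_800h data/lang exp/mono exp/mono_ali
steps/train_deltas.sh --cmd "$train_cmd" \
  11500 400000 data/train_800h data/lang exp/mono_ali exp/tri1

# 3. LDA+MLLT training on top of tri1 alignments
steps/align_si.sh --nj 40 --cmd "$train_cmd" \
  data/train_800h data/lang exp/tri1 exp/tri1_ali
steps/train_lda_mllt.sh --cmd "$train_cmd" \
  11500 800000 data/train_800h data/lang exp/tri1_ali exp/tri2

# 4. SAT (fMLLR) training on top of tri2 alignments
steps/align_si.sh --nj 40 --cmd "$train_cmd" \
  data/train_800h data/lang exp/tri2 exp/tri2_ali
steps/train_sat.sh --cmd "$train_cmd" \
  11500 1600000 data/train_800h data/lang exp/tri2_ali exp/tri3

# fMLLR alignments of the full 30k-hour set for the TDNN stage
steps/align_fmllr.sh --nj 100 --cmd "$train_cmd" \
  data/train_full data/lang exp/tri3 exp/tri3_ali_full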
Can you please guide me if I'm making any mistakes?
Thanks