Hi Dan and everybody. Thanks a lot for the answer, but one question remains.
I have been reading about the "hires" features, and there is one last sanity check I'd be very grateful if you could do.
My understanding from your reply is that the deltas and delta-deltas are typically not fed together with the MFCCs as input for the DNN.
The conventional role of the deltas and delta-deltas is in the GMM stage, where progressively better alignments are obtained during the iterative triphone training...
Once the best GMM alignments have been produced (tri5_ali), nobody cares about the deltas anymore.
For each frame, instead of the 13 MFCCs + 13 deltas + 13 delta-deltas, we drop the deltas and delta-deltas, compute 40 MFCCs rather than 13, and feed context windows of these hires MFCCs to the DNN.
Is that it? If so, do the Mel frequency bands get narrower (in order to go from 13 to 40 MFCCs)?
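To make the question about band width concrete, here is a rough sketch of how the filter spacing changes with the number of Mel bins. It assumes the common formula mel(f) = 1127 ln(1 + f/700), a 20 Hz to 8000 Hz range, and 23 bins for the standard setup versus 40 for hires. Those values and the function names are my assumptions for illustration, not taken from any particular config.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale formula (assumed here for illustration).
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)

def mel_step(num_bins, low_hz=20.0, high_hz=8000.0):
    # Distance between neighbouring filter centers on the Mel scale.
    return (hz_to_mel(high_hz) - hz_to_mel(low_hz)) / (num_bins + 1)

def filter_centers_hz(num_bins, low_hz=20.0, high_hz=8000.0):
    # Center frequencies (Hz) of num_bins filters spaced evenly in Mel.
    step = mel_step(num_bins, low_hz, high_hz)
    mel_low = hz_to_mel(low_hz)
    return [mel_to_hz(mel_low + (i + 1) * step) for i in range(num_bins)]

# Hypothetical bin counts: 23 for the standard setup, 40 for hires.
for n in (23, 40):
    centers = filter_centers_hz(n)
    print(n, "bins: mel step =", round(mel_step(n), 1),
          "first centers (Hz) =", [round(c) for c in centers[:3]])
```

If I read this correctly, each triangular filter spans two steps on the Mel scale, so with 40 bins over the same frequency range each filter would indeed be narrower than with 23.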
Thanks a lot,
The source of my confusion was that both have similar dimensions (13 MFCCs + 13 deltas + 13 delta-deltas = 39 vs. 40 hires MFCCs).
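The dimension comparison can be sketched as follows, with random numbers standing in for real features. The delta formula (a regression over +/- 2 frames), the splicing width of +/- 2 frames, and all function names are assumptions for illustration, not necessarily what any recipe actually uses.

```python
import numpy as np

def deltas(feats, window=2):
    """Regression deltas over +/- window frames; edge frames are repeated."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]   # frame t + n
        behind = padded[window - n : window - n + num_frames]  # frame t - n
        out += n * (ahead - behind)
    return out / denom

def splice(feats, left=2, right=2):
    """Stack each frame with its left/right neighbours (context window)."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    return np.hstack([padded[i : i + num_frames]
                      for i in range(left + right + 1)])

rng = np.random.default_rng(0)

# GMM-style features: 13 MFCCs + 13 deltas + 13 delta-deltas per frame.
mfcc13 = rng.standard_normal((100, 13))
d = deltas(mfcc13)
dd = deltas(d)
gmm_feats = np.hstack([mfcc13, d, dd])   # shape (100, 39)

# DNN-style input: 40 hires MFCCs per frame, spliced over 5 frames.
hires = rng.standard_normal((100, 40))
dnn_input = splice(hires)                # shape (100, 200)

print(gmm_feats.shape, dnn_input.shape)
```

So the 39 and the 40 are close only by coincidence: in the first case two thirds of the dimensions are derived from neighbouring frames, while in the second case all 40 come from the current frame and the temporal context is supplied by the splicing.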