What is the difference between MFCC and MFCC hires?

mura...@gmail.com

Sep 18, 2020, 4:33:59 PM

to kaldi-help

Hi, I've searched for this question here in the group, and the main post addressing it seems to be this one:

https://groups.google.com/g/kaldi-help/c/gMFIMck_a30

In that post, Dan says the following, in the context of MFCC being better than MFCC_hires for training GMMs:

"The use of 13-dim features for GMM systems was tuned decades ago."

This led me to conclude that the MFCCs used to train GMMs do not include the deltas (hence MFCC dim=13), and that MFCC_hires includes the deltas and the delta-deltas (making it 39), which is arguably the canonical dimension for these features in most papers.

Is this right? Is the difference between MFCC and MFCC hires just a matter of dimension?

Thanks a lot for the patience.

Daniel Povey

Sep 19, 2020, 1:14:32 AM

to kaldi-help

The deltas are added on the fly, not stored on disk. Regular MFCC features are 13-dimensional; "hires" features have 40 dimensions (actual MFCCs, not deltas), which is better suited to neural nets.
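To make the "on the fly" point concrete, here is a minimal NumPy sketch of appending deltas and delta-deltas to stored 13-dimensional features at load time. It uses the usual regression formula with a window of 2 and replicated edge frames; the function name `add_deltas` and the random stand-in features are assumptions for illustration, and this is not Kaldi's own implementation (whose edge handling may differ).

```python
import numpy as np

def add_deltas(feats, order=2, window=2):
    """Append delta and delta-delta features using the usual regression formula.

    feats: (num_frames, dim) array. Returns (num_frames, dim * (order + 1)).
    Edge frames are replicated. Illustrative sketch only, not Kaldi's code.
    """
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    blocks = [np.asarray(feats, dtype=float)]
    for _ in range(order):
        prev = blocks[-1]
        num_frames = prev.shape[0]
        padded = np.pad(prev, ((window, window), (0, 0)), mode="edge")
        delta = np.zeros_like(prev)
        for n in range(1, window + 1):
            ahead = padded[window + n : window + n + num_frames]
            behind = padded[window - n : window - n + num_frames]
            delta += n * (ahead - behind)
        blocks.append(delta / denom)
    return np.hstack(blocks)

rng = np.random.default_rng(0)
mfcc_13 = rng.standard_normal((100, 13))     # stand-in for stored 13-dim MFCCs
mfcc_hires = rng.standard_normal((100, 40))  # stand-in for stored 40-dim hires MFCCs

gmm_input = add_deltas(mfcc_13)  # deltas computed at load time, never written to disk
print(gmm_input.shape)   # (100, 39)
print(mfcc_hires.shape)  # (100, 40)
```

Only the 13-dimensional (or 40-dimensional) static features need to be stored; the 39-dimensional vector exists only in memory.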

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/afe1d057-6da0-412d-8ac9-e4d744d12b3fn%40googlegroups.com.

mura...@gmail.com

Jan 6, 2021, 10:41:08 PM

to kaldi-help

Hi Dan and everybody. Thanks a lot for the answer but one question remains.

I have been reading about "hires". And there is just one last sanity check I'd be very grateful if you could do.

My understanding from your reply is that the deltas and delta-deltas are typically not fed together with the MFCCs as input for the DNN.

The conventional role of the deltas and delta-deltas belongs to the GMM, where during the iterative triphone generation, progressively better alignments are obtained...

After the best alignments of the GMM are produced (tri5_ali), nobody cares about deltas anymore.

For each frame, instead of 13 MFCCs + 13 deltas + 13 delta-deltas, we forget the deltas and delta-deltas; instead of computing just 13 MFCCs, we compute 40 for each frame and feed context windows of MFCC hires to the DNN.
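The "context windows" idea in the paragraph above can be sketched as frame splicing: each frame is concatenated with a few neighbours on either side. The function name `splice` and the context width of 2 are illustrative assumptions, not values taken from any particular recipe.

```python
import numpy as np

def splice(feats, left=2, right=2):
    """Concatenate each frame with its neighbours; edges repeat the first/last frame."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Block i holds the frame at offset (i - left) relative to the current frame.
    windows = [padded[i : i + num_frames] for i in range(left + right + 1)]
    return np.hstack(windows)

hires = np.arange(5 * 40, dtype=float).reshape(5, 40)  # stand-in for 40-dim hires MFCCs
spliced = splice(hires, left=2, right=2)
print(spliced.shape)  # (5, 200)
```

With a window of 2 frames on each side, every 40-dimensional frame becomes a 200-dimensional input, so the temporal context that deltas approximate is given to the network directly.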

Is that it? If so, do the Mel frequency bands get thinner (in order to go from 13 to 40 MFCCs)?
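On the question of band width: the number of cepstral coefficients cannot exceed the number of mel filters, so computing 40 MFCCs requires at least 40 mel bins. The sketch below compares filter spacing for 23 bins and 40 bins over the same frequency range. The bin counts (23 for regular MFCCs, 40 for hires, which to my knowledge match common Kaldi settings), the 20 Hz to 8000 Hz range, and the helper names are assumptions for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 1127.0 * np.log(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (np.exp(mel / 1127.0) - 1.0)

def filter_centers_hz(num_bins, low_hz, high_hz):
    """Center frequencies of num_bins triangular filters spaced evenly in mel."""
    edges = np.linspace(hz_to_mel(low_hz), hz_to_mel(high_hz), num_bins + 2)
    return mel_to_hz(edges[1:-1])

for num_bins in (23, 40):
    centers = filter_centers_hz(num_bins, low_hz=20.0, high_hz=8000.0)
    spacing = np.diff(centers)
    # Print bin count, then spacing in Hz between the lowest and highest filter pairs.
    print(num_bins, round(spacing[0], 1), round(spacing[-1], 1))
```

Fitting more filters into the same range makes each one narrower, so under these assumptions the bands do get thinner.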

Thanks a lot,

The source of my confusion was that they both have similar dimensions (13 MFCCs + 13 deltas + 13 delta-deltas vs. 40 MFCC hires).

lali...@gmail.com

Jan 7, 2021, 3:28:58 AM

to kaldi-help

MFCCs are the DCT of the log mel-frequency energies. When we talk about MFCCs and their dimension, we mean that we keep only the first coefficients of that DCT. (See references such as the HTK Book or "The Application of Hidden Markov Models in Speech Recognition".)

When we say 13 MFCCs, it means the first 13 coefficients of the DCT, and likewise for 40.
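As a rough illustration of "keeping the first coefficients", the sketch below applies an orthonormal DCT-II to one frame of log mel energies and keeps either 13 or all 40 coefficients. It assumes 40 mel bins in both cases and skips steps such as cepstral liftering, so it is a simplification rather than the exact pipeline of any toolkit; `dct_matrix` and the random frame are hypothetical stand-ins.

```python
import numpy as np

def dct_matrix(num_bins):
    """Orthonormal DCT-II matrix of shape (num_bins, num_bins)."""
    n = np.arange(num_bins)
    k = n.reshape(-1, 1)
    mat = np.sqrt(2.0 / num_bins) * np.cos(np.pi * k * (n + 0.5) / num_bins)
    mat[0, :] = np.sqrt(1.0 / num_bins)
    return mat

rng = np.random.default_rng(0)
log_mel = rng.standard_normal(40)      # stand-in for one frame of 40 log mel energies
dct = dct_matrix(40)

ceps_13 = dct[:13] @ log_mel           # keep only the first 13 coefficients
ceps_40 = dct @ log_mel                # keep all 40 coefficients

approx_from_13 = dct[:13].T @ ceps_13  # smoothed reconstruction, fine detail is lost
exact_from_40 = dct.T @ ceps_40        # orthonormal transform, nothing is lost

print(np.allclose(exact_from_40, log_mel))   # True
print(np.allclose(approx_from_13, log_mel))  # False
```

When the number of coefficients equals the number of mel bins, the DCT is an invertible rotation of the log mel energies, so the 40-dimensional vector carries the same information as the filterbank outputs; truncating to 13 keeps only the smooth spectral envelope.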

In speech recognition, the deltas and delta-deltas of the MFCCs are usually appended to the MFCC vector for robustness and improved results.

So 13 MFCCs with their deltas and delta-deltas give a 39-dimensional feature vector, and 40 MFCCs would give 40*3 dimensions.

Because the DCT is used to compute them, MFCCs are decorrelated, which is not ideal for training a neural network; a neural network does better when the input features are correlated. In Kaldi, though, hires MFCCs are used to improve results.
