MFCC or FBANK

Ярослав Пикалёв

Mar 16, 2019, 6:03:38 PM
to kaldi-help
Hi. This is not a question about Kaldi specifically, but about ASR in general.
Which feature extraction method should be used for continuous speech recognition: MFCC or FBANK? (The setup is 100 hours of recorded audio, with a bottleneck DNN and a DNN.)

Ярослав Пикалёв

Mar 16, 2019, 6:14:59 PM
to kaldi-help
I did a web search and found this article: https://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
So should I use FBANK if the machine learning algorithm is not susceptible to highly correlated input, and MFCCs otherwise?
I know you get a lot of simple questions, but ...

Daniel Povey

Mar 16, 2019, 6:18:33 PM
to kaldi-help
Generally, the key is to have a feature with a high enough dimension to capture enough of the shape of the spectrum, typically a dimension of 40 to 100. In Kaldi we use MFCCs, but without dimension reduction: there are 40 filterbanks and we keep all 40 MFCCs. This is just because they are more straightforward to compress heavily on disk; in this configuration they are the same as filterbanks, just with a linear transformation applied.

Dan
