reproduce mfcc-htk type

Georgi Dzhambazov

Oct 25, 2016, 1:31:26 PM
to librosa

I am migrating MFCC feature extraction from HTK to librosa.

I want to make sure both produce the same output for the same audio.

I figured out that the big difference comes from the mel spectrogram:
https://github.com/georgid/mfcc-htk-an-librosa

See the mel-scale plot:
https://github.com/georgid/mfcc-htk-an-librosa/blob/master/mel_spectrogram_htk_librosa.png

Any ideas why the amplitude scale looks different?

Dan Ellis

Oct 25, 2016, 5:03:59 PM
to Georgi Dzhambazov, librosa
HTK uses optional "liftering" to scale the higher-index values. It looks like the top plot has liftering and the bottom plot doesn't.

I went through the exercise of matching HTK and other MFCC approaches back in the time of Matlab:


  DAn.

--
You received this message because you are subscribed to the Google Groups "librosa" group.
To unsubscribe from this group and stop receiving emails from it, send an email to librosa+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/librosa/cd517679-2252-4576-920e-e1944da0ee28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Georgi Dzhambazov

Nov 3, 2016, 12:05:37 PM
to librosa, joro.dz...@gmail.com, dp...@ee.columbia.edu
Dear Dan, thank you for your reply.
Yes, in HTK I use CEPLIFTER=22. According to the HTK book, liftering is applied after the DCT (chapter 5.6).
However, here we are looking at the log mel-scale spectrum just before the DCT is applied, so the difference should come from the mel filterbank, right?
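
For reference, a minimal sketch of the liftering step being discussed, assuming the usual HTK form c'_n = (1 + (L/2) sin(pi n / L)) c_n with L = CEPLIFTER. The function names are hypothetical, and the assumption that n runs from 1 (with c_0 handled separately) should be checked against the HTK book. Because these weights multiply cepstral coefficients, they act only after the DCT and cannot change a log mel spectrum.

```python
import math


def htk_lifter_weights(n_coeffs, lifter=22):
    """Weights 1 + (L/2) * sin(pi * n / L) for n = 1..n_coeffs.

    Assumption: coefficients are indexed from 1, c_0 is not liftered.
    """
    return [
        1.0 + (lifter / 2.0) * math.sin(math.pi * n / lifter)
        for n in range(1, n_coeffs + 1)
    ]


def apply_lifter(cepstra, lifter=22):
    """Scale one frame of cepstral coefficients c_1..c_N by the lifter weights."""
    weights = htk_lifter_weights(len(cepstra), lifter)
    return [w * c for w, c in zip(weights, cepstra)]


# Illustrative call: 12 coefficients with CEPLIFTER=22, as in the message above.
weights = htk_lifter_weights(12, lifter=22)
liftered = apply_lifter([1.0] * 12, lifter=22)
```

With L = 22 the weights grow with the coefficient index and peak at n = 11, which is why liftering boosts the higher-index cepstra.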

In librosa's code, htk=True only switches the Hz-to-mel formula to the HTK type.
Does anybody know if there is a way in librosa to build the same filterbank (the shape of the triangles) as in HTK (chapter 5.4 of the HTK book)?
    
Cheers, Georgi
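
One way to experiment with the triangle shape is to build a filterbank by hand and compare it with librosa's. The sketch below is pure Python with hypothetical function names. It assumes the filters are equally spaced on the HTK mel scale, are triangular along the mel axis, and have unit peak with no area normalization. That is one reading of chapter 5.4, not a verified reimplementation of HTK.

```python
import math


def hz_to_mel_htk(f):
    """HTK mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def triangular_filterbank(sr, n_fft, n_filters, fmin=0.0, fmax=None):
    """Return n_filters rows, each a triangle over the n_fft // 2 + 1 FFT bins.

    Assumptions: edges equally spaced in mel, triangles linear in mel,
    peak weight 1 (no per-filter area normalization).
    """
    fmax = sr / 2.0 if fmax is None else fmax
    lo, hi = hz_to_mel_htk(fmin), hz_to_mel_htk(fmax)
    # n_filters + 2 edge points: left edge, the filter centres, right edge.
    edges = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Mel value of every FFT bin centre frequency.
    bin_mel = [hz_to_mel_htk(k * sr / n_fft) for k in range(n_fft // 2 + 1)]

    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for m in bin_mel:
            rising = (m - left) / (centre - left)
            falling = (right - m) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


# Illustrative parameters only; use the values from your HTK config.
bank = triangular_filterbank(sr=16000, n_fft=512, n_filters=26)
```

In librosa itself, `librosa.filters.mel` accepts `htk=True` together with a `norm` argument. Passing `norm=None` in recent versions turns off the per-filter area normalization, which is worth checking against the installed version before comparing plots.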




Dan Ellis

Nov 3, 2016, 10:40:15 PM
to Georgi Dzhambazov, librosa
Good point, the liftering should not be involved here. 

But there are probably 2 or 3 things different. The big effect is probably normalization of the individual mel filters for constant max value (top plot) vs. constant total energy (bottom plot). Try redoing the plot after scaling each row in each matrix to have the same peak value (which would normalize out that effect).
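
The row scaling suggested above could be sketched as follows, assuming each matrix is a list of rows with one row per mel band and one column per frame. The helper names and the toy data are hypothetical. For linear-power values the row is divided by its peak. For log values the peak is subtracted instead, since a gain becomes an offset in the log domain.

```python
def normalize_rows_to_unit_peak(matrix):
    """Divide each row of a linear-domain matrix by its maximum value."""
    out = []
    for row in matrix:
        peak = max(row)
        out.append([v / peak for v in row] if peak > 0 else list(row))
    return out


def shift_log_rows_to_zero_peak(log_matrix):
    """Subtract each row's maximum from a log-domain matrix."""
    out = []
    for row in log_matrix:
        peak = max(row)
        out.append([v - peak for v in row])
    return out


# Toy example: b is a with a different gain applied to each band (row).
a = [[1.0, 2.0, 4.0], [3.0, 6.0, 3.0]]
b = [[0.5, 1.0, 2.0], [0.3, 0.6, 0.3]]
a_norm = normalize_rows_to_unit_peak(a)
b_norm = normalize_rows_to_unit_peak(b)
```

If two mel spectrograms differ only by a per-band gain, as in the toy data, they match after this step. Any difference that remains points to something else, such as the frequency scale.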

But the frequency scale must be different too, because the bin of the visible ridge in the lower plot doesn't match the bin of the first strong ridge in the top plot. It may be the HTK axis vs. the other one (I think of it as Hermansky, but he got it from someone else).
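
To see how much a change of frequency axis can move a ridge, here is a small sketch that places the same number of band centres on two mel formulas. One is the HTK formula. The other is the piecewise linear/log formula that librosa's documentation attributes to Slaney's Auditory Toolbox and uses when htk=False. Treating that as the "other one" mentioned above is an assumption, and the band count and frequency range are illustrative.

```python
import math

F_SP = 200.0 / 3.0                # Hz per mel in the linear part (Slaney-style)
MIN_LOG_HZ = 1000.0               # start of the logarithmic part
MIN_LOG_MEL = MIN_LOG_HZ / F_SP   # 15 mel
LOGSTEP = math.log(6.4) / 27.0    # log-region step size


def hz_to_mel_htk(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz_htk(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_mel_slaney(f):
    if f < MIN_LOG_HZ:
        return f / F_SP
    return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz_slaney(m):
    if m < MIN_LOG_MEL:
        return m * F_SP
    return MIN_LOG_HZ * math.exp(LOGSTEP * (m - MIN_LOG_MEL))


def band_centres_hz(hz_to_mel, mel_to_hz, n_bands, fmin, fmax):
    """Centre frequencies (Hz) of n_bands filters equally spaced in mel."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


htk_centres = band_centres_hz(hz_to_mel_htk, mel_to_hz_htk, 26, 0.0, 8000.0)
slaney_centres = band_centres_hz(hz_to_mel_slaney, mel_to_hz_slaney, 26, 0.0, 8000.0)

for i, (h, s) in enumerate(zip(htk_centres, slaney_centres)):
    print(f"band {i:2d}: HTK {h:8.1f} Hz   Slaney-style {s:8.1f} Hz")
```

The same band index lands on a different centre frequency under each formula, so a spectral peak at a fixed frequency in Hz can show up in a different row of the two mel spectrograms.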

If you analyze the Matlab I sent you, you should find the right combination of variables. 

  DAn. 