MFCC returns different values depending on values outside of window

Willy Kim

Apr 14, 2022, 12:45:34 AM
to librosa
Hi,

I've been struggling with this issue for a few days.
I have a very basic understanding of how the mfcc function works, and I need some help.

Here is a gist containing my test code.
My question: why are the MFCCs different for the first 50 ms when the samples in the first 2 seconds are exactly the same?
The only difference is that the second audio is longer.

The main problem I'm having is that I've trained my model on segmented data, and because the mfcc values are different for a non-segmented audio, my model's performance decreases significantly.

This originated from real-world data; I'm just using random data as an example.
Thanks for all your help!

Willy

Vincent Lostanlen

Apr 14, 2022, 3:24:54 AM
to Willy Kim, librosa
Hello,

This is probably some effect of padding and windowing. If you're processing audio streams in chunks, make sure to leave some overlap between chunks to compensate for this.

I imagine that the 50 milliseconds you mention correspond to the influence of padding (see hop_length and center=True in your MFCC call).

The stride function in librosa can potentially help you with sorting this out
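
To make the padding effect concrete, here is a minimal sketch in plain Python. It assumes librosa's default n_fft=2048 and hop_length=512, a 2-second segment at 22050 Hz, and center=True, where frame t is centered on sample t * hop_length. The helper name frames_touching_right_pad is hypothetical and only for illustration. It lists the frames of a segment whose analysis window reaches past the end of the segment, which are the frames that see padding in the short clip but real samples in a longer one.

```python
def frames_touching_right_pad(n_samples, n_fft=2048, hop_length=512):
    """Frame indices (center=True) whose window extends past the last sample."""
    # With centering, the signal is padded by n_fft // 2 on each side,
    # so the number of frames is 1 + n_samples // hop_length.
    n_frames = 1 + n_samples // hop_length
    half = n_fft // 2
    # Frame t covers samples [t * hop_length - half, t * hop_length + half).
    return [t for t in range(n_frames) if t * hop_length + half > n_samples]


sr = 22050
segment = 2 * sr  # 2-second segment, 44100 samples (assumed for illustration)
affected = frames_touching_right_pad(segment)
print(affected)  # [85, 86]
```

Under these assumptions, half a window is 1024 samples, or about 46 ms at 22050 Hz. With center=False no padding is added, so every frame of the segment only sees samples that are inside it.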


I hope this helps


Vincent


Brian McFee

Apr 14, 2022, 6:05:12 AM
to librosa
I suspect this is happening because of how decibels are calculated within mfcc. By default, power_to_db will truncate at 80 dB below the peak power measurement. The peak of the first 2 seconds of a signal might not be the same as the peak of a longer observation, which leads to different truncation floors.

You can bypass this by calling mfcc with top_db=None.
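
A minimal sketch of that mechanism, using plain NumPy rather than librosa itself. The function power_to_db_sketch is a hypothetical stand-in that mimics only the clipping step (floor at peak minus top_db, with the reference fixed at 1.0), and the power values are made up for illustration.

```python
import numpy as np


def power_to_db_sketch(power, top_db=80.0, amin=1e-10):
    """Simplified dB conversion with an optional floor at (peak - top_db)."""
    log_spec = 10.0 * np.log10(np.maximum(amin, power))
    if top_db is not None:
        # The floor depends on the maximum of whatever array was passed in.
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


quiet = np.full(4, 1e-9)  # about -90 dB
loud = np.full(4, 1.0)    # 0 dB

short_power = quiet                         # "segmented" observation
long_power = np.concatenate([quiet, loud])  # same start, louder continuation

short_db = power_to_db_sketch(short_power)
long_db = power_to_db_sketch(long_power)
long_db_noclip = power_to_db_sketch(long_power, top_db=None)

# Same input values at the start, different dB values once clipping kicks in.
print(short_db[:4])
print(long_db[:4])
print(long_db_noclip[:4])
```

In the short observation the quiet values stay near -90 dB. In the long one they are lifted to the floor of -80 dB because the peak is now 0 dB. With top_db=None the two agree again.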

Willy Kim

Apr 14, 2022, 9:20:37 AM
to librosa
At first, I was having this problem because center=True, but for this test, I set it to False.
Would padding still influence this?

Thanks for the help!

Willy

Willy Kim

Apr 14, 2022, 10:01:56 AM
to librosa
Brian,

It says: mel() got an unexpected keyword argument 'top_db'
I tried calling amplitude_to_db with top_db=None before calling mfcc, but it doesn't seem to fix the issue.

Willy

Brian McFee

Apr 14, 2022, 10:27:02 AM
to librosa
Ah right, sorry. You'll need to compute your mel spectrogram separately for this to work.  Something like:

logmel = librosa.power_to_db(librosa.feature.melspectrogram(...), top_db=None)
mfcc = librosa.feature.mfcc(S=logmel, ...)

Willy Kim

Apr 14, 2022, 11:32:35 AM
to librosa
Seems like that partially fixes the issue.

In the real-world use case I'm interested in, I can only go an extra 2 seconds without the top_db solution.
With it, I can go an extra 36 seconds, but if I go beyond that, I get the same problem.

Also, the following seems to do the same:
db = librosa.power_to_db(data, top_db=None)
mfcc = librosa.feature.mfcc(y=db, ...)

This solution also doesn't seem to work on the random data in that gist.
If you'd like, I can try to make a repo with my real-world use case.

Willy