Removing partial windows from MFCC calculations

Pablo Saunders-Shultz

May 29, 2024, 9:21:13 AM

to librosa

Hello,

I am new to this group and library. I am doing exploratory research on applying typical audio-processing features to infrasound data for geophysical machine learning problems. Two key libraries for this work are librosa and obspy. With the obspy windowing function trace.slide(), I can specify a window_length and step size (in seconds). Similarly, with librosa.feature.mfcc I can set win_length and hop_length (in samples), but the two give results of different lengths. I could be wrong, but I think this is because librosa uses partial windows to calculate MFCCs. I could not find a setting that avoids partial windows in the MFCC calculation, and I had trouble working out which frames to remove from the MFCCs to drop the ones calculated from partial windows. So my questions are:
- Does the mfcc calculation indeed use partial windows?
- Does it create partial windows at the start and end of the signal? Or just at the end?
- For example, in my test case obspy returns 31 windows and librosa returns 36. Should I remove 5 windows from the end, or some from the beginning and some from the end?

Insight, advice, questions, and "hellos" are welcome. Thanks.

Brian McFee

May 29, 2024, 9:25:08 AM

to librosa

By default, librosa uses centered analysis windows (center=True in functions like mfcc, stft, etc). This is accomplished by padding a half-frame on either side of the signal prior to analysis, and is most likely the source of the disparate frame counts you're seeing.

You can disable this by setting center=False, and you'll most likely see identical frame counts. If there's still a disagreement afterward, it might be due to rounding error when converting seconds to sample counts.
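The frame-count arithmetic behind this can be sketched in plain Python, so it runs without librosa installed. The helper names and sample values below are illustrative assumptions, not librosa functions. The sketch assumes an even n_fft and that centered analysis pads n_fft // 2 samples on each side, as described above.

```python
def n_frames(n_samples, n_fft, hop_length, center=True):
    """Number of analysis frames for a signal of n_samples samples."""
    if center:
        # Centered analysis pads n_fft // 2 samples on both sides first.
        n_samples = n_samples + 2 * (n_fft // 2)
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop_length


def full_frames(n_samples, n_fft, hop_length):
    """Indices of centered frames that contain no padding at all."""
    half = n_fft // 2
    total = n_frames(n_samples, n_fft, hop_length, center=True)
    keep = []
    for t in range(total):
        start = t * hop_length - half  # first original sample covered by frame t
        stop = start + n_fft           # one past the last sample covered
        if start >= 0 and stop <= n_samples:
            keep.append(t)
    return keep


# Hypothetical values, chosen only for illustration.
n_samples, n_fft, hop_length = 1000, 200, 100

centered = n_frames(n_samples, n_fft, hop_length, center=True)
uncentered = n_frames(n_samples, n_fft, hop_length, center=False)
keep = full_frames(n_samples, n_fft, hop_length)
print(centered, uncentered, keep)
```

With these values the padded frames sit at both ends of the output: the first and last centered frames overlap the padding, and the frames in between match the non-centered count. When n_fft // 2 is not a multiple of hop_length, the surviving centered frames are offset relative to the non-centered ones, so center=False is likely the cleaner route.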

Pablo Saunders-Shultz

Jun 1, 2024, 10:37:39 AM

to librosa

Thank you. I got it to produce the same number of windows/frames, but only if I set n_fft = win_length. Digging into the code, I think this is because padding is not allowed when center=False, so the number of frames is determined by n_fft rather than by win_length. Am I understanding the code correctly? Is there a reason why padding is not allowed when center=False?

It would be great to be able to use a different n_fft value and still get the correct number of frames. I guess the way to do that would be to use center=True, and manually add half a frame to the beginning of my data so that the new "center" is actually the "left" of the original data? Then I would need to figure out which of the output mfccs to remove from the data, since they are based on partial windows.
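The n_fft versus win_length observation above can be checked with a few lines of arithmetic. This is only a sketch: count_full_windows is a hypothetical helper, the numbers are made up, and it assumes the reference sliding window emits full windows only and that seconds convert to whole sample counts without rounding.

```python
def count_full_windows(n_samples, length, step):
    """Windows of `length` samples, advanced by `step`, that fit entirely in the signal."""
    if n_samples < length:
        return 0
    return 1 + (n_samples - length) // step


# Hypothetical values: 10 s of data at 100 Hz, 1 s windows, 0.25 s step.
n_samples, win_length, hop_length = 1000, 100, 25

# A sliding window that only emits full windows is governed by win_length.
expected = count_full_windows(n_samples, win_length, hop_length)

# Non-centered framing slices frames of n_fft samples, so n_fft sets the count.
for n_fft in (100, 128, 256):
    got = count_full_windows(n_samples, n_fft, hop_length)
    print(n_fft, got, expected - got)
```

The counts agree only when n_fft equals win_length. A larger n_fft loses frames at the end of the signal, because the last few windows no longer fit inside an n_fft-sample frame.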

Brian McFee

Jun 1, 2024, 10:42:51 AM

to librosa

I think you have this correct: padding is only enabled with centered analysis.

The case you're describing would apply padding only to the right side of each frame. While we don't support this automatically through the API, it is still possible to achieve this effect by manually constructing a window that has your desired length and then padding it out to the frame length. Something like:

>>> import librosa
>>> # y, n_fft, win_length, and hop_length are assumed to be defined already
>>> window = librosa.filters.get_window('hann', win_length, fftbins=True)  # use win_length here
>>> window = librosa.util.fix_length(window, size=n_fft)  # zero-pad on the right to match n_fft
>>> D = librosa.stft(y, window=window, n_fft=n_fft, hop_length=hop_length, center=False)  # non-centered analysis with win_length < n_fft
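One thing the snippet above leaves open: with center=False the frame count is still set by n_fft, so matching a win_length-based count seems to additionally require padding the signal on the right by n_fft - win_length zeros. That step is an assumption of this sketch, not something stated in the thread. The plain-Python toy below (hypothetical helpers and values, no librosa or FFT involved) checks that, under this assumption, each n_fft-sample frame multiplied by the right-padded window contains exactly the corresponding win_length window followed by zeros.

```python
import math


def hann(n):
    """Periodic Hann window of length n (the fftbins=True convention)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def frames(y, frame_length, hop_length):
    """Non-centered framing: only frames that fit entirely inside y."""
    last = len(y) - frame_length
    return [y[s:s + frame_length] for s in range(0, last + 1, hop_length)]


# Hypothetical toy values.
win_length, n_fft, hop_length = 4, 8, 2
y = [float(i) for i in range(1, 13)]  # 12 samples

# Window of win_length samples, zero-padded on the right out to n_fft.
window = hann(win_length) + [0.0] * (n_fft - win_length)

# Assumption: right-pad the signal so trailing frames are not lost to n_fft.
y_padded = y + [0.0] * (n_fft - win_length)

short = frames(y, win_length, hop_length)    # the windows we actually want
long_ = frames(y_padded, n_fft, hop_length)  # what framing by n_fft yields

windowed_short = [[w * v for w, v in zip(hann(win_length), f)] for f in short]
windowed_long = [[w * v for w, v in zip(window, f)] for f in long_]

print(len(short), len(long_))
```

Because every windowed n_fft frame is the win_length window followed by zeros, its FFT is the zero-padded FFT of that window, which is the effect a win_length smaller than n_fft is meant to have.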
