extract features then segment short clips, or vice versa?



Kendra Oudyk


May 31, 2021, 3:22:23 PM

to librosa

Hi all,

Sorry for the beginner question!

I am selecting short audio clips from a longer waveform, then I calculate spectral features on each short clip. I keep getting this kind of warning:

UserWarning: n_fft=1024 is too small for input signal of length=11

Would it be better to

A) zero-pad the short clips,

B) choose a shorter n_fft (and lose spectral resolution, from what I understand), or

C) extract the features first, then segment (and potentially lose temporal accuracy)?

I'm working with birdsong, so there will be some important high frequencies. My sampling rate is 22050 Hz.
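To put numbers on the resolution trade-off in option (B): the FFT bin spacing is sr / n_fft, so at 22050 Hz every halving of n_fft doubles the spacing between bins. The highest representable frequency is the Nyquist frequency, sr / 2 = 11025 Hz, regardless of n_fft. A small sketch of that arithmetic (the helper name is illustrative, not a librosa function):

```python
def bin_spacing_hz(sr, n_fft):
    """Frequency spacing between adjacent FFT bins, in Hz."""
    return sr / n_fft

sr = 22050
# Highest representable frequency; does not depend on n_fft.
print("Nyquist:", sr / 2, "Hz")
for n_fft in (2048, 1024, 512, 256):
    print(n_fft, round(bin_spacing_hz(sr, n_fft), 1))
```

This prints roughly 10.8, 21.5, 43.1 and 86.1 Hz for the four sizes. Zero-padding (option A) shrinks the bin spacing on paper, but it only interpolates the spectrum of the samples that are actually present; it does not add real frequency detail.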

Thanks!

Kendra

PhD Candidate

Montreal Neurological Institute

McGill University, Canada

Brian McFee


Jun 10, 2021, 3:39:46 PM

to librosa

In general, it's going to depend on what kind of features you're extracting and how much the effects of padding would matter for you.

In your particular case, it seems like you're hitting an edge case at the end of the signal (i.e., the last 11 samples), and I don't think any amount of padding is going to give you a reasonable representation from such a short observation window. In that situation, I'd probably drop segments that are below a reasonable length threshold. (BTW, the warning message should have been fixed in the 0.8.1 release: n_fft is too large for your signal, not too small. But that was just a cosmetic error; the functionality hasn't changed.)
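For scale, 11 samples at 22050 Hz is about 0.5 ms (11 / 22050 ≈ 0.0005 s). Below is a minimal sketch of the "drop short segments" suggestion. It assumes fixed-length clips and uses n_fft as the length threshold. Both `split_fixed` and `drop_short` are hypothetical helpers, and the threshold is an illustrative choice, not a rule from this thread.

```python
import numpy as np

def split_fixed(y, clip_len):
    """Cut y into consecutive clips of clip_len samples; the last one may be shorter."""
    return [y[i:i + clip_len] for i in range(0, len(y), clip_len)]

def drop_short(clips, min_len):
    """Keep only clips with at least min_len samples (here: min_len = n_fft)."""
    return [c for c in clips if len(c) >= min_len]

n_fft = 1024
y = np.zeros(5 * n_fft + 11)      # stand-in signal: 5 full clips plus an 11-sample tail
clips = split_fixed(y, n_fft)
print([len(c) for c in clips])    # [1024, 1024, 1024, 1024, 1024, 11]
kept = drop_short(clips, n_fft)
print(len(kept))                  # 5
```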

If you can tolerate overlap in your observations (as we usually do), then I'd go for option (C). I don't think you'd necessarily lose any temporal accuracy there, as STFT frames usually overlap anyway.
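A minimal NumPy-only sketch of option (C), under stated assumptions: a Hann-windowed, zero-padded, centered STFT stands in for librosa's; spectral centroid stands in for "spectral features"; and the segment boundaries (in seconds) are hypothetical. In librosa itself the analogous calls would be `librosa.feature.spectral_centroid` and `librosa.time_to_frames`, but this sketch does not reproduce their exact defaults.

```python
import numpy as np

def stft_mag(y, n_fft=1024, hop_length=256):
    """Centered magnitude STFT: zero-pad n_fft // 2 on each side, Hann window."""
    y_pad = np.pad(y, n_fft // 2, mode="constant")
    n_frames = 1 + (len(y_pad) - n_fft) // hop_length
    window = np.hanning(n_fft)
    frames = np.stack(
        [y_pad[i * hop_length:i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))  # shape: (1 + n_fft // 2, n_frames)

def spectral_centroid(S, sr, n_fft):
    """Magnitude-weighted mean frequency of each frame, in Hz."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return (freqs[:, None] * S).sum(axis=0) / np.maximum(S.sum(axis=0), 1e-10)

def time_to_frame(t, sr, hop_length):
    """Index of the frame centered at or just before time t (seconds)."""
    return int(t * sr) // hop_length

sr, n_fft, hop = 22050, 1024, 256
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 3000.0 * t)  # stand-in 1 s signal: a 3 kHz tone

# 1) Features on the whole signal: len(y) >> n_fft, so no short-input problem.
centroid = spectral_centroid(stft_mag(y, n_fft, hop), sr, n_fft)

# 2) Then slice the feature frames for each (hypothetical) segment.
segments = [(0.1, 0.3), (0.5, 0.9)]
per_segment = [centroid[time_to_frame(a, sr, hop):time_to_frame(b, sr, hop)]
               for a, b in segments]
print([len(s) for s in per_segment], [round(float(s.mean())) for s in per_segment])
```

Frames near a segment boundary still include up to n_fft // 2 samples from outside the segment; that is the overlap being tolerated. A segment shorter than one hop can come out as an empty slice, so a minimum-length check is still useful here.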
