Beginner question re: bin frequencies


Casey Connor

Sep 29, 2022, 7:03:20 PM
to librosa
Hi -- I'm playing with the stft:

[data is a single channel of floats, sample rate 48k]

spec = librosa.stft(data, n_fft=4096, hop_length=512)
bf = dict(enumerate(librosa.fft_frequencies(sr=48000, n_fft=4096)))

...that generates a 2049-element array representing bin frequencies.

Are these frequency values representing the boundaries of the bins or the center frequencies? E.g. bf[23] is the left edge or center of bin index 23?

Given the length of the array, I would assume these are bin boundaries?

So, spec[0][t] is the DC at index t, spec[1][t] is the spectral value at time t for bin centered at frequency (bf[1]+bf[2])/2 ?

Thanks for any clues!

Brian McFee

Oct 3, 2022, 3:35:43 PM
to librosa
These are best thought of as center frequencies, not edges.

The STFT computes a discrete Fourier transform (DFT) of each frame of the signal independently. Since the signal is real-valued, we use the rfft function to compute only the non-negative frequencies: 0, sr/N, 2sr/N, 3sr/N, ..., sr/2 (where N is the frame length), resulting in N//2+1 frequencies.
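To make the frequency list above concrete, here is a minimal sketch using only NumPy with the values from the original question (sr=48000, N=4096). It builds k·sr/N by hand and compares it to np.fft.rfftfreq; the variable names are just for illustration.

```python
import numpy as np

sr = 48000      # sample rate from the question
n_fft = 4096    # frame length N from the question

# Bin k sits at k * sr / N, for k = 0 .. N//2
k = np.arange(n_fft // 2 + 1)
freqs = k * sr / n_fft

# NumPy's helper for the frequencies that rfft returns
reference = np.fft.rfftfreq(n_fft, d=1.0 / sr)

print(len(freqs), freqs[1], freqs[-1])  # 2049 11.71875 24000.0
print(np.allclose(freqs, reference))    # True
```

So there is one frequency per row of the STFT (2049 values for 2049 rows), rather than 2050 edges around 2049 bins.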

Usually (and by default) the frames are windowed, which has the effect of smearing frequency content in both directions (i.e., the main lobe) while reducing contributions from distant frequencies due to spectral leakage. This smearing is symmetric in frequency; look at the DFT of the Hann window here for example: https://en.wikipedia.org/wiki/List_of_window_functions.
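One way to see the symmetric smearing is to window a pure tone placed exactly on a bin frequency and look at its neighbors. The sketch below is NumPy-only and assumes a periodic Hann window written out by hand (the exact window variant is an assumption here); bin 23 is taken from the original question.

```python
import numpy as np

sr = 48000
n_fft = 4096
n = np.arange(n_fft)

# Periodic Hann window, written out explicitly (assumed variant)
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)

# A cosine sitting exactly on the frequency of bin 23
k = 23
f_center = k * sr / n_fft
tone = np.cos(2 * np.pi * f_center * n / sr)

# Magnitude spectrum of the single windowed frame
mag = np.abs(np.fft.rfft(window * tone))
peak_bin = int(np.argmax(mag))
left, right = mag[k - 1], mag[k + 1]

print(peak_bin)                        # index of the strongest bin
print(left / mag[k], right / mag[k])   # relative height of the two neighbors
```

The peak lands on bin 23 and the two neighboring bins come out equal (about half the peak for this window), which is what you would expect if the listed frequency is the center of the bin rather than its left edge.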

Casey Connor

Oct 27, 2022, 6:21:02 PM
to librosa
Thank you, Brian!

So in my example above, bf[0] is the DC at that moment, bf[1] is 11.71875 which is the bin centered at 11.7 Hz.

How then should I interpret bf[2048], which has the value 24000?

What confuses me is that the last bin is centered on nyquist, but the signal can only contain content at or below nyquist, which would mean that that final bin is "not the same" as the other bins in terms of how much bandwidth it covers, since the other bins reflect content in the original signal that fell to either side of their center frequency, but the final bin has no content "to the right" of its center frequency... is that accurate?

Brian McFee

Oct 28, 2022, 9:59:58 AM
to librosa
> but the final bin has no content "to the right" of its center frequency... is that accurate?

That's almost right, but you're not accounting for aliasing.

We're assuming the signal is band-limited to ±sr/2, so there should not be any energy at frequencies outside the band (i.e., above Nyquist). However, remember that due to sampling, any frequency f will alias with f + k·sr for all integers k. So if you have a frequency f = sr/2 + b (for some positive b, and let's say it's smaller than the bin spacing to keep things simple), then it will have an alias at (sr/2 + b) + (-1·sr) = -sr/2 + b. This frequency *is* inside the band limits of the signal: -sr/2 <= -sr/2 + b <= sr/2. If we're also assuming real-valued input, so that we have conjugate symmetry in the spectrum, then this aliased frequency will also look like its negative: sr/2 - b. In short, as you move above Nyquist, frequencies will appear to reflect downward.
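The reflection can be checked numerically. This is a rough NumPy-only sketch: it synthesizes a cosine directly at the sample times (so there is no anti-aliasing filter in the way), and for visibility it uses b equal to 5 bin spacings rather than a fraction of one. Those choices, and the hand-written periodic Hann window, are assumptions for illustration.

```python
import numpy as np

sr = 48000
n_fft = 4096
n = np.arange(n_fft)

bin_spacing = sr / n_fft
b = 5 * bin_spacing  # offset from Nyquist; 5 bins so the shift is easy to see

# Cosines at sr/2 + b and sr/2 - b, evaluated at the sample times n / sr
above = np.cos(2 * np.pi * (sr / 2 + b) * n / sr)
below = np.cos(2 * np.pi * (sr / 2 - b) * n / sr)

# After sampling, the two tones are the same sequence of numbers
same_samples = np.allclose(above, below)

# The tone "above Nyquist" peaks 5 bins *below* the Nyquist bin
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_fft)
mag = np.abs(np.fft.rfft(window * above))
peak_bin = int(np.argmax(mag))
expected_bin = n_fft // 2 - 5

print(same_samples, peak_bin, expected_bin)
```

Here the samples of the two tones match and the peak shows up at bin 2043 instead of a nonexistent bin 2053, so content "to the right" of the Nyquist bin is folded back onto the left side.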

Casey Connor

Oct 28, 2022, 12:44:26 PM
to librosa
I appreciate the help, thanks very much!