librosa.stft() getting shifted spectrum


D. S. Parihar

Oct 12, 2023, 3:28:43 PM
to librosa
Hi

I recently ran into a problem with librosa.stft(). I recorded a sinusoidal signal at 48 kHz containing three frequencies: 200 Hz, 500 Hz, and 23000 Hz.
Case 1: I read the signal using librosa.load() at sr = 48 kHz and computed the STFT using librosa.stft(). When I plot the output D, I get the correct frequency distribution.
    D = np.abs(librosa.stft(wav_file1,
                            n_fft=2048,
                            win_length=2048,
                            hop_length=2048,
                            window='hamming',
                            center=False))**2
Case 2: When I read the signal using librosa.load() at sr = 22050 Hz and compute the STFT the same way, the plot of D shows an incorrect frequency distribution, shifted by roughly a factor of two.
This is very strange, and I am unable to work out why the shift appears after downsampling.
Please find the respective plots attached for your reference.
CASE2_absPlot_22050hz 1.png
Melspec_Shifted.png
Melspec_correct.png
CASE1_absPlot_48000hz 1.png

Brian McFee

Oct 12, 2023, 4:43:41 PM
to librosa
Your example is skipping several steps, notably the conversion from linear to mel spectra, and then the display code.  In both places, if you forget to pass in the sampling rate, it will fill in the default (22050) and you'll observe this kind of error.

D. S. Parihar

Oct 13, 2023, 4:55:22 AM
to librosa
Hi
Thank you for your response.

Let me share the complete steps.

CASE 1: Audio file is loaded at sr = 48000

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# wav_file1: waveform loaded with librosa.load at sr=48000
D = np.abs(librosa.stft(wav_file1,
                        n_fft=2048,
                        win_length=2048,
                        hop_length=2048,
                        window='hamming',
                        center=False))**2
S1 = librosa.feature.melspectrogram(S=D, sr=48000, n_mels=128, htk=True)
S_dB = librosa.power_to_db(S1, ref=np.max)

fig, ax = plt.subplots()
img = librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=48000, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')

CASE 2: Audio file is loaded at sr = 22050

# wav_file1: waveform loaded with librosa.load at sr=22050
D = np.abs(librosa.stft(wav_file1,
                        n_fft=2048,
                        win_length=2048,
                        hop_length=2048,
                        window='hamming',
                        center=False))**2
# sr=48000 here is intentional, even though the audio was loaded at 22050
S1 = librosa.feature.melspectrogram(S=D, sr=48000, n_mels=128, htk=True)
S_dB = librosa.power_to_db(S1, ref=np.max)

fig, ax = plt.subplots()
img = librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=22050, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')

I intentionally set sr = 48000 when calculating the mel spectrogram. I want to understand how this shifts the spectrum. Can choosing a different sampling rate change the spectrum? If so, how?

D. S. Parihar

Oct 13, 2023, 5:09:06 AM
to librosa
I want to add that the shift occurs in librosa.stft() itself, which I believe is independent of the sampling rate.

The shift can already be observed in the output D, before the mel spectrogram is calculated.

D = np.abs(librosa.stft(wav_file1,
                        n_fft=2048,
                        win_length=2048,
                        hop_length=2048,
                        window='hamming',
                        center=False))**2

Please find the output plots of D for both cases for your reference.
CASE2_absPlot_22050hz.png
CASE1_absPlot_48000hz.png

Dan Ellis

Oct 13, 2023, 8:13:28 AM
to D. S. Parihar, librosa
Each bin k in a discrete Fourier transform corresponds to a center frequency center_freq(k) = k / fft_length * sampling_rate.

Because the sampling_rate is different between the two Fourier transforms, the "units" of the bin axis change.

In CASE1, each bin represents 48,000 / 2048 = 23.44 Hz.  The plot shows the first peak at k = 9 (I guess), for 211 Hz.

In CASE2, each bin represents 22,050 / 2048 = 10.77 Hz.  The plot shows the first peak at k = 19, for 204.6 Hz.

So, the bin index k changes, but the frequencies referenced are equivalent.
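
As a rough illustration of the arithmetic above, here is a small sketch in plain Python. The helper names bin_center_hz and nearest_bin are made up for this example and are not part of librosa; n_fft = 2048 is taken from the code in this thread.

```python
def bin_center_hz(k, sr, n_fft=2048):
    """Center frequency in Hz of DFT bin k (hypothetical helper)."""
    return k * sr / n_fft


def nearest_bin(freq_hz, sr, n_fft=2048):
    """Index of the DFT bin whose center is closest to freq_hz (hypothetical helper)."""
    return round(freq_hz * n_fft / sr)


# Same 200 Hz tone, two sampling rates: the bin index differs,
# but the bin's center frequency stays close to 200 Hz in both cases.
for sr in (48000, 22050):
    k = nearest_bin(200.0, sr)
    print(f"sr={sr}: bin width {sr / 2048:.2f} Hz, "
          f"200 Hz falls in bin {k} (center {bin_center_hz(k, sr):.1f} Hz)")
```

The bin index roughly doubles when going from 48 kHz to 22.05 kHz, which is what a plot of D against raw bin index shows as a "shift".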

Your plots are labeled inconsistently because you indicate that sr=22050 to specshow, even when the underlying data is based on sr=48000.

  DAn.


D. S. Parihar

Oct 13, 2023, 8:36:27 AM
to librosa
Thank you for your response.

I agree that the resolution has changed and the bin index has shifted accordingly. Does this mean that downsampling a signal changes its frequency distribution? Or, put another way, that the spectrogram will not show the correct time-frequency response? I still see a frequency shift in specshow even after correcting its label. There is no frequency shift in the FFT plot, whereas there is in the spectrogram.
Frequency shift.png
FFT plot.png
Melspec_correct.png

Dan Ellis

Oct 13, 2023, 2:35:48 PM
to D. S. Parihar, librosa
Sorry, I jumped to conclusions about what you were asking.

I think your question arises because the vertical labeling on the CASE2 (Melspec_shifted) spectrogram appears to show the frequency of the ridges as ~400 and 800 Hz, whereas in fact the sinusoids are at 200 and 500 Hz.  I think I can explain this.

A "mel spectrogram" is not a single thing.  The mel frequency axis is a well-defined mapping of linear frequency to a new, nonlinear axis, but to use it on a spectrogram-like display requires some additional parameters such as the number of bins on the frequency axis and the highest frequency shown (and possibly control of the width of each bin).  librosa.feature.melspectrogram constructs these from the sr and n_mels arguments.

But the result (S1 in your code) is just an array with n_mels rows and some number of columns; there's no metadata attached to record that it corresponds to a mel spectrogram, or the details of the mel axis.  In order for librosa.display.specshow to be able to get the labeling right, we have to also provide that function with all that information via y_axis='mel' and sr=22050 (or whatever was used as sr when calculating the mel spectrogram).

In your case 2, you deliberately told melspectrogram that the sample rate of the waveform was 48 kHz, even though it was 22 kHz.  Then, when drawing the spectrogram you told specshow that the mel spectrogram corresponded to 22 kHz data.  That's why the vertical axis doesn't make sense.  If you had told melspectrogram that the sampling rate was 22 kHz, the labeling would have been right.  If you had told specshow that melspectrogram had treated the waveform as if it were sampled at 48 kHz, it would have plotted the axis as if the original signal had been sped up by 48/22.05, i.e. the original 200 Hz tone would appear at 435 Hz.  But by making the sample rate different between melspectrogram and specshow, we get ... something else.
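
The "sped up by 48/22.05" figure can be checked with a couple of lines of Python. This is only a sketch of the scaling for the consistent-but-wrong case (both functions told 48 kHz); apparent_frequency is a made-up helper name, and it does not model the mixed case where the two sample rates disagree.

```python
def apparent_frequency(true_hz, actual_sr, assumed_sr):
    """Frequency a tone appears to have when samples taken at actual_sr
    are interpreted as if they were taken at assumed_sr (hypothetical helper)."""
    return true_hz * assumed_sr / actual_sr


# Audio really sampled at 22050 Hz, but treated everywhere as 48000 Hz.
for tone in (200.0, 500.0):
    shown = apparent_frequency(tone, actual_sr=22050, assumed_sr=48000)
    print(f"{tone:.0f} Hz tone would be labeled at about {shown:.0f} Hz")
```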

I'm not sure what you're trying to do.  At the very least, maybe you're trying to make sense of "what should I get in this situation?"  It's not simple because of the nonuniformity of the mel projection - not only is it nonlinear, it actually shifts between linear and exponential mapping at a specific break frequency (700 Hz in the HTK version, although it's a soft transition).  That's why it needs to know the sample rate of the data when calculating the mel spectrogram - it needs to place that break frequency correctly.  
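
To make the dependence on sample rate concrete, here is a sketch of the HTK mel formula and of where band centers would fall for a given sr. It assumes bands are spaced evenly in mel between 0 Hz and sr/2 (my understanding of the default layout, not something stated in this thread), and all function names are invented for the example.

```python
import math


def hz_to_mel_htk(f_hz):
    """HTK mel formula: close to linear well below 700 Hz, logarithmic well above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(sr, n_mels=128):
    """Band centers in Hz, assuming n_mels bands evenly spaced in mel
    from 0 Hz up to sr/2 (an assumption made for this sketch)."""
    top = hz_to_mel_htk(sr / 2)
    # n_mels + 2 evenly spaced mel points; the interior points are the centers.
    return [mel_to_hz_htk(top * i / (n_mels + 1)) for i in range(1, n_mels + 1)]


# The same 200 Hz tone lands in a different mel band depending on the
# sample rate the filter bank was built for.
for sr in (22050, 48000):
    centers = mel_band_centers(sr)
    idx = min(range(len(centers)), key=lambda i: abs(centers[i] - 200.0))
    print(f"sr={sr}: 200 Hz is nearest band {idx} (center {centers[idx]:.1f} Hz)")
```

Because the row index for a given frequency depends on sr, an axis drawn with a different sr than the one used to build the bands will put its labels in the wrong places.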

(Incidentally, I think that's why the frequency legend is still a bit wrong even when we match the SRs - I think specshow is assuming Slaney mel axis, but you calculated with HTK, and specshow has no way of knowing).

If you work it through, you should find that the image you're seeing makes sense, but I can't think of a scenario where it's something you'd want to do.

Hope this helps,

  DAn.

Devendra Singh Parihar

Oct 13, 2023, 3:57:53 PM
to Dan Ellis, librosa
Hi Dan,

Thank you for clarifying that this is just a display issue caused by the mismatched sampling rates.
Yes, I am working toward something else, and this is one part of the analysis.

I would be grateful if you could answer one more question, about the effect of the sampling rate on the mel filter bank.

If I compare the mel filter bank distributions for sr = 4800000 (a large value) and sr = 4800 (a smaller value) with n_mels = 128, how does the sampling rate affect the frequency coverage?
Which one resolves the lower frequencies better?