Questions related to librosa.feature.melspectrogram function

Hongbo Chen

Mar 28, 2018, 2:59:31 AM
to librosa
Hello, I'm using librosa.feature.melspectrogram to generate the spectrogram of a wav file, but I've run into a few problems:

1. Which window does the function use? The documentation page (http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html) doesn't seem to mention the window type, e.g. a Hamming window.

2. How is the "Hz" axis divided, i.e. which frequency does each row correspond to? Can that be changed through the function's parameters? Sorry for my limited knowledge of the STFT; if you could tell me how to calculate it, I would really appreciate it!

3. The function returns S, a numpy array. What does this matrix mean?
In your example:
array([[  2.891e-07,   2.548e-03, ...,   8.116e-09,   5.633e-09],
       [  1.986e-07,   1.162e-02, ...,   9.332e-08,   6.716e-09],
       ...,
       [  3.668e-09,   2.029e-08, ...,   3.208e-09,   2.864e-09],
       [  2.561e-10,   2.096e-09, ...,   7.543e-10,   6.101e-10]])
Do rows such as [  2.891e-07,   2.548e-03, ...,   8.116e-09,   5.633e-09] and [  1.986e-07,   1.162e-02, ...,   9.332e-08,   6.716e-09] correspond to different frequencies, with each row holding a series of amplitude values over time?

I want to use this matrix to train a neural network, but I don't understand what it represents.

Thanks a lot!

Brian McFee

Mar 29, 2018, 8:16:48 AM
to librosa


On Wednesday, March 28, 2018 at 2:59:31 AM UTC-4, Hongbo Chen wrote:
> Hello, I'm using librosa.feature.melspectrogram to generate the spectrogram of a wav file, but I've run into a few problems:
>
> 1. Which window does the function use? The documentation page (http://librosa.github.io/librosa/generated/librosa.feature.melspectrogram.html) doesn't seem to mention the window type, e.g. a Hamming window.


All of the spectral features use librosa.stft with its default parameters (Hann window, etc.). If you want something different, compute the STFT directly and pass the resulting spectrogram to the feature extraction function (for melspectrogram, via the S parameter).
 
> 2. How is the "Hz" axis divided, i.e. which frequency does each row correspond to? Can that be changed through the function's parameters? Sorry for my limited knowledge of the STFT; if you could tell me how to calculate it, I would really appreciate it!


For mel spectra, you can get the bin center frequencies by calling librosa.mel_frequencies. Similar functions exist for other representations (librosa.fft_frequencies for the STFT, librosa.cqt_frequencies for the CQT, etc.).
 
> 3. The function returns S, a numpy array. What does this matrix mean?
> In your example:
> array([[  2.891e-07,   2.548e-03, ...,   8.116e-09,   5.633e-09],
>        [  1.986e-07,   1.162e-02, ...,   9.332e-08,   6.716e-09],
>        ...,
>        [  3.668e-09,   2.029e-08, ...,   3.208e-09,   2.864e-09],
>        [  2.561e-10,   2.096e-09, ...,   7.543e-10,   6.101e-10]])
> Do rows such as [  2.891e-07,   2.548e-03, ...,   8.116e-09,   5.633e-09] and [  1.986e-07,   1.162e-02, ...,   9.332e-08,   6.716e-09] correspond to different frequencies, with each row holding a series of amplitude values over time?


Each column is the mel spectrum for a frame (time slice), and each row is a frequency channel (by default, there should be 128 of them).  The value at S[f, t] is the power at frequency f, time (frame) t.


 
> I want to use this matrix to train a neural network, but I don't understand what it represents.
>
> Thanks a lot!

I hope that helps! 

Hongbo Chen

Mar 29, 2018, 8:24:27 AM
to librosa
Thanks a lot! It helps! 