Is the number of frames correct?


Carlton Banks

Apr 24, 2017, 1:35:00 PM
to librosa
I am currently computing mel spectrograms with these options:


import librosa

y, sr = librosa.core.load(audio_path, sr=16000)

print(y.shape)  # output: (76186,)
print(sr)  # output: 16000

specto = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=40)
print(specto.shape)  # output: (40, 477)



But does specto.shape make sense?


Given the number of samples, n_fft, and the hop length, I would expect the number of frames to be computable as follows:

y = n_fft*n - hop_length*(n-1)

For n frames there are always n-1 overlaps, which is what the formula is built on.


Given the data I have:


In[11]:= Solve[76186 == 400*n - 160*(n - 1), n] // N
 
Out[11]= {{n -> 316.775}}


I am clearly doing something wrong here? 


I expected the framing would follow this convention ..




So what am I doing wrong?

Brian McFee

Apr 24, 2017, 2:00:09 PM
to librosa
Yes, the number of frames is correct.  Refer to this thread: https://github.com/librosa/librosa/issues/530

Carlton Banks

Apr 24, 2017, 2:37:15 PM
to librosa
So this should do it?

y, sr = librosa.core.load(audio_path, sr=16000)

print(y.shape)
print(sr)

stft_spect = librosa.core.stft(y, n_fft=400, hop_length=160, center=False)

specto = librosa.feature.melspectrogram(stft_spect, sr=sr, n_fft=400, hop_length=160, n_mels=40)


I am getting this error message:


Traceback (most recent call last):
  File "specto.py", line 13, in <module>
    specto = librosa.feature.melspectrogram(stft_spect, sr=sr, n_fft=400, hop_length=160, n_mels=40)
  File "/usr/local/lib/python2.7/site-packages/librosa/feature/spectral.py", line 1368, in melspectrogram
    power=power)
  File "/usr/local/lib/python2.7/site-packages/librosa/core/spectrum.py", line 1148, in _spectrogram
    S = np.abs(stft(y, n_fft=n_fft, hop_length=hop_length))**power
  File "/usr/local/lib/python2.7/site-packages/librosa/core/spectrum.py", line 154, in stft
    util.valid_audio(y)
  File "/usr/local/lib/python2.7/site-packages/librosa/util/utils.py", line 151, in valid_audio
    'ndim={:d}, shape={}'.format(y.ndim, y.shape))
librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(1025, 149)

Dan Ellis

Apr 24, 2017, 2:47:21 PM
to Carlton Banks, librosa
To pass in a spectrogram to librosa.feature.melspectrogram use named arg S, e.g.

specto = librosa.feature.melspectrogram(S=stft_spect, sr=sr, n_fft=400, hop_length=160, n_mels=40)


--
You received this message because you are subscribed to the Google Groups "librosa" group.
To unsubscribe from this group and stop receiving emails from it, send an email to librosa+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/librosa/9fd24ff3-aa23-4f67-9fe9-838773a4ffb1%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Carlton Banks

Apr 24, 2017, 2:49:57 PM
to librosa, nof...@gmail.com, dp...@ee.columbia.edu
Thanks.
I am still a bit confused about why my expression isn't correct. What is wrong with the way I am computing it?

Dan Ellis

Apr 24, 2017, 3:07:03 PM
to Carlton Banks, librosa

> Given the number of samples, n_fft, and the hop length, I would expect the number of frames to be computable as follows:
>
> y = n_fft*n - hop_length*(n-1)



I think you're confusing hop_length and overlap_length.  The total length of n frames would be

n*window_length  - (n-1)*overlap_length

... to back out the n-1 areas of overlap that are being double counted.

hop_length = window_length - overlap_length, so

n*window_length - (n-1)*(window_length - hop_length) = window_length + (n-1)*hop_length

which is the "usual" formula.

  DAn.
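
As a quick sanity check on the formula above, the sketch below solves window_length + (n-1)*hop_length <= num_samples for the largest n and compares the result with the shape reported at the top of the thread. The helper name num_frames is hypothetical, and the center=True branch assumes librosa's default behavior of padding the signal by n_fft // 2 samples on each side before framing.

```python
def num_frames(num_samples, n_fft, hop_length, center=True):
    """Count the frames that fit in a signal (hypothetical helper).

    Assumes center=True pads n_fft // 2 samples on each side,
    as librosa's stft does by default.
    """
    if center:
        num_samples += 2 * (n_fft // 2)
    if num_samples < n_fft:
        return 0
    # Largest n with n_fft + (n - 1) * hop_length <= num_samples
    return 1 + (num_samples - n_fft) // hop_length


print(num_frames(76186, n_fft=400, hop_length=160, center=True))   # 477
print(num_frames(76186, n_fft=400, hop_length=160, center=False))  # 474
```

With centering, the count matches the 477 frames that melspectrogram returned; without it, the same signal yields 474 frames.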

 


