Question regarding the frame duration over which MFCCs are generated by librosa


Vivek Mangipudi

Jan 25, 2018, 1:13:49 AM
to librosa
Thank you for the amazing package!!


So, 

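when I do something like:
import librosa
# Load the audio; librosa resamples to 22050 Hz by default
y, sr = librosa.load('./data/mytestfile.wav')
print(y)
print(sr)
# MFCC matrix of shape (n_mfcc, n_frames)
mfcc = librosa.feature.mfcc(y=y, sr=sr)
print(mfcc.shape)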
OUTPUT:

[ -4.16746450e-04 -5.46636584e-04 -4.89326369e-04 ..., 9.71163390e-05 1.07619283e-03 0.00000000e+00]

22050

(20, 87)





My understanding is that (20, 87) represents 87 time frames, with each frame having 20 MFCC values.


Q1. In other words, we have 87 columns, and each column holds the 20 MFCCs calculated for one fixed time frame. Is that correct?


Q2. If so, what is the duration of the time frame over which each set of MFCCs is calculated? And are these frames overlapping?


Q3. How do I change the duration over which the MFCCs are calculated?


my approach :

https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html#librosa.feature.mfcc   and https://librosa.github.io/librosa/glossary.html#term-frame 

Based on those two pages, I roughly estimated the frame duration as hop_length / sr,

so that would be 512 / 22050 ≈ 23 ms = frame size or duration.



The motivation for the above questions is my previous question, https://groups.google.com/forum/#!topic/librosa/-057PPKcnW8, concerning combining MFCC vectors with a label from an annotated file.


The rationale is: if each column holds the MFCCs corresponding to a particular time frame, maybe I can simply append the label depending on whether that frame falls within the boundaries of a particular label or not.
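
A minimal sketch of that labelling idea, in plain Python. It assumes frame i sits at time i * hop_length / sr (the same conversion librosa.frames_to_time performs), and the annotation segments and label names are made up for illustration; real annotations would come from the annotated file.

```python
# Sketch: give each MFCC frame the label of the annotated segment it falls in.
# The segments and label names below are hypothetical, for illustration only.

def frame_times(n_frames, sr=22050, hop_length=512):
    """Time in seconds of each frame: frame index * hop_length / sr."""
    return [i * hop_length / sr for i in range(n_frames)]


def label_frames(times, segments, default="unlabeled"):
    """Assign each time the label of the first segment with start <= t < end."""
    labels = []
    for t in times:
        label = default
        for start, end, name in segments:
            if start <= t < end:
                label = name
                break
        labels.append(label)
    return labels


# Hypothetical annotations: (start_seconds, end_seconds, label)
segments = [(0.0, 0.5, "silence"), (0.5, 1.5, "speech")]

times = frame_times(87)  # 87 frames, as in mfcc.shape == (20, 87)
labels = label_frames(times, segments)

# One label per column of the MFCC matrix
for i in (0, 21, 22, 86):
    print(i, round(times[i], 3), labels[i])
```

Frames that fall outside every segment keep the default label, so gaps in the annotation stay visible rather than being silently merged into a neighbouring class.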


Thank you. 





Dan Ellis

Jan 25, 2018, 8:11:22 AM
to Vivek Mangipudi, librosa
You're correct, it's 87 time frames, each of 20 MFCCs.  librosa.feature.mfcc has two arguments (which actually pass through to the underlying stft).  win_length is the number of samples included in each time frame; it defaults to 2048, or ~93 ms at a 22 kHz sample rate.  hop_length is the number of samples between successive windows; its default is 512, or the 23 ms you calculated.  Because the hop is shorter than the window, successive frames overlap.

  DAn.
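
The arithmetic in this reply can be sketched in plain Python as below. The 40 ms / 20 ms targets are hypothetical, and the 2-second clip length is an assumption chosen to match the 87 frames reported (the thread does not state the file length). The frame-count formula assumes librosa's default centered framing. To apply new values, my understanding is that they would be passed as the n_fft and hop_length keywords of librosa.feature.mfcc, with win_length falling back to n_fft when it is not given; treat that as an assumption to check against the installed version.

```python
# Sketch: convert between frame sizes in samples and in milliseconds.

def samples_to_ms(n_samples, sr):
    """Duration in milliseconds of n_samples at sample rate sr."""
    return 1000.0 * n_samples / sr


def ms_to_samples(duration_ms, sr):
    """Nearest whole number of samples for a duration in milliseconds."""
    return int(round(duration_ms * sr / 1000.0))


def n_frames_centered(n_samples, hop_length):
    """Frame count when the signal is padded so frames are centered
    (assumed here to match librosa's default center=True behaviour)."""
    return 1 + n_samples // hop_length


sr = 22050
win_length, hop_length = 2048, 512  # defaults quoted in the reply above

win_ms = samples_to_ms(win_length, sr)   # about 93 ms
hop_ms = samples_to_ms(hop_length, sr)   # about 23 ms
overlap = 1.0 - hop_length / win_length  # fraction shared by adjacent windows

# Assumed clip length: 2 seconds at 22050 Hz
frames = n_frames_centered(2 * sr, hop_length)

# Hypothetical target: 40 ms windows taken every 20 ms
new_win = ms_to_samples(40, sr)
new_hop = ms_to_samples(20, sr)

print(round(win_ms, 1), round(hop_ms, 1), overlap, frames)
print(new_win, new_hop)
```

A shorter hop gives more columns in the MFCC matrix for the same audio, while a shorter window trades frequency resolution for time resolution.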
