Slicing librosa.load result (audio time series) with util.frame?

J Paul LESCOUZERES

Apr 2, 2021, 5:58:57 PM

to librosa

Hello everyone!

I'm working on an audio visualization project whose goal is to draw radial waveforms from a music track.

After my first experiments with the Web Audio API (which had performance issues), I wanted to try another approach, and discovered librosa (and Python as well!).

My idea is to load my audio file, then simply slice the audio data array, more or less the same way I did with the getByteTimeDomainData method, before displaying the slices as radial waveforms. My knowledge here is very basic, and I hope someone will be able to enlighten me!

So, after librosa.load(), I wanted to use librosa.util.frame() to slice my array into several smaller ones... If I'm correct, for my use case hop_length and frame_length should be identical (?). But I cannot understand how to properly calculate this frame_length.

I suppose that it should be related to len(y) (where y is the result from librosa.load()), and maybe also to sample rate... ? But I think I'm missing something.

Like I said, my knowledge here is very basic, so I'm starting this conversation in the hope of getting some explanations, and maybe some links to useful articles, to improve my understanding of audio analysis.

I hope I'm not taking too much advantage of your kindness... And I thank you in advance for your time!

Brian McFee

Apr 5, 2021, 10:13:34 AM

to librosa

frame_length is the number of samples in each "slice"; hop_length is the number of samples between slices. If hop_length is less than frame_length, then successive frames will have some samples in common (i.e., there is overlap). If hop_length equals frame_length, then the buffer is exactly partitioned. If hop_length is greater than frame_length, then some samples will be skipped and not belong to any frame. Typically, hop_length is some constant fraction of frame_length (1/2 or 1/4), but this is not a strict requirement: the two are completely independent variables.
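To make the three cases concrete, here is a minimal pure-Python sketch of the slicing arithmetic. The helpers `frame_starts` and `slice_frames` are hypothetical names written for illustration only; they are not part of librosa. On a real signal you would call `librosa.util.frame(y, frame_length=..., hop_length=...)` instead, which performs the same kind of slicing on a NumPy array and likewise drops trailing samples that do not fill a complete frame.

```python
def frame_starts(n_samples, frame_length, hop_length):
    """Start index of every complete frame (hypothetical helper, not librosa)."""
    if n_samples < frame_length:
        return []
    # Only frames that fit entirely inside the signal are counted.
    n_frames = 1 + (n_samples - frame_length) // hop_length
    return [i * hop_length for i in range(n_frames)]


def slice_frames(y, frame_length, hop_length):
    """Cut y into frames of frame_length samples, hop_length samples apart."""
    starts = frame_starts(len(y), frame_length, hop_length)
    return [y[s:s + frame_length] for s in starts]


# A toy "signal" of 10 samples whose values equal their indices.
y = list(range(10))

# hop_length < frame_length: successive frames share samples (overlap).
overlapping = slice_frames(y, frame_length=4, hop_length=2)

# hop_length == frame_length: the buffer is exactly partitioned.
partition = slice_frames(y, frame_length=5, hop_length=5)

# hop_length > frame_length: samples 2, 3, 6, 7 belong to no frame.
gapped = slice_frames(y, frame_length=2, hop_length=4)

print(overlapping)
print(partition)
print(gapped)
```

For the radial-waveform use case described above, the partition case (hop_length equal to frame_length) is the one where every sample appears in exactly one slice.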

Usually we set the frame length to be independent of the total signal length (len(y)), because that standardizes any subsequent analyses. Librosa uses a default sampling rate of 22050 samples per second, and frame lengths of 2048 samples (or, approximately 93ms worth of audio). If you take a Fourier transform of a frame at these parameters, you'll get the frequency range 0 to 11025 Hz (half the sampling rate) divided evenly into 1025 bands. If you're not doing frequency analysis of your frames, then it might be worth just considering how long of a signal you think is relevant for your purposes and going from there.
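The numbers above can be checked with a few lines of arithmetic. This is only a sketch: the 40 ms target at the end is an arbitrary value chosen for illustration, not a recommendation from this thread.

```python
sr = 22050           # librosa's default sampling rate (samples per second)
frame_length = 2048  # default frame length (samples)

# Duration of one frame in seconds: about 0.093 s, i.e. roughly 93 ms.
frame_duration = frame_length / sr

# A real-valued FFT of N samples yields 1 + N // 2 frequency bands,
# spanning 0 Hz up to half the sampling rate.
n_bins = 1 + frame_length // 2
nyquist = sr / 2
bin_spacing = sr / frame_length  # width of each band in Hz

# Going the other way: pick a duration first, then derive frame_length.
# The 40 ms target is an assumption made for this example only.
desired_ms = 40
frame_from_duration = sr * desired_ms // 1000

print(f"{frame_duration * 1000:.1f} ms per frame")
print(f"{n_bins} bands from 0 to {nyquist} Hz, {bin_spacing:.2f} Hz apart")
print(f"{desired_ms} ms at {sr} Hz is {frame_from_duration} samples")
```

If no frequency analysis is involved, the last step is probably the more useful direction: decide how much time each slice should cover, then convert that to a sample count.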

J Paul LESCOUZERES

Apr 22, 2021, 5:23:34 PM

to librosa

Hello again!

First, thank you for your answer... and sorry for my late response! The Covid situation here (France) has caused some trouble in my daily life...

If I understood you correctly, this frame_length parameter is quite arbitrary, a bit like the "fftSize" used in the Web Audio API (https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/fftSize)? So if I'm not wrong, I can choose any value between power(2, 5) and power(2, 15), as long as it is a power of 2?

Brian McFee

Apr 23, 2021, 9:34:03 AM

to librosa

Yikes, I hope things are okay over there!

To your point: yes, the frame_length parameter is essentially unconstrained. In fact, it doesn't need to be a power of 2; this is usually done for the sake of efficiency (because FFT algorithms are generally fastest when N=2^k), but it will work well at any other length. Just make sure that your hop length is set so that you get the desired overlap between your frames and you shouldn't hit any problems.

J Paul LESCOUZERES

Apr 23, 2021, 4:24:11 PM

to librosa

Great! Thanks a lot for all your answers!
