Extracting audio segments based on detected onsets using libROSA


Zergoogelt

Jul 14, 2021, 3:47:00 AM
to librosa
I have a couple of .wav audio files with very similar percussive signals whose onset times I can identify using libROSA's onset detection. I would now like to extract the associated audio segments from the files using the onset times. Here is what I have done so far:

import librosa
import librosa.display  # provides specshow
import matplotlib.pyplot as plt
import numpy as np

x, sr = librosa.load("C:/.../test.wav")

# Onset detection with librosa's built-in peak picking
onset_frames = librosa.onset.onset_detect(y=x, sr=sr, wait=1, pre_avg=1, post_avg=1,
                                          pre_max=1, post_max=1)
print(onset_frames) # frame numbers of estimated onsets

onset_times = librosa.frames_to_time(onset_frames, sr=sr)

# Onset strength envelope with separate, stricter peak picking (used only for the plot)
o_env = librosa.onset.onset_strength(y=x, sr=sr)
times = librosa.frames_to_time(np.arange(len(o_env)), sr=sr)
peak_frames = librosa.util.peak_pick(o_env, pre_max=10, post_max=10, pre_avg=10,
                                     post_avg=10, delta=2, wait=60)

D = np.abs(librosa.stft(x))
plt.figure(figsize=(15,10))
ax1 = plt.subplot(2, 1, 1)
librosa.display.specshow(librosa.amplitude_to_db(D, ref=np.max), sr=sr,
                         x_axis='time', y_axis='log')
plt.title('Power spectrogram')
plt.subplot(2, 1, 2, sharex=ax1)
plt.plot(times, o_env, label='Onset strength')
plt.vlines(times[peak_frames], 0, o_env.max(), color='r', alpha=0.9, linestyle='--',
           label='Onsets')
plt.axis('tight')
plt.legend(frameon=True, framealpha=0.75)
plt.show()
print(onset_times)


Message has been deleted

Brian McFee

Jul 14, 2021, 8:31:35 AM
to librosa
Please disregard the previous response to this posting.  We appear to have spammers impersonating the developers now. 

Brian McFee

Jul 15, 2021, 9:31:46 AM
to librosa
Back to your original question, probably the easiest thing to do is convert the `onset_frames` array to samples (using librosa.frames_to_samples), and then use those indices to slice the audio buffer: x[onset_samples[0]:onset_samples[1]] or something like that.
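
A minimal sketch of that slicing, assuming a recent librosa where the audio is passed as `y=`. The names `split_at` and `split_file_at_onsets` are made up for illustration; only `librosa.load`, `librosa.onset.onset_detect`, and `librosa.frames_to_samples` are library calls.

```python
def split_at(x, boundaries):
    """Cut x into consecutive pieces, one starting at each boundary index.

    Audio before the first boundary is dropped; the last piece runs to
    the end of x.
    """
    starts = [int(b) for b in boundaries]
    stops = starts[1:] + [len(x)]
    return [x[a:b] for a, b in zip(starts, stops)]


def split_file_at_onsets(path):
    """Load a file, detect onsets, and return (segments, sample_rate)."""
    import librosa  # imported here so split_at works without librosa

    x, sr = librosa.load(path)
    onset_frames = librosa.onset.onset_detect(y=x, sr=sr)
    # Frame indices -> sample indices (both calls use the default hop_length).
    onset_samples = librosa.frames_to_samples(onset_frames)
    return split_at(x, onset_samples), sr
```

Each returned segment runs from one detected onset to the next, so segment lengths vary with the spacing of the onsets.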

If you want to get a little fancier, you might want to use the onset backtracker https://librosa.org/doc/latest/generated/librosa.onset.onset_backtrack.html to roll back detection frames to just before the detection point.  This is helpful for segmentation because onset detection is designed to catch peaks (end of attack, beginning of decay), so slicing at that point may cut off the beginning of the attack.  This might not matter too much for percussive instruments, but it's worth looking into.  (I believe Tristan Jehan's dissertation work did something like this for note segmentation, but I'm a bit hazy on the details at this point.)
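
A rough sketch of the backtracking idea. `backtrack_to_local_min` is a simplified, hypothetical illustration of "roll back to the preceding local minimum of the energy curve" and is not librosa's implementation, which may behave differently in edge cases. `backtracked_onset_samples` (also a made-up name) shows the actual library call, `librosa.onset.onset_backtrack`, applied to the onset strength envelope.

```python
def backtrack_to_local_min(events, energy):
    """Move each event index back to the preceding local minimum of energy.

    Simplified illustration only: walk left while the previous value is
    strictly lower than the current one.
    """
    rolled = []
    for i in events:
        i = int(i)
        while i > 0 and energy[i - 1] < energy[i]:
            i -= 1
        rolled.append(i)
    return rolled


def backtracked_onset_samples(path):
    """Detect onsets, backtrack them with librosa, return (samples, sr)."""
    import librosa  # imported here so the helper above works without librosa

    x, sr = librosa.load(path)
    o_env = librosa.onset.onset_strength(y=x, sr=sr)
    onset_frames = librosa.onset.onset_detect(onset_envelope=o_env, sr=sr)
    # Roll each detection back using the same envelope it was detected on.
    onset_bt = librosa.onset.onset_backtrack(onset_frames, o_env)
    return librosa.frames_to_samples(onset_bt), sr
```

`librosa.onset.onset_detect` also accepts `backtrack=True`, which applies the backtracker as part of detection.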

If you want to get even more precise, you could use the time-frequency reassigned spectrogram https://librosa.org/doc/latest/generated/librosa.reassigned_spectrogram.html to locate the temporal center of mass of each frame, rather than using frames_to_samples which will just return the center sample index of the frame in question (or first sample, if using non-centered frames).  The change here will likely be quite minuscule, and may only work well if the recordings are monophonic, but it's also worth looking into.
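
One possible reading of the reassignment idea, as a sketch only. It assumes the "temporal center of mass" of a frame can be approximated by averaging the reassigned times of its bins, weighted by squared magnitude; that weighting and the names `weighted_time` and `refined_onset_times` are assumptions, not something stated above. `librosa.reassigned_spectrogram` returns `(freqs, times, mags)`, and the reassigned times can be NaN for low-energy bins, so those are skipped.

```python
import math


def weighted_time(times_col, mags_col):
    """Energy-weighted mean of the reassigned times in one frame.

    Bins with a non-finite time or zero weight are ignored. Returns None
    if nothing usable is left.
    """
    num = 0.0
    den = 0.0
    for t, m in zip(times_col, mags_col):
        t = float(t)
        w = float(m) ** 2
        if math.isfinite(t) and math.isfinite(w) and w > 0.0:
            num += t * w
            den += w
    return num / den if den > 0.0 else None


def refined_onset_times(path):
    """Return one refined time (seconds, or None) per detected onset."""
    import librosa  # imported here so weighted_time works without librosa

    x, sr = librosa.load(path)
    onset_frames = librosa.onset.onset_detect(y=x, sr=sr)
    # hop_length=512 matches the onset detector's default, so the frame
    # indices refer to the same frames in both analyses.
    freqs, times, mags = librosa.reassigned_spectrogram(
        y=x, sr=sr, hop_length=512
    )
    return [weighted_time(times[:, f], mags[:, f]) for f in onset_frames]
```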

Zergoogelt

Jul 20, 2021, 1:10:58 PM
to librosa
Thanks a lot for your help. What I'm still missing is how to save the onset-associated audio segments as wav files. Since the sounds are very similar, it would be sufficient to save a predefined window, let's say 0.1 s long, that starts 0.02 s before the onset. Unfortunately, I'm not experienced enough to do that. Do you have an idea where to start?

Brian McFee

Jul 20, 2021, 1:32:43 PM
to librosa
For saving, we generally recommend pysoundfile.  There are some examples in the documentation: https://librosa.org/doc/latest/ioformats.html#write-out-audio-files

For backtracking, yeah, you could always subtract off a fixed offset.  I expect that onset_backtrack will be a bit more usable though, as it already handles the boundary cases for you, e.g., if the subtracted offset puts you before the beginning of the array.
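
A sketch of the fixed-offset variant, using the 0.1 s window starting 0.02 s before each onset mentioned earlier in the thread. The helper `window_bounds`, the function `save_onset_windows`, and the output file name pattern are hypothetical; `soundfile.write(file, data, samplerate)` is the pysoundfile call. The helper clips the window to the array, so a window near either end of the file simply comes out shorter.

```python
def window_bounds(onset_sample, sr, n_samples, pre=0.02, length=0.1):
    """Return (start, stop) sample indices for a fixed window around an onset.

    The window starts `pre` seconds before the onset and lasts `length`
    seconds, clipped to the range [0, n_samples].
    """
    start = int(onset_sample) - int(round(pre * sr))
    stop = start + int(round(length * sr))
    return max(start, 0), min(stop, n_samples)


def save_onset_windows(path, out_pattern="segment_{:03d}.wav"):
    """Write one wav file per detected onset; return the file names."""
    import librosa  # imported here so window_bounds works without them
    import soundfile as sf

    # librosa.load resamples to 22050 Hz by default; pass sr=None to keep
    # the file's native sample rate.
    x, sr = librosa.load(path)
    onset_frames = librosa.onset.onset_detect(y=x, sr=sr)
    onset_samples = librosa.frames_to_samples(onset_frames)

    written = []
    for i, onset in enumerate(onset_samples):
        start, stop = window_bounds(onset, sr, len(x))
        out_path = out_pattern.format(i)
        sf.write(out_path, x[start:stop], sr)
        written.append(out_path)
    return written
```

To compare against the backtracking approach, the same loop can be run on backtracked onset frames with `pre=0.0`.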

Zergoogelt

Jul 22, 2021, 10:39:50 AM
to librosa
Thank you, I might have to try both methods and compare the outcomes. Since I'm still learning: Is there an example you could provide?

Zergoogelt

Aug 4, 2021, 4:41:59 AM
to librosa
Is there anyone who could help me with this task? Any help much appreciated.

Zergoogelt

Aug 6, 2021, 11:03:12 AM
to librosa
I posted my question on Stack Overflow. Thanks again for your suggestions.

Zergoogelt

Aug 6, 2021, 11:03:24 AM
to librosa