Detecting sound above a certain threshold

Marius Flage

Aug 26, 2020, 3:02:02 AM
to librosa
Hi!

I have a task that sounds trivial, but I'm really struggling to get it to work the way I want. I have hours of audio material that I want to detect "motion" in. Basically, the track is more or less quiet for a length of time (there's a lot of audio distortion here, but this generally stays under a certain threshold), but then I can suddenly have activity, and I want to flag when this happens in the file.

Since this will be analyzed on individual files each containing 24 hours of data, I need to use the stream interface to be able to read this in. But I have no idea how to then interpret the data and which functions to use to get this output. So far I have the following:

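    #!/usr/bin/env python3

    import sys

    import librosa

    MAX_ITER = 200

    if len(sys.argv) < 2:
        sys.exit("Please provide a filename")

    filename = sys.argv[1]

    sr = librosa.get_samplerate(filename)

    block_length = 256
    frame_length = 4096
    hop_length = 1024

    stream = librosa.stream(
        filename,
        block_length=block_length,
        frame_length=frame_length,
        hop_length=hop_length)

    count = 0
    position = 0.0

    for block in stream:
        # i'm just testing, so i don't want to parse the full 24 hours to begin with
        if count > MAX_ITER:
            break

        # sample intervals (relative to this block) whose level is within
        # top_db dB of the loudest frame in the block
        peaks = librosa.effects.split(y=block, top_db=20)
        print(peaks)

        count += 1

        # consecutive blocks overlap by frame_length - hop_length samples,
        # so each block advances the position by block_length * hop_length samples
        position += block_length * hop_length / sr

    print(position)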

Is this the correct way of approaching this or should I be doing this some other way?

--
Marius

Brian McFee

Feb 27, 2021, 10:41:11 AM
to librosa
I think your basic setup is correct (streaming in blocks), but it would be helpful to have a better handle on what kind of events you're hoping to detect in the audio stream.

If the events in question have somewhat abrupt onsets, you might do well to look at PCEN, and particularly the streaming example we have in the gallery: https://librosa.org/doc/latest/auto_examples/plot_pcen_stream.html#sphx-glr-auto-examples-plot-pcen-stream-py. Without getting into the details of the method, it can essentially act like an onset detector that adapts to slow / non-stationary background noise, and it works well for things like wake-word spotting. In the example notebook I linked, the last step we take is to aggregate the response over frequency bands, which produces a curve that spikes whenever a new event occurs. It's not a perfect onset detector by any means, but it might be a good starting point for you.