Hi!
I have a task which sounds a bit trivial, but I'm really struggling to get it to work the way I want. I have hours of audio material that I want to detect "motion" in. Basically, the track is more or less quiet for long stretches (there's a lot of audio distortion, but it generally stays under a certain threshold), and then there can suddenly be activity. I want to flag when that happens in the file.
Since each file contains 24 hours of data, I need to use the stream interface to read it in block by block. But I have no idea how to interpret the data from there, or which functions to use to get this output. So far I have the following:
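#!/usr/bin/env python3
import sys

import librosa

MAX_ITER = 200        # i'm just testing, so i don't want to parse the full 24 hours to begin with
BLOCK_LENGTH = 256    # frames per block
FRAME_LENGTH = 4096   # samples per frame
HOP_LENGTH = 1024     # samples between the starts of consecutive frames

if len(sys.argv) < 2:
    sys.exit("Please provide a filename")

filename = sys.argv[1]
sr = librosa.get_samplerate(filename)

stream = librosa.stream(
    filename,
    block_length=BLOCK_LENGTH,
    frame_length=FRAME_LENGTH,
    hop_length=HOP_LENGTH)

position = 0.0  # seconds of audio consumed so far
for count, block in enumerate(stream):
    if count > MAX_ITER:
        break
    # (start, end) sample indices, relative to this block, of the non-silent intervals
    peaks = librosa.effects.split(y=block, top_db=20)
    print(peaks)
    # consecutive blocks overlap by frame_length - hop_length samples, so advance
    # by block_length * hop_length samples rather than by len(block)
    position += BLOCK_LENGTH * HOP_LENGTH / sr
    print(position)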
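To make the goal concrete, here is a minimal sketch of the per-block detection I have in mind. It assumes a fixed level in dB relative to full scale instead of top_db, which as far as I understand is measured relative to the loudest point in each block. The helper name active_intervals and the -40 dB default are made up for illustration, and it uses plain NumPy rather than any particular librosa call.

```python
import numpy as np


def active_intervals(block, sr, block_start=0.0, threshold_db=-40.0,
                     frame_length=4096, hop_length=1024):
    """Return (start, end) times in seconds where the frame RMS exceeds threshold_db.

    threshold_db is in dB relative to full scale (amplitude 1.0), so it is an
    absolute level that does not depend on how loud the rest of the block is.
    active_intervals and its defaults are hypothetical, for illustration only.
    """
    block = np.asarray(block, dtype=float)
    n_frames = 1 + (len(block) - frame_length) // hop_length
    if n_frames < 1:
        return []  # block is shorter than one frame

    # RMS of each frame, without padding (every frame lies fully inside the block)
    rms = np.array([
        np.sqrt(np.mean(block[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-10))  # floor avoids log10(0)
    active = level_db > threshold_db

    # Merge runs of consecutive active frames into (first, one-past-last) frame pairs
    runs = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, n_frames))

    # Convert frame indices to seconds, offset by where the block starts in the file
    return [(block_start + s * hop_length / sr,
             block_start + ((e - 1) * hop_length + frame_length) / sr)
            for s, e in runs]
```

Inside the `for block in stream` loop this would be called with the block and its start time in seconds. Since the streamed blocks overlap, an interval that touches the end of one block would still need to be merged with one at the start of the next.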
Is this the correct way of approaching this or should I be doing this some other way?
--
Marius