Is it possible to use librosa to detect audio out of sync?


tli 2020

Oct 10, 2020, 10:42:40 PM
to librosa
Hi all, I've recently been researching how to detect out-of-sync audio. The task is as follows:

I have a video, and I use several Android devices (with a transcoding lib) to transcode it. On some specific devices, the transcoded audio ends up out of sync. For example, the transcoded audio lags about 0.5 seconds behind the reference audio.

After some Googling, I found the tutorial Music Synchronization with Dynamic Time Warping (https://librosa.org/doc/latest/auto_examples/plot_music_sync.html#sphx-glr-download-auto-examples-plot-music-sync-py), but it doesn't seem to apply to my specific case.

Could librosa be used to detect this kind of out-of-sync error?

Brian McFee

Feb 27, 2021, 10:36:04 AM
to librosa
This is hard to answer without a bit more detail regarding your specific setup.  Is the problem to detect a delay between the original and transcoded audio signals?  Or do you also need to detect the delay between the video and audio as well?

If the first part, you could definitely do that with librosa in a number of ways. DTW is probably overkill, since I wouldn't expect the delay to change significantly aside from a global offset in time. (E.g., starts 0.5 seconds later, but afterwards looks more or less identical.) The simplest thing to do here is measure the cross-correlation between the signals and find the sample offset corresponding to the peak.
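
A minimal sketch of that cross-correlation idea, assuming both signals are mono NumPy arrays at the same sampling rate. The function name `estimate_delay` is hypothetical (it is not a librosa function), and the correlation is computed with NumPy's FFT rather than any librosa call:

```python
import numpy as np

def estimate_delay(reference, delayed, sr):
    """Estimate how many seconds `delayed` lags behind `reference`.

    A positive result means `delayed` starts later than `reference`.
    Hypothetical helper for illustration; assumes mono signals at rate `sr`.
    """
    reference = np.asarray(reference, dtype=float)
    delayed = np.asarray(delayed, dtype=float)

    # Length of the full linear cross-correlation, padded to a power of two
    n = len(reference) + len(delayed) - 1
    n_fft = 1 << (n - 1).bit_length()

    # Cross-correlation via FFT: corr[k] = sum_t delayed[t + k] * reference[t]
    spec = np.fft.rfft(delayed, n_fft) * np.conj(np.fft.rfft(reference, n_fft))
    corr = np.fft.irfft(spec, n_fft)

    # Negative lags wrap to the end of `corr`; rotate them to the front
    corr = np.roll(corr, len(reference) - 1)[:n]
    lags = np.arange(-(len(reference) - 1), len(delayed))

    # The lag at the correlation peak is the estimated global offset
    return float(lags[np.argmax(corr)]) / sr
```

The two signals could be loaded with `librosa.load(path, sr=None)`, which returns the samples and their native sampling rate; if the rates differ, one signal would need resampling before comparison. This only recovers a single global offset, which matches the assumption that the delay does not drift over time.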

If you need to synchronize against video, that's a whole other can of worms, and I'm not sure librosa can do much for you here.

Dan Ellis

Feb 27, 2021, 10:49:29 AM
to Brian McFee, librosa
Detecting sync between audio and video is a good candidate for a machine learning approach, since it's very easy to generate unlimited training data:  Take some architecture that accepts both audio and video inputs like, say, SoundNet.  Train a top layer to look at the audio and video embeddings and classify them as in sync/not in sync.  Starting with known in-sync videos, deliberately shift the audio to make not-in-sync examples, and train away.
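
The data-generation step described above could be sketched as follows. This is only an illustration of making in-sync and not-in-sync audio examples; the function name `make_sync_example`, the 50/50 label split, and the zero-padding of the shifted region are all assumptions, and the model and video side are not shown:

```python
import numpy as np

def make_sync_example(audio, sr, max_shift_s, rng):
    """Return (audio_out, label) where label 1 = in sync, 0 = out of sync.

    Hypothetical helper: `audio` is a mono track known to be in sync with
    its video; out-of-sync examples are made by shifting it in time.
    """
    # Half of the examples are left untouched as positive (in-sync) cases
    if rng.random() < 0.5:
        return audio.copy(), 1

    # Pick a nonzero shift of up to max_shift_s seconds, in either direction
    max_shift = max(1, int(max_shift_s * sr))
    shift = int(rng.integers(1, max_shift + 1)) * int(rng.choice([-1, 1]))

    shifted = np.zeros_like(audio)
    if shift > 0:
        shifted[shift:] = audio[:-shift]   # audio arrives late
    else:
        shifted[:shift] = audio[-shift:]   # audio arrives early
    return shifted, 0
```

Each returned pair would be matched with the unmodified video frames to form one training example for the in-sync/not-in-sync classifier.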

The relevance of librosa is pretty small, though.

  DAn.


tli 2020

Feb 27, 2021, 9:17:27 PM
to librosa
Hi Brian,

Is the problem to detect a delay between the original and transcoded audio signals? ==> Yup!

The situation is this: we have dozens of Android devices in our lab, and we need to test the transcoding lib to make sure it works well on all of them. (We actually found that on some devices the audio ends up out of sync, e.g. with a 0.5-second delay.)

I just found some other discussions related to this topic,


Awesome, thanks :)