OpenL3: Competitive deep audio embeddings trained on AudioSet (+ image & video embeddings)


Justin Salamon


Mar 19, 2020, 8:26:57 PM

to audiose...@googlegroups.com, Jason Cramer

Hello AudioSet users!

Given the interest on this list in audio embeddings trained on AudioSet (e.g. VGGish), we thought you might be interested in OpenL3, an open-source deep audio embedding model trained on AudioSet.

OpenL3 is an improved version of the self-supervised L3-Net and outperforms VGGish and SoundNet (as well as the original L3-Net) on several sound recognition tasks.

We're excited to announce the release of version 0.3.1 of OpenL3. This latest version adds functionality for extracting image embeddings, processing video files, and batch processing. OpenL3 is open source and readily available for everyone to use: if you have TensorFlow installed, just run pip install openl3 and you're good to go.
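For anyone who wants to try it right away, here is a minimal sketch of the audio workflow, modeled on the usage shown in the OpenL3 README. It reads WAV files with soundfile, loads one audio model, and embeds all files in a single batched call to openl3.get_audio_embedding (lists in, lists of embeddings and timestamps out). The helper names clip_embedding and embed_wav_files are hypothetical. The parameter choices (content_type="env", input_repr="mel256", embedding_size=512, batch_size=32) are illustrative, and so is averaging the frame-level embeddings into one vector per clip. Please check the documentation of your installed version before relying on any of them.

```python
import numpy as np


def clip_embedding(frame_embeddings):
    """Mean-pool a (n_frames, dim) array of frame embeddings into one (dim,) vector.

    Mean pooling is an illustrative choice, not something OpenL3 prescribes.
    """
    frames = np.asarray(frame_embeddings, dtype=np.float32)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, dim) array")
    return frames.mean(axis=0)


def embed_wav_files(paths, embedding_size=512):
    """Return one clip-level OpenL3 embedding per audio file in `paths` (hypothetical helper)."""
    # Imported lazily: openl3 needs TensorFlow, soundfile needs libsndfile.
    import openl3
    import soundfile as sf

    audios, srs = [], []
    for path in paths:
        audio, sr = sf.read(path)  # returns (samples, sample_rate)
        audios.append(audio)
        srs.append(sr)

    # Load the model once and reuse it for every file.
    model = openl3.models.load_audio_embedding_model(
        input_repr="mel256", content_type="env", embedding_size=embedding_size)

    # Batch mode: a list of arrays plus a list of sample rates yields
    # a list of (n_frames, embedding_size) arrays and a list of timestamps.
    emb_list, _ts_list = openl3.get_audio_embedding(
        audios, srs, model=model, batch_size=32)
    return [clip_embedding(emb) for emb in emb_list]
```

For example, embed_wav_files(["dog_bark.wav", "siren.wav"]) (hypothetical file names) would return two 512-dimensional vectors that can be fed to any standard classifier. Keeping the frame-level timestamps instead of pooling is the better choice when you need to localize sounds in time.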

Full details are provided in the following paper:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
J. Cramer, H.-H. Wu, J. Salamon, and J. P. Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), pp. 3852-3856, Brighton, UK, May 2019.

We're excited to see what the community does with OpenL3, and of course if you have any feedback please don't hesitate to reach out.

Cheers!
Justin, on behalf of the OpenL3 team: Jason Cramer, Ho-Hsiang Wu, Justin Salamon and Juan Pablo Bello.