AudioSet dataset and model updates

Manoj Plakal

Mar 28, 2018, 9:39:13 PM

to audioset-users, Dan Ellis

Hello AudioSet users,

The Sound Understanding team at Google has been working on many improvements to the AudioSet dataset and AudioSet models since our public releases in 2017. We plan to share these updates with the AudioSet user community in the coming months.

Some highlights of what we plan to release:

- A revised version of the AudioSet dataset, including more ratings for more videos, and dropping videos that are no longer available on YouTube.

- "Resnetish": a TensorFlow model with the ResNet-50 architecture, trained on AudioSet, and able to generate both embeddings and AudioSet class predictions directly.

- A slimmed-down TensorFlow model, also trained on AudioSet to generate predictions, that uses less compute and memory, and might be better suited for mobile phones or embedded devices (e.g., we have had success with Raspberry Pi devices).

- Embeddings from the Resnetish model for the AudioSet videos, computed using the released feature extraction code, and without any post-processing.

- A library to compute evaluation metrics for classifiers trained on AudioSet, focusing on the standard metrics that we use internally and report in our publications.

We have noted the issues that some users have experienced while trying to reconcile the released VGGish embeddings with the VGGish model, and we're hoping to solve those issues in the new release. In particular, the newly released embeddings will be computed using the released model and released feature-extraction code (and not using internal production code) and will not be post-processed. Furthermore, the Resnetish model will let users go directly from features to class scores, which might serve the needs of many users without having to use intermediate embeddings. The standard evaluation metrics library will also eliminate a long-standing source of confusion and bugs when comparing classifiers.
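The post does not name the metrics the library would compute. As a rough illustration only, assuming the per-class average precision, ROC AUC, and d-prime figures that AudioSet publications typically report, the calculations can be sketched in plain Python. The function names below are hypothetical and are not the API of any released library; a real evaluation would average these values over all classes and would normally use NumPy or scikit-learn.

```python
from statistics import NormalDist


def average_precision(labels, scores):
    """Average precision for one class: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # Undefined when the class has no positive examples.
    return sum(precisions) / len(precisions) if precisions else float("nan")


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        return float("nan")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def d_prime(auc):
    """d' = sqrt(2) * inverse normal CDF of the AUC (requires 0 < auc < 1)."""
    return 2 ** 0.5 * NormalDist().inv_cdf(auc)


# Toy example for a single class: 1 marks a positive clip, 0 a negative one.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
auc = roc_auc(labels, scores)
dp = d_prime(auc)
```

Two classifiers are only comparable when the same definitions (tie handling, per-class versus pooled averaging) are applied to both, which is the kind of ambiguity a shared library would remove.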

As always, we don't have a firm timeline but we are aiming to release these updates over the next 2-3 months or so.

Manoj

on behalf of Sound Understanding in Google AI Perception

https://research.google.com/teams/perception/

Yi He

Mar 30, 2018, 4:57:24 AM

to audioset-users

Great job!

On Thursday, March 29, 2018 at 9:39:13 AM UTC+8, Manoj Plakal wrote:


mkar...@umich.edu

May 29, 2018, 5:47:18 PM

to audioset-users

Hello!

Thank you for the update! I have been building a classifier, and while it had good accuracy on the given eval_embeddings, it did not perform well on raw audio that I converted to embeddings using VGGish, so this is great news! Will the new dataset and embeddings be available soon (in the next month or so)?

Regards,
Karthik M.

momde...@gmail.com

Jun 14, 2018, 2:27:26 PM

to audioset-users

That sounds exciting! While we are waiting for the next release, could you please tell us which TensorFlow operations Resnetish will use? I am looking to convert an existing model for iOS (.mlmodel format), and the tf-coreml library (which converts .pb format to .mlmodel format) only supports certain operations.

Thanks

On Wednesday, March 28, 2018 at 6:39:13 PM UTC-7, Manoj Plakal wrote:

Deep Chakrabarti

Jul 13, 2018, 10:42:39 AM

to audioset-users

Hi Manoj,

Since it has already been 3 months since this announcement was posted, do you have a planned date for the promised release, or has the work unfortunately been delayed?

Regards.

Manoj Plakal

Jul 13, 2018, 2:21:32 PM

to dee...@gmail.com, audioset-users

Indeed, we are delayed due to a number of other exciting things that consumed our time last quarter. One of these was the launch of a Kaggle challenge for audio event detection; there are still a couple of weeks left for people to try their luck at https://kaggle.com/c/freesound-audio-tagging. We've also been tweaking the dataset itself, which is a laborious process.

We'll post an update to the list when we're ready to release. Some time this summer would be my best guess.

--
You received this message because you are subscribed to the Google Groups "audioset-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to audioset-user...@googlegroups.com.
To post to this group, send email to audiose...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/audioset-users/f5ffe7d6-3ee6-4e0c-b1b8-7567fd5388d4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Arunima Pathania

Aug 5, 2018, 5:01:32 PM

to audioset-users

Hi,

Regarding the embeddings produced using the vggish-inference-demo, I wanted to ask how to process them for input to a neural network. I process a 3-second WAV file and get 2 arrays of 128 elements each (an image of one is attached). I haven't been able to find enough documentation to know how to process these to train my model. Your help in this matter would be highly appreciated.

Thank you.

array.PNG
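The thread does not answer this question. As a hedged sketch, assuming each 128-element array is the embedding of one short frame of the clip, two common ways to prepare such output for training are to average the frames into a single clip-level vector, or to treat every frame as its own training example carrying the clip's labels. The helper names below are hypothetical and the code uses plain lists for clarity; NumPy arrays would be the usual choice in practice.

```python
def mean_pool(frames):
    """Average a list of equal-length frame embeddings into one clip-level vector."""
    if not frames:
        raise ValueError("need at least one frame embedding")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frame embeddings must have the same length")
    # zip(*frames) walks the frames dimension by dimension.
    return [sum(column) / len(frames) for column in zip(*frames)]


def frame_level_examples(frames, clip_labels):
    """Pair every frame embedding with the labels of the clip it came from."""
    return [(list(frame), list(clip_labels)) for frame in frames]


# Toy stand-in for the 2 x 128 output described above (values are made up).
frames = [[0.0] * 128, [2.0] * 128]
clip_vector = mean_pool(frames)                       # one 128-dim input per clip
examples = frame_level_examples(frames, ["Speech"])   # one input per frame
```

Mean pooling gives a fixed-size input regardless of clip length, at the cost of discarding temporal order; frame-level training keeps more data but assumes the clip label applies to every frame.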

Logan Ford

Oct 9, 2018, 3:04:56 PM

to audioset-users

Any update here on timeline?

On Wednesday, March 28, 2018 at 9:39:13 PM UTC-4, Manoj Plakal wrote:

Punit Agrawal

Jan 10, 2019, 6:32:04 AM

to audioset-users

Hey Manoj,

Any update on the timeline?

Thank you.

Manoj Plakal

Jan 10, 2019, 12:15:33 PM

to Punit Agrawal, audioset-users

Apologies for the lack of updates.

We grossly underestimated the time it would take to make a proper update. Getting approval to release a classifier that predicts human-readable classes (in addition to opaque embeddings) is trickier for various corporate reasons. Happily, we're almost at the end of that process.

We are working right now on an updated AudioSet model and associated machinery and data. I'm hesitant to give you a date :) but it will be soon.

Happy New Year!


NumesSanguis

Feb 19, 2019, 11:52:40 PM

to audioset-users

Just wondering if there is any update on the release schedule for the audio classifier :)

Ralf Meermeier

Mar 22, 2019, 10:14:16 AM

to audioset-users

Adding my voice to the fray, any idea about the release of this fixed dataset? I am struggling with the discrepancies between the current VGGish model's output and the feature vectors of the published train/eval set.
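The thread does not spell out what the post-processing of the published embeddings consists of. As an illustration only, assuming it includes clipping followed by 8-bit quantization (the range of -2.0 to 2.0 and the function names below are assumptions for this sketch), the following shows why raw floating-point model output cannot be compared directly with stored 8-bit values, and how much precision is lost on the way.

```python
def quantize_8bit(values, lo=-2.0, hi=2.0):
    """Clip each value to [lo, hi] and map it to an integer code in 0..255."""
    scale = 255.0 / (hi - lo)
    codes = []
    for value in values:
        clipped = min(max(value, lo), hi)
        codes.append(int((clipped - lo) * scale))  # truncates toward zero
    return codes


def dequantize_8bit(codes, lo=-2.0, hi=2.0):
    """Map integer codes back to approximate floats in [lo, hi]."""
    step = (hi - lo) / 255.0
    return [code * step + lo for code in codes]


# Made-up raw embedding values, two of them outside the assumed clip range.
raw = [-3.0, -0.5, 0.0, 1.234, 5.0]
codes = quantize_8bit(raw)
restored = dequantize_8bit(codes)
```

Values inside the range come back within one quantization step (about 0.016 here), while values outside it are clipped and cannot be recovered. Any linear transform applied before quantization, such as PCA, would also have to be applied to the raw model output before the two could be compared.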

On Wednesday, March 28, 2018 at 9:39:13 PM UTC-4, Manoj Plakal wrote:

Deep Chakraborty

Apr 24, 2019, 1:51:29 AM

to audioset-users

Hi Manoj,

I understand you guys are working really hard on releasing the updated model and data, possibly battling corporate policies. But for researchers like us, who have been waiting patiently for almost a year to use the updated model to make our projects work, it is really frustrating to wait any longer. Unfortunately, because we lack the resources to train on such a large corpus and fix the discrepancies ourselves, all we can do is wait. Therefore, in the spirit of open source, we plead with you to give us some kind of definite timeline for the release, and to tell us whether it is going to happen at all. On behalf of all AudioSet users, I hope you'll heed our outcry.

Thank you,

Deep


Keunhong Park

May 8, 2019, 5:36:21 AM

to audioset-users

Hi,

Do you still have plans to release the Resnet-ish model?

Thanks,

Keunhong

On Wednesday, March 28, 2018 at 6:39:13 PM UTC-7, Manoj Plakal wrote:

Meet Shah

Sep 25, 2019, 8:19:14 AM

to audioset-users

Hi Manoj,

Any update on the Resnetish model you guys were planning to release?
