Writing here as the original author of the AudioSet ontology ...
As you might notice, the version of the ontology published on GitHub hasn't been updated since its initial release in 2017. Our internal version has gone through a large number of small changes, including adding "Slurping" (/m/07pqmly) (though not "Sipping") and "Washing machine" (/m/0174k2).
"Clicking" (/m/07qc9xj) is already present, but we don't have a subclass for mouse clicking.
Having these within the ontology is not the same as having adequate examples for them, of course (or including them in a published classifier).
If you want to identify which existing AudioSet or YAMNet classes best correspond, one (slightly circular) approach is simply to see what the classifier reports for examples of the new classes. I get "Chewing" for drinking, a lot of "Liquid" sounds for washing machines (and some "Train", depending on what the machine is doing, I guess), and "Computer keyboard" for mouse clicks. (These are for random samples pulled from the internet, not the ESC-50 sounds specifically, which I note are licensed non-commercially.)
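As a sketch of that kind of check, assuming you already have per-frame classifier scores (e.g. YAMNet emits a `(frames, 521)` array plus a class-name list), averaging over time and reading off the top classes is enough to see where a new sound lands. The names and toy numbers below are illustrative, not real model output:

```python
import numpy as np

def top_classes(scores, class_names, k=5):
    """Average per-frame scores over time and return the k highest-scoring
    (class_name, mean_score) pairs -- a quick way to see which existing
    classes a clip of some new sound type maps onto."""
    mean_scores = scores.mean(axis=0)          # (num_classes,)
    top = np.argsort(mean_scores)[::-1][:k]    # indices of the k largest
    return [(class_names[i], float(mean_scores[i])) for i in top]

# Toy demonstration with made-up scores for three classes over two frames.
names = ["Chewing", "Liquid", "Computer keyboard"]
fake_scores = np.array([[0.1, 0.7, 0.2],
                        [0.2, 0.6, 0.1]])
print(top_classes(fake_scores, names, k=2))
```

With real audio you'd replace `fake_scores` with the score array the model returns for your clip.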
I admit that I'm unclear about the best role for the ontology on GitHub. I guess I meant the whole thing as a proposal, and by putting it on GitHub, I meant to indicate that we're receptive to other input about how it should be. However, in practice we've now diverged internally, and a separate evolution externally isn't terribly appealing. And, to be honest, I'm less convinced that striving for a single, universal audio event ontology is an achievable goal. My experience is that even with the classes we defined, there are almost always application-specific wrinkles that undermine the appearance of authority.
Happy to discuss further, though.
To your point, I'm not aware of an existing mapping between ESC-50 classes and AudioSet MIDs, but it seems like a nice idea. You might want to share whatever you end up with.
Rather than directly using AudioSet classifier outputs to detect ESC-50 classes, the more common style of work, I believe, is to train some kind of embedding layer on audioset data (or take the embedding from an existing classifier), then evaluate on ESC-50 by using some of the data to train a final classification layer (and evaluate on the rest).
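That recipe (freeze an embedding trained on AudioSet, fit a small classification layer on some ESC-50 folds, evaluate on the held-out fold) might be sketched like this. Synthetic vectors stand in for real embeddings here, and the 5-fold split just mimics ESC-50's predefined folds:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for frozen audio embeddings of ESC-50 clips: two synthetic
# classes with shifted means, mimicking separable embedding clusters.
n_per_class, dim = 100, 64
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, dim)),
               rng.normal(1.0, 1.0, (n_per_class, dim))])
y = np.repeat([0, 1], n_per_class)

# ESC-50 ships with 5 predefined folds; fake a fold assignment here by
# cycling fold ids, then hold out fold 0 for evaluation.
fold = np.tile(np.arange(5), len(y) // 5)
train, test = fold != 0, fold == 0

# The "final classification layer": a linear classifier on the frozen
# embeddings, trained only on the training folds.
clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
acc = clf.score(X[test], y[test])
print(f"held-out fold accuracy: {acc:.2f}")
```

In a real run you'd average the model's per-frame embeddings into one vector per clip and use the fold column from the ESC-50 metadata instead of the fabricated split.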
Best,
DAn.