Fwd: VGGish model released

Manoj Plakal

Nov 29, 2017, 12:03:52 PM
to dcase-di...@googlegroups.com, Dan Ellis

Hello DCASE community,

We, the Sound Understanding team at Google, released a trained audio event classification embedding model earlier this year to accompany our previous release of AudioSet. We announced it on the AudioSet users mailing list, but were recently informed that this news may not have reached many people in the audio community.

I am forwarding the original announcement to make everyone aware of what we released. Please forward to other lists as appropriate to spread the word.

If you have AudioSet-related questions, please post and discuss at https://groups.google.com/group/audioset-users.

We look forward to seeing how everyone will use VGGish and AudioSet!

Manoj,
on behalf of Sound Understanding @ Google


---------- Forwarded message ----------
From: Manoj Plakal <pla...@google.com>
Date: Tue, Aug 8, 2017 at 5:25 PM
Subject: VGGish model released
To: audioset-users <audiose...@googlegroups.com>

The Sound Understanding team at Google is happy to announce that the trained model used to generate AudioSet embeddings is now available.

The model, which we call "VGGish", is available at https://github.com/tensorflow/models/tree/master/research/audioset

This release contains:
- the VGGish model definition in TensorFlow (Slim)
- Python code to compute log mel spectrogram features from a waveform
- Python code to post-process the embeddings produced by the model (PCA and quantization)
- the associated model checkpoint and PCA parameter files
- demo code showing how to use the model in inference and training modes (a condensed inference sketch follows below)
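For orientation, the inference path looks roughly like this (a condensed sketch of the repo's demo code; module, function, and released file names are as in the repo, and the input WAV filename is just a placeholder):

import tensorflow as tf
import vggish_input
import vggish_params
import vggish_postprocess
import vggish_slim

# Convert a WAV file (placeholder name) into a batch of log mel examples.
examples = vggish_input.wavfile_to_examples('example.wav')

with tf.Graph().as_default(), tf.Session() as sess:
    # Define VGGish and load the released checkpoint.
    vggish_slim.define_vggish_slim(training=False)
    vggish_slim.load_vggish_slim_checkpoint(sess, 'vggish_model.ckpt')
    features = sess.graph.get_tensor_by_name(vggish_params.INPUT_TENSOR_NAME)
    embedding = sess.graph.get_tensor_by_name(vggish_params.OUTPUT_TENSOR_NAME)
    # Run inference: one 128-D embedding per input example.
    [raw_embeddings] = sess.run([embedding], feed_dict={features: examples})

# Apply the released PCA transform and 8-bit quantization.
pproc = vggish_postprocess.Postprocessor('vggish_pca_params.npz')
postprocessed = pproc.postprocess(raw_embeddings)

The postprocessed output is in the same format as the released AudioSet embeddings: one 128-dimensional, 8-bit quantized vector per roughly one second of audio.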

As mentioned in the README, please use the mailing list for general questions, and use the tensorflow/models issue tracker for specific technical issues (and make sure to @-mention or assign issues to @plakal and @dpwe to get our attention).

We are looking forward to seeing how the community will use VGGish and AudioSet!

Manoj,
on behalf of Sound Understanding @ Google

ehu

Nov 30, 2017, 11:11:30 AM
to DCASE Discussions
Hello Manoj,

Thank you for your message. 
At the DCASE 2017 workshop, Shawn Hershey gave a very interesting keynote talk about AudioSet, but it's true that he didn't mention the availability of your pretrained VGGish model.
What surprises me is that he showed the ResNet architecture giving the best results on AudioSet (better than VGG), so I would have expected you to share a pretrained ResNet instead of VGG.
Any comment on this?
Anyway, the VGGish model is already a GREAT contribution.

Best
Eric

Manoj Plakal

Dec 1, 2017, 2:23:24 PM
to ehu, DCASE Discussions, Dan Ellis

Hi Eric,

Thanks for your support!

I need to double-check with Shawn about the exact results he used in his talk. I'm guessing that, among other things, he showed you results from our ICASSP paper from earlier this year (https://arxiv.org/pdf/1609.09430.pdf, page 3, Table 2). Those results come from evaluating various models on an internal dataset of 100M YouTube videos (see the paper for details). Unfortunately, we do not have approval to release the ResNet-like model that was trained on this large private dataset.

The VGGish model that we released was trained earlier on a draft version of YouTube-8M (https://research.google.com/youtube8m/, released by our friends on the Video Content Analysis team at Google), so it was easier to get approval to release a model trained on a dataset very similar to one that had already been released.

That being said, we have been working on training a variety of models using just AudioSet, and we plan to release more models and supporting code. We can't make any promises yet, because we still need to get approval before we release anything, but we will update audioset-users@ as well as this list as soon as we have something to share.

Manoj




Tushar Poddar

Feb 5, 2020, 5:56:28 PM
to DCASE Discussions
Hi, 
I am working with the VGGish model in my research, but I am having trouble understanding the use of the variables

_MEL_BREAK_FREQUENCY_HERTZ = 700.0
_MEL_HIGH_FREQUENCY_Q = 1127.0

These variables are used in mel_features.py to convert frequencies from hertz to mel, but I am unsure why these two specific numbers were chosen.

Thanks and regards 
Tushar Poddar

Jose Giraldo

Feb 10, 2020, 4:49:47 PM
to Tushar Poddar, DCASE Discussions
Dear Tushar,

These variables come from the definition of the mel scale itself.

The following excerpt from Richard Lyon's book should clarify it.

[image: excerpt from Richard Lyon's book giving the definition of the mel scale]
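In short: the mel scale maps a frequency f in Hz to 1127 * ln(1 + f/700) mels. The 700 Hz is the break frequency below which the scale is roughly linear and above which it is roughly logarithmic, and 1127 is approximately 2595 / ln(10), i.e. the constant from the common base-10 form 2595 * log10(1 + f/700) rewritten for the natural log. A minimal sketch of the conversion along the lines of mel_features.py:

import numpy as np

_MEL_BREAK_FREQUENCY_HERTZ = 700.0  # break between ~linear and ~log regions
_MEL_HIGH_FREQUENCY_Q = 1127.0      # ~2595 / ln(10), scales the natural log

def hertz_to_mel(frequencies_hertz):
    # mel(f) = 1127 * ln(1 + f / 700); gives ~1000 mels at 1 kHz by construction.
    return _MEL_HIGH_FREQUENCY_Q * np.log(
        1.0 + (frequencies_hertz / _MEL_BREAK_FREQUENCY_HERTZ))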

Best regards,

Jose Giraldo



