The Sound Understanding team at Google is happy to announce that the trained model used to generate AudioSet embeddings is now available.
This release contains:
- the VGGish model definition in TensorFlow (Slim)
- Python code to compute log mel spectrogram features from a waveform
- Python code to post-process the embeddings from the model and apply PCA/quantization
- associated model checkpoint and PCA parameter files
- demo code showing how to use the model in inference and training modes
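To give a flavor of the log mel spectrogram features mentioned above, here is a minimal NumPy sketch of the general recipe (frame, window, FFT, mel filterbank, log). This is an illustrative stand-in, not the released code; the parameter defaults (16 kHz audio, 25 ms windows, 10 ms hop, 64 mel bands, 125–7500 Hz) are assumptions, and the released feature extractor may differ in detail.

```python
import numpy as np

def hz_to_mel(f):
    return 1127.0 * np.log(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (np.exp(m / 1127.0) - 1.0)

def log_mel_spectrogram(waveform, sample_rate=16000, win_s=0.025, hop_s=0.010,
                        num_mel_bins=64, lower_hz=125.0, upper_hz=7500.0,
                        log_offset=0.01):
    """Compute a log mel spectrogram from a 1-D waveform (illustrative sketch)."""
    win = int(round(sample_rate * win_s))
    hop = int(round(sample_rate * hop_s))
    fft_len = 2 ** int(np.ceil(np.log2(win)))
    # Frame the signal into overlapping windows and apply a Hann window.
    num_frames = 1 + (len(waveform) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = waveform[idx] * np.hanning(win)
    # Magnitude spectrogram via a real FFT.
    spec = np.abs(np.fft.rfft(frames, n=fft_len))
    # Build a triangular mel filterbank over the FFT bin frequencies.
    num_bins = fft_len // 2 + 1
    fft_hz = np.linspace(0.0, sample_rate / 2.0, num_bins)
    mel_edges = np.linspace(hz_to_mel(lower_hz), hz_to_mel(upper_hz),
                            num_mel_bins + 2)
    edges_hz = mel_to_hz(mel_edges)
    fb = np.zeros((num_bins, num_mel_bins))
    for i in range(num_mel_bins):
        lo, ctr, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        up = (fft_hz - lo) / (ctr - lo)
        down = (hi - fft_hz) / (hi - ctr)
        fb[:, i] = np.maximum(0.0, np.minimum(up, down))
    mel = spec @ fb
    # Stabilized log compression.
    return np.log(mel + log_offset)
```

With the assumed defaults, one second of 16 kHz audio yields a (98, 64) array of log mel values.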
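The post-processing step (PCA followed by quantization) can likewise be sketched in a few lines. Again this is an assumption-laden illustration, not the released post-processor: `pca_matrix` and `pca_means` stand in for the released PCA parameter file, and the clip range of [-2.0, 2.0] for 8-bit quantization is an assumed default.

```python
import numpy as np

def postprocess(embeddings, pca_matrix, pca_means,
                quant_min=-2.0, quant_max=2.0):
    """Apply a PCA transform, then quantize each value to 8 bits (sketch).

    embeddings: (N, 128) raw model outputs.
    pca_matrix: (128, 128), pca_means: (128,) -- hypothetical stand-ins
    for the released PCA parameters.
    """
    # Project mean-centered embeddings through the PCA matrix.
    x = (embeddings - pca_means) @ pca_matrix.T
    # Clip to the quantization range, then map linearly onto [0, 255].
    x = np.clip(x, quant_min, quant_max)
    q = (x - quant_min) * (255.0 / (quant_max - quant_min))
    return q.astype(np.uint8)
```

The result matches the format of the released AudioSet embeddings: one unsigned byte per dimension per frame.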
As mentioned in the README, please use the mailing list for general questions and the tensorflow/models issue tracker for specific technical issues (be sure to @-mention or assign issues to @plakal and @dpwe so they get our attention).
We look forward to seeing how the community uses VGGish and AudioSet!
Manoj,
on behalf of Sound Understanding @ Google