YAMNet class prediction score

Enrico G

May 13, 2021, 5:51:50 AM

to audioset-users

Hello,

I'm experimenting with the YAMNet model to classify short audio clips.

How should I interpret the class scores it returns?
For example, when I run it on a 1-second audio clip, I get these predictions for the top 3 classes:

- Speech (0.64)

- Cat (0.10)

- Inside, small room (0.02)

Does this mean there is a 64% probability that Speech is present in this 1-second clip, or that Speech is the predominant sound?

Manoj Plakal

May 13, 2021, 1:36:01 PM

to Enrico G, audioset-users, Dan Ellis

On Thu, May 13, 2021 at 5:51 AM Enrico G <federi...@gmail.com> wrote:

How should I interpret the class scores it returns?
For example, when I run it on a 1-second audio clip, I get these predictions for the top 3 classes:
- Speech (0.64)
- Cat (0.10)
- Inside, small room (0.02)

Does this mean there is a 64% probability that Speech is present in this 1-second clip, or that Speech is the predominant sound?

YAMNet has not been calibrated, so you can't interpret its raw scores as probabilities. Comparing scores across classes is also not guaranteed to make sense: each class has an independent logistic classifier and we train in a multi-class, multi-label setting, so different classes could easily use different score ranges. Furthermore, YAMNet was trained entirely on YouTube audio, so there might be a domain mismatch if you run it on non-YouTube data.
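To illustrate the point about independent logistic classifiers, here is a minimal sketch in plain NumPy. The logits are made up for illustration and are not real YAMNet outputs. It only shows that per-class sigmoid scores are computed independently and need not sum to 1, unlike a softmax over classes.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function: one independent score per class."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Single distribution over classes, shown here only for contrast."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical logits for three classes (assumed values, not model output).
logits = np.array([0.6, -2.2, -3.9])

independent_scores = sigmoid(logits)  # multi-label style: each class on its own
coupled_scores = softmax(logits)      # multi-class style: classes compete

print("sigmoid scores:", independent_scores, "sum =", independent_scores.sum())
print("softmax scores:", coupled_scores, "sum =", coupled_scores.sum())
```

Because each sigmoid score comes from its own classifier, a 0.64 for one class and a 0.10 for another are not shares of a common total, which is one reason cross-class comparisons can mislead.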

If you want to use YAMNet for a particular application and you want the outputs to be interpretable, it would be best to run some kind of calibration, fine-tuning, or even transfer learning:

- calibration: run the model on a few representative clips with known ground truth labels, and use the scores to determine thresholds and ranges you can use for making predictions

- fine-tuning or transfer learning: if you have enough data, you could fine-tune YAMNet or try transfer learning as described here: https://www.tensorflow.org/tutorials/audio/transfer_learning_audio
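As one possible reading of the calibration suggestion above, here is a rough sketch that picks a per-class decision threshold from labeled clips. The scores and labels are hypothetical, `pick_threshold` is a helper invented for this example, and maximizing F1 is just one assumed criterion; you might prefer a target precision or recall for your application.

```python
import numpy as np

def pick_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 for a single class.

    scores: raw model scores for one class, one per clip.
    labels: ground truth for the same clips (1 = class present).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_threshold, best_f1 = None, -1.0
    # Try each observed score as a candidate threshold, lowest first.
    for t in np.unique(scores):
        predicted = scores >= t
        tp = np.sum(predicted & labels)
        fp = np.sum(predicted & ~labels)
        fn = np.sum(~predicted & labels)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = float(t), float(f1)
    return best_threshold, best_f1

# Hypothetical "Speech" scores on representative clips with known labels.
speech_scores = [0.05, 0.12, 0.30, 0.41, 0.64, 0.72, 0.90]
speech_labels = [0, 0, 0, 1, 1, 1, 1]

threshold, f1 = pick_threshold(speech_scores, speech_labels)
print(f"Speech threshold: {threshold}, F1 on calibration clips: {f1}")
```

You would repeat this per class of interest, since each class may use a different score range, and ideally check the chosen thresholds on held-out clips from the same domain.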