AudioSet embeddings range is [-128, +127], vggish_postprocess.py range is [0, 255]

Robert Mcanany

Oct 16, 2021, 4:30:39 PM

to audioset-users

I'm probably doing something wrong, but I can't see it.

I extracted the embeddings from the TFRecords following this code example:

https://colab.research.google.com/drive/1BeORlWolTKw3noASvW94OXXqcQZ8PZEQ

Spot checking, I saw that the range of the embedding values was [-128, +127].

In vggish_postprocess.py, I think this code ensures the range is [0, 255]:

    # Quantize by:
    # - clipping to [min, max] range
    clipped_embeddings = np.clip(
        pca_applied, vggish_params.QUANTIZE_MIN_VAL,
        vggish_params.QUANTIZE_MAX_VAL)
    # - convert to 8-bit in range [0.0, 255.0]
    quantized_embeddings = (
        (clipped_embeddings - vggish_params.QUANTIZE_MIN_VAL) *
        (255.0 /
         (vggish_params.QUANTIZE_MAX_VAL - vggish_params.QUANTIZE_MIN_VAL)))
    # - cast 8-bit float to uint8
    quantized_embeddings = quantized_embeddings.astype(np.uint8)
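
To sanity-check that reading of the snippet, here is a minimal standalone sketch of the same clip-and-scale step. The values -2.0 and 2.0 for QUANTIZE_MIN_VAL and QUANTIZE_MAX_VAL are assumptions made for illustration (the real constants live in vggish_params.py), and `quantize` is a hypothetical helper name, not a function from the VGGish code.

```python
import numpy as np

# Assumed for illustration; check vggish_params.py for the real values.
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0


def quantize(pca_applied):
    """Hypothetical helper mirroring the clip/scale/cast steps quoted above."""
    # Clip to the [min, max] range.
    clipped = np.clip(pca_applied, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    # Map [min, max] linearly onto [0.0, 255.0].
    scaled = (clipped - QUANTIZE_MIN_VAL) * (
        255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL))
    # Cast to unsigned 8-bit; the fractional part is truncated.
    return scaled.astype(np.uint8)


# Inputs below, inside, and above the clipping range.
sample = np.array([-5.0, -2.0, 0.0, 2.0, 5.0])
quantized = quantize(sample)
print(quantized.dtype, quantized.min(), quantized.max())
```

Under those assumptions the output dtype is uint8 and the values span 0 to 255, so nothing in this step can produce a negative number.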

Robert Mcanany

Oct 21, 2021, 10:08:12 AM

to audioset-users

Never mind. It was me.

Here's the response from Manoj on the GitHub issue tracker (https://github.com/tensorflow/models/issues/10313):

If you are referring to the AudioSet embeddings documented at https://research.google.com/audioset/download.html, please note that, as per the example on that page, the embeddings are represented as tensorflow.BytesList which in turn is represented by a 'bytes' protocol message type which is a sequence of arbitrary byte values. These are not signed integers and you should interpret the values as unsigned uint8s in the range [0, 255].
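
The two ranges in the thread title can be reproduced from the same bytes. The sketch below is illustrative only: `raw` is a made-up byte string standing in for one embedding's BytesList value, not data read from an actual AudioSet TFRecord. Decoding it as int8 gives values in [-128, +127], while decoding it as uint8 gives the intended [0, 255].

```python
import numpy as np

# Made-up bytes standing in for one embedding's raw BytesList value.
raw = bytes([0, 1, 127, 128, 255])

# Decoding as signed 8-bit integers wraps values >= 128 into negatives.
as_signed = np.frombuffer(raw, dtype=np.int8)

# Decoding as unsigned 8-bit integers matches the quantizer's output range.
as_unsigned = np.frombuffer(raw, dtype=np.uint8)

print(as_signed.tolist())
print(as_unsigned.tolist())

# If the data was already loaded as int8, reinterpreting the same bytes
# as uint8 recovers the intended values without copying.
recovered = as_signed.view(np.uint8)
print(recovered.tolist())
```

If the extraction code decodes each embedding with a signed dtype, switching that dtype to `np.uint8` (or applying `.view(np.uint8)` afterwards) should be enough to get values that line up with vggish_postprocess.py.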
