AudioSet embeddings range is [-128, +127], vggish_postprocess.py range is [0, 255]

Robert Mcanany

Oct 16, 2021, 4:30:39 PM
to audioset-users
I'm probably doing something wrong, but I can't see it. 

I extracted the embeddings from the TFRecords following this code example:

Spot-checking, I saw that the range of the values was [-128, +127].

In vggish_postprocess.py, I think this code ensures the range is [0, 255]:

    # Quantize by:
    # - clipping to [min, max] range
    clipped_embeddings = np.clip(
        pca_applied, vggish_params.QUANTIZE_MIN_VAL,
        vggish_params.QUANTIZE_MAX_VAL)
    # - convert to 8-bit in range [0.0, 255.0]
    quantized_embeddings = (
        (clipped_embeddings - vggish_params.QUANTIZE_MIN_VAL) *
        (255.0 /
         (vggish_params.QUANTIZE_MAX_VAL - vggish_params.QUANTIZE_MIN_VAL)))
    # - cast 8-bit float to uint8
    quantized_embeddings = quantized_embeddings.astype(np.uint8)
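
For anyone who wants to check that range without the full VGGish pipeline, here is a minimal standalone sketch of the same clip-and-scale step. The constants -2.0 and 2.0 are stand-ins assumed for illustration (check vggish_params.py for the real values), and `quantize` is a hypothetical helper name, not a function in the repo.

```python
import numpy as np

# Assumed stand-ins for vggish_params.QUANTIZE_MIN_VAL / QUANTIZE_MAX_VAL.
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0


def quantize(pca_applied):
    """Hypothetical helper mirroring the clip-and-scale step quoted above."""
    # Clip to [min, max].
    clipped = np.clip(pca_applied, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    # Shift so the minimum maps to 0, then scale so the maximum maps to 255.
    scaled = (clipped - QUANTIZE_MIN_VAL) * (
        255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL))
    # Casting to uint8 truncates the fractional part.
    return scaled.astype(np.uint8)


# Values below, at, and above the clipping range.
sample = np.array([-5.0, -2.0, 0.0, 2.0, 5.0])
quantized = quantize(sample)
print(quantized.dtype, quantized.min(), quantized.max())
```

With these assumed constants, out-of-range inputs land on 0 or 255, so the output never leaves [0, 255].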

Robert Mcanany

Oct 21, 2021, 10:08:12 AM
to audioset-users
Never mind.  It was me. 

Here's the response from Manoj on the GitHub issue tracker (https://github.com/tensorflow/models/issues/10313):

If you are referring to the AudioSet embeddings documented at https://research.google.com/audioset/download.html, please note that, as per the example on that page, the embeddings are represented as tensorflow.BytesList which in turn is represented by a 'bytes' protocol message type which is a sequence of arbitrary byte values. These are not signed integers and you should interpret the values as unsigned uint8s in the range [0, 255].
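
To illustrate the point about signed versus unsigned interpretation, here is a small NumPy sketch. The byte string is made up for illustration and is not taken from an actual AudioSet TFRecord. The same bytes read as int8 give values in [-128, +127], while reading them as uint8 gives the intended [0, 255].

```python
import numpy as np

# Made-up raw bytes standing in for one embedding's bytes_list value.
raw = bytes([0, 127, 128, 255])

# Interpreting the bytes as signed 8-bit integers wraps values >= 128 to negatives.
as_signed = np.frombuffer(raw, dtype=np.int8)

# Interpreting the same bytes as unsigned 8-bit integers gives the intended values.
as_unsigned = np.frombuffer(raw, dtype=np.uint8)

# If the data has already been loaded as int8, reinterpret it without copying.
recovered = as_signed.view(np.uint8)

print(as_signed)
print(as_unsigned)
print(recovered)
```

If embeddings already sit in an int8 array, `.view(np.uint8)` reinterprets the same memory, so nothing needs to be re-extracted from the TFRecords.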