Dear tf-compression authors,
I notice a sizable difference between `model(x, training=True)` and `model(x, training=False)` in the main models, e.g. bls2017 and bmshj2018: the bpp is often 10-20% higher with `training=True` than with `training=False`, and I wonder why. I thought the additive-uniform-noise quantization used during training was supposed to give a bpp very close to actual (rounded) quantization, according to the ICLR 2017 paper, no?
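Just so we're talking about the same thing, here is a minimal sketch of the two code paths I mean, using the same entropy-model classes as bls2017.py (the prior is untrained and the latent is a random stand-in, so the absolute bit counts are meaningless; it's only to illustrate noisy vs. rounded quantization):

```python
import tensorflow as tf
import tensorflow_compression as tfc

# Untrained stand-ins, only meant to show the two code paths.
prior = tfc.NoisyDeepFactorized(batch_shape=(192,))
entropy_model = tfc.ContinuousBatchedEntropyModel(
    prior, coding_rank=3, compression=False)

# Random tensor standing in for the analysis transform output.
y = tf.random.normal((1, 16, 16, 192))

y_noisy, bits_noisy = entropy_model(y, training=True)      # y + U(-0.5, 0.5) noise
y_rounded, bits_rounded = entropy_model(y, training=False)  # hard rounding
print(bits_noisy.numpy(), bits_rounded.numpy())
```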
I first noticed this in the Keras training logs, where the training loss/bpp/mse were consistently quite different from the validation loss/bpp/mse:
e.g., training `bls2017.py` on CLIC with `--num_filters 192` and `--lambda 0.01`:
Epoch 101/200
10000/10000 [==============================] - 439s 44ms/step - loss: 0.9100 - bpp: 0.4769 - mse: 43.3064 - val_loss: 0.8713 - val_bpp: 0.3983 - val_mse: 47.2942
Epoch 102/200
10000/10000 [==============================] - 416s 42ms/step - loss: 0.9121 - bpp: 0.4776 - mse: 43.4441 - val_loss: 0.8689 - val_bpp: 0.3980 - val_mse: 47.0877
The train loss is consistently higher than the val_loss (0.91 vs. 0.87), the train bpp is roughly 20% higher than the val_bpp (0.477 vs. 0.398), and the mse differs as well. It stays this way all the way to convergence.
I first suspected a mismatch between the training and validation data, but when I evaluated the validation set itself with `training=True`, I got numbers similar to the training metrics, so the data doesn't seem to be the cause.
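For reference, that check looked roughly like this (a sketch: the dataset pipeline and batch count are placeholders, and it assumes `model(x, training)` returns a `(loss, bpp, mse)` tuple as in bls2017.py):

```python
import tensorflow as tf

def eval_both_modes(model, val_dataset, num_batches=100):
    """Run validation images through the model in both modes and average the metrics."""
    means = {mode: [tf.keras.metrics.Mean() for _ in range(3)]
             for mode in (True, False)}
    for images in val_dataset.take(num_batches):
        for mode in (True, False):
            # Assumes the model's call() returns (loss, bpp, mse) as in bls2017.py.
            loss, bpp, mse = model(images, training=mode)
            for metric, value in zip(means[mode], (loss, bpp, mse)):
                metric.update_state(value)
    for mode in (True, False):
        loss, bpp, mse = (m.result().numpy() for m in means[mode])
        print(f"training={mode}: loss={loss:.4f} bpp={bpp:.4f} mse={mse:.4f}")
```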
Thank you!