Thanks Johannes, this was very helpful.
Another question: when you estimate the parameters of the hyperprior,
how is their error backpropagated? For example, if you assign a
certain set of distribution parameters to a particular latent
variable, how do you know the estimate is correct? It looks like
your hyperprior params are used to seed the table of the AE/AD, but
none of the outputs of those modules (AE/AD) are used at training
time. You pass the latent code through by adding uniform noise, and
you use the hyperprior params along with the pdf of the latent code
to estimate the entropy/rate. What prevents the hyperprior network
from setting all the variances of the latents to zero and thereby
reducing the rate? As far as I can tell, the parameters of the
hyperprior network affect training only through that rate estimate.
In that case, I don't see why the network wouldn't "cheat" in the way
I described. I must be missing something obvious.
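
To make the question concrete, here is a minimal sketch of the rate
term as I understand it. It assumes the pdf of each latent is a
Gaussian (mean and scale predicted by the hyperprior) convolved with
unit-width uniform noise. The names gaussian_cdf, rate_bits and floor
are my own placeholders, not the tensorflow-compression API.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def rate_bits(y_noisy, mean, scale, floor=1e-9):
    """Estimated bits for one noisy latent value.

    Assumption: the likelihood is the Gaussian probability mass on the
    unit-width interval centred on y_noisy. `floor` is a hypothetical
    lower bound that keeps the logarithm finite.
    """
    upper = gaussian_cdf(y_noisy + 0.5, mean, scale)
    lower = gaussian_cdf(y_noisy - 0.5, mean, scale)
    likelihood = max(upper - lower, floor)
    return -math.log2(likelihood)


# Sweep the predicted scale for a latent on the mean and one away from it.
for scale in (0.01, 0.1, 1.0, 2.0):
    on_mean = rate_bits(0.0, mean=0.0, scale=scale)
    off_mean = rate_bits(2.0, mean=0.0, scale=scale)
    print(f"scale={scale:5.2f}  on_mean={on_mean:7.3f}  off_mean={off_mean:7.3f}")
```

The sweep prints the estimated rate for one latent that sits on the
predicted mean and one that sits two units away, which is the
quantity I would expect the hyperprior gradients to come from.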
Isaac
On Wed, Feb 3, 2021 at 9:58 PM 'Johannes Ballé' via tensorflow-compression
<tensorflow-...@googlegroups.com> wrote: