Question about the necessity of hyperprior

Isaac Gerg

Feb 3, 2021, 2:28:49 PM
to tensorflow-...@googlegroups.com
Hi,

I see in "Variational image compression with a scale hyperprior" that
the hyperprior is used to take out the spatially varying entropy
(variance) of the sample image in Figure 2 (left, middle). Why not just
normalize the variances of the features at each spatial location?
My thought is that this should eliminate the need for the hyperprior,
and the encoding optimization could be handled by the network itself.
From reading your papers (they are very good), I would assume you have
already tried this or decided against it. So, what am I missing? :)

Thank you,
Isaac

Isaac Gerg

Feb 3, 2021, 4:34:04 PM
to tensorflow-compression
I'm going to take a stab at answering my own question. I suppose what I propose is easily shown to be suboptimal. For example, one region of an image may predict with near certainty that the rest of it is blank. In that case, you don't want to encode the blank area using the same pdf as the other areas. This means you must have quite a few of these tables for the hyperprior if you model every latent variable (16x16x128, for example, if your input image is 256x256). Is this correct?

Isaac
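
The count in the question works out as follows, assuming four stride-2 downsampling stages (256 / 2^4 = 16) and 128 channels: 16 x 16 x 128 = 32,768 latents. One table per latent is not strictly required, though. To my understanding, the scale-hyperprior code in tensorflow-compression conditions its entropy model on a fixed table of scales and selects one entry per latent. The sketch below only illustrates that idea; all helper names are hypothetical, and the table bounds (0.11 to 256) and size (64 levels) are illustrative assumptions, not values taken from this thread.

```python
import bisect
import math


def latent_shape(height, width, num_downsamplings=4, channels=128):
    """Latent tensor shape after `num_downsamplings` stride-2 stages (assumed)."""
    factor = 2 ** num_downsamplings
    return height // factor, width // factor, channels


def log_spaced_scales(smallest, largest, levels):
    """Hypothetical scale table: `levels` values evenly spaced on a log axis."""
    step = (math.log(largest) - math.log(smallest)) / (levels - 1)
    return [math.exp(math.log(smallest) + i * step) for i in range(levels)]


def table_index(scale, table):
    """Index of the smallest table entry >= `scale`, clamped to the last entry.

    Rounding up is one possible choice; it avoids picking a model narrower
    than the predicted scale, except beyond the end of the table.
    """
    return min(bisect.bisect_left(table, scale), len(table) - 1)


if __name__ == "__main__":
    h, w, c = latent_shape(256, 256)
    table = log_spaced_scales(0.11, 256.0, 64)  # illustrative bounds and size
    print(f"{h}x{w}x{c} = {h * w * c} latents, {len(table)} CDF tables")
    print("predicted scale 1.0 uses table entry", table_index(1.0, table))
```

In this sketch every latent still gets its own predicted scale, but the coder only needs one cumulative table per table entry, so the table count no longer grows with the image size.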

Johannes Ballé

Feb 3, 2021, 9:58:01 PM
to tensorflow-compression
Hi Isaac,

the long answer depends on what you mean exactly by "normalizing the variances of the features". For the purposes of a short answer, I'm going to assume you mean to rescale the features such that their variance is one everywhere. The problem with that is that if you normalize the variance but keep the quantization step size the same, you will encode low-variance regions with much higher precision than necessary. In addition, if the encoder normalizes all the features to unit variance, how would the decoder undo the normalization without receiving additional information from the encoder? It would not have access to the original variances unless the encoder transmits them.

The hyperprior architecture represents exactly that: information about the variances that the encoder shares with the decoder.

Johannes.
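
A small numerical sketch of the first point in Johannes' reply, under assumptions that are not from the thread: zero-mean Gaussian latents, quantization by rounding to integers (step size 1), and a discretized Gaussian entropy model. The helper `expected_bits` (a hypothetical name, like the other helpers) returns the expected code length per latent when data with one scale is coded with a model of another scale; `floor` is an arbitrary lower bound that keeps the logarithm finite.

```python
import math

# Hypothetical helpers for illustration; not part of tensorflow-compression.


def gaussian_cdf(x, scale):
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def bin_prob(k, scale):
    """Mass a discretized N(0, scale^2) assigns to the unit bin around integer k."""
    return gaussian_cdf(k + 0.5, scale) - gaussian_cdf(k - 0.5, scale)


def expected_bits(true_scale, model_scale, support=60, floor=1e-9):
    """Expected bits per latent when N(0, true_scale^2) latents are rounded to
    integers and coded with a discretized N(0, model_scale^2) entropy model.

    `floor` (an assumed value) keeps log2 finite when the model assigns
    numerically zero mass to a bin.
    """
    bits = 0.0
    for k in range(-support, support + 1):
        p = bin_prob(k, true_scale)
        if p > 0.0:
            q = max(bin_prob(k, model_scale), floor)
            bits -= p * math.log2(q)
    return bits


if __name__ == "__main__":
    for sigma in (0.1, 1.0, 4.0):
        # Scale known to both encoder and decoder (what a hyperprior provides).
        matched = expected_bits(sigma, sigma)
        # One fixed unit-scale model for every latent.
        fixed = expected_bits(sigma, 1.0)
        # Normalize y / sigma before rounding: always costs as much as a unit
        # Gaussian, and the decoder would still need sigma to undo the scaling.
        normalized = expected_bits(1.0, 1.0)
        print(f"sigma={sigma}: matched {matched:.2f}, "
              f"fixed {fixed:.2f}, normalized {normalized:.2f} bits")
```

With the scale shared between both sides, low-variance latents cost almost nothing. Normalizing every latent to unit variance costs roughly 2.1 bits each regardless of its original scale, and a single fixed unit-scale model pays heavily for latents whose true scale differs from 1.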

Isaac Gerg

Feb 3, 2021, 11:40:21 PM
to tensorflow-...@googlegroups.com
Thanks Johannes, this was very helpful.

Another question: when you estimate the parameters of the hyperprior,
how is their error backpropagated? For example, if you assign a
certain set of distribution parameters to a particular latent
variable, how do you know the estimate is correct? It looks like
your hyperprior parameters are used to seed the tables of the AE/AD
(arithmetic encoder/decoder). But none of the outputs of those modules
are used at training time. You pass the latent code through by adding
uniform noise, and you use the hyperprior parameters along with the pdf
of the latent code to estimate the entropy/rate. What prevents the
hyperprior network from setting all the variances of the latents to
zero and thereby reducing the rate? I am uncertain how the parameters
of the hyperprior network affect training other than in the way I just
described. In that case, I don't see why the network wouldn't "cheat"
as I described. I must be missing something obvious.

Isaac

Isaac Gerg

Feb 5, 2021, 6:45:59 PM
to tensorflow-compression
Hi Johannes,

Does anything prevent the hyperprior from outputting mu=0, sigma=1 everywhere, letting the latents simply take on their own mu and sigma and thereby bypassing the hyperprior model?

Isaac

Johannes Ballé

Feb 5, 2021, 8:24:31 PM
to tensorflow-compression
Hi Isaac,

the model will not cheat, because it still needs to balance the rate term(s) against the distortion term. As in the 2017 model, the rate term fits the entropy model to the latent space activations in a maximum likelihood sense. The latents, in turn, cannot all become zero because of the distortion term (unless the rate term has a much higher weight).

Hope this helps!
Johannes
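
A minimal sketch of the maximum likelihood point in the reply above. Assumptions not taken from the thread: one zero-mean Gaussian scale shared by all latents (rather than one predicted per latent by the hyperprior network), a grid search standing in for gradient descent, and a hypothetical likelihood floor of 1e-9. The rate term is the average of -log2 p(y~), where p(y~) = Phi((y~ + 1/2) / sigma) - Phi((y~ - 1/2) / sigma) is the Gaussian convolved with the U(-1/2, 1/2) noise used during training. All helper names are hypothetical.

```python
import math
import random


def gaussian_cdf(x, scale):
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def noisy_likelihood(y_tilde, scale, floor=1e-9):
    """Density of y_tilde under N(0, scale^2) convolved with U(-1/2, 1/2).

    `floor` is a hypothetical lower bound that keeps log2 finite.
    """
    p = gaussian_cdf(y_tilde + 0.5, scale) - gaussian_cdf(y_tilde - 0.5, scale)
    return max(p, floor)


def rate_bits(samples, scale):
    """Rate term: average -log2 likelihood of the noisy latents, in bits."""
    return -sum(math.log2(noisy_likelihood(y, scale)) for y in samples) / len(samples)


def simulate_noisy_latents(true_scale, n, seed=0):
    """Gaussian stand-in latents plus U(-1/2, 1/2) noise (training-time proxy for rounding)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, true_scale) + rng.uniform(-0.5, 0.5) for _ in range(n)]


def best_scale(samples, grid):
    """Grid-search stand-in for gradient descent on the rate term."""
    return min(grid, key=lambda s: rate_bits(samples, s))


if __name__ == "__main__":
    samples = simulate_noisy_latents(true_scale=3.0, n=5000)
    grid = [0.1 * i for i in range(1, 81)]  # candidate scales 0.1 ... 8.0
    fitted = best_scale(samples, grid)
    print(f"fitted scale {fitted:.1f}")
    for s in (0.1, 1.0, fitted, 8.0):
        print(f"scale {s:.1f}: {rate_bits(samples, s):.2f} bits per latent")
```

For fixed latents, shrinking sigma toward zero makes the rate explode rather than vanish, and the minimum sits near the actual spread of the latents. The only way left to lower the rate is to shrink the latents themselves, which the distortion term penalizes.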

Isaac Gerg

Feb 11, 2021, 10:15:19 AM
to tensorflow-compression
Thanks Johannes, it all clicks now. I was able to get everything working from scratch, and I believe I understand all the parts of your code now. This is a fascinating problem, and I think your approach is very interesting and fun to study. -Isaac

Johannes Ballé

Feb 18, 2021, 3:22:35 PM
to tensorflow-compression