Hi there,
I'm curious whether you have any insight into (or experience with) how a (hyper)decoder network should be parameterized to compute the indexes for an indexed entropy model.
From reading the Integer Networks paper, it seems that the decoder should ideally output integers in {0, 1, ..., L-1} as indexes for the indexed entropy model. But in the current (non-integer-network) implementation, is there a "best practice" for doing this?
For example, in bmshj2018.py, the scale indexes are computed directly by the hyper decoder and used to condition the entropy model:
```python
indexes = self.hyper_synthesis_transform(z_hat)
y_hat, bits = entropy_model(y, indexes, training=training)
```
And from reading the code for ContinuousIndexedEntropyModel, it seems like the entropy model simply clips `indexes` to [0, 63] (here L=64) and uses the continuous values when evaluating the rate loss and backpropagating; at test time, `indexes` are simply rounded to integers. So the hyper_synthesis_transform isn't really trained to output integer values in {0, ..., 63}, as it must at test time.
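To restate my reading of the current behavior as a sketch (this is my paraphrase, not the actual library code; the function name, signature, and scalar inputs are made up for illustration):

```python
def normalize_indexes(x, num_scales=64, training=True):
    """Sketch of my understanding of how indexes are handled today.

    During training: clip to the valid range [0, num_scales - 1] and keep
    the continuous value, so gradients can flow to the hyper decoder.
    At test time: additionally round to the nearest integer index.
    """
    x = min(max(x, 0.0), num_scales - 1.0)  # clip to [0, 63] for L = 64
    if not training:
        x = float(round(x))  # integer index only at test time
    return x
```

So during training nothing pushes the raw decoder outputs toward integers, or even into [0, 63]; the clipping just masks out-of-range values.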
Wouldn't it be a good idea to pass `indexes` through a properly scaled sigmoid in order to constrain its values to [0, 63]? And then perhaps even round to integers and use something like a straight-through gradient estimator during training?
Thanks,
Yibo