Hi Xinyu, Tomas:
My apologies for the late reply; unfortunately I wasn't able to respond to your emails earlier and am just processing them now.
With the context model, there are some subtleties regarding how exactly to condition on the previously decoded values. David Minnen (CC'ed) knows more about the details here, since he implemented this code.
It sounds like the problem may be that you should condition on the quantized values rather than the "noisy" values during training, since the decoder can only "see" the quantized values. The encoder and decoder need to use exactly the same values for the conditioning; otherwise there may be a performance drop due to the mismatch between them.
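Just to illustrate the idea (this is only a rough sketch of one way to do it, not David's actual implementation, and the tensor/function names are made up):

    import tensorflow as tf

    def quantize_for_conditioning(y):
        # Round to integers, but let gradients pass through unchanged
        # (straight-through estimator), so the context model sees the
        # same quantized values during training as at test time.
        return y + tf.stop_gradient(tf.round(y) - y)

    # y: latents from the analysis transform (hypothetical name)
    y_hat = quantize_for_conditioning(y)

    # Condition the context model on the quantized latents...
    context = context_model(y_hat)  # hypothetical module

    # ...while the rate term can still use the "noisy" relaxation:
    y_noisy = y + tf.random.uniform(tf.shape(y), -0.5, 0.5)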
Regarding the other question, SignalConv is essentially the same as Conv, but it has more options for boundary handling, which can be quite significant in the case of image compression models (or fully convolutional autoencoders).
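For reference, in tensorflow_compression it can be used much like a regular convolution layer; the filter count and kernel size below are just illustrative:

    import tensorflow_compression as tfc

    # SignalConv2D behaves like Conv2D, but exposes explicit control
    # over boundary handling (padding mode) and over up-/downsampling
    # via strides_down / strides_up.
    layer = tfc.SignalConv2D(
        filters=192, kernel_support=(5, 5), corr=True, strides_down=2,
        padding="same_zeros", use_bias=True, activation=tfc.GDN())

    # x: image batch of shape (batch, height, width, channels)
    # y = layer(x)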
We didn't use any regularization or learning rate tricks other than those described in the paper.
I hope this helps. Please let me or David know if you have any further questions!
Johannes