1) The additional gains from three days of training, compared to one day, amount to 1.5% to 3%. The problem is that these additional gains are exactly what determines whether a new method works or not, so without the longer training runs I am not sure how to evaluate the efficacy of any newly proposed method. Also, could you elaborate on the input pipeline? (A rough sketch of the kind of pipeline I have in mind follows below.)
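To make the question concrete, this is roughly the kind of tf.data input pipeline I am assuming on my side; the file pattern, patch size, and batch size are placeholders rather than the actual configuration.

```python
import tensorflow as tf

def make_dataset(file_pattern="/path/to/train/*.png", patch_size=256, batch_size=8):
    """Sketch of a typical image-patch input pipeline (all values are placeholders)."""
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)

    def load_and_crop(path):
        # Decode, take a random crop, and scale to [0, 1].
        image = tf.io.decode_image(tf.io.read_file(path), channels=3,
                                   expand_animations=False)
        image = tf.image.random_crop(image, [patch_size, patch_size, 3])
        return tf.cast(image, tf.float32) / 255.0

    return (files
            .map(load_and_crop, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(1024)
            .batch(batch_size, drop_remainder=True)
            .prefetch(tf.data.AUTOTUNE))
```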
2) In nearly every model I've trained, I've observed that the BPP converges at the same rate as, or faster than, the PSNR.
Could the autoencoder first be trained without the entropy component, with the entropy coding introduced afterwards? Would that be effective or helpful in reducing the training time? (I sketched what I have in mind below.)
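For clarity, this is the kind of two-stage schedule I am imagining; the `encoder`, `decoder`, and `entropy_model` objects and their interfaces are assumptions standing in for the actual components, not the library's API.

```python
import tensorflow as tf

def train_step(x, encoder, decoder, entropy_model, optimizer, lmbda, use_rate_loss):
    """One step of a hypothetical two-stage schedule (all components are assumed)."""
    with tf.GradientTape() as tape:
        y = encoder(x, training=True)
        if use_rate_loss:
            # Stage 2: pass the latents through the entropy model and penalize the rate.
            y_hat, bits = entropy_model(y, training=True)  # assumed (latents, bits) interface
        else:
            # Stage 1: bypass the entropy component entirely; reconstruction-only training.
            y_hat, bits = y, tf.constant(0.0)
        x_hat = decoder(y_hat, training=True)
        distortion = tf.reduce_mean(tf.square(x - x_hat))  # MSE distortion
        loss = lmbda * distortion + tf.reduce_mean(bits)   # rate-distortion objective
    variables = encoder.trainable_variables + decoder.trainable_variables
    if use_rate_loss:
        variables += entropy_model.trainable_variables
    gradients = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(gradients, variables))
    return loss
```

The idea would be to run stage 1 (`use_rate_loss=False`) until the reconstruction quality is close to converged, then switch the flag on and continue from the same weights with the rate term included.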
3) I have also attempted training on multiple GPUs, but the training time did not improve. I have tried several different GPU machines as well. (My setup looks roughly like the sketch below.)
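For reference, the multi-GPU setup I am describing looks roughly like the following sketch, assuming tf.distribute.MirroredStrategy; the tiny placeholder autoencoder and the hyperparameters are illustrative, not the actual model.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Tiny placeholder autoencoder so the sketch is self-contained.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2DTranspose(3, 5, strides=2, padding="same"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")

# The global batch is split across replicas, so it has to be scaled with the
# number of GPUs for the per-device workload (and any speedup) to be meaningful.
global_batch_size = 8 * strategy.num_replicas_in_sync
images = tf.random.uniform([64, 256, 256, 3])   # synthetic stand-in data
dataset = (tf.data.Dataset.from_tensor_slices(images)
           .map(lambda x: (x, x))                # autoencoder targets = inputs
           .batch(global_batch_size))
model.fit(dataset, epochs=1)
```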