Implementation details of the paper Scale-Space Flow for Video Compression


Khanh Quoc Dinh

Dec 15, 2020, 8:02:54 PM
to tensorflow-compression
Hello, 

Thanks for the paper Scale-Space Flow for End-to-End Optimized Video Compression. 

I am implementing this paper and have run into some implementation details; could you elaborate on them?

A. About scale-space flow:
    - 1) Is performance sensitive to \sigma_0? 2) What would be a good value for \sigma_0? In my setting, \sigma_0 in the range 0.1 to 0.4 seems to work fine.
    - 3) Is the reparameterization important? 4) The equation in the paper (page 8506, reparameterization paragraph) might not be correct; should it be z = i + (\sigma_a^2 - \sigma^2) / (\sigma_b^2 - \sigma_a^2), assuming \sigma_a is at index i and \sigma_b at index i+1? 5) How is i computed here, and do we need stop_gradient for i? From my experiments, reparameterization does not help (similar performance), so I am not sure whether I implemented it correctly.

B. About quantization (Section 3.3)
    - 6) Do you use both the [10] and [32] quantization methods? There are three networks (intra, inter, and flow), each with two bottlenecks (latent and hyperprior), so six places in total. 7) If so, how do you assign which quantization method to which network and bottleneck? In my experiments, applying [10] everywhere works better than applying [32] everywhere.

C. About network architecture
   - 8) Table 2 in your supplementary document shows QReLU applied in the hyper decoder. Does QReLU affect performance (i.e., do ReLU and QReLU differ in performance)?

Thank you :)

Best regards,
Khanh

Eirikur Agustsson

Dec 16, 2020, 5:39:43 AM
to tensorflow-compression
Hello and thanks for your interest in the paper,

Please find replies inline below, I hope this helps!

best regards,
Eirikur

On Wednesday, December 16, 2020 at 1:02:54 AM UTC khanhq...@gmail.com wrote:
Hello, 

Thanks for the paper Scale-Space Flow for End-to-End Optimized Video Compression. 

I am implementing this paper and have run into some implementation details; could you elaborate on them?

A. About scale-space flow:
    - 1) Is performance sensitive to \sigma_0? 2) What would be a good value for \sigma_0? In my setting, \sigma_0 in the range 0.1 to 0.4 seems to work fine.
 
We used \sigma_0 = 1.5, but did not observe a significant difference for other values (1.0, 2.0) in our initial experiments.
We originally chose a value large enough to avoid aliasing when subsampling the pyramid, but we ended up not subsampling, as it was slower on GPU (even though it requires fewer FLOPs); see the "Complexity" paragraph in the paper.
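For concreteness, the scale-space volume can be sketched as a stack of progressively blurred copies of the input. The sketch below is illustrative only (1-D, pure Python, and it assumes a geometric sigma schedule \sigma_0 * 2^i, which is one common choice — the paper's exact schedule may differ):

```python
import math

def gaussian_kernel(sigma):
    """Discrete Gaussian kernel, normalized to sum to 1."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(signal, sigma):
    """Convolve a 1-D signal with a Gaussian kernel (edge-replicated borders)."""
    if sigma == 0.0:
        return list(signal)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # replicate at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def scale_space_volume(signal, sigma0=1.5, num_scales=5):
    """Stack of progressively blurred copies; scale 0 is the unblurred input.
    Assumes sigmas grow as sigma0 * 2**i (an illustrative choice)."""
    sigmas = [0.0] + [sigma0 * 2.0 ** i for i in range(num_scales - 1)]
    return sigmas, [blur_1d(signal, s) for s in sigmas]
```

In the paper the same idea is applied to full 2-D frames (and, as noted above, without subsampling the pyramid).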


    - 3) Is the reparameterization important? 4) The equation in the paper (page 8506, reparameterization paragraph) might not be correct; should it be z = i + (\sigma_a^2 - \sigma^2) / (\sigma_b^2 - \sigma_a^2), assuming \sigma_a is at index i and \sigma_b at index i+1? 5) How is i computed here, and do we need stop_gradient for i? From my experiments, reparameterization does not help (similar performance), so I am not sure whether I implemented it correctly.

Apologies, there is a typo in the formula. It should be:
`z = i + (\sigma^2 - \sigma_a^2) / (\sigma_b^2 - \sigma_a^2)`
so that it produces i for `\sigma = \sigma_a` and i+1 for `\sigma = \sigma_b`.

We did not ablate the reparameterization, so perhaps it is not needed.
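With the corrected formula, mapping a blur level \sigma to a fractional pyramid index z can be sketched as below. The names are illustrative; this assumes the per-level sigmas are known and sorted ascending, and in a differentiable implementation one would typically stop gradients through the integer bracket index i (the paper does not spell this out):

```python
import bisect

def sigma_to_index(sigma, sigmas):
    """Map a blur level sigma to a fractional pyramid index z so that
    z == i when sigma == sigmas[i] and z == i + 1 when sigma == sigmas[i + 1],
    using the corrected reparameterization:
        z = i + (sigma^2 - sigma_a^2) / (sigma_b^2 - sigma_a^2)
    `sigmas` must be sorted in ascending order."""
    i = bisect.bisect_right(sigmas, sigma) - 1
    i = min(max(i, 0), len(sigmas) - 2)  # clamp into a valid bracket [i, i+1]
    sa, sb = sigmas[i], sigmas[i + 1]
    return i + (sigma ** 2 - sa ** 2) / (sb ** 2 - sa ** 2)
```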
 

B. About quantization (Section 3.3)
    - 6) Do you use both the [10] and [32] quantization methods? There are three networks (intra, inter, and flow), each with two bottlenecks (latent and hyperprior), so six places in total. 7) If so, how do you assign which quantization method to which network and bottleneck? In my experiments, applying [10] everywhere works better than applying [32] everywhere.

We combine them in the same way for each of the three networks: quantization is used to produce the reconstruction, but uniform noise is used for entropy modeling. You can find a discussion of what works best (noise vs. quantization) in Sec. 4 of "Channel-wise Autoregressive Entropy Models for Learned Image Compression" (Minnen & Singh, ICIP 2020).
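That split can be sketched roughly as below (the helper names are illustrative, not from the released code):

```python
import random

def quantize_for_decoder(y):
    """Hard rounding: this is what the reconstruction (decoder) path sees."""
    return [round(v) for v in y]

def noisy_for_entropy_model(y, rng=random):
    """Additive uniform noise in [-0.5, 0.5]: a differentiable proxy for
    rounding, used only when evaluating the entropy model (rate term)."""
    return [v + rng.uniform(-0.5, 0.5) for v in y]
```

During training, the rate term is computed on the noisy latents while the distortion term is computed on the (straight-through) rounded latents.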


C. About network architecture
   - 8) Table 2 in your supplementary document shows QReLU applied in the hyper decoder. Does QReLU affect performance (i.e., do ReLU and QReLU differ in performance)?
 
In the paper that proposes QReLU, it is found to require more channels to obtain results comparable to ReLU (see Fig. 3 there). We used it to ensure deterministic range coding and did not ablate its performance impact.

diqk...@gmail.com

Dec 16, 2020, 7:24:18 PM
to tensorflow-compression
Hi Eirikur, 

Thanks a lot for your quick response. I will try training with the settings you mentioned and hope to see results soon.

Bests,
Khanh

diqk...@gmail.com

Jan 13, 2021, 3:28:09 AM
to tensorflow-compression
Hi  Eirikur,

Wishing you a happy New Year with many achievements!

Could you clarify another question about the architecture?
    Q1. In Fig. 2 of the paper, there is a connection from [w_i] to [Residual Decoder]. What exactly does this connection mean? Are the decoded [w_i] and decoded [v_i] concatenated before being fed to the [Residual Decoder]?
    Q2. Do you use the same number of channels for all rate-distortion points in the paper's figures (i.e., 3 -> 128 -> 128 -> 128 -> 192 -> 128 -> 128 -> 128 -> 3)? Published papers suggest that more channels are better at high bitrates.

Thank you. 

Best regards,



Wufei Ma

Jun 28, 2021, 9:55:40 PM
to tensorflow-compression
Hi Khanh,

I'm wondering if you managed to reproduce the results claimed in the paper. Also, did you train the model on Vimeo-90K?

I'm curious about the exact benefit of using scale-space flow over bilinear warping. It would really help me a lot if you could share some of your results.

Thanks very much.

Best,
Wufei

diqk...@gmail.com

Jun 29, 2021, 7:23:35 PM
to tensorflow-compression
Hello Wufei, 

I have compared both bilinear warping and scale-space warping; scale-space warping consistently improves coding performance over bilinear (spatial) warping.

Basically, the improvement is about 0.3 dB to 0.5 dB PSNR at the same bitrate for all rate-distortion points I tested (< 0.5 bpp, though I expect similar gains at other bitrates).
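For readers comparing the two: scale-space warping is essentially bilinear warping extended with one extra interpolation axis over the blur stack, so the flow field can also "point into" a blurrier version of the reference. A minimal 1-D sketch (linear in position and in scale index; a 2-D image version would add one more spatial axis, giving trilinear interpolation):

```python
def scale_space_sample_1d(volume, x, z):
    """Sample a 1-D scale-space volume at fractional position x and
    fractional scale index z. `volume[i][j]` is the signal blurred at
    scale i, position j. Requires at least 2 scales and 2 positions."""
    num_scales, n = len(volume), len(volume[0])
    x = min(max(x, 0.0), n - 1.0)            # clamp to the valid range
    z = min(max(z, 0.0), num_scales - 1.0)
    x0 = min(int(x), n - 2)                   # lower corner of the cell
    z0 = min(int(z), num_scales - 2)
    fx, fz = x - x0, z - z0                   # fractional offsets

    def lerp(a, b, t):
        return a + t * (b - a)

    sharp = lerp(volume[z0][x0], volume[z0][x0 + 1], fx)
    blurry = lerp(volume[z0 + 1][x0], volume[z0 + 1][x0 + 1], fx)
    return lerp(sharp, blurry, fz)
```

With z fixed at 0 this reduces exactly to ordinary (bi)linear warping, which is why the scale axis is a strict generalization.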

Hope this helps.

Bests,
Khanh

diqk...@gmail.com

Jun 29, 2021, 7:36:07 PM
to tensorflow-compression
Ah, sorry, I missed the first question. I tried to crawl the data from YouTube as the authors did, but the dataset is really big (about 7 TB), and the video quality is low even though I chose the highest bitrate.

If you follow the JVET meetings, there are two datasets proposed for video coding with neural networks, BVI-DVC and Tencent, with BVI-DVC being the dominant one. Accordingly, I trained the network on BVI-DVC and managed to reproduce the results from the paper, though not exactly as claimed.

Bests




Wufei Ma

Jun 29, 2021, 9:47:23 PM
to tensorflow-compression
Hi Khanh,

Thanks for sharing the results! The results sound quite promising. Would you like to share the implementation of the scale-space flow with me? I'd like to train it on different datasets and compare it with other baselines. It's totally ok if you prefer not to, just like the original authors. I'm just asking in case you wouldn't mind.

Thanks again for the information.

Best,
Wufei

diqk...@gmail.com

Jul 8, 2021, 5:37:53 PM
to tensorflow-compression
Hello Wufei, 

Thanks for your interest in my implementation. I am sorry that it is not possible to publish the source code right now, but I will let you know as soon as there is a plan.

Bests,
Khanh

Wufei Ma

Jul 13, 2021, 9:38:06 PM
to tensorflow-compression
Hi Khanh,

No problem, I totally understand. Thanks a lot for the results you provided. Those are very helpful!

Best,
Wufei
