Submission got lower PSNR


Pei You

Feb 22, 2021, 8:44:53 AM
to Workshop and Challenge on Learned Image Compression (CLIC)
Dear Organizer,

We have submitted a decoder but got a much lower PSNR than expected (7 dB vs. 28 dB). Is there any way we can check our decoded images?
We have already tested locally with the same Docker environment and got the correct result, so I want to check whether there was a decoding failure or some other problem.

Thank you very much for your help!

Lucas Theis

Feb 22, 2021, 1:45:07 PM
to Pei You, Workshop and Challenge on Learned Image Compression (CLIC)
It is currently not possible to download any data from the server. I will send you some example reconstructions.

Also make sure you are calculating the PSNR for the average MSE over all images and not the average PSNR.
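
Roughly, the intended calculation looks like this (a plain-NumPy sketch, not the exact evaluation code; it assumes 8-bit images loaded as arrays):

    import numpy as np

    def psnr_from_average_mse(originals, reconstructions):
        # Average the per-image MSE first, then convert to PSNR once.
        mses = []
        for orig, recon in zip(originals, reconstructions):
            diff = orig.astype(np.float64) - recon.astype(np.float64)
            mses.append(np.mean(diff ** 2))
        return 10.0 * np.log10(255.0 ** 2 / np.mean(mses))  # not the mean of per-image PSNRs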


Pei You

Feb 22, 2021, 7:36:23 PM
to Workshop and Challenge on Learned Image Compression (CLIC)
Dear Lucas,
Thank you. I'll check the reconstructed images once you send them to me.

Yes, we calculate the PSNR from the average MSE, and we also used the metric-calculation scripts you provided in a reply to other participants. The results are the same.

Erfan Noury

Feb 22, 2021, 9:54:38 PM
to Workshop and Challenge on Learned Image Compression (CLIC)
I think one issue might be non-deterministic cuDNN computations (floating-point operations) if you run your model on a GPU. Even if you encode your images on a GPU with the same Docker image, this non-determinism can introduce small differences in computation during decoding that break the decoding process.
You can enforce cuDNN determinism in TensorFlow with the TF_CUDNN_DETERMINISTIC=1 environment variable, and I think PyTorch has similar mechanisms for enforcing determinism in cuDNN.
This is just a guess; I'm not sure your issue is related to this problem.
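
For example (a rough sketch; flag and API names may need adjusting for your framework version):

    import os
    # TensorFlow: ask cuDNN for deterministic kernels; set before TensorFlow initializes.
    os.environ["TF_CUDNN_DETERMINISTIC"] = "1"

    # PyTorch: the rough equivalents.
    import torch
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False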

Pei You

Feb 22, 2021, 10:29:16 PM
to Workshop and Challenge on Learned Image Compression (CLIC)
Actually, we have encountered non-determinism issues when running our model on a GPU before.
So this time we encoded on a CPU and submitted the result to the same CPU Docker environment.
I'm not sure whether it is still a non-determinism issue; maybe the reconstructed images can give me some hints.

Thank you very much for sharing the solution. We will try your methods and submit to the GPU Docker to see the results.



Pei You

Feb 26, 2021, 3:35:46 AM
to Workshop and Challenge on Learned Image Compression (CLIC)
Dear Erfan,

We tested your methods, but the issue still occurred, so it seems unrelated to non-determinism.

Over the past two days we have tested on multiple servers and found that the results of the convolution computations can differ, perhaps due to differences in floating-point precision.
This seems to cause the problem:
    Encoding and decoding on the same server: no problem.
    Encoding on one server and decoding on another: the problem may occur, since the encoded bin files differ slightly.

Do you have any comments on this issue?
Thank you.
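
P.S. A quick way to confirm the mismatch is to hash the bitstream each server produces for the same image (the paths below are just placeholders):

    import hashlib, pathlib

    def digest(path):
        # SHA-256 of the encoded file; any single differing bit changes the digest.
        return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

    print(digest("server_a/0001.bin") == digest("server_b/0001.bin"))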

Erfan Noury

Feb 26, 2021, 4:09:31 AM
to Pei You, Workshop and Challenge on Learned Image Compression (CLIC)
If you are seeing different results on different servers, then it is most probably an issue with floating-point computations, and your model is too sensitive to these differences.
One thing you can do is pin your network to the CPU and try again; I think floating-point operations on the CPU are more reproducible.
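
For example, hiding the GPUs before the framework initializes forces everything onto the CPU (a sketch; the decoder call at the end is just a placeholder):

    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # hide all GPUs from TF/PyTorch

    import tensorflow as tf
    tf.config.set_visible_devices([], "GPU")    # also disable GPUs explicitly in TF

    with tf.device("/CPU:0"):
        reconstruction = decoder(latents)       # placeholder for your decoder call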

Best regards,
Erfan 


Pei You

Feb 26, 2021, 8:39:38 AM
to Workshop and Challenge on Learned Image Compression (CLIC)
Dear Erfan,

Actually, we had already pinned our network to the CPU; the inference process runs entirely on CPU, so it is a little strange.
We tested our model on three different servers: two of them decode each other's output correctly, while the third cannot.
The hardware specs of the three servers are completely different. The one with the i9-9700 and the one with the Xeon Silver 4210 are consistent with each other, but the one with the Xeon E5-2690 v2 is not.
We are trying to figure it out; if you have any further comments, please share them. Thank you very much.

Pei You

Feb 26, 2021, 8:44:54 AM
to Workshop and Challenge on Learned Image Compression (CLIC)
Dear Lucas,

Have you encountered this issue in past challenges?
If you have any suggestions, please share them; thank you so much.
By the way, could you tell me the exact CPU model of the host server?

Lucas Theis

Feb 26, 2021, 12:08:01 PM
to Pei You, Workshop and Challenge on Learned Image Compression (CLIC)
The type of CPU is not something we control and so we cannot promise, for example, that the same CPU will be available at test time.

There have certainly been submissions in the past that had problems with instability. Perhaps if you tell us a bit about what your codec is doing, someone can point out tricks to make your codec more robust.

George Toderici

Feb 26, 2021, 1:35:06 PM
to Lucas Theis, Pei You, Workshop and Challenge on Learned Image Compression (CLIC)
If you're having issues running on different CPUs, then there's a very deep problem that needs to be resolved. I would suggest looking into quantized computation for the critical portions of your codec; in particular, don't rely on floating point for the parts that make or break the decoding.
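
As a rough illustration of the idea (a toy sketch, not your codec): rather than handing raw float outputs of the entropy model to the range coder, snap them to a small table and precompute integer CDFs, so encoder and decoder agree bit-for-bit even when conv outputs drift slightly.

    import numpy as np

    # Hypothetical fixed table of scales, shared by encoder and decoder.
    SCALE_TABLE = np.exp(np.linspace(np.log(0.11), np.log(256.0), 64))

    def quantize_scale_indices(raw_scales):
        # Map each predicted scale to the index of the nearest table entry;
        # only the index is used downstream, so small float drift is harmless.
        return np.argmin(np.abs(raw_scales[..., None] - SCALE_TABLE), axis=-1)

    def integer_cdf(pmf, precision=16):
        # Turn a float pmf into a fixed-precision integer CDF; real implementations
        # also guarantee every symbol keeps at least one unit of probability mass.
        cdf = np.round(np.cumsum(pmf) * (1 << precision)).astype(np.int64)
        return np.concatenate(([0], cdf))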

Pei You

Feb 27, 2021, 3:23:40 AM
to Workshop and Challenge on Learned Image Compression (CLIC)
Dear George and Lucas,

Thanks for your reply.

Actually, our model is VAE-based, with some specifically designed modules added.
We implemented our code based on the open-source code at https://github.com/tensorflow/compression and https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention. Today we tested the model provided in the GMM repository and found the same issue.

I noticed that in past events some participants implemented their code based on the same open-source code as ours; I'm not sure whether they ran into this issue as well.

Looking forward to your reply. Thank you very much.


gtoderici

Feb 28, 2021, 6:54:21 PM
to Workshop and Challenge on Learned Image Compression (CLIC)
AFAIK, if you use tensorflow_compression you should get stable results as long as the hyperprior decoder is quantized.
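
"Quantized" here means the hyperprior decoder runs in integer arithmetic, so the entropy parameters it emits are bit-identical on every machine. A toy illustration of the idea (not the actual tensorflow_compression code):

    import numpy as np

    def integer_dense(x_int, w_float, scale=1 << 8):
        # Quantize the weights to fixed point once, offline; at run time everything
        # is exact integer math, so the layer output cannot vary across CPUs.
        w_int = np.round(np.asarray(w_float) * scale).astype(np.int64)
        acc = np.asarray(x_int, dtype=np.int64) @ w_int.T
        return acc // scale  # deterministic rescale back to the working range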

Pei You

Feb 28, 2021, 8:36:41 PM
to Workshop and Challenge on Learned Image Compression (CLIC)
Thanks!
We've noted the paper and are trying to find the differences between our implementation and tensorflow_compression.
I will follow your advice and try to solve the issue with quantized computation.
