slightly different FID score

Shangyin Gao

Feb 28, 2021, 9:27:11 PM

to Workshop and Challenge on Learned Image Compression (CLIC)

Dear organizers,

I used your 'metrics.py' script to calculate FID. Even though the random seed is fixed (the image patch is the same for each run), I get a slightly different FID score for the same reconstructed images.

Did I misuse the script, or is there a bug in the calculation?

Best,

Shangyin

Lucas Theis

Mar 1, 2021, 4:39:45 AM

to Workshop and Challenge on Learned Image Compression (CLIC)

Can you provide more details about what you did (the scripts you used and how you ran them, whether the images were in the same order, etc.) and the numbers you were seeing?

zhengxue cheng

Mar 4, 2021, 7:34:31 AM

to Workshop and Challenge on Learned Image Compression (CLIC)

Hi Lucas and Shangyin,

I also ran into a problem with the FID calculation. The FID scores given by the server and by my local host are different, but PSNR and MS-SSIM are the same. It seems FID depends on the order of the images. Below is how I generate the submission_images and target_images dicts that serve as the input to metrics.py. When I tried two different orders, by reading the following txt file or by using glob(distort_img_dir), one FID score was 213 and the other was 198.

Did I misuse the script? Could you please give some hints on how the server calculates FID scores?
Thank you very much for your help.
------------------------------------------

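from collections import OrderedDict

from metrics import evaluate  # assumes metrics.py is on the import path

# Read the image file names, one per line, in the order listed in the txt file.
test_files = "clicv.txt"
with open(test_files) as f:
    file_name = [x.rstrip() for x in f]

# Key each entry by its file name (assumed key); insertion order is preserved.
submission_images = OrderedDict()
target_images = OrderedDict()
for name in file_name:
    submission_images[name] = distort_img_dir + name
    target_images[name] = target_img_dir + name

results = evaluate(submission_images, target_images)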
------------------------------------------

Zhengxue

clicv.txt

Lucas Theis

Mar 9, 2021, 2:01:04 PM

to Workshop and Challenge on Learned Image Compression (CLIC)

Because FID is more compute-intensive, it is calculated from 256x256 crops randomly sampled from the images (one crop per image). The random selection happens with a fixed seed, so the scores should be the same across different runs with the same reconstructions, and the selected crops should be the same for different methods. But if you change the order of the filepaths passed to evaluate(), each image is paired with a different random draw, which changes the crops and therefore slightly changes the FID score.
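
As a rough illustration of why order matters, here is a minimal sketch. It is not the actual metrics.py code: the function sample_crops, the file names, and the image sizes are all hypothetical, and it assumes only that a single seeded random generator is consumed once per image in iteration order.

```python
import random


def sample_crops(image_sizes, crop=256, seed=0):
    """Pick one crop origin per image from a single seeded generator.

    Hypothetical helper: draws are consumed in the iteration order of
    image_sizes, so the order decides which draw each image receives.
    """
    rng = random.Random(seed)
    crops = {}
    for name, (width, height) in image_sizes.items():
        left = rng.randint(0, width - crop)
        top = rng.randint(0, height - crop)
        crops[name] = (left, top)
    return crops


# Made-up image names and sizes (width, height), for illustration only.
sizes = {"a.png": (1024, 768), "b.png": (2048, 1365), "c.png": (1600, 1200)}

first_run = sample_crops(sizes)
second_run = sample_crops(sizes)  # same order, same seed
reordered_run = sample_crops(dict(sorted(sizes.items(), reverse=True)))

print(first_run == second_run)      # identical crops for an identical order
print(first_run == reordered_run)   # crops change once the order changes
```

Under these assumptions, passing the file paths in one fixed order (for example, sorted by file name) on every run keeps the crops stable locally. Matching the server's score would additionally require using the same order as the server.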
