image uploads

Brian Stewart

Dec 1, 2022, 8:23:58 AM

to gcd-tech

Does the image upload system create a hash for every picture uploaded?

1. Does a hash check occur to possibly flag/alert that this is a duplicate of a picture that already exists, and provide a direct link to the issue?

a. Handy during uploading to alert submitter

b. handy for approvers to quickly find/compare a possible issue

2. If all the images were hashed then a dupe report could be generated in the special reports section for a deeper review

Just some random thoughts I had while trying to manage my own local collection of images. Granted, images that look the same may still have different hashes because of metadata or minor pixel changes. For my own files I'm creating a fixed 300px reference thumbnail of every master image and then hashing both. The smaller file created by my batch process is metadata-free, and if two images are similar, in theory their smaller generated images, with reduced image surface, are more likely to produce the same hash.
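As a rough illustration of the exact-hash idea (a standalone sketch, not anything taken from the GCD code), the following groups files by the SHA-256 digest of their raw bytes and reports groups with more than one member. The names `file_digest` and `duplicate_report` are made up for this example, and the thumbnail step is left out because it would need an image library.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's raw bytes (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large scans do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_report(paths):
    """Group paths by digest and keep only digests shared by 2+ files."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(str(p))
    return {digest: sorted(ps) for digest, ps in groups.items() if len(ps) > 1}
```

Because the digest covers every byte, two files that differ only in metadata land in different groups, which is the limitation noted above.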

---BRIAN

Alexandros Diamantidis

Dec 1, 2022, 11:29:55 AM

to gcd-...@googlegroups.com

Hi Brian! Comments below:

* Brian Stewart [2022-12-01 05:23]:

> Does the image upload system create a hash for every picture uploaded?
>
> 1. Does a hash check occur to possibly flag/alert that this is a duplicate
> of a picture that already exists, and provide a direct link to the issue?

No, each upload is treated as just an independent file which gets an id
in the database (and some resized versions for display in the indexes).

Nothing in the code checks the contents of the file (either via a hash
or via another mechanism). We trust the uploaders to upload the correct
image and the editors checking the uploads to catch occasional mistakes
or problems.

Apart from that, while the cover and other scans we keep are a nice extra
and important for identification and documentation purposes, our
original mission began with the issue indexes so our search tools focus
on the database data, not on the pictures.

> a. Handy during uploading to alert submitter
> b. handy for approvers to quickly find/compare a possible issue

Here you mean a user uploading the same image twice by mistake? And this
concerns uploading an image to the incorrect issue after having added it
to the correct one? Or another scenario?

> 2. If all the images were hashed then a dupe report could be generated in
> the special reports section for a deeper review

Hopefully, there are no unnecessary dupes because the editors catch
them during approval. Of course there are various cases where the covers
of different issues might legitimately be very close visually, e.g. for
variants, second printings, etc., as you mention later in your message.
For this reason, I don't think even a visual similarity / perceptual
hash, e.g. using the Python ImageHash lib
(https://pypi.org/project/ImageHash/) would help much in our normal
workflow.
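For readers unfamiliar with perceptual hashing, here is a minimal hand-rolled sketch of the "average hash" idea. It is not the ImageHash API and not GCD code. It assumes the image has already been reduced to a small grayscale grid (for example 8x8) by some image library, and the function names are illustrative only.

```python
def average_hash(pixels):
    """Build a bit string from a small grayscale grid (values 0-255).

    Each bit is 1 if the pixel is brighter than the grid's mean, else 0.
    `pixels` is a 2D list that is assumed to be already downscaled.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count differing bits; small distances suggest visually similar images."""
    return bin(a ^ b).count("1")
```

Small pixel changes usually leave the bits unchanged, so near-identical scans get a distance of 0 or close to it. That is also why variants and second printings with almost the same cover would be flagged together.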

Alexandros

Brian Stewart

Dec 1, 2022, 11:41:02 AM

to gcd-tech

Thanks for the feedback, Alexandros. Yes, the comic index/creators/story data is certainly the primary focus; I just figured a quick hash check to flag potential conflicts might be handy during the review process.

I guess the only gap I see regularly, especially when I batch upload an entire series of covers + indicia for a title (before I start working on the actual data index), is during the upload of indicia. Can the indicia (and statement of ownership) process be updated to match the main cover workflow? After an indicia/SOO scan is uploaded, can the web page actually DISPLAY the image when done? I often mistakenly click the wrong image file for the indicia (human error), and since the image preview step is skipped, the mistake gets missed until it is kicked back by an editor.

---BRIAN
