SHA512 hash is wrong for some images

nuk...@googlemail.com

Jun 27, 2015, 2:02:10 AM
to derpiboor...@googlegroups.com
For example, let's take https://derpibooru.org/923591.json

"sha512_hash":"7b2e0256e1f9de5bf18a44ae1e1776d37651773d74c6eb9242408a2deda68adfc3499f0b86667c77758bb1821f66483fbcbd358871a8981e758bd5d8c93e5672"

If I download the full image https://derpicdn.net/img/view/2015/6/25/923591__safe_twilight+sparkle_rainbow+dash_fluttershy_princess+luna_scootaloo_sketchdump_artist-colon-rorakkusu.png

and run sha512sum on it, I get a different hash.

How do you guys get your hashes?
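For reference, the check described above can be reproduced with a short script. This is only a sketch: the helper names `file_sha512` and `matches_published_hash?` are made up for illustration, and it assumes the image has already been downloaded to a local path. It hashes the complete file byte for byte, which is what `sha512sum` does, so it should report the same mismatch rather than resolve it.

```ruby
require 'digest'

# Hash the complete file, byte for byte, the same way `sha512sum` does.
def file_sha512(path)
  Digest::SHA512.file(path).hexdigest
end

# True when the file's digest equals the hex string published in the JSON.
# Surrounding whitespace and letter case in the published value are ignored.
def matches_published_hash?(path, published_hex)
  file_sha512(path) == published_hex.strip.downcase
end
```

Calling `matches_published_hash?(local_path, sha512_from_json)` with the downloaded PNG and the `sha512_hash` value from the JSON would be expected to return `false` for the image discussed here, since the two hashes differ.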

Jared Stafford

Jun 28, 2015, 11:23:03 AM
to derpiboor...@googlegroups.com
I've noticed the same thing too. I tried to search through the code to find out exactly where the hash is generated, but I'm completely lost.

Clover, do you have any idea why the hash is incorrect on almost every image?


--
You received this message because you are subscribed to the Google Groups "Derpibooru Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to derpibooru-disc...@googlegroups.com.
To post to this group, send an email to derpiboor...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/derpibooru-discuss/eb49fa60-ffea-458b-bf8e-7d80857c1c1e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Clover the Clever

Jun 28, 2015, 11:24:47 AM
to derpiboor...@googlegroups.com

The hash should be computed on the image data itself, not the whole file,
i.e. ignoring non-image headers etc. It's done this way because we do (or
did) a lot of header scrubbing on upload.

Cheers,
Clover

On 28/06/2015 16:23, Jared Stafford wrote:
> I've noticed the same thing too. I tried to search through the code to find out exactly where the hash is generated, but I'm completely lost.
>
> Clover, do you have any idea why the hash is incorrect on almost every image?

Jared Stafford

Jun 28, 2015, 11:49:34 AM
to derpiboor...@googlegroups.com
So what is the algorithm to remove the non-image headers?


nuk...@googlemail.com

Jun 28, 2015, 3:39:26 PM
to derpiboor...@googlegroups.com, clovert...@derpibooru.org

But there are still a lot of images that use the old hash. Will you update them?

Liam White

Jul 2, 2015, 6:02:03 PM
to derpiboor...@googlegroups.com, nuk...@googlemail.com

The hash for file uploads is computed with
Digest::SHA512.hexdigest(temp_object.data)

This is the original SHA-512 hash (orig_sha512_hash in searches and JSON).
The files at the thumb paths are very unlikely to have the same hash, as the thumbs are all passed through ImageMagick to fix the files. After a few of my changes to the upload pipeline, I don't think we are actually serving the original file at all anymore.
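As a rough illustration of the call above, the sketch below checks which of the two JSON fields named in this thread, if either, matches a given set of bytes. It assumes `temp_object.data` is simply the raw bytes of the uploaded file, which the thread does not spell out. The helper `matching_hash_fields` and the constant `HASH_FIELDS` are hypothetical names, not part of the site's code.

```ruby
require 'digest'
require 'json'

# Field names taken from the thread; both hold hex-encoded SHA-512 digests.
HASH_FIELDS = %w[sha512_hash orig_sha512_hash].freeze

# Returns the names of the metadata fields whose value equals the
# SHA-512 digest of `data` (the raw bytes of a downloaded file).
def matching_hash_fields(metadata_json, data)
  metadata = JSON.parse(metadata_json)
  digest = Digest::SHA512.hexdigest(data)
  HASH_FIELDS.select { |field| metadata[field] == digest }
end
```

For example, `matching_hash_fields(File.read("923591.json"), File.binread(image_path))` returns an empty array when the downloaded file matches neither field. That is the expected result if the served file has been re-processed since upload, because any change to the bytes changes the digest.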

ribb...@gmail.com

Oct 18, 2015, 1:24:04 AM
to Derpibooru Discussion, nuk...@googlemail.com, inksca...@gmail.com
I believe the hash of the "full" representation of the file should be served under an additional key in the metadata. Omitting it creates a whole lot of work, since what should be a simple retrieval and verification operation now has to load libraries to parse the file and hash just the image data.

This puts it out of reach of simple shell scripts.

Unless there is some preexisting command-line utility that hashes the file as you do that you could point me to? I could just call it and capture its output.
