SHA-512 in django model? Optimization help?

Benjamin Schollnick

Apr 7, 2019, 9:12:19 PM
to Django users
Right now I'm generating a SHA-512 of a file in my Django model, but storing the hex digest requires a string of 128 characters…

I'm not sure of a better way to store the SHA-512 in my model. Can anyone suggest a method / design that would be better optimized?

From a quick test, it looks like I might be able to store this as a blob of 64 bytes, which would be a considerable savings…

import hashlib

t = hashlib.sha512(test.encode("utf-16"))   # test is an existing str
digest = t.digest()
len(digest)
64

t = hashlib.sha512(test.encode("utf-16"))
hexdigest = t.hexdigest()
len(hexdigest)
128

But thinking about it, I could convert the hex digest to an integer?

int(hexdigest,16)
4298666745768817459166789395753510504053621749752930724783367173454957154660445390018210346619930005782627382250329880502243093184532984814018267510704707
z = int(hexdigest,16)
type(z)
<class 'int'>


‘52137203c3c4b62bc981fd9c8770952bfd1984ee9ce6e33ec94e485bc31a5631b6c6d15c1a2646f39c887575b576e66ed1ddbd96112d5355e574f06df8878a43'
0x52137203c3c4b62bc981fd9c8770952bfd1984ee9ce6e33ec94e485bc31a5631b6c6d15c1a2646f39c887575b576e66ed1ddbd96112d5355e574f06df8878a43

They convert identically, but I'm not sure whether converting to an integer would preserve the integrity of the SHA-512?
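For what it's worth, a quick round-trip check (just a sketch, reusing the digest and hexdigest values from above) suggests nothing is lost either way:

z = int(hexdigest, 16)
z.to_bytes(64, "big") == digest       # back to the 64 raw bytes
True
format(z, "0128x") == hexdigest       # back to the 128-char hex string
True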

Can anyone make any sort of suggestion on the best way to handle this?

- Benjamin

René Fleschenberg

Apr 8, 2019, 2:26:00 AM
to django...@googlegroups.com
Hi

Assuming you are on Postgres, using BinaryField(max_length=64) should be
good if you want to optimize for size. According to the Postgres docs,
it will use "1 or 4 bytes" plus the actual binary string.
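For example, a minimal model sketch (the model and field names here are just placeholders):

from django.db import models

class StoredFile(models.Model):
    # 64 raw bytes for the SHA-512 digest, instead of 128 hex characters
    hashfield = models.BinaryField(max_length=64)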

It is also reasonably convenient to work with:

instance.hashfield = hashlib.sha512(b'test').digest()
instance.save()

And if you want to output for display:

print(instance.hashfield.hex())

An integer is not going to work, since you need 64 bytes (512 / 8 = 64)
and the largest database integer type (BigIntegerField) can only hold 8 bytes.
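A quick way to see the mismatch (just a sketch):

import hashlib

len(hashlib.sha512(b'test').digest()) * 8   # 512 bits in the digest
(2 ** 63 - 1).bit_length()                  # 63 bits in an 8-byte signed integer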


René

--
René Fleschenberg

Rosastrasse 53, 45130 Essen, Germany
Phone: +49 1577 170 7363
https://fleschenberg.net
E-mail: re...@fleschenberg.net

René Fleschenberg

Apr 8, 2019, 3:10:53 AM
to django...@googlegroups.com
Hi again!

Something I forgot to mention earlier: If the files are not too big,
consider generating the SHA-512 on the fly instead of storing it.
Space-wise, it is the best possible optimization, obviously. And it
frees you from having to ensure consistency between the file and the hash.
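A rough sketch of that idea, assuming a model with a FileField (the names here are made up):

import hashlib

from django.db import models

class StoredFile(models.Model):
    file = models.FileField(upload_to="uploads/")

    def sha512(self):
        # Recompute the digest from the file contents on demand,
        # reading in chunks so large files are not loaded into memory at once.
        h = hashlib.sha512()
        with self.file.open("rb") as f:
            for chunk in f.chunks():
                h.update(chunk)
        return h.digest()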


René



Benjamin Schollnick

Apr 8, 2019, 5:30:32 AM
to Django users

> Something I forgot to mention earlier: If the files are not too big,
> consider generating the SHA-512 on the fly instead of storing it.
> Space-wise, it is the best possible optimization, obviously. And it
> frees you from having to ensure consistency between the file and the hash.

We're using MS SQL Server...

But we're using the SHA-512 to validate file integrity through a multistage process, so we need to have the hash at certain steps of the process
(e.g. Ingress, Conversion, Output) so that we can validate it's the same file we received, etc.
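Roughly what a per-step check looks like (just a sketch; the paths and field names are hypothetical):

import hashlib

def file_sha512(path):
    # Stream the file so large files never have to fit in memory.
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.digest()

# e.g. at the Conversion step, confirm we still have the file we ingested:
assert file_sha512(conversion_input_path) == record.ingress_sha512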

The issue I'm having is that the sheer size of this database (40+ million records) makes every byte count...
And disk space is a cost that I'm trying to keep down...