How to create a fixed-length binary field with a unique constraint?

692 views
Skip to first unread message

Guido Winkelmann - Hornetsecurity

unread,
Jul 10, 2017, 12:15:32 PM7/10/17
to django...@googlegroups.com
Hi,

How can I create a table with a fixed-length binary field with a unique
constraint?

I need to save a SHA-256 checksum as a part of my objects, and I need to make
sure I never have two or more objects with the same checksum in the database.
Also, I need to be able to look up objects by that checksum.

I have tried defining the model like this:

class Certificate(models.Model):
...
fingerprint_sha256 = models.BinaryField(max_length=32)
class Meta:
db_table="certificates"
...

Problems with this:

- BinaryField does not seem to support a length parameter, just max_length.

- When I make a migration with that model and apply it to a MySQL database,
the resulting field will be of type "longblob" instead of the "binary(32)" I
was going for.

- MySQL cannot create a unique constraint on a binary field without a length
constraint (or a length constraint over 255). If I add a unique=True parameter
to that field, migrating that change to a MySQL database will fail.

I can just manually connect to the MySQL server on the command line and change
the field to what I want, and in fact the Django app will work fine like that,
however I have a feeling that this undermines Django's database migration
concept and may lead to unforeseen problems in the future. (Also, I don't want
to unnecessarily complicate the install process for the app beyond what is
normal for a Django app.)

I could save the checksum as an ASCII representation in a CharField instead,
however this would more than double storage requirements for the field
(possibly even quadruple when using UTF-8 charset). This table is going to be
very large and very busy in production, and I'm afraid that any increase in
row size is going to be detrimental to performance.

Does anyone have a good idea how to solve this problem?

Regards,

Guido

Mike Morris

unread,
Jul 10, 2017, 7:25:37 PM7/10/17
to django...@googlegroups.com
From your description (save a SHA-256 checksum), you do not need a
binary field. Binaries are always of indeterminant length; they can hold
photos, executables, sound files, etc.

By definition, a SHA -256 is 256 bytes of ASCII.

You probably want a CharField(length=256).

ecas

unread,
Jul 11, 2017, 5:18:22 AM7/11/17
to Django users
I would use:

fingerprint_sha256 = models.CharField(max_length=64, unique=True, db_index=True)

In hexadecimal, a SHA-256 uses 64 characters (4 bits per char).
The ASCII of the characters used in the hexadecimal representation are 1 byte in UTF-8 (0-9, a-f).

Tom Evans

unread,
Jul 13, 2017, 10:06:38 AM7/13/17
to django...@googlegroups.com
On Tue, Jul 11, 2017 at 12:05 AM, Mike Morris <mi...@musicplace.com> wrote:
> From your description (save a SHA-256 checksum), you do not need a binary
> field. Binaries are always of indeterminant length; they can hold photos,
> executables, sound files, etc.
>
> By definition, a SHA -256 is 256 bytes of ASCII.
>
> You probably want a CharField(length=256).
>

BinaryField is arbitrary size, binaries are not. The binary
representation of a SHA-256 is significantly smaller (it's 256 *bits*,
or 32 bytes), hence why the OP wants to store his binary objects in
the most efficient format possible, not 2 or 8 times that size.

Its like storing an IPv4 address as the 15 bytes string
"10.123.132.254" or the 4 byte integer 175867134.

Guido:

You can make your own custom database types quite simply (completely untested):

class FixedSizedBinaryField(models.BinaryField):
def __init__(self, num_bytes=None, *args, **kwargs):
self.nbytes = num_bytes
super(FixedSizedBinaryField, self).__init__(*args, **kwargs)

def deconstruct(self):
name, path, args, kwargs = super(FixedSizedBinaryField, self).deconstruct()
kwargs[''num_bytes'] = self.nbytes
return name, path, args, kwargs

def db_type(self, connection):
return "binary(%d)" % self.nbytes

For more details, check out the docs on custom fields:

https://docs.djangoproject.com/en/1.11/howto/custom-model-fields/

Cheers

Tom

winke...@hornetsecurity.com

unread,
Jul 13, 2017, 12:34:49 PM7/13/17
to Django users
On Thursday, July 13, 2017 at 4:06:38 PM UTC+2, Tom Evans wrote:
 [...]
Guido:

You can make your own custom database types quite simply (completely untested):

class FixedSizedBinaryField(models.BinaryField):
  def __init__(self, num_bytes=None, *args, **kwargs):
    self.nbytes = num_bytes
    super(FixedSizedBinaryField, self).__init__(*args, **kwargs)

  def deconstruct(self):
    name, path, args, kwargs = super(FixedSizedBinaryField, self).deconstruct()
    kwargs[''num_bytes'] = self.nbytes
    return name, path, args, kwargs

  def db_type(self, connection):
    return "binary(%d)" % self.nbytes

For more details, check out the docs on custom fields:

https://docs.djangoproject.com/en/1.11/howto/custom-model-fields/

Thanks for the suggestion. I am now working with this custom field class (I actually found the relevant documentation and the solution myself in the meantime. It's almost identical to yours, except without a configurable length):

class SHA256Field(models.BinaryField):
    def __init__(self, *args, **kwargs):
        kwargs['max_length'] = 32
        super(SHA256Field, self).__init__(*args, **kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super(SHA256Field, self).deconstruct()
        del kwargs["max_length"]

        return name, path, args, kwargs

    def db_type(self, connection):
        return "BINARY(32)"

This works very well for me - in MySQL/MariaDB. I haven't tested it with any other database systems.

Regards,

Guido
 

Anubhav Gupta

unread,
Mar 20, 2019, 9:42:44 AM3/20/19
to Django users
Hi,

My use case is exactly similar to this with one additional condition that I want this column to serve as Primary Key of my table. Now when I use the following code:

class BinaryTable(models.Model):
    binary_hash = models.BinaryField(primary_key=True)
    comments = models.CharField(max_length=70)

    class Meta:
        db_table = 'binary_table'
       
I get the error code (1170, "BLOB/TEXT column 'binary_hash' used in key specification without a key length")
The behaviour is same if I specify unique=True instead of primary_key=True.

Now, one workaround is that from SHA256Field we return "BINARY(32) PRIMARY KEY"
The problem with this is that django sees no primary_key=True column while parsing the model declaration and assigns an automatic primary key and then raises the error that there are 2 primary keys. Any ideas how I can achieve primary key constraint on this custom field?
Reply all
Reply to author
Forward
0 new messages