Adding ability to choose AutoField type (signed vs unsigned)


Caio Ariede

Apr 6, 2020, 10:38:23 AM
to django-d...@googlegroups.com
Hi folks,

I’ve been working on ticket #56 "Primary key columns should be UNSIGNED" (really old) for a while now. It’s cumbersome and hard to fix for several reasons, including backwards compatibility, and it isn't something a single PR can solve, but I’ll try to summarize where things stand at the moment. Hopefully we can reach a consensus so I can continue.

The problem: Django has been storing primary keys as signed integers since the beginning, and this is considered a waste of resources: auto-incremented keys never use the negative half of the range. Very few people seem to use negative values as primary keys, apparently.
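
To put numbers on the unused negative half, here is a small pure-Python sketch (no Django involved; `int_range` is a hypothetical helper) that computes the value ranges of 32- and 64-bit integer columns in signed and unsigned form:

```python
from typing import Tuple


def int_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return the (min, max) values a `bits`-wide integer column can hold."""
    if signed:
        # Two's complement: one bit is effectively spent on the sign.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (32, 64):
    _, signed_max = int_range(bits, signed=True)
    _, unsigned_max = int_range(bits, signed=False)
    # An auto-increment key only ever uses the positive half of a signed column,
    # so an unsigned column of the same width roughly doubles the usable key space.
    print(f"{bits}-bit: signed max {signed_max:,}, unsigned max {unsigned_max:,}")
# 32-bit: signed max 2,147,483,647, unsigned max 4,294,967,295
# 64-bit: signed max 9,223,372,036,854,775,807, unsigned max 18,446,744,073,709,551,615
```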

The desired solution: We want to change that in a backwards-compatible way, without suddenly forcing migrations on every project using Django. This is where things start to get difficult.

These are the links for the related ticket and PR up to this point:



While I was working on PR 11900, I stumbled across another PR, 8924 "BigAutoField as new default in 2.0", and a particular comment from Nick Pope got my attention:


The idea of adding a new DEFAULT_AUTOFIELD setting made a lot of sense to me, as it would let us keep things backwards compatible.

My first reaction after following that discussion was to add a missing piece that would likely help: the PositiveBigIntegerField. This has already been fixed and merged:



Now that we have PositiveBigIntegerField merged in, we can also have PositiveBigAutoField and finally move forward with ticket #56. But we’d still need the ability to set a different default Auto Field.

To achieve this, I’ve created a new ticket, and Mariusz Felisiak suggested bringing the question of whether we should move forward with this approach to the developers list:


I want to hear from you folks whether we can reach a consensus on adding this new setting, and also hear any concerns or issues you can anticipate.

Ideally, this new setting would default to AutoField until Django 4, perhaps? After that we could make it default to PositiveBigAutoField.

We could (or should) also change the project template to set DEFAULT_AUTOFIELD to PositiveBigAutoField for new projects, as suggested in Nick Pope's comment above.
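
As a sketch of what that project-template line might look like: the setting name follows this proposal, and `PositiveBigAutoField` is the hypothetical field discussed here (it is not part of Django), so the value is illustrative only. For comparison, Django 3.2 later shipped a similar setting spelled `DEFAULT_AUTO_FIELD`, which `startproject` sets to `"django.db.models.BigAutoField"`.

```python
# settings.py (sketch): proposed project-level default for implicit primary keys.
# "PositiveBigAutoField" is the hypothetical field from this proposal; it does not
# exist in Django, so this line is illustrative only.
DEFAULT_AUTOFIELD = "django.db.models.PositiveBigAutoField"

# An existing project that wants to avoid new migrations would pin today's default:
# DEFAULT_AUTOFIELD = "django.db.models.AutoField"
```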


Thank you

--
Caio



Adam Johnson

Apr 6, 2020, 11:11:44 AM
to django-d...@googlegroups.com
I'm in favour of the move. The setting strategy seems appropriate.

Simon's comment on the PR that this affects a small % of projects is true. But if your project does reach that scale, you're probably one of the bigger Django organizations in the world - hopefully key advocates for the framework. It's also possible to hit this on smaller projects with tables that import a lot of new rows, such as rolling time series data imports.

Also, this problem is *highly* asymmetric. Unnecessarily using big PK fields wastes a little storage space - unlikely to be noticed. Migrating to big PK fields can be a massive task (and can feel like "Django let you down here with bad defaults") ("Ruby on Rails does it right!").

It will need a careful migration guide though. Pre-existing projects may need to migrate one table at a time to reduce downtime/load, or keep the setting to the smaller auto field forever. I'm happy to help with this.

One thing we *could* add is a deploy-time system check (or a management command, or something) that reports what % of each table's auto field PK values have been "used up." This would allow larger projects to prioritize their migrations.
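
A minimal sketch of the arithmetic such a check might perform, assuming the current maximum PK per table has already been fetched (e.g. with a `MAX(id)` query per table). The function names, the 50% threshold, and the table names are hypothetical; the limits assume the usual signed column types (`smallint`, `integer`, `bigint`) behind Django's auto fields.

```python
from typing import Dict, Iterable, List, Tuple

# Upper bounds of Django's integer auto fields on typical backends
# (signed 16-, 32- and 64-bit columns respectively).
AUTO_FIELD_MAX: Dict[str, int] = {
    "SmallAutoField": 2**15 - 1,
    "AutoField": 2**31 - 1,
    "BigAutoField": 2**63 - 1,
}


def pk_usage_percent(field_type: str, current_max_pk: int) -> float:
    """Percentage of the key space already consumed by the highest primary key."""
    return 100.0 * current_max_pk / AUTO_FIELD_MAX[field_type]


def tables_needing_attention(
    tables: Iterable[Tuple[str, str, int]], threshold: float = 50.0
) -> List[Tuple[str, float]]:
    """Return (table, percent) pairs at or above `threshold`, most used first.

    `tables` holds (table_name, field_type, current_max_pk) tuples.
    """
    usage = [(name, pk_usage_percent(ftype, max_pk)) for name, ftype, max_pk in tables]
    flagged = [(name, pct) for name, pct in usage if pct >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


report = tables_needing_attention([
    ("app_event", "AutoField", 1_900_000_000),
    ("app_user", "AutoField", 120_000),
    ("app_log", "BigAutoField", 5_000_000_000),
])
print(report)  # only app_event is flagged, at roughly 88% of its key space
```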

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/8E947B86-2228-4DD3-9D6A-C1B6D757652E%40gmail.com.


--
Adam

Jure Erznožnik

Apr 6, 2020, 11:46:33 AM
to django-d...@googlegroups.com

Sorry if I've misunderstood anything, but:

I agree with Adam where he discusses migration issues, but I also don't see how a "DEFAULT_AUTOFIELD" setting could leave tables out of the migration(s).

My understanding is that it would cause immediate re-provisioning of all the pk fields, including those in the core Django modules and third-party modules. Some of those might have logic that does integer math on primary keys (and now all auto primary keys might be GUIDs).

If I understand the above correctly, the setting would have to be per-module. Could it simply be done like this:

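from django.db import models

class PKMixin(models.Model):
    id = models.UUIDField(primary_key=True, ....)

    class Meta:
        # Django only picks up fields declared on model classes, so the mixin
        # must be an abstract model rather than a plain `object` subclass.
        abstract = True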

Naturally, then all desired models would (also) derive from this mixin. A bit of a chore, but it would not require reconfiguration of all migrations, just the ones the programmer would specify - and would only be limited to their own module.

OTOH, JUST FOR the unsigned vs signed integer PK, it should be relatively easy to modify the makemigrations code to detect existing migrations for the model, detect that the only difference is IntegerField vs PositiveIntegerField, and in that case skip generating the migration (which would convert the field to an unsigned one). There could also be a flag on the management command to force generating those migrations.
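
The skip rule described above can be sketched in plain Python. This is not how Django's migration autodetector is structured: fields are modeled here as simplified `(import_path, kwargs)` pairs loosely mirroring what `Field.deconstruct()` reports, and `is_sign_only_change`, `should_emit_alter_field`, and the `force` flag are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Simplified stand-in for a deconstructed field: (import path, keyword arguments).
FieldSpec = Tuple[str, Dict[str, Any]]

# Pairs of field classes that differ only in signedness at the same width.
SIGN_ONLY_PAIRS = {
    ("django.db.models.IntegerField", "django.db.models.PositiveIntegerField"),
    ("django.db.models.BigIntegerField", "django.db.models.PositiveBigIntegerField"),
}


def is_sign_only_change(old: FieldSpec, new: FieldSpec) -> bool:
    """True if the field only switches from signed to unsigned, all else equal."""
    old_path, old_kwargs = old
    new_path, new_kwargs = new
    return (old_path, new_path) in SIGN_ONLY_PAIRS and old_kwargs == new_kwargs


def should_emit_alter_field(old: FieldSpec, new: FieldSpec, force: bool = False) -> bool:
    """Decide whether a (hypothetical) makemigrations would write an AlterField."""
    if old == new:
        return False  # nothing changed
    if is_sign_only_change(old, new) and not force:
        return False  # skip sign-only changes unless explicitly forced
    return True


old = ("django.db.models.IntegerField", {"primary_key": True})
new = ("django.db.models.PositiveIntegerField", {"primary_key": True})
print(should_emit_alter_field(old, new), should_emit_alter_field(old, new, force=True))
```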

Of course, ignore if this is gibberish.

LP,
Jure

Adam Johnson

Apr 6, 2020, 12:48:10 PM
to django-d...@googlegroups.com
Jure - yes switching the setting should generate migrations for all affected models. The migration guide would cover changing models' primary key fields to PositiveBigAutoFields one at a time, perhaps with a mixin as you've suggested. Maybe Django should provide such a mixin to ease the migration.

> primary keys might be GUID

The proposal is not to move to GUIDs/UUIDs, as you've used in your example. It's to move to bigger integers.

UUIDs are a bad idea for database performance, as they are distributed randomly across the table, preventing any cache wins. On autoincrement tables, the tail end of each table is normally the most frequently read and written part, and is thus cached in memory. I don't think Django should ever suggest UUIDs as primary keys.
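
A toy illustration of the locality argument, modeling the primary-key index as a sorted Python list rather than a real B-tree (real cache behavior depends on the database engine). `insert_positions` is a hypothetical helper; random 128-bit UUIDs come from a seeded generator so the run is reproducible.

```python
import bisect
import random
import uuid


def insert_positions(keys):
    """Insert keys one by one into a sorted index and record where each lands.

    Returns, for each insert, its distance from the end of the index
    (0 means the key was appended at the tail).
    """
    index = []
    distances = []
    for key in keys:
        pos = bisect.bisect_left(index, key)
        distances.append(len(index) - pos)
        index.insert(pos, key)
    return distances


n = 1000
sequential = insert_positions(range(n))  # auto-increment style keys
rng = random.Random(42)
random_uuids = insert_positions(uuid.UUID(int=rng.getrandbits(128)) for _ in range(n))

print("sequential inserts at the tail:", sum(d == 0 for d in sequential), "/", n)
print("random UUID inserts at the tail:", sum(d == 0 for d in random_uuids), "/", n)
```

Every sequential key is appended at the tail, so only the last part of the index stays hot; random UUIDs scatter writes across the whole index.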



--
Adam

Collin Anderson

Apr 9, 2020, 3:06:51 PM
to Django developers (Contributions to Django itself)
Having a DEFAULT_AUTOFIELD setting makes sense to me.


Tim Graham

Apr 9, 2020, 6:10:13 PM
to Django developers (Contributions to Django itself)
How would the strategy of having a setting generate new migrations work with third-party apps? Generally migrations can't easily be added to them by user projects and if more migrations are added to the third party app in a later version, there will be an inconsistent history.

Nick Pope

Apr 10, 2020, 3:29:28 AM
to Django developers (Contributions to Django itself)
Ah. I hadn't thought about that - I only got as far as being able to define a new default value for DEFAULT_AUTOFIELD in the startproject template so that existing projects are not suddenly forced to migrate.

An alternative is to have something on the AppConfig. I'm sure for most people the large tables that may need this will be in their own, rather than third-party, apps. People can also choose to set this for a third-party app by subclassing the AppConfig, but as you say, they'd then be forced to handle migrations manually - is this even avoidable? I'm not sure how this would look for moving to a new default though.

Adam Johnson

unread,
Apr 10, 2020, 5:10:28 AM4/10/20
to django-d...@googlegroups.com
Ah yes third party apps would be tricky.

An AppConfig setting would make sense, although I don't believe there's any precedent. It could also be done by making DEFAULT_AUTOFIELD take a dictionary mapping app labels to field names.
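
Both ideas (an AppConfig attribute, or a dictionary-valued setting) could be resolved in one place. The sketch below is hypothetical throughout: the function name, the fallback, and the precedence order (AppConfig attribute first, then the per-label dict entry, then a project-wide string) are assumptions, not an existing Django API.

```python
from typing import Dict, Optional, Union

AutoFieldSetting = Union[str, Dict[str, str]]

# Assumed fallback for apps not listed in a dict-valued setting.
PROJECT_FALLBACK = "django.db.models.AutoField"


def resolve_auto_field(
    app_label: str,
    setting: AutoFieldSetting,
    app_config_value: Optional[str] = None,
) -> str:
    """Pick the auto field path for one app (hypothetical precedence order)."""
    # 1. A per-app attribute on the AppConfig wins (the AppConfig idea).
    if app_config_value is not None:
        return app_config_value
    # 2. A dict setting maps app labels to field paths (the dictionary idea).
    if isinstance(setting, dict):
        return setting.get(app_label, PROJECT_FALLBACK)
    # 3. A plain string applies to every app in the project.
    return setting


big = "django.db.models.BigAutoField"
print(resolve_auto_field("blog", {"blog": big}))  # the blog app gets BigAutoField
print(resolve_auto_field("auth", {"blog": big}))  # unlisted apps keep the fallback
```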

--
Adam

Caio Ariede

Apr 10, 2020, 6:21:52 PM
to django-d...@googlegroups.com
I believe that it is already possible to trigger new migrations in a third-party app by changing its AppConfig.label, for example. Please correct me if I’m wrong.

From that perspective, I think it wouldn’t be wrong if a migration were created after someone changes its AppConfig’s default_auto_field. The extra migrations could be handled manually using MIGRATION_MODULES.

I feel like adding such an option to AppConfig would give pretty good flexibility, but I’m not sure it removes the need for settings.DEFAULT_AUTO_FIELD, especially if one wants to keep the old default behavior.
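
For the MIGRATION_MODULES route, a settings sketch: `MIGRATION_MODULES` is an existing Django setting mapping an app label to the module that holds its migrations. The app label `thirdparty` and the module path are placeholders. As noted earlier in the thread, taking over an app's migrations this way also means reconciling any migrations the app ships in later releases yourself.

```python
# settings.py (sketch): keep a project-owned copy of a third-party app's
# migrations, so migrations generated after changing its auto field live in
# your project. "thirdparty" and "myproject.migrations_thirdparty" are placeholders.
MIGRATION_MODULES = {
    "thirdparty": "myproject.migrations_thirdparty",
}
```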

--
Caio


Jure Erznožnik

Apr 12, 2020, 2:02:00 PM
to django-d...@googlegroups.com
I would like to try again:
I believe having a general setting for autofields could cause all
sorts of issues. As illustrated, I immediately thought that some
people might want to use it for migrating to UUID fields (I read
about using them somewhere when I started using Django 5 years
ago). The point is that an option like that immediately sparks
imagination.

However, the problem being addressed here is only migrating from
signed to unsigned integers: a very minor change that most deployments
would not even be affected by.

That one could be handled less aggressively by:

>>OTOH, JUST FOR the unsigned vs signed integer PK, it should be relatively easy to modify the makemigrations code to detect existing migrations for the model, detect that the only difference is IntegerField vs PositiveIntegerField, and in that case skip generating the migration (which would convert the field to an unsigned one). There could also be a flag on the management command to force generating those migrations.

I believe this would be unintrusive enough to allow everyone to
migrate their models at will. Since most users won't even notice
the difference, a solution like this could be preferable even if it
introduces one additional special case in the code.
OTOH, the proposed setting DOES provide more flexibility (to introduce
more varied options for the PK field), but the need for a
biginteger or GUID or anything else, for that matter, is already
catered for in the API - by specifying the super-special pk manually
for the affected table.

I sure hope I'm not just derailing the discussion with my uninformed
ideas, but perhaps the minor change could be handled in a simpler
manner like proposed above.

LP,
Jure

Caio Ariede

Apr 6, 2021, 8:12:15 AM
to django-d...@googlegroups.com
Hello folks,

Now that we’ve completed these:

https://code.djangoproject.com/ticket/31007
https://code.djangoproject.com/ticket/30987

I’m wondering if https://code.djangoproject.com/ticket/56 is still something we want to solve.

I feel like the workaround would be to explicitly define ``id = models.PositiveBigIntegerField(primary_key=True)``

On the other hand, we could also add ``PositiveBigAutoField`` to make things easier, now that we have the ability to define ``default_auto_field`` per app.

Thoughts?

— Caio

אורי

Apr 6, 2021, 8:39:11 AM
to Django developers (Contributions to Django itself)
Hi,

I think any primary key field which is an integer should be 64-bit unsigned. 32-bit integers belong to history, and there is no need for negative primary keys. It's like IPv4: it was defined with 32 bits, and now we are stuck forever with 32-bit IP addresses. You can never know when a model will require more than 2**31 records, and I don't see any point in using 32 bits in 2021.

By the way, if we use 64 bits, it doesn't matter that much whether the integer is signed or unsigned - I don't think we need the 64th bit anyway.
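
A back-of-the-envelope check of both points, assuming a sustained (and hypothetical) rate of 1,000 inserts per second into a single table: a 32-bit key runs out in weeks, while a 64-bit key lasts hundreds of millions of years whether or not the sign bit is used.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_exhausted(max_value: int, inserts_per_second: float) -> float:
    """Approximate years until an auto-increment counter reaches `max_value`."""
    return max_value / inserts_per_second / SECONDS_PER_YEAR


LIMITS = {
    "signed 32-bit": 2**31 - 1,
    "unsigned 32-bit": 2**32 - 1,
    "signed 64-bit": 2**63 - 1,
    "unsigned 64-bit": 2**64 - 1,
}

for label, limit in LIMITS.items():
    years = years_until_exhausted(limit, inserts_per_second=1_000)
    print(f"{label:>16}: {years:,.2f} years at a sustained 1,000 inserts/s")
```

At 64 bits, dropping the sign only moves exhaustion from roughly 292 million to roughly 585 million years at this rate.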

Uri.


Tom Forbes

Apr 6, 2021, 11:44:06 AM
to django-d...@googlegroups.com
I don't think we need this anymore, at least not by default. The default 64-bit range is probably enough for the time being. We could switch the default to be `PositiveBigIntegerField` instead of `BigIntegerField` but I'm not sure if that's sensible.

An explicit `id = PositiveBigIntegerField(...)` workaround would be fine to be honest, for those that need it.

Adam Johnson

Apr 6, 2021, 1:08:06 PM
to django-d...@googlegroups.com
I also don't think this is necessary any more and can be closed.
 
> An explicit `id = PositiveBigIntegerField(...)` workaround would be fine to be honest, for those that need it.

I also would like to meet the django app that *does* need it (for non-silly reasons like deciding to start IDs near 2^63).



--
Adam

Florian Apolloner

Apr 7, 2021, 3:29:01 AM
to Django developers (Contributions to Django itself)
On Tuesday, April 6, 2021 at 7:08:06 PM UTC+2 Adam Johnson wrote:
> I also would like to meet the django app that *does* need it (for non-silly reasons like deciding to start IDs near 2^63).

Since you have asked: it is required if you want to store an x509 certificate serial number as is. We do get queries from integrators who have previously stored serial numbers in a big integer field. After the "recent" cert library updates, which now properly produce 64-bit positive integers as the CAB forum requires, they ran into plenty of problems with the first certificate serial that was larger than 2^63.

Map that to Django and I think changing to PositiveBigIntegerField is a fair workaround. Not sure if it is worth changing the default, but the requirements do exist for some people.
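
One detail worth checking for this case: Django's field reference describes `PositiveBigIntegerField` as safe for values from 0 to 9223372036854775807 (2^63 - 1) on all supported databases, so whether a serial at or above 2^63 fits depends on the backend's column type. A small pure-Python check, where `fits` is a hypothetical helper and the serial is a made-up example with the top bit set:

```python
SIGNED_64_MAX = 2**63 - 1    # e.g. PostgreSQL/SQLite bigint; Django's documented portable bound
UNSIGNED_64_MAX = 2**64 - 1  # e.g. MySQL BIGINT UNSIGNED


def fits(serial: int, max_value: int) -> bool:
    """True if a non-negative serial number fits under the given column maximum."""
    return 0 <= serial <= max_value


serial = int("9f3c2a1b4d5e6f70", 16)  # 64-bit serial with the most significant bit set
print(fits(serial, SIGNED_64_MAX), fits(serial, UNSIGNED_64_MAX))  # False True
```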

Cheers,
Florian

Adam Johnson

Apr 8, 2021, 7:00:45 AM
to django-d...@googlegroups.com
Okay fair enough, it wouldn't be much work to ship a PositiveBigAutoField class.



--
Adam