Defaulting to use BigAutoField in models

706 views
Skip to first unread message

Tom Forbes

unread,
Jun 11, 2020, 11:28:59 AM6/11/20
to django-d...@googlegroups.com
I’d like to re-propose switching Django to use BigAutoField’s rather than the current AutoField. This has been proposed[1] before (and a MR made[2]) but it was closed due to implementation issues and not much else has happened since then.

As many of you are aware the max value a standard AutoField can hold is 2,147,483,647 (2.1 billion) which sounds like more than you can ever need. But it’s often not, and you only find out at the worst possible time - out of the blue at night and during a period of rapid growth. The process for fixing this issue also becomes a lot harder as your data grows - when you’ve hit the limit you’re looking at a multi-hour `ALTER TABLE` on Postgres during which writes and reads are blocked, or a risky operation to create a new table with the correct primary key and copy old data over in batches. Basically if you’ve experienced this before you wouldn’t wish it on your worst enemy.

I’m proposing that we add a `MODELS_PRIMARY_KEY` (name improvements welcome!) setting that _defaults_ to `BigAutoField`, with prominent docs/release notes saying that to preserve the existing behaviour this should be set to `AutoField`. I think this the only realistic way we can implement this for new projects in a way that ensures it will be used meaningfully and not forgotten about until it’s too late.

Rails managed to do this migration somewhat painlessly due the big differences between Rails and Django models. As Django migrations are derived from the current model state so there’s no way I can think of to express “make this auto-generated field a BigAutoField only if this model is new”. The way I see it is that a global setting is very easy to toggle and there is little chance of missing the large numbers of migrations that would be generated during the Django update. Smaller applications could apply the migrations with little issue and larger applications would be able to opt-out (as well as be reminded that this is a problem they could face!).

Some specifics:
- The setting would take any dotted path to a class, or a single class name for a build in field. This would potentially solve [3], and could be useful to people who want to default to other fields like UUIDs (or a custom BigAutoField) for whatever reason
- The setting would also be used for GenericForeignKeys, which right now are backed by a PositiveIntegerField and so suffer from the same AutoField limitations

Any thoughts on this?

Tom






Caio Ariede

unread,
Jun 11, 2020, 1:07:35 PM6/11/20
to django-d...@googlegroups.com
For the record, there was a related discussion a few months ago:

Adam Johnson

unread,
Jun 11, 2020, 2:22:47 PM6/11/20
to django-d...@googlegroups.com
Big +1 on solving this from me.

- The setting would take any dotted path to a class, or a single class name for a build in field. This would potentially solve [3], and could be useful to people who want to default to other fields like UUIDs (or a custom BigAutoField) for whatever reason

I think we should restrict the setting between normal and big auto fields only. Allowing UUID's would be changing the type, with the potential for havoc with code incompalities throughout django. It's also not possible to migrate tables over to the new type.

What do you think the solution is for third-party apps? They ship their own migrations and can't really be tied to project state.

As Django migrations are derived from the current model state so there’s no way I can think of to express “make this auto-generated field a BigAutoField only if this model is new”.

The autodetector knows if a model is new. It could be that during one version Django outputs BigAutoField for fields added in CreateModel only.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/435EC704-3EF6-4EF4-BF85-175AE29C01F5%40tomforb.es.


--
Adam

Tom Forbes

unread,
Jun 12, 2020, 6:41:19 AM6/12/20
to django-d...@googlegroups.com
I think we should restrict the setting between normal and big auto fields only. Allowing UUID's would be changing the type, with the potential for havoc with code incompalities throughout django. It's also not possible to migrate tables over to the new type.

That’s a really good point, I got a bit swept up with the idea to realise that it would break a lot of things if applied generally!

The autodetector knows if a model is new. It could be that during one version Django outputs BigAutoField for fields added in CreateModel only.

We could make every automatic PK become a BigAutoField but include some logic in the migrations framework to ignore changes between an auto_created AutoField -> BigAutoField? This would ignore all existing models in your migration history, but all completely new models would receive BigAutoField PKs from the start. Users who explicitly add a `BigAutoField` to a model with a previously created `AutoField` would see a migration created as the `auto_created` flag is not set. It would also play nicely with foreign keys which also need to be 8 bytes.

This would take care of third party apps with migrations and not require a setting but the downside is that model state will be slightly different from the database. All models would have `BigAutoField`’s whereas the database would only have an `AutoField`. This feels slightly iffy to me even though I cannot think of a case where it would have any practical effect.

Adam Johnson

unread,
Jun 12, 2020, 7:11:48 AM6/12/20
to django-d...@googlegroups.com
I haven't thought this through, but maybe it's possible to restrict the change to inside AutoField, rather than create a new field class. If its db_parameters() method / db_type() could receive enough context from the migration history, it could potentially determine its own database type. For those who want to fixed field sizes, BigAutoField and SmallAutoField could be left, and MediumAutoField added.

(Whilst poking around I think I spotted a small cleanup: https://code.djangoproject.com/ticket/31698 )



--
Adam

Tom Forbes

unread,
Jun 17, 2020, 8:52:18 AM6/17/20
to django-d...@googlegroups.com
I think we should re-open ticket 31007[1] as there was a consensus in 2017 during the previous attempt at this[2] and things have not really changed - a 32bit primary key is still not a sensible default given the calamity that can occur when you hit the end of it’s range. 

The sticky point is the implementation - how do we migrate new projects whilst minimising disruption to existing projects (who either won’t ever hit the limit, or have too much data to migrate wholesale). I think special casing the migrations framework is an avenue to explore, so I created this today and to my surprise it seems to work quite well: https://github.com/orf/django/commit/0a335f208cee1376c25eff55c6f866de122c7839.

The obvious effect of this is that for old projects the Django model thinks its primary key is 64 bit while the underlying table is only 32 bit. I can’t think of much practical impact of this approach: 
- Python 3 doesn’t distinguish between 32/64 bit numbers so code will continue to work as-is. 
- Third party migrations that ship with 32 bit columns won’t be impacted, and it’s up to them to create migrations if they want to. 
- Adding `id = BigAutoField(primary_key=True)` to a model will create an explicit AlterField migration, when projects are able to migrate.

Am I missing something?

I haven't thought this through, but maybe it's possible to restrict the change to inside AutoField, rather than create a new field class. If its db_parameters() method / db_type() could receive enough context from the migration history, it could potentially determine its own database type.

I’m really not sure how we would do that: wouldn’t it require a pretty big refactor? It would definitely be cleaner.

Adam Johnson

unread,
Jun 17, 2020, 3:56:47 PM6/17/20
to django-d...@googlegroups.com
I think special casing the migrations framework is an avenue to explore, so I created this today and to my surprise it seems to work quite well: https://github.com/orf/django/commit/0a335f208cee1376c25eff55c6f866de122c7839.

That is a pretty neat solution. I think it would work quite well.

What about a way to warn users approaching the limit? I think a database check that detects AutoFields where the next ID is >50% of the way through would be a good solution. That's the kind of monitoring I've built in the past.

(50% might sound like it's near the limit, but the table is growing with an exponential curve, even quite shallow, it's quite close in terms of time.)

I’m really not sure how we would do that: wouldn’t it require a pretty big refactor? It would definitely be cleaner.

Yeah I haven't thought it through. Given the small size of your patch and the lack of expectation for any similar changes, I think special casing the migrations framework is a fair tradeoff.



--
Adam

Caio Ariede

unread,
Jun 18, 2020, 8:25:28 AM6/18/20
to django-d...@googlegroups.com

On 17 Jun 2020, at 16:56, Adam Johnson <m...@adamj.eu> wrote:

I think special casing the migrations framework is an avenue to explore, so I created this today and to my surprise it seems to work quite well: https://github.com/orf/django/commit/0a335f208cee1376c25eff55c6f866de122c7839.

That is a pretty neat solution. I think it would work quite well.

I really like that solution as well, it’s pretty fair :)



What about a way to warn users approaching the limit? I think a database check that detects AutoFields where the next ID is >50% of the way through would be a good solution. That's the kind of monitoring I've built in the past.

(50% might sound like it's near the limit, but the table is growing with an exponential curve, even quite shallow, it's quite close in terms of time.)


I think this is indeed a good to have.

Would we be able to provide instructions on how to migrate from SIGNED INT to UNSIGNED BIG INT? Perhaps a management command that outputs SQL like sqlmigrate?


--
Caio

Adam Johnson

unread,
Jun 18, 2020, 8:40:54 AM6/18/20
to django-d...@googlegroups.com
Would we be able to provide instructions on how to migrate from SIGNED INT to UNSIGNED BIG INT? Perhaps a management command that outputs SQL like sqlmigrate?

It would only be needed to add this to the model:
id = BigAutoField() 

Then migrations would pick it up, because the field is no longer auto_created.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.


--
Adam

Caio Ariede

unread,
Jun 18, 2020, 9:51:07 AM6/18/20
to django-d...@googlegroups.com
What about third party apps?

It would be great to migrate every AutoField to BigAutoField without the need to specify them manually.


--
Caio



Adam Johnson

unread,
Jun 18, 2020, 10:15:39 AM6/18/20
to django-d...@googlegroups.com
This would make the change only affect existing installs of third party apps. I think documentation on using RunSQL() in a migration (in a project app) could be warranted.



--
Adam

charettes

unread,
Jun 18, 2020, 10:18:35 AM6/18/20
to Django developers (Contributions to Django itself)
+1 from me as well but I don't think we should add any logic to the migration framework to handle the transition.

I think the release plan should be something along the following

Phase 1:
- New projects are generated with MODEL_DEFAULT_PK = 'django.db.models.BigAutoField'
- A system check or warning is emitted when the setting is not defined and MODEL_DEFAULT_PK default to 'django.db.models.AutoField' during the transition period. The warning should warn that the value will default to 'django.db.models.AutoField' in a future version of Django.
- An upgrade path is mentioned in the docs to mention that explicit ` id = AutoField()` must be assigned to preserve previous behaviour and avoid migrations. This one of the thing projects such as django-codemod could really help with[0]

Phase 2:
- settings.MODEL_DEFAULT_PK now defaults to 'django.db.models.BigAutoField'
- System check/warning about undefined MODEL_DEFAULT_PK setting is removed.
- Let the migration framework generate migrations just like it normally would for projects that didn't follow the documented migration path.
- Optionally add a system check that warns about usage of `AutoField` because of its possible overflow.

> I think we should restrict the setting between normal and big auto fields only. Allowing UUID's would be changing the type, with the potential for havoc with code incompalities throughout django. It's also not possible to migrate tables over to the new type.

I'm not sure I agree here. For folks that are setting a up a new project starting with UUIDField primary keys can be useful and if we're adding a setting for this purpose I think we should it make it as flexible as possible.


charettes

unread,
Jun 18, 2020, 10:25:52 AM6/18/20
to Django developers (Contributions to Django itself)
Email got sent too fast

I meant

> The warning should warn that the value will default to 'django.db.models.BigAutoField' in a future version of Django.

Regarding third party apps a solution could be to allow a per-app config option or encourage them to explicitly choose which primary key they use for their model but I'm not a big fan of baking logic in the migration framework to treat some fields in a special way.

Simon

Tom Forbes

unread,
Jun 28, 2020, 9:27:28 AM6/28/20
to Django developers (Contributions to Django itself)
I spent some time last week experimenting with my patch, the biggest issue I could find was that under some situations we add explicit casts to array fields in Postgres based on the expected PK field type (i.e `WHERE [1,2,3]::int @> [1,2,3]::bigint`) which causes the query to be invalid. The casts all look redundant however working out when they are redundant is pretty complex, so I'm not sure if this approach has any legs.

> Regarding third party apps a solution could be to allow a per-app config option or encourage them to explicitly choose which primary key they use for their model but I'm not a big fan of baking logic in the migration framework to treat some fields in a special way.

Auto-generated M2M tables are an issue here, we would need to add some option to ManyToManyField() to allow passing in a PK field type. But I like the approach of just saying "if you ship migrations, be explicit about your primary keys" as it seems pretty simple.

Overall I think your two phase plan seems like a good approach 🙌 - lets do it?

Tom Forbes

unread,
Jul 12, 2020, 9:44:19 AM7/12/20
to Django developers (Contributions to Django itself)
I have an initial MR for phase 1 here: https://github.com/django/django/pull/13179. It's lacking docs but I think the rest of the points are covered.

It still needs docs, but a few observations/thoughts:

`UUIDField` is not useful at all without a `default` being set. It's likely not safe to add a `uuid.uuid4` default to the field, so documentation will have to show creating a `UUIDField` subclass. This might cause big headaches with migrations though, if the field is moved around.

M2MFields are still possibly an issue - while you might want UUIDFields for all models you also might want plain `BigAutoField's` for M2M through tables for efficiency.

This error message[1] is not helpful and is more likely to crop up now. Is it worth adding a ticket to somehow improve the message?

If I change the `MODEL_DEFAULT_PK` setting to `django.db.models.UUIDField` and run a test on Postgres I end up with a weird situation:

from django.contrib.sites.models import Site
Site._meta.pk
<django.db.models.fields.UUIDField: id>
Site.objects.values_list("pk")
<QuerySet [(1,)]>


The "UUIDField.to_python" method is also never invoked. I'm not clear if this is something to do with the Django test suite or indicative of a deeper problem.

Reply all
Reply to author
Forward
0 new messages