Integrating migrations with the check framework

190 views
Skip to first unread message

Tom Forbes

unread,
Dec 22, 2020, 5:59:08 PM12/22/20
to Django-d...@googlegroups.com
Hey,
A fairly common problem with large and/or highly trafficked web applications is migration safety: you want your migrations to be non-locking and instantaneous. The specifics are very database dependent but in Postgres simply altering the nullability of a column can incur serious downtime depending on the size of the table in question. The same thing goes for adding a new non-nullable column to a table or even just creating an index without the `CONCURRENTLY` option.

Code review is one way of catching these issues but an automated solution that runs as part of the test suite would be the preferred solution. I don’t think using some kind of linter for this is a good idea as it doesn’t have the context to decide if it _is_ a safe migration (i.e removing `NOT NULL` from a column in the same migration that creates the column is safe). A linter also doesn’t know which migrations are pending and which have been applied.

Rails has the pretty nice strong_migrations Gem (https://github.com/ankane/strong_migrations) which attempts to detect dangerous migrations. Unless I’m missing something right now it seems quite hard to add automated checks like this to Django migrations that can inspect the full set of “pending migration operations". We have the `pre_migrate` signal that takes a “plan” argument with the scary sounding caveat that the passed object has no public API, but using a signal feels kind of wrong. It should be part of a system check - i.e a warning that says “your app may die if this migration is applied”.

I’d like to make this simpler to implement on a per-project basis (or in a third party app) but I’m not sure how. It should be a system check but the only suitable hook we have is the generic `register()` callback that takes a set of `app_configs`, and I’d expect going from that to a pending set of operations would be non trivial. A user-facing interface for this might be a function that is called for every specific type of migration operation with the operation itself and a list of all operations that would be applied of “migrate” was run.

```
@some_migration_callback_decorator(AlterField)
def check_null(operation, pending_operations):
    If operation.is_going_to_not_null:
        if not table_being_created(operation.table_name, pending_operations):
            return Warning(f’Setting {operation.field_name} to NOT NULL may cause downtime’)
```

I’m not sure if Django should ship migration warnings itself but having the ability for projects to enforce arbitrary constraints on their migrations might be an interesting feature?

Slightly related to https://code.djangoproject.com/ticket/31700, where we want to add more contextual/coloured outputs for dangerous operations.

Tom

Adam Johnson

unread,
Dec 23, 2020, 4:18:48 AM12/23/20
to django-d...@googlegroups.com
Hi Tom,

I love this idea and would like to help developers write safe migrations too.

You didn't mention this pre-existing project: https://github.com/3YOURMIND/django-migration-linter . It might be worth checking how that works.

I'm not sure what any public API could look like here. In django-linear-migrations ( https://github.com/adamchainz/django-linear-migrations ) I tapped a bit into the MigrationLoader class, which is undocumented but fairly stable. Perhaps we could just document a few more of the migration internals?

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAFNZOJNoi583wUoQp%2BRBB_%2B3i_vqSPRbVNwhiU_OG7V2eL8x3g%40mail.gmail.com.


--
Adam

Paveł Tyślacki

unread,
Dec 23, 2020, 11:01:45 AM12/23/20
to django-d...@googlegroups.com
Want to share lib I work previously for migration safety with postgres some time ago: https://github.com/tbicr/django-pg-zero-downtime-migrations there are also checker and replacement of unsafe mirations with safe replacement.

ср, 23 дек. 2020 г. в 12:18, Adam Johnson <m...@adamj.eu>:

Ryan Hiebert

unread,
Dec 23, 2020, 12:33:03 PM12/23/20
to django-d...@googlegroups.com
This is an interesting discussion that is separate from, but related to, django-safemigrate, which we use to separate which migrations may run before or after deployment. I intend follow along with this discussion, to see if there are reasonable ways to identify when these zero-downtime operations should be run compared with deployments.


Tom Forbes

unread,
Dec 23, 2020, 1:02:55 PM12/23/20
to django-d...@googlegroups.com
Thanks for all the input here!

I didn’t know about django-migration-linter, and it does seem quite interesting. It works by looking at the complete set of SQL statements that would be run, and running regular expressions/string comparisons on it (https://github.com/3YOURMIND/django-migration-linter/blob/master/django_migration_linter/sql_analyser/base.py#L59-L62). I don’t hate this idea, but I think that having direct access to the migration objects themselves could greatly simplify the implementation.

Django-pg-zero-downtime-migrations also looks interesting and is more in-line with what I was imagining (https://github.com/tbicr/django-pg-zero-downtime-migrations/blob/master/django_zero_downtime_migrations/backends/postgres/schema.py) - it overrides the schema editor to detect unsafe migrations, and provides a nice way to abort migrations that take locks for too long. However I feel that the implementation could be greatly simplified with some support for Django - you shouldn’t really need to override the schema editor, Django should pass you a list of operations and ask “is this safe?”. The results can be reported via the checks framework which allows application owners can decide to treat these warnings as errors or silence them as needed. It also brings those warnings front-and-centre while developing, rather than having it fail when running “migrate” later on in the deployment process.

Django-safemigrate is very related and something I wanted to cover in a later message. I feel that some of this functionality could (or even should) be integrated with Django, as migrations in “real” apps are usually a two step process: you run the schema migrations, which might add a nullable column, deploy your app, then run data migrations to populate the column. At least bringing the concept of a pre-deploy and a post-deploy migration to Django would be pretty useful I think.

I’d say the open questions so far are:
  1. Does it make sense to add a specific system check registration function for migrations? If not, how do we expose this to users wishing to write system checks?
  2. Are we comfortable with creating a public API for some of the migrations internals?
  3. Should we (or can we?) make the distinction between pre-deploy and post-deploy migrations?

Another consideration with all this is that for a non-trivial amount of applications all this doesn’t really matter. If you’ve got an internal app that is used during business hours almost none of this applies - you can get away with downtime and “dangerous” migrations, and similarly if you are starting a new project you don’t want Django to start complaining about adding a non-null field to a table that has no rows in a project that isn’t even deployed yet. This is why I think the MVP of this would just be to allow users to write their own business-specific checks for migrations and allow third party apps to more easily integrate with the migration framework, rather than anything more complicated.

Tom

David Wobrock

unread,
Dec 24, 2020, 11:07:01 AM12/24/20
to django-d...@googlegroups.com

Hi everyone,

I agree that there is room for improvement around migrations in Django with respect to the expected downtime/table locking. But I don’t know whether this should in Django itself or handled by a 3rd party library.

I’m maintaining the django-migration-linter (the very idea was to have an automated tool since reviewing migrations is not something new Django devs can do easily), which currently tackles another issue than the one you mention. It tries, in a best effort manner as you’ve seen through the SQL checks, to understand if the migration is introducing a backward incompatible change to your schema, which will desynchronize your code and database schema (so you’ll get errors when you deploy one without the other – and you’ll have a hard time doing a rollback).

It’s a different aspect, compared to checking that the forward migration will be instant and non-locking. However, it is planned to add this to the linter too (https://github.com/3YOURMIND/django-migration-linter/issues/99).

You’ve noticed that it’s based on the SQL strings, which has advantages and drawbacks. The checks are easy to implement (just a set of rules basically), it could be completely Django-agnostic and it can lint RunSQL operations. But it’s a bit harder to represent a state when multiple SQL statements are executed, to check if they cancel each other out for instance.

 

An idea about something that could be in Django would be, somehow like the “database features”, marking for each migration operation (AlterField, AddField, …) on each database backend, if they can potentially mean downtime. It would be a hint about the risk of the migration, and display a warning if necessary.
Going one step further, for some operations, Django could query the number of rows of the corresponding table and have some risk heuristics. Basically: “you’re adding a column to a table with less than 100 rows, this should not be risky” or “you’re adding a column to a table with millions of rows, be careful!”.
However, as these heuristics would be rather arbitrary, I’m not sure that the Django project would want to make this kind of choices – so it might be a better fit for a library. (what I’m describing is more or less the ultimate goal the django-migration-linter)

Hope this gives some insights about the project and how it could help on this topic :)

Cheers,
David

Paveł Tyślacki

unread,
Dec 25, 2020, 3:27:59 PM12/25/20
to django-d...@googlegroups.com
A few thoughts: migration downtime depends from different points:

- changes itself (existence of exclusive locks in migration especially with other changes in same transaction, luck of backward incompatible migrations between current and new state)
- amount of user data
- load to application
- number of replicas and application instances runned (you probably cannot apply migrations and update code on all of instances simultaneously, eg. this process can be asynchronous)

There also can be different tradeoffs trying to solve different issues on the way to avoid downtime.

Anyway strong apis to detect potential bad cases and replace current migrations sounds like a good idea.

Paveł

чт, 24 дек. 2020 г. в 19:06, David Wobrock <david....@gmail.com>:
Reply all
Reply to author
Forward
0 new messages