Optimizing before applying migrations?

69 views
Skip to first unread message

Christian Hammond

unread,
Sep 11, 2015, 3:38:49 AM9/11/15
to Django developers (Contributions to Django itself)
Hi everyone,

We're working on upgrading our product from Django 1.6 to 1.8, and one of the biggest tasks is moving from Django Evolution to Django's migrations. Please correct me if I'm missing something, but from my testing and investigation into the code, it seems that the migrate command will always apply each migration file one-at-a-time, individually, rather than pre-optimizing all the changes that need to be made across all pending migrations. This results in a lot of unnecessary, potentially expensive SQL operations. I'm wondering if there are plans to optimize this to reduce migration time.

For example, say we have a Book model. Over the course of 5 migrations, we've added 3 fields, changed one field, and removed one more. (Maybe there are other migrations for other models being applied in-between these migrations.)

Now let's say either a new database is being created for the first time, or an existing database is being upgraded from a state prior to the creation of the Book model. My expectation is that the database would only need to issue a single CREATE TABLE statement (and possibly follow-up ALTER TABLE ADD CONSTRAINTs). However, it's looking like it's instead executing a CREATE TABLE and then multiple ALTER TABLEs (even multiple ones per field within one migration).

While not too terrible for a brand new table, it is a problem for an existing table with a lot of data. If you have 5 new field changes to an existing table containing a million rows and you're running on MySQL (for instance), you'll end up with 5 ALTER TABLEs instead of one, causing MySQL to dump data/alter table/import data 5 times.

Django Evolution used to do this as well, and it made for very slow database migrations for tables with a lot of data. We had complaints of multi-day-long upgrades from some big users. Nowadays, it contains an optimizer (https://github.com/beanbaginc/django-evolution/blob/master/django_evolution/mutators.py#L377) that first gathers all the operations that need to be performed, consolidates them whenever possible (such as rolling up any follow-up ALTER TABLEs into an earlier CREATE TABLE, filters out anything table-related for a table that will just be deleted later, and consolidates ALTER TABLEs into as few statements as possible.). This has done wonders for database migration time.

I know there's squashmigrations, but that's only really useful if you're in full control over all affected code and databases. (The documentation states, "if you’re a third-party project, just ensure your users upgrade releases in order without skipping any," which isn't feasible in shipping products.). It seems that the same optimization work could be done at application time though, couldn't it?

Are there already plans to add an optimizer for the full migration? What are the big issues blocking it?

Thanks,

Christian

Andrew Godwin

unread,
Sep 11, 2015, 11:16:48 AM9/11/15
to django-d...@googlegroups.com
Hi Christian,

Squashmigrations contains just the sort of optimiser you mention, complete with all the examples you mention (though because it's foreign-key-aware, it sometimes can't optimise everything away, as it can only see a single app).

It's not used at execution time, however, as part of the design of migrations is that everything gets executed the same everywhere; that said, it would not be infeasible to add a command line option or a third-party command that did do this, if the project was willing to potentially end up with different column orders in tables and so on. There's nothing blocking it from what I know - it shouldn't be too hard to take the loaded operation set and run the optimiser on it before it's applied.

Andrew

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/6e91e997-4e17-48fe-9ca4-3f34accfca41%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages