Hi everyone,
We're working on upgrading our product from Django 1.6 to 1.8, and one of the biggest tasks is moving from Django Evolution to Django's migrations. Please correct me if I'm missing something, but from my testing and investigation into the code, it seems that the migrate command will always apply each migration file one-at-a-time, individually, rather than pre-optimizing all the changes that need to be made across all pending migrations. This results in a lot of unnecessary, potentially expensive SQL operations. I'm wondering if there are plans to optimize this to reduce migration time.
For example, say we have a Book model. Over the course of 5 migrations, we've added 3 fields, changed one field, and removed one more. (Maybe there are other migrations for other models being applied in-between these migrations.)
Now let's say either a new database is being created for the first time, or an existing database is being upgraded from a state prior to the creation of the Book model. My expectation is that the database would only need to issue a single CREATE TABLE statement (and possibly follow-up ALTER TABLE ADD CONSTRAINTs). However, it's looking like it's instead executing a CREATE TABLE and then multiple ALTER TABLEs (even multiple ones per field within one migration).
While not too terrible for a brand new table, it is a problem for an existing table with a lot of data. If you have 5 new field changes to an existing table containing a million rows and you're running on MySQL (for instance), you'll end up with 5 ALTER TABLEs instead of one, causing MySQL to dump data/alter table/import data 5 times.
Django Evolution used to do this as well, and it made for very slow database migrations for tables with a lot of data. We had complaints of multi-day-long upgrades from some big users. Nowadays, it contains an optimizer (
https://github.com/beanbaginc/django-evolution/blob/master/django_evolution/mutators.py#L377) that first gathers all the operations that need to be performed, consolidates them whenever possible (such as rolling up any follow-up ALTER TABLEs into an earlier CREATE TABLE, filters out anything table-related for a table that will just be deleted later, and consolidates ALTER TABLEs into as few statements as possible.). This has done wonders for database migration time.
I know there's squashmigrations, but that's only really useful if you're in full control over all affected code and databases. (The documentation states, "if you’re a third-party project, just ensure your users upgrade releases in order without skipping any," which isn't feasible in shipping products.). It seems that the same optimization work could be done at application time though, couldn't it?
Are there already plans to add an optimizer for the full migration? What are the big issues blocking it?
Thanks,
Christian