slow migrations

Peter Baumgartner

Jan 7, 2016, 2:07:24 PM
to django-d...@googlegroups.com
Hi there, I know this has been discussed in the past, but I'm not sure
where things stand today regarding the speed of migrations. On
non-trivial projects, it's common to hear about migrations that take
5-10 minutes to run. I've even heard reports about them taking over an
hour.

I know about a couple of issues [1] that have been closed around
optimizing the process, but it still feels like there's a serious
issue here. The project I'm working on has ~80 migrations and ~300
tables. It uses Django 1.8 and I backported the 1.9 optimization [2]
to test it. It's still ~5 minutes to run a makemigrations or migrate
command. For those coming from South and previous Django versions,
this is a major degradation. It makes development pretty painful and
regular testing untenable.

Here is the cProfile of a makemigrations command:
https://gist.github.com/ipmb/e65159f7315b9645841b
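For anyone who wants to produce a comparable profile, here is a minimal sketch using only the standard library's cProfile and pstats modules. `slow_step` and `profile_call` are hypothetical names standing in for the expensive command. For a whole management command, `python -m cProfile -o makemigrations.prof manage.py makemigrations` writes a stats file that `pstats.Stats("makemigrations.prof")` can load and sort the same way.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Hypothetical stand-in for an expensive operation such as building migration state.
    return sum(i * i for i in range(n))


def profile_call(func, *args, top=5, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the callers of the hot spots appear first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_step, 100_000)
    print(report)
```

Sorting by cumulative time is a deliberate choice: it surfaces entries like `ModelBase.__new__` that are expensive mostly because of what they call.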

Is there an issue open for this at the moment? Is there any other info
I can provide to help?

Thanks!

[1] https://code.djangoproject.com/ticket/23745
https://code.djangoproject.com/ticket/24743
[2] https://github.com/django/django/commit/5aa55038ca9ac44b440b56d1fc4e79c876e51393

-- Pete

Tim Graham

Jan 7, 2016, 2:20:16 PM
to Django developers (Contributions to Django itself)
As far as I know, no one is actively working on this area at the moment.

There is at least one open issue:
https://code.djangoproject.com/ticket/22608

You can look through all open migrations tickets too:
https://code.djangoproject.com/query?status=assigned&status=new&component=Migrations&stage=Accepted&col=id&col=summary&col=status&col=owner&col=type&col=version&desc=1&order=id

Florian Apolloner

Jan 7, 2016, 3:10:26 PM
to Django developers (Contributions to Django itself)
Just looking at the topmost three entries:
_expire_cache spends 18 seconds in delattr; can you try replacing that with del self.__dict__[cache_key]? Maybe also try moving self.__dict__ into a local variable, since it will be referenced a lot.
subclass_exception seems awfully slow; maybe there are some creative ways to speed it up?
It would also be interesting to know where the entries from collections.py come from, and to replace them with more efficient structures.
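A minimal sketch of the suggested micro-optimization. It assumes the cached values live directly in the instance `__dict__` (as with a `cached_property`-style cache) and that the class defines no custom `__delattr__`; under those assumptions both expire methods have the same effect. `CachedOptions` and its cache keys are hypothetical, not Django's actual `Options` class, and the timing helper is there so the difference can be measured rather than assumed.

```python
import timeit


class CachedOptions:
    """Hypothetical object that caches computed values in its instance __dict__."""

    CACHE_KEYS = ("fields", "related_objects", "fields_map", "managers")

    def fill_cache(self):
        for key in self.CACHE_KEYS:
            setattr(self, key, [])

    def expire_cache_delattr(self):
        # Original style: delattr() goes through the attribute-deletion protocol.
        for key in self.CACHE_KEYS:
            if key in self.__dict__:
                delattr(self, key)

    def expire_cache_dict(self):
        # Suggested style: fetch self.__dict__ once, then delete keys directly.
        cache = self.__dict__
        for key in self.CACHE_KEYS:
            if key in cache:
                del cache[key]


def compare(number=100_000):
    """Return (delattr_seconds, dict_seconds) for repeated fill-then-expire cycles."""
    opts = CachedOptions()

    def cycle(expire):
        opts.fill_cache()
        expire()

    t_delattr = timeit.timeit(lambda: cycle(opts.expire_cache_delattr), number=number)
    t_dict = timeit.timeit(lambda: cycle(opts.expire_cache_dict), number=number)
    return t_delattr, t_dict


if __name__ == "__main__":
    slow, fast = compare()
    print(f"delattr: {slow:.3f}s  dict: {fast:.3f}s")
```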

The rest is as Tim said.

Cheers,
Florian

Florian Apolloner

Jan 7, 2016, 3:11:12 PM
to Django developers (Contributions to Django itself)
Also, is there any chance that I can get access to this project to profile a little bit more and identify some hotspots?

Peter Baumgartner

Jan 7, 2016, 3:49:11 PM
to django-d...@googlegroups.com
It looks like there are some other fixes in 1.9 that weren't covered
by my monkeypatch. I upgraded the project and included a new cProfile
in the ticket https://code.djangoproject.com/ticket/22608#comment:23.
It shaved 100s off, but it still takes 2.5 minutes to create an empty
migration.

Florian, I'll contact you off-list about the project.

-- Pete

Florian Apolloner

Jan 7, 2016, 4:36:45 PM
to Django developers (Contributions to Django itself)

Okay, my base time is currently 95 seconds, of which more than 60 seconds are spent in ModelBase.__new__. I've attached a picture of the profile run; there are no really obvious hot spots, so shaving time off there will be hard, I fear.

Cheers,
Florian
Screenshot from 2016-01-07 22-31-45.png

Aymeric Augustin

Jan 7, 2016, 5:04:38 PM
to django-d...@googlegroups.com
As far as I understand, the CPU cost comes from generating a full set of model classes for each step of the migration history. That’s consistent with the profile sent by Florian.

I usually end up throwing away the migration history and regenerating a new set of migrations when I get to that point. This requires truncating the django_migrations table manually and faking the new set of migrations.

If the project doesn’t use data migrations, squashmigrations may achieve the same effect. Sadly real-life projects tend to have data migrations whose only purpose is to run once in production. They prevent full squashing.

-- 
Aymeric.

Andrew Godwin

Jan 7, 2016, 5:08:46 PM
to django-d...@googlegroups.com
Yes, it's basically a fundamental design flaw of having the migrations represented this way - it makes autodetection and code generation very accurate, but at the expense of calculation time. There is some optimisation work that can be done to try and avoid building intermediate states, but it's a pretty complex code optimisation problem.

I think it's an acceptable flaw provided you take the time to squash every so often, and I prefer it to South's flaw of sometimes leaving dependencies out and creating tables incorrectly, but I can understand that it's kind of ridiculously slow sometimes.

Andrew

Carl Meyer

Jan 7, 2016, 5:12:28 PM
to django-d...@googlegroups.com
On 01/07/2016 03:03 PM, Aymeric Augustin wrote:
> As far as I understand, the CPU cost comes from generating a full set of
> model classes for each step of the migration history. That’s consistent
> with the profile sent by Florian.
>
> I usually end up throwing away the migration history and regenerating a
> new set of migrations when I get to that point. This requires truncating
> the django_migrations table manually and faking the new set of migrations.
>
> If the project doesn’t use data migrations, squashmigrations may achieve
> the same effect. Sadly real-life projects tend to have data migrations
> whose only purpose is to run once in production. They prevent full
> squashing.

FWIW, I've also done a hybrid of these two options, where I generate
fresh initial migrations rather than actually using squashmigrations
(for the same reason, to avoid problems with data migrations), but then
I still keep the old migrations around for a transition period and use
the `replaces` attribute (the same one added automatically by
`squashmigrations`) on the new initial migrations. Then later (once the
new migrations are deployed everywhere) delete the old migrations and
the `replaces` attr.

Effectively this is similar to what you're doing, it just takes
advantage of the `replaces` feature to avoid manually fiddling with the
migrations table on each deployment.
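Carl's transition trick relies on how a migration carrying a `replaces` list is resolved. As far as I understand Django's `MigrationLoader`, the rule is roughly this: if every replaced migration is already recorded as applied, the new migration also counts as applied; if none are, the new migration is used instead of the old chain; if only some are, the old chain is kept until it catches up. The function below is a simplified, hypothetical model of that rule (its name and the sample keys are made up), not Django's code.

```python
def resolve_replacement(replacing, replaces, applied):
    """Simplified model of how a migration with a `replaces` list is treated.

    replacing: key of the new migration, e.g. ("shop", "0001_squashed")
    replaces:  keys of the old migrations it stands in for
    applied:   keys already recorded in the django_migrations table
    Returns (keys_to_keep_in_graph, keys_to_treat_as_applied).
    """
    statuses = [key in applied for key in replaces]
    if all(statuses):
        # Existing deployment: the old chain already ran, so the new
        # migration is considered applied and nothing is executed.
        return [replacing], {replacing}
    if not any(statuses):
        # Fresh database: run the new migration instead of the old chain.
        return [replacing], set()
    # Partially migrated: keep using the old migrations until they catch up.
    return list(replaces), set()
```

This is why the old files must stay around during the transition period: a partially migrated database still needs them.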

If I (or anyone else) ever gets around to it, I think
https://code.djangoproject.com/ticket/24109 could make the actual
squashmigrations command usable for real projects, by letting you just
mark certain migrations as "please ignore when squashing".
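To illustrate why marking data migrations as ignorable helps, here is a toy squasher built on a hypothetical `Op` type with an `elidable` flag. Flagged operations are dropped first, which lets an add/remove pair for a temporary field cancel out; an unflagged `RunPython` between them blocks that. (Django later added an `elidable` argument to `RunPython` and `RunSQL` along these lines; its real optimizer is considerably more involved than this sketch.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical, simplified migration operation."""
    kind: str               # e.g. "CreateModel", "AddField", "RunPython"
    target: str             # model, field, or a short description
    elidable: bool = False  # True means "please ignore when squashing"


def squash(operations):
    # Step 1: drop operations the author marked as ignorable when squashing.
    kept = [op for op in operations if not op.elidable]
    # Step 2: cancel an AddField immediately followed by a RemoveField of the
    # same field; any other operation in between prevents the cancellation.
    result = []
    for op in kept:
        if (result and op.kind == "RemoveField"
                and result[-1].kind == "AddField"
                and result[-1].target == op.target):
            result.pop()
        else:
            result.append(op)
    return result
```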

Carl

Peter Baumgartner

Jan 8, 2016, 12:51:13 AM
to django-d...@googlegroups.com
Reporting back on some additional findings for what it's worth.
SmileyChris dumped and recreated the project in question's migrations,
manually ordering them to minimize dependencies. It was not a major
reduction in total migrations: 82 to 58 (mostly third-party migrations
and initials), but the time to create an empty migration dropped
massively. It went from 260s on 1.8.6 (160s on 1.9.1) to 2s on both
versions.

I'm not sure what to make of it, but the slowness I was seeing doesn't
seem to be related so much to the number of migrations as, perhaps, to
the contents of the migration files. Is it possible that having lots of
cross-app migration dependencies could be the cause?
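One way to test that hypothesis is to count dependency edges that cross app boundaries. The sketch below assumes the migration graph has already been extracted into a plain dict mapping `(app_label, migration_name)` to that migration's `dependencies` list; the function name and the sample data in the usage are hypothetical.

```python
from collections import Counter


def cross_app_dependencies(graph):
    """Count dependency edges whose source and target apps differ.

    graph maps (app_label, migration_name) -> list of (app_label, name)
    dependencies, the same shape as a Migration.dependencies list.
    Assumption: it could be built from something like
    {key: m.dependencies for key, m in MigrationLoader(None).disk_migrations.items()}.
    """
    counts = Counter()
    for (app, _name), deps in graph.items():
        for dep_app, _dep_name in deps:
            if dep_app != app:
                counts[(app, dep_app)] += 1
    return counts
```

Sorting the result with `counts.most_common()` shows which app pairs are most tightly coupled.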

Thanks as always!

-- Pete

Markus Holtermann

Jan 8, 2016, 6:06:10 AM
to django-d...@googlegroups.com
Yes, the more relationships your models have, the more time-expensive migrations become. Adding/Altering/Removing a field (ForeignKey or not) always requires all related models to be recreated. See https://github.com/django/django/blob/master/django/db/migrations/state.py#L53 for the reasoning behind that.
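To see why heavily related models are expensive, here is a simplified, hypothetical model of "recreate all related models": starting from the changed model, follow relations in both directions until no new models are found. Django's own logic lives in the `state.py` linked above and also accounts for model inheritance; the function name and sample models here are made up.

```python
from collections import deque


def models_to_rerender(relations, changed):
    """Return every model reachable from `changed` through relations.

    relations maps a model name to the set of models it points at (FK, M2M).
    Edges are followed both ways: a model pointing at the changed model holds a
    reference to its class, so it has to be rebuilt as well.
    """
    # Build an undirected adjacency map from the directed relations.
    neighbours = {}
    for model, targets in relations.items():
        for target in targets:
            neighbours.setdefault(model, set()).add(target)
            neighbours.setdefault(target, set()).add(model)
    # Breadth-first search collects the whole connected group.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

In a densely connected schema this set quickly approaches "every model", so each field change costs roughly a full re-render.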

I have an idea / PoC of how to speed up that model reloading but I'm not quite there yet. I hope to have time to investigate and try out my idea over the next months.

/Markus

Hugo Osvaldo Barrera

Jan 8, 2016, 9:56:30 PM
to django-d...@googlegroups.com
I do the exact same thing (it's been kind of "manually squashing
migrations" for me).
I'm wondering if a PR with a new command to do this would be accepted
(resetmigrations?)

In my case, once data migrations have run in staging/production, they're
useless and can be ignored forever, so there's no point in keeping them
in squashed migrations months later.

--
Hugo Osvaldo Barrera

Shai Berger

Jan 9, 2016, 11:30:01 AM
to django-d...@googlegroups.com
On Saturday 09 January 2016 04:56:11 'Hugo Osvaldo Barrera' via Django
developers (Contributions to Django itself) wrote:
>
> In my case, once data migrations have run in staging/production, they're
> useless and can be ignored forever, so there's no point in keeping them
> in later squashed migrations months later.
>

There are two kinds of data migrations, generally speaking.

One is the kind that is needed to facilitate some schema migrations. Changing
the type of some field, for example, usually involves creating the new field
(schema migration), transforming the data from the old field to the new (data),
and then removing the old field (and perhaps some renaming; schema). This kind
of migration can, indeed, simply be removed when squashing.

The other is migrations which *create* data -- fill in tables for database-
implemented-enums, for example. If you remove these, you are going to break
your tests (if you do this and haven't broken your tests, your tests are
missing -- I'd go as far as calling them broken).

The second kind is quite common. Having a built-in command that resets
migrations and ignores them is, IMO, encouragement to skip testing, and I
think we shouldn't do that. Some notion (as Carl, I think, mentioned) of
"squashable data migrations" -- essentially, telling the two kinds apart --
would be helpful, but would not solve the problem completely, because, ultimately,
the second kind exists. We need to figure out how to help users deal with them.

Shai.

charettes

Jan 9, 2016, 2:05:43 PM
to Django developers (Contributions to Django itself)
Shai, I think I have a viable solution for the second kind of data
migration you are mentioning.

https://code.djangoproject.com/ticket/26064#ticket

Simon

Carl Meyer

Jan 10, 2016, 1:10:05 AM
to django-d...@googlegroups.com
On 01/09/2016 09:29 AM, Shai Berger wrote:
> There are two kinds of data migrations, generally speaking.
>
> One is the kind that is needed to facilitate some schema migrations. Changing
> the type of some field, for example, usually involves creating the new field
> (schema migration), transforming the data from the old field to the new (data),
> and then removing the old field (and perhaps some renaming; schema). This kind
> of migrations, indeed, can be just removed when squashing.
>
> The other is migrations which *create* data -- fill in tables for database-
> implemented-enums, for example. If you remove these, you are going to break
> your tests (if you do this and haven't broken your tests, your tests are
> missing -- I'd go as far as calling them broken).

I consider this second kind of data migration to be marginally smelly,
and avoid it whenever possible. If an enum's possible values will change
infrequently, I'd much rather have it in code than in the database
(that's almost certainly more efficient, too). If its values will change
frequently, then it's less likely that I have a strong "default set",
and I'd be more likely to leave the creation of initial values to be an
explicit part of deployment setup (and have the tests also explicitly
create the values they need), rather than use a data migration to
enforce a particular initial set.
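A minimal sketch of keeping a rarely changing enum in code instead of in a table filled by a data migration. `Priority` and `PRIORITY_CHOICES` are hypothetical names; the list has the `(stored value, label)` shape that a Django field's `choices` option expects.

```python
import enum


class Priority(enum.Enum):
    """Hypothetical enum whose values change rarely, so it lives in code."""
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


# (stored value, human-readable label) pairs, in definition order.
PRIORITY_CHOICES = [(p.value, p.name.title()) for p in Priority]

# A model field could then be declared as, for example:
#     priority = models.CharField(max_length=10, choices=PRIORITY_CHOICES,
#                                 default=Priority.NORMAL.value)
```

Editing the choices still produces an ordinary AlterField migration, since choices are part of a field's definition, but no data migration is needed and tests get the values for free.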

That said, I do recognize that people use data migrations to create
initial data, and that there are cases where it makes sense.

> The second kind is quite common. Having a built-in command that resets
> migrations and ignores them is, IMO, encouragement to skip testing, and I
> think we shouldn't do that.

I agree that a "resetmigrations" command is dangerous. For my own usage
I'd be more worried about losing custom SQL schema or indexes added via
RunSQL than I would about initial-data migrations, but either way it's
certainly potentially lossy. I'd only be -0 on adding it, but it would
need to come with strong warnings in the documentation.

> Some notion (as Carl, I think, mentioned) of
> "squashable data migrations" -- essentially, telling the two kinds apart --
> would be helpful. but not solve the problem completely, because, ultimately,
> the second kind exists. We need to figure out how to help users deal with them.

Hmm, I'm not sure if there is any general way to "help users deal with"
non-squashable RunPython/RunSQL migrations -- did you have any
particular ideas in mind? If they aren't squashable, they aren't
squashable, and I don't see what Django can do about that that doesn't
require the user to think carefully about their actual migration set and
manually modify migrations. You can't even do something like move
initial-data migrations to the end of the set to allow optimizing away
more schema alterations, because the migration may create the initial
data in a way that only works at that particular moment in schema history.

I think we'll be doing pretty well (it would certainly make a big
difference for my projects) if we can just allow users to identify
migrations that can be ignored when squashing.

Carl

René Fleschenberg

Jan 10, 2016, 1:50:42 PM
to django-d...@googlegroups.com
Hi,

On Saturday 09 January 2016 18:29:32 Shai Berger wrote:
> The other is migrations which *create* data -- fill in tables for
> database- implemented-enums, for example. If you remove these, you are
> going to break your tests (if you do this and haven't broken your tests,
> your tests are missing -- I'd go as far as calling them broken).

I agree with Carl. I avoid Django migrations for this kind of data.

Obviously, you don't get dependency handling that way, but I think this is
typically not a problem for this kind of data.

I wonder if this approach should actually be an official recommendation.

--
René Fleschenberg

Shai Berger

Jan 10, 2016, 3:07:45 PM
to django-d...@googlegroups.com
On Sunday 10 January 2016 08:09:33 Carl Meyer wrote:
> On 01/09/2016 09:29 AM, Shai Berger wrote:
> > There are two kinds of data migrations, generally speaking.
> >
> > [...]
> >
> > The other is migrations which *create* data -- fill in tables for
> > database- implemented-enums, for example. If you remove these, you are
> > going to break your tests (if you do this and haven't broken your tests,
> > your tests are missing -- I'd go as far as calling them broken).
>
> I consider this second kind of data migration to be marginally smelly,
> and avoid it whenever possible.

It should be noted that this kind of migration is what we recommend officially
as a substitute for initial_data fixtures:

https://docs.djangoproject.com/en/1.8/howto/initial-data/#automatically-loading-initial-data-fixtures

Shai.

Venelin Stoykov

Feb 7, 2017, 6:09:25 PM
to Django developers (Contributions to Django itself)
Hello all,

I had the same problem with a Django 1.8 project with 76 applications and 229 migrations in total. Even when the project was fully migrated, running manage.py migrate took 90 seconds to finish after printing the message "No migrations to apply."

After upgrading to Django 1.9, the time was about 40 seconds (roughly halved, but still a lot).

I looked at the migrate command and added some prints to see where the slowness comes from. It turned out to be the step where Django checks whether there are model changes that are not yet reflected in migrations (Django does this not only when you run makemigrations but also when you run migrate).

You can work around this by running: manage.py migrate --verbosity=0

But when I'm developing I want some verbosity, so I started investigating.

I patched django.db.migrations.graph.MigrationGraph.make_state to print the time needed to build the state for every migration. The first 53 migrations took under 1 ms each, but every subsequent one needed more than 100 ms (some even more than 3 s). Individual migrations were about twice as fast in Django 1.9 as in Django 1.8, but the problem still started at the same migration.
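A patch like the one described can be sketched as a generic timing wrapper. This is not Venelin's actual patch: `timed` is a hypothetical helper, and applying it to `MigrationGraph.make_state` (shown only in a comment) is an assumption about how one might use it in a development shell.

```python
import functools
import time


def timed(func, report=print):
    """Wrap func so that every call reports how long it took."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so slow failures are reported too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            report(f"{func.__qualname__} took {elapsed_ms:.1f} ms")

    return wrapper


# Hypothetical use before running the command (development only):
#     from django.db.migrations.graph import MigrationGraph
#     MigrationGraph.make_state = timed(MigrationGraph.make_state)
```

Passing `report` as a parameter keeps the wrapper testable and lets the output go to a logger instead of stdout.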

Then I deleted all migrations for the problematic app and created a new initial migration. Looking at the state-building timings again, 122 migrations now took under 1 ms each (including migrations that previously took more than 1 s).

I then repeated this "squashing" procedure (actually deleting and recreating the initial migration) for the slow apps (4 of the 76 apps in total), and now the state for all 209 of my migrations is built in about 1 ms (even on Django 1.8).

In the problematic migrations I saw migrations.RenameModel among the operations. Perhaps something in the rename procedure (or something else, I don't know) is preventing subsequent migrations from building their state faster.

I still have data migrations (migrations.RunPython) and they do not slow down making the state from migrations.

The project is large and depends on many third-party apps, so upgrading it to Django 1.10 or even 1.11 to test whether it would be fast without modifying the migrations is not an easy task, and I don't have enough spare time to do it.

I hope this information helps with further investigation into why building the model state from migrations is slow.

Regards,
Venelin Stoykov

charettes

Feb 7, 2017, 6:22:24 PM
to Django developers (Contributions to Django itself)
Hello Venelin,

Thanks for taking the time to investigate the source of the slowdown you were experiencing on your project.

I know you mentioned it would require a lot of effort to port your project to Django 1.10, but I'm pretty sure recent changes to RenameModel.state_forwards would solve the remaining issues you pointed out.

Django 1.10 avoids rendering model states unless absolutely required (which is never the case when running makemigrations), and Django 1.11 completely stopped rendering models in RenameModel.state_forwards[1].
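The idea of "only render when required" can be illustrated with a toy, hypothetical project state (not Django's `ProjectState`): operations edit cheap model descriptions, and the expensive step of building classes happens only when `apps` is first accessed. A rename just edits the descriptions and discards any stale rendering, so a run of operations that never needs real classes never pays for them.

```python
import functools


class ToyProjectState:
    """Hypothetical state that renders model classes lazily."""

    def __init__(self, models):
        # name -> dict of field definitions (cheap, plain data)
        self.models = {name: dict(fields) for name, fields in models.items()}
        self.render_count = 0

    @functools.cached_property
    def apps(self):
        # Stand-in for the expensive rendering step: one class per model.
        self.render_count += 1
        return {name: type(name, (), dict(fields)) for name, fields in self.models.items()}

    def rename_model(self, old, new):
        # Like a state_forwards() that only touches descriptions: no rendering.
        self.models[new] = self.models.pop(old)
        # Discard any cached rendering so it is rebuilt on next access.
        self.__dict__.pop("apps", None)
```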

Simon