Possible solution (or "how I'd build it today if there was no existing code in Django core"):
a. Make migrations part of the project and not individual apps. This takes care of problem 3 above.
b. Prefix individual migration files with a UTC timestamp (20161105151023_add_foo) to provide a strict sorting order. This removes the depsolving requirement and takes care of 1 and 2. By eliminating those it makes 4 kind of obsolete as squashing migrations would become pointless.
c. Have reusable apps provide migration templates that Django then copies to my project when "makemigrations" is run.
d. Maintain a separate directory for each database connection.
e. Execute all migrations in alphabetical order (which means by timestamp first). When an unapplied migration is followed by an applied one, ask whether to attempt to just apply it or if the user wants to first unapply migrations that came after it. To me this would work better than 6.
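A minimal sketch of points (b) and (e), assuming migrations are identified only by a file name with a fixed-width UTC timestamp prefix. The function name `plan_by_timestamp` and the sample names are made up for illustration; nothing like this exists in Django.

```python
from typing import Iterable, List, Set, Tuple


def plan_by_timestamp(
    all_names: Iterable[str], applied: Set[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Order migrations by name and flag unapplied ones that precede an applied one."""
    # A fixed-width timestamp prefix means plain string sorting is chronological.
    ordered = sorted(all_names)
    # Index of the latest migration that has already been applied (-1 if none).
    last_applied = max(
        (i for i, name in enumerate(ordered) if name in applied), default=-1
    )
    # Unapplied migrations sitting before an applied one: the case where
    # proposal (e) would stop and ask the user what to do.
    out_of_order = [n for n in ordered[: last_applied + 1] if n not in applied]
    pending = [n for n in ordered if n not in applied]
    return ordered, pending, out_of_order


names = [
    "20161105151023_add_foo",
    "20161101090000_initial",
    "20161103120000_add_bar",  # e.g. merged in later from another branch
]
applied = {"20161101090000_initial", "20161105151023_add_foo"}
ordered, pending, out_of_order = plan_by_timestamp(names, applied)
print(out_of_order)
```

Here `add_bar` is reported as out of order because it sorts before the already applied `add_foo`.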
Of course we do have migration support in core and it's not compatible with most of the above list. Any ideas? I think serializing the dependency solver state and reusing it between runs could be a pretty low hanging fruit (like "npm shrinkwrap" or yarn's lock file).
Hello! I have opinions about this :)
> Possible solution (or "how I'd build it today if there was no existing code in Django core"):
> a. Make migrations part of the project and not individual apps. This takes care of problem 3 above.
It also means it's impossible for apps to ship migrations and define how to upgrade from version to version. I realise that (c) below is part of a proposed solution to this, but how do you propose to match up what's already been run in the database without having names match (and then you just have app migrations by another name)?
> b. Prefix individual migration files with a UTC timestamp (20161105151023_add_foo) to provide a strict sorting order. This removes the depsolving requirement and takes care of 1 and 2. By eliminating those it makes 4 kind of obsolete as squashing migrations would become pointless.
Unfortunately this does not help all the time as computers' clocks aren't necessarily right or in sync, so it would merely be an approximation and you'd still get the occasional clash.
> c. Have reusable apps provide migration templates that Django then copies to my project when "makemigrations" is run.
Would these be lined up with their own timestamp in the single serial migration timeline? Would you have to make sure any of these templates from any app update was copied across and put in the order before you used the new columns?
> d. Maintain a separate directory for each database connection.
This I think might be a good idea, though I'd like to see a more generalised idea of "migration sets" where you then say which alias uses which set (so you can share sets among more than one connection).
> e. Execute all migrations in alphabetical order (which means by timestamp first). When an unapplied migration is followed by an applied one, ask whether to attempt to just apply it or if the user wants to first unapply migrations that came after it. To me this would work better than 6.
This is basically what South used to do, and it worked reasonably well in either being successful or exploding enough that people noticed. Given that you're proposing per-project migrations, however, people are going to run into this almost constantly, as they will clash significantly more than per-app ones.
> Of course we do have migration support in core and it's not compatible with most of the above list. Any ideas? I think serializing the dependency solver state and reusing it between runs could be a pretty low hanging fruit (like "npm shrinkwrap" or yarn's lock file).
I think not only could the dependency solver state be serialised but that it would be a replacement for the datetimes-on-filename proposal in that you could easily pull out a previously-serialised order from disk and then work out what the new ones do.
I am generally not keen on the idea of per-project migrations, though - it makes what's in the database a property of the project, not the app, and that's not how Django has worked traditionally. I think an effort to get a more reliable, exposed global ordering of those individual app migrations would go a long way towards the end goal without having to have migration templates, upgrade instructions, and way more collisions between branches.
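The "serialise the order and work out what the new ones do" idea could look roughly like the sketch below. It is not based on Django's migration graph: the lock format, `graph_fingerprint`, `make_lock` and `resolve_order` are all hypothetical, and the dependency graph is assumed to be a plain dict mapping each migration name to the set of names it depends on. The standard-library `graphlib` module (Python 3.9+) stands in for the real solver.

```python
import hashlib
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def graph_fingerprint(deps):
    """Stable hash of the dependency graph, used to detect any change."""
    canonical = json.dumps({name: sorted(deps[name]) for name in sorted(deps)})
    return hashlib.sha256(canonical.encode()).hexdigest()


def make_lock(deps, order):
    """Serialise a resolved order together with the graph it was computed from."""
    return json.dumps({"fingerprint": graph_fingerprint(deps), "order": order})


def resolve_order(deps, lock_text=None):
    """Reuse a locked order where possible and only place new migrations."""
    lock = json.loads(lock_text) if lock_text else None
    if lock and lock["fingerprint"] == graph_fingerprint(deps):
        return lock["order"]  # graph unchanged: no solving needed
    kept, seen = [], set()
    for name in (lock["order"] if lock else []):
        if name not in deps:
            continue  # migration was deleted since the lock was written
        if not deps[name] <= seen:
            kept, seen = [], set()  # locked order is no longer valid, start over
            break
        kept.append(name)
        seen.add(name)
    # Only the migrations missing from the lock still need ordering.
    remaining = {n: deps[n] - seen for n in deps if n not in seen}
    return kept + list(TopologicalSorter(remaining).static_order())


deps = {"app_a.0001": set(), "app_b.0001": {"app_a.0001"}}
order = resolve_order(deps)
lock = make_lock(deps, order)

# A new migration appears: the locked prefix is kept and the new one is appended.
new_deps = dict(deps, **{"app_a.0002": {"app_a.0001"}})
new_order = resolve_order(new_deps, lock)
print(new_order)
```

Keeping the locked prefix untouched is the design choice that makes the order stable across runs: existing migrations never move unless the lock itself becomes inconsistent with the graph.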
At the end of the day, though, there's a reason I made the schema editing separate from the migration runners - you can re-use all the nasty work in the schema editing interface and just replace the other part. This huge change is the sort of thing I'd want to see working and proven before we considered changing core, preferably as a third-party app, but of course I'd like to talk through potential smaller changes first, rather than throwing out the entire system.
> 2. Dependency resolution is only stable as long as the migration set is
> frozen. Sometimes introducing a new migration is enough to break existing
> migrations by causing them to execute in a slightly different order. We
> often have to backtrack and edit existing migrations and enforce a strict
> resolution order by introducing arbitrary dependencies.
>
So, you say you really have implicit dependencies between migrations --
dependencies in substance, which aren't recorded as dependencies. This seems
to indicate that you have a lot of manually-written migrations (data
migrations?), since the automatically-written ones do include relevant
dependencies. This seems odd -- it sounds like you're doing something out of
the ordinary.
This would also explain some of your bad experience with squashing -- indeed,
if you have many data migrations, squashing can become much less effective.
> 3. Removing an app from a project is a nightmare. You can't migrate to zero
> state unless the app is still there. There is no way to add "revert all
> migrations for app X" to the migration graph, it's something you need to
> run manually. There is no clean way to remove an app that was ever
> referenced in a relation. We were forced to do all kinds of hacks to get
> around this. Sometimes it's necessary to create an empty eggshell app with
> the same name and copy all migrations there then add necessary data
> migrations and finally migrations that remove all the models, indices,
> procedures etc. Sometimes people just leave a dead application in
> INSTALLED_APPS to not have to deal with this.
Clear out (maybe even remove) models.py and type "makemigrations", and you get
a migration that deletes everything. The answer to getting rid of the
historical migrations is squashing, but of course you first need squashing to
work properly.
> 4. Squashing migrations is wonky at best. If you create a model in one
> migration, alter one of its fields in another and then finally drop the
> model sometime later, the squashed migration will have Django try to
> execute the alter first and complain about the table not being there. Also
> the only reason we need to squash migrations is to prevent problem 1 above
> from becoming exponentially worse. If migrations were only as slow as the
> underlying SQL commands, we'd likely never squash them.
>
If that's so, it's a bug you should report; it's also an issue you can work
around by editing the migration to remove the redundant operation. There are
issues with squashing, to be sure, but I don't think this is one of the
serious ones.
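To make the expected behaviour concrete, here is a toy reduction of the create / alter / delete case from point 4. It is not Django's optimizer: operations are plain tuples, `squash` is a hypothetical helper, and it assumes no other model refers to the one being deleted.

```python
def squash(operations):
    """Drop every operation on a model that is both created and deleted in the list."""
    result = []
    for op in operations:
        kind, model = op[0], op[1]
        if kind == "delete_model":
            created_at = next(
                (
                    i
                    for i, prev in enumerate(result)
                    if prev[0] == "create_model" and prev[1] == model
                ),
                None,
            )
            if created_at is not None:
                # The model never exists outside this squashed migration, so the
                # create, every later operation on it, and the delete all vanish.
                result = [
                    prev
                    for i, prev in enumerate(result)
                    if not (i >= created_at and prev[1] == model)
                ]
                continue
        result.append(op)
    return result


operations = [
    ("create_model", "Foo"),
    ("create_model", "Bar"),
    ("alter_field", "Foo", "name"),
    ("delete_model", "Foo"),
]
print(squash(operations))
```

With this input only the creation of `Bar` survives, so the alter is never attempted against a missing table.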
> 6. Conflict detection and resolution (migrate --merge) is a make-believe
> solution. It just trains people to execute the command without
> investigating whether their migration history still makes sense.
It could be smarter, assuming it understood the content of migrations. We
could probably improve it to a point where, for most cases, it would either
know to merge automatically or know that there really is a conflict. This would
probably not help you if you have a lot of RunPython operations in your migrations.
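A rough sketch of what "understanding the content of migrations" could mean for merging. Everything here is hypothetical: operations are tuples, `classify_merge` is an invented name, and the rule used (two branches conflict when they touch the same model or the same field) is an assumption chosen for illustration, not current `migrate --merge` behaviour.

```python
MODEL_OPS = {"create_model", "delete_model", "rename_model"}


def touched(operations):
    """Split a branch's operations into whole-model targets and field targets."""
    models, fields = set(), set()
    for op in operations:
        if op[0] in MODEL_OPS:
            models.add(op[1])
        else:
            fields.add((op[1], op[2]))
    return models, fields


def classify_merge(branch_a, branch_b):
    """Return 'auto', 'conflict' or 'manual' for two diverging branches."""
    # Opaque Python code cannot be analysed, so a human has to decide.
    if any(op[0] == "run_python" for op in branch_a + branch_b):
        return "manual"
    models_a, fields_a = touched(branch_a)
    models_b, fields_b = touched(branch_b)
    field_models_a = {model for model, _ in fields_a}
    field_models_b = {model for model, _ in fields_b}
    if (
        models_a & models_b
        or fields_a & fields_b
        or models_a & field_models_b
        or models_b & field_models_a
    ):
        return "conflict"
    return "auto"


add_isbn = [("add_field", "Book", "isbn")]
add_pages = [("add_field", "Book", "pages")]
alter_isbn = [("alter_field", "Book", "isbn")]
data_fix = [("run_python",)]

print(classify_merge(add_isbn, add_pages))
print(classify_merge(add_isbn, alter_isbn))
print(classify_merge(add_isbn, data_fix))
```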
> Some of these I need to dig deeper into and probably file proper tickets.
> For example I have an idea on how to fix 4 but it would make 1 even slower.
>
> I took some time to get a good long look at what other ORMs are doing. The
> graph-based dependency solving approach is rather uncommon. Most systems
> treat migrations as part of the project rather than the packages it uses.
>
>
> Possible solution (or "how I'd build it today if there was no existing code
> in Django core"):
>
> a. Make migrations part of the project and not individual apps. This takes
> care of problem 3 above.
>
So, there'd be no reason to link a migration to a specific app; quite the
contrary, it would become much more logical to have one migration include
operations for many apps. That could make the process of making an app
reusable while developing it in a project quite painful.
> b. Prefix individual migration files with a UTC timestamp
> (20161105151023_add_foo) to provide a strict sorting order. This removes
> the depsolving requirement and takes care of 1 and 2. By eliminating those
> it makes 4 kind of obsolete as squashing migrations would become pointless.
>
4: No, on large databases, squashing migrations is not pointless.
1&2: Strict order has its issues: Currently, if I find a problem with the last
migration of app A, I roll it back, fix it, and roll forward. With strict
order, I would have to roll back the project, not the app.
> c. Have reusable apps provide migration templates that Django then copies
> to my project when "makemigrations" is run.
>
I'd like to see some more details about how this works; they would need to
include the development process of reusable apps.
> d. Maintain a separate directory for each database connection.
Seems wrong as a blanket idea -- really depends on how the databases are used.
I wouldn't want to find myself maintaining copies of migrations which are
supposed to run on more than one database.
> e. Execute all migrations in alphabetical order (which means by timestamp
> first). When an unapplied migration is followed by an applied one, ask
> whether to attempt to just apply it or if the user wants to first unapply
> migrations that came after it. To me this would work better than 6.
This sounds like a good way to create data losses.
> f. Migrating to a timestamp solves 5.
Not really. Not with a team, since the timestamps will indicate not the real
logical order, but the order of development. You'd need empty "tag" migrations
to set points you want to migrate to...
My solution is to throw away and remake all migrations on a regular basis. Then I `TRUNCATE TABLE django_migrations` and `django-admin migrate --fake`. Obviously this isn’t a great solution.
Since I work mostly on small projects with just a couple developers on staff, doing this every few months suffices to keep the run time below one minute (which is still quite annoying).
There’s a risk of losing important, manually generated migrations, typically those that create indexes. I diff the schema before / after with apgdiff to avoid such problems.
1. Dependency resolution that turns the migration dependency graph into an ordered list happens every time you try to create or execute a migration. If you have several hundred migrations it becomes quite slow. I'm talking multiple minutes kind of slow. As you can imagine working with multiple branches or perfecting your migrations quickly becomes a tedious task.
I have to agree with Marten.
From my experience what really slows down the migrate and makemigrations
commands is the rendering of model states into concrete model classes. This
is something I concluded from my work on adding the plan object to pre_migrate
and post_migrate signals.
As soon as an operation accesses state.apps the rendering kicks in which
triggers the dynamic creation of multiple model classes and the computation
of reverse relationships. There are mechanisms in place to prevent the whole
project model classes from being rendered again when a model state is
altered but if the operation is performed on a model referenced by many
others the relationship chain might force a large number of them to be
rendered again causing massive slow downs.
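The cascade described above can be pictured with a simplified model. This is an assumption-laden sketch, not the migration framework's code: `relations` is a made-up dict of foreign-key style references, and the sketch assumes a change forces re-rendering of every model reachable through relations in either direction.

```python
from collections import deque


def models_to_rerender(relations, changed):
    """Return every model connected to `changed` through any chain of relations."""
    # Build an undirected view: a relation ties both ends together.
    neighbours = {}
    for model, targets in relations.items():
        neighbours.setdefault(model, set())
        for target in targets:
            neighbours[model].add(target)
            neighbours.setdefault(target, set()).add(model)
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


relations = {
    "Order": {"User"},
    "Invoice": {"Order"},
    "Profile": {"User"},
    "Tag": set(),
}
print(sorted(models_to_rerender(relations, "User")))
print(sorted(models_to_rerender(relations, "Tag")))
```

Altering the heavily referenced `User` drags three other models along, while the isolated `Tag` is re-rendered on its own.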
Markus Holtermann has been working on teaching the migration framework
how to perform database operations without relying on state.apps which should
solve the remaining performance issues of the migrate command. In the case
of makemigrations the last remaining issue in the master branch should be solved
by no longer relying on state.apps in RenameModel.state_forwards[1].
Patryk, many improvements landed in 1.9 and 1.10 to speed up the commands
dealing with migrations. Are you still seeing the same slowdown on these versions?
On Sunday, November 6, 2016 at 00:32:04 UTC+1, Marten Kenbeek wrote:
> On Saturday, November 5, 2016 at 4:53:49 PM UTC+1, Patryk Zawadzki wrote:
> > 1. Dependency resolution that turns the migration dependency graph into an ordered list happens every time you try to create or execute a migration. If you have several hundred migrations it becomes quite slow. I'm talking multiple minutes kind of slow. As you can imagine working with multiple branches or perfecting your migrations quickly becomes a tedious task.
> Did the dependency resolution actually come up in benchmarks/profiles as a bottleneck? When I optimized and benchmarked the dependency graph code, it had no trouble ordering ~1000 randomly generated migrations with lots of inter-app dependencies in less than a second. I'd be surprised if this had any significant impact on the overall performance of migrations.
> An easy way to test this is the `showmigrations` command, which will only generate the graph without any model state changes or model rendering taking place. It does some other things, but nothing that should take on the order of minutes.
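A comparable experiment can be run outside Django to get a feel for the cost of ordering alone. The sketch below does not use Django's graph code; it builds a random acyclic graph of about 1000 fake migrations (the generator `random_migration_graph` and its parameters are made up) and times the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
import random
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def random_migration_graph(apps=20, per_app=50, cross_deps=2, seed=0):
    """Build {migration: set of dependencies} for apps * per_app fake migrations."""
    rng = random.Random(seed)
    created = []
    deps = {}
    for index in range(per_app):
        for app in range(apps):
            node = (f"app{app}", index)
            parents = set()
            if index > 0:
                parents.add((f"app{app}", index - 1))  # previous migration in the app
            if created:
                # Depending only on earlier nodes keeps the graph acyclic.
                parents.update(rng.sample(created, min(cross_deps, len(created))))
            deps[node] = parents
            created.append(node)
    return deps


deps = random_migration_graph()
start = time.perf_counter()
order = list(TopologicalSorter(deps).static_order())
elapsed = time.perf_counter() - start
print(f"ordered {len(order)} migrations in {elapsed:.4f}s")
```

The timing depends on the machine, so no figure is quoted here; the point is only that ordering and model rendering can be measured separately.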
--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/4a012e54-fae5-4bba-97a9-f323f38e53bc%40googlegroups.com.
I've just hit another problem related to custom fields.
Currently migrations contain information about "rich" fields. If you use a custom field type, the migration code will currently import your field type from its Python module. This is highly problematic in case either the code moves or you later stop using that field type and want to remove the dependency.
I am currently in the process of rewriting some of my existing migrations by hand to replace all instances of a custom field type with the type it actually uses for storage. This will eventually allow me to drop the dependency but it's not very nice.
Another problem is that for many custom field types makemigrations detects changes made to arguments that do not affect the database in any way (as they are returned by deconstruction).
If we could ever break backwards compatibility, I'd suggest having field deconstruction only return the column type (and necessary arguments) it wants the schema editor to create. This would prevent the migrations from having external dependencies (which is a major win in itself).
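A toy illustration of the difference between the two styles of deconstruction. None of this is Django API: `ToyJSONField`, its import path, the `storage_deconstruct` method and the simplified `(path, args, kwargs)` shape are all invented for the example (Django's real `Field.deconstruct()` returns a four-tuple that also includes the field name).

```python
class ToyJSONField:
    """Hypothetical custom field that is stored in a plain text column."""

    def __init__(self, encoder=None, null=False):
        self.encoder = encoder  # Python-side only, never reaches the database
        self.null = null        # affects the column definition

    def deconstruct(self):
        # Today's style, simplified: an import path plus every argument, so the
        # migration depends on the module and notices Python-only changes.
        return (
            "myapp.fields.ToyJSONField",
            [],
            {"encoder": self.encoder, "null": self.null},
        )

    def storage_deconstruct(self):
        # Proposed style (hypothetical): only the column type and the
        # arguments the schema editor needs.
        return ("text", {"null": self.null})


old = ToyJSONField(encoder="fast")
new = ToyJSONField(encoder="pretty")

print(old.deconstruct() != new.deconstruct())
print(old.storage_deconstruct() != new.storage_deconstruct())
```

Comparing full deconstructions reports a change when only the encoder differs, while comparing the storage description does not, and the storage description carries no import path at all.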
I'd also consider having apps.get_model() just use introspection to read the schema and return transient models with default field types for each underlying column type (so a custom JSONField would become a regular boring TextField inside migration code). This would save us tons of "rendering model states" time for the relatively small cost of having to cast certain columns to your preferred Python types inside a couple of data migrations.