About migrations

Marcin Nowak

Jun 23, 2017, 5:25:31 AM
to Django developers (Contributions to Django itself)
Hi.

First, please forgive my English; I'm not a native speaker.

I have written here and elsewhere about my thoughts on DB migrations a few times, and Tim probably remembers me well.
My opinion has not changed, but I have realized that I cannot leave the Django ecosystem for a long time.
So I'd like to talk about migrations, their advantages and disadvantages, and possible solutions.

Advantages:
  • fast (automatic) creation
  • database independent (model-centric)
  • a standard for Django itself and reusable apps
  • possibility to create migrations manually
Disadvantages:
  • dependent on the application layer (Python code: field classes, model classes, etc.)
  • Python code is allowed within migrations
  • separate files for changesets cause ugly conflicts when merging branches
  • squashing required
I'd like to focus on the disadvantages, because they are the reason I use an alternative solution (Liquibase, in my case).


Application layer dependency

This is what causes the whole migration system to fail.
References to the application layer are included in migration files, because a "model state" is saved there.
Any significant code change will break migrations. We must avoid such situations by squashing migrations at "the right time".

In my opinion migrations should be application independent. Unfortunately the whole system is based on models written in Python, which may include custom solutions (e.g. model fields).

I understand that it is hard to cut out this feature, but I believe there is some solution that drops the app layer dependency and still allows custom fields.


Python code in migrations

This is generally a bad idea. Any Python code is tied to a point in time. When the code changes, a "pythonic" migration may fail.
And you will never know about the failure unless you set up CI properly.

There are very rare cases when something from the app layer must be called between releases.
In those cases I use management commands; Liquibase allows me to execute any binary.
It is not so straightforward, so as a developer you feel that you're doing something strange/non-standard and that you should be careful.

The general problem is that some migrations stop working properly over time.
This should never happen. Well... I'm used to this and I adhere to this principle.

I have been using Liquibase for years, and my migrations broke a few times, mostly on changesets that execute binaries (Python code run through management commands).

 
Separate files for changesets

I like Liquibase because a single file can contain several changesets.
When I need to split migration files, I can do that and use the "include" directive.
The advantages: easy conflict resolution, just one file per database and per current major release, and the full change history in older files.

Django produces separate migration files. This causes conflicts and requires reordering migrations through declarations ("dependencies").
And because squashing is almost required for a long project, you lose the changeset history.


Squashing required

For long-lived projects this is a required operation. Squashing deletes the change history and removes obsolete Python code from migrations.
Squashing is just a workaround tool for migration design issues.



What do I need?

Well, I'd like to simplify my work. That's obvious. Maintaining Liquibase changes outside Django is harder and takes more time, especially when I'm adding new apps or upgrading existing ones.
But I don't want to switch back to Django migrations, because of the design issues described above (mostly because of the app layer dependency and the squashing requirement).

I started prototyping a tool which translates Django migrations into pure SQL that can be embedded in Liquibase changesets.
But Python migrations can't be translated to SQL, of course. And the worst thing is that Django ships Python migrations even in contrib apps.

So I realised that building such a tool without a change to the concept of Django migrations (by you, the Django developers) is a little unreasonable as long as Python code execution is accepted and used internally (contenttypes, 0002).


What would I like to achieve with this post?

A discussion about removing the app layer dependency and removing or limiting RunPython usage, mostly.
This should eliminate the need for squashing and increase the stability of migrations.

Separate files vs. one big file is not important now. It is just inconvenient, but it does not produce failures.


Kind Regards,
Marcin

Andrew Godwin

Jun 23, 2017, 1:19:55 PM
to Django developers (Contributions to Django itself)
Hi Marcin,

Some of these are problems, yes, but you have to understand they are tradeoffs and the alternative is, in my opinion, worse.

> I believe there is some solution that drops the app layer dependency and still allows custom fields

I would love to see what this would be. I spent a decade trying not to import custom fields from the code to allow this and never succeeded; the only way that I could make it work was to import them. Because Django allows fields to change their type based on the database being talked to, you can't bake the type into the migration at creation time.
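
To make the last point concrete, here is a small Django-free sketch. `FakeConnection` and `DocumentField` are made-up names; the only things borrowed from Django are the shape of the `db_type(connection)` hook on fields and the `vendor` attribute on connections. It is an illustration of why the type cannot be fixed when the migration is written, not a description of any real field.

```python
class FakeConnection:
    """Stand-in for a database connection; only the `vendor` attribute is modelled."""

    def __init__(self, vendor):
        self.vendor = vendor


class DocumentField:
    """Hypothetical custom field whose column type depends on the backend."""

    def db_type(self, connection):
        # The column type is decided at run time, per connection,
        # so it is not known when the migration file is generated.
        if connection.vendor == "postgresql":
            return "jsonb"
        return "text"


field = DocumentField()
for vendor in ("postgresql", "sqlite"):
    print(vendor, "->", field.db_type(FakeConnection(vendor)))
```

The same migration file therefore has to yield a different column type depending on where it runs, which is why the field code is consulted at migration time.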

> Python code in migrations

This has always been optional; Django's makemigrations will never put in RunPython for you, so if you choose to use it, then it's your choice and you get the problems along with it (more tied to application code). 

> Separate files for changesets

This is needed if you are going to have migrations per-app and thus let apps ship migrations. If you want to not have per-app migrations, then you can have just the one file, but then you make third-party apps have to ship migration snippets you need to remember to include when you make the next migration.

> Squashing is just a workaround tool for migration design issues

Yes, specifically it is designed to help solve the custom-fields-app-dependency issue. It's also just nice to have anyway when you get to 100 migrations.

> A discussion about removing the app layer dependency and removing or limiting RunPython usage, mostly.

OK, so what is your proposed alternative for custom fields? It is no good to propose to remove something without proposing what to do in its place; we can't drop custom field support in migrations. I think RunPython is solvable by our users; if you want to limit usage, then just have a linter rule in your project that rejects RunPython being committed by your developers.
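
A project-level lint rule of that kind might look roughly like the following, as a minimal sketch using only the standard library `ast` module. The function name `find_runpython` and the sample source are made up for illustration; hooking it into a pre-commit check or CI job is left out.

```python
import ast


def find_runpython(source):
    """Return the line numbers where a migration source references RunPython."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Matches attribute access such as `migrations.RunPython(...)`.
        if isinstance(node, ast.Attribute) and node.attr == "RunPython":
            hits.append(node.lineno)
        # Matches a bare `RunPython` name used after a direct import.
        elif isinstance(node, ast.Name) and node.id == "RunPython":
            hits.append(node.lineno)
    return sorted(hits)


EXAMPLE = '''
from django.db import migrations

def forwards(apps, schema_editor):
    pass

class Migration(migrations.Migration):
    operations = [migrations.RunPython(forwards)]
'''

print(find_runpython(EXAMPLE))
```

A check like this only parses the file and never imports it, so it can run without Django installed.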

Andrew

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/bd46c905-4402-41b8-a56c-1783020467f5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Marcin Nowak

Jun 23, 2017, 2:17:27 PM
to Django developers (Contributions to Django itself)

> Some of these are problems, yes, but you have to understand they are tradeoffs and the alternative is, in my opinion, worse.


Yes, I understand. But maybe there is another, better alternative.
Let's simplify a little and talk about the Python dependencies first.
 
>> I believe there is some solution that drops the app layer dependency and still allows custom fields
>
> I would love to see what this would be. I spent a decade trying not to import custom fields from the code to allow this and never succeeded; the only way that I could make it work was to import them. Because Django allows fields to change their type based on the database being talked to, you can't bake the type into the migration at creation time

The advantages come from DB type independence, this is true, but on the other hand you are introducing the app layer dependency.

Let's imagine that one of the built-in fields changes its definition. Running migrations on two different Django versions will produce two different outputs.
My perspective is more database-like than app-like, so I expect the same DB schema as a result (in both cases).

So the first thing that comes to my mind is this: a complete definition should be baked into the migration file. Then, when the app layer changes (e.g. upgrading the framework or changing a custom field definition), the migration system should identify the change and produce a new migration with the baked-in definition. If this is possible to develop, you'll have fewer dependencies. The definition (a meta-description of the field) will be baked in, instead of depending on the field itself. And you'll preserve database type independence.

This is just the first concept that comes to my mind right now.

>> Python code in migrations
>
> This has always been optional; Django's makemigrations will never put in RunPython for you, so if you choose to use it, then it's your choice and you get the problems along with it (more tied to application code).


In that case I'd like to avoid RunPython in the built-in migrations of Django's contrib apps, not to remove the possibility of running any executable.
I'd just like to add a comment to it: "do it at your own risk" ;)
 
>> Separate files for changesets
>
> This is needed if you are going to have migrations per-app and thus let apps ship migrations. If you want to not have per-app migrations, then you can have just the one file, but then you make third-party apps have to ship migration snippets you need to remember to include when you make the next migration.

Let's leave this as it is for now. It isn't so important.
 

>> Squashing is just a workaround tool for migration design issues
>
> Yes, specifically it is designed to help solve the custom-fields-app-dependency issue. It's also just nice to have anyway when you get to 100 migrations.

The second isn't an issue in practice. After x years I have thousands of migrations in Liquibase, and the only downside is the time required to run them all in a CI build. But this is automated, so nobody cares about a minute or two.
 
>> A discussion about removing the app layer dependency and removing or limiting RunPython usage, mostly.
>
> OK, so what is your proposed alternative for custom fields? It is no good to propose to remove something without proposing what to do in its place; we can't drop custom field support in migrations.

To be precise: I don't want to remove anything or break compatibility. I'd like to improve some things.
The (first) proposal is about decoupling migrations from the app layer. I gave the example a few lines above.

I don't want to write out all possible proposals at this moment, because I suppose you have discussed the topic for a very long time.
We can focus now on the separation of field definitions, to achieve consistency over time.
 
BR,
Marcin

Andrew Godwin

Jun 23, 2017, 3:28:07 PM
to Django developers (Contributions to Django itself)


> The advantages come from DB type independence, this is true, but on the other hand you are introducing the app layer dependency.
>
> Let's imagine that one of the built-in fields changes its definition. Running migrations on two different Django versions will produce two different outputs.
> My perspective is more database-like than app-like, so I expect the same DB schema as a result (in both cases).
>
> So the first thing that comes to my mind is this: a complete definition should be baked into the migration file. Then, when the app layer changes (e.g. upgrading the framework or changing a custom field definition), the migration system should identify the change and produce a new migration with the baked-in definition. If this is possible to develop, you'll have fewer dependencies. The definition (a meta-description of the field) will be baked in, instead of depending on the field itself. And you'll preserve database type independence.

How do you propose to identify "when the app layer changes"? This is a harder problem to solve than it first appears; the only thing you can rely on to compare to are the migration files themselves, so that necessarily means you need some description of the app layer in there.
 

> In that case I'd like to avoid RunPython in the built-in migrations of Django's contrib apps, not to remove the possibility of running any executable.
> I'd just like to add a comment to it: "do it at your own risk" ;)
 

It's only in one migration (https://github.com/django/django/blob/master/django/contrib/contenttypes/migrations/0002_remove_content_type_name.py), and this is because ContentTypes are something that are not purely database-specific. I personally don't like contenttypes anyway, so I would be a fan of making the whole thing vanish into the night, but that's not my call and it would have backwards-compat issues.
 
> The second isn't an issue in practice. After x years I have thousands of migrations in Liquibase, and the only downside is the time required to run them all in a CI build. But this is automated, so nobody cares about a minute or two.

I am impressed that your thousands of migrations only take a few minutes to run; you must have a decent database backing it. Some backends are much slower than this. Squash is offered as an option, and not required; and some people don't even squash and just reset their migrations back to 0001.
 
 
> To be precise: I don't want to remove anything or break compatibility. I'd like to improve some things.
> The (first) proposal is about decoupling migrations from the app layer. I gave the example a few lines above.
>
> I don't want to write out all possible proposals at this moment, because I suppose you have discussed the topic for a very long time.
> We can focus now on the separation of field definitions, to achieve consistency over time.
 

Understand when I say that what you are proposing is a very, very big change. Django's ORM is heavily coupled to runtime information and the app layer, and I tried for many years to decouple them and ran into all sorts of issues as a result. Importing the fields from the source code ended up being the easiest, safest method that also happens to produce very easy-to-understand errors when it breaks (rather than using old definitions or silently failing).

I am all for migration improvements, but the overall shape of what you are suggesting seems like changing a few fundamental principles of how migrations are designed and essentially designing one of the other types of system (like django-evolution, or dmigrations). If you want a different kind of system, then you are more than welcome to develop one; Django migrations are very deliberately kept separate from the schema-changing backends (SchemaEditor), so it's easier to write custom migration solutions without having to redo all of the nasty per-database code and SQL generation.

However, proposing to change the core of the way Django works is going to come with a very high bar, and I and others are going to want to see concrete proof of backwards-compatibility, improvements to developer experience, and a person or people who are willing to put in all the work to make and land the patch. Personally, I would probably ask that a proposed alternate system was developed as a separate library to prove the concept first (re-using all the bits of Django it needs to, like SchemaEditor and some of the operations code).

Andrew

Marcin Nowak

Jun 23, 2017, 6:02:22 PM
to Django developers (Contributions to Django itself)


> Understand when I say that what you are proposing is a very, very big change. Django's ORM is heavily coupled to runtime information and the app layer, and I tried for many years to decouple them and ran into all sorts of issues as a result. Importing the fields from the source code ended up being the easiest, safest method that also happens to produce very easy-to-understand errors when it breaks (rather than using old definitions or silently failing).

Understood. 

From my perspective, after several years of working with Liquibase (since 2008 or so), I have had no failures except migration logic errors (due to ordering) or failures when executing binaries.
This solution is completely separated from app logic, it is extremely stable, and it supports databases that outlive any app.

The problem with an external solution lies in changeset creation, which is a completely manual process, where changes from 3rd-party apps or Django itself cannot be applied automatically.
The only really problematic thing about Django migrations is the "heavy coupling to the app layer", as you said, which may cause the migration system to fail when the app layer changes.

I'm trying to find a solution that achieves both: stability and automation (or semi-automation).

> I am impressed that your thousands of migrations only take a few minutes to run;

Believe it or not, I was thinking of hundreds. Sorry for the mistake. It may be related to my poor English skills.
Currently there are 1480 changesets, to be precise. They are applied within 2-3 minutes on two databases (sequentially) on a weak Jenkins machine.

BR,
Marcin

Marcin Nowak

Jun 27, 2017, 9:07:50 AM
to Django developers (Contributions to Django itself)
About database agnostic migrations.

Liquibase is a tool that is decoupled from the app layer and makes it possible to write agnostic changesets.
To be independent you must use the built-in operations and restrict the field types you use to a "well-known" subset.
Anything else is passed through, the same way plain SQL is passed through.

So as a DB architect I can decide what is most important to me: portability or DB-specific features.

This concept could easily be adopted in Django migrations, because the operations are already implemented.


About custom fields

Liquibase is extensible. The concept is to extend the "well-known" subset of fields by registering extensions.
For example, django.contrib.postgres could provide an extension to the migration subsystem by registering new types and new operations.

For 3rd-party apps there would be a requirement to deliver the extension to the migration system, not to the project itself.
This would be a real "game changer". As long as the extension is installed, your migrations will work.
You may recreate your app from scratch but keep the extensions, and your DB refactorings will still be safe.


About files separation

Liquibase has the include operation. Something similar could be implemented for Django migrations.
The only problem is that Python code is rather unordered, so some explicit ordering would have to be introduced.

This would be fairly irrelevant for Python-based changesets (migration files), but after dropping the direct application layer dependency it would be possible to use a language other than Python to describe the operations to apply.


Application layer agnostic migrations

The migration subsystem must match field types to the registered ones and render a changeset (migration file) using these registered names instead of direct field classes.
Anything unmatched must be passed as a CUSTOM type or just passed through by name (which may introduce an incompatibility if someone registers a field of the same name in the future).
How to mark a custom type requires discussion, of course.
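
The matching step described above could be sketched like this. Everything here is hypothetical: `REGISTRY`, `register`, `render`, and the two field classes are invented for the illustration and do not exist in Django or Liquibase. The CUSTOM fallback shown is only one of the options mentioned.

```python
# Hypothetical registry mapping field classes to stable type names.
REGISTRY = {}


def register(type_name):
    """Class decorator that records a registered type name for a field class."""
    def decorator(field_cls):
        REGISTRY[field_cls] = type_name
        return field_cls
    return decorator


@register("varchar")
class CharField:
    def __init__(self, max_length):
        self.params = {"length": max_length}


class MoneyField:  # deliberately left unregistered
    def __init__(self, currency):
        self.params = {"currency": currency}


def render(field):
    """Describe a field by its registered name, or fall back to a CUSTOM marker."""
    cls = type(field)
    if cls in REGISTRY:
        return (REGISTRY[cls], dict(field.params))
    # Unmatched types are passed through under a CUSTOM marker with their class name.
    return ("CUSTOM", {"class": cls.__qualname__, **field.params})


print(render(CharField(100)))
print(render(MoneyField("EUR")))
```

With this shape, a changeset stores only the registered name and parameters, and the field class is needed only when the changeset is generated.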


Backward compatibility

You can preserve backward compatibility by allowing the use of directly imported fields, the same as nowadays. In that case the field-mapping layer will be bypassed.


Automatic changes detection

It should work the same as nowadays. But I must dig into the internals to confirm that.


Marcin

Patryk Zawadzki

Jul 4, 2017, 10:04:09 AM
to Django developers (Contributions to Django itself)
On Friday, June 23, 2017 at 21:28:07 UTC+2, Andrew Godwin wrote:

>> The advantages come from DB type independence, this is true, but on the other hand you are introducing the app layer dependency.
>>
>> Let's imagine that one of the built-in fields changes its definition. Running migrations on two different Django versions will produce two different outputs.
>> My perspective is more database-like than app-like, so I expect the same DB schema as a result (in both cases).
>>
>> So the first thing that comes to my mind is this: a complete definition should be baked into the migration file. Then, when the app layer changes (e.g. upgrading the framework or changing a custom field definition), the migration system should identify the change and produce a new migration with the baked-in definition. If this is possible to develop, you'll have fewer dependencies. The definition (a meta-description of the field) will be baked in, instead of depending on the field itself. And you'll preserve database type independence.
>
> How do you propose to identify "when the app layer changes"? This is a harder problem to solve than it first appears; the only thing you can rely on to compare to are the migration files themselves, so that necessarily means you need some description of the app layer in there.

Have DB backends understand certain field types expressed as strings ("varchar", "text", "blob", "decimal" and so on).

Possibly some backends could implement a wider set than the others ("json", "xml", "rasterimage" etc.).

Have each Field class deconstruct to a field name and params, eg: "decimal", {"digits": 12, "decimals": 2}.

Then a model becomes essentially a list of tuples:

[
    ("title", "varchar", {"length": 100}),
    ("price", "decimal", {"digits": 12, "decimals": 2}),
    ...
]

This is not far from what "render model states" does currently, except that it compares much richer model descriptions, which leads to no-op migrations being generated each time you change a label or a user-visible part of choices.
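
As a rough sketch of how change detection could work on such tuples (the `diff` function and the operation names "add", "alter", "remove" are made up for illustration and are not Django's autodetector):

```python
def diff(old, new):
    """Compare two schema-only model descriptions and list the needed operations."""
    old_fields = {name: (type_, params) for name, type_, params in old}
    new_fields = {name: (type_, params) for name, type_, params in new}
    ops = []
    for name in new_fields:
        if name not in old_fields:
            ops.append(("add", name))
        elif new_fields[name] != old_fields[name]:
            ops.append(("alter", name))
    for name in old_fields:
        if name not in new_fields:
            ops.append(("remove", name))
    return ops


before = [
    ("title", "varchar", {"length": 100}),
    ("price", "decimal", {"digits": 12, "decimals": 2}),
]
after = [
    ("title", "varchar", {"length": 200}),
    ("price", "decimal", {"digits": 12, "decimals": 2}),
    ("sku", "varchar", {"length": 32}),
]

print(diff(before, after))
```

Because labels, help texts and choices never enter the description, changing them cannot produce an operation in this model.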

Marcin Nowak

Jul 4, 2017, 10:13:23 AM
to Django developers (Contributions to Django itself)
> Have each Field class deconstruct to a field name and params  [...]

Thanks, @patrys. Field deconstruction is the key to achieving what I tried to describe earlier.
We can discuss the implementation details, but this is not important now.

Marcin

Carl Meyer

Jul 4, 2017, 5:49:54 PM
to django-d...@googlegroups.com
On 07/04/2017 07:04 AM, Patryk Zawadzki wrote:
> Have DB backends understand certain field types expressed as strings
> ("varchar", "text", "blob", "decimal" and so on).
>
> Possibly some backends could implement a wider set than the others
> ("json", "xml", "rasterimage" etc.).
>
> Have each Field class deconstruct to a field name and params, eg:
> "decimal", {"digits": 12, "decimals": 2}.
>
> Then a model becomes essentially a list of tuples:
>
> [
> ("title", "varchar", {"length": 100}),
> ("price", "decimal", {"digits": 12, "decimals": 2}),
> ...
> ]
>
> This is not far from what "render model states" does currently except
> that it compares much richer model descriptions that leads to no-op
> migrations being generated each time you change a label or a
> user-visible part of choices.

Right, and one reason for generating those "no-op" migrations is that
they aren't actually no-ops, if you value being able to write data
migrations in Python using the ORM. They keep the historical Python
models accurate.

Of course, we do pay a cost in complexity for the "historical ORM"
feature, and it's reasonable to prefer a tradeoff that doesn't pay that
cost and requires all data migrations to be written in SQL. As Andrew
mentioned, there's nothing to prevent anyone from writing an alternative
migrations frontend that takes this approach. It should be able to reuse
the schema editor backend, which does the heavy lifting of cross-db
schema alteration.

It's worth remembering, though, that five or six years ago we _had_ a
range of different migrations solutions that chose different tradeoffs,
and South was the clear winner in user uptake. It's not due to arbitrary
whim that the Django migrations system is based on South and preserves
its popular features, like the historical ORM.

Carl


Patryk Zawadzki

Jul 7, 2017, 8:09:08 AM
to Django developers (Contributions to Django itself)
On Tuesday, July 4, 2017 at 23:49:54 UTC+2, Carl Meyer wrote:
> On 07/04/2017 07:04 AM, Patryk Zawadzki wrote:
>> Have DB backends understand certain field types expressed as strings
>> ("varchar", "text", "blob", "decimal" and so on).
>>
>> Possibly some backends could implement a wider set than the others
>> ("json", "xml", "rasterimage" etc.).
>>
>> Have each Field class deconstruct to a field name and params, eg:
>> "decimal", {"digits": 12, "decimals": 2}.
>>
>> Then a model becomes essentially a list of tuples:
>>
>> [
>>     ("title", "varchar", {"length": 100}),
>>     ("price", "decimal", {"digits": 12, "decimals": 2}),
>>     ...
>> ]
>>
>> This is not far from what "render model states" does currently except
>> that it compares much richer model descriptions that leads to no-op
>> migrations being generated each time you change a label or a
>> user-visible part of choices.
>
> Right, and one reason for generating those "no-op" migrations is that
> they aren't actually no-ops, if you value being able to write data
> migrations in Python using the ORM. They keep the historical Python
> models accurate.

I would argue that this is a fairly optimistic view of the current state :)

They are technically "historically accurate" but the point in history they represent is not necessarily the one you had in mind unless you only have a single application and linear migrations (i.e. no merge migrations). Our current dependency system only allows you to express "no sooner than X" but the graph solver can execute an arbitrary number of later migrations between the one you depend on and the one you wrote.

Imagine you have app A and migration M1 adds field F. You then create a migration M2 in another application B that needs to access F so you have it depend on (A, M1). Two months later field F is removed or renamed in migration M3. Django has two ways to linearize the graph: (A, M1), (B, M2), (A, M3) or (A, M1), (A, M3), (B, M2). Both are valid but the latter will result in a crash when migrating from an empty DB state. In practice we often have to add arbitrary dependencies to later migrations to force a Python migration to execute in the correct order.
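
The two linearizations can be reproduced with a small brute-force sketch. This is not how Django's graph solver works; `DEPENDS_ON` and `valid_orders` are names invented here, and the sketch only enumerates every order that respects the declared dependencies.

```python
from itertools import permutations

# Each migration maps to the set of migrations it must run after.
DEPENDS_ON = {
    ("A", "M1"): set(),
    ("B", "M2"): {("A", "M1")},  # B.M2 reads field F, which A.M1 adds
    ("A", "M3"): {("A", "M1")},  # A.M3 removes or renames F
}


def valid_orders(graph):
    """Yield every ordering in which each migration follows all its dependencies."""
    for order in permutations(graph):
        position = {node: i for i, node in enumerate(order)}
        if all(position[dep] < position[node]
               for node in graph for dep in graph[node]):
            yield list(order)


for order in valid_orders(DEPENDS_ON):
    print(order)
```

Nothing in the declared dependencies distinguishes the two orders, even though only the one that runs (B, M2) before (A, M3) still has field F available when M2 executes.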

Also I'd argue that (from my personal experience which is obviously limited) having access to historical "choices", a form field label or the hint is not all that useful. In fact I'd be happy with a limited migration system that always returns bare database values without executing any of the field code. Writing portable migrations would be a bit more work but it's mostly a price apps would pay as projects themselves rarely need to be portable.

Anyway, I don't want anyone to think that I am complaining, as I don't have the resources to write yet another migration tool and both South and Django migrations beat writing SQL by hand.

Marcin Nowak

Jul 7, 2017, 8:42:19 AM
to Django developers (Contributions to Django itself)

> Anyway, I don't want anyone to think that I am complaining, as I don't have the resources to write yet another migration tool and both South and Django migrations beat writing SQL by hand.

Have you ever tried Liquibase? It is very reliable; unfortunately it lacks automatic changeset generation (because models aren't tracked), and you must rewrite the migrations of Django and 3rd-party apps yourself.
I started this topic to talk about improving Django migrations, to provide some of the advantages known from Liquibase.
Knowing LB may help everyone understand my proposals.

So the most important thing for me is separation from the application layer, to improve stability.
The second is being able to use Django and 3rd-party app migrations directly, to help everyone with upgrades.
The third is "dependency hell": project-wide, flat files of sequences are easier to maintain, especially when project dependencies (e.g. 3rd-party apps) change.
These are very important for long-term projects.

Marcin

Carl Meyer

Jul 7, 2017, 1:36:59 PM
to django-d...@googlegroups.com
On 07/07/2017 05:09 AM, Patryk Zawadzki wrote:
>> Right, and one reason for generating those "no-op" migrations is that
>> they aren't actually no-ops, if you value being able to write data
>> migrations in Python using the ORM. They keep the historical Python
>> models accurate.
>
> I would argue that this is a fairly optimistic view of the current state :)
>
> They are technically "historically accurate" but the point in history
> they represent is not necessary the one you had in mind unless you only
> have a single application and linear migrations (ie. no merge
> migrations). Our current dependency system only allows you to express
> "no sooner than X" but the graph solver can execute an arbitrary number
> of later migrations between the one you depend on and the one you wrote.
>
> Imagine you have app A and migration M1 adds field F. You then create a
> migration M2 in another application B that needs to access F so you have
> it depend on (A, M1). Two months later field F is removed or renamed in
> migration M3. Django has two ways to linearize the graph: (A, M1), (B,
> M2), (A, M3) or (A, M1), (A, M3), (B, M2). Both are valid but the latter
> will result in a crash when migrating from an empty DB state. In
> practice we often have to add arbitrary dependencies to later migrations
> to force a Python migration to execute in the correct order.

Yeah, that's an issue I've certainly run into. It's not _that_
unreasonable or arbitrary to solve this by adding a dependency of (A,
M3) on (B, M2), but it would be better if we could express this as a
"must-run-before" dependency on the B side (since the dependency of app
B on app A may be one-way, and we shouldn't have to introduce knowledge
of B into A's migrations -- and A may even be third-party). IMO this
would be a reasonable feature addition.

Carl


Andrew Godwin

Jul 7, 2017, 5:54:07 PM
to Django developers (Contributions to Django itself)
There is already a run-before constraint you can add to migrations for exactly this purpose! It's called "run_before" and is in the same format as the dependencies IIRC.
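
A Django-free sketch of the effect, using the A/B example from earlier in the thread: `MIGRATIONS` and `build_order` are invented for the illustration, and the sketch only models the idea that a `run_before` entry is a dependency declared from the other side. It uses the standard library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Declarations as they might appear on each migration, keyed by (app, name).
MIGRATIONS = {
    ("A", "M1"): {"dependencies": [], "run_before": []},
    ("A", "M3"): {"dependencies": [("A", "M1")], "run_before": []},
    ("B", "M2"): {"dependencies": [("A", "M1")], "run_before": [("A", "M3")]},
}


def build_order(migrations):
    """Topologically sort migrations, treating run_before as a reversed dependency."""
    sorter = TopologicalSorter()
    for key, decl in migrations.items():
        # `key` must run after each of its dependencies.
        sorter.add(key, *decl["dependencies"])
        for child in decl["run_before"]:
            # `child` must run after `key`, without `child` knowing about it.
            sorter.add(child, key)
    return list(sorter.static_order())


print(build_order(MIGRATIONS))
```

With the extra edge in place, (A, M3) can no longer be ordered ahead of (B, M2), and app A's migrations need no knowledge of app B.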

Andrew


Adam Johnson

Jul 7, 2017, 6:36:46 PM
to django-d...@googlegroups.com

Patryk Zawadzki

Jul 11, 2017, 9:12:37 AM
to Django developers (Contributions to Django itself)
On Friday, July 7, 2017 at 23:54:07 UTC+2, Andrew Godwin wrote:
> There is already a run-before constraint you can add to migrations for exactly this purpose! It's called "run_before" and is in the same format as the dependencies IIRC.

The problem with "run before X" is that there is no "X" at the point in time where you write that migration.

I think it would be more robust if Django tried to run migrations _from other apps_ that depend on the currently executing migration before proceeding to run the next migration from the same app.
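
One possible reading of that idea, as a toy scheduler. This is not Django's behaviour; `schedule` and `GRAPH` are invented names, and the sketch only shows the ordering rule: after a migration runs, dependents from other apps go first, and only then does the fallback to declaration order apply.

```python
def schedule(graph):
    """Return a run order that executes cross-app dependents as early as possible.

    `graph` maps (app, name) to the set of (app, name) keys it depends on.
    """
    done, order = set(), []

    def ready(node):
        return node not in done and all(dep in done for dep in graph[node])

    def run(node):
        done.add(node)
        order.append(node)
        # Immediately run migrations from *other* apps that depend on this one.
        for other, deps in graph.items():
            if other[0] != node[0] and node in deps and ready(other):
                run(other)

    while len(done) < len(graph):
        # Otherwise fall back to the first runnable migration in declaration order.
        candidate = next((n for n in graph if ready(n)), None)
        if candidate is None:
            raise ValueError("circular dependency")
        run(candidate)
    return order


GRAPH = {
    ("A", "M1"): set(),
    ("A", "M3"): {("A", "M1")},
    ("B", "M2"): {("A", "M1")},
}

print(schedule(GRAPH))
```

Even though (A, M3) is declared before (B, M2) here, the cross-app dependent is picked up right after (A, M1), which is the order the earlier example needs.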