Migrations, commands and syncdb


Andrew Godwin

May 30, 2013, 2:03:15 PM
to django-d...@googlegroups.com
Hi everyone,

I'm starting to plan out the commands for the new migrations stuff in Django, and I've hit something of an impasse trying to decide which option to go for.

Short background: South modified syncdb to just sync non-migrated apps, and you had to go and run migrate separately to get migrations working.

Note that this proposal DOES NOT cover the commands for creating and squashing migrations. That will come later, but will probably be "./manage.py createmigration" and "./manage.py squashmigrations".

The proposals are:

 1. Change syncdb so that it both does the old behaviour (adds models for unmigrated apps), and additionally runs any outstanding migrations. There would be a separate "migrate" command for more complex operations, such as reversing them or faking application, which is a little odd.

 2. Leave syncdb as it is, like South does, and have everything happen through a "migrate" command. Leads to weird interactions where each command knows about the other, and must be run in a certain order, but which isn't immediately obvious.

 3. Do everything through a single command - migrations, non-migrated syncing, reversal of migrations, etc. I would call this command "migrate", and start a deprecation cycle on "syncdb" (which would turn into an alias to "migrate"). Calling "./manage.py migrate" would first sync unmigrated apps, and then run migrations, but would have options so a user could migrate (or sync!) specific apps/target migrations.

I prefer option 3, but getting rid of syncdb might be controversial, so I want to ask for people's opinions. syncdb would continue to exist for at least 3 versions if not forever; it would just be an alias to run "migrate" in its default configuration, and would do exactly what you would expect (whereas with South now, and with option 2, syncdb doesn't do enough).
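
To make option 3 concrete, here is a minimal sketch of the order in which a single "migrate" command could do its work. Everything in it (the function name plan_migrate, the tuple-based plan, the example app and migration names) is hypothetical and for illustration only; it is not the actual implementation.

```python
def plan_migrate(installed_apps, migrations, applied):
    """Return the ordered steps a combined "migrate" run would take.

    installed_apps: list of app labels.
    migrations: dict mapping app label -> ordered list of migration names
        (apps missing from the dict have no migrations).
    applied: set of (app label, migration name) pairs already run.
    """
    plan = []
    # Step 1: the old syncdb behaviour, only for apps without migrations.
    for app in installed_apps:
        if not migrations.get(app):
            plan.append(("sync", app, None))
    # Step 2: run outstanding migrations, in order, for migrated apps.
    for app in installed_apps:
        for name in migrations.get(app, []):
            if (app, name) not in applied:
                plan.append(("migrate", app, name))
    return plan


# Hypothetical project: "auth" has no migrations, "blog" has two,
# and the first "blog" migration has already been applied.
plan = plan_migrate(
    installed_apps=["auth", "blog"],
    migrations={"blog": ["0001_initial", "0002_add_slug"]},
    applied={("blog", "0001_initial")},
)
for step in plan:
    print(step)
```

Under this sketch, "syncdb" as an alias would simply call the same function with no extra options.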

Andrew

Donald Stufft

May 30, 2013, 2:05:29 PM
to django-d...@googlegroups.com
--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

I vote #3.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Alex Gaynor

May 30, 2013, 2:06:16 PM
to django-d...@googlegroups.com
I'm broadly +1 on deprecating syncdb; it's possibly the most inaccurately named thing in all of Django (hint: it doesn't sync anything).

Alex


--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

Carl Meyer

May 30, 2013, 2:07:41 PM
to django-d...@googlegroups.com
On 05/30/2013 12:03 PM, Andrew Godwin wrote:
> The proposals are:
>
> 1. Change syncdb so that it both does the old behaviour (adds models
> for unmigrated apps), and additionally runs any outstanding migrations.
> There would be a separate "migrate" command for more complex operations,
> such as reversing them or faking application, which is a little odd.
>
> 2. Leave syncdb as it is, like South does, and have everything happen
> through a "migrate" command. Leads to weird interactions where each
> command knows about the other, and must be run in a certain order, but
> which isn't immediately obvious.
>
> 3. Do everything through a single command - migrations, non-migrated
> syncing, reversal of migrations, etc. I would call this command
> "migrate", and start a deprecation cycle on "syncdb" (which would turn
> into an alias to "migrate"). Calling "./manage.py migrate" would first
> sync unmigrated apps, and then run migrations, but would have options so
> a user could migrate (or sync!) specific apps/target migrations.
>
> I prefer option 3, but getting rid of syncdb might be controversial, so
> I want to ask for people's opinions. syncdb would continue to exist for
> at least 3 versions if not forever; it would just be an alias to run
> "migrate" in its default configuration, and would do exactly what you
> would expect (whereas with South now, and with option 2, syncdb doesn't
> do enough).

I much prefer option 3; I think a deprecation path for syncdb is fine.
It's always been mis-named anyway, since it doesn't really sync.

Carl

Jannis Leidel

May 30, 2013, 2:16:30 PM
to django-d...@googlegroups.com
+1 on #3

--
Jannis

Shai Berger

May 30, 2013, 2:45:35 PM
to django-d...@googlegroups.com
Hi all,

On Thursday 30 May 2013, Andrew Godwin wrote:
>
> The proposals are:
>
> 1. Change syncdb so that it both does the old behaviour (adds models for
> unmigrated apps), and additionally runs any outstanding migrations. There
> would be a separate "migrate" command for more complex operations, such as
> reversing them or faking application, which is a little odd.
>
> 2. Leave syncdb as it is, like South does, and have everything happen
> through a "migrate" command. Leads to weird interactions where each command
> knows about the other, and must be run in a certain order, but which isn't
> immediately obvious.
>
> 3. Do everything through a single command - migrations, non-migrated
> syncing, reversal of migrations, etc. I would call this command "migrate",
> and start a deprecation cycle on "syncdb" (which would turn into an alias
> to "migrate"). Calling "./manage.py migrate" would first sync unmigrated
> apps, and then run migrations, but would have options so a user could
> migrate (or sync!) specific apps/target migrations.
>
Naming aside, I see little value in an operation that runs the current syncdb
without running pending migrations. Thus, -1 on option 2.

Between 1 and 3, I tend towards 3, assuming that the default operation (that
is, with no flags or arguments) is the equivalent of South's current
"syncdb --migrate" -- that is, the syncdb of option 1. I don't like the
current situation, where syncdb without --migrate is not useful and migrate
requires an explicit choice of apps.

Shai

Anssi Kääriäinen

May 30, 2013, 3:12:21 PM
to Django developers
On 30 touko, 21:03, Andrew Godwin <and...@aeracode.org> wrote:
> I prefer option 3, but getting rid of syncdb might be controversial, so I
> want to ask for people's opinions. syncdb would continue to exist for at
> least 3 versions if not forever; it would just be an alias to run "migrate"
> in its default configuration, and would do exactly what you would expect
> (whereas with South now, and with option 2, syncdb doesn't do enough).

+1 to #3.

I haven't used South as much as I should have (instead I have painful
manual scripts to do migrations). The biggest pain point about
database schemas for me is easily test database setup. That is, sync
from scratch. I do the following currently:
1. load schema + a little bit of data from production through
pg_dump + pg_restore. The data is mostly metadata, in particular
already applied migrations in production.
2. run those migrations that haven't been applied already.

Will I be able to do the above somehow? Will #3 do a similar thing
for test database setup, except that you run all migrations against
an empty database? The first migration would be the initial "syncdb", and
then you run the rest of the migrations in a chain, finally arriving at the
wanted database state. Or would it be "sync" all models, then run
migrations against that database?

Sorry if these are stupid questions... I really do not know South as
well as I should, so I need a little tutoring...

- Anssi

Aymeric Augustin

May 30, 2013, 3:32:12 PM
to django-d...@googlegroups.com
+1 on option 3.

In hindsight syncdb isn't a good name. It won't be missed. Please deprecate it.

--
Aymeric.

Andrew Godwin

May 30, 2013, 3:55:40 PM
to django-d...@googlegroups.com


Anssi Kääriäinen wrote:
> I haven't used South as much as I should have (instead I have painful
> manual scripts to do migrations). The biggest pain point about
> database schemas for me is easily test database setup. That is, sync
> from scratch. I do the following currently:
>   1. load schema + a little bit of data from production through
> pg_dump + pg_restore. The data is mostly metadata, in particular
> already applied migrations in production.
>   2. run those migrations that haven't been applied already.
>
> Will I be able to do the above somehow? Will #3 do a similar thing
> for test database setup, except that you run all migrations against
> an empty database? The first migration would be the initial "syncdb", and
> then you run the rest of the migrations in a chain, finally arriving at the
> wanted database state. Or would it be "sync" all models, then run
> migrations against that database?
>
> Sorry if these are stupid questions... I really do not know South as
> well as I should, so I need a little tutoring...

 
The way South works, and the way this will work (for new installs as well as tests) is that if migrations are present for an app, it will always create new databases by running through them from the very first migration to the latest one.

The proposed "squash" feature helps stop any potential long setup times (from hundreds of migrations) by allowing you to replace 100 old ones with a few new ones, optimised to have fewer operations.
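
As a rough illustration of what squashing buys you, the toy below replays a history of schema operations against an empty database and then replays a squashed version of the same history. The operation tuples and the functions apply_operations and squash are invented for this example; the real optimiser would have to handle many more cases than cancelling add/drop pairs.

```python
def apply_operations(operations):
    """Replay operations against an empty schema; return {table: set(columns)}."""
    schema = {}
    for kind, table, column in operations:
        if kind == "create_table":
            schema[table] = set()
        elif kind == "add_column":
            schema[table].add(column)
        elif kind == "drop_column":
            schema[table].discard(column)
    return schema


def squash(operations):
    """Drop add_column/drop_column pairs that cancel each other out."""
    result = []
    for op in operations:
        kind, table, column = op
        if kind == "drop_column" and ("add_column", table, column) in result:
            # The column was added earlier in the same history, so a fresh
            # database never needs to see it at all.
            result.remove(("add_column", table, column))
        else:
            result.append(op)
    return result


history = [
    ("create_table", "author", None),
    ("add_column", "author", "name"),
    ("add_column", "author", "nickname"),
    ("drop_column", "author", "nickname"),
    ("add_column", "author", "email"),
]
squashed = squash(history)

# Both histories must produce the same final schema on a fresh database.
assert apply_operations(squashed) == apply_operations(history)
print(len(history), "->", len(squashed))  # 5 -> 3
```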

Andrew

Russell Keith-Magee

May 30, 2013, 8:51:20 PM
to django-d...@googlegroups.com
+1 to (3) as well. 

Russ %-)

Luke Plant

May 31, 2013, 6:39:33 AM
to django-d...@googlegroups.com
On 30/05/13 20:55, Andrew Godwin wrote:

> The way South works, and the way this will work (for new installs as
> well as tests) is that if migrations are present for an app, it will
> always create new databases by running through them from the very first
> migration to the latest one.

One problem with this is that you have to be careful to write migrations
that will always work from scratch. This is best practice, but I have on
occasion used data migrations that were for specific problems, and may
have depended on specific data in the database. I've also used them for
populating parts of the database at specific upgrade points. Usually I
wouldn't want this data to be installed for my tests - I want my tests
to explicitly do the setup they need, and not be complicated by data
from data migrations.

For these reasons, and because of the slowness of running all the
migrations, there are some projects where I have a settings file for
running the tests that excludes South from INSTALLED_APPS, which solves
the problem nicely.
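
A test settings module of that kind might look roughly like the sketch below. The base app list is made up for the example; in a real project it would come from the main settings module (for instance via "from settings import *") rather than being spelled out here.

```python
# Stand-in for the project's normal settings; these app names are
# illustrative assumptions, not taken from any real project.
INSTALLED_APPS = (
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "south",
    "blog",
)

# Test-only override: drop South so that the test database is built
# straight from the current models instead of replaying migrations.
INSTALLED_APPS = tuple(app for app in INSTALLED_APPS if app != "south")
```

South also has a SOUTH_TESTS_MIGRATE setting which, when set to False, has the test runner create the test database with syncdb rather than by running migrations.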

The slowness problem can be fixed by squashing, but it seems a shame to
require that simply for the sake of running tests. And it doesn't solve
the other problems. I would like to be able to turn off migrations for
tests, and just skip to the final schema.

Luke

--
"I washed a sock. Then I put it in the dryer. When I took it out,
it was gone." (Steven Wright)

Luke Plant || http://lukeplant.me.uk/

Andrew Godwin

May 31, 2013, 7:08:32 AM
to django-d...@googlegroups.com
Luke Plant wrote:
> One problem with this is that you have to be careful to write migrations
> that will always work from scratch. This is best practice, but I have on
> occasion used data migrations that were for specific problems, and may
> have depended on specific data in the database. I've also used them for
> populating parts of the database at specific upgrade points. Usually I
> wouldn't want this data to be installed for my tests - I want my tests
> to explicitly do the setup they need, and not be complicated by data
> from data migrations.
>
> For these reasons, and because of the slowness of running all the
> migrations, there are some projects where I have a settings file for
> running the tests that excludes South from INSTALLED_APPS, which solves
> the problem nicely.
>
> The slowness problem can be fixed by squashing, but it seems a shame to
> require that simply for the sake of running tests. And it doesn't solve
> the other problems. I would like to be able to turn off migrations for
> tests, and just skip to the final schema.

Yes, this has traditionally been a problem, and it's difficult to say what to do here.

I personally have the opinion that stuff you add in data migrations _should_ be part of the tests - after all, it's usually crucial data that the site won't work without, and you're never going to test an installation without them.

However, I understand people don't like doing this. Problem is, the alternative is to reserve the old syncdb method for running tests, and then to introduce a new setting (which in South is SOUTH_TESTS_MIGRATE) that lets users turn it on and off, which seems like a bad road to go down.

I'd rather keep it so you must run migrations for tests - as I believe we should be making people do this as a general case - and, if you want, give the data migration operations an option you can pass that makes them no-op during tests, something like:

operations = [
    Python(
        """
        for author in Author.objects.all():
            author.fix()
        """,
        skip_during_tests=True,
    ),
]
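
For anyone who wants to play with the idea, a self-contained toy version of that flag could look like the following. This is only a sketch: the class name PythonOperation, its run() method and the "testing" argument are invented here, and the callable stands in for the string of code in the example above.

```python
class PythonOperation:
    """Hypothetical data-migration operation wrapping a plain callable."""

    def __init__(self, func, skip_during_tests=False):
        self.func = func
        self.skip_during_tests = skip_during_tests

    def run(self, testing=False):
        # No-op when building a test database and the author opted out.
        if testing and self.skip_during_tests:
            return False
        self.func()
        return True


fixed = []


def fix_authors():
    # Stand-in for "for author in Author.objects.all(): author.fix()".
    fixed.append("authors fixed")


operations = [PythonOperation(fix_authors, skip_during_tests=True)]

# During test database setup the operation is skipped...
results_in_tests = [op.run(testing=True) for op in operations]
# ...but on a normal migrate run it executes as usual.
results_in_production = [op.run(testing=False) for op in operations]
```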

Andrew

Anssi Kääriäinen

May 31, 2013, 7:44:20 AM
to django-d...@googlegroups.com, Andrew Godwin
If the test database is set up by syncdb and you have some SQL that isn't model migrations, then you will likely need two sets of migrations: one that applies to the production database, and one that applies to a fresh database from syncdb. You will need to constantly keep the "applies to syncdb database" migrations updated, and there is no guarantee that you will actually arrive at a state similar to what you have in the production database (in other words, migrations should be part of your tests, like Andrew said).

For example, the SQL could be trigger or view creation SQL. I know, not the most typical setup. But there are enough users who want to use traditional SQL for schema alterations that this use case should be part of the design goals.

Dump + load of the production schema is a surprisingly good way to attack this problem - you are testing against an exact copy of your production schema, and only "in development" migrations need to be applied. This is not a good default, but it is very useful if you happen to need it. My understanding is that I can actually do this somewhat easily with the planned design.

 - Anssi

Marc Tamlyn

May 31, 2013, 8:38:17 AM
to django-d...@googlegroups.com, Andrew Godwin
I have generally not kept my migrations so that they always work from scratch, for reasons similar to Luke's: data migrations are often dependent on the data in the site. This is particularly relevant to content-driven (CMSy) sites, where I may need to move a load of pages about as a data migration at some point in the history, but this data will not exist for the tests. These "throwaway" data migrations are IMO pretty common. As a consequence (and given the speed requirements) I've generally never run a site without SOUTH_TESTS_MIGRATE set.

Also, when using an older library which has a long migration history included, when we start a new site using it we generally use `syncdb --all` so as not to worry about the history. Will this command still be supported by the new `migrate` command?

I can see the argument for making sure the migrations work, and they may be necessary for migrations containing custom SQL the site depends on, but I would suggest that many developers effectively treat migrations as things that can be thrown away once they've been run against all the relevant databases. Perhaps the rebase command needs to consider this: how can we throw away old migrations (especially data migrations)?

(PS: +1 to option 3 in principle)

Andrew Godwin

May 31, 2013, 7:17:33 PM
to Marc Tamlyn, django-d...@googlegroups.com
Yes, the large number of weird things people need custom SQL for is why I want to push for migrations much more overall - especially for cases like post_syncdb (should be replaced by data migrations) and the arbitrary initial SQL file support (should be replaced by custom migrations).

I don't plan to add in an equivalent to "syncdb --all" - the squashing should be enough to make things nice and fast for testing, with the added benefit that your test database matches your production one more closely. Squash (what was previously called rebase) is designed to let you throw away old migrations, especially in a case where you tightly control all installs.

There might be some scope for a version of the migrate command that does a one-shot, in-memory squash-then-execute - essentially a very quick way of doing initial migrations - but that depends on whether it turns out to be useful and on how fast the migration optimiser/compressor that runs on the squashes will be.

Andrew