Django Schema Migration

18 views
Skip to first unread message

Mike H

unread,
Apr 19, 2007, 10:43:32 AM4/19/07
to django...@googlegroups.com

Hi all,

I've been working on a tool to give me an automated way of altering the
schemas on servers I deploy my django applications to. I know about the
schema-evolution branch, but the current approach of that isn't
completely automated and the work seems to have stopped on it.

The approach I have taken is very simple. Inside each application
installed in your project, you create a migrations module. Inside there
is a list of migrations that need to be applied and an sql directory
that contains the actual migrations themselves.

For example :

cms/
migrations/
migrator.py -- holds the list of migrations to run, in the order
to run them
sql/
InstallInitialStructure.sql
AddNewAttributeColumn.sql
... etc. ...

So with this approach you would create the initial structure sql file
using the django sql generation, then you would have to produce sql
files for each addition / deletion / merge you need to make.

The sql directory can have subdirectories named the same as your database
engine if you need an engine specific sql file.

All that is needed is to copy a migrate.py script into the root of your
project and run 'python migrate.py' - any migrations on any of the
installed applications that have not been run would automatically get
run and a log kept so that migrations do not run twice.

Does this sound useful to anyone who is doing automated deployments?

Mike

David Marko

unread,
Apr 19, 2007, 2:33:41 PM4/19/07
to Django users
Hello,
Can you share the initial code with us? This approach seems to be
good. Do you have experience on some project?

David

Mike H napsal:

Russell Keith-Magee

unread,
Apr 19, 2007, 8:10:57 PM4/19/07
to django...@googlegroups.com
On 4/19/07, Mike H <mi...@mugwuffin.com> wrote:
>
> I've been working on a tool to give me an automated way of altering the
> schemas on servers I deploy my django applications to. I know about the
> schema-evolution branch, but the current approach of that isn't
> completely automated and the work seems to have stopped on it.

There are actually two schema evolution branches. For some reason,
schema evolution is the third rail for Django developers - someone
volunteers to work on schema evolution, starts working, then is never
heard from again.

> Does this sound useful to anyone who is doing automated deployments?

Certainly does. Keep in mind that there are a couple of wiki pages
discussing a design that had offical blessing back in the day. Any
schema-evolution proposal would need to cover at least those use
cases.

Yours,
Russ Magee %-)

Ned Batchelder

unread,
Apr 19, 2007, 8:58:39 PM4/19/07
to django...@googlegroups.com
Just one comment on this: At Tabblo, we've found it very useful to have
not only SQL migrations, but Python migrations as well, because database
migrations involve not just schema changes, but also data changes.
While SQL can be used to update the data, it can be much easier to do it
with Python.

So we've built a similar migration scheme, except the migrations are
Python functions. The vast majority of the migrations (we're up to
number 191) simply execute SQL statements, but the few times we've
wanted the help of the model classes, it has been very helpful to have
the full power of Python available to us.

--Ned.

--
Ned Batchelder, http://nedbatchelder.com

Mike H

unread,
Apr 20, 2007, 5:20:05 AM4/20/07
to django...@googlegroups.com

Thanks Ned, that's certainly something I want to bear in mind. At the
moment the dbmigration code only uses sql migrations, but the way the
migrator is put together will allow me to build support for migrations
written in python quite easily.

Before I release the code, I want to make the migrator call 'install'
from django.core.management if it doesn't find a migration module for
an app. Also, as this is being done as part of a work project, I have to
get permission to release the code from my employer, but this should not
be a problem.

Hopefully I'll have something released by the end of today, if not, by
the end of Monday.

Thanks for your comments!

Mike

Russell Keith-Magee

unread,
Apr 20, 2007, 7:46:23 AM4/20/07
to django...@googlegroups.com
On 4/20/07, Mike H <mi...@mugwuffin.com> wrote:
>
> Before I release the code, I want to make the migrator call 'install'
> from django.core.management if it doesn't find a migration module for
> an app. Also, as this is being done as part of a work project, I have to
> get permission to release the code from my employer, but this should not
> be a problem.

FYI, the 'install' command was removed in 0.96 in favour of 'syncdb'.
Syncdb will identify any missing tables and install them, but it
currently doesn't install m2m tables that have been added to an
existing model.

> Hopefully I'll have something released by the end of today, if not, by
> the end of Monday.

Can't wait to see it!

Yours,
Russ Magee %-)

Mike H

unread,
Apr 20, 2007, 8:03:14 AM4/20/07
to django...@googlegroups.com

Ah, thanks for the heads up!

Unfortunately, syncdb will install the latest SQL for all installed
applications (as far as I am aware). Is there a way just to install the
schema for a specific application? I guess I could call sqlall and run
each supplied statement, but that's pretty much just reimplementing
install ;-)

The reason I need to do this is because keeping track of the revision a
model is at and working out which migrations need to be run gets quite
complex. For this initial version of the migrator I am doing all SQL
through the migrations. Of course, the django management can be used to
generate those initial migrations...

I have some documentation up at

http://www.asmwc.com/dbmigration/

(this is changing literally minute by minute as I produce it!)

Perhaps what I need to do is keep a revision number in each model's Meta
class. Then when I roll out a model to a server the migrator can check
to see if it has been installed before. If not, it can use django to
generate the sql and install it. If it has been installed before it can
see what revision was installed previously and run the migrations that
are flagged with the intermediate revision numbers...

Mike

Mike H

unread,
Apr 20, 2007, 8:17:28 AM4/20/07
to django...@googlegroups.com

Actually, those ramblings might not be relevant just yet...

I made a simple patch to syncdb to see if an application has a migrations
module or not. If it does, it will run it. If not, it will do its
automated thing. Seems to work quite well at the moment and let me get
rid of about a third of the migration code.

Mike

Mike H

unread,
Apr 20, 2007, 9:47:12 AM4/20/07
to django...@googlegroups.com

Hi all,

Release 0.01 of the migration module can be downloaded from
http://www.aswmc.com/dbmigration/
as well as a patch to integrate it into the syncdb command.

It seems to be working well for the developers here, so here's hoping
it's useful for some other people too :)

MikeH

Oliver Charles

unread,
Apr 20, 2007, 10:40:54 AM4/20/07
to django...@googlegroups.com
Hey, this is neat!

It's just a shame that it doesn't use Python models or some type of
database agnostic representation - any plans there?

Great work non the less though!

--
Ollie

Mike H

unread,
Apr 20, 2007, 10:51:57 AM4/20/07
to django...@googlegroups.com

Yeah, eventually I'd like to have the migrations define column
additions, deletions and renames using the models from Django. However
this is my first foray into the internals of Django so I would have a
lot of reading to do! I'm wondering if the schema-evolution branches
would have code I can reuse... and if any of the guys who did work on
them feel like submitting a patch, I'd be very happy :)

Mike

David Marko

unread,
Apr 20, 2007, 3:13:08 PM4/20/07
to Django users
How can 'Python migrations' be used? Can someone explain or place some
example? Where should I put the function?

--- excerpt from docs ---
Using Python migrations
Sometimes it might be easier to do your migration using Python and the
Django models. To define a python migration, simply define a function
with the same name as the migration in migrator.py


Mike H

unread,
Apr 20, 2007, 3:24:43 PM4/20/07
to django...@googlegroups.com

Hi David,

The python functionality is there so that you can manipulate the models
directly or run more complex dynamic sql if you need to.

Let's say that your app is called "myapp" and you want to do something
special to all User records with an aol email address, but that
something special is easier to do in python than it is in sql. Your app
structure should look like this :

myapp/
migrations/
__init__.py
migrator.py

The content of migrator.py could be :

migration_list = ['migration1',]
def migration1():
# python code that you need to run to complete the migration goes here
# e.g.
from django.contrib.auth.models import User
users = User.objects.filter(email_address__endswith='@aol.com')
for user in users:
# do something to the user here and save

Hope that helps.

Mike

David Marko

unread,
Apr 20, 2007, 3:48:32 PM4/20/07
to Django users
Thanks for an advice. I found problem with migrations when running
tests - python manage.py test . It seems that migration is run before
django creates table based on model. Tests should probably not run
migratons at all.

Mike H

unread,
Apr 20, 2007, 3:55:18 PM4/20/07
to django...@googlegroups.com

Ah, yes, the docs aren't clear on this at the moment, but the automatic
schema creation does not run if you use migrations for an application.
You will have to create a migration that creates the tables. (Just use
'python manage.py sqlall' to get the sql for the migration)

The reason for this is that syncdb will create the schema based on the
latest version of your model. But when you're rolling out to a live
server you need to be able to run the migrations to bring the schema up
to date, so you need to know exactly what stage your schema is at, so
you have to run all your sql through migrations.

If you create a migration that sets up your table structure do the tests
run ok?

Mike

On 4/20/2007, "David Marko" <dma...@tiscali.cz> wrote:

Mike H

unread,
Apr 20, 2007, 4:08:19 PM4/20/07
to django...@googlegroups.com

Hi again,

The docs have been updated with the points you raised.

Cheers!

Mike


On 4/20/2007, "Mike H" <mi...@mugwuffin.com> wrote:

David Marko

unread,
Apr 20, 2007, 4:27:34 PM4/20/07
to Django users
GREAT WORK! Migrations are being discussed for years and now we have
some real-life solution. And I must say - it works and helps. Now we
should get rid of SQL, we need some nice DSL syntax.

David

Jeremy Dunck

unread,
Apr 22, 2007, 10:10:11 AM4/22/07
to django...@googlegroups.com
On 4/20/07, Mike H <mi...@mugwuffin.com> wrote:
> It seems to be working well for the developers here, so here's hoping
> it's useful for some other people too :)

This is interesting-- migrating django apps using django. Are there
cases where Django or the DB could become so broken that the migrator
couldn't run? I wasn't sure, so I took the coward's way.

I'm attaching my db_upgrade.py for comparison. (I've attached it
on-list a time or two before, but there are many migration threads.
:)) It uses numbers instead of names for migrations, and has a custom
db_version table (no django model for it). It supports only python
upgrade steps, but those steps are handed a connection to work with if
they want to use SQL directly. Finally, it runs against Ye Olde
Django (0.91) and probably needs some tweaks to run against 0.96.

If I'm reading your code right, the migrations could be applied out of
order, depending on which ones you have already run. Since schema is
stateful, I think strict order is better.

Consider developers Mike and Jeremy, starting with the same DB. Mike
creates a migration, "HalfPrice" and runs it locally. Jeremy creates
a migration, "FullPrice" and runs it locally. Each then update their
checkouts and run syncdb.

The migrator conflict is update like this, say:
migrations = [
"HalfPrice",
"FullPrice"
]

The changes of FullPrice will be run for Mike (since that's not in the
applied list in his DB), and the changes of HalfPrice will be run for
Jeremy (for the same reason).

Now, this probably isn't an issue outside of development, but it was
this situation that led me to use migration numbers rather than names
and have my migration script halt if the DB rev was at or higher than
the highest migration step.

Cheers,
Jeremy

db_upgrade.py

Mike H

unread,
Apr 22, 2007, 11:06:09 AM4/22/07
to django...@googlegroups.com
Hi Jeremy,

I see your point, and yes, migrations can be run out of sequence -
although in practice I've found that it is rarely an issue and if it is,
it can be resolved fairly simply. Let's take a couple of examples. I'm
not trying to be condescending, just thinking it through outloud ;)

1) Mike wants to add " favourite_colour" attribute to a Profile object,
and Jeremy wants to add "shoe_size". There are two migrations that CAN
be applied out of order - they do not depend on each other at all. When
Mike updates, he will get the code and migration for "shoe_size". When
Jeremy updates, he will get the code and migration for
"favourite_colour". No problems there.

2) Jeremy wants to abstract some information (let's say shoe_size) from
the Profile object into a separate 'Clothing' class and add a reference
in Profile to the new class. At the same time, Mike alters the shoe_size
attribute to be a different type. Now we get a situation where the
migrations depend on each other but are not available to both developers.

So, if we used a migration number instead of just the name, the commit
would flag the conflict as the two migration files would conflict.
Great! We've seen the problem immediately!

If we just used the name, our pre-commit update and test would flag the
problem.

Lets say that Jeremy has committed his code first.

Before he commits, Mike updates runs his tests. As part of the test run,
the database is dropped and rebuilt - using the newly merged migrations.
Depending on how SVN merged the migration list, one of them is going to
fail. Mike can see the problem and go and fix it.

3) Five developers are working on a project. Each of them produce a
migration. None of the migrations depend on each other.

If we just use names, none of the migrations will conflict and none of
them depend on each other so all will run without problems.

If we just use numbers, 4 developers will see a conflict after the first
developer commits, even though their code is not broken. They've just
taken the next migration number without being aware that other people
needed to commit migrations too. So they increment their migration
number and commit. Now the first one to commit will cause a conflict for
the rest, and so on until they all contact each other and agree what
their migration numbers will be.

On the other hand, it's a much more efficient way to see if a schema has
all migrations run on it in the right order if we use a number. Although
with a descriptive name, it's much easier to track down what each one is
doing and find an offending migration if there is a problem with the
code that depends on it.

Perhaps a combination of both would be good?

By the way, my dbmigration should have no problems with older versions
of django, I just haven't tested it with them. The patch for
management.py would need changing though.

Cheers,

Mike

> ------------------------------------------------------------------------
>
> #!/usr/bin/python
> import sys, os
> import psycopg
> from django.core.db import db
> from django.conf.settings import DATABASE_USER, DATABASE_NAME, DATABASE_HOST, DATABASE_PORT, DATABASE_PASSWORD, DEBUG, TIME_ZONE, UPGRADE_STEPS_PATH, DEBUG
>
> def usage():
> sys.stderr.write("%s [--nobackup] \n Expects DJANGO_SETTINGS_MODULE to be set.\n")
>
> if len(sys.argv) == 1:
> do_backup = True
> else:
> if sys.argv[1] == "--nobackup":
> do_backup = False
> else:
> usage()
> sys.exit()
>
>
> try:
> os.environ['DJANGO_SETTINGS_MODULE']
> except KeyError:
> usage()
> sys.exit()
>
> def get_conn_string():
> if DATABASE_NAME == '':
> from django.core.exceptions import ImproperlyConfigured
> raise ImproperlyConfigured, "You need to specify DATABASE_NAME in your Django settings file."
> conn_string = "dbname=%s" % DATABASE_NAME
> if DATABASE_USER:
> conn_string = "user=%s %s" % (DATABASE_USER, conn_string)
> if DATABASE_PASSWORD:
> conn_string += " password='%s'" % DATABASE_PASSWORD
> if DATABASE_HOST:
> conn_string += " host=%s" % DATABASE_HOST
> if DATABASE_PORT:
> conn_string += " port=%s" % DATABASE_PORT
> return conn_string
>
> def get_db():
> log("Opening DB for upgrade")
> cur = db.cursor()
> conn = db.connection
> return conn, cur
>
> def get_current_version(cur):
> """
> Assumes:
> create table db_version (branch varchar(50), version integer);
> insert into db_version values ('trunk', 1);
>
> """
> try:
> cur.execute('select branch, version from db_version')
> except psycopg.ProgrammingError, e:
> logerr("Couldn't find db_version \n====%s\n====:\n perhaps you forgot to restore a DB dump or should create the table with ('trunk', 1)?" % e)
> return cur.fetchone()
>
> def log(s):
> sys.stderr.write('info: %s\n' % s)
>
> def logerr(s):
> sys.stdout.write('error: %s\n' % s)
> sys.stdout.write('error: You probably want to restore the DB!\nQuitting.\n')
> sys.exit()
>
> def backup_db(init, final) :
> import subprocess
> from time import gmtime, strftime
> dumpfile = "db_preupgrade_%s_from_%d_to_%d_at_%s_dump" % \
> (DATABASE_NAME, init, final, strftime("%Y%m%dT%H%M%S", gmtime()))
> log("If you're asked for the DB password, it's: %s" % DATABASE_PASSWORD)
> #FIXME: use subprocess.Popen and tie pg_dump output to gzip input.
> ret = subprocess.call(["pg_dump", "-f", dumpfile, "-U", DATABASE_USER, "-h", DATABASE_HOST, DATABASE_NAME])
>
> if ret != 0:
> raise RuntimeError, "Failed to create pg_dump %d" % ret
> return dumpfile
>
> if __name__ == '__main__':
> sys.path.insert(0, UPGRADE_STEPS_PATH)
> import upgrade_steps
>
> conn, cur = get_db()
>
> label, initial_version = get_current_version(cur)
>
> log("Starting at version %d of db branch %s." % (initial_version, label))
>
> steps = []
>
> version = initial_version
> while True:
> try:
> version += 1
> func_name = '%s%d' % (label, version)
> steps.append(getattr(upgrade_steps, func_name))
> if not callable(steps[-1]):
> logerr("%s is not callable, quitting." % func_name)
> sys.exit()
> except SystemExit:
> raise
> except:
> break
> final_version = version-1
>
>
> if initial_version == final_version:
> log("No DB upgrade to do. Already at %s version %d." % (label, initial_version))
> log("Quitting.")
> sys.exit()
>
> log("Upgrading from %d to %d on %s" % (initial_version, final_version, label))
>
> if do_backup:
> try:
> log("Backing up existing db.")
> dumpfile = backup_db(initial_version, final_version)
> except RuntimeError, e:
> logerr(e)
>
> log("Backed up to %s" % dumpfile)
> else:
> log("skipping backup")
>
> version = initial_version
> for step in steps:
> try:
> version += 1
> # FIXME: a successful Django ORM .save() causes a commit in the transaction
> # thereby committing something we really don't want to
> step(conn, cur)
> cur.execute("update db_version set version = %d", [version])
> if DEBUG: #commit changes from each step to aid in debugging.
> conn.commit()
> log("Done with version %s" % version)
> except psycopg.ProgrammingError, e:
> logerr("SQL failed in db upgrade to version %d:%s" % (version, e))
> except Exception, e:
> logerr("Unknown error updating on to version %d: %s" % (version, e))
> conn.commit() #final commit with all steps completed.
>
> log("Successfully completed upgrade from %d to %d" % (initial_version, final_version))
>

lcaamano

unread,
Apr 23, 2007, 11:43:10 AM4/23/07
to Django users
FYI, this is very similar to the approach we've used since 2001 when
doing schema updates of our product, DynaCenter (http://
www.racemi.com). It has nothing to do with Django but the problem
domain is the same.

Our scripts are called "alter scripts" instead of migrators. We also
added version information in the script names to allow the main
migrator script, which we call 'alterdb', to load the scripts in a
configurable order. The driver loads the alter scripts, which
implement a simple two method interface, test and apply, allowing the
driver to call apply only if the test() method returns True. The
alter script can then check the schema in the test() method to see
whether, for example, a particular column is there or not, and then,
if needed, the apply method can create the column and possibly move
data from one place to another.

Of course, as time went by, we added more utlitiy methods to the
"alter scripts" base classes such as methods to create columns, test
presence of a table, create indexes, etc. We can mix SQL and python
in this context as needed.

This method has worked very nicely for us through the years.

daev

unread,
Apr 23, 2007, 1:51:27 PM4/23/07
to Django users
It's very funny but couple of dayes ago I start to write some schema
wvolution framework for django. I've read articles in django-code wiki
about schema evolution and make some major decisions. I realized that
the better way(for me) not to analize application model definition but
to analize django-made DDL code. So I hame 2 DDL files with old schema
definition and new. Old file stored in some "compiled" format in
specific db (using Pickle module).When user changed his model
definition he run schema_evolution.py <app_name> (from the djang-
project root derectory). This script (its more than one file, it whole
multi-file module:) ) gets from django CREATE TABLE statement and then
starts to compare this one with stored earlier. Script parse (using
pyparsing module) two files then try to find changes. It asks user for
renamed tables/fields and then produced output python script that
contains ALTER TABLE statements (that executed by cursor.execute() ).
I plane to release my code under code.google project hosting. May be
it can help somebody(and me of course) to spend less time with schema
evolution/migration tasks.
Thanks. Sorry for my English:)

daev

unread,
Apr 29, 2007, 6:44:47 AM4/29/07
to Django users
Here is Schema Evolution Project home site:

http://code.google.com/p/django-schemaevolution/

Any feedbacks are welcome

Reply all
Reply to author
Forward
0 new messages