I've been working on a tool to give me an automated way of altering the
schemas on servers I deploy my Django applications to. I know about the
schema-evolution branch, but its current approach isn't completely
automated, and work on it seems to have stopped.
The approach I have taken is very simple. Inside each application
installed in your project, you create a migrations module. Inside there
is a list of migrations that need to be applied and an sql directory
that contains the actual migrations themselves.
For example:

cms/
    migrations/
        migrator.py  -- holds the list of migrations to run, in the order to run them
        sql/
            InstallInitialStructure.sql
            AddNewAttributeColumn.sql
            ... etc. ...
So with this approach you would create the initial structure SQL file
using Django's SQL generation, then produce SQL files for each
addition, deletion, or merge you need to make.
The sql directory can have subdirectories named after your database
engine if you need an engine-specific SQL file.
All that is needed is to copy a migrate.py script into the root of your
project and run 'python migrate.py'. Any migrations on any of the
installed applications that have not yet been run are applied
automatically, and a log is kept so that migrations never run twice.
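In outline, the script works something like this (a simplified sketch,
not the released migrate.py; the migration_log table and helper names
here are my own invention):

import os
from django.conf import settings
from django.db import connection

def already_applied(cursor, app, name):
    # The log table (app, name) records every migration ever run.
    cursor.execute("SELECT 1 FROM migration_log WHERE app = %s AND name = %s",
                   [app, name])
    return cursor.fetchone() is not None

def run_pending_migrations():
    cursor = connection.cursor()
    for app_name in settings.INSTALLED_APPS:
        try:
            migrator = __import__(app_name + '.migrations.migrator', {}, {}, [''])
        except ImportError:
            continue  # app has no migrations module
        sql_dir = os.path.join(os.path.dirname(migrator.__file__), 'sql')
        for name in migrator.migration_list:
            if already_applied(cursor, app_name, name):
                continue  # the log keeps a migration from running twice
            # An engine-specific file (e.g. sql/postgresql/Foo.sql) wins
            # over the generic one.
            path = os.path.join(sql_dir, settings.DATABASE_ENGINE, name + '.sql')
            if not os.path.exists(path):
                path = os.path.join(sql_dir, name + '.sql')
            cursor.execute(open(path).read())
            cursor.execute("INSERT INTO migration_log (app, name) VALUES (%s, %s)",
                           [app_name, name])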
Does this sound useful to anyone who is doing automated deployments?
Mike
David
Mike H wrote:
There are actually two schema evolution branches. For some reason,
schema evolution is the third rail for Django developers - someone
volunteers to work on schema evolution, starts working, then is never
heard from again.
> Does this sound useful to anyone who is doing automated deployments?
Certainly does. Keep in mind that there are a couple of wiki pages
discussing a design that had official blessing back in the day. Any
schema-evolution proposal would need to cover at least those use
cases.
Yours,
Russ Magee %-)
So we've built a similar migration scheme, except the migrations are
Python functions. The vast majority of the migrations (we're up to
number 191) simply execute SQL statements, but the few times we've
wanted the help of the model classes, it has been very helpful to have
the full power of Python available to us.
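For illustration, a migration in this style looks roughly like so (a
made-up example, not one of our actual 191; the app and model names
are invented):

from django.db import connection

def migration_0191():
    cursor = connection.cursor()
    # Most steps are plain SQL...
    cursor.execute("ALTER TABLE shop_order ADD COLUMN discount integer")
    # ...but the full power of Python and the model classes is there
    # when a step is easier to express that way.
    from shop.models import Order
    for order in Order.objects.filter(discount__isnull=True):
        order.discount = 0
        order.save()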
--Ned.
--
Ned Batchelder, http://nedbatchelder.com
Before I release the code, I want to make the migrator call 'install'
from django.core.management if it doesn't find a migration module for
an app. Also, as this is being done as part of a work project, I have to
get permission to release the code from my employer, but this should not
be a problem.
Hopefully I'll have something released by the end of today, if not, by
the end of Monday.
Thanks for your comments!
Mike
FYI, the 'install' command was removed in 0.96 in favour of 'syncdb'.
Syncdb will identify any missing tables and install them, but it
currently doesn't install m2m tables that have been added to an
existing model.
> Hopefully I'll have something released by the end of today, if not, by
> the end of Monday.
Can't wait to see it!
Yours,
Russ Magee %-)
Unfortunately, syncdb will install the latest SQL for all installed
applications (as far as I am aware). Is there a way just to install the
schema for a specific application? I guess I could call sqlall and run
each supplied statement, but that's pretty much just reimplementing
install ;-)
The reason I need to do this is because keeping track of the revision a
model is at and working out which migrations need to be run gets quite
complex. For this initial version of the migrator I am doing all SQL
through the migrations. Of course, the django management can be used to
generate those initial migrations...
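For what it's worth, running sqlall's output by hand would look
something like this (a sketch against the 0.96-era management module;
the exact function names are my assumption and may vary between
versions):

from django.db import connection, models, transaction
from django.core import management

def install_app(app_label):
    # Roughly what the old 'install' did: generate the app's complete
    # SQL (tables, custom SQL, indexes) and run it statement by statement.
    app = models.get_app(app_label)
    cursor = connection.cursor()
    for statement in management.get_sql_all(app):
        cursor.execute(statement)
    transaction.commit_unless_managed()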
I have some documentation up at
http://www.aswmc.com/dbmigration/
(this is changing literally minute by minute as I produce it!)
Perhaps what I need to do is keep a revision number in each model's Meta
class. Then when I roll out a model to a server the migrator can check
to see if it has been installed before. If not, it can use django to
generate the sql and install it. If it has been installed before it can
see what revision was installed previously and run the migrations that
are flagged with the intermediate revision numbers...
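A sketch of that idea (hypothetical; since Django validates Meta
options, the marker here sits on the model class itself rather than
inside Meta):

from django.db import models

class Profile(models.Model):
    favourite_colour = models.CharField(maxlength=20)

    # Hypothetical revision marker for the migrator to compare against
    # the revision recorded at install time; migrations tagged with the
    # intermediate numbers would then be run.
    schema_revision = 3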
Mike
I made a simple patch to syncdb to check whether an application has a
migrations module or not. If it does, syncdb runs the app's migrations.
If not, it does its usual automated thing. This seems to work quite well
at the moment and has let me get rid of about a third of the migration
code.
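The check itself is tiny; the sort of probe involved looks like this (a
sketch, not the actual patch):

def app_has_migrations(app_name):
    # If '<app>.migrations' imports cleanly, hand the app to the
    # migrator; otherwise let syncdb do its normal table creation.
    try:
        __import__(app_name + '.migrations', {}, {}, [''])
        return True
    except ImportError:
        return False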
Mike
Release 0.01 of the migration module can be downloaded from
http://www.aswmc.com/dbmigration/
as well as a patch to integrate it into the syncdb command.
It seems to be working well for the developers here, so here's hoping
it's useful for some other people too :)
MikeH
It's just a shame that it doesn't use Python models or some type of
database-agnostic representation. Any plans there?
Great work nonetheless!
--
Ollie
--- excerpt from docs ---
Using Python migrations
Sometimes it might be easier to do your migration using Python and the
Django models. To define a Python migration, simply define a function
with the same name as the migration in migrator.py.
The Python functionality is there so that you can manipulate the models
directly or run more complex dynamic SQL if you need to.
Let's say that your app is called "myapp" and you want to do something
special to all User records with an AOL email address, but that
something special is easier to do in Python than it is in SQL. Your app
structure should look like this:
myapp/
    migrations/
        __init__.py
        migrator.py
The content of migrator.py could be:

migration_list = ['migration1']

def migration1():
    # python code that you need to run to complete the migration goes here
    # e.g.
    from django.contrib.auth.models import User
    users = User.objects.filter(email__endswith='@aol.com')
    for user in users:
        # do something to the user here, then save
        user.save()
Hope that helps.
Mike
The reason for this is that syncdb will create the schema based on the
latest version of your model. But when you're rolling out to a live
server you need to be able to run the migrations to bring the schema up
to date, which means you need to know exactly what stage your schema is
at; that's why all the SQL has to go through migrations.
If you create a migration that sets up your table structure, do the
tests run OK?
Mike
On 4/20/2007, "David Marko" <dma...@tiscali.cz> wrote:
The docs have been updated with the points you raised.
Cheers!
Mike
On 4/20/2007, "Mike H" <mi...@mugwuffin.com> wrote:
David
This is interesting: migrating Django apps using Django. Are there
cases where Django or the DB could become so broken that the migrator
couldn't run? I wasn't sure, so I took the coward's way.
I'm attaching my db_upgrade.py for comparison. (I've attached it
on-list a time or two before, but there are many migration threads.
:)) It uses numbers instead of names for migrations, and has a custom
db_version table (no Django model for it). It supports only Python
upgrade steps, but those steps are handed a connection to work with if
they want to use SQL directly. Finally, it runs against Ye Olde
Django (0.91) and probably needs some tweaks to run against 0.96.
If I'm reading your code right, the migrations could be applied out of
order, depending on which ones you have already run. Since schema is
stateful, I think strict order is better.
Consider developers Mike and Jeremy, starting with the same DB. Mike
creates a migration, "HalfPrice" and runs it locally. Jeremy creates
a migration, "FullPrice" and runs it locally. Each then update their
checkouts and run syncdb.
The migrator conflict is update like this, say:
migrations = [
    "HalfPrice",
    "FullPrice",
]
The changes of FullPrice will be run for Mike (since that's not in the
applied list in his DB), and the changes of HalfPrice will be run for
Jeremy (for the same reason).
Now, this probably isn't an issue outside of development, but it was
this situation that led me to use migration numbers rather than names
and have my migration script halt if the DB rev was at or higher than
the highest migration step.
Cheers,
Jeremy
I see your point, and yes, migrations can be run out of sequence,
although in practice I've found that it is rarely an issue, and when it
is, it can be resolved fairly simply. Let's take a couple of examples.
I'm not trying to be condescending, just thinking it through out loud ;)
1) Mike wants to add a "favourite_colour" attribute to a Profile object,
and Jeremy wants to add "shoe_size". There are two migrations that CAN
be applied out of order - they do not depend on each other at all. When
Mike updates, he will get the code and migration for "shoe_size". When
Jeremy updates, he will get the code and migration for
"favourite_colour". No problems there.
2) Jeremy wants to abstract some information (let's say shoe_size) from
the Profile object into a separate 'Clothing' class and add a reference
in Profile to the new class. At the same time, Mike alters the shoe_size
attribute to be a different type. Now we get a situation where the
migrations depend on each other but are not available to both developers.
So, if we used a migration number instead of just the name, the commit
would flag the conflict as the two migration files would conflict.
Great! We've seen the problem immediately!
If we just used the name, our pre-commit update and test would flag the
problem.
Let's say that Jeremy has committed his code first.
Before he commits, Mike updates and runs his tests. As part of the test run,
the database is dropped and rebuilt - using the newly merged migrations.
Depending on how SVN merged the migration list, one of them is going to
fail. Mike can see the problem and go and fix it.
3) Five developers are working on a project. Each of them produce a
migration. None of the migrations depend on each other.
If we just use names, none of the migrations will conflict and none of
them depend on each other so all will run without problems.
If we just use numbers, 4 developers will see a conflict after the first
developer commits, even though their code is not broken. They've just
taken the next migration number without being aware that other people
needed to commit migrations too. So they increment their migration
number and commit. Now the first one to commit will cause a conflict for
the rest, and so on until they all contact each other and agree what
their migration numbers will be.
On the other hand, a number makes it much more efficient to check
whether a schema has had all migrations run on it in the right order,
while a descriptive name makes it much easier to track down what each
migration does and find an offending one when there is a problem with
the code that depends on it.
Perhaps a combination of both would be good?
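For example, numbered names (just an illustration, not something the
released module does):

# Ordering comes from the numeric prefix; the name still documents
# intent, and two developers grabbing the same number collide visibly
# at commit time.
migration_list = [
    '001_InstallInitialStructure',
    '002_HalfPrice',
    '003_FullPrice',
]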
By the way, my dbmigration should have no problems with older versions
of Django; I just haven't tested it with them. The patch for
management.py would need changing though.
Cheers,
Mike
> ------------------------------------------------------------------------
>
> #!/usr/bin/python
> import sys, os
> import psycopg
> from django.core.db import db
> from django.conf.settings import DATABASE_USER, DATABASE_NAME, DATABASE_HOST, DATABASE_PORT, DATABASE_PASSWORD, DEBUG, TIME_ZONE, UPGRADE_STEPS_PATH
>
> def usage():
>     sys.stderr.write("%s [--nobackup] \n Expects DJANGO_SETTINGS_MODULE to be set.\n" % sys.argv[0])
>
> if len(sys.argv) == 1:
>     do_backup = True
> else:
>     if sys.argv[1] == "--nobackup":
>         do_backup = False
>     else:
>         usage()
>         sys.exit()
>
>
> try:
>     os.environ['DJANGO_SETTINGS_MODULE']
> except KeyError:
>     usage()
>     sys.exit()
>
> def get_conn_string():
>     if DATABASE_NAME == '':
>         from django.core.exceptions import ImproperlyConfigured
>         raise ImproperlyConfigured, "You need to specify DATABASE_NAME in your Django settings file."
>     conn_string = "dbname=%s" % DATABASE_NAME
>     if DATABASE_USER:
>         conn_string = "user=%s %s" % (DATABASE_USER, conn_string)
>     if DATABASE_PASSWORD:
>         conn_string += " password='%s'" % DATABASE_PASSWORD
>     if DATABASE_HOST:
>         conn_string += " host=%s" % DATABASE_HOST
>     if DATABASE_PORT:
>         conn_string += " port=%s" % DATABASE_PORT
>     return conn_string
>
> def get_db():
>     log("Opening DB for upgrade")
>     cur = db.cursor()
>     conn = db.connection
>     return conn, cur
>
> def get_current_version(cur):
>     """
>     Assumes:
>         create table db_version (branch varchar(50), version integer);
>         insert into db_version values ('trunk', 1);
>     """
>     try:
>         cur.execute('select branch, version from db_version')
>     except psycopg.ProgrammingError, e:
>         logerr("Couldn't find db_version \n====%s\n====:\n perhaps you forgot to restore a DB dump or should create the table with ('trunk', 1)?" % e)
>     return cur.fetchone()
>
> def log(s):
>     sys.stderr.write('info: %s\n' % s)
>
> def logerr(s):
>     sys.stdout.write('error: %s\n' % s)
>     sys.stdout.write('error: You probably want to restore the DB!\nQuitting.\n')
>     sys.exit()
>
> def backup_db(init, final):
>     import subprocess
>     from time import gmtime, strftime
>     dumpfile = "db_preupgrade_%s_from_%d_to_%d_at_%s_dump" % \
>         (DATABASE_NAME, init, final, strftime("%Y%m%dT%H%M%S", gmtime()))
>     log("If you're asked for the DB password, it's: %s" % DATABASE_PASSWORD)
>     #FIXME: use subprocess.Popen and tie pg_dump output to gzip input.
>     ret = subprocess.call(["pg_dump", "-f", dumpfile, "-U", DATABASE_USER, "-h", DATABASE_HOST, DATABASE_NAME])
>     if ret != 0:
>         raise RuntimeError, "Failed to create pg_dump %d" % ret
>     return dumpfile
>
> if __name__ == '__main__':
>     sys.path.insert(0, UPGRADE_STEPS_PATH)
>     import upgrade_steps
>
>     conn, cur = get_db()
>
>     label, initial_version = get_current_version(cur)
>
>     log("Starting at version %d of db branch %s." % (initial_version, label))
>
>     steps = []
>
>     version = initial_version
>     while True:
>         try:
>             version += 1
>             func_name = '%s%d' % (label, version)
>             steps.append(getattr(upgrade_steps, func_name))
>             if not callable(steps[-1]):
>                 logerr("%s is not callable, quitting." % func_name)
>                 sys.exit()
>         except SystemExit:
>             raise
>         except AttributeError:
>             # No more upgrade steps defined.
>             break
>     final_version = version - 1
>
>     if initial_version == final_version:
>         log("No DB upgrade to do. Already at %s version %d." % (label, initial_version))
>         log("Quitting.")
>         sys.exit()
>
>     log("Upgrading from %d to %d on %s" % (initial_version, final_version, label))
>
>     if do_backup:
>         try:
>             log("Backing up existing db.")
>             dumpfile = backup_db(initial_version, final_version)
>         except RuntimeError, e:
>             logerr(e)
>         log("Backed up to %s" % dumpfile)
>     else:
>         log("skipping backup")
>
>     version = initial_version
>     for step in steps:
>         try:
>             version += 1
>             # FIXME: a successful Django ORM .save() causes a commit in the transaction
>             # thereby committing something we really don't want to
>             step(conn, cur)
>             cur.execute("update db_version set version = %s", [version])
>             if DEBUG: # commit changes from each step to aid in debugging.
>                 conn.commit()
>             log("Done with version %s" % version)
>         except psycopg.ProgrammingError, e:
>             logerr("SQL failed in db upgrade to version %d: %s" % (version, e))
>         except Exception, e:
>             logerr("Unknown error updating on to version %d: %s" % (version, e))
>     conn.commit() # final commit with all steps completed.
>
>     log("Successfully completed upgrade from %d to %d" % (initial_version, final_version))
>
Our scripts are called "alter scripts" instead of migrators. We also
added version information in the script names to allow the main
migrator script, which we call 'alterdb', to load the scripts in a
configurable order. The driver loads the alter scripts, which
implement a simple two-method interface, test and apply, allowing the
driver to call apply only if the test() method returns True. The
alter script can then check the schema in the test() method to see
whether, for example, a particular column is there or not, and then,
if needed, the apply method can create the column and possibly move
data from one place to another.
Of course, as time went by, we added more utility methods to the
"alter scripts" base classes such as methods to create columns, test
presence of a table, create indexes, etc. We can mix SQL and python
in this context as needed.
This method has worked very nicely for us through the years.
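In outline, the interface looks something like this (the names and SQL
here are made up for illustration; our real base classes carry more
utilities):

class AddShoeSizeColumn(object):
    def test(self, cursor):
        # True only while the column is still missing.
        cursor.execute(
            "SELECT 1 FROM information_schema.columns "
            "WHERE table_name = 'profile' AND column_name = 'shoe_size'")
        return cursor.fetchone() is None

    def apply(self, cursor):
        cursor.execute("ALTER TABLE profile ADD COLUMN shoe_size integer")

def alterdb(scripts, cursor):
    # The driver applies a script only when its test() says it is needed.
    for script in scripts:
        if script.test(cursor):
            script.apply(cursor)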
http://code.google.com/p/django-schemaevolution/
Any feedback is welcome.