Migration process hack

percious

Dec 5, 2008, 4:45:17 PM
to migrate-users
So, I've done migration processes for two medium-large database
schemas (50-100 tables), and I have found what I believe to be a
disconnect between the process of migrating a database and the
process of developing a database application.

The Problem
-------------------
Here is the problem in general. I am sorry it is so long-winded, but
it's hard to see what is going on without the full story. Maybe
I should write a blog...

You create a schema, and use that schema to create a database. Maybe
you are using Pylons, and using setup-app to create the schema.
Everyone is happy in schema land, and then someone realizes you need
to make a change, and maybe even more changes in the future. You
decide to use migrate, because hell, migrate will make things easier.

So, you write the migration, modify your model code, and everything is
hunky-dory. Clients are happy, your developer feels like he has a
maintainable codebase. Your project begins to grow. Awesome.

Not Awesome. Your second developer needs a development environment.
You give them your model, they setup-app it, and they are ready to go.
While they are plugging along, you realize you need a new migration, so
you create it and upgrade your development version of the database,
which is now at migration 2. The production upgrade also goes off
without a hitch; it's now on 2.

What about your new development buddy? When he put his database under
version control, migrate set it to version 0. So, migrate will try to
do 0->1 and then 1->2, but 0->1 will fail because his schema is already
at version 1 even though his version table says it's 0. Now he is in
pain, and re-creates his entire database to get moving, and the whole
cycle starts over again, although he can work... until you do the next
migration.
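
To make the mismatch concrete, here is roughly the sequence, sketched
against migrate's Python API (the database URL and repository path are
made up):

from migrate.versioning.api import version_control, upgrade

# the new guy's schema was built from the current model (already at 1),
# but version_control stamps his fresh migrate_version table with 0
version_control('postgres://dev2/appdb', 'repository/')

# upgrade then replays the 0->1 changeset against tables that already
# contain those changes, and fails halfway through
upgrade('postgres://dev2/appdb', 'repository/')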

This is not ideal, and I have been struggling with it at one of my
clients. In that situation, I am the developer buddy, bopping in from
time to time to lend a hand. I spend probably 5% of my time just
getting up and running, whereas if their migration system were somehow
linked to their database model, I could stay synchronized and actually
use migrate as a development tool, rather than a dusty old production
tool.

Solution 1
---------------
So, I came up with my own solution, and have talked to a close friend
about an alternative solution. I'd like to discuss both of them as
they pertain to migrate, in the hopes that we can improve migrate, and
re-connect the development environment with the production one.

My solution is to connect the base model code to the migration
version. The way I accomplish this is to add a version.py file at the
root of my model code, which holds, incredibly, the version of the
migration changeset that matches the model definition. I also have a
custom database creation script which creates a migrate_version table
and fills it with a record of the correct version number. Now I can
svn up and migrate up, and everyone is happy.
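
A minimal sketch of the two pieces (the file layout, variable name, and
database URL are just examples, and the metadata is assumed to be bound
to an engine):

# model/version.py -- maintained by hand, one variable per schema
SCHEMA_VERSION = 2

# createdb.py -- the custom creation script
from migrate.versioning.api import version_control
from model import metadata
from model.version import SCHEMA_VERSION

# build the schema straight from the current model...
metadata.create_all()
# ...then stamp migrate_version with the matching changeset number
version_control('postgres://localhost/appdb', 'repository/', SCHEMA_VERSION)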

The problem with this method is that I have to maintain a damn file
with the version number, and not go insane doing it. Since my model
contains 3 different database schemas, that means I have to maintain
3 variables in my version.py. Personally, I don't mind this, but I
had to write more documentation than the amount in this post to make
certain that my successors (or me in 6 months) can figure out what
the heck is going on.

Solution 2
--------------
The second solution, which Mark came up with, is to preserve the
initial model. Then, you do all of the migrations to bring you up to
the current codebase. This does not require any version file
tracking, and if your migrations add boilerplate entries, you don't
have to record them in a second place.
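
For illustration, solution 2 boils down to something like this
(assuming the initial model is preserved as model_v0 and the changesets
live in repository/; both names are made up):

from migrate.versioning.api import version_control, upgrade
from model_v0 import metadata  # the preserved version-0 model, bound to an engine

# build schema 0 from the preserved model
metadata.create_all()
# stamp it as version 0, then replay every changeset up to the latest
version_control('postgres://localhost/appdb', 'repository/')
upgrade('postgres://localhost/appdb', 'repository/')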

I think solution 2 is reasonable, but could be more time-consuming to
execute (which is probably not really a problem with everyone's 2GHz
machines these days). The problem I see is that the boilerplate
entries you make for production may not be the same as the ones you
make for development, and that makes testing a bit harder if they
differ.

I also feel that solution 2 is more prone to problems, simply because
there are a lot more cranks to turn to get a development database.

How can migrate help
-----------------------------
The way to provide solution 1 in migrate would be to allow the
developer to connect his migrations to his model code, and to give
migrate some level of control over the "version" module in the model
code. So, when you do a migrate commit, you are also saying that the
model code supports this version of the migration. It will probably
only take me a few hours to make this happen. We would then provide
an easy way to create the migrate_version table by importing
something, so that you can have a custom createdb script that also
creates the migrate_version table. Perhaps we could even provide a
reasonable createdb template.
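
As a rough sketch of the commit-side half (the helper name and file
path are hypothetical; this is not something migrate does today):

from migrate.versioning.api import version

def sync_model_version(repository, version_py='model/version.py'):
    # rewrite the model's version file so it always matches the
    # repository's latest changeset
    latest = version(repository)
    f = open(version_py, 'w')
    f.write('# generated by migrate -- do not edit by hand\n')
    f.write('SCHEMA_VERSION = %s\n' % latest)
    f.close()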

Solution 2 only requires us to provide a custom creation script. For
this we will have to preserve the initial schema (schema 0) and then
upgrade. We could probably also include a template instead of a
script so that you can manage boilerplate on your own.

My own conclusions
----------------------------
I lean more towards solution 1, mostly because I feel that it provides
for easier boilerplating: you have access to your whole model at once,
instead of bits and pieces of your model as it moves from one version
to the next. I also think the boilerplate for dev is likely to be a
lot different from that of prod for most folks, and I feel like
solution 1 offers an easier way of handling that. It also seems like
solution 1 is much more efficient, because you don't have to go and
re-do all of that migration work; you have already arrived at the
correct-for-now schema.

If you made it this far, congratulations! Now, go grab a cup of
coffee, stretch your arms, come back, and tell me what you think.

Christophe de VIENNE

Dec 5, 2008, 6:24:33 PM
to migrat...@googlegroups.com
Hi Chris,

A quick answer before going to bed...

Our solution to this problem is a lot like solution 1, and came about
quite naturally.
In our project, the migrate repository is a submodule sitting next to
the model module. The command-line tool responsible for database
creation first creates the tables using the model, then loads the
repository to check what the latest version is, and initializes the
migrate_version table with it.

Here is the responsible function:

> def initdb(bind=None, drop_first=False):
>     if bind is None:
>         bind = metadata.bind
>     if drop_first:
>         drop_tables(bind=bind)
>     # create the tables from the model, in dependency order
>     for t in tables.tbcreateorder:
>         t.create(bind=bind)
>     from migrate.versioning.api import version_control, version
>     import sf.core.migrepo
>     # look up the repository's latest version...
>     v = version(sf.core.migrepo.__path__[0])
>     log.info("Setting current version to '%s'" % v)
>     # ...and stamp the freshly created database with it
>     version_control(bind.url, sf.core.migrepo.__path__[0], v)

The only problem I have is that it is not (yet) zip-safe and forces me
to unzip the egg.
The big advantage is that you don't have any damn file to maintain...
and we never had the synchronization problem you describe.

I guess most people do more or less the same. It could be added as a
recipe in the documentation.
The best solution might be for Pylons' setup-app to become
migrate-aware - it is already SA-aware.

My two cents,

Christophe

2008/12/5 percious <ch...@percious.com>:

Diez B. Roggisch

Dec 6, 2008, 4:37:57 AM
to migrat...@googlegroups.com
percious schrieb:

> So, I've done migration processes for two medium-large database
> schemas (50-100 tables), and I have found what I believe to be a
> disconnect between the process of migrating a database and the
> process of developing a database application.

<snip/>

Ok, stretching, putting away the beer, starting to write...

Please note that due to lack of schema support & the fact that our own
model is only partially SQLAlchemy-based so far, my personal experiences
with migrate are limited. Yet I do have some.

If I see things correctly, all you need to combine is the existing,
up-to-date model in your codebase with the version information known
to migrate.

So how about migrate growing a simple schema-tool that (re)creates a DB
from scratch from a passed-in model (e.g. as a string), and then stores
its own "latest version" inside the version table?

Then you don't have to duplicate information that is likely to diverge
otherwise.

Or am I missing something crucial here?

Diez

percious

Dec 8, 2008, 10:04:40 AM
to migrate-users
I really like this solution, and I will probably implement it.

Thanks for your insight.

-chris

percious

Dec 8, 2008, 10:39:16 AM
to migrate-users
Christophe,

I was thinking about the implementation of this a little more in
depth. The only issue I see is that I like to recreate my database
at the previous version from scratch while testing my migration
scripts. Since your solution looks at the current migrate version,
this would not work for me, since I have an older version that I am
trying to upgrade from. The simple solution to this problem would be
to add a --version parameter to your initdb script, which would allow
you to override the migrate version. What do you think?
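
Concretely, something like this (a hypothetical variation on your
function, reusing your project-specific names):

def initdb(bind=None, drop_first=False, version=None):
    if bind is None:
        bind = metadata.bind
    if drop_first:
        drop_tables(bind=bind)
    for t in tables.tbcreateorder:
        t.create(bind=bind)
    from migrate.versioning.api import version_control
    from migrate.versioning.api import version as repo_version
    import sf.core.migrepo
    repo = sf.core.migrepo.__path__[0]
    # default to the repository's latest version, but let the caller
    # pin an older one when testing upgrade scripts
    if version is None:
        version = repo_version(repo)
    log.info("Setting current version to '%s'" % version)
    version_control(bind.url, repo, version)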

cheers.
-chris

On Dec 5, 4:24 pm, "Christophe de VIENNE" <cdevie...@gmail.com> wrote:

Christophe de VIENNE

Dec 8, 2008, 10:42:22 AM
to migrat...@googlegroups.com
I think you are in this case because the database was initially
created with the wrong version.
This means you would be creating a special case for databases created
without using your function... which should not exist any more once
you integrate it into your product (and habits).
So I would say I would not do that :-)

2008/12/8 percious <ch...@percious.com>:

Christophe de VIENNE

Dec 8, 2008, 10:42:46 AM
to migrat...@googlegroups.com
Glad I can help.

This made me think about all of this, and I had an idea that could
make all this even more natural.

Since the migrate repository is itself a module, we could have it
expose a simple API which would be the migrate.versioning.api with
the default repo being itself.

With such an API one would write, to create a database:

metadata.create_all()
myrepo.version_control(metadata.bind)

A good thing would be for at least version_control (and the versioning
API) to take a module instead of a path.
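
A sketch of what the repository package's __init__.py could grow,
wrapping the existing api with itself as the default repo (hypothetical
code, not something migrate provides):

# myrepo/__init__.py
import os
from migrate.versioning import api as _api

_repo_path = os.path.dirname(__file__)

def version_control(bind, version=None):
    # stamp the database behind `bind`, defaulting to this
    # repository's latest version
    if version is None:
        version = _api.version(_repo_path)
    return _api.version_control(bind.url, _repo_path, version)

def upgrade(bind, version=None):
    return _api.upgrade(bind.url, _repo_path, version)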

Any thoughts ?

Christophe

2008/12/8 percious <ch...@percious.com>:

Florent Aide

Dec 8, 2008, 10:48:05 AM
to migrat...@googlegroups.com
On Mon, Dec 8, 2008 at 4:42 PM, Christophe de VIENNE
<cdev...@gmail.com> wrote:
>
> Glad I can help.
>
> This made me think about all of this, and I had an idea that could
> make all this even more natural.
>
> Since the migrate repository is itself a module, we could have it
> expose a simple API which would be the migrate.versioning.api with
> the default repo being itself.
>
> With such an API one would write, to create a database:
>
> metadata.create_all()
> myrepo.version_control(metadata.bind)
>
> A good thing would be for at least version_control (and the versioning
> API) to take a module instead of a path.
>
> Any thoughts ?

That would be nice!
