http://www.bitbucket.org/DeadWisdom/migratory/
The idea is a database migration system that:
* Is simple.
* Doesn't make you use sql. This is an orm, we shouldn't have to use sql.
* Can be automatic. Predicts the migration script for you so you
don't have to think about what has changed.
* Works well in a version control system, or even distributed
ones. Because damnit.
* During the migration process, *allows you to use the state of
your previous models as if they were still there*. This is key, and is
not done anywhere else, as far as I know.
Currently it's tested on mysql, postgresql_psycopg2, and sqlite3.
Thanks.
In the aftermath of DjangoCon [1], Simon Willison, Andrew Godwin and
I started the django-migrations SIG [2], with the aim of getting
a migrations framework into the Django core that draws from the best
parts of our three respective frameworks (dmigrations, South and
Django Evolution).
[1] http://www.youtube.com/watch?v=VSq8m00p1FM
[2] http://code.google.com/p/django-migrations-sig/
After some initial activity, there hasn't been much progress, mostly
due to other development priorities in our busy lives (personally,
this means aggregations and some other features for Django v1.1).
However, I have high hopes that we can get discussions going to
deliver something in the v1.2/1.3 timeframe.
You clearly have some good ideas (and more importantly, some working
code implementing those ideas) - I would encourage you to get involved
with the SIG.
Some quick notes regarding what I have seen so far:
Firstly, your suggestion that people symlink into django.contrib is
_really_ bad practice. There shouldn't be any need for your code to
reside in django.contrib, and I'd like to discourage this as a
practice before it takes hold as a de facto standard.
If there is some technical reason why the django.contrib namespace is
required, then raise that issue on the developers list and we can see
what we can do to break that dependency. I can't think of anything
that would cause such a dependency, but it's usually the things you
don't consider that turn out to be problems :-)
> The idea is a database migration system that:
> * Doesn't make you use sql. This is an orm, we shouldn't have to use sql.
I would agree that using SQL shouldn't be required, but the consensus
of all the discussions that have taken place around 'the one true
schema evolution framework' is that a raw SQL mode should be possible.
This is for at least two reasons:
1) Satisfying DB Admins that want to manually audit any schema change.
The bigger the company, the more likely a migration strategy will be
required, and the more likely that DB admins will want to check
everything you do.
2) Performance on very large databases. "for obj in
Model.objects.all(): change(obj); obj.save()" is very Pythonic and
very easy to read, but will not perform very well on large databases.
Having an easy way to drop to raw SQL is essential for these cases.
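To make point 2 concrete, here is a minimal sketch using only the standard-library sqlite3 module rather than the Django ORM. The table and helper names (person, upper_row_by_row, upper_set_based) are made up for illustration. It contrasts the row-by-row style with a single set-based statement that does the same work:

```python
import sqlite3

def make_db(names):
    """Create an in-memory database with a small, hypothetical person table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO person (name) VALUES (?)", [(n,) for n in names])
    return conn

def upper_row_by_row(conn):
    """ORM-style loop: read every row, change it in Python, write it back.

    Returns the number of SQL statements issued.
    """
    statements = 1  # the SELECT
    for row_id, name in conn.execute("SELECT id, name FROM person").fetchall():
        conn.execute("UPDATE person SET name = ? WHERE id = ?", (name.upper(), row_id))
        statements += 1  # one UPDATE per row
    return statements

def upper_set_based(conn):
    """Raw-SQL style: the database does the whole change in one statement."""
    conn.execute("UPDATE person SET name = upper(name)")
    return 1

def names(conn):
    return [row[0] for row in conn.execute("SELECT name FROM person ORDER BY id")]

loop_db = make_db(["ada", "grace", "linus"])
bulk_db = make_db(["ada", "grace", "linus"])
loop_statements = upper_row_by_row(loop_db)
bulk_statements = upper_set_based(bulk_db)
```

Both databases end up with the same data, but the loop issues one statement per row plus the initial SELECT, so its cost grows with the size of the table.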
> * During the migration process, *allows you to use the state of
> your previous models as if they were still there*. This is key, and is
> not done anywhere else, as far as I know.
This is very cool, and I like the way that this aspect is implemented
(the stored snapshot files). I had considered solving this problem
using some metamodel magic, but your approach is a lot more elegant.
> Currently it's tested on mysql, postgresql_psycopg2, and sqlite3.
Tested in what sense?
- I've run it on my test project and it works?
- I've got a test suite and I've run it for each backend?
- I've got comprehensive test cases which pass for all backends?
I have two reasons for asking:
1) Your test suite seems very small - especially considering the
multitude of edge cases that need to be considered. The breadth of
testing is one way to get an indication of how robust the project is.
2) Ticket #7835 describes a feature that should hopefully make testing
schema migration easier - I'd be interested in any feedback you might
have on this ticket, and what features you would need to make testing
Migratory easier.
Best of luck with Migratory - this isn't a trivial task, and what you
have shown so far is quite impressive.
Yours,
Russ Magee %-)
Brantley Harris wrote:
> The idea is a database migration system that:
> * Is simple.
> * Doesn't make you use sql. This is an orm, we shouldn't have to use sql.
>
As Russ said, there's a need for SQL, and his points are pretty valid
here. You don't want SQL if you're planning to be database-independent,
but if you're using the library for migrations on only one database,
people might want to fiddle with e.g. triggers, full text indexes, and
the like. While I've been slowly wrapping and abstracting more and more
of these in South, there are some features that exist in only one
database and aren't really worth abstracting.
> * Can be automatic. Predicts the migration script for you so you
> don't have to think about what has changed.
> * Works well in a version control system, or even distributed
> ones. Because damnit.
> * During the migration process, *allows you to use the state of
> your previous models as if they were still there*. This is key, and is
> not done anywhere else, as far as I know.
>
I like the whole snapshotting thing, which as far as I can tell is the
thing that really enables a lot of this ORM stuff; in South, we've had
no end of issues with things like signals going out too early (do you
deal with that yet/at all?) and people trying to use the ORM inside a
migration.
Personally, I needed a migrations system that was separate from the ORM,
since that's the way the project South was initially created for was going.
From a quick look through the source, the snapshotting only preserves
the fields and not any model methods, but that's really the best you can
hope for; otherwise there are wonderful problems with missing imports
(initially, I thought your snapshots might be just a copy of models.py,
but nope).
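As a rough illustration of "fields but not methods", here is a small sketch in plain Python. It is not Migratory's actual snapshot format; the Field class and the take_snapshot and restore_model helpers are hypothetical stand-ins for the real ORM machinery:

```python
class Field:
    """Hypothetical stand-in for an ORM field: a type name plus options."""
    def __init__(self, kind, **options):
        self.kind = kind
        self.options = options

def take_snapshot(model):
    """Record only the field definitions of `model` as plain data."""
    return {
        name: {"kind": value.kind, "options": dict(value.options)}
        for name, value in vars(model).items()
        if isinstance(value, Field)
    }

def restore_model(name, snapshot):
    """Rebuild a bare class from a snapshot; methods are not restored."""
    attrs = {
        field: Field(spec["kind"], **spec["options"])
        for field, spec in snapshot.items()
    }
    return type(name, (), attrs)

class Person:
    name = Field("CharField", max_length=50)
    age = Field("IntegerField", null=True)

    def greeting(self):
        return "Hello, %s" % self.name

snapshot = take_snapshot(Person)
OldPerson = restore_model("Person", snapshot)
```

Because the snapshot is plain data, restoring it never needs the imports that the original models.py relied on. The price is that OldPerson has no greeting method.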
Also, how well does it do with custom fields? Especially fields that
used to be defined, but are now renamed or deleted? (An edge case, I
know). (South does reasonably poorly in this area automatically, but you
can change the field type used to match).
I think you're somewhat justified in starting a new project here; as
hopeful as I am to see the dizzying array of migration libraries reduced
to something sane, it's different enough to South (because you use
syncdb), django-evolution (because migrations are still all explicit),
and dmigrations (no raw SQL).
Andy and I were at one point discussing a very similar idea of storing
the models with the migrations in South (we were going to just shove
them on the bottom of the migration's .py file), but we never followed
it through; I'm beginning to think it might be worth it, since many
people know and sometimes even love the good old ORM.
Also, Russ, #7835 would be very welcome, and reduce the need for my
(somewhat hackish) unit test suite that does indeed munge app_cache as
well as sys.path.
Andrew
Ah, I wasn't sure that this was active, cool.
>
> You clearly have some good ideas (and more importantly, some working
> code implementing those ideas) - I would encourage you to get involved
> with the SIG.
>
> Some quick notes regarding what I have seen so far:
>
> Firstly, your suggestion that people symlink into django.contrib is
> _really_ bad practice. There shouldn't be any need for your code to
> reside in django.contrib, and I'd like to discourage this as a
> practice before it takes hold as a de facto standard.
>
> If there is some technical reason why the django.contrib namespace is
> required, then raise that issue on the developers list and we can see
> what we can do to break that dependency. I can't think of anything
> that would cause such a dependency, but it's usually the things you
> don't consider that turn out to be problems :-)
Ah yes, this is definitely a problem. See, I had to be able to import
based on a string (database backend), and I was having problems doing
so without an absolute import. I defaulted to this, and didn't think
much of it. I'll get on this.
>
>> The idea is a database migration system that:
>> * Doesn't make you use sql. This is an orm, we shouldn't have to use sql.
>
> I would agree that using SQL shouldn't be required, but the consensus
> of all the discussions that have taken place around 'the one true
> schema evolution framework' is that a raw SQL mode should be possible.
> This is for at least two reasons:
>
> 1) Satisfying DB Admins that want to manually audit any schema change.
> The bigger the company, the more likely a migration strategy will be
> required, and the more likely that DB admins will want to check
> everything you do.
>
> 2) Performance on very large databases. "for obj in
> Model.objects.all(): change(obj); obj.save()" is very Pythonic and
> very easy to read, but will not perform very well on large databases.
> Having an easy way to drop to raw SQL is essential for these cases.
>
Yes, my solution to that, although undocumented I realize, is to allow
migrations to also be completely sql. So they would live right next
to regular migrations, but have a .sql on them. I was also going to
make an option to ./manage.py migrate that is --sql, so it would
build the migration for you just the same, but as sql.
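A sketch of how .sql migrations could sit next to Python ones and still be picked up in order. The file names and the list_migrations helper are assumptions for illustration, not Migratory's real layout:

```python
from pathlib import Path
import tempfile

def list_migrations(directory):
    """Return (filename, kind) pairs in filename order.

    kind is "python" for .py files and "sql" for .sql files; anything
    else, and names starting with an underscore, are ignored.
    """
    kinds = {".py": "python", ".sql": "sql"}
    found = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix in kinds and not path.name.startswith("_"):
            found.append((path.name, kinds[path.suffix]))
    return found

# Build a throwaway directory with a mix of hypothetical migration files.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("0001_initial.py", "0002_add_age.sql",
                     "0003_rename.py", "__init__.py", "notes.txt"):
        Path(tmp, filename).write_text("")
    plan = list_migrations(tmp)
```

A runner could then walk the plan, executing Python entries through the ORM and SQL entries as raw statements.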
>> * During the migration process, *allows you to use the state of
>> your previous models as if they were still there*. This is key, and is
>> not done anywhere else, as far as I know.
>
> This is very cool, and I like the way that this aspect is implemented
> (the stored snapshot files). I had considered solving this problem
> using some metamodel magic, but your approach is a lot more elegant.
>
>> Currently it's tested on mysql, postgresql_psycopg2, and sqlite3.
>
> Tested in what sense?
> - I've run it on my test project and it works?
> - I've got a test suite and I've run it for each backend?
> - I've got comprehensive test cases which pass for all backends?
>
> I have two reasons for asking:
>
> 1) Your test suite seems very small - especially considering the
> multitude of edge cases that need to be considered. The breadth of
> testing is one way to get an indication of how robust the project is.
>
Tested as in I've got a test suite and I've run it for each backend,
BUT, my test suite is not very large yet, and is missing a lot of edge
cases, which is what I'm primarily working on right now.
> 2) Ticket #7835 describes a feature that should hopefully make testing
> schema migration easier - I'd be interested in any feedback you might
> have on this ticket, and what features you would need to make testing
> Migratory easier.
I'll get on that.
>
> Best of luck with Migratory - this isn't a trivial task, and what you
> have shown so far is quite impressive.
>
Thanks Russ!
It doesn't sound like this should be as big a problem as you describe.
Also, keep in mind that database backends are allowed to live outside
django.db.backends - this is how we support external development of
backends. The backend import code from Evolution and from Django
itself should give you a few pointers on how to do this in a location
independent fashion.
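For illustration, a minimal sketch of a location-independent import by name. It is written against the standard-library importlib rather than copied from Django or Evolution, and load_backend and its arguments are hypothetical. It uses stdlib packages as stand-ins for backends so that it runs anywhere:

```python
from importlib import import_module

def load_backend(name, builtin_package):
    """Import a backend by name without hard-coding where it lives.

    Try `name` as a short name inside `builtin_package` first; if that
    fails, treat `name` as a full dotted path to an external module.
    """
    try:
        return import_module("%s.%s" % (builtin_package, name))
    except ImportError:
        return import_module(name)

# "xml" plays the role of the built-in backends package here.
builtin = load_backend("dom", "xml")             # resolved as xml.dom
external = load_backend("email.utils", "xml")    # resolved as email.utils
```

One caveat with this approach: catching ImportError also hides import failures raised inside a built-in backend itself, so a real implementation would probably want to report both errors.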
>> 2) Performance on very large databases. "for obj in
>> Model.objects.all(): change(obj); obj.save()" is very Pythonic and
>> very easy to read, but will not perform very well on large databases.
>> Having an easy way to drop to raw SQL is essential for these cases.
>
> Yes, my solution to that, although undocumented I realize, is to allow
> migrations to also be completely sql. So they would live right next
> to regular migrations, but have a .sql on them. I was also going to
> make an option to ./manage.py migrate that is --sql, so it would
> build the migration for you just the same, but as sql.
The 'put the sql in a file' option is the same thing that Evolution
does. Better still would be to be able to 'compile' a python migration
into SQL - this would keep all the auditing DB admins happy - but I
acknowledge that this is a pretty hard ask.
Yours,
Russ Magee %-)
It's not a big problem, I'll figure it out. I took the easy way out,
and will be punished :)
>>> 2) Performance on very large databases. "for obj in
>>> Model.objects.all(): change(obj); obj.save()" is very Pythonic and
>>> very easy to read, but will not perform very well on large databases.
>>> Having an easy way to drop to raw SQL is essential for these cases.
>>
>> Yes, my solution to that, although undocumented I realize, is to allow
>> migrations to also be completely sql. So they would live right next
>> to regular migrations, but have a .sql on them. I was also going to
>> make an option to ./manage.py migrate that is --sql, so it would
>> build the migration for you just the same, but as sql.
>
> The 'put the sql in a file' option is the same thing that Evolution
> does. Better still would be to be able to 'compile' a python migration
> into SQL - this would keep all the auditing DB admins happy - but I
> acknowledge that this is a pretty hard ask.
Not so hard, there is actually an undocumented command "python
manage.py migratesql" that will pop out the sql that the next
migration would do. It's not documented so far because if you use it
on an edited migration that has custom api commands, they will still
run when popping out the sql, which is a problem. Theoretically if I
could put it into an uncommitted transaction, everything would work
out, but I'm not sure that works for every backend, so I have to
figure that one out.
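A minimal sketch of the uncommitted-transaction idea, using the standard-library sqlite3 module only. The preview_sql helper and the add_nickname migration are hypothetical, and this is not how migratesql is implemented. It relies on SQLite allowing schema changes to be rolled back; a backend that implicitly commits on DDL (MySQL does) would not be protected by this approach.

```python
import sqlite3

def preview_sql(conn, migration):
    """Run `migration(conn)` in a transaction, record its SQL, then roll back."""
    statements = []
    conn.execute("BEGIN")
    conn.set_trace_callback(statements.append)  # called once per executed statement
    try:
        migration(conn)
    finally:
        conn.set_trace_callback(None)
        conn.execute("ROLLBACK")  # discard everything the migration did
    return statements

def add_nickname(conn):
    # Hypothetical migration: a schema change plus a custom data step.
    conn.execute("ALTER TABLE person ADD COLUMN nickname TEXT")
    conn.execute("UPDATE person SET nickname = name")

# isolation_level=None leaves transaction control to the explicit BEGIN/ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person (name) VALUES ('ada')")

sql = preview_sql(conn, add_nickname)
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
```

The custom step still executes while the SQL is being recorded, but its effects are thrown away with the rest of the transaction, so the table keeps its original columns afterwards.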