As perhaps was inevitable, I'm proposing implementing part of a schema 
migration backend in Django. I'm not sure if this is a 1.3 thing (it may 
well be 1.4, perhaps with some implemented in time for 1.3 but not 
exposed), but it's something I'd like to get started in this release 
cycle. To make it clear, I'm happy to make all these changes - I'm alre
Firstly, let me make it clear that this is not a proposal to merge South 
into core. Quite the opposite, in fact, the idea is to keep the option 
open to have different migration frontends available and in fact make 
the implementation of them much easier.
Secondly, this particular proposal is something that Russ and I have 
preliminarily agreed on here at the sprints - however, I'd really like 
people to suggest changes and things we may have missed. Implementing 
this essentially consists of drawing a line around how much of a 
migrations framework we'll implement in Django, and I'm only mostly sure 
we have it in the right place.
The first part of the proposal is pretty uncontroversial, and it's to 
implement schema-changing operations on the backends. Specifically, the 
proposed new operations are:
  - add_table
  - delete_table
  - rename_table
  - add_column
  - rename_column
  - alter_column
  - delete_column
  - add_primary_key
  - delete_primary_key
  - add_unique
  - delete_unique
  - add_index
  - delete_index
Some of these operations are already mostly implemented (add_table, 
add_index, etc.) in backends' creation modules, but they'll need a bit 
of rearranging and separating into a full public API. I also plan to 
modify them to take model names and field names, instead of table names 
and column names, so the API is exclusively using the Django model layer 
to represent changes (there's a possibility that some changes make sense 
for schemaless databases as well, specifically renames, so it's best not 
to tie it directly to relational databases).
(Additionally, this means that if someone has specified the table name 
or column name directly using something like Meta: db_table then we'll 
need to have those as either extra arguments to the function or as 
marked strings - e.g. mark_raw('auth_users').)
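
To make the "marked strings" idea concrete, here is a minimal sketch. Only the name mark_raw comes from the text above; the RawName class, the table_name_for helper and the model_tables mapping are hypothetical names invented for illustration, not an existing Django API.

```python
class RawName(str):
    """A string flagged as a literal database name (hypothetical helper type)."""


def mark_raw(name):
    # Wrap the name so backend operations know not to look it up as a model.
    return RawName(name)


def table_name_for(name, model_tables):
    """Return the database table for `name`.

    `model_tables` is an assumed mapping of "app.Model" -> table name;
    a real backend would ask the model layer instead.
    """
    if isinstance(name, RawName):
        return str(name)          # already a raw table name, use as-is
    return model_tables[name]     # otherwise resolve via the model layer


raw_result = table_name_for(mark_raw("auth_users"), {})
model_result = table_name_for("auth.User", {"auth.User": "auth_user"})
```
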
I expect this will take a while and be quite fiddly, but we have the 
codebases of django-evolution and South to draw on for the modification 
code, so there's not much new discovery and backend-specific bugfixing 
to be done.
Additionally, unlike in current South, backends will always be able to 
generate SQL for operations, but it won't necessarily be runnable 
(things like index names can only be resolved at runtime, so you'd get 
code like "DROP INDEX <<User.username-index>> ON users;"). We feel this 
is a pretty good tradeoff between being able to actually work with 
things like index names (they're basically nondeterministic) while also 
satisfying people who like to read the SQL that's going to be run.
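
As a rough illustration of how such placeholder SQL could be turned into runnable SQL once the names are known, here is a small sketch. The <<...>> syntax is taken from the example above; the resolve_placeholders function, the lookup dict and the index name "auth_user_username_idx" are assumptions made up for this example.

```python
import re

# Matches placeholders of the form <<User.username-index>>.
PLACEHOLDER = re.compile(r"<<([^<>]+)>>")


def resolve_placeholders(sql, lookup):
    """Replace every <<key>> in `sql` with lookup[key].

    `lookup` stands in for whatever runtime introspection would supply.
    """
    def substitute(match):
        key = match.group(1)
        if key not in lookup:
            raise KeyError("unresolved placeholder: %s" % key)
        return lookup[key]

    return PLACEHOLDER.sub(substitute, sql)


template = "DROP INDEX <<User.username-index>> ON users;"
resolved = resolve_placeholders(
    template, {"User.username-index": "auth_user_username_idx"}
)
```
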
The second part is to implement migration tracking, dependency resolving 
and basic running into Django. There will be a core contract migrations 
have to follow:
  - Migrations are per-application
  - Applications have a directory (usually appname/migrations/, but 
configurable via a setting, as SOUTH_MIGRATION_MODULES does now), which 
contains zero or more .sql and .py files
  - Inside an application, migrations are implicitly ordered by name (by 
string sort, so "0001_initial" is before "0002_second", and "alpha" is 
before "beta", but "11_foo" is not before "2_bar").
  - Migrations are uniquely identified by the combination of their app 
label and their migration name.
  - There will be a table, probably "migration_history" or similar, 
which records which migrations have been applied, and when.
  - Django will ship with a "migrate" command, which will work out what 
migrations to run, and run them. There will be an automatic mode which 
runs dependencies, and a manual mode where you say if you'd like to run 
each migration (it tells you about ones that are missing dependencies, 
but you're not allowed to run them).
As for the migration files themselves, the idea is to provide a very 
basic interface that means that apps (and Django itself, potentially) 
can ship migrations that have no dependencies, but that still allows 
third-party tools like South to exist that will provide ORM access and 
autogeneration.
.py migration files will be normal Python modules, and should have a 
"migrate" callable, which will get called with three arguments:
  - a connection/operations instance, much like 'south.db.db'
  - reverse, a boolean saying if the migration should run backwards 
(supporting reversal will be entirely optional, and some migrations will 
just raise an error)
  - dry_run, which indicates if the migration should just run through 
and check there are no obvious calling errors, which is useful for 
catching errors on MySQL before the SQL gets sent to the database
The files can also optionally have a __depends__ variable in scope, 
which should be an iterable of (app_label, migration_name) or 
(app_label, migration_name, reverse_dependency) tuples - this is used to 
calculate the dependencies for this migration.
The migration name of a .py file is simply the filename with the 
extension removed.
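
To show how the implicit per-app ordering and __depends__ could combine into a single run order, here is a minimal sketch using the standard library's graphlib (Python 3.9+). The plan function and the shape of its input dict are assumptions for illustration; it ignores the optional reverse_dependency element and is not a proposed implementation.

```python
from graphlib import TopologicalSorter


def plan(migrations):
    """Return migrations in an order where dependencies run first.

    `migrations` maps (app_label, migration_name) to an iterable of
    (app_label, migration_name) tuples, as read from __depends__.
    """
    graph = {key: set(deps) for key, deps in migrations.items()}

    # Implicit ordering: inside an app, each migration depends on the
    # one immediately before it in plain string sort order.
    by_app = {}
    for app_label, name in migrations:
        by_app.setdefault(app_label, []).append(name)
    for app_label, names in by_app.items():
        names.sort()
        for earlier, later in zip(names, names[1:]):
            graph[(app_label, later)].add((app_label, earlier))

    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


example = {
    ("auth", "0001_initial"): [],
    ("auth", "0002_remove_username"): [],
    ("profiles", "0001_initial"): [("auth", "0002_remove_username")],
}
order = plan(example)
```
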
.sql migration files will just be loaded and run as-is. Dependencies can 
be declared with comments at the top of the file (such as "-- depends: 
auth 0002_remove_username"). Because SQL is database-dependent, 
filenames can be of two formats: "migration_name.sql" or 
"migration_name.backend_name.sql". Django will attempt to run the 
database-specific one first, and then fall back to the 'generic' one.
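
A small sketch of the two .sql rules above: reading "-- depends:" comments from the top of a file, and preferring the backend-specific filename over the generic one. The function names and the exact comment grammar (one "app_label migration_name" pair per comment line) are assumptions; the proposal does not pin these down.

```python
import re

# One dependency per comment line: "-- depends: <app_label> <migration_name>"
DEPENDS = re.compile(r"^--\s*depends:\s*(\S+)\s+(\S+)\s*$")


def parse_sql_depends(sql_text):
    """Collect (app_label, migration_name) pairs from leading SQL comments."""
    deps = []
    for line in sql_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not stripped.startswith("--"):
            break  # stop at the first real statement
        match = DEPENDS.match(stripped)
        if match:
            deps.append((match.group(1), match.group(2)))
    return deps


def choose_sql_file(migration_name, backend_name, filenames):
    """Prefer migration_name.backend_name.sql, then migration_name.sql."""
    specific = "%s.%s.sql" % (migration_name, backend_name)
    generic = "%s.sql" % migration_name
    if specific in filenames:
        return specific
    if generic in filenames:
        return generic
    return None


sql = "-- depends: auth 0002_remove_username\nALTER TABLE foo ADD bar integer;"
deps = parse_sql_depends(sql)
files = {"0003_add_bar.sql", "0003_add_bar.mysql.sql"}
mysql_choice = choose_sql_file("0003_add_bar", "mysql", files)
other_choice = choose_sql_file("0003_add_bar", "postgresql", files)
```
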
The idea behind all of this is to allow reusable apps to ship with 
migrations using an engine or generator of their choosing, and still 
have them interact correctly with everything else. For example, Django 
might ship a migration like this:
   from django.db import models
   __depends__ = [("auth", "0002_remove_username")]
   def migrate(connection, reverse, dry_run):
       if reverse:
           connection.ops.add_column("auth.User", "username",
               models.CharField(max_length=100))
       else:
           connection.ops.delete_column("auth.User", "username")
And a future South might make migrations like this (I'm not proposing 
this as the future, it needs improvement, but it's an example):
   from south.v3 import SchemaMigration
   __depends__ = [("auth", "0002_remove_username")]
   class migrate(SchemaMigration):
       def forwards(self, db, orm):
           db.delete_column("auth.User", "username")
       def backwards(self, db, orm):
           db.add_column("auth.User", "username",
               self.gf("django.db.models.fields.CharField")(max_length=100))
Here, the SchemaMigration class' constructor is the thing that takes 
(connection, reverse, dry_run), and then delegates to the appropriate 
methods and uses a few wrappers.
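
For anyone wondering how a class can serve as the "migrate" callable, here is a rough sketch of the delegation. Everything in it is hypothetical: this SchemaMigration is a stand-in and not South's actual class, the orm argument is left as None, and RecordingOps/FakeConnection exist only so the example runs without a database.

```python
class SchemaMigration:
    """Stand-in base class: calling the class runs the migration."""

    def __init__(self, connection, reverse, dry_run):
        self.dry_run = dry_run
        db = connection.ops
        orm = None  # a real tool would supply its ORM wrapper here
        if reverse:
            self.backwards(db, orm)
        else:
            self.forwards(db, orm)

    def forwards(self, db, orm):
        raise NotImplementedError

    def backwards(self, db, orm):
        raise NotImplementedError("this migration cannot be reversed")


class RecordingOps:
    """Fake operations object that records calls instead of running SQL."""

    def __init__(self):
        self.calls = []

    def delete_column(self, model, field):
        self.calls.append(("delete_column", model, field))

    def add_column(self, model, field, definition):
        self.calls.append(("add_column", model, field, definition))


class FakeConnection:
    def __init__(self):
        self.ops = RecordingOps()


class migrate(SchemaMigration):
    def forwards(self, db, orm):
        db.delete_column("auth.User", "username")

    def backwards(self, db, orm):
        db.add_column("auth.User", "username", "CharField(max_length=100)")


forward_conn = FakeConnection()
migrate(forward_conn, False, False)   # same call signature as a plain function

reverse_conn = FakeConnection()
migrate(reverse_conn, True, False)
```
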
That's the proposal, then. The grounding idea is to provide a consistent 
framework for migrations to run in, and absorb all the parts that really 
should be done once and done well (backend-specific implementations, 
dependency resolvers, etc.). The backend changes obviously have to go 
into core, while the tracking, dependency resolution and management 
commands should, I propose, go into a "django.contrib.migrations".
There's the issue of MultiDB, as always, but my proposal for that is to 
allow some mechanism to select the migrations directory per database 
alias (be that in a router or a setting), and then have a --database 
option on migrate - there's already going to be a way to provide 
directories that aren't appname/migrations/, so this won't be too much 
of an addition. That allows people who are separating tables to have 
entirely separate migration sets for each database, and people who are 
sharding, etc. to have them all pointing at the same set.
Criticisms, changes, and observations are very welcome. This is the kind 
of thing I really want to be done, and be done right first time.
Andrew
For the record, this was supposed to end "I'm already pretty familiar 
with both the model layer and migrations, and I feel like using up all 
my free time."
I'll hit send less quickly next time.
Andrew
On Fri, May 28, 2010 at 12:06 PM, Andrew Godwin <and...@aeracode.org> wrote:
>  - Inside an application, migrations are implicitly ordered by name (by
> string sort, so "0001_initial" is before "0002_second", and "alpha" is
> before "beta", but "11_foo" is not before "2_bar").
'11_foo' would sort before '2_bar', obviously.
'1_bar' would not sort before '11_foo', which I think is the point
Andrew is trying to get across - this will be an alphanumeric sort,
so you must prefix with '0' if you want predictability.
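
A quick check in plain Python, using its built-in string comparison (the exact collation Django would use is not specified in the proposal, so treat this as an illustration):

```python
names = ["2_bar", "11_foo", "1_bar", "0002_second", "0001_initial"]
ordered = sorted(names)
# ordered == ['0001_initial', '0002_second', '11_foo', '1_bar', '2_bar']

# Zero-padding the numeric prefix makes string order match numeric order.
padded = sorted("%04d_step" % n for n in (11, 2, 1))
# padded == ['0001_step', '0002_step', '0011_step']
```
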
Cheers
Tom