> Since currently, forwards and backwards in Migration rely on the
> global db object, their signature would need to change to include a db
> argument whose value is the database object for the current database
> the migration is being applied to. So, a forwards function definition
> would look like:
>
> def forwards(self, orm, db):
> db.add_column(...)
This would be backwards-compatible if south detected old migrations
(which don't take the db parameter) and call them "correctly" with the
default DB.
> An issue to consider is dependency checking and the addition of
> migration history records to different databases. I haven't come up
> with a solution I'm satisfied with here. I feel that all databases
> should include their own migration history table, but I'm uncertain if
> each should contain records for all migrations or only those which
> apply to that particular database.
I think for the purposes of depends_on, it'd be better to have a
record in each migration history table -- in my mind, these migrations
were "applied", even if that application was a no-op.
To keep this distinction clear, perhaps applies_to is a bad naming -
perhaps something more like "mutates" or "alters".
> Another issue is transaction management. Currently, a migration is an
> atomic unit, and a transaction can safely be committed once the
> migration succeeds on a single database. However, if a migration is
> being applied to multiple databases and succeeds on one but fails on
> another, what should be done? Should we live the databases in
> different states, or should we wait to commit all transactions until
> we know the migration has succeeded on all databases it applies to?
Just to be clear, this issue only applies to DBs that allow
transactional DDL at all, and of the DBs south supports, only pg and,
debatably, SQL Server, support it. If a DB doesn't support
transactional DDL, all bets are off for a safe multi-db migration (or
any migration, really).
I think holding open the transactions until all are completed would
only be good if a migration were applied to all aliases in parallel -
otherwise, throughput on all shards (for example) would be hobbled for
an O(N=# shards) period. I get the issue of code being unaligned
with some (but not all) DBs.
There's also that chance that even after all migration work has been
done, one or more of the DBs fail the transaction upon attempting
commit - commit on A succeeds, B goes down before commit can succeed.
This is admittedly less of a problem than the chance that a given
migration has a bug or hits deadlock or ...
Would it be useful to think through different reasons people use multi-db?
1) write sharding
2) read slave
3) multi-master
4) unit-of-control (data from different sources/departments/cultures)
5) unit-of-access (some data is random-access, some is localized)
Are there other scenarios? Are any of these badly served by
applies_to + held-open transactions, possibly parallelized?
For what it's worth, our (Votizen) immediate need is # 4, and we're on
MySQL, so this idea of transactional DDL is aside from our immediate
need. :-/
This is why the import at the top of migrations is "from south.v2 import
..." - new migrations can be v3, and thus the change can be detected
very easily.
>
>> An issue to consider is dependency checking and the addition of
>> migration history records to different databases. I haven't come up
>> with a solution I'm satisfied with here. I feel that all databases
>> should include their own migration history table, but I'm uncertain if
>> each should contain records for all migrations or only those which
>> apply to that particular database.
>
> I think for the purposes of depends_on, it'd be better to have a
> record in each migration history table -- in my mind, these migrations
> were "applied", even if that application was a no-op.
>
> To keep this distinction clear, perhaps applies_to is a bad naming -
> perhaps something more like "mutates" or "alters".
This is a very, very tricky thing. Really, it should be the case that if
you swap some database X for a fresh one, and re-run South, it applies
the correct migrations to that database.
To that end, I think the best direction is to record all migrations in
all databases (and also write the running code that way) - migrations
that are not destined for the currently-running database are simply
no-opped and put in the record table anyway.
>
>> Another issue is transaction management. Currently, a migration is an
>> atomic unit, and a transaction can safely be committed once the
>> migration succeeds on a single database. However, if a migration is
>> being applied to multiple databases and succeeds on one but fails on
>> another, what should be done? Should we live the databases in
>> different states, or should we wait to commit all transactions until
>> we know the migration has succeeded on all databases it applies to?
>
> Just to be clear, this issue only applies to DBs that allow
> transactional DDL at all, and of the DBs south supports, only pg and,
> debatably, SQL Server, support it. If a DB doesn't support
> transactional DDL, all bets are off for a safe multi-db migration (or
> any migration, really).
>
> I think holding open the transactions until all are completed would
> only be good if a migration were applied to all aliases in parallel -
> otherwise, throughput on all shards (for example) would be hobbled for
> an O(N=# shards) period. I get the issue of code being unaligned
> with some (but not all) DBs.
>
> There's also that chance that even after all migration work has been
> done, one or more of the DBs fail the transaction upon attempting
> commit - commit on A succeeds, B goes down before commit can succeed.
> This is admittedly less of a problem than the chance that a given
> migration has a bug or hits deadlock or ...
I say the migrations should be applied in series to each database; i.e.
for migration in migrations:
for database in databases:
migration.run(database)
If a migration fails, roll it back on that database it failed on, and
abort there. Then, when South re-runs, database A has it applied (and a
record to that affect), B doesn't, and so it only gets run on B.
There might be some work needed to the "migrate" command to allow
migrating forwards/backwards on single databases, should they get out of
sync.
> Would it be useful to think through different reasons people use multi-db?
>
> 1) write sharding
> 2) read slave
> 3) multi-master
> 4) unit-of-control (data from different sources/departments/cultures)
> 5) unit-of-access (some data is random-access, some is localized)
>
> Are there other scenarios? Are any of these badly served by
> applies_to + held-open transactions, possibly parallelized?
>
> For what it's worth, our (Votizen) immediate need is # 4, and we're on
> MySQL, so this idea of transactional DDL is aside from our immediate
> need. :-/
All of those I've heard of being used, but 2) certainly doesn't matter
with South - the read slaves should propagate the schema changes.
I'm still in favour of not making this too automatic, as we'll never be
everything to everyone. However, I think the idea as a whole is a decent
improvement on the current (and sorry) state of affairs.
Andrew
All of this sounds reasonable. I think I have enough direction to get started.
> --
> You received this message because you are subscribed to the Google Groups "South Users" group.
> To post to this group, send email to south...@googlegroups.com.
> To unsubscribe from this group, send email to south-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/south-users?hl=en.
>
Two points:
1) A migration -- in particular, a data migration -- may well fail only on
commit, because Foreign Key constraints are deferred; which means they only
get checked on commit.
2) People who want to have unified transactions across databases use 2-phase
commit. It doesn't quite ensure the transactions' state consistency, but it
makes things a lot better. When multiple databases were introduced into
Django, the idea of dealing with 2-phase commit was rejected; developers
preferred the attitude of documentation and prayers. While you can guess I
disapprove, I see no sense in making South holier than Django-core in that
regard.
> I say the migrations should be applied in series to each database; i.e.
>
> for migration in migrations:
> for database in databases:
> migration.run(database)
>
> If a migration fails, roll it back on that database it failed on, and
> abort there. Then, when South re-runs, database A has it applied (and a
> record to that affect), B doesn't, and so it only gets run on B.
>
I agree.
> > Would it be useful to think through different reasons people use
> > multi-db?
> >
> > 1) write sharding
> > 2) read slave
> > 3) multi-master
> > 4) unit-of-control (data from different sources/departments/cultures)
> > 5) unit-of-access (some data is random-access, some is localized)
> >
> > Are there other scenarios?
See https://code.djangoproject.com/wiki/MultipleDatabaseSupport and (linked
from it) http://groups.google.com/group/django-
developers/browse_thread/thread/9f0353fe0682b73?pli=1
I think that whole thread should be required reading for anyone wishing to
implement multi-db support. It includes discussions and arguments and
reasonings for decisions taken and whatnot.
Also, I'm not sure if my own favorite use case (separate transaction control:
writing a log to the database, which gets committed whether or not the "main"
transaction is committed) is covered by 4) above.
Have fun,
Shai.
Unfortunately this doesn't answer how to manage different models in
different databases. For example, model.A only goes in database 'A'
and model.B only in database 'B'. looking through the south.changes
source code, the following statement pops out:
# Don't do anything for unmanaged, abstract or proxy models
if model._meta.abstract or getattr(model._meta, "proxy", False) or not
getattr(model._meta, "managed", True):
continue
This says that if the model.A._meta.managed variable is False, then
the model will be ignored.
Before south migration is called, by:
1. playing around with the "_managed" attributes,
2. changing the south.db.db database, and
3. managing migrations separately for each database,
it might be possible to get all multi-db working as it. should.what do
you think?
rosen,
Just to be clear, it happens fairly often that allow_syncdb will
return True for multiple DBs for any given model. This is useful in
sharding, and is the reason allow_relation exists - any 2 model
instances might be sharded to the same replica (along some
app-convenient partition). It's not correct to assume that any model
would go to one (and only one) DB.