Final call for feedback: Multi-db

Russell Keith-Magee

Dec 18, 2009, 2:43:34 AM
to Django Developers
Hi all,

This is a second and final call for feedback on the multidb branch.

Barring any objections or the discovery of major problems, my
intention is to commit this early next week, hitting the alpha 1
feature deadline by the skin of our collective teeth :-)

There has been one big change since the last call for feedback -
thanks to Justin Bronn, GIS is now fully multi-db compliant.

There have also been a couple of small changes - mostly integrating
the contrib applications with multidb features. For example, the
contenttypes app now maintains a cache of content type objects that is
multi-db aware.

The only really visible change is a new 'db_manager()' method on
Managers. This is used to obtain a Manager instance that is bound to a
specific database. This is required because:

    Author.objects.using('foo')

will return a QuerySet that is bound to foo - however, methods like
User.objects.create_user(...) and
Permission.objects.get_by_natural_key(...) are on the Manager, not the
QuerySet.

So, you can now call:

    Author.objects.db_manager('foo')

which will return a Manager (Author's default manager) that is bound
to the foo database. Subsequent calls to using() can change this
binding if necessary.
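A toy model in plain Python may make the Manager/QuerySet distinction concrete. `ToyManager`, `ToyQuerySet` and the `create_user` stand-in are hypothetical and are not Django's implementation; the sketch only shows why a manager-level helper needs a manager bound to a database, which using() cannot provide because it hands back a QuerySet.

```python
class ToyQuerySet:
    """Stand-in for a QuerySet: it only remembers its database alias."""

    def __init__(self, db="default"):
        self.db = db

    def using(self, alias):
        # A new QuerySet bound to another alias; self is left unchanged.
        return ToyQuerySet(alias)


class ToyManager:
    """Stand-in for a Manager that has a manager-only helper method."""

    def __init__(self, db=None):
        self._db = db  # None means "fall back to the default alias"

    def get_query_set(self):
        return ToyQuerySet(self._db or "default")

    def using(self, alias):
        # Returns a QuerySet, so manager-only helpers are no longer reachable.
        return self.get_query_set().using(alias)

    def db_manager(self, alias):
        # Returns a copy of the Manager itself, bound to alias.
        return ToyManager(alias)

    def create_user(self, username):
        # Manager-only helper; reports the alias it would write to.
        return (username, self._db or "default")


objects = ToyManager()
print(hasattr(objects.using("foo"), "create_user"))   # False
print(objects.db_manager("foo").create_user("brett"))  # ('brett', 'foo')
print(objects.db_manager("foo").using("bar").db)       # bar
```

The last line mirrors the point above that a later using() call can still change the binding.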

At the last call for feedback, questions were raised about admin
support. I've done some investigation, and it turns out that you can
write ModelAdmin definitions that bind a model to a different
database. I'm in the process of documenting the exact steps that are
required. Coming up with a pretty integration layer with admin will be
left for Django 1.3, when we will be addressing the issue of how to
provide a good public interface to multidb for common use cases
(master/slave, sharding, etc.).

As always, the code is available in the multi-db SVN branch:

http://code.djangoproject.com/svn/django/branches/soc2009/multidb/

or from Alex's github branch:

http://github.com/alex/django/tree/multiple-db

Again, any and all feedback welcome.

Yours,
Russ Magee %-)

Brett Hoerner

Dec 18, 2009, 12:41:34 PM
to Django developers
I'm not sure if 1.2 intended to fully support read-slaves, but I'll
post this quick anyway as we've just run into it while trying to
upgrade at DISQUS.

You might think that having support for multiple databases implies
that using a read-slave would Just Work, and that's mostly true.
There's one edge case I've run into when you try to relate objects
using instances created from different mirrors of the same database.
Because of the checks against instance._state.db you can't select an
object from a read-slave and assign it to a foreign key relation (and
probably other relations) on the master, even though you know this is
a mirror and not a different database.

Here's a quick code example: http://dpaste.org/Bozd/

The only solution I've thought of (and I haven't thought long, I just
ran into this) is another database setting where you could tell Django
that this DB is a mirror of another (by name?) so that
instance._state.db on a read-slave created object actually holds the
value of DATABASES['that_read_slave']['mirror_of'] (or whatever key).
In other words a User selected from 'read_slave' might actually have a
user_instance._state.db value of 'default', because that's its true
home, and any relations should be compared to that.
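The mirror idea can be sketched in plain Python. The `mirror_of` key is Brett's proposed (hypothetical) setting, and `home_db()` / `can_relate()` are illustrative helpers, not Django functions: the sketch resolves an alias to its "true home" and compares homes rather than raw aliases when deciding whether two instances may be related.

```python
# Hypothetical settings: 'mirror_of' is the proposed key, not a real
# Django setting.
DATABASES = {
    "default": {"NAME": "app"},
    "read_slave": {"NAME": "app_replica", "mirror_of": "default"},
    "archive": {"NAME": "archive"},
}


def home_db(alias, databases):
    """Follow 'mirror_of' links until reaching an alias that mirrors nothing."""
    seen = set()
    while databases[alias].get("mirror_of"):
        if alias in seen:
            raise ValueError("mirror_of cycle involving %r" % alias)
        seen.add(alias)
        alias = databases[alias]["mirror_of"]
    return alias


def can_relate(db_a, db_b, databases):
    """Cross-database check that treats a mirror and its master as one DB."""
    return home_db(db_a, databases) == home_db(db_b, databases)


print(home_db("read_slave", DATABASES))                # default
print(can_relate("read_slave", "default", DATABASES))  # True
print(can_relate("archive", "default", DATABASES))     # False
```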

I would think read-slaves would be a pretty common application of
multidb, but I can only speak to our use case. I know it's a bit late
in the game, but we'll have to work up our own local fix or go with a
proper one before we can deploy 1.2. And to think I was so happy
about how many local Django patches I was able to remove going from
1.0->1.2. ;)

Amazing work, Alex & Russell, many thanks.

Regards,
Brett

Alex Gaynor

Dec 18, 2009, 12:50:33 PM
to django-d...@googlegroups.com
> --
>
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>
>

Thanks for the feedback Brett (sorry I didn't get back to your original email).

I think you're right that this is an important use case: in any setup where
identical pks indicate mirroring across different DBs, this check will fail.
I'm wondering if perhaps the most prudent thing to do would be to
simply remove this check. The end result will be you'll get an
integrity error on Postgres/Oracle when you try to save (and SQLite
and MySQL will just let you do whatever).

Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

Brett Hoerner

Dec 18, 2009, 12:56:14 PM
to Django developers
On Dec 18, 9:50 am, Alex Gaynor <alex.gay...@gmail.com> wrote:
> I'm wondering if perhaps the most prudent thing to do would be to
> simply remove this check.  The end result will be you'll get an
> integrity error on Postgres/Oracle when you try to save (and SQLite
> and MySQL will just let you do whatever).

That would certainly work for us. I guess it's really a question of
how it'd affect users with multiple truly distinct databases. I think
it's a nice error for people "new" to Django and the concept of
multiple databases, but when you're using more than one DB already...

Brett

Jani Tiainen

Dec 19, 2009, 4:46:04 AM
to django-d...@googlegroups.com
On Fri, Dec 18, 2009 at 9:43 AM, Russell Keith-Magee <freakb...@gmail.com> wrote:
> Hi all,
>
> This is a second and final call for feedback on the multidb branch.
>
> Barring any objections or the discovery of major problems, my
> intention is to commit this early next week, hitting the alpha 1
> feature deadline by the skin of our collective teeth :-)

Haven't run any tests, but as a small request - I would be very happy if you guys could take a look at ticket #11017; it's quite a performance killer for some selects on char fields (especially startswith) on Oracle.

Jacob Kaplan-Moss

Dec 19, 2009, 9:23:53 AM
to django-d...@googlegroups.com
On Sat, Dec 19, 2009 at 3:46 AM, Jani Tiainen <red...@gmail.com> wrote:
> Haven't run any tests, but as a small request - I would be very happy that
> you guys take a look ticket #11017 it's quite performance killer to some
> selects on char fields (specially startswith) on Oracle.

This has nothing to do with multidb; let's try to keep things on
track, please. We've got about three months of bug fix time set aside
soon; right now we're working on getting the last few features into
1.2.

Jacob

Russell Keith-Magee

Dec 19, 2009, 9:48:23 AM
to django-d...@googlegroups.com

You're right - read slaves are an intended common use case, and the
cross-database checks will get in the way for that case.

I quite like the solution you have proposed. However, the exact
interpretation and consequences need to be thought about a bit - the
issue of whether an object originating from the slave gets state.db
set to the slave it came from or the master is one such problem. I
strongly suspect the answers won't be too complicated (or particularly
counterintuitive), but I don't want to rush it just in case we get it
wrong on the first attempt.

For the purposes of commit next week, I'll strip out the
cross-database checks. We can always add stronger checks once we've
got the code out there in the alpha release and we've had time to have
a full discussion about how to address the master/slave problem
properly.

Yours,
Russ Magee %-)

Brett Hoerner

Dec 22, 2009, 6:06:15 PM
to Django developers
On Dec 19, 6:48 am, Russell Keith-Magee <freakboy3...@gmail.com>
wrote:

> You're right - read slaves are an intended common use case

I know the branch landed but I'd like to mention another issue
regarding read-slaves, hope that's OK. :)

Running tests against code that uses master and read-slaves (but
actually point at the same exact DB while testing) is currently not
possible. When the tests begin each DB is expected to be started fresh
(the runner stops if it cannot), so you can't use the same DB name/
host for two entries in settings.DATABASES. I think it's awfully
common for people not to develop with *actual* read-slaves on their local
machine, but rather to use the abstractions available and point their
"read-slave" at the same information as their default DB.

I could just set TEST_NAME for the read-slave to something else, but
in our production code and tests (as an example) we'll have some test
inserting data on master and another using it via a read-slave.
Unless I'm missing something, we'd have to create all data on both the
master and read-slaves in order to test properly, which is awfully
awkward and possibly confusing.

Any thoughts? Off the top of my head all I can think of is that the
test setup could check if any DBs match up in name/host and not try to
drop/create after the first.
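The name/host matching idea might look roughly like this. `plan_test_databases()` is a hypothetical helper, not part of Django's test runner; it assumes two aliases share a physical database exactly when their HOST, PORT and NAME match, and lets only the first alias (preferring `default`) drop and create it.

```python
def plan_test_databases(databases):
    """Return (aliases to drop/create, {alias: alias it reuses})."""
    create, reuse, owner_of = [], {}, {}
    # Visit 'default' first so it becomes the owner of any shared database.
    for alias in sorted(databases, key=lambda a: (a != "default", a)):
        cfg = databases[alias]
        key = (cfg.get("HOST", ""), cfg.get("PORT", ""), cfg["NAME"])
        if key in owner_of:
            reuse[alias] = owner_of[key]  # same physical DB: skip drop/create
        else:
            owner_of[key] = alias
            create.append(alias)
    return create, reuse


DATABASES = {
    "default": {"NAME": "my_database", "HOST": "localhost"},
    "read_slave": {"NAME": "my_database", "HOST": "localhost"},
    "other": {"NAME": "other_db", "HOST": "localhost"},
}
print(plan_test_databases(DATABASES))
# (['default', 'other'], {'read_slave': 'default'})
```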

Thanks,
Brett

Russell Keith-Magee

Dec 22, 2009, 7:27:25 PM
to django-d...@googlegroups.com

Hrm... interesting problem. In addition to your name-matching
approach, I can think of two approaches off the top of my head:

* Allow TEST_NAME=None to mean "don't try and instantiate this
database in test mode"

* Allow a top level TEST_DATABASES setting; TEST_DATABASES would
override DATABASES; if TEST_DATABASES isn't defined, then TEST_NAME
would be used.

I'll need to cogitate on this over my Christmas pudding :-) Any other
suggestions welcome.

Yours,
Russ Magee %-)

Brett Hoerner

Dec 22, 2009, 7:55:19 PM
to Django developers
On Dec 22, 4:27 pm, Russell Keith-Magee <freakboy3...@gmail.com>
wrote:

>  * Allow TEST_NAME=None to mean "don't try and instantiate this
> database in test mode"

That sounds good, too.


>  * Allow a top level TEST_DATABASES setting; TEST_DATABASES would
> override DATABASES; if TEST_DATABASES isn't defined, then TEST_NAME
> would be used.

This would have to function differently than the current DATABASES
setting, though, right? Otherwise I don't see how the same problem
would be avoided ('default' and 'read_slave' both point the same
physical DB).


> I'll need to cogitate on this over my Christmas pudding :-) Any other
> suggestions welcome.

Awesome, thanks much.


Brett

Craig Kimerer

Dec 22, 2009, 11:58:06 PM
to django-d...@googlegroups.com
On Tue, Dec 22, 2009 at 4:55 PM, Brett Hoerner <br...@bretthoerner.com> wrote:
> On Dec 22, 4:27 pm, Russell Keith-Magee <freakboy3...@gmail.com>
> wrote:
> >  * Allow TEST_NAME=None to mean "don't try and instantiate this
> > database in test mode"
>
> That sounds good, too.

If I were using the slaving part of multi-db, I'd be very likely to write some internal code that made it so that if you tried to save (alter) an object retrieved from a slave, it'd either 1) log a nasty error message or 2) raise an exception. If this approach were used, I would be unable to write any kind of unit test against code that depended on stuff coming from a [master /] slave.
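A minimal sketch of the guard Craig describes, assuming a hypothetical project-level set of read-only aliases. `guard_save()`, `ReadOnlyDatabaseError` and the logger name are illustrative; in a real project something like this would presumably be called from an overridden save() with the instance's `_state.db`. It also shows why the slave alias has to exist under test: objects must actually be loaded through it for the guard to trigger.

```python
import logging

logger = logging.getLogger("readslave_guard")  # hypothetical logger name

# Hypothetical project-level setting: aliases that must never be written to.
READ_ONLY_ALIASES = frozenset(["read_slave"])


class ReadOnlyDatabaseError(Exception):
    """Raised when code tries to save an object loaded from a read-only alias."""


def guard_save(obj_db, strict=True, read_only_aliases=READ_ONLY_ALIASES):
    """Return True if saving is allowed for an object loaded from obj_db.

    For read-only aliases, either raise (strict=True, option 2) or log an
    error and return False (strict=False, option 1).
    """
    if obj_db not in read_only_aliases:
        return True
    message = "object was loaded from read-only alias %r" % obj_db
    if strict:
        raise ReadOnlyDatabaseError(message)
    logger.error(message)
    return False


print(guard_save("default"))                   # True
print(guard_save("read_slave", strict=False))  # False (and logs an error)
```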

Unfortunately, I don't have a better suggestion at this time.

Craig

Michael Manfre

Dec 23, 2009, 12:32:23 AM
to Django developers
With multiple databases defined, what is the expected behavior for
syncdb and the other db-related commands? The documentation shows that
it is relatively easy to associate an admin form with a given
database, but is there a way of associating a model or app with a given
database?

Regards,
Michael Manfre

Russell Keith-Magee

Dec 23, 2009, 4:04:11 AM
to django-d...@googlegroups.com
On Wed, Dec 23, 2009 at 1:32 PM, Michael Manfre <mma...@gmail.com> wrote:
> With multiple databases defined, what is the expected behavior for
> syncdb and the other db related commands?

The management commands all work the same way under multidb - they
only ever work on a single database at a time. If you don't specify a
database, the 'default' database is used.

> The documentation shows that
> it is relatively easy to associate an admin form with a given
> database, but is there a way of associating a model or app with a given
> database?

Yes - ish. If you're working with your own application and models,
just define a custom manager for that model. The manager just needs to
override get_query_set() and apply a using() modifier:

    from django.db import models

    class PersonManager(models.Manager):
        # Route every query made through this manager to the 'other' database.
        def get_query_set(self):
            return super(PersonManager, self).get_query_set().using('other')

    class Person(models.Model):
        objects = PersonManager()
        name = models.CharField(max_length=50)
        ...

Unfortunately, this approach doesn't work for a model in a reusable
app - for example, you can't easily push contrib.auth.User to a
different database. However, you can just call
User.objects.using('other').... whenever you want to use the User
model.

I know this is less than ideal, but the goal for 1.2 was to complete
the plumbing and get the important porcelain in place (e.g., using()).
The goal for 1.3 is to identify the common end-user use cases for
multidb and make some easily exploitable hooks for end-users.

Yours,
Russ Magee %-)

Joe

Jan 4, 2010, 10:40:16 AM
to Django developers
Has this code been merged to a 1.2 alpha build somewhere or is the
multi-db branch still the current release? Only asking because the
first message in the thread indicated a schedule which meant the code
would be merged in before EOY and I just want to make sure I'm on the
right codebase moving forward :)

Thanks,
Joe

Alex Gaynor

Jan 4, 2010, 10:43:42 AM
to django-d...@googlegroups.com
Yes, multiple database support was merged into trunk on December 22:
http://www.djangoproject.com/multidb-changeset/

Alex

Brett Hoerner

Jan 5, 2010, 11:42:26 AM
to Django developers
On Dec 22 2009, 4:27 pm, Russell Keith-Magee <freakboy3...@gmail.com>
wrote:

> I'll need to cogitate on this over my Christmas pudding :-)

Did you come to any conclusions, or need any more feedback on the read-
slave testing issue?

I in no way mean to rush, I just wanted to make sure I didn't (and
don't) miss anything as I'm more than happy to test any changes
against our rather large and quirky setup.

Thanks,
Brett

Russell Keith-Magee

Jan 5, 2010, 11:09:31 PM
to django-d...@googlegroups.com
On Wed, Jan 6, 2010 at 12:42 AM, Brett Hoerner <br...@bretthoerner.com> wrote:
> On Dec 22 2009, 4:27 pm, Russell Keith-Magee <freakboy3...@gmail.com>
> wrote:
>> I'll need to cogitate on this over my Christmas pudding :-)
>
> Did you come to any conclusions, or need any more feedback on the read-
> slave testing issue?

I haven't reached any conclusions - my pudding was not enlightening in
this regard. :-)

The complication is that without a good model of how master/slave will
work in normal code, it's difficult to work out how the test framework
should behave. My suggestion of TEST_NAME=None or your suggestion of
TEST_NAME= (default TEST_NAME) are both relatively simple to
implement, but I don't want to rush into implementing something that
doesn't actually help in practical terms.

If you're actually doing master/slave in the wild, your guidance may
actually be more enlightening than my theoretical navel gazing. In
particular - how have you got master/slave configured? How do you find
and select slave databases? How does that approach degrade when
DATABASES suddenly has fewer entries (including the case of a config
with no slaves)?

> I in no way mean to rush, I just wanted to make sure I didn't (and
> don't) miss anything as I'm more than happy to test any changes
> against our rather large and quirky setup.

You certainly haven't missed anything. There have been a couple of
discussions on separate threads about cross-database joins, but they
haven't turned into any trunk commits yet.

Yours,
Russ Magee %-)

Brett Hoerner

Jan 7, 2010, 2:17:34 PM
to Django developers
On Jan 5, 8:09 pm, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> If you're actually doing master/slave in the wild, your guidance may
> actually be more enlightening than my theoretical navel gazing. In
> particular - how have you got master/slave configured? How do you find
> and select slave databases? How does that approach degrade when
> DATABASES suddenly has less entries (including the case of a config
> with no slaves)?

Yes, we're actually doing read-slave queries on Django 1.0.x using
some private API hacks.

We basically have the same layout of DATABASES that multidb went with,
but we only use different managers to dispatch queries. In other
words, `Foo.rs_objects.all()' vs `Foo.objects.all()'. It's pretty
basic, but it's worked for us.

So that's equivalent to the `using' syntax. You can imagine that we
have places in our code where the developer knows that read-slave
replication isn't a problem and we want to offload a query, so
`Foo.objects.using('read_slave')...' is used. We don't do any special
selection right now: `DATABASES['read_slave']' is hard-coded per
deployment instance. Different instances might use different
read-slaves for various reasons, but those reasons also require us to
use whole different app servers, so those requests are routed by a
frontend proxy rather than by some in-app magic.

Anyway, most of that doesn't really matter, I think. What matters is
that we don't do any special degrading if `DATABASES' is different.
As soon as you use `using' (or our equivalent) you're hard coding the
use of another DB name, so in development we just have `DATABASES
['read_slave']' use the same settings as `default' does.

So in the end, the `TEST_NAME=None' solution works well for our case at
least. I would imagine that for any number of read-slaves you'd want to be
able to point them at the `default' DB (without doing a dump and sync)
during tests - I mean, that's what a read-slave is, no?

Regards,
Brett

Russell Keith-Magee

Jan 7, 2010, 10:33:36 PM
to django-d...@googlegroups.com

I completely agree that you don't need to have read slaves in order to
test application logic (unless, of course, you're checking your read
slave selection behavior).

However, I'm a little confused as to how your setup will work with the
change you propose. If you have a database setup with:

    "read-slave": { ..., "TEST_NAME": None },

then my understanding of your proposal is that the only change is that
read-slave won't get created under the test setup. But doesn't that
mean that::

    MyModel.objects.using('read-slave').filter(...)

will fall over? Either the read-slave alias won't exist in DATABASES
(if we fully clean up the DATABASES setting), or the read-slave alias
will point to a database with no name.

In your opinion, how does using() (or the test framework) compensate
for a database alias that is referenced in code, but non-existent
during testing?

FYI - I've opened #12542 to track this particular issue. I've also
opened #12541 to track the cross database validation/read slave
identification problem you raised earlier.

Yours,
Russ Magee %-)

Brett Hoerner

Jan 8, 2010, 1:24:02 PM
to Django developers
On Jan 7, 7:33 pm, Russell Keith-Magee <freakboy3...@gmail.com> wrote:
> then my understanding of your proposal is that the only change is that
> read-slave won't get created under the test setup. But doesn't that
> mean that::
>
>     MyModel.objects.using('read-slave').filter(...)
>
> will fall over?

No, not in my mental image of the setup. Take the following,

    DEFAULT_ENGINE = 'postgresql_psycopg2'
    DEFAULT_NAME = 'my_database'

    DATABASES = {
        'default': {
            'ENGINE': DEFAULT_ENGINE,
            'NAME': DEFAULT_NAME,
        },
        'read_slave': {
            'ENGINE': DEFAULT_ENGINE,
            'NAME': DEFAULT_NAME,
            'TEST_NAME': None,
        },
    }

So the important thing here is that 'read_slave' *is* defined, but in
my local test settings it uses the same `NAME' (and host,
user, password, port) as my `default'. The `TEST_NAME = None' change
will simply allow me to get past the error caused when `read_slave'
tries to drop and create a database that `default' has an open session
to (it just dropped and created itself, after all).

Now in code like,

    MyModel.objects.using('read_slave').filter(...)

That should be a valid connection (`DATABASES['read_slave']') but it's
actually connecting to the exact same DB as `default', so a filter
should find objects created on `default', just like you'd imagine in a
real world read-slave setup.

Does that make more sense? There's really nothing magic going on
here, it's only a matter of telling it not to drop/create the DB. I
think maybe `TEST_NAME = None' could be confusing? I didn't mean to
imply that the alias wasn't properly setup and functional.
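A sketch of the TEST_NAME=None semantics described here: the alias stays configured and only the drop/create step is skipped. `setup_test_connections()` and the `create_db` callback are hypothetical. One detail the sketch makes explicit, as an assumption about runner behaviour: if the runner switches `default` over to its test database (Django's default name is `test_` + NAME), a skipped alias still naming `my_database` would keep reading the non-test database, so the sketch repoints it at the test database of whichever alias shares its HOST/PORT/NAME.

```python
def setup_test_connections(databases, create_db):
    """Create test DBs; TEST_NAME=None aliases follow their physical twin."""
    databases = {alias: dict(cfg) for alias, cfg in databases.items()}
    originals = {
        alias: (cfg.get("HOST", ""), cfg.get("PORT", ""), cfg["NAME"])
        for alias, cfg in databases.items()
    }
    renamed = {}  # original (HOST, PORT, NAME) -> test database name

    # Pass 1: create a test database for every alias not marked TEST_NAME=None.
    for alias, cfg in databases.items():
        if "TEST_NAME" in cfg and cfg["TEST_NAME"] is None:
            continue
        test_name = cfg.get("TEST_NAME") or "test_" + cfg["NAME"]
        create_db(test_name)
        renamed[originals[alias]] = test_name
        cfg["NAME"] = test_name

    # Pass 2: skipped aliases point at the test DB of the alias they share.
    for alias, cfg in databases.items():
        if "TEST_NAME" in cfg and cfg["TEST_NAME"] is None:
            if originals[alias] in renamed:
                cfg["NAME"] = renamed[originals[alias]]
    return databases


DATABASES = {
    "default": {"NAME": "my_database"},
    "read_slave": {"NAME": "my_database", "TEST_NAME": None},
}
created = []
result = setup_test_connections(DATABASES, created.append)
print(created)                       # ['test_my_database']
print(result["read_slave"]["NAME"])  # test_my_database
```

With this reading, using('read_slave') under test still resolves to a real connection and sees the rows written through `default`.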

Regards,
Brett

Russell Keith-Magee

Jan 19, 2010, 10:27:12 AM
to django-d...@googlegroups.com
On Sat, Dec 19, 2009 at 10:48 PM, Russell Keith-Magee
<freakb...@gmail.com> wrote:
> On Sat, Dec 19, 2009 at 1:41 AM, Brett Hoerner <br...@bretthoerner.com> wrote:
>>
>> I would think read-slaves would be a pretty common application of
>> multidb, but I can only speak to our use case.  I know it's a bit late
>> in the game, but we'll have to work up our own local fix or go with a
>> proper one before we can deploy 1.2.  And to think I was so happy
>> about how many local Django patches I was able to remove going from
>> 1.0->1.2. ;)
>
> You're right - read slaves are an intended common use case, and the
> cross-database checks will get in the way for that case.

Hi Brett,

FYI - I've just uploaded a patch to #12540 that implements something
similar to what you proposed, along with a few other fixes. I'd be
interested in hearing any feedback you may have.

Yours,
Russ Magee %-)

Russell Keith-Magee

Jan 19, 2010, 10:28:16 AM
to django-d...@googlegroups.com

Oh - and one more thing - I haven't forgotten about the TEST_NAME
issue - I just wanted to get the foreign key and read-slave stuff
working first. multi-db testing is next on my list.

Russ %-)
