Multiple projects, model inheritance, and one common database

barbara shaurette

Oct 8, 2008, 12:47:42 PM
to Django users, bshau...@tippit.com
We're building a couple of different projects - one social network-y
site, and one that's bloglike. So each has its own unique database,
but they do share one common set of content.

I've created a third "common" project to hold the base models, at
least - the model inheritance seems to be working, but for instances
of each class, each project is still looking to its own database.

Obviously, I don't want to recreate that same content in each database
- I need to be able to maintain it in just one common database. But I
can't figure out how to get the blog and network projects to look to
this common database for this specific piece of content.

I'm just looking for ideas - I realize this falls within the realm of
multi-DB support, which is sort-of-there-but-not-really. Am I going
to have to resort to raw SQL to hit this third database? Has anyone
solved this problem and if so, what was your strategy?

Carl Meyer

Oct 8, 2008, 2:21:02 PM
to Django users
On Oct 8, 12:47 pm, barbara shaurette <bshaure...@gmail.com> wrote:
> We're building a couple of different projects - one social network-y
> site, and one that's bloglike.  So each has its own unique database,
> but they do share one common set of content.

You'll almost certainly have better luck having the two sites share
the same database, and use the Sites framework where needed to
distinguish content for only one or the other, than trying to hack
some solution with two separate DBs and a third shared one.
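Here is a minimal sketch of that shared-database layout. It uses plain sqlite3 rather than Django's Sites framework itself. Content rows live in one table, and a join table records which sites each row appears on (roughly what a many-to-many relation to Site gives you). Each project then filters on its own site id. The table names and site id values are assumptions for illustration.

```python
import sqlite3

# One shared database for every site (in-memory here for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT);
    -- Join table: which sites each entry appears on.
    CREATE TABLE entry_sites (entry_id INTEGER, site_id INTEGER);
""")
conn.executemany("INSERT INTO entry (id, title) VALUES (?, ?)",
                 [(1, "Shared article"), (2, "Blog-only post"), (3, "Network-only post")])
conn.executemany("INSERT INTO entry_sites (entry_id, site_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 2)])

BLOG_SITE_ID, NETWORK_SITE_ID = 1, 2  # hypothetical per-project site ids

def entries_for_site(site_id):
    """Return the titles visible on one site, in id order."""
    rows = conn.execute(
        "SELECT e.title FROM entry e JOIN entry_sites s ON s.entry_id = e.id "
        "WHERE s.site_id = ? ORDER BY e.id", (site_id,))
    return [title for (title,) in rows]

print(entries_for_site(BLOG_SITE_ID))     # ['Shared article', 'Blog-only post']
print(entries_for_site(NETWORK_SITE_ID))  # ['Shared article', 'Network-only post']
```

The shared article is stored once and linked to both sites, which is the point of keeping everything in one database.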

Carl

barbara shaurette

Oct 8, 2008, 2:46:54 PM
to Django users
Actually, that's been suggested ... and considered. The problem is
that these two initial projects aren't going to be the only ones.
There will be more sites in the future, all slightly different, but
all sharing this common type of content. And putting *all* of them on
the same database is, alas, just not scalable.

barbara shaurette

Oct 9, 2008, 2:53:13 PM
to Django users
*bump*

Seriously - anyone have any ideas?

Malcolm Tredinnick

Oct 9, 2008, 8:53:37 PM
to django...@googlegroups.com

On Thu, 2008-10-09 at 11:53 -0700, barbara shaurette wrote:
> *bump*
>
> Seriously - anyone have any ideas?

Come on, Barbara, it's only been just over 24 hours since you posted.
For something this complex, your audience of prospective respondents is
pretty limited. After all, you're basically asking how to do
multi-database stuff in Django, so, by definition, it requires knowledge
of the internals.

As it turns out, I do have some ideas, but since they're basically
explaining how the code works, it'll take a bit of time to write up and
I haven't had that time yet. If you wait a bit longer, maybe your
patience will be rewarded.

Regards,
Malcolm

Malcolm Tredinnick

Oct 10, 2008, 7:24:01 AM
to django...@googlegroups.com

On Wed, 2008-10-08 at 09:47 -0700, barbara shaurette wrote:
> We're building a couple of different projects - one social network-y
> site, and one that's bloglike. So each has its own unique database,
> but they do share one common set of content.
>
> I've created a third "common" project to hold the base models, at
> least - the model inheritance seems to be working, but for instances
> of each class, each project is still looking to its own database.

Since model inheritance is really just a shortcut for one-to-one
relations, it's going to be a lot easier to think about this in terms of
explicit relations, particularly when it comes to debugging code. So
let's work with a model setup like this:

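    from django.db import models

    class Parent(models.Model):
        ...  # the common fields

    class Child(models.Model):
        # Explicit form of the link that inheritance would create for you.
        parent_ptr = models.OneToOneField(Parent)
        ...  # the child's own fields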
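Roughly speaking, this setup stores the parent fields and the child fields in two tables that are linked by a one-to-one key. The sketch below shows that layout in plain SQLite. The table and column names only approximate what Django generates and should be treated as assumptions. Loading a child requires a JOIN, and that JOIN is exactly what stops working once the two tables live in different databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, title TEXT);
    -- The child's primary key doubles as the one-to-one link to parent.
    CREATE TABLE child (parent_ptr_id INTEGER PRIMARY KEY REFERENCES parent(id),
                        extra TEXT);
""")
conn.execute("INSERT INTO parent (id, title) VALUES (1, 'Common content')")
conn.execute("INSERT INTO child (parent_ptr_id, extra) VALUES (1, 'Blog-specific field')")

# Loading a "Child" means joining across the one-to-one link,
# which needs both tables in the same database.
row = conn.execute(
    "SELECT p.id, p.title, c.extra FROM child c "
    "JOIN parent p ON p.id = c.parent_ptr_id"
).fetchone()
print(row)  # (1, 'Common content', 'Blog-specific field')
```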

Now, this won't actually work if Child and Parent are stored in
different databases, since cross-database relations don't work. You'll
need to manually manage that relation (parent_ptr will actually need to
be an IntegerField that you set to be the pk of the Parent) -- although
it's possible to work around that with a fair bit of extra work.

> Obviously, I don't want to recreate that same content in each database
> - I need to be able to maintain it in just one common database. But I
> can't figure out how to get the blog and network projects to look to
> this common database for this specific piece of content.
>
> I'm just looking for ideas - I realize this falls within the realm of
> multi-DB support, which is sort-of-there-but-not-really. Am I going
> to have to resort to raw SQL to hit this third database? Has anyone
> solved this problem and if so, what was your strategy?

People keep saying that multiple-database support mostly exists now for
one fairly simple reason: the database connection -- the object holding
information about engine type, db name, username and password -- is
almost entirely local to the Query class (with a couple of caveats I'll
mention below). There are still some missing pieces, but I suspect
what's there now will work for the situation you have.

The trick to using this functionality (aside from being prepared to look
at the code in django/db/models/sql/query.py from time to time to work
out how to debug the SQL problems you might run across) is to create a
custom manager on the Parent model that, when it is creating the
QuerySet, explicitly sets the database connection to be a new object
that points to the other database.

When you look at what django.db.connection is, it's just an instance
of django.db.backend.DatabaseWrapper(...). So to make a version that
points at a different database, you create a different DatabaseWrapper
instance and set that as the connection attribute on the Query object.
Specifically, in your get_query_set() method, you create a QuerySet
object as normal, then set the connection:

    from django.db.models.query import QuerySet

    def get_query_set(self):  # method on your custom manager
        qs = QuerySet(self.model)
        qs.query.connection = self.set_connection()
        return qs

where set_connection() is some method you write that returns a
DatabaseWrapper instance set up to connect to the right database.
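The next example makes the idea concrete without depending on Django 1.0 internals. It is a toy model in plain Python and sqlite3. A query object carries its own connection, and a manager-like class builds that query as normal and then re-points it. This mirrors the get_query_set()/set_connection() pattern described above. SimpleQuery, CommonDBManager and the two in-memory databases are hypothetical stand-ins, not Django's classes.

```python
import sqlite3

# Hypothetical stand-ins: the project's default database and the shared one.
default_db = sqlite3.connect(":memory:")
common_db = sqlite3.connect(":memory:")
common_db.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, title TEXT)")
common_db.execute("INSERT INTO parent (title) VALUES ('Shared content')")

class SimpleQuery:
    """Toy query that holds its own connection, like query.connection."""
    def __init__(self, table, connection):
        self.table = table
        self.connection = connection

    def all(self):
        return self.connection.execute(f"SELECT * FROM {self.table}").fetchall()

class CommonDBManager:
    """Toy manager whose queries run against the common database."""
    def __init__(self, table):
        self.table = table

    def set_connection(self):
        # Stand-in for building a DatabaseWrapper for the other database.
        return common_db

    def get_query_set(self):
        q = SimpleQuery(self.table, default_db)  # built as normal...
        q.connection = self.set_connection()     # ...then re-pointed
        return q

print(CommonDBManager("parent").get_query_set().all())  # [(1, 'Shared content')]
```

The default database has no parent table at all, so the query succeeds only because its connection was swapped before use.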

Now, I mentioned above that you could kind of fake the relation between
Child and Parent. That would require writing a field that was similar to
OneToOneField but knew how to switch to speaking to the right database
at the right time. I was hired to help a client do a similar, but not
identical, thing a couple of months ago and it took us a solid couple of
days to get that working and get all the bugs out. It's not trivial (and
it is a motivation for me to rewrite large swathes of the internals of
related fields for Django 1.1 or 1.2, since we need to do this regularly
for multiple-database support, at a minimum for error checking).

Simulating the relation in Python code at a more manual level is
probably easier. You write a custom save() method on your model, leave
parent_ptr as an IntegerField, and use normal Python attributes to hold
the data for the parent fields. Then your save() method will create a
Parent instance (or fetch an existing one), fill in its fields with the
data from the Child instance, save the Parent, copy the parent's id into
the parent_ptr field and then call the normal save() on the Child. Sounds
more complicated in prose than it probably is in code. All you're doing
is manually copying over the data from the child to the parent, saving
the parent to its own database (using its custom manager) and then
copying back the reference manually and saving the child.
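The next example is a rough sketch of that save() approach. It again uses plain sqlite3 in place of Django models so that it runs on its own. The common database holds Parent rows, the project database holds Child rows, and parent_ptr is just an integer. The Child class, its get() helper and the table layouts are hypothetical illustrations, not Django's Model API.

```python
import sqlite3

# Shared database for Parent rows, project database for Child rows.
common_db = sqlite3.connect(":memory:")
project_db = sqlite3.connect(":memory:")
common_db.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, title TEXT)")
project_db.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, parent_ptr INTEGER, extra TEXT)")

class Child:
    def __init__(self, title, extra, parent_ptr=None):
        self.title = title            # plain attribute holding the Parent field
        self.extra = extra            # field stored on Child itself
        self.parent_ptr = parent_ptr  # plain integer, not a real foreign key
        self.id = None

    def save(self):
        # 1. Create or update the Parent row in the common database.
        if self.parent_ptr is None:
            cur = common_db.execute(
                "INSERT INTO parent (title) VALUES (?)", (self.title,))
            self.parent_ptr = cur.lastrowid  # 2. copy the parent's id back
        else:
            common_db.execute("UPDATE parent SET title = ? WHERE id = ?",
                              (self.title, self.parent_ptr))
        common_db.commit()
        # 3. Save the Child row in the project's own database.
        if self.id is None:
            cur = project_db.execute(
                "INSERT INTO child (parent_ptr, extra) VALUES (?, ?)",
                (self.parent_ptr, self.extra))
            self.id = cur.lastrowid
        else:
            project_db.execute(
                "UPDATE child SET parent_ptr = ?, extra = ? WHERE id = ?",
                (self.parent_ptr, self.extra, self.id))
        project_db.commit()

    @classmethod
    def get(cls, child_id):
        # Reading takes two queries too: there is no cross-database JOIN.
        parent_ptr, extra = project_db.execute(
            "SELECT parent_ptr, extra FROM child WHERE id = ?",
            (child_id,)).fetchone()
        (title,) = common_db.execute(
            "SELECT title FROM parent WHERE id = ?", (parent_ptr,)).fetchone()
        obj = cls(title, extra, parent_ptr)
        obj.id = child_id
        return obj

c = Child("Shared content", "blog-only field")
c.save()
loaded = Child.get(c.id)
print(loaded.title, loaded.extra, loaded.parent_ptr)  # Shared content blog-only field 1
```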

Finally, the couple of caveats I mentioned with respect to changing
query.connection. Firstly, although it would work to have one connection
be to, say, a MySQL database and another one to a PostgreSQL database,
you can't mix-and-match with Oracle databases here (or anything else
that uses a custom Query class). The reason is that the custom Query
class is installed in the django.db.models.sql.query namespace at import
time and you can't switch back and forth. That's one of the things we
need to fix for proper multi-db support, but it's fairly fiddly.

Secondly, when you pickle (which is the prerequisite for caching) a
Query object -- which also happens when you pickle a QuerySet -- the
unpickling process restores the global database connection object. So
don't pickle something where you've set it to have a manual connection
of some kind. The reason for this is that database connections can't be
pickled, in general, so we wipe them out before pickling and have to
restore them upon unpickling. And we currently assume there's only a
single connection (since Django doesn't really support multiple database
stuff). There's an XXX-prefixed comment in the Query class that reminds
people of that (in the __setstate__ method).
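The pickling pitfall can be reproduced with a toy class that follows the behaviour described above. The class drops its connection in __getstate__ and restores a global default in __setstate__. This Query class and DEFAULT_CONNECTION are hypothetical stand-ins, not the actual code in django/db/models/sql/query.py.

```python
import pickle
import sqlite3

# Hypothetical global default, standing in for django.db.connection.
DEFAULT_CONNECTION = sqlite3.connect(":memory:")

class Query:
    """Toy model of the described behaviour, not Django's Query class."""
    def __init__(self, sql, connection=None):
        self.sql = sql
        self.connection = connection if connection is not None else DEFAULT_CONNECTION

    def __getstate__(self):
        # Connections can't be pickled, so leave it out of the state.
        state = self.__dict__.copy()
        del state["connection"]
        return state

    def __setstate__(self, state):
        # Restores the *global* connection: a custom one is silently lost.
        self.__dict__.update(state)
        self.connection = DEFAULT_CONNECTION

common_db = sqlite3.connect(":memory:")
q = Query("SELECT 1", connection=common_db)
restored = pickle.loads(pickle.dumps(q))
print(restored.connection is common_db)           # False
print(restored.connection is DEFAULT_CONNECTION)  # True
```

The round trip raises no error, which is what makes this easy to miss: the restored object quietly talks to the wrong database.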

So that should give you a reasonable leg-up on the work required.
Basically, write a custom manager to set the database connection on the
queryset before it returns it (for non-default connections). And manage
the inter-model relations manually through an overridden save() method
or similar, unless you really, really want to write a new related field.

Sing out if you have any questions about this.

Regards,
Malcolm
