Multiple database support


koenb

May 20, 2008, 9:17:39 AM
to Django developers
For those interested in multiple database support, I have started
working on it again, and posted my work-in-progress to ticket #4747.

I started from trunk and added things from the multidb branch little
by little, since so much has changed in that area since then.
There is still a lot more that needs to be checked and a number of
things to be redone. References to the default connection object are
all over the codebase, so there is still a lot of work left to get all
of them straightened out.

What is working (more or less):
- using existing databases (though there might still be quirks when
using different engines)
- running tests
- some of the management commands (e.g. sqlall and sqlflush; loaddata,
inspectdb and syncdb are not quite there yet)

Important to mention is that relations across databases are not
supported.

Anyway, if anyone is interested in helping, please let me know!

Koen

Nicola Larosa (tekNico)

May 20, 2008, 10:56:04 AM
to Django developers
koenb wrote:
> For those interested in multiple database support, I have started
> working on it again, and posted my work-in-progress to ticket #4747.
> ...
> Anyway, if anyone is interested in helping, please let me know!

I am going to need this in a month or so. Actions speak louder than
words, so many thanks for your efforts. However, there was news two
months ago, summarized in this thread:

Yet another SoC introduction: Getting multi-db done
http://groups.google.com/group/django-developers/browse_thread/thread/a0bc69e64ad8e318/

It would be nice to coordinate each one's efforts, to avoid wasting
time. Ben, Daryl, any news?

--
Nicola Larosa - http://www.teknico.net/

Casper Jensen

May 20, 2008, 11:05:50 AM
to django-d...@googlegroups.com
On Tue, May 20, 2008 at 4:56 PM, Nicola Larosa (tekNico)
<nicola...@gmail.com> wrote:
> It would be nice to coordinate each one's efforts, to avoid wasting
> time. Ben, Daryl, any news?
I have not worked on the project since the proposal because of job
and university commitments. I plan to track the development and begin
to help with it when I get more time (over the summer).

- Casper

koenb

May 20, 2008, 11:05:52 AM
to Django developers
Ah, missed that one.
Anyway, I only did the easy parts (that is, getting data in and out of
existing databases).
Thanks for the pointer, I'll try to keep an eye on that.

Koen


Daryl Spitzer

May 20, 2008, 11:20:18 AM
to django-d...@googlegroups.com
I've unfortunately been too busy to make time to work on this since
PyCon. The last thing I've done (after writing some code on the
flight home) is to send a patch to Ben Ford. Not long after that Ben
created a Mercurial repository (with my patch) and a Trac project.
You'll want to contact him.

I would still like to get my patch working so others (and myself) can
start testing it. I won't have time this week, but so far it looks
like I may be able to make some time next week. If I don't, I'll see if
I can at least make enough time to write up the API I came up with at
PyCon.

--
Daryl

Nicola Larosa (tekNico)

May 20, 2008, 2:13:39 PM
to Django developers
Daryl Spitzer wrote:
> If I don't, I see if I can at least make enough time to write up the API
> I came up with at PyCon.

Please do, that would be great.

Ben Ford

May 21, 2008, 6:33:57 AM
to django-d...@googlegroups.com
Hi all,

I'll sort out the hg repo (it now needs to point at trunk - not qsrf) and trac project if I get time this evening and make it public readable for everyone who's interested.

Is there a ticket in django we could use to track progress on this? We could use 4747, but if we do decide on a new API that might be a bit confusing... We could of course just use the mailing list and trac project, thoughts?

It's great to see some interest in multiple db support again :-)

Ben




--
Regards,
Ben Ford
ben.f...@gmail.com
+447792598685

Daryl Spitzer

May 21, 2008, 8:31:08 AM
to django-d...@googlegroups.com
> Is there a ticket in django we could use to track progress on this? We could
> use 4747, but if we do decide on a new API that might be a bit confusing...
> We could of course just use the mailing list and trac project, thoughts?

There's also http://code.djangoproject.com/ticket/1142. With the
mailing list and trac project, do we need a ticket for more than just
a place to attach patches to invite others to test?

> I'll sort out the hg repo (it now needs to point at trunk - not qsrf) and
> trac project if I get time this evening and make it public readable for
> everyone who's interested.

Thanks Ben.

--
Daryl

Jacob Kaplan-Moss

May 21, 2008, 5:16:47 PM
to django-d...@googlegroups.com
Hi guys --

Sorry for coming late to the party, but I'm just now catching up on django-dev.

I'm really glad to see you get the ball rolling on multiple db
support, and once I'm dug out from my backlog I'll be happy to start
reviewing code and helping out if I'm needed.

However, before we get to that point, I've got some pretty serious API
concerns with the current approach, so I think I should outline those
before y'all go much further. I don't want you to expend much effort
just to get a -1 smackdown.

The current mechanism of defining "other" databases in the settings
module is just fine, and the underlying mechanism of having
queries/managers "know" their connection is similarly dandy. But the
wheels come off when it comes to the "public" API where users will
choose which connection they use.

As far as I can tell, you've currently provided two hooks to use a
secondary connection: set the model's default connection in the
settings module (which is OK, I suppose, though I might want to
nitpick the syntax a bit), and assigning to ``Model.objects.db``.

This second one is a disaster waiting to happen -- you've had to muddy
things up with threadlocals to work around some problems already. Also
consider the "bookkeeping" you'd need to do to deal with objects
across multiple databases simultaneously (think sharding). You'd have
to keep juggling ``Model.objects.db`` and saving old ones... ugh.

Here's how I think it should work:

* I'd like the default connection for each and every object to be the
default database forever and always. I find putting models for default
connections in settings distasteful and I'd rather have just a single API
for changing the connection (see below). However, I imagine I'll be in
the minority here so I'm prepared to cede this point if necessary.

* There needs to be an official API to get a model (or perhaps a
manager) which references a different "context" --
``Model.objects.db`` should be read-only. So you'd call some API
method, and get back a sort of proxy object that uses the other
connection. Here's a strawman API::

    >>> from django import db
    >>> from someapp.models import Article

    >>> Article.objects.all()
    [... all Articles from the default database ...]

    >>> ArticlesOnOtherDatabase = db.get_model_for_other_connection(Article, "private")
    >>> ArticlesOnOtherDatabase.objects.all()
    [... all Articles from the database defined with the "private" key ...]

This should make the threadlocal stuff unnecessary, and (to my eye) is
a lot more sane than assigning the ``Manager.db``. Oh, and please
choose a better name than
``db.get_model_for_other_connection()``; given that you're building
the bikeshed you might as well paint it, too.

Jacob

Ben Ford

May 21, 2008, 6:10:13 PM
to django-d...@googlegroups.com
Hi Jacob,

I'd be interested in your thoughts on a declarative approach to defining the other databases...? I'll have my mercurial repo synced up to trunk tomorrow (my time) and I'll re-apply the patch I got from Daryl to it as a starting point. Hopefully people will be able to have a look through it and compare the declarative approach proposed with the existing multi-db approach.


> As far as I can tell, you've currently provided two hooks to use a
> secondary connection: set the model's default connection in the
> settings module (which is OK, I suppose, though I might want to
> nitpick the syntax a bit), and assigning to ``Model.objects.db``.
>
> This second one is a disaster waiting to happen -- you've had to muddy
> things up with threadlocals to work around some problems already. Also
> consider the "bookkeeping" you'd need to do to deal with objects
> across multiple databases simultaneously (think sharding). You'd have
> to keep juggling ``Model.objects.db`` and saving old ones... ugh.

I built a non-trivial application with multi-db as it is right now and found the API to be a bit hairy, to be honest. I think it would be an advantage, especially in a "database rich" environment, to be able to build a db on the fly much like a model, rather than be tied to what's in a dict in settings. I don't really like the objects.db way of doing it, and I found myself doing a fair bit of hacking to get it to work.

> * There needs to be an official API to get a model (or perhaps a
> manager) which references a different "context" --
> ``Model.objects.db`` should be read-only. So you'd call some API
> method, and get back a sort of proxy object that uses the other
> connection. Here's a strawman API::
>
>  >>> from django import db
>  >>> from someapp.models import Article
>
>  >>> Article.objects.all()
>   [... all Articles from the default database ...]
>
>  >>> ArticlesOnOtherDatabase = db.get_model_for_other_connection(Article, "private")
>  >>> ArticlesOnOtherDatabase.objects.all()
>  [... all Articles from the database defined with the "private" key ...]

Agreed. The way I got round this was to build the model again from scratch each time I wanted to access objects in a different database and have the dynamically created model persist in the app cache. I took most of this from the dynamic models entry on the wiki; it's here, look in the duplicate_model function:
    http://www.djangosnippets.org/snippets/442/
This would really need work (especially the field copying code, which is fairly horrifying! I know that doesn't work for all field types too - yuk) before it becomes a 'good idea', and I'm not even sure it's the right way to go. However, I'd be interested in whether people think it's a valid approach.


> * I'd like the default connection for each and every object to be the
> default database forever and always. I find putting models for default
> connections in settings distasteful and I'd rather just a single API
> for changing the connection (see below). However, I imagine I'll be in
> the minority here so I'm prepared to cede this point if necessary.

The API which I think is being proposed is that there should be a central register for database connections. In my mind this would be the place to go to get hold of a connection for use in a queryset (and all the other places it's needed), and I think the correct default behaviour of the class/object would be to return the connection defined in settings.DATABASE_*. The code to build the declarative DatabaseWrapper is already there, and there's a method to build one of these from what's in settings too. This should make it easy to get hold of a connection in all of the places where we currently do "from django.db import connection".

It's great to see this revived again :-)

Cheers
Ben

oggie rob

May 21, 2008, 9:43:30 PM
to Django developers


On May 21, 2:16 pm, "Jacob Kaplan-Moss" <jacob.kaplanm...@gmail.com>
wrote:
> * There needs to be an official API to get a model (or perhaps a
> manager) which references a different "context" --
> ``Model.objects.db`` should be read-only. So you'd call some API
> method, and get back a sort of proxy object that uses the other
> connection. Here's a strawman API::
>
>     >>> from django import db
>     >>> from someapp.models import Article
>
>     >>> Article.objects.all()
>     [... all Articles from the default database ...]
>
>     >>> ArticlesOnOtherDatabase = db.get_model_for_other_connection(Article, "private")
>     >>> ArticlesOnOtherDatabase.objects.all()
>     [... all Articles from the database defined with the "private" key ...]
>

Has anybody considered declaring the connection when getting the
manager? Something like:

    Artist.objects.all()
    Widget.objects(db='a').all()

Obviously with the default database for the case when "db" isn't
passed. Also you could override the Manager to use a different
database by default (e.g. Widget.objects.all() might always use an
OTHER_DATABASE while all other models use the main db, if you create a
custom Manager for Widget).

This still leaves questions about how syncdb would be achieved, at
least. But if it could be done, the API seems simple to understand.
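
A rough, Django-free sketch of that calling convention follows. The `Manager` and `OtherDatabaseManager` classes are made up for illustration, and the alias names are assumptions; the sketch only shows that a callable manager can hand back a copy bound to another database while leaving the class-level manager untouched.

```python
class Manager:
    """Toy manager: calling it returns a copy bound to another database."""

    def __init__(self, default_db="default"):
        self.db = default_db

    def __call__(self, db=None):
        # No argument means "keep this manager's default database".
        return type(self)(default_db=db or self.db)

    def all(self):
        return f"all rows via {self.db!r}"


class OtherDatabaseManager(Manager):
    """Custom manager that defaults to a non-default database."""

    def __init__(self, default_db="other"):
        super().__init__(default_db)


class Artist:
    objects = Manager()


class Widget:
    objects = OtherDatabaseManager()
```

Because `__call__` builds a new manager, `Widget.objects(db='a')` does not change what a later `Widget.objects.all()` uses.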

-rob

koenb

May 21, 2008, 11:50:57 PM
to Django developers
I really like this line of thought: having the persistence layer of a
model fixed is a good idea.
Relations are a big issue here: unless we can support relations across
databases, switching connections is always a big opportunity to shoot
yourself in the foot.
I would propose a function that can collect "clusters" of models, that
is, collections of models that are somehow related to each other, and
use that function to a) check that they all use the same database
during validation, and b) if we provide an API to register a model for
an additional connection (that is, a second one), give you copies of the
models for the entire cluster, relations and all. That way we could
even have syncdb create the tables for these 'backup models' too.
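
The cluster idea amounts to finding connected components in the graph of model relations. Below is a minimal sketch of that, assuming the relations are available as a plain dict of model names; the function names and the input shape are hypothetical and nothing here touches Django's model registry.

```python
from collections import defaultdict


def collect_clusters(relations):
    """Group models into clusters of transitively related models.

    `relations` maps each model name to the names it has foreign keys to.
    """
    # Build an undirected graph: a relation ties both ends together.
    graph = defaultdict(set)
    for model, targets in relations.items():
        graph[model]  # make sure models without relations appear too
        for target in targets:
            graph[model].add(target)
            graph[target].add(model)

    clusters, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk collects one connected component.
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(graph[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


def invalid_clusters(clusters, database_for):
    """Return the clusters whose models do not all share one database."""
    return [c for c in clusters if len({database_for[m] for m in c}) > 1]
```

Validation (a) is then a check that `invalid_clusters` returns nothing, and (b) would copy every model in the cluster that contains the model being registered.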

Koen


Simon Willison

May 22, 2008, 10:59:38 AM
to Django developers
I have to admit I'm slightly worried about the multi-database
proposal, because at the moment it doesn't seem to solve either of the
multi-db problems I'm concerned about.

The proposal at the moment deals with having different models live in
different databases - for example, the Forum application lives on DB1
while the Blog application lives on DB2.

I can see how this could be useful, but the two database problems that
keep me up at night are the following:

1. Replication - being able to send all of my writes to one master
machine but spread all of my reads over several slave machines.
Thankfully Ivan Sagalaev's confusingly named mysql_cluster covers this
problem neatly without modification to Django core - it's just an
alternative DB backend which demonstrates that doing this isn't
particularly hard: http://softwaremaniacs.org/soft/mysql_cluster/en/

2. Sharding - being able to put User entries 1-1000 on DB1, whereas
User entries 1001-2000 live on DB2 and so on.

I'd love Django to have built-in abilities to solve #1 - it's a really
important first-step on scaling up to multiple databases, and it's
also massively easier than any other part of the multi-db problem.

I wouldn't expect a magic solution to #2 because it's so highly
dependent on the application that is being built, but at the same time
it would be nice to see a multi-db solution at least take this into
account (maybe just by providing an easy tool to direct an ORM request
to a specific server based on some arbitrary logic).

I may have misunderstood the proposal, but I think it's vital that the
above two use cases are considered. Even if they can't be solved
outright, providing tools that custom solutions to these cases can be
built with should be a priority for multi-db support.
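
As a sketch of the kind of "arbitrary logic" meant here, both use cases reduce to a small function that returns a connection name. The alias names, the shard size and the function names below are assumptions chosen for illustration; no Django API is involved.

```python
import random

MASTER = "master"
REPLICAS = ["replica1", "replica2"]
SHARD_SIZE = 1000  # user ids 1-1000 on the first shard, 1001-2000 on the next
SHARDS = ["db1", "db2", "db3"]


def db_for_replication(is_write, rng=random):
    """Use case 1: writes go to the master, reads to any replica."""
    return MASTER if is_write else rng.choice(REPLICAS)


def db_for_user(user_id):
    """Use case 2: pick a shard from the user's id range."""
    index = (user_id - 1) // SHARD_SIZE
    if not 0 <= index < len(SHARDS):
        raise LookupError(f"no shard configured for user id {user_id}")
    return SHARDS[index]
```

The hard part in practice is not this function but giving the ORM a place to call it for every query.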

Cheers,

Simon

Ben Ford

May 22, 2008, 12:28:42 PM
to django-d...@googlegroups.com
Hi all,

I've now re-applied Daryl's patch (which was against qsrf) to a clone of django trunk in a mercurial repo. It's available at http://hg.woe-beti.de and there's a trac set up for it at http://trac.woe-beti.de. Feel free to make use of both of these. Although I've disabled the ability to create tickets, perhaps the wiki might be a good place to discuss the API? Anyone can clone from the hg repo; give me a shout if you would like push access and I'll sort it out.

Cheers,

Ivan Sagalaev

May 22, 2008, 1:53:13 PM
to django-d...@googlegroups.com
Simon Willison wrote:
> Thankfully Ivan Sagalaev's confusingly named mysql_cluster

BTW does anyone have a suggestion for a new name? I picked
mysql_cluster simply because I didn't know that there is a thing
named "MySQL Cluster" (no kidding :-) ).

Eratothene

May 22, 2008, 5:56:34 PM
to Django developers
I think there is a third issue.

Usage of several RDBMSs in one Django application simultaneously.

For example, we maintain two RDBMSs: monetdb and postgresql. The latest
and most accessed data is stored in monetdb for performance. After one
month the data is moved to postgresql, which holds the full archive.

Mike Scott

May 23, 2008, 7:00:09 AM
to django-d...@googlegroups.com
On Fri, May 23, 2008 at 2:59 AM, Simon Willison <si...@simonwillison.net> wrote:
> 1. Replication - being able to send all of my writes to one master
> machine but spread all of my reads over several slave machines.
> Thankfully Ivan Sagalaev's confusingly named mysql_cluster covers this
> problem neatly without modification to Django core - it's just an
> alternative DB backend which demonstrates that doing this isn't
> particularly hard: http://softwaremaniacs.org/soft/mysql_cluster/en/

Personally I think this is something that is better solved by the database software itself. Having replication code-side is something that I don't feel too good about. But maybe that's just me.

> 2. Sharding - being able to put User entries 1-1000 on DB1, whereas
> User entries 1001-2000 live on DB2 and so on.

This is something I would love, an example being archives (as Eratothene points out).

Maybe by stating a storage location at a per-row level (i.e. this could happen by overriding the manager and simply switching DB at selection time, or by being able to provide the DB info at selection time).

koenb

May 23, 2008, 9:06:24 AM
to Django developers
You need to be aware that this is quite complicated if your model has
foreign keys in it: if you use the ORM to do queries, the ORM would
have to be smart enough to know when to split up your queries instead
of doing joins.
E.g. you have model A, which has a foreign key to a User model. For a
row of A that is in the same database as User, the ORM could simply use
a join, but for a row of A that is in the other database, it can't.

I do not believe this is a trivial change.
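
To illustrate what "splitting up the query" would mean, here is a small sketch using two in-memory SQLite databases from the standard library. The table names, the sample rows and the helper `a_with_user_names` are invented for the example; a real ORM would have to generate this kind of two-step lookup itself.

```python
import sqlite3

users_db = sqlite3.connect(":memory:")
other_db = sqlite3.connect(":memory:")
users_db.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany("INSERT INTO user VALUES (?, ?)", [(1, "ann"), (2, "bob")])
other_db.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, user_id INTEGER)")
other_db.executemany("INSERT INTO a VALUES (?, ?)", [(10, 2), (11, 1), (12, 2)])


def a_with_user_names():
    """Emulate `a JOIN user` when the two tables live in different databases."""
    a_rows = other_db.execute("SELECT id, user_id FROM a ORDER BY id").fetchall()
    user_ids = sorted({user_id for _, user_id in a_rows})
    if not user_ids:
        return []
    # A second query against the other database replaces the SQL join.
    placeholders = ", ".join("?" for _ in user_ids)
    names = dict(
        users_db.execute(
            f"SELECT id, name FROM user WHERE id IN ({placeholders})", user_ids
        )
    )
    return [(a_id, names[user_id]) for a_id, user_id in a_rows]
```

Even this simple case needs two round trips and an in-memory merge, and filtering or ordering on the related table makes it harder still.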

My proposal is to keep things simple in a first phase: let's make it
possible to use different databases for different models with the
restriction that relations should not cross databases. Once we get all
that working, we may look at making the query generation deal with
those.

Koen

Simon Willison

May 23, 2008, 9:16:54 AM
to Django developers
How about mysql_masterslave or mysql_replicated (I prefer the second)?

Manuel Saelices

May 23, 2008, 10:26:36 AM
to Django developers
On 23 mayo, 13:00, "Mike Scott" <mic...@gmail.com> wrote:
>
> Maybe having to state a storage location on a per-row level. (IE this could
> happen by overriding the manager, and simply switching DB at selection time.
> or being able to provide the DB info at selection time.)

Maybe a good thing would be to provide a middleware that does DB
selection. Think of user-centered applications: in those, the user
determines which DB to use.

You could provide an API for changing the DB used by all subsequent
queries, and call it from a middleware if you want (looking at
request.user).
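
A minimal sketch of that middleware idea, written without Django so it runs on its own. `Request`, `db_for_user`, `current_db` and the per-user `"db"` key are all assumptions for illustration; the "current database" is kept in a `contextvars.ContextVar` from the standard library so that it is scoped to the request being handled.

```python
import contextvars

# The alias that queries made during the current request should use.
_current_db = contextvars.ContextVar("current_db", default="default")


def current_db():
    return _current_db.get()


class Request:
    """Stand-in for an HTTP request carrying an authenticated user."""

    def __init__(self, user):
        self.user = user


def db_for_user(user):
    # Hypothetical rule: each user record names its own database.
    return user.get("db", "default")


def db_selection_middleware(get_response):
    """Wrap a view so code run while handling the request sees the user's DB."""

    def middleware(request):
        token = _current_db.set(db_for_user(request.user))
        try:
            return get_response(request)
        finally:
            # Restore the previous alias so state never leaks across requests.
            _current_db.reset(token)

    return middleware


def view(request):
    return f"querying {current_db()}"
```

This is implicit per-request state, which is the kind of thing criticised earlier in the thread, so it is a trade-off rather than a free win.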

Ivan Sagalaev

May 24, 2008, 2:52:24 PM
to django-d...@googlegroups.com
Simon Willison wrote:
> How about mysql_masterslave or mysql_replicated (I prefer the second)?

Yes, mysql_replicated seems right. Thanks!

koenb

May 28, 2008, 3:25:39 AM
to Django developers
On 22 mei, 18:28, "Ben Ford" <ben.for...@gmail.com> wrote:
> I've now re-applied Daryl's patch (which was against qsrf) to a clone of
> django trunk in a mercurial repo. It's available at http://hg.woe-beti.de
> and there's a trac set up for it at http://trac.woe-beti.de. Feel free to
> make use of both of these. Although I've disabled the ability to create
> tickets, perhaps the wiki might be a good place to discuss the API? Anyone
> can clone from the hg repo; give me a shout if you would like push access
> and I'll sort it out.

I have been adding some code to Ben's mercurial repo at
[http://hg.woe-beti.de], see also [http://trac.woe-beti.de].

What has been realised (more or less):
- connection and model registration
- validation that related objects use the same connection
- database_engine specific SQL generation (needs more checking)
- some management commands accept connection parameter, others can
generate output for multiple connections
- syncdb can sync different connections
- transaction management over connections
- some tests (needs a lot more work)

This means point 3 of the discussion at
[http://trac.woe-beti.de/wiki/Discuss] is somewhat working, but
definitely needs a lot more testing.

I do need some help with creating tests for all this though.
I have not figured out yet how to create tests that check that the
right SQL is being generated for the backend used. (Nor how to test
the right database was touched by an action, this is quite obvious
manually, but I do not know how to automate this.)
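
One possible way to automate the "which database was touched" check, sketched with the standard library only: `sqlite3` connections accept a trace callback that receives every statement they execute. The aliases and the `traced_connection` helper are made up for the example, and this does not address checking the SQL dialect generated for other backends.

```python
import sqlite3


def traced_connection(log, alias):
    """Open an in-memory SQLite database that records every statement."""
    conn = sqlite3.connect(":memory:")
    conn.set_trace_callback(lambda sql: log.append((alias, sql)))
    return conn


log = []
connections = {
    "default": traced_connection(log, "default"),
    "legacy": traced_connection(log, "legacy"),
}

# The action under test: it should only ever talk to the "legacy" database.
connections["legacy"].execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY)")
connections["legacy"].execute("INSERT INTO stuff (id) VALUES (1)")

touched = {alias for alias, _ in log}
```

A test can then assert on `touched` and, if needed, on the recorded statements themselves.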

I put some ideas on using multiple databases for copying (backup or
migration) of objects (point 4 of the discussion note) in
[http://trac.woe-beti.de/wiki/APIModelDuplication].

Please comment, add, shoot etc. Any help will be much appreciated.

Koen

Daryl Spitzer

Jun 4, 2008, 10:53:05 AM
to django-d...@googlegroups.com
Another couple weeks have slipped by and I continue to be crazy-busy.
(But each week I'm busy for a different reason--so I continue to be
foolishly optimistic that I'll soon get a week with some free time.)

Anyway, I don't have time to read this thread through with the care it
deserves, but I thought I shouldn't let that stop me from finally
writing a description of the API I proposed at the PyCon sprint.

Each app would have a databases.py file that contains classes used to
define database connections (in the same manner as classes are used
to define models). Here's an example:

----

from django.db import connections

class LegacyDatabase(connections.DatabaseConnection):
    engine = 'sqlite3'
    name = '/foo/bar/legacy_db.sqlite3'

----

(And the other DATABASE_* settings (from settings.py) could certainly
be defined as attributes of a DatabaseConnection class.)

JUST FOR TESTING, I propose we allow a database connection to be
specified in a model with a Meta attribute, like this:

----

from django.db import models
from legacy.databases import LegacyDatabase

class LegacyStuff(models.Model):
    ...

    class Meta:
        db_connection = LegacyDatabase

----

Jacob expressed his extreme distaste for this at PyCon--for good
reason. (We don't want to encourage coupling models to databases.)
But just so we can get a working patch and start testing, I propose we
go with this for now.

Adrian suggested we allow the specification of database connections
per-app using the new app() function being proposed for settings.py.
I haven't seen a description of this since PyCon, but I think it would
look something like:

app(name='legacy', db_connection='LegacyDatabase')

(I'm sure I'm leaving several important arguments out of this example.)

Perhaps one could implement sharding by defining multiple
DatabaseConnection classes in a databases.py file (we could support
these files at the project level in addition to the app level) and
putting them in a list. Then one could write a function to return the
appropriate database to use and specify that callable in the argument
to the app function (or perhaps as an argument to the url function in
urls.py).
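
Here is a rough sketch of that sharding idea that runs without Django. The `DatabaseConnection` base class below is a plain stand-in for the proposed one, the shard classes and file paths are invented, and `choose_database` is a hypothetical callable of the kind that might be handed to app().

```python
import zlib


class DatabaseConnection:
    """Stand-in for the proposed base class; settings are class attributes."""
    engine = None
    name = None


class ShardA(DatabaseConnection):
    engine = "sqlite3"
    name = "/tmp/shard_a.sqlite3"


class ShardB(DatabaseConnection):
    engine = "sqlite3"
    name = "/tmp/shard_b.sqlite3"


# The list of connection classes described above.
SHARDS = [ShardA, ShardB]


def choose_database(key):
    """Map a key (e.g. a username) to one connection class, deterministically."""
    # crc32 is stable across processes, unlike the built-in hash() for strings.
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]
```

Hashing the key is only one possible rule; a range lookup or a lookup table would fit the same callable signature.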

I haven't given any thought to replication. Perhaps someone who needs
this could think about whether this proposal could somehow make
supporting replication easier (or if it might get in the way), or if
it's simply orthogonal to this.

--
Daryl

mengel

Jun 16, 2008, 10:05:32 PM
to Django developers


On May 22, 9:59 am, Simon Willison <si...@simonwillison.net> wrote:

> 1. Replication - being able to send all of my writes to one master
> machine but spread all of my reads over several slave machines.


> 2. Sharding - being able to put User entries 1-1000 on DB1, whereas
> User entries 1001-2000 live on DB2 and so on.

It seems to me this isn't beyond doing in the current setup, but I'm
not sure I understand the underlying mechanism well enough. For case 1,
you need an object class that creates two (or more) apparently
identical models.Model classes, one attached to each database, based on
the field types declared as class variables:
* on searches, it picks one of the model classes to search
* on saves, it saves the same data to each model class in turn

For case 2, it's very similar, except that you need to run the query on
all sides (unless you can tell it should only go to one), building a
chained query-set union type to hold the result, and for saves pick the
right model to save to based on the condition.

In each case, the underlying models have to be tied to the right
databases, but this can be done using the mechanism in the proposal so
far.
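
The two cases can be sketched with plain Python objects standing in for the per-database model classes. `Store`, `Replicated`, `Sharded` and the `pick` callable are names made up for this illustration, and dictionaries replace real tables.

```python
from itertools import chain


class Store:
    """Stand-in for one model class attached to one database."""

    def __init__(self):
        self.rows = {}

    def save(self, pk, data):
        self.rows[pk] = data

    def search(self, predicate):
        return [row for row in self.rows.values() if predicate(row)]


class Replicated:
    """Case 1: save to every store, search just one."""

    def __init__(self, stores):
        self.stores = stores

    def save(self, pk, data):
        for store in self.stores:
            store.save(pk, data)

    def search(self, predicate):
        return self.stores[0].search(predicate)


class Sharded:
    """Case 2: save to the store chosen by `pick`, search all and chain."""

    def __init__(self, stores, pick):
        self.stores, self.pick = stores, pick

    def save(self, pk, data):
        self.stores[self.pick(pk)].save(pk, data)

    def search(self, predicate):
        return list(chain.from_iterable(s.search(predicate) for s in self.stores))
```

Chaining is the easy part; ordering, slicing and aggregates over the combined result are where a real "query-set union" gets complicated.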

David Cramer

Jun 17, 2008, 12:50:11 AM
to Django developers
I suppose I'll chime in here since we actually wrote master/slave
replication code on Curse.

Our approach:

- read_cursor and write_cursor exist. write_cursor is what cursor
would point to.
- get queries all use the read cursor
- saves all use the write cursor
- we had a list of database connections, which stored the same
settings, just in a tuple format
- reading I believe used something like itertools.cycle but I can't
honestly say without looking at the code

Beyond this, the database itself should handle writing the objects to
the slaves. Django shouldn't even bother.
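
A small reconstruction of the read/write cursor split, as a sketch only: this is not the Curse code, and the `Connections` class is invented for the example. In-memory SQLite connections stand in for the master and the slaves; they are independent databases, so the sketch shows only how cursors are handed out, not replication.

```python
import itertools
import sqlite3


class Connections:
    """One write connection plus read connections handed out round-robin."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def write_cursor(self):
        return self.master.cursor()

    def read_cursor(self):
        # Each call moves on to the next replica in the cycle.
        return next(self._replicas).cursor()


master = sqlite3.connect(":memory:")
replicas = [sqlite3.connect(":memory:"), sqlite3.connect(":memory:")]
pool = Connections(master, replicas)
```

Gets would call `pool.read_cursor()` and saves `pool.write_cursor()`, leaving it to the database to copy writes to the slaves.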

In regard to multiple databases in general, it is my feeling that
even if it is not good practice, Django _NEEDS_ to support a model
being attached to a database other than the default. So if you have
mydjango_blogs, and mydjango_forums databases, my Forum model always
goes to the write db when it queries, and same for blogs. I myself
like a Meta solution to this as it makes sense.

In MySQL as well, you can optimize things, so that if they use the
same connection, you can just query on that database. It's select X
from mydatabase.mytable. I'm not sure if something similar exists in
other database engines.

Jan Oberst

Jul 8, 2008, 8:33:40 AM
to Django developers
Hi guys,

I've been heavily swamped with work for college, so I missed this
thread and the few others on multiple databases. Sorry.

I have implemented a proof-of-concept database scaling solution for
Django. It tackles all kinds of scaling issues I have seen in Django.
Its purpose is mainly to find out if we could scale up Django at all.
I didn't worry too much about syntax and the way it's supposed to
integrate into Django - I just hacked away in Django code to make it
work the fastest possible way I could think of.


The solution covers the largest part of Simon's #2 problem. I added a
few attributes and config parameters to the ORM so you can decide
which models are hosted on which server. One model can be hosted on 20
servers with the actual location depending on a foreign key value.

We're using it to store data for different groups on different servers
for a more horizontal scaling. For example if a photo got a ForeignKey
to group A it will be routed to server 15 because of some logic.

You can also route objects 1-1000 to server 1 and 1001-2000 to server
2.


I have also added database denormalization, caching foreign key
querysets to the DB, bulk prefetching, in-model privacy checks and a
few other things.

A large percentage of the stuff probably isn't suitable for Django
trunk. Most of it tackles quite specific and hard scaling issues, but
I guess there's a way to build it more modular and make it work for
more people. After all I'm new to Django-developers and also to
opening up my work.

If some of you are interested in the code and would benefit from it I
would be more than happy to share.

Just posting a big pile of code probably won't help you too much, so I
thought I'd write a few lines documentation about each part and post
them here. Does that sound reasonable?

Jan


Ben Ford

Jul 8, 2008, 10:04:57 AM
to django-d...@googlegroups.com
Hi Jan,

It sounds like you've made great progress. We have an informal trac and hg repo set up at trac and hg dot woe-beti.de respectively. You're more than welcome to add your documentation there! Let me know if you want an hg repo to play with too and I'll sort it out for you.

Cheers,
Ben


Jan Oberst

Jul 13, 2008, 4:15:43 PM
to Django developers
I've been doing a little reading on the multi-db code and wiki. You've
basically been tackling problem #3 (different data types and engines)
- which I didn't care about at all. That's good, I guess.

The way I handle database connections is just by having a connection
pool of different connection objects alive at the same time and create
new cursors on the connections I need. Since I have only implemented
the most simple SELECT FROM WHERE for one and many rows I haven't
worried too much about commit and rollback and stuff like that. So I
don't get all of your code and why you need to use threadlocals and
stuff like that.

The basic thing I do when a Django Model is sharded across different
servers is this:

1. I write a get_shards class method for every model that does some
logic and returns a list of one or more shard objects that have a link
to this shard's connection.
2. Ask the model class on which shards it is; this returns a list of
shard objects.
3. For each shard object I get a new database cursor from the
connection, which lives in a separate connection object for every shard
(I'm not perfectly sure if this is thread safe).
4. For each of those cursors I repeat the query Django wanted to run.
Then I try to stitch the responses together the best I can.
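
Roughly, the four steps look like the sketch below. It uses in-memory SQLite connections from the standard library instead of real servers, and the `Shard` class, the `photo` table and the `group_id % len(SHARDS)` rule are assumptions for illustration rather than the actual implementation.

```python
import sqlite3


class Shard:
    """A shard object holding its own database connection."""

    def __init__(self, name):
        self.name = name
        self.connection = sqlite3.connect(":memory:")
        self.connection.execute(
            "CREATE TABLE photo (id INTEGER PRIMARY KEY, group_id INTEGER)"
        )


SHARDS = [Shard("server1"), Shard("server2")]


class Photo:
    @classmethod
    def get_shards(cls, group_id=None):
        """Step 1: return the shards that may hold matching rows."""
        if group_id is None:
            return list(SHARDS)  # unknown group: every shard is a candidate
        return [SHARDS[group_id % len(SHARDS)]]

    @classmethod
    def query(cls, sql, params=(), group_id=None):
        """Steps 2-4: run the same query on each shard and stitch the rows."""
        rows = []
        for shard in cls.get_shards(group_id):
            cursor = shard.connection.cursor()  # step 3: one cursor per shard
            rows.extend(cursor.execute(sql, params).fetchall())
        return rows
```

The stitching here is a plain concatenation, so any ordering or limit in the SQL applies per shard only and would have to be redone on the combined rows.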

You guys obviously know your code better than me. Should I start
rewriting my code (necessary after queryset-refactoring) based on your
patch?

Jan
