Django and Multiple Database Support


Scott johnson

Dec 29, 2005, 6:35:41 AM
to django...@googlegroups.com, Michael Air
Hello,

*Long-time lurker; first-time poster*. My partner and I are standardizing on Django/Python for the front end of www.ookles.com, our new startup. We're both PHP folk, and I've convinced him that Django is better than Ruby (based, honestly, on my respect for Simon and Adrian rather than on a real analysis; transitive geek fu; anyhow).

The issue at hand for us is that we have essentially one application: Ookles, which is a highly interactive web site. Having scaled a MySQL app (let's not go down the Postgres discussion; we're using MySQL 5, so we have views and stored procs) to the terabyte level at my last gig, I learned the lesson of "DO NOT; REPEAT DO NOT; PUT EVERYTHING IN ONE DB". But Django at least feels like it is really oriented around the concept of one DB. To wit:

I've been playing with Django today (really like it so far!). To start a project, you define your database model as a set of subclasses, one per database table. Once you've defined everything, you can do one of two things: "django-admin.py sql app" will spit out an SQL script for creating your database tables, or "django-admin.py install app" will create all the tables for you. Fantastic stuff, and if you can get your head around doing it this way, it saves time, because defining the model also sets up the database.
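To make the "model class in, SQL script out" idea concrete, here is a toy sketch. It is not Django's code: `Column`, `Poll` and `create_table_sql` are hypothetical names, and the implicit `id` primary key is an assumption of the sketch.

```python
class Column:
    """Hypothetical column declaration: it just remembers its SQL type."""
    def __init__(self, sql_type):
        self.sql_type = sql_type

class Poll:
    # One class per table; one Column attribute per column.
    question = Column("varchar(200)")
    pub_date = Column("datetime")

def create_table_sql(model):
    """Build a CREATE TABLE statement from the Column attributes of a class."""
    columns = ["id integer PRIMARY KEY"]  # assumed implicit primary key
    # vars() keeps class attributes in definition order (Python 3.6+).
    for name, value in vars(model).items():
        if isinstance(value, Column):
            columns.append("%s %s" % (name, value.sql_type))
    return "CREATE TABLE %s (\n    %s\n);" % (
        model.__name__.lower(), ",\n    ".join(columns))

print(create_table_sql(Poll))
```

The point of the design is that the class is the single source of truth: the same declaration can be printed as a script (like `sql`) or executed directly (like `install`).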
 
So far, Django feels much more solid than Rails. Yes, you have to do more, but you feel good about doing it. Relying on Rails magic is sometimes a little hard to swallow :)

(From my co-founder Mike)

Now I haven't hacked Django much myself yet (I've been working on the back end tools, db loader and overall schema).  What support does Django have for multiple db stuff?

I apologize if this is a real newbie question. I did run through the basic docs (great job, btw; thank you). Also, we'll need to write a simple, baseline DB-backed captcha class almost immediately to keep out the spam bots. Any interest in our contributing that back?

Thanks
Scott
--
-------------------------------------------------------
J. Scott Johnson
*** Now Available for Consulting ***
blog: http://fuzzyblog.com/
fuzzy...@gmail.com
aim: fuzzygroup
cell: 857 222 6459
-------------------------------------------------------

Adrian Holovaty

Dec 29, 2005, 10:10:22 AM
to django...@googlegroups.com
On 12/29/05, Scott johnson <fuzzy...@gmail.com> wrote:
> Now I haven't hacked Django much myself yet (I've been working on the back
> end tools, db loader and overall schema). What support does Django have for
> multiple db stuff?

Hey Scott,

Welcome!

Django doesn't support multiple DBs out of the box, but this is
something we (Simon particularly) have been very keen on adding.
Fortunately, the database code is rather nicely abstracted, so most of
the refactoring would be in just two modules.

To what extent would you be using multiple DBs? Would you be employing
the technique whereby certain records (e.g. users 1 to 1,000,000) are
in certain databases, or would you be doing more of a standard setup,
in which you'd want database reads to be spread evenly across multiple
DBs? Go ahead and explain the setup, and we can get started on
designing the feature.

> Also we'll need to write a baseline simple
> db backed, captcha class near immediately to keep out the spam bots. Any
> interest in our contributing that back?

Sure, we're always interested in contributions! You might have a look
at Ian Holsman's Django captcha app here:
http://feh.holsman.net/articles/2005/12/15/django-captcha-app
I haven't looked at the code myself, but it could be helpful.

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Jacob Kaplan-Moss

Dec 29, 2005, 3:29:09 PM
to django...@googlegroups.com
On Dec 29, 2005, at 9:10 AM, Adrian Holovaty wrote:
> ... or would you be doing more of a standard setup,
> in which you'd want database reads to be spread evenly across multiple
> DBs? Go ahead and explain the setup, and we can get started on
> designing the feature.

I've always thought that this particular -- and common -- use case
should be delegated to the DB level using one of the many excellent
replication/distribution tools for your database. For example, you
could easily do read distribution with pg_pool or sqlrelay, and it
would be transparent to Django. I don't see a good reason to tackle
replication in Django itself as that's more or less a solved problem.

Jacob

Simon Willison

Dec 29, 2005, 5:09:05 PM
to django...@googlegroups.com

On 29 Dec 2005, at 20:29, Jacob Kaplan-Moss wrote:

> I've always thought that this particular -- and common -- use case
> should be delegated to the DB level using one of the many excellent
> replication/distribution tools for your database. For example, you
> could easily do read distribution with pg_pool or sqlrelay, and it
> would be transparent to Django. I don't see a good reason to
> tackle replication in Django itself as that's more or less a solved
> problem.

I disagree. There's a lot more to separate databases than just
replication - when you scale big there are all kinds of ways things
might need to be partitioned. You might want to keep "cheap" data
(like traffic logs for users' weblogs) on a different DB cluster from
expensive data (like their blog entries themselves). Some data gets
accessed all the time while some other data is only ever written -
etc etc.
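Partitioning by kind of data, as Simon describes, amounts to a routing table from tables to clusters, possibly with separate read and write targets. A minimal sketch, assuming made-up cluster and alias names:

```python
# Hypothetical clusters, each with a write target and a read target.
CLUSTERS = {
    "main": {"write": "main_primary", "read": "main_replica"},
    # Logs are mostly written and rarely read, so no replica is assumed.
    "logs": {"write": "logs_primary", "read": "logs_primary"},
}

# Hypothetical assignment of tables to clusters.
TABLE_CLUSTER = {
    "blog_entry": "main",    # expensive data, read all the time
    "traffic_log": "logs",   # cheap, write-heavy data
}
DEFAULT_CLUSTER = "main"

def db_for(table, operation):
    """Return the database alias for a table; operation is 'read' or 'write'."""
    cluster = TABLE_CLUSTER.get(table, DEFAULT_CLUSTER)
    return CLUSTERS[cluster][operation]

print(db_for("blog_entry", "read"))    # main_replica
print(db_for("traffic_log", "write"))  # logs_primary
```

This is exactly the kind of decision an external replication tool cannot make, because it depends on what each table means to the application.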

I'd love Django to have a reputation as the web framework that
scales. As far as I can tell, big LAMP sites that scale are mostly
done as PHP with a whole load of custom scary stuff - connections to
multiple databases, memcached, even XMLRPC calls to backend services
written in Java. We already have caching and we can do calls to
backend services easily but the single database connection assumption
is baked right into the framework.

Unfortunately, I don't have the experience of scaling big to say much
more than that. This is where input from people like Scott becomes
invaluable :)

Cheers,

Simon

hugo

Dec 29, 2005, 6:42:52 PM
to Django users
>I've always thought that this particular -- and common -- use case
>should be delegated to the DB level using one of the many excellent
>replication/distribution tools for your database. For example, you
>could easily do read distribution with pg_pool or sqlrelay, and it
>would be transparent to Django. I don't see a good reason to tackle
>replication in Django itself as that's more or less a solved problem.

Besides the fact that most of those "excellent" after-market replication
solutions just plain suck ;-) - there are very good reasons to have
data-driven distribution. This might be controlled by tables - some
tables living on different databases or servers - or by content, like
data from older years living in other databases or on other servers. So
even if you get those replication solutions to work reliably (which I
never was able to do with sqlrelay - the client goes bozo if the server
has problems), you _can't_ do that kind of data-driven distribution with
after-market tools. This has to be done at the application level.
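Distribution "by content" means the record itself decides where it lives. A minimal sketch of date-based routing, assuming a hypothetical archive cutoff year and made-up database names:

```python
import datetime

# Hypothetical policy: records dated before this year live in the archive DB.
ARCHIVE_CUTOFF_YEAR = 2004

def db_for_record(record_date):
    """Content-based routing: choose a database from the record's date."""
    if record_date.year < ARCHIVE_CUTOFF_YEAR:
        return "archive_db"
    return "active_db"

def dbs_for_range(start, end):
    """Databases a query over [start, end] must touch (assumes start <= end).

    Routing is monotonic in the date, so checking both endpoints is enough.
    """
    return sorted({db_for_record(start), db_for_record(end)})

print(db_for_record(datetime.date(2003, 6, 1)))  # archive_db
print(dbs_for_range(datetime.date(2003, 1, 1), datetime.date(2005, 12, 31)))
# ['active_db', 'archive_db']
```

The second function shows why this has to live in the application: a query spanning the cutoff must be split and its results merged, which a connection-level proxy has no way to know.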

A scenario from my work: an ERP system that produces loads of
accounting data. Older data is moved to an external database, because
it would put too much load on the active database. Then there are
special data aggregates that are stored in another database for faster
queries - they are specially prepared. So the application needs to
access three databases simultaneously. And in large installations,
those databases will even run on different servers.
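A small sketch of that three-database workflow, using the standard-library `sqlite3` module with in-memory databases standing in for separate servers. The table layout (`entry`, `yearly_total`) and the sample rows are hypothetical.

```python
import sqlite3

# Three independent connections stand in for three separate servers.
active = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
aggregates = sqlite3.connect(":memory:")

for conn in (active, archive):
    conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, year INTEGER, amount INTEGER)")
aggregates.execute("CREATE TABLE yearly_total (year INTEGER PRIMARY KEY, total INTEGER)")

with active:
    active.executemany("INSERT INTO entry (year, amount) VALUES (?, ?)",
                       [(2003, 100), (2003, 50), (2005, 70)])

def archive_before(year):
    """Copy rows older than `year` to the archive, then delete them from active.

    These are two separate transactions on two connections, so this is NOT
    atomic: a crash between them could leave rows in both databases.
    """
    rows = active.execute(
        "SELECT id, year, amount FROM entry WHERE year < ?", (year,)).fetchall()
    with archive:
        archive.executemany("INSERT INTO entry (id, year, amount) VALUES (?, ?, ?)", rows)
    with active:
        active.execute("DELETE FROM entry WHERE year < ?", (year,))
    return len(rows)

def rebuild_aggregates():
    """Precompute per-year totals from both databases into the aggregate DB."""
    totals = {}
    for conn in (active, archive):
        for year, amount in conn.execute("SELECT year, SUM(amount) FROM entry GROUP BY year"):
            totals[year] = totals.get(year, 0) + amount
    with aggregates:
        aggregates.execute("DELETE FROM yearly_total")
        aggregates.executemany("INSERT INTO yearly_total VALUES (?, ?)", sorted(totals.items()))

moved = archive_before(2005)
rebuild_aggregates()
print(moved)  # 2
print(aggregates.execute("SELECT * FROM yearly_total ORDER BY year").fetchall())
# [(2003, 150), (2005, 70)]
```

The comment in `archive_before` is the crux of multi-database support: once data spans connections, the application, not the database, has to think about consistency.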

So, yes, I do think it would be very useful for Django to be able to
access multiple databases via its ORM. In fact, I already have exactly
that requirement in one Django project: while trying to build an admin
interface for my DB-based nameserver and DB-based mail system, I ran into
the problem that DNS and mail are handled by different databases, so I
am currently blocked on that project, as Django won't let me work with
both databases ...

bye, Georg

Kenneth Gonsalves

Dec 29, 2005, 11:28:53 PM
to django...@googlegroups.com
On Friday 30 Dec 2005 5:12 am, hugo wrote:
> So, yes, I do think it would be very useful for Django to be able
> to access multiple databases via its ORM

Incidentally, we need support for Postgres schemas too - I have a
financial accounting app where, in multi-company mode, the tables
for each company are stored in a separate schema. This is also
ideal for multilingual stuff.
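With one schema per company, the ORM would need to emit schema-qualified names (or switch PostgreSQL's `search_path`) per request. A minimal sketch; the `company_<code>` naming convention and the helper names are hypothetical, while the quoting rule (wrap in double quotes, double any embedded double quote) is standard PostgreSQL identifier quoting.

```python
def quote_pg_identifier(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def company_schema(company_code):
    """Hypothetical convention: each company's tables live in 'company_<code>'."""
    return "company_" + company_code

def company_table(company_code, table):
    """Schema-qualified, quoted table name for one company."""
    return quote_pg_identifier(company_schema(company_code)) + "." + quote_pg_identifier(table)

def search_path_sql(company_code):
    """Alternative: point unqualified table names at the company's schema."""
    return "SET search_path TO " + quote_pg_identifier(company_schema(company_code))

print(company_table("acme", "ledger"))  # "company_acme"."ledger"
print(search_path_sql("acme"))          # SET search_path TO "company_acme"
```

Qualifying every name is explicit but touches all generated SQL; setting `search_path` per connection leaves the SQL unchanged but makes the active company implicit connection state.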

--
regards
kg

http://www.livejournal.com/users/lawgon
tally ho! http://avsap.org.in
Long live IndLinux!

Simon Willison

Dec 30, 2005, 8:22:14 AM
to django...@googlegroups.com, Michael Air

On 29 Dec 2005, at 11:35, Scott johnson wrote:

> Now I haven't hacked Django much myself yet (I've been working on
> the back end tools, db loader and overall schema). What support
> does Django have for multiple db stuff?

I've started a ticket to track discussions on this issue:

http://code.djangoproject.com/ticket/1142

Cheers,

Simon

Greg

Jan 10, 2006, 2:44:33 AM
to Django users
I'm faced with the multiple-schema problem in MySQL, which AFAICT is a
lot simpler than actually having multiple databases because at least
you don't have to coordinate multiple connections. Actually, it
basically already works. I wrote

class Poll(meta.Model):
    class META:
        db_table = "myschema.poll"

but then Django tries to quote the whole name (including the period)
which of course makes MySQL complain (because the SQL should be
`myschema`.`poll`, not `myschema.poll`). So I hacked
DatabaseWrapper.quote_name so that it doesn't quote periods, but is
that the "right" way to handle this or is it an ugly hack? It seems to
work very well, but it feels like it might be cleaner to add a
db_schema attribute or something instead. (Should I submit a patch? :-)
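Greg's hack can be sketched as a period-aware quoting function. This is a hypothetical illustration of the idea, not Django's `DatabaseWrapper.quote_name`; it uses MySQL backtick quoting and doubles any embedded backtick.

```python
def quote_name(name):
    """Quote a MySQL identifier, treating periods as schema separators."""
    if name.startswith("`") and name.endswith("`"):
        return name  # assume it is already quoted
    # Quote each dotted part separately: myschema.poll -> `myschema`.`poll`
    return ".".join("`%s`" % part.replace("`", "``") for part in name.split("."))

print(quote_name("myschema.poll"))  # `myschema`.`poll`
print(quote_name("poll"))           # `poll`
```

The trade-off is that a table whose name really contains a period can no longer be expressed, which is one argument for a separate `db_schema` attribute instead.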

Adrian Holovaty

Jan 10, 2006, 9:16:36 AM
to django...@googlegroups.com
On 1/10/06, Greg <goo...@abbas.org> wrote:
> So I hacked
> DatabaseWrapper.quote_name so that it doesn't quote periods, but is
> that the "right" way to handle this or is it an ugly hack? It seems to
> work very well, but it feels like it might be cleaner to add a
> db_schema attribute or something instead. (Should I submit a patch? :-)

Perhaps we could have the best of both worlds, and have the Model
constructor split on period, storing db_schema internally if a schema
was provided.
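Adrian's "best of both worlds" could look roughly like this: split `db_table` once when the model is set up, keep `db_schema` separately, and let the backend quote each part on its own. `ModelOptions` and `mysql_quote` are hypothetical names for this sketch.

```python
def mysql_quote(part):
    """Quote a single MySQL identifier (no period handling needed here)."""
    return "`%s`" % part.replace("`", "``")

class ModelOptions:
    """Hypothetical stand-in for the options a Model constructor would store."""
    def __init__(self, db_table):
        # rpartition splits on the LAST period and returns ('', '', db_table)
        # when there is no period at all.
        schema, sep, table = db_table.rpartition(".")
        self.db_schema = schema if sep else None
        self.db_table = table

    def qualified_name(self, quote=mysql_quote):
        """Quote schema and table separately, then join them with a period."""
        parts = [self.db_schema, self.db_table] if self.db_schema else [self.db_table]
        return ".".join(quote(p) for p in parts)

opts = ModelOptions("myschema.poll")
print(opts.db_schema, opts.db_table)          # myschema poll
print(opts.qualified_name())                  # `myschema`.`poll`
print(ModelOptions("poll").qualified_name())  # `poll`
```

Because the split happens once, up front, each backend's quoting function stays simple and the schema is available as data (for example, to a PostgreSQL backend that wants to use it differently).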
