Hey Scott,
Welcome!
Django doesn't support multiple DBs out of the box, but this is
something we (Simon particularly) have been very keen on adding.
Fortunately, the database code is rather nicely abstracted, so most of
the refactoring would be in just two modules.
To what extent would you be using multiple DBs? Would you be employing
the technique whereby certain records (e.g. users 1 to 1,000,000) are
in certain databases, or would you be doing more of a standard setup,
in which you'd want database reads to be spread evenly across multiple
DBs? Go ahead and explain the setup, and we can get started on
designing the feature.
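
To make the two setups concrete, here is a minimal sketch in plain Python. It is not Django API: the database aliases, the ID ranges, and both function names are hypothetical and only illustrate the difference between partitioning records by ID range and spreading reads evenly.

```python
import itertools

# Hypothetical database aliases and ID ranges; none of these come from Django.
SHARDS = [
    (1, 1000000, "users_db_1"),
    (1000001, 2000000, "users_db_2"),
]
READ_REPLICAS = ["replica_a", "replica_b", "replica_c"]


def shard_for_user(user_id):
    """Range partitioning: each block of user IDs lives in one database."""
    for low, high, alias in SHARDS:
        if low <= user_id <= high:
            return alias
    raise LookupError("no shard configured for user %d" % user_id)


# Endless iterator over the replicas, used for round-robin selection.
_replica_cycle = itertools.cycle(READ_REPLICAS)


def replica_for_read():
    """Round-robin: spread reads evenly across identical replicas."""
    return next(_replica_cycle)
```

In the first case the choice of database depends on the record being fetched; in the second it is independent of the data, which is why it can also be handled outside the application.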
> Also we'll need to write a baseline simple
> db backed, captcha class near immediately to keep out the spam bots. Any
> interest in our contributing that back?
Sure, we're always interested in contributions! You might have a look
at Ian Holsman's Django captcha app here:
http://feh.holsman.net/articles/2005/12/15/django-captcha-app
I haven't looked at the code myself, but it could be helpful.
Adrian
--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org
I've always thought that this particular -- and common -- use case
should be delegated to the DB level using one of the many excellent
replication/distribution tools for your database. For example, you
could easily do read distribution with pg_pool or sqlrelay, and it
would be transparent to Django. I don't see a good reason to tackle
replication in Django itself as that's more or less a solved problem.
Jacob
> I've always thought that this particular -- and common -- use case
> should be delegated to the DB level using one of the many excellent
> replication/distribution tools for your database. For example, you
> could easily do read distribution with pg_pool or sqlrelay, and it
> would be transparent to Django. I don't see a good reason to
> tackle replication in Django itself as that's more or less a solved
> problem.
I disagree. There's a lot more to separate databases than just
replication - when you scale big there are all kinds of ways things
might need to be partitioned. You might want to keep "cheap" data
(like traffic logs for users' weblogs) on a different DB cluster from
expensive data (like their blog entries themselves). Some data gets
accessed all the time while some other data is only ever written -
etc etc.
I'd love Django to have a reputation as the web framework that
scales. As far as I can tell, big LAMP sites that scale are mostly
done as PHP with a whole load of custom scary stuff - connections to
multiple databases, memcached, even XMLRPC calls to backend services
written in Java. We already have caching and we can do calls to
backend services easily but the single database connection assumption
is baked right into the framework.
Unfortunately, I don't have the experience of scaling big to say much
more than that. This is where input from people like Scott becomes
invaluable :)
Cheers,
Simon
Beside the fact that most of those "excellent" after-market replication
solutions just plain suck ;-) - there are very good reasons to have
data-driven distribution. This might be controlled by tables - some
tables living on different databases or servers - or by content. Like
data from older years living in other databases or on other servers. So
even if you get those replication solutions to work reliably (which I
never was able to do with sqlrelay - the client goes bozo if the server
has problems), you _can't_ do that kind of data-driven distribution with
after-market tools. This has to be done at the application level.
A scenario from my work: an ERP system which produces loads of
accounting data. Older data is moved to some external database, because
it would put too much load on the active database. Then there are
special data aggregates that are stored in another database for faster
queries - they are specially prepared. So the application needs to
access three databases simultaneously. And in large installations,
those databases will even be run on different servers.
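
A minimal sketch of what such data-driven routing could look like at the application level, again in plain Python rather than Django API. The table names, database aliases, and the cutoff year are all assumptions made up for illustration: some tables always live in their own database, and older rows of the remaining tables go to an archive.

```python
import datetime

# Hypothetical table-to-database mapping (distribution by table).
TABLE_TO_DB = {
    "accounting_aggregate": "aggregates_db",
}
ACTIVE_DB = "active_db"
ARCHIVE_DB = "archive_db"
ARCHIVE_BEFORE_YEAR = 2004  # assumed cutoff, not taken from the thread


def db_for_table(table):
    """Distribution by table: some tables have a dedicated database."""
    return TABLE_TO_DB.get(table, ACTIVE_DB)


def db_for_record(table, record_date):
    """Distribution by content: old rows of ordinary tables are archived."""
    if table in TABLE_TO_DB:
        return TABLE_TO_DB[table]
    if record_date.year < ARCHIVE_BEFORE_YEAR:
        return ARCHIVE_DB
    return ACTIVE_DB
```

A replication tool sitting between the application and the database cannot make the second decision, because it depends on the content of the row rather than on the connection.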
So, yes, I do think it would be very useful for Django to be able to
access multiple databases via its ORM. Actually I already have exactly
that requirement in a Django project: while trying to build an admin
interface for my db-based nameserver and db-based mail-system, I have
the problem that DNS and Mail are handled by different databases, so I
am blocked on that project currently, as Django won't let me work with
both databases ...
bye, Georg
Incidentally, we need support for Postgres schemas too - I have a
Financial Accounting app where, in multi-company mode, the tables
for each company are stored in a separate schema. This is also ideal
for multilingual stuff.
--
regards
kg
http://www.livejournal.com/users/lawgon
tally ho! http://avsap.org.in
ಇಂಡ್ಲಿನಕ್ಸ வாழ்க!
> Now I haven't hacked Django much myself yet (I've been working on
> the back end tools, db loader and overall schema). What support
> does Django have for multiple db stuff?
I've started a ticket to track discussions on this issue:
http://code.djangoproject.com/ticket/1142
Cheers,
Simon
class Poll(meta.Model):
    class META:
        db_table = "myschema.poll"
but then Django tries to quote the whole name (including the period)
which of course makes MySQL complain (because the SQL should be
`myschema`.`poll`, not `myschema.poll`). So I hacked
DatabaseWrapper.quote_name so that it doesn't quote periods, but is
that the "right" way to handle this or is it an ugly hack? It seems to
work very well, but it feels like it might be cleaner to add a
db_schema attribute or something instead. (Should I submit a patch? :-)
Perhaps we could have the best of both worlds, and have the Model
constructor split on period, storing db_schema internally if a schema
was provided.
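
Both ideas can be sketched as standalone Python functions. This is not the actual DatabaseWrapper.quote_name code and not a proposed patch: the quote_char parameter and the split_db_table helper are hypothetical, and the sketch does not escape quote characters embedded in a name or detect names that are already quoted.

```python
def quote_name(name, quote_char="`"):
    """Quote each period-separated part on its own.

    With the default backtick, "myschema.poll" becomes `myschema`.`poll`
    instead of the single invalid identifier `myschema.poll`.
    """
    parts = name.split(".")
    return ".".join(quote_char + part + quote_char for part in parts)


def split_db_table(db_table):
    """Split "schema.table" into (db_schema, table).

    db_schema is None when the name contains no period, so models
    without a schema keep working unchanged.
    """
    if "." in db_table:
        schema, table = db_table.split(".", 1)
        return schema, table
    return None, db_table
```

Splitting once in the model constructor would let db_table = "myschema.poll" keep working while the schema is stored separately, so the quoting code never has to look for periods itself.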