- Casper
I would still like to get my patch working so others (and myself) can
start testing it. I won't have time this week, but so far it looks
like I may be able to make some time next week. If I don't, I'll see if
I can at least make enough time to write up the API I came up with at
PyCon.
--
Daryl
There's also http://code.djangoproject.com/ticket/1142. With the
mailing list and trac project, do we need a ticket for more than just
a place to attach patches to invite others to test?
> I'll sort out the hg repo (it now needs to point at trunk - not qsrf) and
> trac project if I get time this evening and make it public readable for
> everyone who's interested.
Thanks Ben.
--
Daryl
Sorry for coming late to the party, but I'm just now catching up on django-dev.
I'm really glad to see you get the ball rolling on multiple db
support, and once I'm dug out from my backlog I'll be happy to start
reviewing code and helping out if I'm needed.
However, before we get to that point, I've got some pretty serious API
concerns with the current approach, so I think I should outline those
before y'all go much further. I don't want you to expend much effort
just to get a -1 smackdown.
The current mechanism of defining "other" databases in the settings
module is just fine, and the underlying mechanism of having
queries/managers "know" their connection is similarly dandy. But the
wheels come off when it comes to the "public" API where users will
choose which connection they use.
As far as I can tell, you've currently provided two hooks to use a
secondary connection: set the model's default connection in the
settings module (which is OK, I suppose, though I might want to
nitpick the syntax a bit), and assigning to ``Model.objects.db``.
This second one is a disaster waiting to happen -- you've had to muddy
things up with threadlocals to work around some problems already. Also
consider the "bookkeeping" you'd need to do to deal with objects
across multiple databases simultaneously (think sharding). You'd have
to keep juggling ``Model.objects.db`` and saving old ones... ugh.
Here's how I think it should work:
* I'd like the default connection for each and every object to be the
default database forever and always. I find putting models for default
connections in settings distasteful and I'd rather just a single API
for changing the connection (see below). However, I imagine I'll be in
the minority here so I'm prepared to cede this point if necessary.
* There needs to be an official API to get a model (or perhaps a
manager) which references a different "context" --
``Model.objects.db`` should be read-only. So you'd call some API
method, and get back a sort of proxy object that uses the other
connection. Here's a strawman API::

    >>> from django import db
    >>> from someapp.models import Article
    >>> Article.objects.all()
    [... all Articles from the default database ...]
    >>> ArticlesOnOtherDatabase = db.get_model_for_other_connection(Article, "private")
    >>> ArticlesOnOtherDatabase.objects.all()
    [... all Articles from the database defined with the "private" key ...]

This should make the threadlocal stuff unnecessary, and (to my eye) is
a lot more sane than assigning to ``Manager.db``. Oh, and please
choose a better name than
``db.get_model_for_other_connection()``; given that you're building
the bikeshed you might as well paint it, too.
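
To make the strawman a bit more concrete, here is a minimal, Django-free sketch of the idea: the manager's ``db`` is read-only, and a helper hands back a proxy class bound to another connection. Every name in it (``Manager``, ``model_on_connection``, the ``"private"`` alias) is a hypothetical stand-in for illustration, not the patch's API and not anything Django provides.

```python
# Hypothetical stand-ins; none of these names come from Django or the patch.
class Manager:
    def __init__(self, model_name, db="default"):
        self._model_name = model_name
        self._db = db

    @property
    def db(self):
        # Read-only: there is no setter, so `manager.db = ...` raises AttributeError.
        return self._db

    def all(self):
        # A real manager would run a query; here we only report the target.
        return f"SELECT * FROM {self._model_name} -- via {self._db!r}"


class Article:
    objects = Manager("article")


def model_on_connection(model, alias):
    """Return a proxy subclass of `model` whose manager uses `alias`."""
    name = f"{model.__name__}On{alias.title()}"
    manager = Manager(model.objects._model_name, db=alias)
    return type(name, (model,), {"objects": manager})


PrivateArticle = model_on_connection(Article, "private")
print(Article.objects.all())
print(PrivateArticle.objects.all())
```

Because the original class is never mutated, code holding ``Article`` keeps talking to the default database while ``PrivateArticle`` is in use elsewhere, which is the property that would make the threadlocal juggling unnecessary.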
Jacob
BTW, does anyone have a suggestion for how to rename it? I picked
mysql_cluster simply because I didn't know that a thing named
"MySQL Cluster" already existed (no kidding :-) ).
1. Replication - being able to send all of my writes to one master
machine but spread all of my reads over several slave machines.
Thankfully Ivan Sagalaev's confusingly named mysql_cluster covers this
problem neatly without modification to Django core - it's just an
alternative DB backend which demonstrates that doing this isn't
particularly hard: http://softwaremaniacs.org/soft/mysql_cluster/en/
2. Sharding - being able to put User entries 1-1000 on DB1, whereas
User entries 1001-2000 live on DB2 and so on.
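
For illustration only, the two use cases above boil down to two small routing decisions. This sketch assumes fixed-size ID ranges and round-robin reads; the aliases (``db1``, ``master``, ``slave1``) and the names ``shard_for_user`` and ``ReplicatedRouter`` are made up here, and it is not how mysql_cluster is implemented.

```python
import itertools

# All names here are hypothetical; aliases like "db1" are placeholders.
SHARD_SIZE = 1000


def shard_for_user(user_id, shard_size=SHARD_SIZE):
    """Map user IDs 1-1000 to "db1", 1001-2000 to "db2", and so on."""
    if user_id < 1:
        raise ValueError("user IDs start at 1")
    return f"db{(user_id - 1) // shard_size + 1}"


class ReplicatedRouter:
    """Send writes to one master and rotate reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def for_write(self):
        return self.master

    def for_read(self):
        # Round-robin: each call returns the next slave in the list.
        return next(self._slaves)


router = ReplicatedRouter("master", ["slave1", "slave2"])
print(shard_for_user(1000), shard_for_user(1001))
print(router.for_write(), router.for_read(), router.for_read())
```

The ``user_id - 1`` keeps the range boundaries where the example puts them: ID 1000 still lands on the first database and ID 1001 is the first entry on the second.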
Yes, mysql_replicated seems right. Thanks!
Anyway, I don't have time to read this thread through with the care it
deserves, but I thought I shouldn't let that stop me from finally
writing a description of the API I proposed at the PyCon sprint.
Each app would have a databases.py file that contains classes used to
define database connections (in the same manner as classes are used
to define models). Here's an example:
----
from django.db import connections

class LegacyDatabase(connections.DatabaseConnection):
    engine = 'sqlite3'
    name = '/foo/bar/legacy_db.sqlite3'
----
(And the other DATABASE_* settings (from settings.py) could certainly
be defined as attributes of a DatabaseConnection class.)
JUST FOR TESTING, I propose we allow a database connection to be
specified in a model with a Meta attribute, like this:
----
from django.db import models
from legacy.databases import LegacyDatabase

class LegacyStuff(models.Model):
    ...

    class Meta:
        db_connection = LegacyDatabase
----
Jacob expressed his extreme distaste for this at PyCon--for good
reason. (We don't want to encourage coupling models to databases.)
But just so we can get a working patch and start testing, I propose we
go with this for now.
Adrian suggested we allow the specification of database connections
per-app using the new app() function being proposed for settings.py.
I haven't seen a description of this since PyCon, but I think it would
look something like:
app(name='legacy', db_connection='LegacyDatabase')
(I'm sure I'm leaving several important arguments out of this example.)
Perhaps one could implement sharding by defining multiple
DatabaseConnection classes in a databases.py file (we could support
these files at the project level in addition to the app level) and
putting them in a list. Then one could write a function to return the
appropriate database to use and specify that callable in the argument
to the app function (or perhaps as an argument to the url function in
urls.py).
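
A rough sketch of what that callable might look like, assuming a stand-in ``DatabaseConnection`` base class (the proposed ``connections.DatabaseConnection`` does not exist yet) and made-up shard classes, file paths, and function name. It only shows the selection step, not how app() or url() would consume it.

```python
import zlib


# Stand-in for the proposed (not yet existing) connections.DatabaseConnection.
class DatabaseConnection:
    engine = None
    name = None


# Hypothetical shard definitions; the paths are placeholders.
class ShardA(DatabaseConnection):
    engine = "sqlite3"
    name = "/tmp/shard_a.sqlite3"


class ShardB(DatabaseConnection):
    engine = "sqlite3"
    name = "/tmp/shard_b.sqlite3"


SHARDS = [ShardA, ShardB]


def pick_connection(key, shards=SHARDS):
    """Return the connection class for `key`, stable across processes."""
    # crc32 is deterministic, unlike the built-in hash() for strings.
    index = zlib.crc32(str(key).encode("utf-8")) % len(shards)
    return shards[index]


chosen = pick_connection("some-user")
print(chosen.__name__, chosen.name)
```

Hashing the key is just one possible policy; a range-based or lookup-table policy would slot into the same function without changing the callers.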
I haven't given any thought to replication. Perhaps someone who needs
this could think about whether this proposal could somehow make
supporting replication easier (or if it might get in the way), or if
it's simply orthogonal to this.
--
Daryl