It seems GSOC has finally come to a close, so I'm giving my final
status update as a part of GSOC (but I'm not going anywhere!).

When we last left off I had just gotten Oracle support working.
However, after reviewing it with Russ we agreed that the solution was
a good bit too hacky, and that the real root of the problem is that
the Query class has two jobs: recording information to build a
Pythonic representation of a query, which is the same for all SQL
backends, and actually generating SQL from that representation, which
is different for Oracle and some other backends. The solution is
therefore to split these into separate classes, so we can swap out SQL
generators without needing to care about the data collector.

In short, that's what I've been working on. Unfortunately it isn't
done at the time of writing (and the end of GSOC). The code basically
works now; it's just not in a form that would end up back in Django.
But, as I said, I'm not going anywhere. I'm going to continue to work
on this problem, and I'll continue to check in with django-developers
as design decisions and complications come up.

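To make the split described above concrete, here is a minimal sketch in
plain Python. It is only an illustration under assumptions: the names
QueryData, GenericCompiler and OracleLikeCompiler are hypothetical and
are not the classes in the branch. The point is that one
backend-independent record of the query can be handed to different SQL
generators.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryData:
    """Hypothetical backend-independent record of what was asked for."""
    table: str
    columns: List[str] = field(default_factory=list)
    limit: Optional[int] = None


class GenericCompiler:
    """Hypothetical SQL generator for backends that support LIMIT."""

    def select_sql(self, query):
        # Shared by all compilers: the plain SELECT without any row limit.
        columns = ", ".join(query.columns) or "*"
        return "SELECT %s FROM %s" % (columns, query.table)

    def as_sql(self, query):
        sql = self.select_sql(query)
        if query.limit is not None:
            sql += " LIMIT %d" % query.limit
        return sql


class OracleLikeCompiler(GenericCompiler):
    """Same input, different SQL: wrap the query and filter on ROWNUM."""

    def as_sql(self, query):
        sql = self.select_sql(query)
        if query.limit is not None:
            sql = "SELECT * FROM (%s) WHERE ROWNUM <= %d" % (sql, query.limit)
        return sql


query = QueryData("auth_user", ["id", "username"], limit=10)
for compiler in (GenericCompiler(), OracleLikeCompiler()):
    print(compiler.as_sql(query))
```

The data collector never changes here; only the compiler object is
swapped, which is the property the refactoring is after.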
For now, thanks for all the useful ideas, constructive criticism, and
words of encouragement django-developers has provided as I've worked
on this.
Alex
--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me
> 2. Only creating tables on the databases specified in 'using'. It's
> confusing (to me) connecting to a database, trying to select all my users to
> find out I am on the wrong db (because the table is empty). Perhaps tables
> should only be created on the database they are using. I don't have a good
> suggestion for this as if you were sharding data, you would want to create
> the tables on all databases that this model could potentially live on.
> Perhaps using could be a string or list of connections,
I think this makes the third time I get to tell Alex "I told you so" :-)
I agree that this is a problem. We're still working on a solution. I'm
not sure that the Meta: using approach will be enough - consider the
case where you want contrib.auth (or some app you don't control and
doesn't specify using) to be synchronized to the non-default database.
I'm hoping to sort this out with Alex and some of the core devs during
DjangoCon.
> 3. I have multiple databases defined (some multiple times). It would be
> cool if we could 'ignore' certain databases. An example, I have 3 MySQL
> instances running. MASTER_MAIN_DB, MASTER_OTHER_DB, SLAVE_MAIN_DB. I want
> to be able to refer to them all, but also all the contrib apps I am using I
> want to live on MASTER_MAIN_DB. So in my settings I have:
> DATABASES = {'default': MAINDB_MASTER,
>              'MASTER_MAIN_DB': MAINDB_MASTER,
>              'SLAVE_MAIN_DB': MAINDB_SLAVE,
>              'MASTER_OTHER_DB': OTHERDB_SLAVE
>              }
> Which means that when I run tests, it tries to drop tables on MAINDB_MASTER
> twice. Perhaps someone (Alex?) knows of a better way to do this?
Is there any reason (other than clarity) that you want to be able to
explicitly refer to 'default' and 'MASTER_MAIN_DB'? Is there some
reason that it isn't practical to just call 'default' the main-master
database and refer to it as such?
The reason I'm asking is that the duplication you are doing here will
result in you opening two different connections to MAINDB_MASTER. I
can't think of an obvious reason that this would be required. I suspect
we could work around the problem if we had some sort of aliasing in
the DATABASES definition (i.e., set up 'default' as an alias of
'MASTER_MAIN_DB'), but before we add this, I'd like to understand the
use case to see if it is worth the effort (and potential confusion).
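As a rough sketch of what such aliasing could look like, assuming a
made-up 'alias' key in a DATABASES entry (this key is an assumption for
illustration, not an existing setting), the following plain Python
resolves aliases so that each physical database is only touched once:

```python
def resolve_alias(databases, name):
    """Follow hypothetical {'alias': other_name} entries to a real definition."""
    seen = set()
    while "alias" in databases[name]:
        if name in seen:
            raise ValueError("circular database alias: %r" % name)
        seen.add(name)
        name = databases[name]["alias"]
    return name


def unique_databases(databases):
    """Return each real database name once, however many aliases point at it."""
    return sorted({resolve_alias(databases, name) for name in databases})


# Hypothetical settings: 'default' is an alias rather than a second definition.
DATABASES = {
    "default": {"alias": "MASTER_MAIN_DB"},
    "MASTER_MAIN_DB": {"NAME": "maindb_master"},
    "SLAVE_MAIN_DB": {"NAME": "maindb_slave"},
    "MASTER_OTHER_DB": {"NAME": "otherdb"},
}

print(resolve_alias(DATABASES, "default"))
print(unique_databases(DATABASES))
```

With something like this, a test runner could create and drop tables
per resolved name, so MASTER_MAIN_DB would only be handled once.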
> 4. I am using ContentTypes, and while running tests, if the default database
> is not created first, then the tests fail with an exception that the
> django_content_type table does not exist. For now I have just hacked it so
> the default table is created before any of the others. Perhaps there is a
> better way to fix this problem than that?
We'll have to look into this. Thanks for the report.
> For things like #4, where is the proper place to file a bug about that (if
> there isn't a bug already)? Do bugs from Django branches go in the normal
> tickets filed on djangoproject.com?
Yes. There is a soc2009/multidb version identifier in Trac; open your
tickets there and assign them to that version.
Yours,
Russ Magee %-)
On Fri, Sep 4, 2009 at 7:14 AM, Craig Kimerer<craig....@gmail.com> wrote:
> I've spent a little time using this branch and looking at the possibility of
> using it with my project. Below is a short list of problems and ponies that
> I have encountered (or want).
>
> 1. It'd be awesome if we could mark certain databases as slaves. Inserts /
> deletes / creates / drops would only run on the masters (table creation and
> deletion specifically). I can skip the slaves by passing in the databases I
> want to sync, but I still have the next issue.
So far, the implemented API is pretty low-level - it lets you direct
queries to a particular database, but the practical end-user API for
use cases such as slave/master hasn't been worked on that much. If you
go back to the original specifications, the suggested solution is to
put this in the manager, and so far I can't see any reason why this
won't work.
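For illustration only, here is one way "put this in the manager" might
be sketched in plain Python. ReplicaRoutingManager and its method names
are hypothetical and are not part of the branch's API; the sketch only
shows the decision a manager would make: writes go to the master, reads
rotate over the slaves.

```python
import itertools


class ReplicaRoutingManager:
    """Hypothetical manager that picks a database alias per operation."""

    def __init__(self, master, slaves=()):
        self.master = master
        # Rotate over the slaves; fall back to the master if there are none.
        self._read_aliases = itertools.cycle(list(slaves) or [master])

    def db_for_read(self):
        # Reads are spread across the slaves in round-robin order.
        return next(self._read_aliases)

    def db_for_write(self):
        # Inserts, deletes, creates and drops only ever hit the master.
        return self.master


manager = ReplicaRoutingManager("MASTER_MAIN_DB", ["SLAVE_MAIN_DB"])
print(manager.db_for_read())
print(manager.db_for_write())
```

A real manager would then hand the chosen alias to whatever low-level
call directs the query to that database.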
Yours,
Russ Magee %-)
At the moment it is true that all tables are created on all databases,
but that won't be true in the final version. This ties in with my
comment on your original point 2 - we need much better ways to
describe what data is on what database. Create/Write/Read access to
that data is part of that specification.
I'm yet to be convinced that `Meta: using` is actually a good thing.
IMHO, it's the very model of a setting that makes it impossible to
re-use your application. The setting will probably survive into the
final version, but I suspect we need a much better mechanism than
`Meta: using` for most common use cases. Again, this comes back to my
comment on your original point 2.
> Thanks for the response,
No problems. Thanks for taking some beta code for a spin and giving us feedback.
Yours,
Russ Magee %-)
FWIW, Russ, Joseph Kocherhans, and I discussed this at the DjangoCon
sprints and our conclusion was to have syncdb only sync a single table
at a time, and to take a --exclude flag (or was it --include?) to
specify what models should be synced.
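Since the flag name isn't settled above, here is a small hypothetical
sketch of the selection step that accepts either an include list or an
exclude list. The function name and the model labels are made up for
illustration; this is not syncdb's actual implementation.

```python
def models_to_sync(all_models, include=None, exclude=None):
    """Pick the model labels to create in a single sync run (hypothetical)."""
    # With no include list, every model is a candidate.
    selected = [m for m in all_models if include is None or m in include]
    # Anything named in the exclude list is then dropped.
    return [m for m in selected if not exclude or m not in exclude]


models = ["auth.User", "contenttypes.ContentType", "blog.Entry"]
print(models_to_sync(models, exclude={"blog.Entry"}))
print(models_to_sync(models, include={"blog.Entry"}))
```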