Proposal: New transaction API with multiple databases

Adrian Holovaty

Mar 13, 2009, 6:51:02 PM
to django-d...@googlegroups.com
I've been trying to get a multiple-database setup working with Django.
Thanks to some trunk changes from the past few days (not to mention
all of Malcolm's work with queryset-refactor, etc.), doing SELECTs
from multiple databases is now pretty easy -- and I'd even call it
"clean"! But INSERTs, UPDATEs and DELETEs are still a pain, mostly due
to Django's transaction management infrastructure.

In exploring this, I've realized that Django's transaction management
needs to be refactored to get multiple-database support working. BUT,
if we do it right, we have an opportunity to simplify our transaction
APIs at the same time. I have never liked our transaction APIs --
requiring decorators? WTF? -- so I have long waited for the day when
we can clean them up and make them into something that I actually enjoy
using (and don't have to look up in the documentation each time I want
to use it).

So I have a proposal. Here are the main design concepts:

* Django manages a set of connections, accessible either via a global
dictionary (maybe django.db.connections) or some API
(django.db.get_connection()). This is what we've been talking about in
the "More multi-database plumbing" thread on django-developers.

* Connections are specified in the settings file. Each connection has
a label, like "default", "auth", whatever.

* Transactions are managed via methods on connection objects. NOT via
some strange decorator and magic global django.db.transaction variable
that comes out of thin air.

* QuerySets (and likely model objects, too) have hooks for
*optionally* specifying which connection to use in their queries.

* We retain the concept of a "default" connection, which is the key
for making this backwards-compatible and easy to use.

Here's some example code:

"""
from django import db
from mysite.users.models import User

# This updates User objects on the default connection with
# auto-commit.
User.objects.update(is_registered='t')

# This updates User objects on the "auth" connection with
# auto-commit.
User.objects.with_connection("auth").update(is_registered='t')

# This is equivalent to the previous example, demonstrating
# that with_connection() also can take a connection object.
conn = db.get_connection('auth')
User.objects.with_connection(conn).update(is_registered='t')

# This updates User objects on the default connection within
# a transaction.
conn = db.get_connection() # Equivalent to get_connection('default')
conn.begin()
User.objects.update(is_registered='t')
conn.commit()

# This updates User objects on the "auth" connection within
# a transaction.
conn = db.get_connection('auth')
conn.begin()
User.objects.with_connection(conn).update(is_registered='t')
conn.commit()
"""

For backwards compatibility, we can still keep the legacy decorators
-- transaction.commit_on_success(), etc. -- and they'd just work on
the default connection. But we'd encourage people to use this new API.
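
As a rough illustration of the design points above (labelled connections, a "default" fallback, and transactions driven by methods on the connection object), here is a minimal standalone sketch. It is not Django code: the Connection and ConnectionRegistry classes and the in-memory SQLite databases are assumptions made for the example; only the names get_connection(), begin() and commit() come from the proposal.

```python
import sqlite3


class Connection:
    """Hypothetical wrapper around one labelled database connection."""

    def __init__(self, label, path=":memory:"):
        self.label = label
        # isolation_level=None puts sqlite3 in autocommit mode, so a
        # transaction exists only between explicit begin() and commit().
        self._raw = sqlite3.connect(path, isolation_level=None)

    def execute(self, sql, params=()):
        return self._raw.execute(sql, params)

    def begin(self):
        self._raw.execute("BEGIN")

    def commit(self):
        self._raw.execute("COMMIT")

    def rollback(self):
        self._raw.execute("ROLLBACK")


class ConnectionRegistry:
    """Hypothetical stand-in for the proposed django.db.get_connection()."""

    def __init__(self, labels):
        self._connections = {label: Connection(label) for label in labels}

    def get_connection(self, label="default"):
        return self._connections[label]


db = ConnectionRegistry(["default", "auth"])

conn = db.get_connection("auth")
conn.execute("CREATE TABLE users (name TEXT, is_registered TEXT)")
conn.execute("INSERT INTO users VALUES ('adrian', 'f')")  # auto-committed

conn.begin()
conn.execute("UPDATE users SET is_registered = 't'")
conn.rollback()  # the update is discarded
after_rollback = conn.execute("SELECT is_registered FROM users").fetchone()[0]

conn.begin()
conn.execute("UPDATE users SET is_registered = 't'")
conn.commit()  # the update is kept
after_commit = conn.execute("SELECT is_registered FROM users").fetchone()[0]
```

Calling get_connection() with no argument returns the "default" connection, which is what keeps single-database code unchanged.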

My proposal is not necessarily to get this in Django 1.1, but to get
it in trunk at the very least. I'm selfishly motivated by my own
project to get this done ASAP, so I'm very happy to do the
development.

Adrian

Brian Rosner

Mar 13, 2009, 7:20:57 PM
to django-d...@googlegroups.com

On Mar 13, 2009, at 4:51 PM, Adrian Holovaty wrote:

> * Transactions are managed via methods on connection objects. NOT via
> some strange decorator and magic global django.db.transaction variable
> that comes out of thin air.

I agree that the global functions are not anything to write home
about. However, I disagree that the decorators need to be written off
so quickly. I like the idea that transaction management can be done on
this new connection object. I see the decorators as still useful and
I use them all the time. Rather than working only on the default
connection, couldn't they be modified to take a connection as an argument?

@transaction.commit_on_success(conn)

When no arguments are given it can simply fall back to the default
connection. The decorators provide, IMHO, a quick way to wrap a
function in some sort of transaction scheme that works really well.
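
A decorator that works both bare and with a connection argument could look roughly like the sketch below. This is only an illustration, not Django's implementation: the DEFAULT and AUTH connections are in-memory SQLite databases standing in for real connection objects, and the helper _wrap is a made-up name.

```python
import functools
import sqlite3

# Stand-ins for Django connection objects (assumption for this sketch).
# isolation_level=None means transactions start only on an explicit BEGIN.
DEFAULT = sqlite3.connect(":memory:", isolation_level=None)
AUTH = sqlite3.connect(":memory:", isolation_level=None)


def _wrap(func, conn):
    """Run func inside a transaction on conn; roll back on any exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        conn.execute("BEGIN")
        try:
            result = func(*args, **kwargs)
        except Exception:
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
        return result
    return wrapper


def commit_on_success(arg=None):
    """Usable as @commit_on_success, @commit_on_success() or
    @commit_on_success(conn)."""
    if arg is None or isinstance(arg, sqlite3.Connection):
        conn = DEFAULT if arg is None else arg
        return lambda func: _wrap(func, conn)
    # Bare use: arg is the decorated function itself.
    return _wrap(arg, DEFAULT)


for c in (DEFAULT, AUTH):
    c.execute("CREATE TABLE users (name TEXT)")


@commit_on_success
def add_default_user(name):
    DEFAULT.execute("INSERT INTO users VALUES (?)", (name,))


@commit_on_success(AUTH)
def add_auth_user(name):
    AUTH.execute("INSERT INTO users VALUES (?)", (name,))
    if name == "bad":
        raise ValueError("rejected")


add_default_user("brian")
add_auth_user("adrian")
try:
    add_auth_user("bad")  # the insert is rolled back
except ValueError:
    pass

default_count = DEFAULT.execute("SELECT COUNT(*) FROM users").fetchone()[0]
auth_count = AUTH.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```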

Brian Rosner
http://oebfare.com

James Bennett

Mar 13, 2009, 7:31:47 PM
to django-d...@googlegroups.com
On Fri, Mar 13, 2009 at 5:51 PM, Adrian Holovaty <adr...@holovaty.com> wrote:
> My proposal is not necessarily to get this in Django 1.1, but to get
> it in trunk at the very least. I'm selfishly motivated by my own
> project to get this done ASAP, so I'm very happy to do the
> development.

Since 1.1's only about a month away and we need to focus on finishing
up the features planned for it and squashing bugs before the release,
might it be better to manage this work, for now, in a branch (either
in the main SVN repo, or on an external DVCS mirror like github or
bitbucket)?


--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Jacob Kaplan-Moss

Mar 13, 2009, 8:00:31 PM
to django-d...@googlegroups.com
On Fri, Mar 13, 2009 at 5:51 PM, Adrian Holovaty <adr...@holovaty.com> wrote:
> My proposal is not necessarily to get this in Django 1.1, but to get
> it in trunk at the very least. I'm selfishly motivated by my own
> project to get this done ASAP, so I'm very happy to do the
> development.

Like James, I'm concerned with getting a 1.1 release that's as
high-quality as possible, and I'm concerned that a big change like
this late in the game could be too destabilizing to hit our (already
delayed) release timeline. On top of that, it rubs me the wrong way to
make our community go through a whole feature proposal process only to
drop a big feature in at the last minute.

We faced a similar decision with aggregation support in the run up to
1.0: it was *mostly* done by feature freeze, but we opted to hold it
to give more time for testing and for the feature set to mature.
Personally, I think it worked out great: 1.0 avoided delays or
destabilization, we got an "easy win" feature for 1.1, and having that
feature nearly ready helped us have a nice short window between 1.0
and 1.1.

I'd say that robust multiple database APIs could be a similar "easy
win" for 1.2 if we start a branch now and get it merged during the 1.2
window. If that branch stays tightly locked to trunk as we stabilize
things for 1.1 it's entirely possible that the branch could be stable
enough for those of us who're used to bleeding edge releases to just
use instead of trunk. I probably will, at least.

As for the specific API itself: I think I need to chew it over a bit.
Seems nice and simple, but I'd like to run through the various
multiple-database use cases I've encountered and think about how
they'd work. In general I'm pretty happy with the direction: I agree
with the annoyance of the global transaction management stuff, and I'd
love to say "good riddance" to it.

Jacob

qwerty

Mar 13, 2009, 8:22:44 PM
to django-d...@googlegroups.com
2009/3/14 Adrian Holovaty <adr...@holovaty.com>

What about having an attribute in the Meta class of the model that lets the model have a default connection for executing the 4 most common different operations in each connection, something like

class MyModel(models.Model):
    class Meta:
        select_conn = "default"
        insert_conn = "write_conn"
        update_conn = "write_conn"
        delete_conn = "write_conn"

This would make it easy to use Django in a single-master/multi-slave scenario.
--
http://blog.cuerty.com

Malcolm Tredinnick

Mar 13, 2009, 8:29:52 PM
to django-d...@googlegroups.com
On Sat, 2009-03-14 at 19:52 +1930, qwerty wrote:
[...]

> What about having an attribute in the Meta class of the model that
> lets the model have a default connection for executing the 4 most
> common different operations in each connection, something like
>
> class MyModel(models.Model):
>     class Meta:
>         select_conn = "default"
>         insert_conn = "write_conn"
>         update_conn = "write_conn"
>         delete_conn = "write_conn"

Urgh. :-(

That's four attributes, not one. It doesn't seem to have anything to do
with transactions, either. Please go back and read the long thread from
last September on multi-db support before going down the "design a
multi-db API" path. We've already been over a lot of the requirements
and options here. This really isn't the thread to revisit that (I would
hope).

Tying a model to a particular (set of) database connection(s) at
declaration time is unnecessarily tight coupling. If you change the
connection configuration -- an operational/runtime issue -- you need to
edit all your model source code to keep up. We can avoid doing that.
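
One way to picture the decoupling described here is to keep the operation-to-connection mapping in configuration and consult it at runtime. The sketch below is purely illustrative and was not proposed in this thread: DATABASE_ROUTING and connection_label_for are hypothetical names, and the labels are borrowed from the quoted example.

```python
# Hypothetical settings-level routing table: which connection label
# handles which kind of operation. Nothing here lives on a model class.
DATABASE_ROUTING = {
    "select": "default",
    "insert": "write_conn",
    "update": "write_conn",
    "delete": "write_conn",
}


def connection_label_for(operation, routing=DATABASE_ROUTING):
    """Return the label to use for an operation, falling back to "default"."""
    return routing.get(operation, "default")


read_label = connection_label_for("select")
write_label = connection_label_for("update")

# Re-pointing reads at another connection is a configuration change only;
# no model source code has to be edited.
rerouted_label = connection_label_for("select", routing={"select": "replica"})
```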

Malcolm

Adrian Holovaty

Mar 13, 2009, 9:05:39 PM
to django-d...@googlegroups.com
On Fri, Mar 13, 2009 at 7:00 PM, Jacob Kaplan-Moss
<jacob.ka...@gmail.com> wrote:
> Like James, I'm concerned with getting a 1.1 release that's as
> high-quality as possible, and I'm concerned that a big change like
> this late in the game could be too destabilizing to hit our (already
> delayed) release timeline. On top of that, it rubs me the wrong way to
> make our community go through a whole feature proposal process only to
> drop a big feature in at the last minute.

Whoop, I should've been much more sensitive to the 1.1 deadline in how
I presented this. I'm guilty of caring too much about the particular
feature and not enough about how it fits into timelines and particular
Django releases.

Can you blame me? Multiple-database support is dead sexy. :-)

Sounds like the best way for me to work on this without disrupting the
1.1 momentum is to set up a dedicated branch. I'll post a note here
when I've got that up and running.

Adrian

Shai Berger

Mar 13, 2009, 10:19:54 PM
to django-d...@googlegroups.com
Hi, a nitpick and two material issues:

On Saturday 14 March 2009, Brian Rosner wrote:
> On Mar 13, 2009, at 4:51 PM, Adrian Holovaty wrote:
> > * Transactions are managed via methods on connection objects. NOT via
> > some strange decorator and magic global django.db.transaction variable
> > that comes out of thin air.
>
> I agree that the global functions are not anything to write home
> about. However, I disagree that the decorators need to be written off
> so quickly. I like the idea that transaction management can be done on
> this new connection object. I see the decorators are still useful and
> I use them all the time. They can be modified to not only work on the
> default connection, but couldn't they take a connection as an argument?
>
> @transaction.commit_on_success(conn)
>

Nitpick: The decorators are called at import time, when there is no connection
object available. They can take a connection name. That, in turn, may cause
unwanted dependencies between settings.py and application code (it could be
acceptable, much like an application requiring a specific setting is
acceptable, but it is a point to consider).
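
To make the timing concrete, here is a small sketch in which the decorator stores only a connection name when the module is imported and looks the connection up on each call. The CONNECTIONS registry, the label "auth" and the choice to pass the connection as the first argument are assumptions for the example; SQLite stands in for the real backend.

```python
import functools
import sqlite3

# Hypothetical registry: empty at import time, filled in once the
# settings have been read.
CONNECTIONS = {}


def commit_on_success(label="default"):
    """Decorator factory that remembers only the label at import time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            conn = CONNECTIONS[label]  # resolved on every call, not at import
            conn.execute("BEGIN")
            try:
                result = func(conn, *args, **kwargs)
            except Exception:
                conn.execute("ROLLBACK")
                raise
            conn.execute("COMMIT")
            return result
        return wrapper
    return decorator


# Decoration happens here, before any "auth" connection exists.
@commit_on_success("auth")
def register(conn, name):
    conn.execute("INSERT INTO users VALUES (?)", (name,))


# Later, at startup, connections are created from configuration.
CONNECTIONS["auth"] = sqlite3.connect(":memory:", isolation_level=None)
CONNECTIONS["auth"].execute("CREATE TABLE users (name TEXT)")

register("shai")
rows = CONNECTIONS["auth"].execute("SELECT name FROM users").fetchall()
```

The string "auth" is exactly the dependency on settings.py mentioned above: the application code now assumes a connection with that label is configured.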

> When no arguments are given it can simply fallback to the default
> connection. The decorators provide, IMHO, a quick way to wrap a
> function in some sort of transaction scheme that work really well.
>

First issue:

Besides the decorators, which Brian suggests salvaging, the current global
transaction management also supports the transaction middleware; and it will
be a little harder to resurrect the latter under the proposed scheme.

Core committers: Does your annoyance with current global transaction
management also apply to this middleware?

Second issue:

I understand the spirit that prefers code that is fast and simple, to code
that, by default, behaves correctly in fringe cases. However, I think that
when judging such cases, one should also take into account the costs, to
users, of making the code correct. Ticket #9964[1] provides an example where
this cost is relatively low (at least for now; add "transaction.set_dirty()"
calls in easily-identifiable places in your code). Multiple connections
without distributed transactions is a case where this cost is high -- the
database distribution is often not under the developer's control, and if a
distributed transaction is required, this may mean executing obscure,
engine-specific SQL (classic use-case for this: transfer some asset between
sharded accounts).
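
The gap can be shown with a small sketch of the sharded-transfer case, using two independent SQLite connections as stand-in shards. The crash_between_commits flag is an artificial device to simulate a failure after the first commit; none of these names come from Django.

```python
import sqlite3


def make_shard(initial):
    """Create a stand-in shard holding a single account row."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (?)", (initial,))
    return conn


def balance(conn):
    return conn.execute("SELECT balance FROM account").fetchone()[0]


def transfer(src, dst, amount, crash_between_commits=False):
    """Move amount from src to dst using two independent transactions."""
    src.execute("BEGIN")
    dst.execute("BEGIN")
    try:
        src.execute("UPDATE account SET balance = balance - ?", (amount,))
        dst.execute("UPDATE account SET balance = balance + ?", (amount,))
        src.execute("COMMIT")
        if crash_between_commits:
            # Simulated failure after the first commit is already durable.
            raise RuntimeError("lost connection to second shard")
        dst.execute("COMMIT")
    except Exception:
        # Roll back whatever is still open; a finished commit cannot be undone.
        for conn in (src, dst):
            if conn.in_transaction:
                conn.execute("ROLLBACK")
        raise


shard_a, shard_b = make_shard(100), make_shard(0)

transfer(shard_a, shard_b, 30)
total_ok = balance(shard_a) + balance(shard_b)  # money is conserved

try:
    transfer(shard_a, shard_b, 30, crash_between_commits=True)
except RuntimeError:
    pass
total_after_crash = balance(shard_a) + balance(shard_b)  # 30 has vanished
```

Closing that window needs a distributed (two-phase) commit supported by the database engine, which per-connection begin()/commit() alone does not provide.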

Thanks for your attention,

Shai.


[1] http://code.djangoproject.com/ticket/9964

Glenn Maynard

Apr 5, 2009, 11:24:14 PM
to Django developers
I'm very interested in a cleaner transaction interface. I just wrote
a contextmanager to do the usual "run this code in a transaction" bit,
and it took a day and a half instead of a few minutes.

The goals were typical: to be able to make SQL calls atomically,
without caring about whether a transaction is already running or not,
without disrupting any transaction the caller may already have open
(including on exception), and without the caller even having to know
that you're using the connection.

Here's the code I ended up with, including some tests. It depends
quite a lot on implementation details of db.transaction. I don't
really understand why db.transaction is this complicated, instead of
just having an ordinary transaction.begin/commit/rollback interface.
I guess part of it is so postgresql_psycopg2 can support autocommit
when not in a transaction (since that's the only thing that ever
happens when enter/leave are called); personally, I think the whole
"automatic transactions" design of DB-API was a major design error...

(The test is a bit of a hack; I didn't want to create a whole app for
this, and I don't want to create a real model in my site that's only
used by a test.)

http://zewt.org/~glenn/transactions/sdb.py
http://zewt.org/~glenn/transactions/tests.py
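
For readers who cannot reach the links, the goals listed above can be sketched with a context manager that opens a real transaction when none is running and a savepoint otherwise. This is not the linked code and does not touch db.transaction; it is a standalone illustration against the standard sqlite3 module, and the name atomic is an assumption.

```python
import contextlib
import itertools
import sqlite3

_savepoint_ids = itertools.count(1)


@contextlib.contextmanager
def atomic(conn):
    """Run the block atomically, nesting inside any open transaction."""
    if conn.in_transaction:
        # The caller already has a transaction: use a savepoint so that a
        # failure here undoes only our own work.
        name = "sp_%d" % next(_savepoint_ids)
        conn.execute("SAVEPOINT " + name)
        try:
            yield
        except Exception:
            conn.execute("ROLLBACK TO SAVEPOINT " + name)
            conn.execute("RELEASE SAVEPOINT " + name)
            raise
        conn.execute("RELEASE SAVEPOINT " + name)
    else:
        conn.execute("BEGIN")
        try:
            yield
        except Exception:
            conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")


# isolation_level=None: no implicit transactions from the driver.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

with atomic(conn):  # outer block: a real transaction
    conn.execute("INSERT INTO log VALUES ('outer')")
    try:
        with atomic(conn):  # inner block: a savepoint
            conn.execute("INSERT INTO log VALUES ('inner')")
            raise ValueError("inner failure")
    except ValueError:
        pass  # the outer transaction is still usable

messages = [row[0] for row in conn.execute("SELECT msg FROM log")]
```

The inner failure discards only the inner insert, so the caller's transaction commits with its own work intact.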

Andreas

Apr 6, 2009, 5:00:37 PM
to Django developers
I don't know if this has been covered in some of the mentioned previous
multi db support threads, but how is it supposed to work with admin?

Alex Gaynor

Apr 6, 2009, 5:04:02 PM
to django-d...@googlegroups.com


On Mon, Apr 6, 2009 at 5:00 PM, Andreas <andr...@gmail.com> wrote:
> I don't know if this has been covered in some of the mentioned previous
> multi db support threads, but how is it supposed to work with admin?


It seems rather orthogonal to the admin to me (at least as far as it can be).  The Admin executes queries using the Django ORM, so all the hooks that have been proposed can be fully utilised by any queries the admin does.  The admin doesn't need special hooks, other than the queryset method which already exists, and even if it did those are fully outside the scope of my proposal.

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero

Alex Gaynor

Apr 6, 2009, 5:04:42 PM
to django-d...@googlegroups.com
Wow, I'm a fantastic fool, somehow I thought this was my multi-db thread.  Please ignore my previous message.

Andreas

Apr 6, 2009, 5:22:52 PM
to Django developers
I guess this just proves there are too many multi-db threads, and that
many of us are happy Adrian is making it happen. :)
