1.2 Proposal: Database savepoint refactoring

4 views
Skip to first unread message

Richard Davies

unread,
Aug 11, 2009, 11:24:38 AM8/11/09
to Django developers
Hi all,

I've got two open tickets against database savepoints (#11156 and
#9205) and think this is an area which we should take a look at for
1.2 - it is currently inconsistent, IMHO.


Savepoints are described here:
http://docs.djangoproject.com/en/dev/topics/db/transactions/#savepoints

There are two reasons that they exist:

1) As a feature in their own right.

2) To work around database exceptions invalidating PostgreSQL
transactions:
http://docs.djangoproject.com/en/dev/topics/db/transactions/#handling-exceptions-within-postgresql-transactions


At present, Django sometimes automatically uses savepoints in case 2,
and sometimes expects the user to do so. I'd like to improve the
consistency of this behaviour:


2a) Django already automatically uses savepoints inside some methods
(e.g. QuerySet.get_or_create). This means that a database exception
can never break the PostgreSQL transaction.

Here there's an efficiency issue on Oracle - this use of savepoints
should be PostgreSQL-specific, and isn't (see #11156).


2b) There are other similar methods, in which Django does not
automatically use savepoints (e.g. QuerySet.create, Model.save) but in
which database exceptions can easily occur and break the PostgreSQL
transaction - for instance when saving a model with a conflict on a
unique field.

Here it's a consistency issue. To make things easy for the end users,
I believe that Django should automatically add PostgreSQL-specific
savepoints in these cases too, just like in QuerySet.get_or_create
(see #9205).


Cheers,

Richard.

Karen Tracey

unread,
Aug 11, 2009, 11:59:46 AM8/11/09
to django-d...@googlegroups.com
On Tue, Aug 11, 2009 at 11:24 AM, Richard Davies <richard...@elastichosts.com> wrote:

Hi all,

I've got two open tickets against database savepoints (#11156 and
#9205) and think this is an area which we should take a look at for
1.2 - it is currently inconsistent, IMHO.


I've no opinion on the Oracle issue but I do on the consistency issue.

I don't agree the current savepoint use within Django is inconsistent.  As far as I can tell, savepoints are used internally in the one case where Django itself catches and suppresses an IntegrityError.  If there are other cases where Django internally prevents propagation of a database-level error to the application code without using savepoints to ensure that the connection remains in a usable state despite the error that has been hidden from the app, that would be inconsistent.  But near as I can tell you are not pointing out any places like that?

Rather the patch on #9205 strikes me as introducing inconsistency, unless you are telling me that it fixes ALL cases where an IntegrityError at the database level results in an an an unusable PostgreSQL db connection.  Right now it is pretty simple: if your app code catches a database error it must do something to restore the DB connection to a usable state.  With the patch for #9205 would we be able to switch that to state that the app code will never have to worry about clearing the error on the DB connection when it catches a database error?  If not then we would have less consistency than before.

Karen

Richard Davies

unread,
Aug 11, 2009, 1:21:51 PM8/11/09
to Django developers
> I don't agree the current savepoint use within Django is inconsistent.  As
> far as I can tell, savepoints are used internally in the one case where
> Django itself catches and suppresses an IntegrityError.
...
> Right now it is pretty simple: if your app code catches a database error it must
> do something to restore the DB connection to a usable state.  With the patch
> for #9205 would we be able to switch that to state that the app code will
> never have to worry about clearing the error on the DB connection when it
> catches a database error?  If not then we would have less consistency than
> before.

Thanks Karen - that's an interesting take on the #9205 situation. I
agree that our current situation is that "if your app code catches a
database error it must do something to restore the DB connection to a
usable state" and that is a consistent position.

However, this need to "do something" is specific to PostgreSQL - on
all other databases which I'm aware of the problem does not arise and
the DB connection always returns in a usable state (someone please
correct me if I'm wrong here...).

That doesn't strike me as nice - effectively we end up with PostgreSQL-
specific code in the app code. You can see many examples of this in
the current test suite where various calls to create(), save(), etc.
are wrapped in savepoints for the benefit of PostgreSQL only. This
also makes it easy to write a Django app (or Django core code!) which
tests and runs fine on other databases, but is buggy on PostgreSQL
since these savepoint wrappers were omitted.


I'd much prefer to move towards a uniform position of "the Django
framework will always return a usable DB connection". We're already
there on all non-PostgreSQL databases. We're already there on
PostgreSQL for some calls (like get_or_create). I think that we can
easily get there in the remainder (like create, save).

Does this fix ALL cases where an IntegrityError results in an unusable
PostgreSQL db connection? I'd need more work here to really believe
this. Certainly I believe that we can get ALL wrapping uses of
savepoints out of the test suite, which would be a practical
demonstration of getting to or very close to this goal!

Cheers,

Richard.

Malcolm Tredinnick

unread,
Aug 11, 2009, 10:57:33 PM8/11/09
to django-d...@googlegroups.com
On Tue, 2009-08-11 at 10:21 -0700, Richard Davies wrote:
> > I don't agree the current savepoint use within Django is inconsistent. As
> > far as I can tell, savepoints are used internally in the one case where
> > Django itself catches and suppresses an IntegrityError.
> ...
> > Right now it is pretty simple: if your app code catches a database error it must
> > do something to restore the DB connection to a usable state. With the patch
> > for #9205 would we be able to switch that to state that the app code will
> > never have to worry about clearing the error on the DB connection when it
> > catches a database error? If not then we would have less consistency than
> > before.
>
> Thanks Karen - that's an interesting take on the #9205 situation. I
> agree that our current situation is that "if your app code catches a
> database error it must do something to restore the DB connection to a
> usable state" and that is a consistent position.

This isn't a new interpretation, by the way. It's the same position I've
explained to you, Richard, every time you've opened a ticket to
automatically add savepoint() handling to Model.save() or tried to
introduce savepoint() elsewhere in the code. Karen's understanding and
expectation completely matches mine: get_or_create() needs to do
transaction handling the way it does to handle some real-world race
conditions that must be handled inside the model.


> However, this need to "do something" is specific to PostgreSQL - on
> all other databases which I'm aware of the problem does not arise and
> the DB connection always returns in a usable state (someone please
> correct me if I'm wrong here...).

Just because it's the only database in Django's core that uses a single
connection status, doesn't mean it's the only database in the world that
does so. PostgreSQL's particular connection handling has advantages and
disadvantages (one of the disadvantages being the connection handling),
but the ability for all cursors to operate in the same state is handy.

>
> That doesn't strike me as nice - effectively we end up with PostgreSQL-
> specific code in the app code.

Don't think of it as PostgreSQL-specific. Think of it as handling
connections that use this particular state model. The calls are no-ops
on other databases anyway.

Savepoints definitely aren't a perfect solution, since we've never
officially said we no longer support PostgreSQL 7.x (although at some
point I'm going to propose we say that for the Django 1.2/1.3 timeframe)
and savepoints don't exist in that version.

> You can see many examples of this in
> the current test suite where various calls to create(), save(), etc.
> are wrapped in savepoints for the benefit of PostgreSQL only. This
> also makes it easy to write a Django app (or Django core code!) which
> tests and runs fine on other databases, but is buggy on PostgreSQL
> since these savepoint wrappers were omitted.
>
>
> I'd much prefer to move towards a uniform position of "the Django
> framework will always return a usable DB connection".

"...and the user has no choice how to handle the problem." That's why we
don't do this currently. Good that there's a discussion going on around
the trade-offs, but that alternate side needs acknowldgement and
addressing as well.

> We're already
> there on all non-PostgreSQL databases. We're already there on
> PostgreSQL for some calls (like get_or_create). I think that we can
> easily get there in the remainder (like create, save).
>
> Does this fix ALL cases where an IntegrityError results in an unusable
> PostgreSQL db connection? I'd need more work here to really believe
> this.

That would be an absolute minimum for a proposal like this. It's all or
nothing. Right now we have completely consistent behaviour. Changing it
to be inconsistent would be a regression.

Regards,
Malcolm

Reply all
Reply to author
Forward
0 new messages