Postgresql transaction aborts, despite being in autocommit mode


Richard Davies

Sep 19, 2008, 5:48:16 PM
to Django developers
Hi all,

I believe that I have found a bug in Django 1.0 for both Postgresql
backends, but would very much appreciate your thoughts in case I am
misunderstanding something... The test case seems clear enough that
this must have been noticed before, and there is an existing open
ticket and patch for a related problem.

I have a simple model:

================
class Test(models.Model):
    data1 = models.IntegerField(unique=True)
    data2 = models.IntegerField()
================

and a view using it:

================
def render(request):
    a = Test()
    a.data1 = 1
    a.data2 = 10
    a.save()

    b = Test()
    b.data1 = 1
    b.data2 = 20
    try:
        b.save()
    except IntegrityError:
        # Expected since data1 not unique
        pass

    c = Test()
    c.data1 = 2
    c.data2 = 30
    c.save()

    return HttpResponse('Test')
================

All of this code is run without transaction middleware, which I
believe means that every database operation should auto-commit
individually, regardless of whether the database supports
transactions.


Correct behavior is for the view to return 'Test' and leave objects
'a' and 'c' in the database. I observe this on both MySQL with MyISAM
tables and MySQL with InnoDB tables (which supports transactions).

With Postgresql*, an internal error is raised by the line 'c.save()',
with the text 'current transaction is aborted, commands ignored until
end of transaction block'. Only object 'a' is left in the database.


I can work around the Postgresql behaviour by adding an explicit
savepoint and rollback around 'b.save()', but surely this should not
be necessary? I am operating in autocommit mode, so there should be no
transaction present. Equally, MySQL InnoDB does not require these.
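The savepoint workaround Richard describes can be sketched at the SQL level. This is a stand-in sketch using Python's sqlite3 module rather than PostgreSQL (no server needed), but the SAVEPOINT / ROLLBACK TO SAVEPOINT syntax is the same one PostgreSQL accepts:

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here; isolation_level=None puts the
# connection in true autocommit mode so we control transactions by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE test (data1 INTEGER UNIQUE, data2 INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO test VALUES (1, 10)")      # 'a'

cur.execute("SAVEPOINT before_b")
try:
    cur.execute("INSERT INTO test VALUES (1, 20)")  # 'b' -- duplicate data1
except sqlite3.IntegrityError:
    cur.execute("ROLLBACK TO SAVEPOINT before_b")   # undo only 'b'

cur.execute("INSERT INTO test VALUES (2, 30)")      # 'c' still works
cur.execute("COMMIT")

rows = cur.execute("SELECT data1, data2 FROM test ORDER BY data1").fetchall()
print(rows)  # [(1, 10), (2, 30)]
```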


I believe this boils down to the same issue as ticket [3460] - both
the psycopg1 and psycopg2 backends are using the wrong isolation
level, as a result of which the Django SQL is wrapped by psycopg
inside an implicit transaction. I have tried using the [3460] patch,
which gives me the Postgresql behaviour that I expected.


Your thoughts very much appreciated - assuming I understand the
meaning of "autocommit" this seems quite a fundamental problem.

Best regards,

Richard.


* My test is with Postgresql 8.1.4 and Psycopg2 2.0.7. However, if my
diagnosis is correct, then this will hold for all versions and also
for the Psycopg1 backend.

Jacob Kaplan-Moss

Sep 19, 2008, 7:12:30 PM
to django-d...@googlegroups.com
Hi Richard --

What you've described is documented behavior for PostgreSQL (try
googling for "commands ignored until end of transaction block") --
Postgres does not allow *any* commands in a transaction after a
database error; you have to commit or rollback first.

Jacob

Collin Grady

Sep 19, 2008, 9:05:09 PM
to django-d...@googlegroups.com
This is related to http://code.djangoproject.com/ticket/3460

--
Collin Grady

Richard Davies

Sep 20, 2008, 2:47:41 AM
to Django developers
Hi Jacob,

I agree that this is documented behavior for PostgreSQL
_transactions_.

The reason that I think it's a bug is that I shouldn't be in a
transaction at all - as I understand http://docs.djangoproject.com/en/dev/topics/db/transactions/
, the default behaviour should be auto-commit in which each individual
change is committed independently and individually, and there is no
transaction taking place. This is what I see from MySQL InnoDB.

My problem is that I can't get this auto-commit behavior from
PostgreSQL with Django - either in the default mode or even when I
explicitly use TransactionMiddleware with the @transaction.autocommit
decorator or set DISABLE_TRANSACTION_MANAGEMENT to True.


My suggestion is that changing the psycopg isolation level to zero (as
per ticket 3460) will give true auto-commit behaviour for PostgreSQL,
since otherwise psycopg wraps the Django SQL in a transaction block.
Note that psycopg1 and psycopg2 both provide an explicit function to
switch them into autocommit mode by setting the isolation level to
zero (search for 'autocommit' in http://www.initd.org/svn/psycopg/psycopg2/trunk/lib/psycopg1.py
and http://www.initd.org/svn/psycopg/psycopg1/trunk/connection.c
respectively).

Cheers,

Richard.


On Sep 20, 12:12 am, "Jacob Kaplan-Moss" <jacob.kaplanm...@gmail.com>
wrote:
> Hi Richard --
>
> What you've described is documented behavior for PostgreSQL (try
> googling for "commands ignored until end of transaction block") --
> Postgres does not allow *any* commands in a transaction after a

Malcolm Tredinnick

Sep 21, 2008, 4:25:45 AM
to django-d...@googlegroups.com

On Fri, 2008-09-19 at 23:47 -0700, Richard Davies wrote:
> Hi Jacob,
>
> I agree that this is documented behavior for PostgreSQL
> _transactions_.
>
> The reason that I think it's a bug is that I shouldn't be in a
> transaction at all - as I understand http://docs.djangoproject.com/en/dev/topics/db/transactions/
> , the default behaviour should be auto-commit in which each individual
> change is committed independently and individually, and there is no
> transaction taking place.

That documentation says it is similar to auto-commit. It doesn't say it
actually is auto-commit. There are explicit transaction commits in the
code (grep for commit_unless_managed()).

> This is what I see from MySQL InnoDB.
>
> My problem is that I can't get this auto-commit behavior from
> PostgreSQL with Django - either in the default mode or even when I
> explicitly use TransactionMiddleware with the @transaction.autocommit
> decorator or set DISABLE_TRANSACTION_MANAGEMENT to True.
>
>
> My suggestion is that changing the psycopg isolation level to zero (as
> per ticket 3460) will give true auto-commit behaviour for PostgreSQL,
> since otherwise psycopg wraps the Django SQL in a transaction block.

Whilst it's true that this would give auto-commit behaviour, there's a
valid alternate side to this that doesn't seem to be addressed: whether
that's actually a good idea or not.

One reason not to do this is that it simply isn't standard behaviour for
Python database adaptors (they must be in non-autocommit mode
initially). So there's a principle of least-surprise thing going on.

As I mentioned to Collin at the code sprint in Portland, I think it'd be
a good idea to make sure we expose the ability to turn on auto-commit,
but I don't really like making it the default. In any case, providing
the ability so that it can be controlled via, for example, a setting is
certainly the first step here. That's pretty independent of whatever
the default might end up being. That's really the improvement needed to
#3460 at the moment -- separating adding functionality from changing the
default.

Regards,
Malcolm


meta...@gmail.com

Sep 21, 2008, 10:32:08 AM
to Django developers
> I believe this boils down to the same issue as ticket [3460] - both
> the psycopg1 and psycopg2 backends are using the wrong isolation
> level, as a result of which the Django SQL is wrapped by psycopg
> inside an implicit transaction. I have tried using the [3460] patch,
> which gives me the Postgresql behaviour that I expected.

Yes, this is exactly what [3460] solves, aside from the inefficiency
of implicit transactions. Collin and I tried very hard to convince
Jacob and others that this should be included in 1.0, but they
disagreed and many of us must still maintain private forks of Django
to get around this. I'll note that this isn't just a problem with
Django, but with any naive use of Python's DB-API. For ease of use,
DB-API wraps everything in a transaction. The end result is an API
that works well for common use cases and is easy for developers to
understand. It is also highly inefficient for Web applications (since
they most often do nothing transactional) and gets you into trouble in
error cases.

> Whilst it's true that this would give auto-commit behaviour, there's a
> valid alternate side to this that doesn't seem to be addressed: whether
> that's actually a good idea or not.
>
> One reason not to do this is that it simply isn't standard behaviour for
> Python database adaptors (they must be in non-autocommit mode
> initially). So there's a principle of least-surprise thing going on.

The principle of least surprise is for the database to act like a
database. That means not wrapping things in implicit transactions
which will surprise you on errors when the abstraction is no longer
transparent. I find it less surprising if my database time is spent
on inefficiently crafted queries instead of on overhead that I cannot
control except through patching Django internals.

Yes, DB-API has this default. We can argue all day about whether it
is the right default. I believe that it exists because mysql does not
have transactions (or at least did not back then) and the person who
designed this API designed it for that case. Transactions were an
afterthought; they cannot be controlled explicitly by the API as
written. Even psycopg2's autocommit is specific to that
implementation.

For Django, not using the default behavior is a clear win. It doesn't
create weird errors where the implicit transactions are exposed. It
is enormously more efficient when doing large amounts of queries that
aren't transactional. It makes the explicit transaction control
actually do what you intend (eg, autocommit decorator actually
autocommits).

> As I mentioned to Collin at the code sprint in Portland, I think it'd be
> a good idea to make sure we expose the ability to turn on auto-commit,
> but I don't really like making it the default. In any case, providing
> the ability so that it can be controlled via, for example, a setting is
> certainly the first step here. That's pretty independent of whatever
> the default might end up being. That's really the improvement needed to
> #3460 at the moment -- separating adding functionality from changing the
> default.

I do not understand your reasoning here. You seem to be saying "the
DB-API guys were way smarter than all of us, and they must have had a
good reason for doing it this way". This means Django does not do
what it is capable of doing (ie, broken autocommit decorator) and
remains inefficient in common cases. To my knowledge Perl and Java's
database APIs do nothing like this. I suspect (but haven't verified)
that Ruby is similar. Python is alone here in this implicit
transaction garbage.

Why have transactions _at all_ if they are always provided for you? I
don't understand that part either. If you think users expect all
these implicit transactions, it seems superfluous to have any explicit
transaction handling in Django at all. Is it really the user's
expectation to have and use nested transactions?

I think if you actually exposed the transactional overhead and the
connection overhead in the django.db.queries array, you'd find that
more of your users would be demanding fixes here. The fact that you
hide this means people aren't even aware of the problems.

jack.

Ivan Sagalaev

Sep 21, 2008, 11:42:31 AM
to django-d...@googlegroups.com
meta...@gmail.com wrote:
> It is also highly inefficient for Web applications (since
> they most often do nothing transactional) and gets you into trouble in
> error cases.

I don't want to get into the argument about ticket 3460 itself but I
just don't get this paragraph... What is special in web applications
that makes them not require transactions? And more, how can transactions --
a mechanism for preventing the DB from ending up in an inconsistent state --
get you into error cases?

FWIW, we're developing a typical web service[1] that, basically, lets
people say "I'm going to this event". And we rely on transactions
very much because we heavily use denormalized data like keeping a count of
people for an event. The only sane way of making sure that those stored
counts are sync'd to the real counts of people in another table is to make
changes to them in a single transaction. It's just one example anyway...

[1]: http://kuda.yandex.ru/
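Ivan's denormalized-count case can be sketched like this; sqlite3 stands in for the real database, and the table and column names are made up for illustration:

```python
import sqlite3

# An attendee row and a denormalized counter must change together,
# so both statements share a single transaction.
conn = sqlite3.connect(":memory:")  # default: implicit transaction, explicit commit
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, people_count INTEGER)")
conn.execute("CREATE TABLE attendee (event_id INTEGER, name TEXT)")
conn.execute("INSERT INTO event VALUES (1, 0)")
conn.commit()

try:
    conn.execute("INSERT INTO attendee VALUES (1, 'alice')")
    conn.execute("UPDATE event SET people_count = people_count + 1 WHERE id = 1")
    conn.commit()        # both changes become visible at once
except sqlite3.Error:
    conn.rollback()      # on failure, neither change survives

count = conn.execute("SELECT people_count FROM event WHERE id = 1").fetchone()[0]
attendees = conn.execute("SELECT COUNT(*) FROM attendee WHERE event_id = 1").fetchone()[0]
print(count, attendees)  # 1 1 -- the stored count matches the real count
```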

Richard Davies

Sep 21, 2008, 2:32:25 PM
to Django developers
> One reason not to do this is that it simply isn't standard behaviour for
> Python database adaptors (they must be in non-autocommit mode
> initially). So there's a principle of least-surprise thing going on.

I'm not convinced by the argument of least-surprise. As you said, there
are explicit commit_unless_managed() calls inside the Django internals
(e.g. inside save()) - but a Django _user_ (rather than developer) would
not have read this code before calling save().

So, a Django user who expects Python database adaptors to be
non-autocommit is _already_ being surprised by the fact that save()
behaves as if it is.


By designing default Django behaviour to be "similar to auto-commit",
I think we've already decided that the DB-API default is wrong.

It's now a case of how Django implements the "similar to auto-commit"
behavior most cleanly and efficiently. I think the answer is:
- Get auto-commit connections from Python database adaptors by
specifically requesting them
- Delete all of the commit_unless_managed() calls
- Change enter_transaction_management() / transaction.managed(True)
to explicitly issue an SQL BEGIN
- Change leave_transaction_management() / transaction.commit /
transaction.rollback to explicitly issue SQL COMMIT / ROLLBACK

I suspect that this will significantly simplify the Django codebase vs.
trying to simulate both auto-commit and transaction behaviors on top
of a transactional connection - it would certainly remove most of the
mechanisms in django/db/transaction.py which track state, dirty, etc.
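The proposal above - an autocommit connection, with explicit BEGIN/COMMIT/ROLLBACK issued only when transaction management is requested - can be illustrated with a minimal context manager. This is a hedged sketch, not Django code: sqlite3 stands in for an autocommit connection, and `managed` is a made-up name:

```python
import sqlite3
from contextlib import contextmanager

# The connection itself is in autocommit mode (isolation_level=None),
# so no adaptor-level implicit transaction is ever opened.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER UNIQUE)")

@contextmanager
def managed(conn):
    # Explicit transaction on top of the autocommit connection, roughly
    # what enter/leave_transaction_management would do under the proposal.
    conn.execute("BEGIN")
    try:
        yield conn
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")

with managed(conn):
    conn.execute("INSERT INTO t VALUES (1)")
    conn.execute("INSERT INTO t VALUES (2)")

try:
    with managed(conn):
        conn.execute("INSERT INTO t VALUES (3)")
        conn.execute("INSERT INTO t VALUES (1)")  # duplicate: block rolls back
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT x FROM t ORDER BY x").fetchall()
print(rows)  # [(1,), (2,)] -- the failed block left no partial writes
```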

Cheers,

Richard.

Richard Davies

Sep 21, 2008, 4:12:27 PM
to Django developers
> > It is also highly inefficient for Web applications (since
> > they most often do nothing transactional) and gets you into trouble in
> > error cases.
>
> I don't want to get into the argument about ticket 3460 itself but I
> just don't get this paragraph... What is special in web applications
> that makes them not require transactions? And more, how can transactions --
> a mechanism for preventing the DB from ending up in an inconsistent state --
> get you into error cases?

On the first question, I'd suspect that many pages of many (simple)
web applications just have a one-to-one mapping between the web page
and a single line of SQL on a normalized database - e.g. a page
listing all events (single SELECT), a page listing all people at an
event (single SELECT), a form to add an event (single INSERT), etc.
Wrapping these in transactions is pure overhead.

On the second question, one example is my initial bug report starting
this thread (admittedly PostgreSQL specific). Object 'c' is not saved
when you would have expected it to be (and it is on other database
backends). The problem isn't that transactions are bad in themselves,
but that the code runs in an imperfect simulation of an autocommit
environment on top of an implicit transaction (which the user would
not be expecting).
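The behaviour Richard expected - a failed statement in autocommit mode needing no rollback before the next one - can be seen with sqlite3 as a stand-in (sqlite never aborts the session the way an open PostgreSQL transaction does, which is exactly the point):

```python
import sqlite3

# isolation_level=None puts sqlite3 in true autocommit mode, so a failed
# INSERT does not poison the statements that follow it.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE test (data1 INTEGER UNIQUE, data2 INTEGER)")

conn.execute("INSERT INTO test VALUES (1, 10)")      # 'a'
try:
    conn.execute("INSERT INTO test VALUES (1, 20)")  # 'b' -- duplicate key
except sqlite3.IntegrityError:
    pass                                             # no rollback required
conn.execute("INSERT INTO test VALUES (2, 30)")      # 'c' succeeds

rows = conn.execute("SELECT data1, data2 FROM test ORDER BY data1").fetchall()
print(rows)  # [(1, 10), (2, 30)]
```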

Cheers,

Richard.

Ivan Sagalaev

Sep 21, 2008, 5:46:56 PM
to django-d...@googlegroups.com
Richard Davies wrote:
> On the first question, I'd suspect that many pages of many (simple)
> web applications just have a one-to-one mapping between the web page
> and a single line of SQL on a normalized database - e.g. a page
> listing all events (single SELECT), a page listing all people at an
> event (single SELECT), a form to add an event (single INSERT), etc.
> Wrapping these in transactions is pure overhead.

When it comes to overhead... As far as I know PostgreSQL in autocommit
mode will wrap each statement (even SELECT) in an implicit transaction.

From
http://it.toolbox.com/blogs/database-soup/postgresql-application-performance-tips-part-1-13172:

> While you may think that you are not using transactions for singleton
> read-only SELECT statement, in fact every single statement in
> PostgreSQL is in a transaction. In the absence of an explicit
> transaction, the statement itself is an implicit transaction.

So what Django does now (and what DB API suggests) by starting an
explicit transaction up to the first save():

BEGIN;
SELECT ...
UPDATE ...
COMMIT;

is actually more efficient than the equivalent in auto-commit mode:

implicit begin;
SELECT ...
implicit commit;
implicit begin;
UPDATE ...
implicit commit;

And the more statements you have the worse auto-commit is.

So if this ticket should be fixed, it's only for consistency with other
DBs, not for performance reasons. But I actually think that performance
is more important here. It's not hard to do an explicit rollback in
your view code anyway if you're recovering from an exception.

Malcolm Tredinnick

Sep 21, 2008, 8:18:01 PM
to django-d...@googlegroups.com

On Sun, 2008-09-21 at 11:32 -0700, Richard Davies wrote:
[...]

> I suspect that this will significantly simplify the Django codebase vs.
> trying to simulate both auto-commit and transaction behaviors on top
> of a transactional connection - it would certainly remove most of the
> mechanisms in django/db/transaction.py which track state, dirty, etc.

It will also leave things vulnerable to having only some of your objects
updated, some of them deleted, some of them saved with no way to restore
to a known state. This proposed change is a bit more complicated than
the patches in #3460, since it's kind of a good thing to preserve
integrity of operations for things like saving, deleting and updating.
All of those involve more than one SQL operation in the general case.

As I mentioned previously, we're always going to need some transactional
management for stuff like that, so this is going to go a lot further,
and more easily, when we have a patch that allows the behaviour to be
controlled. Then we can worry about what the default setting might be.
At the moment #3460 provides a good way to end up with inconsistent data
due to interactions with other transactions and an inability to roll
back to the start. The fact that none of the many patches on that ticket
even touch save/update/delete handling has to be an error.

Malcolm

Richard Davies

Sep 22, 2008, 1:38:43 AM
to Django developers
Malcolm Tredinnick wrote:
> As I mentioned to Collin at the code sprint in Portland, I think it'd be
> a good idea to make sure we expose the ability to turn on auto-commit,
> but I don't really like making it the default. In any case, providing
> the ability so that it can be controlled via, for example, a setting is
> certainly the first step here. That's pretty independent of whatever
> the default might end up being. That's really the improvement needed to
> #3460 at the moment -- separating adding functionality from changing the
> default.

...and then repeated the point in a later mail, so I think that we
should try to write a patch which a) makes 3460 a configuration option
as suggested, and b) supports transactions on top of the autocommit
connection when explicitly requested (3460 currently doesn't, so parts
of the test suite currently fail when it is applied).

I'd appreciate people's thoughts on the following proposal of how to
do this. Particularly Malcolm, Jack and Collin since they seem the
main contributors to date.


1) Mechanism to make it a configuration option

Various people on the 3460 ticket tracker have suggested using [6064],
but that isn't yet in trunk, and also wasn't generally agreed on the
tracker (e.g. does not allow custom Python). So, let's not do that.

We need two things
- a way to create the backend in autocommit mode: I suggest a flag
'native_autocommit' for DATABASE_OPTIONS
- a way for the running Django to tell that the backend is in
autocommit mode: I suggest a flag 'uses_native_autocommit' for
BaseDatabaseFeatures

In both cases I'm deliberately choosing mechanisms and words so that
other backends could also later be changed to support an autocommit
mode.


2) Transaction support when explicitly requested

The current 3460 patch leaves a Django which does not support any
transaction behavior other than autocommit (leading to various test
suite failures). The problem is that django/db/transaction.py also
needs to be rewritten to actually create transactions (i.e. issue SQL
BEGIN/COMMIT or BEGIN/ROLLBACK) on top of the connection when the user
explicitly requests them. This is also a better and more general
solution than the current hack for one specific case in loaddata, which
could then also be removed.

We'll essentially need two independent versions of transaction.py -
the existing one for when Django runs on top of an implicit
transaction, and a new one which actually creates transactions for
when it doesn't. Implementations would be very different, but the
interface and semantics can be identical, I think (except for
rollback_unless_managed(), which cannot be supported, so would have to
be removed from both versions and its uses rewritten in some other form).
I propose that we write the new version and test
'uses_native_autocommit' at the start of each function to decide which
implementation to use.
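The dispatch proposed above might look something like the following sketch. All names are hypothetical ('uses_native_autocommit' is the flag suggested in this mail, not an existing Django feature), and sqlite3 stands in for the backend connection:

```python
import sqlite3
from types import SimpleNamespace

def commit(connection, features):
    # Hypothetical dispatch: with native autocommit, a managed transaction
    # was opened by an explicit BEGIN and must be closed by an explicit
    # COMMIT; otherwise we delegate to the adaptor's implicit transaction.
    if features.uses_native_autocommit:
        connection.execute("COMMIT")
    else:
        connection.commit()

features = SimpleNamespace(uses_native_autocommit=True)
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit connection
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("BEGIN")                   # an explicitly requested transaction
conn.execute("INSERT INTO t VALUES (1)")
commit(conn, features)
print(conn.execute("SELECT x FROM t").fetchall())  # [(1,)]
```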

Malcolm:
- You previously wrote: "This proposed change is a bit more
complicated than the patches in #3460" - I do agree, but I think this
is the only way to get 3460 to pass the transaction parts of the test
suite! Do you see a simpler alternative?
- You also wrote "The fact that none of the many patches on that
ticket even touch save/update/delete handling has to be an error" -
again I agree, but think that all of the required changes to save/
update/delete handling can be isolated away into transaction.py, as
above. Do you see other problems?


As I say, I'd appreciate people's thoughts on this proposal - would this
leave us with a patch for 3460 which will be accepted into trunk? Do
people see problems/improvements to my proposal? This will be quite a
bit of code for someone to write, so I'd like the design clear and
agreed first.

Cheers,

Richard.

Richard Davies

Sep 22, 2008, 1:52:52 AM
to Django developers
Ivan Sagalaev wrote:
> When it comes to overhead... As far as I know PostgreSQL in autocommit
> mode will wrap each statement (even SELECT) in an implicit transaction.
>
> http://it.toolbox.com/blogs/database-soup/postgresql-application-performance-tips-part-1-13172
...
> So what Django does now ... is actually more efficient than the equivalent in auto-commit mode

That's an interesting article, and I follow your logic. However,
Jack's original patch follows from profiling work he did on his
Chesspark site (see http://metajack.wordpress.com/2007/07/25/do-you-know-what-your-database-is-doing/
) in which he claimed a 2-3x speed improvement from changing to auto-
commit mode. I haven't done the equivalent testing myself, and would
be interested in an explanation of why your article and his tests give
such different conclusions.

Ivan Sagalaev

Sep 22, 2008, 3:13:19 AM
to django-d...@googlegroups.com
Richard Davies wrote:
> That's an interesting article, and I follow your logic. However,
> Jack's original patch follows from profiling work he did on his
> Chesspark site (see http://metajack.wordpress.com/2007/07/25/do-you-know-what-your-database-is-doing/
> ) in which he claimed a 2-3x speed improvement from changing to auto-
> commit mode. I haven't done the equivalent testing myself, and would
> be interested in an explanation of why your article and his tests give
> such different conclusions.

I've done a quick non-scientific test. Here's the code:

c = psycopg2.connect(...)

c.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = c.cursor()
now = datetime.now()
for x in range(1000):
    cursor.execute('select created from cicero_article where id = %s', [x])
    res = cursor.fetchall()
c.commit()
print datetime.now() - now

For the record, 'select' there does a fetch by primary key from a ~6500
record table (i.e. -- fast). The results of executing 1000 selects are:

ISOLATION_LEVEL_AUTOCOMMIT: ~0.29 sec
ISOLATION_LEVEL_READ_COMMITTED: ~0.26 sec

So the default level is indeed faster but I have to confess I'm a lousy
tester so I encourage anybody to conduct their own experiment.
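For anyone who wants to repeat the experiment without a PostgreSQL server to hand, here is a rough stand-in using sqlite3. The absolute numbers mean little (no network round trips, different engine), so it only shows the shape of such a micro-benchmark:

```python
import sqlite3
import time

def run(isolation_level):
    # isolation_level=None is sqlite3's true autocommit mode;
    # "" (the default) lets the adaptor open implicit transactions.
    conn = sqlite3.connect(":memory:", isolation_level=isolation_level)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, created TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(i, "2008-09-22") for i in range(1000)])
    conn.commit()
    cursor = conn.cursor()
    start = time.perf_counter()
    for x in range(1000):
        cursor.execute("SELECT created FROM t WHERE id = ?", (x,))
        cursor.fetchall()
    conn.commit()
    return time.perf_counter() - start

t_autocommit = run(None)
t_default = run("")
print("autocommit:", t_autocommit, "default:", t_default)
```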

Also this observation matches psycopg2 docs[1]:

> `ISOLATION_LEVEL_READ_COMMITTED` This is the default value. A new
> transaction is started at the first `.execute()` command on a cursor
> and at each new `.execute()` after a `.commit()` or a `.rollback()`.

Which conflicts with what 3460 states[2]:

> The difference between autocommit and read committed is that read
> committed in psycopg2 puts all statements inside a BEGIN/END block
> (or BEGIN/ROLLBACK or BEGIN/COMMIT).

Jack, can you comment?

[1]: http://www.initd.org/svn/psycopg/psycopg2/trunk/doc/extensions.rst
[2]: http://code.djangoproject.com/ticket/3460

meta...@gmail.com

Sep 22, 2008, 9:52:02 AM
to Django developers
> I don't want to get into the argument about ticket 3460 itself but I
> just don't get this paragraph... What is special in web applications
> that makes them not require transactions? And more, how can transactions --
> a mechanism for preventing the DB from ending up in an inconsistent state --
> get you into error cases?

Web applications tend to be read heavy. Transactions are still needed;
I never claimed they weren't. But most queries do not need to be part
of a transaction to maintain consistency. For example:

SELECT username FROM auth_user

needs no BEGIN/COMMIT pair. Adding one will still work, but causes a
fair amount of overhead on top of the normal query.


The weird error case occurs because the developer has not explicitly
stated they want a transaction, and yet they find they are in one.
When an error occurs in a transaction, you must rollback and start a
new one. When an error occurs outside a transaction, it is enough to
retry. This is where the "helpful" DB-API abstraction starts to
leak. It doesn't shield you from these two different error recovery
scenarios, as Richard found out.

jack.

meta...@gmail.com

Sep 22, 2008, 10:03:26 AM
to Django developers
> When it comes to overhead... As far as I know PostgreSQL in autocommit
> mode will wrap each statement (even SELECT) in an implicit transaction.

So what? Should we wrap it in 5 more just for giggles? Each one adds
overhead.

> > While you may think that you are not using transactions for singleton
> > read-only SELECT statement, in fact every single statement in
> > PostgreSQL is in a transaction. In the absence of an explicit
> > transaction, the statement itself is an implicit transaction.

I think the terminology is confused here. The "implicit transaction"
I talked about in DB-API is that the database connector actually adds
a BEGIN/END pair. What Josh is saying here is that each statement has
significant round trip overhead, not that PostgreSQL adds BEGIN/END
pairs. We actually know that it doesn't, since the error cases inside
a BEGIN/COMMIT or BEGIN/END are different than when not inside a
transaction.

Of course the SQL specification requires that single statements be
consistent, which is something like a transaction, but this is true of
mysql as well.

> So what Django does now (and what DB API suggests) by starting and
> explicit transaction up to the first save():
>
> BEGIN;
> SELECT ...
> UPDATE ...
> COMMIT;
>
> is actually more efficient then equivalent in auto-commit mode:
>
> implicit begin;
> SELECT ...
> implicit commit;
> implicit begin;
> UPDATE ...
> implicit commit;

Incorrect. There is no implicit begin/commit in postgresql.
PostgreSQL treats a single statement similarly to a transaction, but
it does not add BEGIN/COMMIT, which would cause further overhead for
single statements.

Look, I'm not arguing from theory here. When we hit the wall on
database performance we did a lot of analysis on where the hit came
from. Some 30% or more of our database time was just in useless BEGIN
and COMMIT statements that were wrapping SELECTS. This is a real
problem, regardless of any extra behavior of PostgreSQL.


> So if this ticket should be fixed then only for consistency with other
> DBs, not for performance reasons. But I actually think that performance
> is more important here. It's not hard to do explicit rollback anyway in
> your view code if you're recovering from an exception.

Django with #3460 is _significantly_ faster when the query load is
high. If I ran stock Django on Chesspark, the database overhead would
cripple the site.

I didn't just poke around in the Django internals trying to fix
supposed problems. I found these problems because I was explicitly
trying to fix certain performance issues. I used a query analyzer to
find them. You won't find them from the django side because it
doesn't actually show you all the queries that happen (it masks all
the overhead queries).

jack.

meta...@gmail.com

Sep 22, 2008, 10:11:40 AM
to Django developers

> It will also leave things vulnerable to having only some of your objects
> updated, some of them deleted, some of them saved with no way to restore
> to a known state. This proposed change is a bit more complicated than
> the patches in #3460, since it's kind of a good thing to preserve
> integrity of operations for things like saving, deleting and updating.
> All of those involve more than one SQL operation in the general case.

I don't object to you wanting more work on #3460. You are probably
right that it isn't perfect.

However, Django also misuses transactions in other cases (exactly the
ones you mention). A transaction does not guarantee consistency of
multiple operations, at least not in the default READ COMMITTED
transaction mode. Django is written assuming that the database
transaction mode is SERIALIZABLE, which is basically never true (someone
would have to explicitly configure postgresql for this, and I'm
guessing no one actually does aside from places like banks).

READ COMMITTED mode means that each statement sees a consistent view
of the database up to and including any transactions that were
committed _while the current one was executing_. This means a
transaction like:

BEGIN;
SELECT username, password FROM auth_user WHERE id = 1;
UPDATE auth_user SET username = 'foo', password = 'bar' WHERE id = 1;
COMMIT;

will do the wrong thing. Note that this is basically how Django does
all its attribute updates. Let's assume that the Django code that
generated this was

user.password = 'bar'
user.save()

The problem here is that the SELECT statement will read all
attributes, and before UPDATE is executed, another transaction may
have changed the username column. Now the UPDATE actually reverts
such a change because Django does not keep track of which attributes
have changed. (Collin advocated very strongly for this patch in
Portland, but it was also not included in 1.0). So here Django
actually causes data loss due to the race condition, even though
transactions are used.
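Jack's lost-update scenario can be reproduced deterministically with two connections to one database. This sketch uses sqlite3 in autocommit mode as a stand-in, with illustrative table contents, and interleaves the sessions sequentially since no real concurrency is needed to show the problem:

```python
import sqlite3

# Two sessions sharing one in-memory database.
uri = "file:racedemo?mode=memory&cache=shared"
conn1 = sqlite3.connect(uri, uri=True, isolation_level=None)
conn2 = sqlite3.connect(uri, uri=True, isolation_level=None)

conn1.execute("CREATE TABLE auth_user "
              "(id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
conn1.execute("INSERT INTO auth_user VALUES (1, 'alice', 'old')")

# Session 1 reads the whole row, as an ORM that tracks no dirty fields would.
username, password = conn1.execute(
    "SELECT username, password FROM auth_user WHERE id = 1").fetchone()

# Meanwhile session 2 changes the username and commits.
conn2.execute("UPDATE auth_user SET username = 'bob' WHERE id = 1")

# Session 1 writes back *all* columns from its stale snapshot,
# even though only the password was meant to change.
conn1.execute("UPDATE auth_user SET username = ?, password = ? WHERE id = 1",
              (username, 'new'))

final = conn1.execute(
    "SELECT username, password FROM auth_user WHERE id = 1").fetchone()
print(final)  # ('alice', 'new') -- session 2's change was silently reverted
```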

So please don't assume you're safe just because you are in a
transaction. READ COMMITTED mode clearly does not work as Django
database developers imagined. To me this is a far more serious bug
than #3460, and I'm surprised it is not causing problems for others.
It took us a long time to track this one down.

jack.

Michael Radziej

Sep 22, 2008, 10:18:47 AM
to django-d...@googlegroups.com
On Mon, Sep 22, meta...@gmail.com wrote:

> Django with #3460 is _significantly_ faster when the query load is
> high. If I ran stock Django on Chesspark, the database overhead would
> cripple the site.

I'd really like to see a kind of benchmark that I can try out to check your measurements.
A previous poster found your claims to be not justified.

I'm very curious where the overhead comes from. The database more or less
needs to use transactions internally anyway, even for a select, to give you
a consistent view. It cannot know that you get only one (1)
result. So, is the overhead just from parsing the transaction statements?

Michael

--
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100
http://www.noris.de - The IT-Outsourcing Company

Vorstand: Ingo Kraupa (Vorsitzender), Joachim Astel, Hansjochen Klenk -
Vorsitzender des Aufsichtsrats: Stefan Schnabel - AG Nürnberg HRB 17689

meta...@gmail.com

unread,
Sep 22, 2008, 10:21:00 AM9/22/08
to Django developers
> I've done a quick non-scientific test. Here's the code:
> ...
> So the default level is indeed faster but I have to confess I'm a lousy
> tester so I encourage anybody to conduct their own experiment.

I appreciate the test case here. Note that we were in the millions of
queries a day before this became noticeable. The overhead is a
handful of microseconds on an individual call. For the average Django
user, I think the error case that Richard ran into is the more
compelling reason. But certainly performance is also improved.

> Also this observation matches psycopg2 docs[1]:
>
> > `ISOLATION_LEVEL_READ_COMMITTED` This is the default value.  A new
> > transaction is started at the first `.execute()` command on a cursor
> > and at each new `.execute()` after a `.commit()` or a `.rollback()`.
>
> Which conflicts with what 3460 states[2]:
>
> > The difference between autocommit and read committed is that read
> > committed in psycopg2 puts all statements inside a BEGIN/END block
> > (or BEGIN/ROLLBACK or BEGIN/COMMIT).

Ok, I should have been a little more clear here. If all your
statements are non-transactional (say SELECTs in Django), then my
statement is true. If Django does a multiple part statement, it will
be inside a single BEGIN/ROLLBACK or BEGIN/COMMIT pair. However, most
queries from Django are singletons, and singletons are always wrapped
in BEGIN/COMMIT, etc pairs.
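The same wrapping behaviour can be observed with Python's own sqlite3 driver, which, like psycopg2 in its default isolation level, implicitly opens a transaction before DML and leaves it open until commit() or rollback(); passing isolation_level=None gives true autocommit. This is a generic illustration of the singleton-wrapping point, not psycopg2 itself:

```python
import sqlite3

# Default mode: the driver implicitly issues BEGIN before the INSERT,
# so even a "singleton" statement lives inside a transaction.
wrapped = sqlite3.connect(":memory:")
wrapped.execute("CREATE TABLE t (x INTEGER)")
wrapped.execute("INSERT INTO t VALUES (1)")
print(wrapped.in_transaction)  # True -- still inside the implicit BEGIN

# Autocommit mode: each statement takes effect immediately, no wrapping.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (x INTEGER)")
auto.execute("INSERT INTO t VALUES (1)")
print(auto.in_transaction)  # False -- nothing left open
```

With psycopg2 the equivalent switch is `conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)`.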

Does that make sense now?

jack.

James Bennett

unread,
Sep 22, 2008, 10:34:12 AM9/22/08
to django-d...@googlegroups.com
On Mon, Sep 22, 2008 at 9:11 AM, meta...@gmail.com <meta...@gmail.com> wrote:
> However, Django also misuses transactions in other cases (exactly the
> ones you mention). A transaction does not guarantee consistency of
> multiple operations, at least not in the default READ COMMITTED
> transaction mode. Django is written assuming that the database
> transaction mode is SERIALIZED, which is basically never true (someone
> would have to explicitly configure postgresql for this, and I'm
> guessing no one actually does aside from places like banks).

I didn't see anywhere that Malcolm claimed a transaction will
magically make concurrency issues go away all on its own, and I'm a
bit put off by the fact that you're here:

1. Overloading the word "integrity" with multiple meanings and
choosing which one you want to use at a given moment, and
2. Doing a lot of dancing from one issue to another (one moment you're
talking about BEGIN/COMMIT causing overhead, the next you're arguing
about transaction isolation).

Can we figure out what this thread is about, and stick to it, please?


--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Richard Davies

unread,
Sep 22, 2008, 1:13:43 PM9/22/08
to Django developers
> Can we figure out what this thread is about, and stick to it, please?

As the person who originally started the thread ;-)

I'd like to agree a design for a patch which will be accepted into
Django trunk and will enable us to close the #3460 ticket.

The current #3460 patch needs more work. I've proposed a route forward
http://groups.google.com/group/django-developers/msg/48500109ac5e514d
and I'd much appreciate people (especially Malcolm, Jack, Collin, but
others welcome too) replying to that message with their comments.


There is an interesting side-discussion between Ivan/me/Jack/Michael,
about whether the underlying Postgresql change actually improves
performance. It would be great if someone can provide a simple test
case demonstrating that it does. However, that's not the only reason
to want the patch, so we should work towards a final patch anyway, and
can benchmark that (on the test suite, say) once it's ready for check
in.

Cheers,

Richard.

Michael Radziej

unread,
Sep 23, 2008, 5:28:12 AM9/23/08
to django-d...@googlegroups.com
On Mon, Sep 22, Richard Davies wrote:

>
> > Can we figure out what this thread is about, and stick to it, please?
>
> As the person who originally started the thread ;-)
>
> I'd like to agree a design for a patch which will be accepted into
> Django trunk and will enable us to close the #3460 ticket.

There is no way to agree about the "#3460 ticket", because the ticket does
not really state what you want to solve, but how you want to change Django.
It's like going to the dentist and telling him to extract 5 left top (or
however this terminology works). Well ... in the role of the dentist, we're
asking you: Where does it really hurt?

The ticket and the following discussion claim that:

(a) Django uses the wrong transaction isolation level

(b) it's inefficient that there are so many BEGIN/COMMITs

(c) PostgreSQL does not allow any further statements after an
error until you end the transaction


Now, please, pick only one pain, and try not to confuse us with too much
stuff at once. We can deal with the other teeth later.

Malcolm Tredinnick

unread,
Sep 23, 2008, 7:01:46 AM9/23/08
to django-d...@googlegroups.com

On Mon, 2008-09-22 at 07:11 -0700, meta...@gmail.com wrote:
[...]

I've snipped the explanation of serialization levels. That's known
stuff. It's also pretty much irrelevant to this discussion, but since
you insist, I've addressed why the current behaviour is also a valid
possibility, below.

You completely missed my point that if there's some integrity or other
database error in the sequence of multiple updates or deletes (and when
Django does updates to multiple tables or deletes, it extracts the id
values first), we can currently roll back to the start of the sequence
and people don't have half-update or half-deleted sequences of objects.
Straight auto-commit behaviour loses that unless there are a lot more
changes to the code than any patch I've seen currently.

> This means that a
> transaction like:
>
> BEGIN;
> SELECT username, password FROM auth_user WHERE id = 1;
> UPDATE auth_user SET username = 'foo', password = 'bar' WHERE id = 1;
> COMMIT;
>
> will do the wrong thing. Note that this is basically how Django does
> all its attribute updates.

You say "wrong", but relating that to how Django's save() works and
saying that therefore loading/saving is wrong is drawing an unwarranted
conclusion. Django's saving works exactly as intended with the current
code: all the models attribute values at the time of saving are stored
into the database. Sure, you'd like the option to only have the changed
values saved, which is something the rest of us are certainly interested
in implementing as well. Again, it probably won't be the default, since
it would change behaviour in ways that are almost impossible to detect
for thousands of users, but as an option it's not a bad idea. I have a
couple of different implementations of save-only-changes code floating
around that I've been thinking about (incorporating the various attempts
in the ticket addressed to this) and I'm sure something will land before
1.1.

> Let's assume that the Django code that
> generated this was
>
> user.password = 'bar'
> user.save()
>
> The problem here is that the SELECT statement will read all
> attributes, and before UPDATE is executed, another transaction may
> have changed the username column. Now the UPDATE actually reverts
> such a change because Django does not keep track of which attributes
> have changed.

In other words, the object doing the second write wrote the values it
had. Once you accept that saving a model writes out all the attribute
values, this is pretty easy to understand. I agree we could document
this more clearly, but it's hardly mysterious behaviour. (If anyone says
"I assumed differently", that doesn't make it mysterious. It means that
making assumptions is usually a bad idea.)

> (Collin advocated very strongly for this patch in
> Portland, but it was also not included in 1.0).

If you look at a calendar, you shouldn't be at all surprised by this.
The feature freeze date was weeks before the Portland sprint. Again,
this is an option for alternative behaviour that you're talking about,
not a no-brainer bug in Django. Acknowledge that there's a second side
to this, please.

> So here Django
> actually causes data loss due to the race condition, even though
> transactions are used.

It's not data loss if the intended mode is that the save() call writes
out everything. That works correctly and reliably under the given
hypothesis.

> So please don't assume you're safe just because you are in a
> transaction.

As mentioned elsewhere in other messages in this thread, you're
overloading the term "safe" here and it's taking the thread way off
track.

All through this conversation you are continually failing to acknowledge
that there is an alternate side to these technical issues, as with all
non-trivial changes. At no point is anybody saying that things like
auto-commit shouldn't be possible. But it's not "the wrong serialisation
level" or anything like that. It's one way of working that happens to be
different from what Django makes possible at the moment.

Every single non-trivial decision has both advantages and disadvantages.
Trade-offs are all over the place and to successfully work out what to
do requires a proper examination of both sides.

> READ COMMITTED mode clearly does not work as Django
> database developers imagined.

Really?! This is just getting silly. Your mind-reading qualification is
from which university? I cannot speak for any other developer (since I
sadly don't have such a qualification), but read committed works
exactly as I understand it -- no imagination involved, since it's been
pretty well documented for years in PostgreSQL. It can produce
non-repeatable reads but doesn't allow uncommitted reads.

You can foam all you like about how incompetent all the developers are
(and that's really what this is coming down to, since nothing at all
here is really new information), but that simply isn't the case. It's
simply unprofessional, as maintainers and contributors (i.e. anybody
contributing to any thread on django-dev) to go jumping around saying
"it's all wrong, we must change", when a bit of thought shows that it
isn't all wrong. The current behaviour does make
sense in a number of situations. Alternate behaviour also makes sense
(and that's never been in doubt), so creating extra possibilities
without harming existing code -- which is why changing the default is a
separate issue and something I'm pretty strongly against -- requires
examining why and how the current behaviour does what it does and what
the consequences for everybody would be if that changed.

Regards,
Malcolm

Richard Davies

unread,
Sep 23, 2008, 7:05:30 AM9/23/08
to Django developers
Michael Radziej wrote:
> There is no way to agree about the "#3460 ticket", because the ticket does
> not really state what you want to solve, but how you want to change Django.
...
> Now, please, pick only one pain

I certainly agree that we've been all over the place on this thread!
And my thinking has changed during the course of the discussion.

I now think that there are several people each feeling different pains
(I was c, I think Jack was originally mostly b), and each believing
that their particular pain would be solved by the same technical
feature for Django. I'll call this feature 'native autocommit' - i.e.
getting a connection from the database backend using the autocommit
isolation level, and then creating explicit transaction blocks only
when needed.

What I'd like to do is step back from the pains for now, and produce a
patch which adds an optional 'native autocommit' to the Django
Postgresql backends. We'll see later which and how many of the pains
it fixes, rather than arguing now. If it's very successful, maybe it
will become the default behaviour for Postgresql and be implemented
for other databases. If not, then at least the people who want it can
simply turn the option on, rather than having to maintain out-of-trunk
patches.

Hopefully this is in the spirit of Malcolm Tredinnick's comment:
> I think it'd be a good idea to make sure we expose the ability to turn on
> auto-commit, but I don't really like making it the default. In any case,
> providing the ability that can be controlled via, for example, a setting
> is certainly the first step here. That's pretty independent of whatever
> the default might end up being.

So, to reiterate. I'm thinking in terms of a new optional 'native
autocommit' feature for the Django Postgresql backends. When the user
chooses this option, the database connection which Django receives
from the backend should be in "raw" autocommit mode, not wrapped in a
transaction by the driver/backend, and Django would create explicit
transaction blocks on top of this connection only when needed. I
believe that this feature will require both:
- Changes to the Postgresql backends to optionally return the
autocommit connection
- Changes elsewhere in Django (mostly db/transaction.py, I think),
which notice when an autocommit connection is in use and then create
explicit transaction blocks on top of this connection only when needed
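A rough sketch of what "explicit transaction blocks on top of an autocommit connection" could look like (sqlite3 stands in for the backend here, and the names are hypothetical, not the proposed Django API): outside a block, a failed statement does not poison later ones, which is the original bug from the top of this thread, while inside a block everything still commits or rolls back atomically.

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:", isolation_level=None)  # native autocommit
conn.execute("CREATE TABLE test (data1 INTEGER UNIQUE, data2 INTEGER)")

@contextmanager
def transaction(conn):
    # Explicit BEGIN/COMMIT only when the caller asks for it.
    conn.execute("BEGIN")
    try:
        yield
    except Exception:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")

# Autocommit: the duplicate insert fails, but the connection stays usable.
conn.execute("INSERT INTO test VALUES (1, 10)")
try:
    conn.execute("INSERT INTO test VALUES (1, 20)")   # violates UNIQUE
except sqlite3.IntegrityError:
    pass
conn.execute("INSERT INTO test VALUES (2, 30)")       # succeeds; no aborted transaction

# Explicit block: both statements commit or neither does.
try:
    with transaction(conn):
        conn.execute("INSERT INTO test VALUES (3, 40)")
        conn.execute("INSERT INTO test VALUES (1, 50)")  # fails, rolls back row 3 too
except sqlite3.IntegrityError:
    pass

remaining = sorted(r[0] for r in conn.execute("SELECT data1 FROM test"))
print(remaining)  # [1, 2]
```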

Please can we drop the discussion of pains for now, and just talk
about how this new optional feature should best be implemented? My
proposal is at http://groups.google.com/group/django-developers/msg/48500109ac5e514d
- please read it mentally replacing 'ticket: 3460' with 'feature:
option for native autocommit', then let's discuss!

Cheers,

Richard.

meta...@gmail.com

unread,
Sep 23, 2008, 12:32:09 PM9/23/08
to Django developers
> You completely missed my point that if there's some integrity or other
> database error in the sequence of multiple updates or deletes (and when
> Django does updates to multiple tables or deletes, it extracts the id
> values first), we can currently roll back to the start of the sequence
> and people don't have half-update or half-deleted sequences of objects.
> Straight auto-commit behaviour loses that unless there are a lot more
> changes to the code than any patch I've seen currently.

We're all agreed that #3460 needs to support explicit Django
transactions. I never intended otherwise. I always thought I had
added this to the patch, but apparently that didn't happen. I never
advocated removing transactions from Django altogether. I only want
things that aren't explicitly in a transaction to really not be in a
transaction. Richard has taken up the flag here, and I'll let him run
with it.

> You say "wrong", but relating that to how Django's save() works and
> saying that therefore loading/saving is wrong is drawing an unwarranted
> conclusion.

Django's saving can trample data unintentionally, and indeed, we've
seen this in production. Trampling data is an extremely hard to find
problem since it amounts to a race condition. Since this could be
easily avoided in several different ways, I consider this
implementation incorrect and therefore wrong. From an outside
perspective it appears that Django's "mental" model of a transactional
database is a database in serialized mode. If put in serialized mode,
Django doesn't trample data and is fine.

> Django's saving works exactly as intended with the current
> code: all the models attribute values at the time of saving are stored
> into the database. Sure, you'd like the option to only have the changed
> values saved, which is something the rest of us are certainly interested
> in implementing as well. Again, it probably won't be the default, since
> it would change behaviour in ways that are almost impossible to detect
> for thousands of users, but as an option it's not a bad idea. I have a
> couple of different implementations of save-only-changes code floating
> around that I've been thinking about (incorporating the various attempts
> in the ticket addressed to this) and I'm sure something will land before
> 1.1

I don't understand this mysterious use case for rewriting unchanged
values to the database. It seems to me that you are saying that a
patch to fix this data trampling bug might introduce new bugs, and so
it should be optional. If there is a use case for rewriting back read
values that don't change, please state it. Otherwise, I think making
this an optional fix is silly. I'm completely surprised that anyone
on the Django team would ship a framework that can trample data when
the issue has been known about for over a year. Clearly I am missing
some piece of information.

> In other words, the object doing the second write wrote the values it
> had. Once you accept that saving a model writes out all the attribute
> values, this is pretty easy to understand. I agree we could document
> this more clearly, but it's hardly mysterious behaviour (If any one says
> "I assumed differently", that doesn't make it mysterious. It means that
> making assumptions is usually a bad idea.)

This is not something you can work around. You just have to accept
that data will get trampled and that arbitrary other code can undo any
database changes you make. This isn't really acceptable, especially
with easy solutions available.

> If you look at a calendar, you shouldn't be at all surprised by this.
> The feature freeze date was weeks before the Portland sprint. Again,
> this is an option for alternative behaviour that you're talking about,
> not a no-brainer bug in Django. Acknowledge that there's a second side
> to this, please.

I will acknowledge that you are claiming there is a second side, but
you have not explicitly stated what that is. I have given pretty
detailed scenarios for how these bugs affect us and how they can be
fixed. Others have done the same and offered similar fixes. All I've
seen so far is hand waving that the current way is fine and that it
doesn't need fixing, but that our fixes may be added as non-default
options at some nebulous later date.

So, without some supporting evidence, it seems to some of us that this
is a clear and simple bug in Django with easy solutions (ours was
dirty tracking, but SELECT FOR UPDATE also works).
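A minimal sketch of the dirty-tracking idea (the class and names here are hypothetical, not the actual Chesspark patch or Django's API): record which attributes changed since load, and have save() emit an UPDATE touching only those columns, so a concurrent change to an untouched column survives.

```python
import sqlite3

class DirtyTrackingRow:
    """Tracks attribute changes so save() writes only dirty columns."""
    def __init__(self, conn, table, pk, **fields):
        object.__setattr__(self, "_meta", (conn, table, pk, set()))
        for name, value in fields.items():
            object.__setattr__(self, name, value)   # initial load: not dirty

    def __setattr__(self, name, value):
        self._meta[3].add(name)                     # mark column dirty
        object.__setattr__(self, name, value)

    def save(self):
        conn, table, pk, dirty = self._meta
        if not dirty:
            return
        cols = sorted(dirty)
        sql = "UPDATE %s SET %s WHERE id = ?" % (
            table, ", ".join("%s = ?" % c for c in cols))
        conn.execute(sql, [getattr(self, c) for c in cols] + [pk])
        dirty.clear()

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT, password TEXT)")
conn.execute("INSERT INTO auth_user VALUES (1, 'alice', 'old')")

user = DirtyTrackingRow(conn, "auth_user", 1, username="alice", password="old")
conn.execute("UPDATE auth_user SET username = 'alice2' WHERE id = 1")  # concurrent rename
user.password = "bar"
user.save()  # UPDATE touches only password; username is left alone
final = conn.execute("SELECT username, password FROM auth_user WHERE id = 1").fetchone()
print(final)  # ('alice2', 'bar') -- the concurrent rename survives
```

The alternative fix mentioned above, SELECT FOR UPDATE, instead locks the row for the duration of the read-modify-write, at the cost of holding a lock.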

> >   So here Django
> > actually causes data loss due to the race condition, even though
> > transactions are used.
>
> It's not data loss if the intended mode is that the save() call writes
> out everything. That works correctly and reliably under the given
> hypothesis.

Since there is no way to detect that you will blow away data this is
data loss. This is not something that the application programmer can
work around except by serializing postgresql or having a global
database lock for all Django application code (and probably any other
database using code). The whole point of a transactional RDBMS is to
deal with problems just like this one.

> > So please don't assume you're safe just because you are in a
> > transaction.
>
> As mentioned elsewhere in other messages in this thread, you're
> overloading the term "safe" here and it's taking the thread way off
> track.

You (and James Bennett) keep harping on this semantic game. "You're
overloading the meanings of X or Y!". My position is clear. I do not
consider any framework "safe" which includes a race condition which
can result in blowing away my updates. Django does this. Ergo,
Django is not "safe". Perhaps you want me to call it "data-safe" or
include a glossary of terms at the beginning of all e-mails?

> All through this conversation you are continually failing to acknowledge
> that there is alternate side to these technical issues, as with all
> non-trivial changes. At no point is anybody saying that things like
> auto-commit shouldn't be possible. But it's not "the wrong serialisation
> level" or anything like that. It's one way of working that happens to be
> different from what Django makes possible at the moment.

These are real bugs. Data loss is not trivial. Please state the case
for the alternative side, because clearly I have missed this. So far
this seem to be "we might break something fixing these bugs". Or
maybe it's "some of our users depend on these bugs".

> Every single non-trivial decision has both advantages and disadvantages.
> Trade-offs are all over the place and to successfully work out what to
> do requires a proper examination of both sides.

Agreed. I have clearly stated the advantages of fixing this data loss
issue and issues with auto-commit. I have yet to hear anyone give a
good reason for not having these things. I am not a Django core
developer and fully admit I may not know the disadvantages. Please
enlighten me so we can stop yelling at each other and make the
software better for everyone else.

> You can foam all you like about how incompetent all the developers are
> (and that's really what this is coming down to, since nothing at all
> here is really new information), but that simply isn't the case.

I never said any of you were incompetent. Running a project of this
size (and quality) is hard. You will make mistakes. Man up and take
the criticism when you get it instead of going into uber-defensive
mode. I'm not perfect either and I'm sure you can find stupid bugs
in my software as well.

> The current behaviour does make
> sense in a number of situations.

Again, I ask: such as? Aside from people depending on the
different error semantics of implicit transactions, what use case is there
for not fixing these issues (as opposed to adding the fixes as non-default
options)? In what case does someone need the extraneous writes in
Django's save() implementation, which can undo previous updates?

jack.

meta...@gmail.com

unread,
Sep 23, 2008, 12:41:42 PM9/23/08
to Django developers
> I think that we
> should try to write a patch which a) makes 3460 a configuration option
> as suggested, and b) supports transactions on top of the autocommit
> connection when explicitly requested (3460 currently doesn't, so parts
> of the test suite currently fail when it is applied).

+1 from me. I do hope that this becomes the default behavior in the
future.

> I'd appreciate people's thoughts on the following proposal of how to
> do this. Particularly Malcolm, Jack and Collin since they seem the
> main contributors to date.
>
> 1) Mechanism to make it a configuration option
>
> Various people on the 3460 ticket tracker have suggested using [6064],
> but that isn't yet in trunk, and also wasn't generally agreed on the
> tracker (e.g. does not allow custom Python). So, let's not do that.

I think using custom python shifts the problem to the user, which is
not great. Considering how much debate there has been on this issue,
how can we expect users to understand when this custom python would be
needed? I suspect people will be loath to change the defaults unless
they know why they should.

> We need two things
> - a way to create the backend in autocommit mode: I suggest a flag
> 'native_autocommit' for DATABASE_OPTIONS
> - a way for the running Django to tell that the backend is in
> autocommit mode: I suggest a flag 'uses_native_autocommit' for
> BaseDatabaseFeatures
>
> In both cases I'm deliberately choosing mechanisms and words so that
> other backends could also later be changed to support an autocommit
> mode.

Seems reasonable to me.

> 2) Transaction support when explicitly requested
>
The current 3460 patch leaves a Django which does not support any
> transaction behavior other than autocommit (leading to various test
> suite failures).

This was an omission on my part. I made similar patches to Twisted
Python's adbapi which did redo the transaction behavior as well, and I
always thought my patch for #3460 had this. It was not my intention
that Django should abandon transactions, and it seems that this
misunderstanding was the cause of some of the grief.

Regarding your fixing of transactions:

This is exactly why I've been advocating for using autocommit and
doing away with implicit transactions. It's far simpler to understand
and requires less code (ie, transactions only happen when Django makes
them happen, and only one transaction class is needed).

If it must be an option, then there will likely need to be multiple
transaction backends, but it has been a while since I looked at the
implementation details here, and Django 1.0 has changed significantly
anyway (or perhaps in this specific area not?), so I can't offer much
advice here. What you've suggested sounds reasonable.

jack.

Michael Radziej

unread,
Sep 23, 2008, 2:07:39 PM9/23/08
to django-d...@googlegroups.com
On Tue, Sep 23, meta...@gmail.com wrote:

>
> > You completely missed my point that if there's some integrity or other
> > database error in the sequence of multiple updates or deletes (and when
> > Django does updates to multiple tables or deletes, it extracts the id
> > values first), we can currently roll back to the start of the sequence
> > and people don't have half-update or half-deleted sequences of objects.
> > Straight auto-commit behaviour loses that unless there are a lot more
> > changes to the code than any patch I've seen currently.
>
> We're all agreed that #3460 needs to support explicit Django
> transactions. I never intended otherwise. I always thought I had
> added this to the patch, but apparently that didn't happen. I never
> advocated removing transactions from Django altogether. I only want
> things that aren't explicitly in a transaction to really not be in a
> transaction. Richard has taken up the flag here, and I'll let him run
> with it.

Yeah, but you propose to make autocommit mode the default mode,
i.e. an incompatible change that can affect existing applications in a very
subtle way that could be very hard for some users to find.

>
> > You say "wrong", but relating that to how Django's save() works and
> > saying that therefore loading/saving is wrong is drawing an unwarranted
> > conclusion.
>
> Django's saving can trample data unintentionally, and indeed, we've
> seen this in production. Trampling data is an extremely hard to find
> problem since it amounts to a race condition. Since this could be
> easily avoided in several different ways, I consider this
> implementation incorrect and therefore wrong. From an outside
> perspective it appears that Django's "mental" model of a transactional
> database is a database in serialized mode. If put in serialized mode,
> Django doesn't trample data and is fine.

So, is ticket #3460 about the lost update problem, i.e. one user saves data
and another user overwrites it unintentionally? Or, is it about performance?
About transaction handling in general? About postgresql insisting to roll
back a transaction after an error?

If you want to solve the lost update problem, fine. Let's discuss it. But,
please, make up your mind what issue you want to solve. And then let's find
out how to solve it, and don't assume a priori that autocommit transactions
are necessarily the solution (which they clearly are *not* if applied alone,
and not even when save() updates only modified data).

We really need you and Richard to agree on an issue before it makes sense
to continue this thread.

Ludvig Ericson

unread,
Sep 23, 2008, 9:27:34 PM9/23/08
to django-d...@googlegroups.com
On Sep 23, 2008, at 18:32, meta...@gmail.com wrote:
> Django's saving can trample data unintentionally, and indeed, we've
> seen this in production. Trampling data is an extremely hard to find
> problem since it amounts to a race condition. Since this could be
> easily avoided in several different ways, I consider this
> implementation incorrect and therefore wrong. From an outside
> perspective it appears that Django's "mental" model of a transactional
> database is a database in serialized mode. If put in serialized mode,
> Django doesn't trample data and is fine.
> [repeat message ten times.]

The benefit is having a known state of single objects in the database.

Not a great benefit, I agree - but it's not wrong, or even unexpected.
You tell a model to save itself. It does so.

Even if you perceive this as misbehavior, wrong, ill-written and
whatnot, keep it to yourself. From an objective point of view, it is an
implementation solving the problem with storage of objects in a
database. It might not do so in a way that accommodates your needs, nor
mine to be quite frank - but I accept this as an implementation.

It'd seem the discussion has taken a turn to semantics, and thus this
message.

Richard Davies

unread,
Sep 29, 2008, 6:25:14 AM9/29/08
to Django developers
I'm going to make one final post in this thread, since we didn't quite
reach a conclusion.

Malcolm Tredinnick wrote:
> As I mentioned to Collin at the code sprint in Portland, I think it'd be
> a good idea to make sure we expose the ability to turn on auto-commit,
> but I don't really like making it the default. In any case, providing
> the ability that's can be controlled via, for example, a setting is
> certainly the first step here. That's pretty independent of the whatever
> the default might end up being. That's really the improvement needed to
> #3460 at the moment -- separating adding functionality from changing the
> default.

Richard Davies wrote:
> I think that we should try to write a patch which a) makes 'native
> autocommit' a configuration option as suggested, and b) supports
> transactions on top of the 'native autocommit' connection when explicitly
> requested (the current 3460 patch doesn't, so parts of the test suite
> currently fail when it is applied).

Metajack wrote:
> +1 from me

So, do we agree that it makes sense for me to take the current 3460
patch and complete it in the manner which I described in
http://groups.google.com/group/django-developers/msg/48500109ac5e514d
? I'd particularly like Malcolm to confirm if my description there
matches his discussion with Collin.

If we're agreed, then I'll take over ownership of that ticket from
Collin (unless you want to do it, Collin?), and I'll add 'optional
native autocommit' to the Django 1.1 features list.

Cheers,

Richard.

----

One technical point:

When I started writing in this thread, I hadn't appreciated that
rollback() and rollback_unless_managed() are used by Django outside
explicit user-started transaction management in order to attempt to
keep things consistent after failures. How do people think that we
should handle these when 'native autocommit' is on? The options seem
to be:
a) Make them no-ops and accept the failures
b) Work out which points they roll back to, and add explicit 'BEGINs'
at those points
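Option (b) might look roughly like this sketch (sqlite3 stands in for the backend, and the function name is hypothetical): before a multi-statement sequence the backend issues an explicit BEGIN, so that rollback_unless_managed still has a real transaction to roll back, while single statements stay in plain autocommit, and a mid-sequence failure leaves nothing half-deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # native autocommit
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (1, 1)")

def delete_with_children(conn, parent_id, fail=False):
    # Explicit BEGIN at the start of the multi-statement sequence, so a
    # failure can roll the whole thing back even in autocommit mode.
    conn.execute("BEGIN")
    try:
        conn.execute("DELETE FROM child WHERE parent_id = ?", (parent_id,))
        if fail:
            raise RuntimeError("simulated failure mid-sequence")
        conn.execute("DELETE FROM parent WHERE id = ?", (parent_id,))
    except Exception:
        conn.execute("ROLLBACK")   # what rollback_unless_managed would do
    else:
        conn.execute("COMMIT")

delete_with_children(conn, 1, fail=True)
counts = (conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0],
          conn.execute("SELECT COUNT(*) FROM child").fetchone()[0])
print(counts)  # (1, 1) -- nothing half-deleted
```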

Malcolm Tredinnick

unread,
Sep 29, 2008, 8:33:41 PM9/29/08
to django-d...@googlegroups.com

On Mon, 2008-09-29 at 03:25 -0700, Richard Davies wrote:
[...]
> So, do we agree that it makes sense for me to take the current 3460
> patch and complete it in the manner which I described in
> http://groups.google.com/group/django-developers/msg/48500109ac5e514d
> ?

I'd be amazed (and fairly disappointed) if two copies of the code in
transaction.py were required. I'm fairly sure it can all be done by
examining the current transaction handling state inside existing
methods. But give it a shot however you like. As usual with patches, no
guarantees apply beforehand as to whether they'll be committed or not
because we often/usually have to look at an implementation before
knowing if it's going to work long-term. However, if I was doing this
(and if no reasonable alternative pops up in the next couple of months I
probably will), I'd be trying very hard to avoid any code duplication at
all. I realise it sounds fuzzy to say "my intuition says", but it looks
like it should be possible to do checks for whether we're in auto-commit
mode or not and have all the existing calls just behave normally --
including allowing manual transaction starts regardless of the mode.

The implementation here will probably turn out to be easy. The design
will be the hard bit.

Regards,
Malcolm
