#13870: Correctly handling database isolation (in PostgreSQL)

719 views
Skip to first unread message

Patryk Zawadzki

unread,
Sep 5, 2010, 7:52:17 AM9/5/10
to django-d...@googlegroups.com
Some time ago I've reported ticket #13870¹. Here's a brief explanation
of the problem:

PostgreSQL has a concept called "isolation levels". These are various
types of meta-transactions.

1. There's level 0 which means "off" and results in ghost reads
(SELECT returning data not yet commited).
2. The most useful level is 1 which prevents ghost-reading uncommited
data yet does not guarantee atomicity (it's possible for two identical
SELECTs to return different results).
3. There's also a third "serializable" level that guarantees full
atomicity but it's not used by Django.

In psycopg(2) the default isolation level is 0 or "no isolation at
all". If enabled, a meta-transaction is designed to work as follows:

1. The first query you execute results in an implicit "BEGIN" being
called internally by PostgreSQL
2. Any consecutive queries are executed in the same transaction
3. You terminate the meta-transaction by calling connection.commit()
or connection.rollback()
4. The first query you execute results in an implicit "BEGIN" being
called internally by PostgreSQL
5. Any consecutive queries are executed in the same transaction
6. You terminate the meta-transaction by calling connection.commit()
or connection.rollback()

...and so on.

The problem is that this is not true for Django. The backend
initializes the meta-transaction at connection time and never bothers
to terminate it. As the transaction is merely a ghost-read-preventing
one, it does not stop data from being saved in the database, but it
does result in a parmanently open transaction. This both wastes
resources on the server and prevents any other client from touching
the database structure (any ALTER or VACUUM command will hang waiting
for the remaining transactions to finish).

The ticket contains a naïve workaround but as described there, I feel
a proper solution would look like this:

1. Introduce IsolationMiddleware that does something like this (pseudocode):

class IsolationMiddleware(object):
def process_request(self, request):
for db in database_connections:
db.enter_isolation_block()
def process_response(self, request, response):
for db in database_connections:
db.leave_isolation_block()

2. Make the middleware default and describe it in the migration guide.

3. Introduce no-op enter_isolation_block() and leave_isolation_block()
that just "pass"

4. Override both methods in the pgsql backends (pseudocode):

def enter_isolation_block(self):
if self.isolation_level:
self.connection.set_isolation_level(self.isolation_level)

def leave_isolation_block(self):
if self.isolation_level and self.connection.get_transaction_status() == \
psycopg2.extensions.TRANSACTION_STATUS_INTRANS:
if self.commited:
self.connection.commit()
else:
self.connection.rollback()

5. Remove code that sets isolation level in the connection initialization code.

¹ http://code.djangoproject.com/ticket/13870

--
Patryk Zawadzki

Kirit Sælensminde (kayess)

unread,
Sep 6, 2010, 5:20:29 AM9/6/10
to Django developers
Strange. We use Postgres and don't see any problem with this. We do
encounter complications occasionally with the lack of composition of
Django transaction handling, but other than that find the transaction
handling adequate.

Are you actually using transactions in your code? You need to be. The
transaction middleware does a good enough job for most things, but for
external processing you will need to carefully design your transaction
handling. With the transaction middleware we see fully isolated
updates until the transaction is committed using psycopg2.


Kirit

On Sep 5, 6:52 pm, Patryk Zawadzki <pat...@pld-linux.org> wrote:
> The problem is that this is not true for Django. The backend
> initializes the meta-transaction at connection time and never bothers
> to terminate it. As the transaction is merely a ghost-read-preventing
> one, it does not stop data from being saved in the database, but it
> does result in a parmanently open transaction. This both wastes
> resources on the server and prevents any other client from touching
> the database structure (any ALTER or VACUUM command will hang waiting
> for the remaining transactions to finish).
>
> The ticket contains a naïve workaround but as described there, I feel
> a proper solution would look like this:
>
> 1. Introduce IsolationMiddleware that does something like this (pseudocode):

[snip]

Patryk Zawadzki

unread,
Sep 6, 2010, 6:20:39 AM9/6/10
to django-d...@googlegroups.com
On Mon, Sep 6, 2010 at 11:20 AM, Kirit Sælensminde (kayess)
<kirit.sae...@gmail.com> wrote:
> Strange. We use Postgres and don't see any problem with this. We do
> encounter complications occasionally with the lack of composition of
> Django transaction handling, but other than that find the transaction
> handling adequate.
>
> Are you actually using transactions in your code? You need to be. The
> transaction middleware does a good enough job for most things, but for
> external processing you will need to carefully design your transaction
> handling. With the transaction middleware we see fully isolated
> updates until the transaction is committed using psycopg2.

I'm not sure you understand the problem at all. The problem is not
lack of the isolation. The problem is permanent isolation. The
isolating transaction is never terminated, thus remaining alive for
indefinite amounts of time.

You can check this by creating a fresh project using psycopg2 and
creating a model. Then write a view that queries the database and
invoke it. Now, leaving the server running, open up your database
shell. The pg_stat_activity table will report "<IDLE> in transaction"
for hours.

BTW: This has nothing to do with Django transactions or TransactionMiddleware.

--
Patryk Zawadzki

bur...@gmail.com

unread,
Sep 6, 2010, 7:35:44 AM9/6/10
to django-d...@googlegroups.com
Hi Patryk,

This was done to make Django faster, so it doesn't create connection
to database every time new SQL is executed.
Do you mean you wanted to set up timeouts for idle database connections?
I guess, nobody asked this before.

> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To post to this group, send email to django-d...@googlegroups.com.
> To unsubscribe from this group, send email to django-develop...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
>
>

--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com

Patryk Zawadzki

unread,
Sep 6, 2010, 7:42:15 AM9/6/10
to django-d...@googlegroups.com
On Mon, Sep 6, 2010 at 1:35 PM, bur...@gmail.com <bur...@gmail.com> wrote:
> Hi Patryk,
>
> This was done to make Django faster, so it doesn't create connection
> to database every time new SQL is executed.

What was done? Commiting the isolating transaction of a connection
does not terminate it.

> Do you mean you wanted to set up timeouts for idle database connections?
> I guess, nobody asked this before.

No, I think I stated quite clearly what my proposal was.

http://initd.org/psycopg/docs/extensions.html#isolation-level-constants

--
Patryk Zawadzki

Robert Gravsjö

unread,
Sep 6, 2010, 8:00:32 AM9/6/10
to django-d...@googlegroups.com

Patryk Zawadzki skrev 2010-09-06 12.20:
> On Mon, Sep 6, 2010 at 11:20 AM, Kirit Sælensminde (kayess)
> <kirit.sae...@gmail.com> wrote:

--- snip ---

>
> You can check this by creating a fresh project using psycopg2 and
> creating a model. Then write a view that queries the database and
> invoke it. Now, leaving the server running, open up your database
> shell. The pg_stat_activity table will report "<IDLE> in transaction"
> for hours.

I'm not sure what you think you are doing but if you end up with "<IDLE>
in transaction" that means you have not commited your transactions.

For instance, open two connections with psql and run BEGIN in one and
then take a look at pg_stat_activity you will have that connection
marked as idle in transaction.

I believe you are confused about isolation levels. They control the
visibility of transactions.

Regards,
roppert

Patryk Zawadzki

unread,
Sep 6, 2010, 9:29:08 AM9/6/10
to django-d...@googlegroups.com
On Mon, Sep 6, 2010 at 2:00 PM, Robert Gravsjö <rob...@blogg.se> wrote:
> I'm not sure what you think you are doing but if you end up with "<IDLE>
>  in transaction" that means you have not commited your transactions.

See below.

> For instance, open two connections with psql and run BEGIN in one and
> then take a look at pg_stat_activity you will have that connection
> marked as idle in transaction.
>
> I believe you are confused about isolation levels. They control the
> visibility of transactions.

Maybe I am indeed confused but quoting the psycopg documentation¹:

"psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED:

This is the default value. A new transaction is started at the first
execute() command on a cursor and at each new execute() after a
commit() or a rollback(). The transaction runs in the PostgreSQL READ
COMMITTED isolation level."

The isolating transaction keeps going on until you either (1) commit,
(2) rollback or (3) disconnect. Django only commits/rollbacks the
transactions it explicitly starts, it does not care about the
implicitly started isolating transaction. That's what results in
"<IDLE> in transaction" and I can reproduce it with a two-line view
that does a simple SELECT with no transaction middleware involved.

The problem only exists when Django sets isolation level to 1, if you
use the deprecated "autocommit" setting, you will not be affected.

¹ http://initd.org/psycopg/docs/extensions.html#isolation-level-constants

--
Patryk Zawadzki

Robert Gravsjö

unread,
Sep 6, 2010, 10:47:55 AM9/6/10
to django-d...@googlegroups.com

Patryk Zawadzki skrev 2010-09-06 15.29:
> On Mon, Sep 6, 2010 at 2:00 PM, Robert Gravsjö<rob...@blogg.se> wrote:
>> I'm not sure what you think you are doing but if you end up with "<IDLE>
>> in transaction" that means you have not commited your transactions.
>
> See below.
>
>> For instance, open two connections with psql and run BEGIN in one and
>> then take a look at pg_stat_activity you will have that connection
>> marked as idle in transaction.
>>
>> I believe you are confused about isolation levels. They control the
>> visibility of transactions.
>
> Maybe I am indeed confused but quoting the psycopg documentation¹:
>
> "psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED:
>
> This is the default value. A new transaction is started at the first
> execute() command on a cursor and at each new execute() after a
> commit() or a rollback(). The transaction runs in the PostgreSQL READ
> COMMITTED isolation level."
>
> The isolating transaction keeps going on until you either (1) commit,
> (2) rollback or (3) disconnect. Django only commits/rollbacks the
> transactions it explicitly starts, it does not care about the
> implicitly started isolating transaction. That's what results in
> "<IDLE> in transaction" and I can reproduce it with a two-line view
> that does a simple SELECT with no transaction middleware involved.

Can you please show me the code you're running to reproduce this?

Regards,
roppert

Patryk Zawadzki

unread,
Sep 6, 2010, 1:12:52 PM9/6/10
to django-d...@googlegroups.com
On Mon, Sep 6, 2010 at 4:47 PM, Robert Gravsjö <rob...@blogg.se> wrote:
> Patryk Zawadzki skrev 2010-09-06 15.29:
>> The isolating transaction keeps going on until you either (1) commit,
>> (2) rollback or (3) disconnect. Django only commits/rollbacks the
>> transactions it explicitly starts, it does not care about the
>> implicitly started isolating transaction. That's what results in
>> "<IDLE>  in transaction" and I can reproduce it with a two-line view
>> that does a simple SELECT with no transaction middleware involved.
> Can you please show me the code you're running to reproduce this?

Right, I misremembered the original problem. I've now found the
testing environment.

The problem is not with regular views but with Celery tasks,
long-running management commands such as daemons and any other place
where you access the ORM from outside of the usual
request→dispatcher→view→response flow. These cases all end up with an
isolating transaction spanning their whole life span. In case of
daemons it results in permanently blocking the database structure (for
example causing "VACUUM FULL" to hang).

It would be more useful if you could explicitly
enter_isolation_block() and leave_isolation_block() as needed
(currently there is no way to commit the isolating transaction other
than doing a raw SQL query or accessing psycopg internals).

Sorry about the confusion, it's not really related to views.

--
Patryk Zawadzki

Jordan

unread,
Sep 6, 2010, 1:09:07 PM9/6/10
to django-d...@googlegroups.com
On Mon, 2010-09-06 at 16:47 +0200, Robert Gravsjö wrote:
> Can you please show me the code you're running to reproduce this?
>
> Regards,
> roppert

I've experienced what Patryk is describing here. It seems that the
Django ORM, when not explicitly in a transaction, doesn't commit or
rollback after each query.

You can reproduce this simply in the dbshell. Go into your favorite
project and try this (replacing `myproject`, `myapp`, and `MyModel`
appropriately):

from myproject.myapp import models
models.MyModel.objects.all()[:1]

Now, in a superuser database connection, check the results of "select *
from pg_stat_activity." You should see the database connection from the
dbshell stuck in the "<IDLE> in transaction" state.

Regards,
Jordan

signature.asc

Gabriel Hurley

unread,
Sep 6, 2010, 10:25:14 PM9/6/10
to Django developers
I ran into this issue with daemons/scheduled jobs and "idle in
transaction" in Django quite a while ago. I have a nice writeup of the
problem and solution on this StackOverflow question if anybody's
interested:

http://stackoverflow.com/questions/1303654/threaded-django-task-doesnt-automatically-handle-transactions-or-db-connections

Ultimately it's a matter of being aware of what Django will, and will
not, do for you. The lack of a request/response cycle from the daemon
means that the usual connection-closing mechanism
[signals.request_finished.connect(close_connection)] never gets
triggered. Being a *web framework* it seems reasonable to me that it's
built around requests/responses...

- Gabriel
>  signature.asc
> < 1KViewDownload

Thomas Guettler

unread,
Sep 7, 2010, 2:29:46 AM9/7/10
to django-d...@googlegroups.com

Patryk Zawadzki wrote:
> On Mon, Sep 6, 2010 at 4:47 PM, Robert Gravsjö <rob...@blogg.se> wrote:
>> Patryk Zawadzki skrev 2010-09-06 15.29:
>>> The isolating transaction keeps going on until you either (1) commit,
>>> (2) rollback or (3) disconnect. Django only commits/rollbacks the
>>> transactions it explicitly starts, it does not care about the
>>> implicitly started isolating transaction. That's what results in
>>> "<IDLE> in transaction" and I can reproduce it with a two-line view
>>> that does a simple SELECT with no transaction middleware involved.
>> Can you please show me the code you're running to reproduce this?
>
> Right, I misremembered the original problem. I've now found the
> testing environment.
>
> The problem is not with regular views but with Celery tasks,
> long-running management commands such as daemons and any other place
> where you access the ORM from outside of the usual

> request→dispatcher→view→response flow....

I use the TransactionMiddleware for requests, and
scripts which are started from the shell use the commit_on_success decorator.
Utility methods that are used in both ways (shell and request) don't use
transaction handling methods.

Long running tasks are an exception, here I use something like this:

for i, .. in enumerate(...):
if i%1000==0:
commit()

If you have a daemon that lives forever, I would do it like this (untested
if this does not leave an idle transaction open):
The place where the daemon hangs around if nothing can be done lives must
not use the ORM. If there is something to be done, the daemon calls methods
that use the commit_on_success decorators.

Thomas


--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de

Patryk Zawadzki

unread,
Sep 7, 2010, 2:49:07 AM9/7/10
to django-d...@googlegroups.com
On Tue, Sep 7, 2010 at 8:29 AM, Thomas Guettler <h...@tbz-pariv.de> wrote:
> If you have a daemon that lives forever, I would do it like this (untested
> if this does not leave an idle transaction open):
> The place where the daemon hangs around if nothing can be done lives must
> not use the ORM. If there is something to be done, the daemon calls methods
> that use the commit_on_success decorators.

Unfortunately you don't always get to choose what to call with what
decorators. For example we have a DBus daemon that acts as an RPC
server. Creating a separate function for each select is not really
practical :)

--
Patryk Zawadzki

David De La Harpe Golden

unread,
Sep 7, 2010, 8:57:41 AM9/7/10
to django-d...@googlegroups.com
On 06/09/10 14:29, Patryk Zawadzki wrote:

> The problem only exists when Django sets isolation level to 1, if you
> use the deprecated "autocommit" setting, you will not be affected.
>

Uh. Is this deprecated? At what layer? Since when? I've been using it*
happily for a while (modulo that one known "make sure to read once
before writing" issue with django 1.1), was not aware of any deprecation
of it?

* i.e. django's postgresql_psycopg2 autocommit -> True setting which
amounts to setting isolation level 0 => autocommit at the psycopg2 level
(you can still enter transaction management when you want to).
http://docs.djangoproject.com/en/dev/ref/databases/#autocommit-mode

Patryk Zawadzki

unread,
Sep 7, 2010, 9:05:08 AM9/7/10
to django-d...@googlegroups.com
On Tue, Sep 7, 2010 at 2:57 PM, David De La Harpe Golden
<david.delah...@ichec.ie> wrote:
> On 06/09/10 14:29, Patryk Zawadzki wrote:
>> The problem only exists when Django sets isolation level to 1, if you
>> use the deprecated "autocommit" setting, you will not be affected.
> Uh. Is this deprecated? At what layer? Since when?  I've been using it*
> happily for a while (modulo that one known "make sure to read once
> before writing" issue with django 1.1), was not aware of any deprecation
> of it?

Not deprecated in Django sense. I don't think it's going away any time
soon. Just that it's been discouraged for a while at the
psycopg/database level.

--
Patryk Zawadzki

Patryk Zawadzki

unread,
Sep 7, 2010, 10:47:33 AM9/7/10
to django-d...@googlegroups.com
On Mon, Sep 6, 2010 at 7:12 PM, Patryk Zawadzki <pat...@pld-linux.org> wrote:
> It would be more useful if you could explicitly
> enter_isolation_block() and leave_isolation_block() as needed
> (currently there is no way to commit the isolating transaction other
> than doing a raw SQL query or accessing psycopg internals).

Another usecase I just came along:

Say I have to generate some unique string for the database. Something like that:

potential = base = slugify(obj.bar)
suffix = 0
while True:
if not Baz.objects.filter(foo=potential).exists():
obj.foo = potential
obj.save()
break
suffix += 1
potential = '%s-%s' % (base, suffix)

Except it's possible that another process or thread commits an
identical object right between the call to exists() and the save().
What I'd really want to do is something closer to this:

potential = base = slugify(obj.bar)
suffix = 0
found = False
while not found:
enter_isolation_block(ISOLATION_SERIALIZABLE)
if not Baz.objects.filter(foo=potential).exists():
obj.foo = potential
obj.save()
found = True
leave_isolation_block()
suffix += 1
potential = '%s-%s' % (base, suffix)

Of course with new python versions we can get the extra sugar coating:

with db.isolation(ISOLATION_SERIALIZABLE):
# ...

--
Patryk Zawadzki

Robert Gravsjö

unread,
Sep 7, 2010, 11:16:14 AM9/7/10
to django-d...@googlegroups.com

Patryk Zawadzki skrev 2010-09-07 16.47:
> On Mon, Sep 6, 2010 at 7:12 PM, Patryk Zawadzki<pat...@pld-linux.org> wrote:
>> It would be more useful if you could explicitly
>> enter_isolation_block() and leave_isolation_block() as needed
>> (currently there is no way to commit the isolating transaction other
>> than doing a raw SQL query or accessing psycopg internals).
>
> Another usecase I just came along:

I don't think this is the appropriate forum for this discussion.

>
> Say I have to generate some unique string for the database. Something like that:

Then you either generate some form of UUID for that or let the database
handle it in a transaction safe manner (like using SERIAL in the case of
PostgreSQL).

You don't want to serialize access to the database in a multi-user
application.

Regards,
roppert

David De La Harpe Golden

unread,
Sep 7, 2010, 12:42:54 PM9/7/10
to django-d...@googlegroups.com
On 07/09/10 16:16, Robert Gravsj� wrote:


>> with db.isolation(ISOLATION_SERIALIZABLE):
>> # ...
>>
>
> You don't want to serialize access to the database in a multi-user
> application.

Just in case: serializable transaction != serializing access - among
serializable level transactions, the property of /looking like/ they
occurred serially in some consistent order is supposed to be maintained.
Maintaining that property may well result in quite a lot of actual
serialization and transactions needing retries I suppose, depending on
sophistication of the implementation.

Anyway. Postgresql's "serializable" isn't "complete" presently, like
oracle it's actually "snapshot" isolation:
http://www.postgresql.org/docs/9.0/static/transaction-iso.html#MVCC-SERIALIZABILITY
http://wiki.postgresql.org/wiki/Serializable
http://en.wikipedia.org/wiki/Isolation_(database_systems)#SERIALIZABLE
http://en.wikipedia.org/wiki/Snapshot_isolation

Django doesn't currently have support for setting it AFAICS (and there's
currently an assert restricting to level 0/1 in _set_isolation_level()),
but maybe it would be sensible to add some support (not necessarily very
fine-grained, perhaps just a django databases OPTIONS = {'isolation':
'snapshot'} a bit like 'autocommit' that means to use psycopg2 level
2|(0/2) instead of 1|(0/1) [autocommit off|on] for that database).

'snapshot' just being pedantic. 'serializable' would be consistent with
postgresql/oracle usage of course.


David De La Harpe Golden

unread,
Sep 7, 2010, 12:54:41 PM9/7/10
to django-d...@googlegroups.com
On 07/09/10 14:05, Patryk Zawadzki wrote:

> Not deprecated in Django sense. I don't think it's going away any time
> soon. Just that it's been discouraged for a while at the
> psycopg/database level.

Well, maybe "discouraged" is fair, but it does exist and you have the
option (and if you use TransactionMiddleware with it on you can still
get isolation around views, with nothing stopping you entering
transaction management outside views either...).

Also as Gabriel H. mentioned, there's also calling connection.close() to
think about, for celery specifically perhaps you could connect a handler
for celery signal task_postrun [1] to do so for you, a bit like as is
done for the web case.

So. There are certainly other definite annoyances with django
transaction handling (what Kirit S. described as lack of composition,
see also [2]) that might merit some changes, but I'm not sure there's
immediate need for django-level changes here specifically.

[1]
http://ask.github.com/celery/reference/celery.signals.html#celery.signals.task_postrun

[2] https://docs.google.com/View?id=dgxrkmrk_2hmk4fmhg

Reply all
Reply to author
Forward
0 new messages