PostgreSQL has a concept called "isolation levels". These are various
types of meta-transactions.
1. There's level 0, which means "off" and results in ghost reads
(a SELECT returning data not yet committed).
2. The most useful level is 1, which prevents ghost-reading uncommitted
data yet does not guarantee atomicity (it's possible for two identical
SELECTs to return different results).
3. There's also a third "serializable" level that guarantees full
atomicity, but it's not used by Django.
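For reference, these levels map onto psycopg2's isolation level
constants (a rough sketch based on the psycopg documentation linked
later in this thread; the connection string is hypothetical):

import psycopg2
import psycopg2.extensions as ext

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN

# Level 0: every statement commits immediately, no isolating transaction.
conn.set_isolation_level(ext.ISOLATION_LEVEL_AUTOCOMMIT)

# Level 1: READ COMMITTED, the level Django's psycopg2 backend selects.
conn.set_isolation_level(ext.ISOLATION_LEVEL_READ_COMMITTED)

# The "serializable" level mentioned above.
conn.set_isolation_level(ext.ISOLATION_LEVEL_SERIALIZABLE)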
In psycopg(2) the default isolation level is 0 or "no isolation at
all". If enabled, a meta-transaction is designed to work as follows:
1. The first query you execute results in an implicit "BEGIN" being
called internally by PostgreSQL
2. Any consecutive queries are executed in the same transaction
3. You terminate the meta-transaction by calling connection.commit()
or connection.rollback()
4. The first query you execute results in an implicit "BEGIN" being
called internally by PostgreSQL
5. Any consecutive queries are executed in the same transaction
6. You terminate the meta-transaction by calling connection.commit()
or connection.rollback()
...and so on.
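A rough sketch of that cycle at the raw psycopg2 level (connection
details hypothetical):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
cur = conn.cursor()

cur.execute("SELECT 1")  # implicit BEGIN; a transaction is now open
cur.execute("SELECT 2")  # runs in the same transaction
conn.commit()            # terminates the meta-transaction...

cur.execute("SELECT 3")  # ...and the next execute() implicitly opens a new one
conn.rollback()          # which again has to be ended explicitly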
The problem is that this is not true for Django. The backend
initializes the meta-transaction at connection time and never bothers
to terminate it. As the transaction is merely a ghost-read-preventing
one, it does not stop data from being saved in the database, but it
does result in a permanently open transaction. This both wastes
resources on the server and prevents any other client from touching
the database structure (any ALTER or VACUUM command will hang waiting
for the remaining transactions to finish).
The ticket¹ contains a naïve workaround but, as described there, I feel
a proper solution would look like this:
1. Introduce IsolationMiddleware that does something like this (pseudocode):
class IsolationMiddleware(object):

    def process_request(self, request):
        for db in database_connections:
            db.enter_isolation_block()

    def process_response(self, request, response):
        for db in database_connections:
            db.leave_isolation_block()
        return response
2. Make the middleware default and describe it in the migration guide.
3. Introduce no-op enter_isolation_block() and leave_isolation_block()
that just "pass"
4. Override both methods in the pgsql backends (pseudocode):
def enter_isolation_block(self):
    if self.isolation_level:
        self.connection.set_isolation_level(self.isolation_level)

def leave_isolation_block(self):
    if (self.isolation_level and
            self.connection.get_transaction_status() ==
            psycopg2.extensions.TRANSACTION_STATUS_INTRANS):
        if self.commited:
            self.connection.commit()
        else:
            self.connection.rollback()
5. Remove code that sets isolation level in the connection initialization code.
¹ http://code.djangoproject.com/ticket/13870
--
Patryk Zawadzki
I'm not sure you understand the problem at all. The problem is not
lack of isolation. The problem is permanent isolation. The isolating
transaction is never terminated, thus remaining alive for an
indefinite amount of time.
You can check this by creating a fresh project using psycopg2 and
creating a model. Then write a view that queries the database and
invoke it. Now, leaving the server running, open up your database
shell. The pg_stat_activity table will report "<IDLE> in transaction"
for hours.
BTW: This has nothing to do with Django transactions or TransactionMiddleware.
--
Patryk Zawadzki
This was done to make Django faster, so it doesn't create a connection
to the database every time new SQL is executed.
Do you mean you wanted to set up timeouts for idle database connections?
I guess nobody has asked for this before.
--
Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
MSN: bu...@live.com
What was done? Committing the isolating transaction does not
terminate the connection.
> Do you mean you wanted to set up timeouts for idle database connections?
> I guess, nobody asked this before.
No, I think I stated quite clearly what my proposal was.
http://initd.org/psycopg/docs/extensions.html#isolation-level-constants
--
Patryk Zawadzki
Patryk Zawadzki wrote on 2010-09-06 12.20:
> On Mon, Sep 6, 2010 at 11:20 AM, Kirit Sælensminde (kayess)
> <kirit.sae...@gmail.com> wrote:
--- snip ---
>
> You can check this by creating a fresh project using psycopg2 and
> creating a model. Then write a view that queries the database and
> invoke it. Now, leaving the server running, open up your database
> shell. The pg_stat_activity table will report "<IDLE> in transaction"
> for hours.
I'm not sure what you think you are doing, but if you end up with "<IDLE>
in transaction" that means you have not committed your transactions.
For instance, open two connections with psql and run BEGIN in one, then
take a look at pg_stat_activity: you will see that connection marked as
idle in transaction.
I believe you are confused about isolation levels. They control the
visibility of transactions.
Regards,
roppert
See below.
> For instance, open two connections with psql and run BEGIN in one and
> then take a look at pg_stat_activity you will have that connection
> marked as idle in transaction.
>
> I believe you are confused about isolation levels. They control the
> visibility of transactions.
Maybe I am indeed confused but quoting the psycopg documentation¹:
"psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED:
This is the default value. A new transaction is started at the first
execute() command on a cursor and at each new execute() after a
commit() or a rollback(). The transaction runs in the PostgreSQL READ
COMMITTED isolation level."
The isolating transaction keeps going on until you either (1) commit,
(2) rollback or (3) disconnect. Django only commits or rolls back the
transactions it explicitly starts; it does not care about the
implicitly started isolating transaction. That's what results in
"<IDLE> in transaction" and I can reproduce it with a two-line view
that does a simple SELECT with no transaction middleware involved.
The problem only exists when Django sets the isolation level to 1; if
you use the deprecated "autocommit" setting, you will not be affected.
¹ http://initd.org/psycopg/docs/extensions.html#isolation-level-constants
--
Patryk Zawadzki
Patryk Zawadzki wrote on 2010-09-06 15.29:
> On Mon, Sep 6, 2010 at 2:00 PM, Robert Gravsjö<rob...@blogg.se> wrote:
>> I'm not sure what you think you are doing but if you end up with "<IDLE>
>> in transaction" that means you have not commited your transactions.
>
> See below.
>
>> For instance, open two connections with psql and run BEGIN in one and
>> then take a look at pg_stat_activity you will have that connection
>> marked as idle in transaction.
>>
>> I believe you are confused about isolation levels. They control the
>> visibility of transactions.
>
> Maybe I am indeed confused but quoting the psycopg documentation¹:
>
> "psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED:
>
> This is the default value. A new transaction is started at the first
> execute() command on a cursor and at each new execute() after a
> commit() or a rollback(). The transaction runs in the PostgreSQL READ
> COMMITTED isolation level."
>
> The isolating transaction keeps going on until you either (1) commit,
> (2) rollback or (3) disconnect. Django only commits/rollbacks the
> transactions it explicitly starts, it does not care about the
> implicitly started isolating transaction. That's what results in
> "<IDLE> in transaction" and I can reproduce it with a two-line view
> that does a simple SELECT with no transaction middleware involved.
Can you please show me the code you're running to reproduce this?
Regards,
roppert
Right, I misremembered the original problem. I've now found the
testing environment.
The problem is not with regular views but with Celery tasks,
long-running management commands such as daemons, and any other place
where you access the ORM from outside of the usual
request→dispatcher→view→response flow. These cases all end up with an
isolating transaction spanning their whole lifespan. In the case of
daemons this results in permanently blocking changes to the database
structure (for example, causing "VACUUM FULL" to hang).
It would be more useful if you could explicitly
enter_isolation_block() and leave_isolation_block() as needed
(currently there is no way to commit the isolating transaction other
than doing a raw SQL query or accessing psycopg internals).
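For the record, the psycopg-internals workaround looks roughly like
this (a sketch, not an official API; connection.connection is the
underlying raw psycopg2 connection):

from django.db import connection

def end_isolating_transaction():
    # The raw psycopg2 connection is only set once at least one query
    # has been executed through the ORM.
    if connection.connection is not None:
        connection.connection.rollback()  # or .commit(), depending on intent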
Sorry about the confusion, it's not really related to views.
--
Patryk Zawadzki
I've experienced what Patryk is describing here. It seems that the
Django ORM, when not explicitly in a transaction, doesn't commit or
roll back after each query.
You can reproduce this simply in the Django shell (./manage.py shell). Go
into your favorite project and try this (replacing `myproject`, `myapp`,
and `MyModel` appropriately):
from myproject.myapp import models
models.MyModel.objects.all()[:1]
Now, in a superuser database connection, check the results of "select *
from pg_stat_activity". You should see the database connection from the
shell session stuck in the "<IDLE> in transaction" state.
Regards,
Jordan
Patryk Zawadzki wrote:
> On Mon, Sep 6, 2010 at 4:47 PM, Robert Gravsjö <rob...@blogg.se> wrote:
>> Patryk Zawadzki wrote on 2010-09-06 15.29:
>>> The isolating transaction keeps going on until you either (1) commit,
>>> (2) rollback or (3) disconnect. Django only commits/rollbacks the
>>> transactions it explicitly starts, it does not care about the
>>> implicitly started isolating transaction. That's what results in
>>> "<IDLE> in transaction" and I can reproduce it with a two-line view
>>> that does a simple SELECT with no transaction middleware involved.
>> Can you please show me the code you're running to reproduce this?
>
> Right, I misremembered the original problem. I've now found the
> testing environment.
>
> The problem is not with regular views but with Celery tasks,
> long-running management commands such as daemons and any other place
> where you access the ORM from outside of the usual
> request→dispatcher→view→response flow....
I use the TransactionMiddleware for requests, and
scripts which are started from the shell use the commit_on_success decorator.
Utility methods that are used in both ways (shell and request) don't use
transaction handling methods.
Long-running tasks are an exception; here I use something like this:

for i, .. in enumerate(...):
    if i % 1000 == 0:
        commit()
If you have a daemon that lives forever, I would do it like this (I have
not tested whether this avoids leaving an idle transaction open):
the place where the daemon hangs around when nothing can be done must
not use the ORM. When there is something to be done, the daemon calls
methods that use the commit_on_success decorator.
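A minimal sketch of that pattern (model and helper names are
hypothetical):

from django.db import transaction

@transaction.commit_on_success
def handle_job(job_id):
    # All ORM work for one unit of work happens here; commit_on_success
    # commits on return, or rolls back if an exception escapes.
    job = Job.objects.get(pk=job_id)  # hypothetical model
    job.process()
    job.save()

while daemon_is_running():           # hypothetical idle loop, no ORM access here
    handle_job(wait_for_next_job())  # hypothetical blocking call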
Thomas
--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de
Unfortunately you don't always get to choose what to call with what
decorators. For example we have a DBus daemon that acts as an RPC
server. Creating a separate function for each select is not really
practical :)
--
Patryk Zawadzki
> The problem only exists when Django sets isolation level to 1, if you
> use the deprecated "autocommit" setting, you will not be affected.
>
Uh. Is this deprecated? At what layer? Since when? I've been using it*
happily for a while (modulo that one known "make sure to read once
before writing" issue with Django 1.1), and was not aware of any
deprecation of it?
* i.e. Django's postgresql_psycopg2 autocommit -> True setting, which
amounts to setting isolation level 0 => autocommit at the psycopg2 level
(you can still enter transaction management when you want to).
http://docs.djangoproject.com/en/dev/ref/databases/#autocommit-mode
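For reference, the setting in question looks roughly like this in
settings.py (database name hypothetical):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',                   # hypothetical
        'OPTIONS': {'autocommit': True},  # psycopg2 autocommit / level 0
    },
}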
Not deprecated in the Django sense. I don't think it's going away any
time soon. It's just been discouraged for a while at the
psycopg/database level.
--
Patryk Zawadzki
Another use case I just came across:
Say I have to generate some unique string for the database. Something like this:
potential = base = slugify(obj.bar)
suffix = 0
while True:
    if not Baz.objects.filter(foo=potential).exists():
        obj.foo = potential
        obj.save()
        break
    suffix += 1
    potential = '%s-%s' % (base, suffix)
Except it's possible that another process or thread commits an
identical object right between the call to exists() and the save().
What I'd really want to do is something closer to this:
potential = base = slugify(obj.bar)
suffix = 0
found = False
while not found:
    enter_isolation_block(ISOLATION_SERIALIZABLE)
    if not Baz.objects.filter(foo=potential).exists():
        obj.foo = potential
        obj.save()
        found = True
    leave_isolation_block()
    suffix += 1
    potential = '%s-%s' % (base, suffix)
Of course, with newer Python versions we can get the extra sugar coating:
with db.isolation(ISOLATION_SERIALIZABLE):
    # ...
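A rough sketch of how such a context manager could be built on top of
the hypothetical enter/leave methods proposed earlier (none of this
exists in Django today):

from contextlib import contextmanager

@contextmanager
def isolation(db, level):
    # Raise the isolation level for the duration of the block, then make
    # sure the isolating transaction is closed again.
    db.enter_isolation_block(level)
    try:
        yield
    finally:
        db.leave_isolation_block()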
--
Patryk Zawadzki
Patryk Zawadzki wrote on 2010-09-07 16.47:
> On Mon, Sep 6, 2010 at 7:12 PM, Patryk Zawadzki<pat...@pld-linux.org> wrote:
>> It would be more useful if you could explicitly
>> enter_isolation_block() and leave_isolation_block() as needed
>> (currently there is no way to commit the isolating transaction other
>> than doing a raw SQL query or accessing psycopg internals).
>
> Another usecase I just came along:
I don't think this is the appropriate forum for this discussion.
>
> Say I have to generate some unique string for the database. Something like that:
Then you either generate some form of UUID for that or let the database
handle it in a transaction-safe manner (like using SERIAL in the case of
PostgreSQL).
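For example, something along these lines (a sketch; the suffix length
is arbitrary and obj.bar comes from the earlier snippet):

import uuid
from django.template.defaultfilters import slugify

def unique_slug(obj):
    # A random suffix makes collisions vanishingly unlikely without
    # probing the table in a loop.
    return '%s-%s' % (slugify(obj.bar), uuid.uuid4().hex[:8])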
You don't want to serialize access to the database in a multi-user
application.
Regards,
roppert
>> with db.isolation(ISOLATION_SERIALIZABLE):
>> # ...
>>
>
> You don't want to serialize access to the database in a multi-user
> application.
Just in case: a serializable transaction != serializing access; among
serializable-level transactions, the property of /looking like/ they
occurred serially in some consistent order is supposed to be maintained.
Maintaining that property may well result in quite a lot of actual
serialization and transactions needing retries, I suppose, depending on
the sophistication of the implementation.
Anyway. PostgreSQL's "serializable" isn't "complete" at present; like
Oracle's, it's actually "snapshot" isolation:
http://www.postgresql.org/docs/9.0/static/transaction-iso.html#MVCC-SERIALIZABILITY
http://wiki.postgresql.org/wiki/Serializable
http://en.wikipedia.org/wiki/Isolation_(database_systems)#SERIALIZABLE
http://en.wikipedia.org/wiki/Snapshot_isolation
Django doesn't currently have support for setting it AFAICS (and there's
currently an assert restricting the level to 0/1 in _set_isolation_level()),
but maybe it would be sensible to add some support (not necessarily very
fine-grained, perhaps just a Django DATABASES OPTIONS entry like
{'isolation': 'snapshot'}, a bit like 'autocommit', meaning use psycopg2
level 2|(0/2) instead of 1|(0/1) [autocommit off|on] for that database).
'snapshot' is just me being pedantic; 'serializable' would be consistent
with PostgreSQL/Oracle usage, of course.
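i.e. roughly this in settings.py (a purely hypothetical OPTIONS key;
nothing like it exists in Django at the moment):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',                        # hypothetical
        'OPTIONS': {'isolation': 'snapshot'},  # proposed, not implemented
    },
}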
> Not deprecated in Django sense. I don't think it's going away any time
> soon. Just that it's been discouraged for a while at the
> psycopg/database level.
Well, maybe "discouraged" is fair, but it does exist and you have the
option (and if you use TransactionMiddleware with it on you can still
get isolation around views, with nothing stopping you entering
transaction management outside views either...).
Also, as Gabriel H. mentioned, there's calling connection.close() to
think about; for Celery specifically, perhaps you could connect a handler
for the Celery signal task_postrun [1] to do so for you, a bit like what
is done for the web case.
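For instance, something along these lines (a sketch; the handler name
is made up):

from celery.signals import task_postrun
from django.db import connection

def close_db_connection(**kwargs):
    # Drop the connection (and with it any idle transaction) when a task
    # finishes, much like Django does at the end of each request.
    connection.close()

task_postrun.connect(close_db_connection)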
So. There are certainly other definite annoyances with Django
transaction handling (what Kirit S. described as lack of composition,
see also [2]) that might merit some changes, but I'm not sure there's an
immediate need for Django-level changes here specifically.
[1]
http://ask.github.com/celery/reference/celery.signals.html#celery.signals.task_postrun