psycopg2, connection setup performance, and other issues

266 views
Skip to first unread message

meta...@gmail.com

unread,
Feb 7, 2007, 6:21:29 PM2/7/07
to Django developers
We're using Django quite heavily at Chesspark.com, but have been
having severe load problems of late. I suspected database
bottlenecking and I turned on query logging and analyzed the results.

To my surprise, 25% of all database time is the call to SET TIME
ZONE. Another 10-15% is the isolation level setup/teardown. For
reference, there were about 600,000 total queries during a one hour
period. 22% of the queries were SET TIME ZONE by count. BEGIN; SET
TRANSACTION ISOLATION and END where about another 12%. The most
common query (that did useful work) in the whole code was a
ModelObject.objects.get(id=blah), which represented about 5% of the
total count.

The first can be fixed by a simple patch to only call SET TIME ZONE
when the connection is set up. The time zone setting will remain in
effect for any cursor created from that connection. As for the second
problem, it seems to me that Django wants to be in the AUTOCOMMIT
isolation level most of the time. There is no need for a transaction
block on a SELECT call. If a transaction block is needed it's easy
enough to do your own begin/commit or begin/rollback pair.

Also, modelobj.save() seems to do an existence test (SELECT 1 FROM...)
followed by an UPDATE or an INSERT. This will fail even in READ
COMMITTED isolation mode in some cases, as another transaction can
complete during the call to SELECT that creates the row. This is
probably an edge case, See http://www.postgresql.org/docs/8.1/static/
sql-update.html for an example of how to do this correctly.

And finally, it would be really nice if django's DatabaseWrapper
passed cursor parameters to the underlying engine. Then at least one
could pass dict cursor factories into psycopg2 when using the low
level interface.

Have others run into this issue? Are the above suggestions workable?
I already have patches for SET TIME ZONE and for the cursor factory
issue. I'm happy to create a patch for AUTOCOMMIT isolation mode, but
I don't know what other things that might affect. I suspect because
READ COMMITED isolation mode is not fully isolated, it wouldn't make
any difference.

jack.

Michael Radziej

unread,
Feb 8, 2007, 2:39:46 AM2/8/07
to django-d...@googlegroups.com

meta...@gmail.com schrieb:


> We're using Django quite heavily at Chesspark.com, but have been
> having severe load problems of late. I suspected database
> bottlenecking and I turned on query logging and analyzed the results.
>
> To my surprise, 25% of all database time is the call to SET TIME
> ZONE. Another 10-15% is the isolation level setup/teardown. For
> reference, there were about 600,000 total queries during a one hour
> period. 22% of the queries were SET TIME ZONE by count. BEGIN; SET
> TRANSACTION ISOLATION and END where about another 12%. The most
> common query (that did useful work) in the whole code was a
> ModelObject.objects.get(id=blah), which represented about 5% of the
> total count.
>
> The first can be fixed by a simple patch to only call SET TIME ZONE
> when the connection is set up. The time zone setting will remain in
> effect for any cursor created from that connection.

Sounds like a valid approach. Is anyone aware of a problem? I only
remember that there were strange bugs if you don't set it up with each
connection. Please go ahead and file a ticket for it.

There was similar fix for the mysql backend, regarding "SET NAMES".

> As for the second
> problem, it seems to me that Django wants to be in the AUTOCOMMIT
> isolation level most of the time. There is no need for a transaction
> block on a SELECT call. If a transaction block is needed it's easy
> enough to do your own begin/commit or begin/rollback pair.

Autocommit isn't an isolation level, as far as I can see, it's only a
special mode of the psql command line client. I think the isolation
level should be at least READ COMMITTED, but what is currently used? In
any case, it's sufficient to set it once, as SET TIME ZONE.

> Also, modelobj.save() seems to do an existence test (SELECT 1 FROM...)
> followed by an UPDATE or an INSERT. This will fail even in READ
> COMMITTED isolation mode in some cases, as another transaction can
> complete during the call to SELECT that creates the row. This is
> probably an edge case, See http://www.postgresql.org/docs/8.1/static/
> sql-update.html for an example of how to do this correctly.

This case isn't treated properly in the generic views, but how you treat
it is ultimately your own responsibility. You can simply re-evaluate the
validation when you get an error from the database. I guess this is a
web developer attitude ... users are used to resend their post in case
of strange error messages from the server ;-) People coming from the GUI
are used to treat such collisions more carefully since the average GUI
client doesn't have a refresh button.

> And finally, it would be really nice if django's DatabaseWrapper
> passed cursor parameters to the underlying engine. Then at least one
> could pass dict cursor factories into psycopg2 when using the low
> level interface.

I lost you there. What are you trying to do?

> Have others run into this issue? Are the above suggestions workable?
> I already have patches for SET TIME ZONE and for the cursor factory
> issue. I'm happy to create a patch for AUTOCOMMIT isolation mode, but
> I don't know what other things that might affect. I suspect because
> READ COMMITED isolation mode is not fully isolated, it wouldn't make
> any difference.

Postgresql doesn't even have a transaction isolation level below READ
COMMITTED.

http://www.postgresql.org/docs/8.1/static/sql-set-transaction.html

"""
SET TRANSACTION transaction_mode [, ...]
SET SESSION CHARACTERISTICS AS TRANSACTION transaction_mode [, ...]

where transaction_mode is one of:

ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED |
READ UNCOMMITTED }
READ WRITE | READ ONLY
"""

later:

"""
The SQL standard defines two additional levels, READ UNCOMMITTED and
REPEATABLE READ. In PostgreSQL READ UNCOMMITTED is treated as READ
COMMITTED, while REPEATABLE READ is treated as SERIALIZABLE.
"""


Michael

meta...@gmail.com

unread,
Feb 8, 2007, 11:31:48 AM2/8/07
to Django developers, f...@initd.org
> Sounds like a valid approach. Is anyone aware of a problem? I only
> remember that there were strange bugs if you don't set it up with each
> connection. Please go ahead and file a ticket for it.

I'll file a ticket with the patch. We applied a patch yesterday for
this to our local copy and its been humming along all day.

> > As for the second
> > problem, it seems to me that Django wants to be in the AUTOCOMMIT
> > isolation level most of the time. There is no need for a transaction
> > block on a SELECT call. If a transaction block is needed it's easy
> > enough to do your own begin/commit or begin/rollback pair.
>
> Autocommit isn't an isolation level, as far as I can see, it's only a
> special mode of the psql command line client. I think the isolation
> level should be at least READ COMMITTED, but what is currently used? In
> any case, it's sufficient to set it once, as SET TIME ZONE.

Not in postgresql, but in psycopg2 it is. You can set the isolation
level to 3 different settings, autocommit, read committed, and
serializable. Currently django's psycopg2 adaptor is using read
committed, which is also the default psycopg2 setting. It's also
postgresql's default setting. The differene between autocommit and
read committed for psycopg2 is that in read committed mode every
cursor created in psycopg2 does: BEGIN; SET TRANSACTION ISOLATION
LEVEL READ COMMITTED whenever the first execute on the cursor happens,
and then it does an END/COMMIT/ROLLBACK when the cursor goes out of
scope, gets committed or gets rolledback. You won't see this in
django's query log (you won't see set timezone either afaik), but you
will see it in the query logs if you turn those on in postgresql. In
autocommit mode psycopg2 does not do these transaction boundaries for
you. Unless you need transaction boundaries (sometimes you do) this
is pure overhead.

I also have a patch that switches psycopg2 to autocommit mode, which
is essentially the same read committed mode on the postgresql side,
but psycopg2 won't do these transaction boundaries for you. The patch
has been running in production for us very well for a day so far.
Database load is less than a third of what it was with the SET
TIMEZONE and teh read committed mode of psycopg2.

Perhaps this is a bug in psycopg2 or an undocumented feature. I've
copied Federico on this message so he can chime in.

> > Also, modelobj.save() seems to do an existence test (SELECT 1 FROM...)
> > followed by an UPDATE or an INSERT. This will fail even in READ
> > COMMITTED isolation mode in some cases, as another transaction can
> > complete during the call to SELECT that creates the row. This is

> > probably an edge case, Seehttp://www.postgresql.org/docs/8.1/static/


> > sql-update.html for an example of how to do this correctly.
>
> This case isn't treated properly in the generic views, but how you treat
> it is ultimately your own responsibility. You can simply re-evaluate the
> validation when you get an error from the database. I guess this is a
> web developer attitude ... users are used to resend their post in case
> of strange error messages from the server ;-) People coming from the GUI
> are used to treat such collisions more carefully since the average GUI
> client doesn't have a refresh button.

I'll file a bug for it then. Even if it's a wishlist type item, it
probably shouldn't be forgotten about.

> > And finally, it would be really nice if django's DatabaseWrapper
> > passed cursor parameters to the underlying engine. Then at least one
> > could pass dict cursor factories into psycopg2 when using the low
> > level interface.
>
> I lost you there. What are you trying to do?

from django.db import connection
c = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
c.execute(BLAH)
print c.fetchone()['key']

For example. Django's DatabaseWrapper does not pass args and kwargs
downstream to the underlying dbapi adaptor, so the above call will
fail, even though psycopg2 will accept such a parameter. There really
should be a standard dictcursor in django which abstracts this, but
since that seemed unimplemented I tried to use psycopg2's directly.

Also, when using cursors directly, one often has to call
connection._commit() and connnection._rollback(). This feels very
strange right now because of the prefix underscore, which usually
signifies a private method.

I should note here that we are using django in a server application
built on Twisted (we also use it on the web app portion), so we use a
lot of these lowlevel calls because the ORM doesn't really seem
sophisticated enough to abstract complex queries where the resulting
data is not data for one of the model objects and looping through the
models generates really inefficient SQL. For lots of queries it works
beautifully though and has helped us get code running faster, even if
we have to go back and change some things to raw SQL later on. I
don't know that the _rollback/_commit functions would really be needed
by people using the ORM inside of the web app framework, but certainly
if you using the ORM on its own, those are commonly used.

Regards,
jack.

Michael Radziej

unread,
Feb 8, 2007, 12:49:08 PM2/8/07
to django-d...@googlegroups.com
meta...@gmail.com:

>> Sounds like a valid approach. Is anyone aware of a problem? I only
>> remember that there were strange bugs if you don't set it up with each
>> connection. Please go ahead and file a ticket for it.
>
> I'll file a ticket with the patch. We applied a patch yesterday for
> this to our local copy and its been humming along all day.

Since the effect of SET TIME ZONE is reverted when the transaction
aborts, you need to COMMIT after setting it.

Regarding SET TRANSACTION ISOLATION LEVEL, this only works for the
current transaction and thus needs to be repeated. Of course, it
wouldn't be necessary at all if the default isolation level in the
database server has been set up to READ COMMITTED ...

> [ Regarding "transaction isolation level AUTOCOMMIT" ]


> Not in postgresql, but in psycopg2 it is. You can set the isolation
> level to 3 different settings, autocommit, read committed, and
> serializable.
> Currently django's psycopg2 adaptor is using read
> committed, which is also the default psycopg2 setting. It's also
> postgresql's default setting. The differene between autocommit and
> read committed for psycopg2 is that in read committed mode every
> cursor created in psycopg2 does: BEGIN; SET TRANSACTION ISOLATION
> LEVEL READ COMMITTED
> whenever the first execute on the cursor happens,
> and then it does an END/COMMIT/ROLLBACK when the cursor goes out of
> scope, gets committed or gets rolledback. You won't see this in
> django's query log (you won't see set timezone either afaik), but you
> will see it in the query logs if you turn those on in postgresql. In
> autocommit mode psycopg2 does not do these transaction boundaries for
> you. Unless you need transaction boundaries (sometimes you do) this
> is pure overhead.

So AUTOCOMMIT means "don't create BEGIN/COMMIT" statements, which
proably leads to an autocommit mode, since without BEGIN, each
statement will immediately cause a COMMIT. But isn't it strange to
subsume it under the isolation levels? Well, never mind.

> I also have a patch that switches psycopg2 to autocommit mode, which
> is essentially the same read committed mode on the postgresql side,
> but psycopg2 won't do these transaction boundaries for you. The patch
> has been running in production for us very well for a day so far.
> Database load is less than a third of what it was with the SET
> TIMEZONE and teh read committed mode of psycopg2.

I don't know. Django can do transaction management (see e.g. the
transaction middleware), and to do this something needs to issue
BEGIN TRANSACTION statents. Django seems not to do this, so I guess
it relies on the psycopg backend. I doubt that this would work in
Autocommit mode. If you want to digg this up, try to use the
transaction middleware with Autocommit mode.

I'm a bit cautious with the "works fine for me" argument in these
cases. This all might be no problem for *you*. If you don't use any
transaction management, your default isolation level is READ
COMMITTED, anyway, and the default timezone is the same as specified
in Django settings, then you don't need any statements to set them
at all. If you want to make sure, test it with transaction
management, set the default database isolaton level to SERIALIZABLE
and the TIMEZONE to something far away. Then make "problematic"
queries and see if it still works ...

An optimization that could find out if the default settings of the
database server are alright and then skip the settings would solve
the problems better, IMO.

> Perhaps this is a bug in psycopg2 or an undocumented feature. I've
> copied Federico on this message so he can chime in.

Let's see :-)

>>> Also, modelobj.save() seems to do an existence test (SELECT 1 FROM...)
>>> followed by an UPDATE or an INSERT. This will fail even in READ
>>> COMMITTED isolation mode in some cases, as another transaction can
>>> complete during the call to SELECT that creates the row. This is
>>> probably an edge case, Seehttp://www.postgresql.org/docs/8.1/static/
>>> sql-update.html for an example of how to do this correctly.
>> This case isn't treated properly in the generic views, but how you treat
>> it is ultimately your own responsibility. You can simply re-evaluate the
>> validation when you get an error from the database. I guess this is a
>> web developer attitude ... users are used to resend their post in case
>> of strange error messages from the server ;-) People coming from the GUI
>> are used to treat such collisions more carefully since the average GUI
>> client doesn't have a refresh button.
>
> I'll file a bug for it then. Even if it's a wishlist type item, it
> probably shouldn't be forgotten about.

Fine!

There's nothing wrong with using custom queries, Django does not try
to put everything under the hood of its ORM. If you can propose a
way to extend Django so that you can use the features of psycopg2
more directly without sacrificing usability for other backends, it
would be a nice contribution!

You can control transactions in detail with Django's ORM, too, for
example with the @transaction.commit_manually decorator. See

http://www.djangoproject.com/documentation/transactions/

Or have you already tried this?


Cheers,

Michael


--
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100

http://www.noris.de - The IT-Outsourcing Company

meta...@gmail.com

unread,
Feb 8, 2007, 4:06:38 PM2/8/07
to Django developers
> Since the effect of SET TIME ZONE is reverted when the transaction
> aborts, you need to COMMIT after setting it.

I'll add that to the patch after I test this case. Certainly this is
true if the default isolation mode isn't changed.

> Regarding SET TRANSACTION ISOLATION LEVEL, this only works for the
> current transaction and thus needs to be repeated. Of course, it
> wouldn't be necessary at all if the default isolation level in the
> database server has been set up to READ COMMITTED ...

Postgresql doesn't have a transaction isolation level less than read
committed. Read committed is the default isolation level, and the
only other level is serializable which is only going to be used in
special applications and would probably take a ton of work to get
django to use (since you have to be willing to retry transactions that
get serialization errors). So unless someone has set their database
to serializable isolation mode by default this command is effectively
a no-op. It still takes a few milliseconds to parse/execute, but
those add up quickily over a few million calls.

> > [ Regarding "transaction isolation level AUTOCOMMIT" ]

> So AUTOCOMMIT means "don't create BEGIN/COMMIT" statements, which
> proably leads to an autocommit mode, since without BEGIN, each
> statement will immediately cause a COMMIT. But isn't it strange to
> subsume it under the isolation levels? Well, never mind.

I think so. That's a question for the psycopg2 developers though, not
for django.

> I don't know. Django can do transaction management (see e.g. the
> transaction middleware), and to do this something needs to issue
> BEGIN TRANSACTION statents. Django seems not to do this, so I guess
> it relies on the psycopg backend. I doubt that this would work in
> Autocommit mode. If you want to digg this up, try to use the
> transaction middleware with Autocommit mode.

If django's middleware is broken, then we need to fix it. Leaving
this up to the underlying adapter is both asking for trouble (what if
psycopg2 2.1 changes this behavior?) and makes django horribly
inefficient using this adapter. The autocommit patch and the SET
TIMEZONE patch gave us a factor of 2 performance increase. That's a
lot of overhead to avoid fixing a bug elsewhere.

That said, I don't think this breaks django out of the box, but maybe
some people are relying on this broken behavior. None of our web apps
which use the middleware were affected by this change, and the only
instance I could find where there could be a problem was in the save()
call, which as I already stated has the same issue in a transaction
block or outside it. read committed isolation mode does not mean that
each statement sees the same database. It only means that the rest of
the database won't see any changes a transaction makes until it is
finished.

> I'm a bit cautious with the "works fine for me" argument in these
> cases. This all might be no problem for *you*. If you don't use any
> transaction management, your default isolation level is READ
> COMMITTED, anyway, and the default timezone is the same as specified
> in Django settings, then you don't need any statements to set them
> at all. If you want to make sure, test it with transaction
> management, set the default database isolaton level to SERIALIZABLE
> and the TIMEZONE to something far away. Then make "problematic"
> queries and see if it still works ...

One would hope that calling SET TIMEZONE and SET TRANSACTION ISOLATION
LEVEL once per connection would be plenty. I think that even if this
needs to be called every connection, django needs to special case this
somehow since serializable databases are (one assumes, especially
reading postgresql documentation) in the vast minority. 50% overhead
seems like an unfair price to pay.

> An optimization that could find out if the default settings of the
> database server are alright and then skip the settings would solve
> the problems better, IMO.

Agreed. I think you don't even need to test. Using autocommit mode
and calling SET TIMEZONE and SET TRANSATION ISOLATION LEVEL once per
connection is probably all you need. I will try and test this today
to make sure.

> There's nothing wrong with using custom queries, Django does not try
> to put everything under the hood of its ORM. If you can propose a
> way to extend Django so that you can use the features of psycopg2
> more directly without sacrificing usability for other backends, it
> would be a nice contribution!

The two line patch is here:
http://code.djangoproject.com/ticket/3461

> You can control transactions in detail with Django's ORM, too, for
> example with the @transaction.commit_manually decorator. See
>
> http://www.djangoproject.com/documentation/transactions/
>
> Or have you already tried this?

We're not using the ORM inside a web application in this instance. So
the transaction middleware is not used and a view decorator would not
be useful for us. However, I see now that the commit() and rollback()
are just somewhere different now under transaction instead of
connection. It's easy enough for us to change our code.

Btw, my other patches are:

autocommit: http://code.djangoproject.com/ticket/3460
set timezone: http://code.djangoproject.com/ticket/3459

jack.

meta...@gmail.com

unread,
Feb 8, 2007, 5:09:58 PM2/8/07
to Django developers
I've updated

http://code.djangoproject.com/ticket/3459

and

http://code.djangoproject.com/ticket/3460

with an improved patch that fixes any issues with weird default
database settings. For the SET TIMEZONE the patch will not have any
effect on django. For the autocommit patch, the implied BEGIN/END
blocks are no longer there, but this shouldn't cause a problem. Any
transactions started by django will be by default READ COMMITTED
regardless of the database's default isolation level or the isolation
level set by psycopg2.

I've also attached a small test file to each bug to try and
demonstrate correctness. Please let me know if I've missed anything.

jack.

Reply all
Reply to author
Forward
0 new messages