Implementing postgres database timeouts

1,589 views
Skip to first unread message

RahulDave

unread,
Feb 2, 2009, 1:16:14 PM2/2/09
to Django developers
Hi Folks,
I have a django astronomy application which fronts a 73 million row
main table and 150 million row secondary table database, highly
indexed of-course. The back-end database is postgres. The application
itself is two-fold, a web services backend which can tolerate
extremely long queries as its intended to be used in batch mode in
astronomical pipelines, and a web UI which uses aspects of the web
services to provide information in a fast interactive away about
multiple stars. Queries may be on time of observation, position,
brightness, etc.

Postgres has the ability to set timeouts by doing the following before
a long running query:
lightcurvedb4=# set session statement_timeout=5000;
LOG: statement: set session statement_timeout=5000;
LOG: statement: set session statement_timeout=5000;
SET
lightcurvedb4=# select * from maindb_astobject;
LOG: statement: select * from maindb_astobject;
LOG: statement: select * from maindb_astobject;
ERROR: canceling statement due to statement timeout
STATEMENT: select * from maindb_astobject;
ERROR: canceling statement due to statement timeout
lightcurvedb4=# reset statement_timeout;
LOG: statement: reset statement_timeout;
LOG: statement: reset statement_timeout;
RESET

My first question is , how could I wrap each django query in such a
set of stataemtns. I could figure that its the web ui from a custom
header or user agent and set this , but its not something I could juck
on to a .extra() type call.

When I tried to do the query in the django shell i found that django
returns an empty queryset on timeout. I'm using postgresql_psycopg2,
which since version 2.0.7 supports QueryCanceledError for timeouts.
I'd like to add support for this to the django postgresql_psycopg2
driver, but am not at all sure which file I ought to add it too. I'll
be experimenting to fnd out but thought I'd ask if someone here
knew... (I'm using Django 1.0).

Thanks a ton,
Rahul

RahulDave

unread,
Feb 2, 2009, 4:48:22 PM2/2/09
to Django developers
Thought I should perhaps be more illustrative of the problem.
At django shell:
--------------
In [2]: from lcdb.maindb.models import LCOT

In [3]: from lcdb.maindb.models import Snippet

In [4]: bset=LCOT.objects.all()

In [5]: bset
Out[5]: []

In [6]:
------------
meanwhile at postgres logs:
LOG: statement: SET DATESTYLE TO 'ISO'
LOG: statement: SHOW client_encoding
LOG: statement: SHOW default_transaction_isolation
LOG: statement: BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
LOG: statement: SET TIME ZONE E'America/Chicago'
LOG: statement: SELECT version()
LOG: statement: SELECT "maindb_lcot"."id", "maindb_lcot"."source_id",
"maindb_lcot"."astobject_id", "maindb_lcot"."band_id",
"maindb_lcot"."when", "maindb_lcot"."whenend",
"maindb_lcot"."created_at", "maindb_lcot"."updated_at" FROM
"maindb_lcot"
ERROR: canceling statement due to statement timeout
STATEMENT: SELECT "maindb_lcot"."id", "maindb_lcot"."source_id",
"maindb_lcot"."astobject_id", "maindb_lcot"."band_id",
"maindb_lcot"."when", "maindb_lcot"."whenend",
"maindb_lcot"."created_at", "maindb_lcot"."updated_at" FROM
"maindb_lcot"

So you can see a ProgrammingError or QueryCancelledError is not
thrown. Ofcourse I should also be wrapping this in a transaction
rollback but thats not releavant to the crux of this post...

Rahul

Malcolm Tredinnick

unread,
Feb 3, 2009, 1:25:32 AM2/3/09
to django-d...@googlegroups.com

If the timeout is independent of the query itself, you could write your
own database backend. Inherit all the basic stuff from an existing
backend and change the database connection stuff to send across the
timeout settings.

If the timeout needs to change based on the query context, you'll need
to use a custom QuerySet each time, since you need a custom Query object
(in the future -- say, Django 1.2 timeframes -- this should be easier,
since replacing the Query object used en-masse will become practical).
You need to replace the Query.execute() method.

Longer-term (since this is django-dev, the longer-term solution is the
only interesting one :-) ), it might be interesting to come up with an
API for setting things like this on a per-query basis. Would be worth
investigating what other database wrappers and servers can do in this
respect so that we have a suitably generic wrapper. Shouldn't be at all
intrusive on the fat part of the user-base curve (since they won't use
it and it will be a no-op) and will be useful at the far right-hand
side.

> When I tried to do the query in the django shell i found that django
> returns an empty queryset on timeout.

Django doesn't know anything at all about the timeout. It's the database
API warapper (postgresql_psycopg2) that is returning with no results.
Django is interpreting that as the normal "no results returned" case.

Regards,
Malcolm

RahulDave

unread,
Feb 3, 2009, 10:03:52 AM2/3/09
to Django developers
Thanks for the reply!

On Feb 3, 1:25 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
....
>
> > My first question is , how could I wrap eachdjangoquery in such a
> > set of stataemtns. I could figure that its the web ui from a custom
> > header or user agent and set this , but its not something I could juck
> > on to a .extra() type call.
>
> If thetimeoutis independent of the query itself, you could write your
> own database backend. Inherit all the basic stuff from an existing
> backend and change the database connection stuff to send across thetimeoutsettings.
>

This may be the easiest thing to do; since the timeout is really only
dependent on the user-agent context. The making sure that a
transaction is rolled back could be done in this context too and I
dont land up polluting the actual backend either.

> If thetimeoutneeds to change based on the query context, you'll need
> to use a custom QuerySet each time, since you need a custom Query object
> (in the future -- say,Django1.2 timeframes -- this should be easier,
> since replacing the Query object used en-masse will become practical).
> You need to replace the Query.execute() method.
>
> Longer-term (since this isdjango-dev, the longer-term solution is the
> only interesting one :-) ), it might be interesting to come up with an
> API for setting things like this on a per-query basis. Would be worth
> investigating what other database wrappers and servers can do in this
> respect so that we have a suitably generic wrapper. Shouldn't be at all
> intrusive on the fat part of the user-base curve (since they won't use
> it and it will be a no-op) and will be useful at the far right-hand
> side.

Most certainly, this would be rather cool. I think mysql supports
timeouts directly on the 'command-line', at the very least.

>
> > When I tried to do the query in thedjangoshell i found thatdjango
> > returns an empty queryset ontimeout.
>
> Djangodoesn't know anything at all about thetimeout. It's the database
> API warapper (postgresql_psycopg2) that is returning with no results.Djangois interpreting that as the normal "no results returned" case.
>

Aah, i think i have been very confused here and looking at the wrong
code. I was under the impression that psycopg2 itself would return a
ProgrammingError or Query Canceled error to the
django backend which was then eating it up, but it looks like I(or the
psycopg2 docs) might be wrong in this assumption. I'll investigate.
> Regards,
> Malcolm

Thanks,
Rahul

Malcolm Tredinnick

unread,
Feb 3, 2009, 10:39:41 PM2/3/09
to django-d...@googlegroups.com
On Tue, 2009-02-03 at 07:03 -0800, RahulDave wrote:
> Thanks for the reply!
>
> On Feb 3, 1:25 am, Malcolm Tredinnick <malc...@pointy-stick.com>
> wrote:
> ....
> >
> > > My first question is , how could I wrap eachdjangoquery in such a
> > > set of stataemtns. I could figure that its the web ui from a custom
> > > header or user agent and set this , but its not something I could juck
> > > on to a .extra() type call.
> >
> > If thetimeoutis independent of the query itself, you could write your
> > own database backend. Inherit all the basic stuff from an existing
> > backend and change the database connection stuff to send across thetimeoutsettings.
> >
>
> This may be the easiest thing to do; since the timeout is really only
> dependent on the user-agent context.

No, then it won't make sense. You don't have access to anything like
"user-agent" at the database backend level. It's not tied to web
requests. You could tie it to a setting, but that's the level of
granularity you have.

[Okay, you could shove stuff in threadlocals and be very careful about
managing the value and pass things around that way. Debugging will be
something almost entirely unlike fun in some cases, I'd imagine.]

Regards,
Malcolm

RahulDave

unread,
Feb 4, 2009, 1:41:20 PM2/4/09
to Django developers
The Plot thickens a bit with whats happening in the case of the eaten
up QueryCanceled Error. Basically,
somewhere in the list protocol, the QueryCancelledError thrown by
psycopg2 gets eaten up!

Here's what I did(django 1.0), in django/db/models/query.py

143 def __repr__(self):
144 import pdb
145 pdb.set_trace()
146 a=list(self)
147 return repr(a)
148 #return repr(list(self))

So i am having pdb set up and step through:

[rahul@tscmachine v4]$ python testexc.py
DEBUG> False
> /n/home/rahul/v4/django/db/models/query.py(146)__repr__()
-> a=list(self)
(Pdb) s
--Call--
> /n/home/rahul/v4/django/db/models/query.py(163)__iter__()
-> def __iter__(self):
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(164)__iter__()
-> if self._result_cache is None:
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(165)__iter__()
-> self._iter = self.iterator()
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(166)__iter__()
-> self._result_cache = []
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(167)__iter__()
-> if self._iter:
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(168)__iter__()
-> return self._result_iter()
(Pdb) n
--Return--
> /n/home/rahul/v4/django/db/models/query.py(168)__iter__()-><generat...xb99cb40>
-> return self._result_iter()
(Pdb) n
--Call--
> /n/home/rahul/v4/django/db/models/query.py(150)__len__()
-> def __len__(self):
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(154)__len__()
-> if self._result_cache is None:
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(159)__len__()
-> elif self._iter:
(Pdb) n
> /n/home/rahul/v4/django/db/models/query.py(160)__len__()
-> self._result_cache.extend(list(self._iter))
(Pdb) n
QueryCanceledError: QueryCan...eout\n',)
> /n/home/rahul/v4/django/db/models/query.py(160)__len__()
-> self._result_cache.extend(list(self._iter))
(Pdb) c
[]

Notice the error being thrown and then a normal completion on
continue.

The testing program is thus:

[rahul@tscmachine models]$ more ~/v4/testexc.py
from lcdb.maindb.models import LCOT

b=LCOT.objects.all()
try:
print b
except:
print "exce"

the result is:

[rahul@tscmachine v4]$ python testexc.py
[]

The maindb_lcot table has 150 million rows and I have set a timeout of
10 seconds which does get hit...

I'm delving into the iterator protocol used by django and why it calls
__len__ but perhaps someone more familiar with the workings of
query.py could point me in the right direction?

Rahul

On Feb 3, 10:39 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Karen Tracey

unread,
Feb 4, 2009, 2:07:43 PM2/4/09
to django-d...@googlegroups.com
On Wed, Feb 4, 2009 at 1:41 PM, RahulDave <rahu...@gmail.com> wrote:

The Plot thickens a bit with whats happening in the case of the eaten
up QueryCanceled Error. Basically,
somewhere in the list protocol, the QueryCancelledError thrown by
psycopg2 gets eaten up!

[snip details]

I'm delving into the iterator protocol used by django and why it calls
__len__ but perhaps someone more familiar with the workings of
query.py could point me in the right direction?

I can't help with details of why __len__ is call but I can note that __len__ swallowing exceptions is a known Python problem:

http://bugs.python.org/issue1242657

I believe t was originally reported for Python 2.3, fixed in 2.4/2.5, broke again in 2.6 and it appears it will be fixed again in 2.7.

Karen
Reply all
Reply to author
Forward
0 new messages