Django “Script timed out before returning headers”


Chase

Apr 14, 2011, 3:18:00 PM
to modwsgi
I have a custom Django app that's becoming unresponsive intermittently,
about once every couple of days across three servers, which together
serve about 10,000 requests a day. When it happens, it never recovers. I
can leave it there for hours, and it will not serve any more requests.


In the Apache logs, I see the following:

Apr 13 11:45:07 www3 apache2[27590]: **successful view render here**
...
Apr 13 11:47:11 www3 apache2[24032]: [error] server is within
MinSpareThreads of MaxClients, consider raising the MaxClients setting
Apr 13 11:47:43 www3 apache2[24032]: [error] server reached MaxClients
setting, consider raising the MaxClients setting
...
Apr 13 11:50:34 www3 apache2[27617]: [error] [client 10.177.0.204]
Script timed out before returning headers: django.wsgi
(repeated 100 times, exactly)


I am running:

Apache version 2.2, using the worker MPM
mod_wsgi version 2.8
SELinux NOT installed
lxml package being used, infrequently
Ubuntu 10.04


apache config:

WSGIDaemonProcess site-1 user=django group=django threads=50
WSGIProcessGroup site-1
WSGIScriptAlias / /somepath/django.wsgi


wsgi config:

import os, sys
sys.path.append('/home/django')
os.environ['DJANGO_SETTINGS_MODULE'] = 'myapp.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()


When this happens, I can kill the wsgi process and the server will
recover.

>ps aux|grep django # process is running as user "django"
django 27590 5.3 17.4 908024 178760 ? Sl Apr12 76:09 /usr/sbin/apache2 -k start
>kill -9 27590


This leads me to believe that the problem is a known issue:

"(deadlock-timeout) Defines the maximum number of seconds allowed to
pass before the daemon process is shutdown and restarted after a
potential deadlock on the Python GIL has been detected. The default is
300 seconds. This option exists to combat the problem of a daemon
process freezing as the result of a rogue Python C extension module
which doesn't properly release the Python GIL when entering into a
blocking or long running operation."
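
For reference, that option is set on the WSGIDaemonProcess directive; with
my existing directive it would look something like this (the value shown is
just the documented default):

  WSGIDaemonProcess site-1 user=django group=django threads=50 deadlock-timeout=300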


However, I'm not sure why this condition is not clearing
automatically. I do see that the script timeout occurs exactly 5
minutes after the last successful page render, so the deadlock-timeout
is getting triggered. But it does not actually kill the process.

I'm thinking of switching to the prefork MPM, but I'm not sure whether
that would have any effect, given that I'm already in daemon mode.

Carl Nobile

Apr 14, 2011, 4:35:40 PM
to mod...@googlegroups.com
It looks like you are running a single process with 50 threads. I think you should use more processes with fewer threads, something like this:

WSGIDaemonProcess site-1 user=django group=django processes=5 threads=10 maximum-requests=1000

WSGIProcessGroup site-1
WSGIScriptAlias / /somepath/django.wsgi

The 'maximum-requests=1000' will kill each thread after 1,000 requests and create a new one; this helps to keep memory leaks to a minimum.

~Carl






--
-------------------------------------------------------------------------------
Carl J. Nobile (Software Engineer)
carl....@gmail.com
-------------------------------------------------------------------------------

Graham Dumpleton

Apr 14, 2011, 5:09:17 PM
to mod...@googlegroups.com
On 15 April 2011 06:35, Carl Nobile <carl....@gmail.com> wrote:
> It looks like you are running a single process with 50 threads. I think you
> should use more processes with fewer threads, something like this:
>
> WSGIDaemonProcess site-1 user=django group=django processes=5 threads=10
> maximum-requests=1000
> WSGIProcessGroup site-1
> WSGIScriptAlias / /somepath/django.wsgi
>
> The 'maximum-requests=1000' will kill each thread after 1,000 requests and
> create a new one; this helps to keep memory leaks to a minimum.

Sorry to correct you Carl, but that isn't quite how it works.

I'll respond in more detail to the original question later. It's still 7am
here and I just got off the phone from a work meeting, so I need to wake
up a bit more first. :-)

Graham

Graham Dumpleton

Apr 14, 2011, 6:30:05 PM
to mod...@googlegroups.com

They likely aren't being killed because there isn't actually a
deadlock of a single thread which hasn't released the GIL.

In other words, what the deadlock timeout will not protect against is
threads calling into C code, releasing the GIL and then deadlocking in
that C code.

In your case, the problem is going to be the lxml module. This module
is known not to work properly in Python sub interpreters.
Specifically, lxml can release the GIL and then attempt to do a
callback into Python code. To do this, it uses the simplified GIL
state API in Python to reacquire the GIL, but that API is only
supposed to be used if running in the main Python interpreter and not
a sub interpreter. When used in a sub interpreter, the code will
deadlock trying to reacquire the Python GIL.

That lxml is a problem is documented in:

http://code.google.com/p/modwsgi/wiki/ApplicationIssues#Multiple_Python_Sub_Interpreters

The solution, since you are only delegating one application to that
mod_wsgi daemon process group, is to add:

WSGIApplicationGroup %{GLOBAL}

This will force the application to run in the main Python interpreter
and avoid the shortcomings of the lxml module.
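
In the context of the configuration you posted, that means something like
this (same names and paths as in your original snippet):

  WSGIDaemonProcess site-1 user=django group=django threads=50
  WSGIProcessGroup site-1
  WSGIApplicationGroup %{GLOBAL}
  WSGIScriptAlias / /somepath/django.wsgi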

As to how you might protect against this sort of deadlock in C code when
the GIL isn't held, the only way is to use 'inactivity-timeout'. This
will cause a restart when there have been no new requests and/or no
reading of request content or generation of response content for that
timeout period. So, this could be used as a fail safe, but if your
application is used infrequently, it will also have the effect of
causing your idle process to be restarted after the timeout period as
well.
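
As a rough illustration, it is just another option on the same
WSGIDaemonProcess directive; the timeout value here is an arbitrary choice:

  WSGIDaemonProcess site-1 user=django group=django threads=50 inactivity-timeout=600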

BTW, in the worst cases, for detecting what a process is doing, one can use either:

http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_Python_Stack_Traces
http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB
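
The gist of the first of those techniques is to have the WSGI script dump
the Python stack of every thread, so you can see where requests are stuck.
A stripped down sketch of the idea, not the full recipe from that wiki page
(the helper name and the interval here are arbitrary), added to the WSGI
script file:

  import sys, threading, time, traceback

  def _dump_stacks(interval=300):
      # Periodically write the current Python stack of every thread to
      # stderr, which ends up in the Apache error log.
      while True:
          time.sleep(interval)
          for thread_id, frame in sys._current_frames().items():
              sys.stderr.write('Thread %d:\n' % thread_id)
              traceback.print_stack(frame, file=sys.stderr)
              sys.stderr.write('\n')

  _monitor = threading.Thread(target=_dump_stacks)
  _monitor.setDaemon(True)
  _monitor.start()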

> I'm thinking of switching to the prefork MPM, but I'm not sure whether
> that would have any effect, given that I'm already in daemon mode.

Prefork has been causing subtle problems for some people and I would
avoid it if you can.

Graham

Graham Dumpleton

Apr 14, 2011, 6:38:26 PM
to mod...@googlegroups.com
On 15 April 2011 07:09, Graham Dumpleton <graham.d...@gmail.com> wrote:
> On 15 April 2011 06:35, Carl Nobile <carl....@gmail.com> wrote:
>> The 'maximum-requests=1000' will kill each thread after 1,000 requests and
>> create a new one; this helps to keep memory leaks to a minimum.
>
> Sorry to correct you Carl, but that isn't quite how it works.

The small correction is that once that number of requests is reached
for the whole process, irrespective of how many threads are running in
the process, the process as a whole is killed off and restarted. It
isn't done at the individual thread level within an ongoing process.

The maximum-requests option should be avoided in production systems
if at all possible, because the quicker requests come through, the more
frequently the process will restart, which is likely the last thing
you want to happen when under load.

As to the number of processes/threads, as Carl pointed out, the OP should
avoid having high numbers of threads in a single process and instead
create multiple processes with a small number of threads.

For most people, even the default of 15 threads per process is likely
overkill, with that many concurrent requests never actually occurring,
so increasing it with no good reason is not a good idea. If you have
the memory available, you are possibly better off going to 3 processes,
each with only 5 threads.
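
Using the names from the original post, that would be something along the
lines of:

  WSGIDaemonProcess site-1 user=django group=django processes=3 threads=5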

Graham

Carl Nobile

Apr 14, 2011, 7:04:04 PM
to mod...@googlegroups.com
I didn't know that lxml was an issue. I will keep that in mind. Does the C version of ElementTree have the same issues?
lxml is the better package, though, because it implements more of the XML spec, like true XPath and XSLT.

~Carl

Graham Dumpleton

Apr 14, 2011, 7:49:34 PM
to mod...@googlegroups.com
Don't know of any problems with ElementTree.

From memory, the problem with lxml is that it uses SWIG to generate
Python wrappers for C internals, and it is SWIG that uses the simplified
GIL state API when doing callbacks.

Thus, this problem generally affects anything that uses SWIG and which
is doing callbacks from C code into Python.

Even if my memory is bad and lxml doesn't use SWIG, the issue with
SWIG still stands.

Graham

Chase

Apr 15, 2011, 11:04:47 AM
to modwsgi
Wow, lots of good info. Thanks guys! I have made the
"WSGIApplicationGroup %{GLOBAL}" change for now; we'll see if that
clears it up over the next week or so.

As for running in prefork, I have not made that change yet. But here
is the documentation that led me to believe this was preferred:

http://code.google.com/p/modwsgi/wiki/IntegrationWithDjango

"Now, traditional wisdom in respect of Django has been that it should
perferably only be used on single threaded servers. This would mean
for Apache using the single threaded 'prefork' MPM on UNIX systems and
avoiding the multithreaded 'worker' MPM."

The older mod_python docs also advised this:

http://docs.djangoproject.com/en/dev/howto/deployment/modpython/?from=olddocs

"Django requires Apache 2.x and mod_python 3.x, and you should use
Apache’s prefork MPM, as opposed to the worker MPM."

Can you link to a discussion of the subtle problems reported with
prefork? Thanks again,

-Chase



Graham Dumpleton

Apr 16, 2011, 12:08:46 AM
to mod...@googlegroups.com
On 16 April 2011 01:04, Chase <chase....@gmail.com> wrote:
> Wow, lots of good info. Thanks guys! I have made the
> "WSGIApplicationGroup %{GLOBAL}" change for now; we'll see if that
> clears it up over the next week or so.
>
> As for running in prefork, I have not made that change yet. But here
> is the documentation that led me to believe this was preferred:
>
> http://code.google.com/p/modwsgi/wiki/IntegrationWithDjango
>
> "Now, traditional wisdom in respect of Django has been that it should
> preferably only be used on single threaded servers. This would mean
> for Apache using the single threaded 'prefork' MPM on UNIX systems and
> avoiding the multithreaded 'worker' MPM."
>
> The older mod_python docs also advised this:
>
> http://docs.djangoproject.com/en/dev/howto/deployment/modpython/?from=olddocs
>
> "Django requires Apache 2.x and mod_python 3.x, and you should use
> Apache’s prefork MPM, as opposed to the worker MPM."
>
> Can you link to a discussion of the subtle problems reported with
> prefork? Thanks again,

That section was more relevant when Django 1.0 had only just come out,
which was the first version of Django for which the core was
supposedly thread safe.

Anyway, the MPM you use isn't particularly relevant as you are using
daemon mode and not embedded mode. Which MPM you use is only critical
if you are using embedded mode.

In daemon mode you have the arbitrary ability to control
processes/threads based on whether your application is thread safe.

For related reading see:

http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html

BTW, the IntegrationWithDjango page in the wiki is likely to be
completely removed at some point in the near future, and I will stop
providing details for specific frameworks to cover where the frameworks
don't themselves provide enough information. I have already removed the
pages for most of the other frameworks. The end result is that the
frameworks themselves will need to provide decent documentation covering
any idiosyncrasies in setting up their framework to work with mod_wsgi
which are due to issues or design decisions related to their framework
and which have nothing to do with mod_wsgi. I have had enough of trying
to document these framework specific subtleties, and framework authors
tend to express a belief that their own documentation is already more
than adequate, even though from what I have seen people still get
tripped up when they follow only the documentation provided by the
framework. So, I will be devoting my time elsewhere now and not worrying
about documenting stuff related to the frameworks, or actively assisting
users of frameworks on forums related to those frameworks or on general
forums such as StackOverflow. Instead, if it is a framework specific
issue, you will need to seek help from the developers or the community
for that framework.

Graham

Chase

Apr 16, 2011, 7:56:49 AM
to modwsgi
The problem persists. I have removed our calls to lxml; they were not
critical. We'll see what effect that has going forward.

-Chase



Chase

Apr 23, 2011, 1:38:04 PM
to modwsgi
Changed the config from 1 process / 50 threads to 3 processes / 5 threads.
That seems to have solved it, or at least made it much less likely.

-Chase

Graham Dumpleton

Apr 24, 2011, 6:51:17 PM
to mod...@googlegroups.com
That many threads was never a good idea.

A possible reason why you are seeing fewer problems with only 5 threads
in a process is that your code or a third party C extension is not
thread safe and is perhaps deadlocking.

You really need to ascertain when process threads are starting to hang and use:

http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_Python_Stack_Traces
http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB

to work out what it is doing at that time.

Graham

rwman

Jun 5, 2011, 5:24:10 PM
to modwsgi
Is there a way to make Apache work even when such a deadlock occurs? Can
a process be killed and restarted automatically? I know it is not a
solution for the actual problem, which should be solved by eliminating the
deadlock, but the goal is to keep the production server working while
debugging the problem.
I tried all the options of mod_wsgi that seemed relevant, but could not
achieve a stable Apache configuration. It still got stuck after some time,
roughly 5 hours.


Graham Dumpleton

Jun 5, 2011, 8:56:55 PM
to mod...@googlegroups.com
Can you explain your original symptoms clearly? The 'Script timed out
before returning headers' message in the subject of this discussion can
happen in a number of circumstances, and some of them are not related to
deadlocks.

On 6 June 2011 05:24, rwman <someuni...@gmail.com> wrote:
> Is there a way to make Apache work even when such a deadlock occurs?

When using daemon mode, mod_wsgi has for some time had the ability to
detect such deadlocks and it should kill off the process after 300
seconds. This doesn't apply if using embedded mode, so when explaining
your original problem you should describe the configuration you are
using and preferably post the mod_wsgi bits from the Apache
configuration.

There are some extreme cases where a third party Python extension
module might defeat the deadlock detection, but the extension module
would need to be doing things it probably shouldn't be doing. The
deadlock timeout also will not kick in if your code is simply looping
or stuck in database queries that take a long time.

> Can a process be killed and restarted automatically?

For true deadlocks, that is what the deadlock detection of daemon mode
does. There is also an optional inactivity timeout failsafe that can be
turned on, which helps to recover from non deadlock cases where request
handlers are looping or stuck in database queries.

> I know it is not a
> solution for the actual problem, which should be solved by eliminating the
> deadlock, but the goal is to keep the production server working while
> debugging the problem.
> I tried all the options of mod_wsgi that seemed relevant, but could not
> achieve a stable Apache configuration. It still got stuck after some time,
> roughly 5 hours.

Without an explanation of your original problem, it isn't clear that
you are having a deadlock problem. It could be that you have request
handlers that are getting stuck in loops and never completing, thereby
using up all the request handler threads.

So, give your current configuration and what other variations you have
used, so I can see what you are doing and confirm whether you are using
embedded mode or daemon mode. Also indicate whether you are using the
Apache prefork or worker MPM and whether PHP is being used in the same
Apache web server.

Indicate whether you have looked at the inactivity-timeout option for
WSGIDaemonProcess and whether you have at least seen the deadlock-timeout
option, although the latter defaults to on anyway.

Also indicate whether you have tried adding any variant of the code as
explained in:

http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_Python_Stack_Traces

to try and get the daemon process to dump Python stack traces when it
does get stuck so you might work out what it is doing.

You could also try extracting C stack traces as explained in:

http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB

Graham
