Sorry for the slow reply, been on long holiday.
Why are you using maximum-requests and inactivity-timeout?
I would always recommended against using these options if there is no
real requirement as unnecessary restarts should always be avoided
because the startup costs of loading up fat Python web applications
can significantly impact server performance if your server is under
constant load.
IOW, the last thing you want to happen if you are under loading,
especially a load spike, is to restart the application due to hitting
maximum-requests.
Anyway, I don't recollect specifically a case where daemon process
wouldn't come back after being killed off. Are you absolutely sure,
from running 'ps' and checking process IDs that they didn't exist in
any state in the process table? Are the process perhaps still there
and haven't actually shutdown properly?
The message:
[Fri Sep 21 09:10:23 2012] [info] mod_wsgi (pid=3845): Aborting process
'XXusername-omittedXX'.
indicates there certainly wasn't a clean shutdown, possibly because of
a threading dead lock or hung request thread. However, when you get
that message there is a C level call to exit() immediately after so
the process should die.
Technically calling exit() doesn't absolutely guarantee shutdown as a
C level atexit() handler could block but neither Apache or mod_wsgi
should be setting up any of those.
The problem with maximum-requests and inactivity-timeout is that they
are a self initiated restart and a graceful one at that. There is
therefore no absolute failsafe if exit() doesn't work as there would
if it was an Apache restart. In the case of an Apache restart, the
Apache parent process will send a SIGKILL to the child processes if
they haven't exited after 3 seconds. For a self initiated restart, the
parent process doesn't know it is restarting so if something prevents
the process from properly dying, then it will not be notified and
replace the process.
In summary, validate for sure whether the daemon process exist still
or not in the process table and in what state.
If they do still exist, try and use gdb as described in:
http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Debugging_Crashes_With_GDB
to work out where they are blocked.
That all said, don't recollect seeing this specific issue with
maximum-requests, have seen similar outcome of processes not handling
requests in the past. Generally this was always because of application
level code blocking exhausting threads. There were some unexplained
cases, but this seemed to vanish with no changes to mod_wsgi. There
was suspicion though at the time that for some reason the distro
Apache was causing problems. From memory this was for Apache in patch
version range around 15-17 of 2.2 and you are using 16. Is there a
reason why you can't upgrade to a newer Apache version than what you
are using?
Graham
On 24 September 2012 10:19, Greg Lowe <
greg...@gmail.com> wrote:
> I've recently run into some stability issues with three django sites that we
> host. These are running on Apache and Mod_wsgi, and are low traffic sites.
> I’m more concerned with reliability than maximising throughput.
>
> I’ve included a summary of what has been observed so far below. I would
> greatly appreciate any advice on how to diagnose and/or fix this problem.
>
> The sites are configured to use daemon processes with a maximum-requests
> setting. Apache is running in prefork mode. i.e. WSGIDaemonProcess
> usernamehere user=usernamehere group=usernamehere processes=2 threads=25
> maximum-requests=100 inactivity-timeout=60 display-name=%{GROUP}
>
> We’ve experienced a number of outages where both of the daemon processes
> disappear, and are not recreated by mod_wsgi. Restarting Apache causes them
> to be restarted correctly.
>
> Occasionally I’ve seen the sites being served from only a single daemon
> process although it’s configured to use two. An apache restart fixes this.
>
> Our server admin has managed to reproduce the problem on our test
> environment by running apache bench at around 100 requests per second for
> ~70k requests. (He’s also done this with INFO level logging turned on - see
> the log snippets at the end of the email.)
>
> These are the typical errors seen in the error log from an outage:
>
> Many of these: Script timed out before returning headers: django.wsgi
>
> Many of these: Premature end of script headers: django.wsgi
>
> Occasionally one of these: (4)Interrupted system call: mod_wsgi (pid=19990):
> Unable to connect to WSGI daemon process 'usernamehere’' on
> '/var/run/apache2/wsgi.25099.19.3.sock' after multiple attempts.
>
> Occasionally one of these: (2)No such file or directory: mod_wsgi
> (pid=10877): Unable to connect to WSGI daemon process 'usernamehere' on
> '/var/run/apache2/wsgi.5194.16.3.sock' after multiple attempts.
>
>
> One of the sites has been running for about a year, the others only a few
> months. These problems first started a few weeks ago.
>
> The sites were migrated from a server running an older version of Debian a
> little over a month ago. There was a few weeks in between the migration and
> the problems appearing. So I’m fairly confident that we can attribute these
> problems to the newer version of mod_wsgi, but due to the timing I can’t be
> certain.
>
> Old server versions:
>
> Debian: Lenny
>
> Apache: apache2 2.2.9-10+lenny8, apache-mpm-prefork 2.2.9-10+lenny8
>
> mod_wsgi: libapache2-mod-wsgi 2.5-1~lenny1
>
> Django: 1.3.1
>
>
> Current server versions:
>
> Debian: Squeeze
>
> Apache: apache2 2.2.16-6+squeeze7, apache2-mpm-prefork 2.2.16-6+squeeze7
>
> Mod_wsgi: libapache2-mod-wsgi 3.3-2
>
> Django: 1.3.1
>
>
> Our system admin has reproduced the problem on our test environment with
> INFO level logging turned on. Here are his comments:
>
> I set the apache log level for the test site and ab'd it to death.
>
> The pertinent log entries are:
>
> [Fri Sep 21 09:10:22 2012] [info] Initial (No.1) HTTPS request received
> for child 70 (server XXXXXomittedXXXXXX:443)
> [Fri Sep 21 09:10:22 2012] [info] mod_wsgi (pid=3845): Maximum requests
> reached 'XXusername-omittedXX'.
> [Fri Sep 21 09:10:22 2012] [info] mod_wsgi (pid=3845): Shutdown
> requested 'XXusername-omittedXX'.
> [Fri Sep 21 09:10:22 2012] [info] Initial (No.1) HTTPS request received
> for child 27 (server XXXXXomittedXXXXXX:443)
> ....
> [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request received
> for child 28 (server XXXXXomittedXXXXXX:443)
> [Fri Sep 21 09:10:23 2012] [info] mod_wsgi (pid=3845): Aborting process
> 'XXusername-omittedXX'.
> [Fri Sep 21 09:10:23 2012] [error] [client 192.168.146.19] Premature end
> of script headers: django.wsgi
> [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request received
> for child 85 (server XXXXXomittedXXXXXX:443)
> ....
> [Fri Sep 21 09:10:23 2012] [info] Initial (No.1) HTTPS request received
> for child 46 (server XXXXXomittedXXXXXX:443)
> [Fri Sep 21 09:15:20 2012] [error] [client 192.168.146.19] Script timed
> out before returning headers: django.wsgi
> [Fri Sep 21 09:15:20 2012] [error] [client 192.168.146.19] Script timed
> out before returning headers: django.wsgi
>
> So it seems that there is an unclean shutdown after wsgi's max requests
> count is reached.
>
> A normal shutdown looks like this:
>
> [Fri Sep 21 09:10:17 2012] [info] Initial (No.1) HTTPS request received
> for child 73 (server XXXXXomittedXXXXXX:443)
> [Fri Sep 21 09:10:17 2012] [info] mod_wsgi (pid=3691): Maximum requests
> reached 'XXusername-omittedXX'.
> [Fri Sep 21 09:10:17 2012] [info] mod_wsgi (pid=3691): Shutdown
> requested 'XXusername-omittedXX'.
> [Fri Sep 21 09:10:17 2012] [info] Initial (No.1) HTTPS request received
> for child 79 (server XXXXXomittedXXXXXX:443)
> ....
> [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request received
> for child 27 (server XXXXXomittedXXXXXX:443)
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Stopping process
> 'XXusername-omittedXX'.
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Destroying
> interpreters.
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Destroy
> interpreter 'XXXXXomittedXXXXXX|'.
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Cleanup
> interpreter ''.
> [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request received
> for child 61 (server XXXXXomittedXXXXXX:443)
> ....
>
> [Fri Sep 21 09:10:18 2012] [info] Initial (No.1) HTTPS request received
> for child 19 (server XXXXXomittedXXXXXX:443)
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Terminating Python.
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3691): Python has shutdown.
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3845): Attach
> interpreter ''.
> [Fri Sep 21 09:10:18 2012] [info] mod_wsgi (pid=3845): Create
> interpreter 'XXXXXomittedXXXXXX|'.
> [Fri Sep 21 09:10:18 2012] [info] [client 192.168.146.19] mod_wsgi
> (pid=3845, process='XXusername-omittedXX',
> application='XXXXXomittedXXXXXX|'): Loading WSGI script
> '/home/XXusername-omittedXX/scripts/django.wsgi'.
> [Fri Sep 21 09:10:19 2012] [info] Initial (No.1) HTTPS request received
> for child 85 (server XXXXXomittedXXXXXX:443)
>
>
> Thanks for making all the way through this long post ;)
>
> --
> You received this message because you are subscribed to the Google Groups
> "modwsgi" group.
> To view this discussion on the web visit
>
https://groups.google.com/d/msg/modwsgi/-/NSg2A39f7DYJ.
> To post to this group, send email to
mod...@googlegroups.com.
> To unsubscribe from this group, send email to
>
modwsgi+u...@googlegroups.com.
> For more options, visit this group at
>
http://groups.google.com/group/modwsgi?hl=en.