HTTP 503 Service Temporarily Unavailable response immediately following graceful restart

Fraser

Nov 5, 2010, 7:43:29 AM
to modwsgi
We are seeing an issue where performing a graceful restart of Apache
results in a 503 response for a small number of requests made
immediately following the restart.

I enabled debug-level logging within Apache, and see the following
entries relating to one of the WSGI daemons in question:

[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12455): Shutdown requested 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12455): Stopping process 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12455): Destroying interpreters.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12455): Cleanup interpreter ''.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12451): Shutdown requested 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12451): Stopping process 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12451): Destroying interpreters.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12451): Cleanup interpreter ''.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12455): Terminating Python.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12455): Python has shutdown.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12444): Shutdown requested 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12444): Stopping process 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12444): Destroying interpreters.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12444): Cleanup interpreter ''.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12451): Terminating Python.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12451): Python has shutdown.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12443): Shutdown requested 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12443): Stopping process 'myprocname'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12443): Destroying interpreters.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12443): Destroy interpreter 'hostname:port|'.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12444): Terminating Python.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12444): Python has shutdown.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12443): Cleanup interpreter ''.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12443): Terminating Python.
[Thu Nov 04 10:15:01 2010] [info] mod_wsgi (pid=12443): Python has shutdown.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12740): Starting process 'myprocname' with uid=48, gid=502 and threads=1.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12741): Starting process 'myprocname' with uid=48, gid=502 and threads=1.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12742): Starting process 'myprocname' with uid=48, gid=502 and threads=1.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12727): Starting process 'myprocname' with uid=48, gid=502 and threads=1.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12727): Initializing Python.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12727): Attach interpreter ''.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12740): Initializing Python.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12741): Initializing Python.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12741): Attach interpreter ''.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12742): Initializing Python.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12742): Attach interpreter ''.
[Thu Nov 04 10:15:02 2010] [info] mod_wsgi (pid=12740): Attach interpreter ''.
[Thu Nov 04 10:15:06 2010] [error] [client XXX.XXX.XXX.XXX] (2)No such file or directory: mod_wsgi (pid=12523): Unable to connect to WSGI daemon process 'myprocname' on '/var/wsgi/wsgi.12667.55.10.sock' after multiple attempts.
[Thu Nov 04 10:15:06 2010] [info] mod_wsgi (pid=12523): Destroying interpreters.
[Thu Nov 04 10:15:06 2010] [info] mod_wsgi (pid=12523): Cleanup interpreter ''.
[Thu Nov 04 10:15:06 2010] [info] mod_wsgi (pid=12523): Terminating Python.
[Thu Nov 04 10:15:06 2010] [info] mod_wsgi (pid=12523): Python has shutdown.

From these logs, it looks like the request is trying to
connect to the old WSGI socket, which has already been destroyed. This
doesn't seem like correct or ideal behaviour.
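To check which socket a failing request was aimed at, the "Unable to connect" lines can be pulled out of the error log and the numbers in the socket file name compared. The sketch below is only a diagnostic aid under an assumption: it treats the file name as `<prefix>.<pid>.<generation>.<id>.sock`, which fits the `wsgi.12667.55.10.sock` path in the log above but is inferred, not documented in this thread. `parse_socket_failure` and `SocketFailure` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches mod_wsgi's "Unable to connect" error line in the Apache error log.
# ASSUMPTION: the socket file name has the form <prefix>.<pid>.<generation>.<id>.sock,
# inferred from '/var/wsgi/wsgi.12667.55.10.sock' in the log above.
FAILURE_RE = re.compile(
    r"Unable to connect to WSGI daemon process '(?P<group>[^']+)' "
    r"on '(?P<path>[^']*?\.(?P<pid>\d+)\.(?P<gen>\d+)\.(?P<id>\d+)\.sock)'"
)


class SocketFailure(NamedTuple):
    group: str       # daemon process group, e.g. 'myprocname'
    path: str        # socket path the Apache child tried to connect to
    parent_pid: int  # assumed: pid of the Apache parent process
    generation: int  # assumed: Apache restart generation embedded in the name
    socket_id: int   # assumed: per-daemon-group identifier


def parse_socket_failure(line: str) -> Optional[SocketFailure]:
    """Return details of a failed daemon connection, or None for unrelated lines."""
    m = FAILURE_RE.search(line)
    if m is None:
        return None
    return SocketFailure(
        group=m["group"],
        path=m["path"],
        parent_pid=int(m["pid"]),
        generation=int(m["gen"]),
        socket_id=int(m["id"]),
    )


# Example: the error line from the log above, unwrapped onto a single line.
line = (
    "[Thu Nov 04 10:15:06 2010] [error] [client XXX.XXX.XXX.XXX] (2)No such file "
    "or directory: mod_wsgi (pid=12523): Unable to connect to WSGI daemon process "
    "'myprocname' on '/var/wsgi/wsgi.12667.55.10.sock' after multiple attempts."
)
print(parse_socket_failure(line))
```

If the parsed `generation` is lower than the number in the socket files currently present in the socket directory, that would be consistent with the request having been handled by a child left over from before the restart.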

I'm not very familiar with the inner workings of Apache or mod_wsgi,
but my guess is that this may be caused by one of the following
scenarios:

1. An "old" Apache process is trying to handle a request following a
graceful restart. My understanding is that a graceful restart should
result in new Apache processes being created and the old ones only being
used to complete requests already in progress before being destroyed.

2. The request is being handled by a newly spawned Apache process, but
somehow it has a reference to the old (and now defunct) mod_wsgi
socket.

Has anyone else experienced this and come up with a workaround, or is
this a bug within Apache/mod_wsgi?

Thanks,

Fraser

P.S. Environment is RHEL5 with Apache 2.2.3 and mod_wsgi 3.3.

Graham Dumpleton

Nov 5, 2010, 8:11:15 AM
to mod...@googlegroups.com

How long after the restart?

When doing a graceful restart, there may be a small window where an
existing Apache child process hasn't properly stopped accepting new
requests while waiting for existing requests to complete. This may be
made worse by keep alive connections: an existing child server process
can't shut down while such a connection is open, and it will accept a
new request arriving on that existing connection. In that latter
situation especially, you can technically see this issue.
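A rough way to picture that window: an old child keeps serving a keep alive connection for as long as each new request arrives within the keep alive timeout of the previous one, so any request on that connection arriving after the restart still lands on a child from the old generation. The sketch below is a simplified model under stated assumptions (one client, one connection, `MaxKeepAliveRequests` and other limits ignored); `requests_hitting_old_child` is a hypothetical helper and times are in seconds.

```python
from typing import List, Optional, Sequence


def requests_hitting_old_child(
    restart_at: float,
    request_times: Sequence[float],
    keepalive_timeout: float,
) -> List[float]:
    """Return times of requests that an old-generation child would still handle.

    Simplified model (an assumption, not Apache's actual algorithm): one client
    sends requests over a keep alive connection to a child that existed before
    the graceful restart at restart_at. The connection is closed once it sits
    idle for longer than keepalive_timeout; the client then reconnects, and a
    reconnect after the restart reaches a new-generation child.
    Limits such as MaxKeepAliveRequests are ignored.
    """
    at_risk: List[float] = []
    last_activity: Optional[float] = None
    for t in sorted(request_times):
        idle_closed = last_activity is not None and t - last_activity > keepalive_timeout
        if t >= restart_at:
            if last_activity is None or idle_closed:
                # A fresh connection after the restart goes to a new child,
                # and so does every later request from this client.
                break
            # Same connection, still owned by a child from the old generation.
            at_risk.append(t)
        last_activity = t
    return at_risk


# Restart at t=10 s with Apache's default KeepAliveTimeout of 5 s.
print(requests_hitting_old_child(10, [8, 11, 14, 20], keepalive_timeout=5))  # [11, 14]
```

Passing a `keepalive_timeout` of 0 roughly models turning keep alive off: no request after the restart can reuse a connection held by an old child.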

Try turning off keep alive and see if the problem persists. Beyond
that, there isn't much you can do in Apache, as Apache doesn't provide
a way of extending the graceful restart mechanism to what it regards
as 'other' child processes. Putting nginx in front can help if you are
concerned about losing keep alive, as nginx will then handle it.
Having nginx in front brings a lot of other benefits as well.
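After turning keep alive off, one way to confirm the change took effect is to inspect how the server answers a single request. Under HTTP/1.1 a connection is persistent unless the response carries `Connection: close`; an HTTP/1.0 response is persistent only if it explicitly says `Connection: keep-alive`. The sketch below applies those rules with the standard library's `http.client`; `is_persistent` and `server_keeps_alive` are hypothetical helper names, and the host and port you pass are placeholders for the Apache instance under test.

```python
import http.client


def is_persistent(version: int, connection_header: str) -> bool:
    """Apply the HTTP/1.0 and HTTP/1.1 connection-persistence rules.

    version follows http.client's HTTPResponse.version: 10 for HTTP/1.0,
    11 for HTTP/1.1.
    """
    tokens = {t.strip().lower() for t in connection_header.split(",") if t.strip()}
    if version >= 11:
        # HTTP/1.1 connections stay open unless the server says "close".
        return "close" not in tokens
    # HTTP/1.0 connections close unless the server explicitly opts in.
    return "keep-alive" in tokens


def server_keeps_alive(host: str, port: int, path: str = "/") -> bool:
    """Send one GET request and report whether the server left the connection open."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return is_persistent(resp.version, resp.getheader("Connection", ""))
    finally:
        conn.close()
```

For example, `server_keeps_alive("localhost", 80)` would be expected to return `False` once keep alive is disabled, since Apache should then answer HTTP/1.1 requests with `Connection: close`.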

BTW, what else is running on this server? Is it just the WSGI
application or is it handling static files, or non Python dynamic web
applications as well?

Graham

Graham Dumpleton

Nov 6, 2010, 2:33:57 AM
to mod...@googlegroups.com
A few more comments about why it works this way anyway.

Apache tracks the concept of a 'generation', which counts how
many times a restart/graceful restart has been done.

On the basis that a restart is, or may be, done when a configuration
change has been made to the Apache configuration files, and that those
changes may well be linked to changes in the WSGI application,
mod_wsgi doesn't allow a request originally accepted by an Apache
child process to be proxied through to a WSGI application daemon
process which was created against a newer iteration of the
configuration.

This is achieved by incorporating the generation number into the
file system name of the UNIX listener socket which Apache child
processes connect to for a daemon process.

Thus, when a restart occurs, the old listener socket is removed and a
new one is created with the new generation number. If an Apache child
process wasn't able to shut down immediately because of a keep alive
connection, and a subsequent request received on that connection after
the restart (but before the keep alive timeout expired) was destined
for the WSGI application in a daemon process, then it isn't going to be
able to connect and you will see the error that you do.
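The failure mode can be reproduced outside Apache with plain UNIX domain sockets. In the sketch below, a stand-in "daemon" listens on a socket whose name embeds generation 56, while a stale "child" still builds the name for generation 55; its connect fails with errno 2 (ENOENT), the same `(2)No such file or directory` seen in the log. The `<prefix>.<pid>.<generation>.<id>.sock` naming is inferred from the log path, and `socket_path`, `try_connect` and `simulate_restart` are hypothetical helpers, not mod_wsgi code. A Unix-like OS is required.

```python
import errno
import os
import socket
import tempfile
from typing import Tuple


def socket_path(prefix: str, parent_pid: int, generation: int, group_id: int) -> str:
    """Build a daemon socket name (ASSUMED format, inferred from the log path)."""
    return f"{prefix}.{parent_pid}.{generation}.{group_id}.sock"


def try_connect(path: str) -> int:
    """Connect to a UNIX socket; return 0 on success or the errno on failure."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        client.close()


def simulate_restart(workdir: str) -> Tuple[int, int]:
    prefix = os.path.join(workdir, "wsgi")
    old_gen, new_gen = 55, 56

    # After the restart only the new-generation listener exists on disk;
    # the old socket file is gone.
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(socket_path(prefix, 12667, new_gen, 10))
    listener.listen(1)
    try:
        # A child still on the old generation computes the old name...
        stale = try_connect(socket_path(prefix, 12667, old_gen, 10))
        # ...while a child from the new generation finds the listener.
        fresh = try_connect(socket_path(prefix, 12667, new_gen, 10))
    finally:
        listener.close()
    return stale, fresh


with tempfile.TemporaryDirectory() as d:
    stale_errno, fresh_result = simulate_restart(d)
print(errno.errorcode[stale_errno], fresh_result)  # ENOENT 0
```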

So, the code is being very conservative in trying to ensure that
configuration/code bases are always in sync.

If this is a huge issue, then as I said before, you can stick nginx in
front, which effectively disables keep alive in Apache as nginx
proxying doesn't support keep alive connections.

Another option, if people think this is a big enough deal and you know
you aren't changing configuration/code when doing a restart (so the
chance of things being out of sync is not a problem), is that I add a
directive WSGISocketGeneration or similar which defaults to On, but
could be set to Off so that the generation number isn't used in the
socket file name.
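To make the proposed trade-off concrete: with a hypothetical `generation=None` argument standing in for the suggested `WSGISocketGeneration Off`, the socket name would no longer change on restart, so a lingering old child could still reach the new daemon process. The sketch only illustrates the proposal; `daemon_socket_name` and `reachable_after_restart` are hypothetical helpers, the name format is inferred from the log path earlier in the thread, and the Apache parent pid is assumed to stay the same across a graceful restart.

```python
from typing import Optional


def daemon_socket_name(
    prefix: str,
    parent_pid: int,
    group_id: int,
    generation: Optional[int] = None,
) -> str:
    """Build a daemon socket file name.

    generation=None models the proposed (hypothetical) "WSGISocketGeneration Off":
    the name then no longer changes when Apache restarts.
    ASSUMED format: <prefix>.<pid>[.<generation>].<id>.sock
    """
    parts = [prefix, str(parent_pid)]
    if generation is not None:
        parts.append(str(generation))
    parts.append(str(group_id))
    return ".".join(parts) + ".sock"


def reachable_after_restart(old_gen: int, new_gen: int, use_generation: bool) -> bool:
    """Would a child from old_gen compute the same socket name as the new listener?"""

    def name(gen: int) -> str:
        # The parent pid (12667 here) is assumed unchanged by a graceful restart.
        return daemon_socket_name("/var/wsgi/wsgi", 12667, 10, gen if use_generation else None)

    return name(old_gen) == name(new_gen)


print(reachable_after_restart(55, 56, use_generation=True))   # False
print(reachable_after_restart(55, 56, use_generation=False))  # True
```

Dropping the generation trades the post-restart 503s for the risk described above: a request accepted under the old configuration being handled by a daemon process started from the new one.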

Comments? Anyone concerned about this?

Graham

Fraser Nevett

Nov 9, 2010, 1:12:29 PM
to modwsgi
Graham,

I think you're right that keep alive is the culprit here. I should
have a chance to do some further testing on our environment later this
week and will report back on how it goes.

Thanks,

Fraser


Martijn Moeling

Nov 9, 2010, 1:54:39 PM
to mod...@googlegroups.com
Hi!

Graham said in another post (the subject of that post is not relevant):

>> Putting nginx in front can help if you are concerned
>> about lack of keep alive as then it will handle it. Having nginx in
>> front brings a lot of other benefits as well.


I am curious about the "other benefits" part. I saw that nginx has uwsgi support; does this mean that it is an alternative? Or should I read it as putting nginx between the browser and Apache?

Does nginx help in any way if I want to have a "LAN" behind it with multiple front-end Apache/mod_wsgi workers and one or more locally connected database servers, to implement a large system with load balancing (important!) across multiple domains? I am trying to work on that part of the job too...

Martijn

Sent from my iPad

Jason Garber

Nov 9, 2010, 4:50:40 PM
to mod...@googlegroups.com
Hey Martijn,

I placed nginx in front of over 50 domains with quite a bit of traffic.  I love it because it is little, simple, fast, powerful, etc...

We use it as an SSL load balancer / proxy to (potentially) multiple backend Apache instances running a mix of PHP and WSGI applications.  And also, by deploying the web application source code to the load balancer box, we can tell nginx to serve all /Static content for a given website directly, thus dramatically reducing the number of requests to Apache.

I've seen nginx complete 12,000 static requests per second when load testing on LAN.  It's just plain fast.  And the fact that the SSL config is totally simple (and only one place to purchase and install certs) makes it that much better.

Hope this perspective helps a bit.

Jason Garber


Graham Dumpleton

Nov 11, 2010, 6:03:54 PM
to mod...@googlegroups.com
Sorry, I haven't had enough time to sit down and summarise the potential
benefits into a wiki page. You can find one past discussion about this
at:

http://groups.google.com/group/modwsgi/browse_frm/thread/30752228efe8f8b9

Search for 'nginx proxy' in the mod_wsgi mailing list on Google Groups and
you should find more discussions about it.

Graham
