First up, go watch:
http://lanyrd.com/2012/pycon/spcdg/
http://lanyrd.com/2012/pycon-au/swkdq/
as they talk a bit about these issues.
So, what one can do depends on how you are using mod_wsgi. Embedded
mode or daemon mode?
With embedded mode there is not much you can do just within
Apache/mod_wsgi, because the connection gets queued in the socket
listener queue for Apache itself, into which there isn't a great deal
of visibility. So Apache doesn't know how long a request may have been
sitting in the listener socket backlog queue before it accepts it.
This arises because Apache will only accept a request when it actually
has the resources to handle it. Thus when all processes/threads are
busy, requests will backlog in that socket listener queue.
If you are using daemon mode, you can do a little bit better because
the web application processes are behind Apache. Thus you can time
stamp a request when Apache does accept it and look at the difference
between that and the current time when the application in the daemon
process actually gets to handle it.
What this therefore shows is when the daemon mode processes are
getting overloaded. It does though require the Apache worker processes
to still have enough threads to keep accepting requests and let them
back up in the worker processes rather than in the listener queue,
otherwise the time stamp never gets applied.
In mod_wsgi 3.4 (just released recently), it will automatically time
stamp all requests and make that available in the WSGI request environ
dictionary as 'mod_wsgi.queue_start'. Doing:
queue_start = int(value) / 1000000.0
will give you a time stamp in seconds that can then be compared to
time.time() to work out how much time elapsed between Apache accepting
the request and the web application being passed the request. You
could write a little middleware that monitors that.
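As a rough illustration, such a middleware might look like the
following. The 'mod_wsgi.queue_start' key and the conversion are as
described above; the class name and the 'myapp.queue_time' key it
records are my own inventions, not part of mod_wsgi.

```python
import time

class QueueTimeMiddleware(object):
    """Sketch of WSGI middleware recording how long a request sat
    between Apache accepting it and the daemon process handling it."""

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        value = environ.get('mod_wsgi.queue_start')
        if value:
            # Value is microseconds since the epoch; convert to seconds.
            queue_start = int(value) / 1000000.0
            # Stash the queueing time so it can be logged or reported.
            environ['myapp.queue_time'] = time.time() - queue_start
        return self.application(environ, start_response)
```

You would then wrap your WSGI application entry point with it, for
example 'application = QueueTimeMiddleware(application)', and log or
graph the recorded value however suits you.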
Beyond queueing time, the next measure one can use is thread
utilisation. This is a measure of how much of the capacity of the WSGI
server is being used. In effect it is time spent serving requests
divided by the time that could have been spent serving requests given
the available number of processes/threads.
The value of thread utilisation is that once you head towards 100% and
stay at high levels, you know you are starting to run out of capacity.
The two measures are linked: as thread utilisation increases, queueing
time due to backlog will also increase.
Measuring thread utilisation is interesting, but a bit tricky to do in
pure Python without doing lots of thread locking, which could impact
performance. Using a C extension one can do it with acceptable
overhead.
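To give an idea of the bookkeeping involved, here is a minimal pure
Python sketch of the idea, locking and all. The names are mine and it
is deliberately simplified; a real implementation would need to handle
in-flight requests and rolling time windows.

```python
import threading
import time

class ThreadUtilisation(object):
    """Sketch: accumulate time spent handling requests and divide by
    the elapsed wall clock time multiplied by the thread capacity."""

    def __init__(self, threads):
        self.threads = threads          # configured number of worker threads
        self.lock = threading.Lock()
        self.busy = 0.0                 # accumulated request handling time
        self.start = time.time()

    def request(self, func, *args, **kwargs):
        # Wrap the handling of a single request, timing it.
        begin = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            with self.lock:
                self.busy += time.time() - begin

    def utilisation(self):
        # Fraction of total available thread time actually used.
        elapsed = time.time() - self.start
        with self.lock:
            return self.busy / (elapsed * self.threads)
```

Note how every request takes the lock twice over; that per-request
locking overhead is exactly why doing this properly in a C extension
is the more attractive option.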
Important to realise though is that these sorts of measures should
only be seen as one part of what you should be monitoring. The need to
fiddle things to increase capacity actually means you most likely are
doing a poor job at making your application perform better.
You know you are doing the right thing when these measures prove that
you can safely drop processes/threads and not the other way around.
Anyway, the two talks I link to talk a bit about these issues and give examples.
Although you can easily do queueing time yourself, because thread
utilisation is tricky and because all this stuff is better seen as one
part of an overall monitoring strategy, it is going to be much easier
if you just use New Relic, which does all this stuff and more.
Queueing time is visible in the New Relic Lite plan if you don't want
to pay for New Relic after its trial period ends. The thread
utilisation and resulting capacity analysis reporting based on it are
though part of the paid level, so once you drop to Lite you don't get
access to it anymore. You still have the trial period though to get an
answer to your question.
The normal trial period for New Relic is 14 days. Use this URL at the
moment and you can get an extended trial.
http://newrelic.com/30
So having monitoring is the best way of trying to work out what is
going on, and then using the results of that to tune your
configuration.
Another area one can investigate, especially if using embedded mode,
is whether you have totally screwed up your MPM settings, or were
using the defaults Apache ships with, which aren't very good for
Python, especially if using prefork MPM.
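For a point of reference, a worker MPM configuration looks something
like the following. The directives are standard Apache ones, but the
numbers here are purely illustrative; the right values depend on your
memory, traffic and application, which is the whole point of
validating them.

```apache
<IfModule mpm_worker_module>
StartServers          2
ServerLimit           4
ThreadsPerChild      25
MaxClients          100
MinSpareThreads      25
MaxSpareThreads      75
MaxRequestsPerChild   0
</IfModule>
```

Note that MaxClients should equal ServerLimit times ThreadsPerChild;
getting that relationship wrong is one of the ways people stuff up
Apache's process management.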
I have been doing some work in that area as well as far as writing
some scripts which will validate the Apache configuration and produce
some charts which show how it behaves under certain simulated
conditions. These tell you if you have stuffed it up and are going to
cause Apache to perform badly through basic process management.
I have this stuff working for worker MPM, but not prefork MPM yet. I
am not sure I want to make it available just yet though.
Enough words; a couple of images to whet your appetite.
https://dl.dropbox.com/u/22571016/CapacityAnalysisExample.jpg
This one shows the capacity analysis page in New Relic giving how much
your server is being used.
https://skitch.com/grahamdumpleton/e1dqj/figure-1
This shows evaluation of worker MPM settings for Apache shipped as
source code. Not ideal for Apache, but can still be okay.
https://skitch.com/grahamdumpleton/e1dqa/figure-1
This shows evaluation of poorly chosen MPM settings done by a user.
Too many processes were created initially, which were immediately
killed off because they were surplus to requirements. As the number of
concurrent requests increased, the incorrect configuration meant
Apache would swap between thinking it needed more processes and
thinking it had too many, so the potential existed for it to
continually kill off and then restart processes.
You can all mull over those images.
Since I am about to go on holidays and won't be online much, my best
suggestion is just to try New Relic and find that capacity analysis
report.
Also keep an eye out on the New Relic blog as there will be a post
going up in the next week sometime about the Capacity Analysis report.
It also includes additional information about using it to tune one
aspect of mod_wsgi daemon mode.
Enjoy the carrots for now. This exploration of MPM settings and
evaluating their effectiveness will be something that I intend to talk
about at the next PyCon US, if my talk gets accepted.
Graham