What this feature relies on is an ability within mod_wsgi itself to provide a snapshot of what is called the Apache scoreboard.
This Apache scoreboard is a shared memory segment used by Apache to keep track of the status of each worker thread across all Apache processes. Apache itself uses this information to determine how busy the child processes hosting those worker threads are and, depending on the MPM settings, to increase or decrease the number of child processes running.
If mod_status is loaded, additional information about the number of requests handled by Apache is also kept within the scoreboard. Enable ExtendedStatus and even more information is tracked.
Just as mod_status does when it exposes this data through a URL such as /server-status, the plugin uses the data to create a picture of what Apache is doing.
The difference between what the plugin is doing and what can be done from an external monitoring system using the exposed URL is that the plugin can poll on a regular 1 second interval without needing to make a web request, which would itself be reflected in server traffic. By polling more frequently it can build up a better picture, including using extra detail available directly from the scoreboard to grab sample data on actual requests and so generate response time averages and percentiles.
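As a rough illustration of how sampled response times could be turned into averages and percentiles, here is a minimal sketch in Python. It is not the plugin's actual code: the summarize() function, the choice of the nearest-rank percentile method and the sample values are all assumptions made for the example.

```python
def summarize(samples, percentiles=(50, 95, 99)):
    """Summarise response time samples (in seconds).

    Hypothetical helper: returns the sample count, the mean, and
    nearest-rank percentiles, or None when there are no samples.
    """
    if not samples:
        return None
    ordered = sorted(samples)
    count = len(ordered)
    summary = {"count": count, "mean": sum(ordered) / count}
    for p in percentiles:
        # Nearest-rank: the smallest sample such that at least p% of
        # all samples are less than or equal to it. Integer arithmetic
        # is used to compute ceil(p * count / 100).
        rank = max(1, (p * count + 99) // 100)
        summary["p%d" % p] = ordered[rank - 1]
    return summary


# Samples as they might be gathered over a number of 1 second polls.
print(summarize([0.1, 0.2, 0.3, 0.4, 1.0]))
```

With only a handful of samples the higher percentiles simply collapse to the slowest request seen, which is one reason frequent polling matters.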
The short polling interval also allows the number of active child processes to be monitored closely, making it possible to derive metrics for things such as process churn for the child processes, as well as more accurate metrics about server restarts and capacity utilisation.
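To show what deriving such metrics from successive polls might look like, here is a small sketch in Python. The function names and the idea of passing in lists of process IDs and worker counts are assumptions for illustration only. How those values are actually read out of the scoreboard is not shown.

```python
def process_churn(previous_pids, current_pids):
    """Compare the child process IDs seen in two consecutive polls."""
    previous, current = set(previous_pids), set(current_pids)
    return {
        "started": len(current - previous),   # new since the last poll
        "stopped": len(previous - current),   # gone since the last poll
        "active": len(current),
    }


def capacity_utilisation(busy_workers, total_workers):
    """Fraction of worker threads busy at the time of the poll."""
    if total_workers <= 0:
        return 0.0
    return busy_workers / total_workers


# Two hypothetical polls taken one second apart.
print(process_churn([101, 102, 103], [102, 103, 104, 105]))
print(capacity_utilisation(15, 60))
```

Because a process can start and exit between two polls, a shorter polling interval gives a more faithful churn figure.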
The intention in monitoring mod_wsgi is for mod_wsgi to have its own form of scoreboard using shared memory which tracks a snapshot of what is going on across all processes, whether the WSGI application is running in embedded mode or daemon mode.
This would allow tracking of details such as response time measured within the WSGI application independent of front end time, the queueing time (how long it takes between Apache accepting the request and the WSGI application getting to handle it), plus separate measures of capacity utilisation for each mod_wsgi daemon process group. Other metrics which could be thrown into this might include queue depth for daemon processes, queue timeout rate, daemon connection failure rates etc.
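The relationship between these timings can be sketched as follows. This is only an illustration in Python: the function and its three timestamp arguments are hypothetical, and where the timestamps would come from in a real mod_wsgi scoreboard is left open.

```python
def request_timings(accepted_at, app_start, app_end):
    """Split the time spent on one request into its parts.

    All arguments are timestamps in seconds:
      accepted_at - when Apache accepted the request
      app_start   - when the WSGI application began handling it
      app_end     - when the WSGI application finished
    """
    # Guard against small negative values from clock differences
    # between processes.
    queue_time = max(0.0, app_start - accepted_at)
    application_time = max(0.0, app_end - app_start)
    return {
        "queue_time": queue_time,
        "application_time": application_time,
        "total_time": queue_time + application_time,
    }


print(request_timings(100.0, 100.25, 100.75))
```

Keeping queueing time separate from application time is what lets you tell a slow application apart from one that simply has too few processes or threads to keep up.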
If a module such as psutil were available, then the plugin could possibly also track and report per-process memory usage, CPU usage and the number of process context switches. In essence, anything I can think of that would help to supplement data on throughput and response times to work out whether changing the processes/threads mix is actually having some form of positive effect.
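As a sketch of what gathering that data might involve, the following Python function uses psutil when it is installed and returns None when it is not. The process_resources() name and the layout of the returned dictionary are assumptions for the example. The psutil calls used are Process.memory_info(), Process.cpu_times() and Process.num_ctx_switches().

```python
import os

try:
    import psutil  # third party module, treated as optional here
except ImportError:
    psutil = None


def process_resources(pid=None):
    """Return resource usage for a process, or None without psutil."""
    if psutil is None:
        return None
    process = psutil.Process(pid if pid is not None else os.getpid())
    memory = process.memory_info()
    cpu = process.cpu_times()
    switches = process.num_ctx_switches()
    return {
        "rss_bytes": memory.rss,
        "cpu_user_seconds": cpu.user,
        "cpu_system_seconds": cpu.system,
        "ctx_switches_voluntary": switches.voluntary,
        "ctx_switches_involuntary": switches.involuntary,
    }


# Reports on the current process by default.
print(process_resources())
```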
So my plans for future work in trying to achieve all that are as follows.
1. Refactor the current plugin code, which doesn't have a clean separation between deriving the metrics and reporting them up to New Relic, such that there is a distinct layer between the two.
2. Implement the equivalent of a scoreboard for mod_wsgi itself in order to be able to accumulate the additional information required and enhance the metrics generation code and the current New Relic plugin to match.
3. Create an optionally enabled internal consumer of the metrics which would retain a working history of metrics for a period of 30 minutes, but only within the collecting process itself, this process being a dedicated daemon process group set up to collect the data. Part of this would involve a minimal REST API to retrieve raw metric data from the process in some way.
4. Create as a proof of concept an extension for Django Debug Toolbar which can query the historical data from the in memory cache using the REST API. I intend doing this purely as an example, though, to support a talk I will be giving at PyCon AU in August on how Python web application toolbars work. Part of the talk will be about the usefulness or applicability of debug toolbars to a production environment, and I can see this proof of concept helping me to illustrate some points about the problem of a debug toolbar being of use in a multi host deployment.
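To give a feel for steps 1 and 3 above, here is a minimal sketch in Python of metrics being handed to interchangeable consumers, one of which keeps a rolling 30 minute history in memory. Everything in it is hypothetical: the MetricHistory class, the publish() function and the metric names are made up for illustration, and the REST API is not shown.

```python
import time
from collections import deque


class MetricHistory:
    """In memory consumer which retains samples for a limited period."""

    def __init__(self, max_age=30 * 60, clock=time.time):
        self.max_age = max_age      # retention period in seconds
        self.clock = clock          # injectable to make testing easy
        self.samples = deque()      # (timestamp, metrics) pairs, oldest first

    def record(self, metrics):
        now = self.clock()
        self.samples.append((now, dict(metrics)))
        self._expire(now)

    def _expire(self, now):
        # Drop samples older than the retention period.
        cutoff = now - self.max_age
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def since(self, timestamp):
        """Return retained samples recorded at or after timestamp."""
        self._expire(self.clock())
        return [(t, m) for t, m in self.samples if t >= timestamp]


def publish(metrics, consumers):
    """Hand one set of derived metrics to each registered consumer."""
    for consumer in consumers:
        consumer(metrics)


history = MetricHistory()
# A reporter for an external service would be just another callable.
publish({"requests_per_second": 12.5}, [history.record, print])
print(history.since(0))
```

The point of the separation is that the code deriving the metrics never needs to know who is consuming them.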
That is as much as I have planned at this point.
Things I have no intention of doing are the following.
1. Creating any plugin to report data to any other charting system such as Graphite.
2. Creating a database for long term persistence of data.
3. Creating any chart visualisation system of my own to view the metric data, beyond any minimal experiments I may do to support the Django Debug Toolbar experiment.
The reason I am not doing any of these is that they are outside of my area of expertise. I have never used tools such as Graphite. I am not a database person, nor am I a front end web developer or Javascript developer.
I well know from my work at New Relic how much time and effort needs to go into creating a professional production quality backend system for retaining and visualising metric data. Even if I had the skills in those areas, it would be an amazingly huge time suck which would totally dwarf any time I am even able to spend on progressing mod_wsgi itself.
Since my experience lies in the area of Apache, WSGI servers and instrumenting for and collecting metric data, I will keep to that area. Doing so is just the most practical thing I can do as that is where I will be most productive and can do the most.
Those areas are also the ones I am interested in and enjoy working in. I don't find database and front end web design to be that interesting and given that my impetus for doing any work these days is a personal requirement or because I enjoy the technical challenge in a specific problem, then I will as a result be staying well clear of those areas.
Personally I have no issue if others want to pursue those things I have no interest in, and certainly the way I intend refactoring the code would allow anyone to develop their own plugins to get the metrics out and into some other system.
In saying that, please don't take this as me saying 'patches welcome' and otherwise buzz off. I hate the way that some Open Source projects will say that when they don't have time to do something themselves. Reality is that I am time poor and I simply need to focus my time in the best way I can.
If you are genuinely interested in trying to fill in those areas where I feel I can't do a good job or don't have the time, I will not stop you nor make it difficult. I will actually be as accommodating as I can, make it easier for you to get the data out, and also advise on what would be the best way to do something.
What I simply am not in a position to do is lead such an initiative. My own priorities and interests will always take precedence, and I have come to learn that I must do that if I am to avoid becoming burnt out again in respect of the work I do on mod_wsgi. So it is due to a measure of self preservation that I take this stance.
Hope that all makes sense and gives you a better idea of where I am heading and why I am restricting myself to that.
Graham