Apache/mod_wsgi performance monitoring plugin.


Graham Dumpleton

Jun 8, 2014, 8:28:19 AM
to mod...@googlegroups.com
When I recently announced version 4.1.0 of mod_wsgi here on the mailing list, I said I would try and blog about what was new in it, but I have been having too much fun working on even more new things.

The latest new thing comes in the mod_wsgi version 4.2.0 release.

This feature is something I have wanted to do for many years, even back before I started working with New Relic, but my general burnt out state of mind around mod_wsgi meant nothing ever happened and mod_wsgi instead stagnated.

Anyway, I am beyond that now and finally getting on to doing it. At least now I have access to an easy-to-use backend to visualise all this data I am collecting, something which would have been non-trivial all those many years ago.

So what mod_wsgi version 4.2.0 introduces is the first phase of a mechanism to more closely monitor the performance of Apache and mod_wsgi itself. This is specifically intended to help address the issue of how best to tune the Apache and mod_wsgi processes/threads settings.

Right now what I have done only covers Apache as a whole. This in itself is extremely revealing, but I will be adding more detail around mod_wsgi in the next phase.

So you can understand what the result of this new capability is, I present the three dashboards of information that can currently be viewed about what your Apache instance is doing.

This first dashboard is an overview dashboard. The main things you can obviously see here are throughput and response times for all requests hitting Apache. This includes any requests for static files and isn't just for requests handled by mod_wsgi.

For response times, it isn't just limited to average response time as shown here either. One can drill down to more information about requests.

In particular, one can see percentiles for requests. So not only average response time, but also 95th and 99th percentiles. On top of that, you can see the amount of data being transferred by Apache.

The dashboard I like most though is the one which looks at what the Apache workers are up to, as well as the processes they operate in and the server utilisation as a whole.

The workers in this case are the Apache threads in the Apache child processes which are initially accepting requests. In the case of mod_wsgi embedded mode, these are the same threads which would then in turn handle the request after it is handed off to be processed by mod_wsgi within the same process. If using daemon mode, they are the same threads that would then proxy the request through to the appropriate mod_wsgi daemon process group.

What the workers are doing at this level can be useful in helping to tune the Apache MPM settings and can also give guidance on setting up mod_wsgi daemon process groups as well. In the case of the mod_wsgi daemon process groups, the whole process would be aided somewhat though if you also happened to be using the New Relic Python agent for monitoring your actual web application. In the second phase of work on this new feature though, I will be surfacing more about what mod_wsgi itself is up to which should provide everything one needs to tune even the mod_wsgi daemon process groups, without needing too much extra.

As to the charts above, this is actually for a Python web application running in mod_wsgi daemon mode with an Apache instance using the prefork MPM. If you ever watched my PyCon US talk from last year, you may remember me talking about the evils of process churn, and how the way the Apache prefork MPM child processes are managed is detrimental to a Python web application running in embedded mode. At the time I illustrated that with charts derived from a simulation. The process churn chart above is from a real live Apache instance and it backs up what I showed through simulation would have been occurring. Although this Python web application was running in daemon mode and so, due to how Apache was configured, wasn't overly affected by that, if your Python web application were running in embedded mode under mod_wsgi, that process churn could be quite detrimental to your performance.

So lots of pretty pictures and statistics.

Now you may well have noticed the little New Relic logos on those charts. This is because this feature just happens to be using the New Relic Platform product (http://newrelic.com/platform) to capture the data and visualise it.

Before you run away thinking that this is all useless to you because New Relic is a paid product, I want to make it very clear that that is not the case.

All of the above charts and the ability to monitor your Apache/mod_wsgi instance using New Relic Platform comes for free.

This is because the feature of New Relic I am using here is available on all free Lite accounts in New Relic. Thus you do not need to pay a cent. You do not even need to use the New Relic Python agent to monitor your actual Python web application, although since the Python agent's web application monitoring also provides a wealth of information even on the free Lite accounts, you would be a bit silly not to. Same for the free New Relic server monitoring: why wouldn't you use it?

Sure, I am biased because I also wrote the Python agent for New Relic and work there, but believe me when I say I wouldn't work on something like this if it wasn't going to be useful to mod_wsgi users. It was the main reason I wanted to go work for New Relic in the first place; it has unfortunately just taken me this long to bring my ideas about monitoring Apache and mod_wsgi itself to reality. :-)

Now if you have already experimented with mod_wsgi-express from version 4.1.0, and if you have already tried the New Relic integration provided with it, then this new plugin will be automatically enabled and the new dashboards will start appearing in the New Relic UI. All you need to do is upgrade mod_wsgi to version 4.2.0 or later from PyPI, ensuring that in doing so it also automatically installs the new mod_wsgi-metrics package.

If you are using New Relic already but with a manually installed version of mod_wsgi, and aren't installing it from PyPI, then you will need to add some additional configuration to your Apache configuration files. The details of this can be found with the mod_wsgi-metrics package on PyPI at:


So, upgrade to mod_wsgi 4.2.0 or later, which you can get from GitHub or PyPI:


Then install the mod_wsgi-metrics Python package and follow the instructions on the mod_wsgi-metrics PyPI page.
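
For a pip based install, that amounts to something like the following (using the package names as published on PyPI):

    pip install --upgrade mod_wsgi mod_wsgi-metrics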

Hopefully I will get some feedback and at least a few examples from where people have used it that I can analyse in order to continue my quest for building up some general rules around Apache and mod_wsgi tuning.  :-)

Graham

  

Garito

Jun 8, 2014, 11:46:06 AM
to mod...@googlegroups.com
I will not use an external service for that, neither New Relic nor any other.

A performance monitor is not only useful but necessary, but linking it to an external service is not a clever solution.

Sorry, I respect you a lot (you are part of my programming dream team) but this is a huge mistake in my opinion.

atomekk

Jun 8, 2014, 5:25:07 PM
to mod...@googlegroups.com
That's a really nice feature! Is there any chance it will be usable outside New Relic?

Jason Garber

Jun 8, 2014, 11:50:11 PM
to mod...@googlegroups.com

Hi Graham,

Can you comment on how this mechanism could provide data to a custom monitoring application and the plans for extending it to cover daemon mode process information?

Thanks!


Graham Dumpleton

Jun 9, 2014, 7:59:00 AM
to mod...@googlegroups.com
On 09/06/2014, at 1:50 PM, Jason Garber <ja...@gahooa.com> wrote:

> Can you comment on how this mechanism could provide data to a custom monitoring application and the plans for extending it to cover daemon mode process information?

What this feature relies on is an ability within mod_wsgi itself to provide a snapshot of what is called the Apache scoreboard.

This Apache scoreboard is a shared memory segment used by Apache to keep track of the status of each worker thread across all Apache processes. Apache itself uses this information to determine how busy the Apache child processes are that the worker threads run in, and depending on the MPM settings will use the information to increase or decrease the number of child processes running.

If mod_status is loaded, additional information about the number of requests handled by Apache is also kept within the scoreboard. Enable ExtendedStatus and even more information is tracked.

Just as mod_status itself can expose this data at an external URL such as /server-status, the plugin uses the same data to create a picture of what Apache is doing.

The difference between what the plugin is doing and what can be done from an external monitoring system using the exposed URL is that the plugin can poll on a regular 1 second interval without needing to do a web request, which would itself be reflected in server traffic. By being able to poll more frequently it can build up a better picture, including using extra detail available directly from the scoreboard to grab sample data on actual requests and so generate response time averages and percentiles.

The short polling also allows the number of active child processes to be monitored closely, making it possible to derive metrics for things such as process churn for the child processes, as well as more accurate metrics about server restarts and capacity utilisation.
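
As a rough sketch of what that short polling amounts to in Python, assuming the mod_wsgi.server_metrics() snapshot function which 4.2.0 exposes when the server-metrics option is switched on (the aggregation step here is only a placeholder):

    import threading
    import time

    import mod_wsgi

    samples = []

    def record_sample(snapshot):
        # Placeholder: a real implementation would aggregate these
        # snapshots into per minute buckets before reporting them.
        samples.append(snapshot)

    def sample_scoreboard(interval=1.0):
        # Grab a snapshot of the Apache scoreboard once a second.
        while True:
            snapshot = mod_wsgi.server_metrics()
            if snapshot is not None:
                record_sample(snapshot)
            time.sleep(interval)

    thread = threading.Thread(target=sample_scoreboard)
    thread.daemon = True
    thread.start()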

The intention in monitoring mod_wsgi is for mod_wsgi to have its own form of scoreboard using shared memory which tracks a snapshot of what is going on across all processes, whether the WSGI application is running in embedded mode or daemon mode.

This would allow tracking of details such as response time measured within the WSGI application independent of front end time, and the queueing time, which is how long it took between Apache accepting the request and the WSGI application getting to handle it, plus separate measures of capacity utilisation for each mod_wsgi daemon process group. Other metrics which could be thrown into this might include queue depth for daemon processes, queue timeout rate, daemon connection failure rates etc.

If a module such as psutil were available, then possibly the plugin could also track and report per-process memory usage, CPU usage and the number of process context switches. In essence, anything I can think of that would help to supplement data on throughput and response times, to work out whether changing the processes/threads mix is actually having some form of positive effect.
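
For example, a per process sample using psutil might look something like this (a sketch only; the metric names are made up):

    import os

    import psutil

    def process_sample():
        # Snapshot resource usage for the current process.
        process = psutil.Process(os.getpid())
        switches = process.num_ctx_switches()
        return {
            'memory-rss': process.memory_info().rss,
            'cpu-percent': process.cpu_percent(interval=None),
            'context-switches': switches.voluntary + switches.involuntary,
        }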

So my plans for future work in trying to achieve all that are as follows.

1. Refactor the current plugin code, which doesn't have a clean separation between deriving the metrics and reporting them up to New Relic, such that there is a distinct layer between the two.

2. Implement the equivalent of a scoreboard for mod_wsgi itself in order to be able to accumulate the additional information required and enhance the metrics generation code and the current New Relic plugin to match.

3. Create an optionally enabled internal consumer of the metrics which would retain a working history of metrics for a period of 30 minutes, but only within the collecting process itself, this process being a dedicated daemon process group set up to collect the data. Part of this would involve a minimal REST API to retrieve raw metric data from the process in some way (a rough sketch of the idea follows this list).

4. Create, as a proof of concept, an extension for Django Debug Toolbar which can query the historical data from the in memory cache using the REST API. Though I intend doing this purely as an example to support a talk I will be giving at PyCon AU in August on how Python web application toolbars work. Part of the talk will be about the usefulness or applicability of debug toolbars to a production environment, and I can see this proof of concept helping me to illustrate some points about the problem of a debug toolbar being of use in a multi host deployment.
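
To make item 3 above concrete, the in memory history and the REST endpoint fronting it could be as simple as the following sketch (all names are hypothetical; the collector would append one bucket per minute):

    import collections
    import json

    # Retain only the most recent 30 one minute buckets of metric data.
    history = collections.deque(maxlen=30)

    def application(environ, start_response):
        # Minimal WSGI endpoint returning the raw metric history as JSON.
        body = json.dumps(list(history)).encode('utf-8')
        start_response('200 OK', [
            ('Content-Type', 'application/json'),
            ('Content-Length', str(len(body)))])
        return [body]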

That is as much as I have planned at this point.

Things I have no intention of doing are the following.

1. Creating any plugin to report data to any other charting system such as Graphite.

2. Creating a database for long term persistence of data.

3. Creating any chart visualisation system of my own to view the metric data, beyond any minimal experiments I may do to support the Django Debug Toolbar experiment.

The reason I am not doing any of these is that they are outside of my area of expertise. I have never used tools such as Graphite. I am not a database person, nor am I a front end web developer or JavaScript developer.

I well know from my work at New Relic how much time and effort needs to go into creating a professional, production quality backend system for retaining and visualising metric data, and even if I had the skills in those areas it would be an amazingly huge time suck which would totally dwarf any time I am even able to spend on progressing mod_wsgi itself.

Since my experience lies in the area of Apache, WSGI servers and instrumenting for and collecting metric data, I will keep to that area. Doing so is just the most practical thing I can do as that is where I will be most productive and can do the most.

Those areas are also the ones I am interested in and enjoy working in. I don't find database and front end web design to be that interesting, and given that my impetus for doing any work these days is a personal requirement or enjoying the technical challenge in a specific problem, I will as a result be staying well clear of those areas.

Personally I have no issue if others want to pursue those things I have no interest in, and certainly the way I intend refactoring the code would allow anyone to develop their own plugins to get the metrics out and into some other system.

In saying that, please don't take this as me saying 'patches welcome' and otherwise buzz off. I hate the way that some Open Source projects will say that when they don't have time to do something themselves. Reality is that I am time poor and I simply need to focus my time in the best way I can.

If you are genuinely interested in trying to fill in those areas where I feel I can't do a good job or don't have the time, I will not stop you nor make it difficult, and will actually be as accommodating as I can and make it easier for you to get the data out, and also advise on what would be the best way to do something.

What I simply am not in a position to do is lead such an initiative. My own priorities and interests will always take precedence, and I have come to learn that I must do that if I am to avoid becoming burnt out again in respect of the work I do on mod_wsgi. So it is due to a measure of self preservation that I take this stance.

Hope that all makes sense and gives you a better idea of where I am heading and why I am restricting myself to that.

Graham

Garito

Jun 9, 2014, 10:20:39 AM
to mod...@googlegroups.com
I have some experience with MongoDB and jqPlot, for instance...

I will be happy to help create an extension to unlink this super awesome/needed library from a particular service.

It has to be modular and interchangeable.

Anyone interested?



Rory Campbell-Lange

Jun 9, 2014, 11:17:40 AM
to mod...@googlegroups.com
On 09/06/14, Graham Dumpleton (graham.d...@gmail.com) wrote:
> On 09/06/2014, at 1:50 PM, Jason Garber <ja...@gahooa.com> wrote:
> > Can you comment on how this mechanism could provide data to a custom
> > monitoring application and the plans for extending it to cover
> > daemon mode process information?
...

> The short polling also allows the number of active child processes to
> be monitored closely and so be able to derive metrics for things such
> as process churn for the child processes, as well as more accurate
> metrics about server restarts and capacity utilisation.

From my point of view an internal metric consumer with a minimal REST
API would be invaluable, as it is difficult for me, presently, to monitor
lots of WSGI processes on each server other than by bundling them rather
inelegantly into WSGIProcessGroups by application type.

...

> Personally I got no issue if others want to pursue those things I have
> no interest in and certainly the way I intend refactoring the code
> would allow anyone to develop their own plugins to get the metrics out
> and into some other system.

This sounds terrific. If the suggested minimal REST API is provided
people can write any number of plugins/storage systems using that API.

Thanks for floating this great idea.

Rory

--
Rory Campbell-Lange

Jason Garber

Jun 9, 2014, 1:57:23 PM
to mod...@googlegroups.com
Hi Graham,

Thank you for the detailed response.  I understand and fully support your decision to focus on core, not UI or DB stuff.  I think the priority here is to ensure that the data points are exposed in a layer or way that applications can do what they want with them, including (and maybe especially!) a New Relic plugin.  

Decoupling would provide a lot of tangible benefits.  I think we should make the configuration straightforward.  I am thinking out loud here:

WSGIDaemonProcess app1 threads=10 processes=2

WSGIMonitorDaemonProcess app1 handler=/path/to/mystats.py stat_a=10 stat_b=20 stat_c=1

# The above instructs mod_wsgi to create a process and thread, load /path/to/mystats.py, call the setup() function, and report stat_a every 10 seconds, stat_b every 20 seconds and stat_c every 1 second.  I would think the stats should be grouped into useful groupings so you don't have to specify each one, but could if you wanted to.

/path/to/mystats.py:

def setup(config):
    # Set up the environment, spawn threads, whatever you want.
    # Return the stats collector callback.
    return onstat

def onstat(name, value, interval):
    # Called every time there is a statistic to report. It would
    # likely queue stats in memory for a number of seconds and then
    # make a webservice call.
    pass


Please take the above just as a concept, not as a proposal.  What are your thoughts on creating a pythonic interface to the stats scoreboard?

Thanks!
Jason




Graham Dumpleton

Jun 9, 2014, 11:56:00 PM
to mod...@googlegroups.com
On 10/06/2014, at 3:57 AM, Jason Garber <ja...@gahooa.com> wrote:

> Please take the above just as a concept, not as a proposal.  What are
> your thoughts on creating a pythonic interface to the stats scoreboard?

I specifically want to avoid as much as possible having the mechanism be intrinsically linked to, or bundled within, the mod_wsgi package itself. This is why what has been done so far is in a separate mod_wsgi-metrics package. By being separate, you aren't required to be pip installing the mod_wsgi package if using the traditional way of installing mod_wsgi, or using a Linux distro package where they simply cannot, or will not, use the pip path. The core mod_wsgi module therefore simply needs to provide access to the raw scoreboard data and nothing else, as further processing of that will be done in the separate package.

In that vein I want to avoid having any special directives in Apache which would be used to configure it. If any ability is added to configure stuff it should be generic as that has been a source of problems in the past anyway.

This does mean there needs to be a bit of boilerplate Apache configuration to set it up, but you can't avoid that anyway due to the need to ensure that mod_status is loaded and ExtendedStatus enabled.

So right now what you need is:

    # Load mod_status and enable full stats collection.

    LoadModule status_module modules/mod_status.so
    ExtendedStatus On

    # Define a dummy daemon process to run the metrics service.
    # Need to ensure that access to server metrics is turned on,
    # as it is off by default for potential security reasons.

    WSGIDaemonProcess metrics display-name=%{GROUP} \
        processes=1 threads=1 server-metrics=On

    # Ensure that the Python script which starts the service is
    # loaded on process start. The service will be started as a
    # side effect of loading the script.

    WSGIImportScript /some/path/server-metrics.py \
        process-group=metrics application-group=%{GLOBAL}

So a couple of things that could be done here.

The first is that WSGIImportScript could allow a module path to be specified instead of a file name path, something which I have thought about before but never done.

This might then allow one to do:

    WSGIImportModule mod_wsgi.metrics.service \
        process-group=metrics application-group=%{GLOBAL}

avoiding the need for you to create a separate script file just to load and start the service.

The next thing is configuration.

The SetEnv directive only works to set the environ dictionary on a per request basis and not for a whole process. This has been a source of problems forever, as people expect it to set process environment variables.

Maybe then one could instead have:

    WSGIApplicationSetting name value process-group=metrics application-group=%{GLOBAL}

The setting would then be available in the designated Python sub interpreter by accessing:

    mod_wsgi.settings

Any WSGI script file or module/package could then access that to look for settings passed down from the Apache configuration.
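
Purely to illustrate the proposal (neither the directive nor mod_wsgi.settings exists yet), a WSGI script might then do:

    import mod_wsgi

    # Hypothetical: values populated from WSGIApplicationSetting
    # directives for this process group and application group.
    settings = getattr(mod_wsgi, 'settings', {})
    debug_mode = settings.get('debug-mode', 'Off')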

This would not set environment variables in processes, as in general I still feel that providing such an ability is not an entirely good idea. If I got over my concerns with that, then we could have a parallel directive of:

    WSGIApplicationEnviron name value process-group=metrics application-group=%{GLOBAL}

With that there would not need to be a special directive that can take special arguments specific to the service and it would also provide a general mechanism which is useful for other things as well. For example:

    WSGIDaemonProcess django
    WSGIScriptAlias / /some/path/wsgi.py process-group=django application-group=%{GLOBAL}
    WSGIApplicationEnviron DJANGO_SETTINGS_MODULE mysite.settings process-group=django application-group=%{GLOBAL}

It is a slight pain to have to specify the target process and sub interpreter all the time if you have multiple settings or environment variables.

Another option if you think that a lot of settings/environ might be the norm would be to also allow:

    <WSGIApplicationEnviron process-group=django application-group=%{GLOBAL}>
    DJANGO_SETTINGS_MODULE = mysite.settings
    </WSGIApplicationEnviron>

That gets a bit trickier, as I would have to parse multiple lines and I haven't ever implemented a container directive in Apache before. It may though look better to a user.

Start to fiddle with container directives and you could start to allow lots of horrible stuff which may or may not be a good idea.

    <WSGIExecuteScript process-group=django application-group=%{GLOBAL}>

    import logging

    logging.basicConfig(level=logging.INFO,
        format='%(name)s (pid=%(process)d, level=%(levelname)s): %(message)s')

    from mod_wsgi.metrics.newrelic import Agent

    config_file = '/some/path/newrelic.ini'

    agent = Agent(config_file=config_file)
    agent.start()

    </WSGIExecuteScript>

That is, start allowing Python script snippets in the Apache configuration file itself.

Anyway, as to your specific example of:

    stat_a=10 stat_b=20 stat_c=1

I want to make clear one thing about how metrics collection needs to work which systems like statsd do not get right.

The best way of handling metrics generation is firstly not as a pull type system. That is, having a REST API to pull metrics at regular time periods does not work well.

This is because you can lose data when a process is being shut down. You also have bad alignment between the reporting period enforced by the external system and when the process started.

The preferred way is therefore to push data from the process into a separate service, or if you are going to have the metrics collector write direct to the database, into the database. The code in the process can be set up to do a final push of data on process shutdown, even if it hasn't reached the time one would normally push out the data.

You may now be thinking that statsd uses a push method and so it could be used. Wrong again. Statsd is still the wrong sort of model.

This is because statsd only accumulates individual data points sent from the monitored process. There is no separate meta data which allows a process to itself aggregate data over a specific time period and report on the time period as a whole as one data point. Such meta data would also carry information about the process and host. Initial data on a first push from a new process could also contain environment information as well such as number of CPU cores etc.

Statsd is quite poor in this respect as it doesn't properly handle this concept of a data harvest period, or association with a data source or other meta data. Only the latter, association with a host/process, can be done in any way, but that entails embedding the host name and process ID in the metric name, so you get an explosion of metrics which the visualisation system needs to deal with in some way and be able to aggregate across.

The preferred way is simply to have the monitored process itself collect data internally and report the aggregated data on a one minute period from the time the process starts. You thus end up with a bucket of data which is for a known duration, host and process ID. All up this makes it somewhat easier to deal with later and results in things like rate metrics especially being more accurate as the bucketing defines the period rather than some externally applied arbitrary time window which may be longer than the time period that the data was actually collected from a process for.
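
A bare bones sketch of that preferred model, where the report callable that pushes the finished bucket out to a service is hypothetical:

    import atexit
    import os
    import socket
    import time

    class Harvester(object):
        # Aggregate metrics in process and push one self describing
        # bucket per harvest period, with a final flush at shutdown.

        def __init__(self, report, period=60.0):
            self.report = report        # hypothetical push callable
            self.period = period
            self.samples = []
            self.start = time.time()
            atexit.register(self.flush)

        def record(self, name, value):
            self.samples.append((name, value))

        def flush(self):
            now = time.time()
            self.report({
                'host': socket.gethostname(),
                'pid': os.getpid(),
                'duration': now - self.start,   # true reporting period
                'samples': self.samples,
            })
            self.samples = []
            self.start = now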

In part, the reason that statsd has a small 15 second windowed bucketing strategy is to cover over the fact that the data it collects has lost any association with the true original time period that the data was generated over in the original monitored process.

As we go along I can guide on what I believe works best on this, but be prepared to step back from previous notions you may have about how metrics collection should be done based on what statsd does. I looked quite a lot at statsd leading up to last PyCon as part of preparation for a workshop about metrics collection. I don't know the history of statsd, but to me it looks like they started out with one simple metric type and, when that didn't handle other cases, started bolting on further abstractions. It then became a collection of disjoint ways of measuring things, whereas they should have gone back and started over with one base way of doing things which could accommodate multiple abstraction types. I am therefore not much of a fan of statsd. It is something that people probably came to accept as the way to do it, but there are certainly better ways.

I will have to see if I can get an okay on releasing the slides I did for the workshop at PyCon, and where I can post them, so you can see what I said in them as far as concepts around metrics collection go.

Finally, it makes me quite happy to see people interested in this and perhaps considering helping in some way. The core of mod_wsgi has always been such a dense area to get into that mod_wsgi has never attracted any developers besides myself. Doing stuff at the periphery like this metrics stuff and keeping it to Python code presents a much lower barrier for people getting involved and helping out. So thanks. :-)

Graham

Carlos Abrantes

Nov 19, 2019, 4:31:16 PM
to modwsgi
Hi all,

Is this the current state of the art, or has this evolved? Is there a plugin to retrieve metrics, or documentation on how to create one, for example to integrate with collectd?

Thanks, 

Graham Dumpleton

Nov 19, 2019, 4:35:12 PM
to mod...@googlegroups.com
Since this is a very very old discussion you are commenting on, can you perhaps start over and explain what you are after?

For application performance monitoring with Python you have three services you can use:

New Relic
DataDog
Elastic APM

They can have free tiers or Open Source variants.

If you want to roll your own, more recent mod_wsgi has hooks for pulling out raw events that can be used to drive metrics collection. See:
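
As a rough illustration of those hooks (from memory, so event names and payload keys may differ between mod_wsgi versions; treat it as a sketch and check the documentation):

    import mod_wsgi

    def event_handler(name, **kwargs):
        # Invoked by mod_wsgi for each raw event. A metrics collector
        # would aggregate these rather than print them.
        if name == 'request_finished':
            print(kwargs.get('application_time'))

    mod_wsgi.subscribe_events(event_handler)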


Carlos Abrantes

Dec 12, 2019, 6:57:19 PM
to modwsgi
Hi Graham ,

Sorry for this delayed response; somehow I didn't get the notification in my inbox.

My idea was to ask whether there is something available similar to the Apache server status page (mod_status, if I'm not wrong), but for mod_wsgi.
Something that I could do a "get" against to retrieve metrics like workers per state (total, free, busy (with a drilldown into states if possible)), number of requests per unit of time, response time per unit of time, etc.

I will take a look at your suggestions.

Thanks,
