CPU-load problem with apache/mod_wsgi

Lars Westermann

Mar 3, 2016, 8:52:05 AM
to modwsgi
I have a webserver running apache (2.4.10) and mod_wsgi (3.4)

I have seen a cpu-load around 2.5 (on a 2 core HyperV-2012 machine) - and cpu-utilization well below 10%.
After searching for information on how to improve things, I changed from running mod_wsgi in embedded mode to
running it in daemon mode - in the hope that offloading the python code from apache would help.

Unfortunately this is not the case. Cpu-load increased to around 8, cpu-utilization was well over 10% - so clearly something is wrong!
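
One way to compare the two figures is to normalise the load average by the number of cores. The following is a minimal sketch; the helper name `load_per_core` is illustrative and not from this thread. On Linux the load average counts tasks in uninterruptible sleep as well as runnable ones, so it can sit well above 1.0 per core while cpu-utilization stays low.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average normalised by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    one_minute, five_minute, fifteen_minute = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-minute load per core: {load_per_core(one_minute, cores):.2f}")
```

With the figures above, a load of 2.5 on 2 cores is 1.25 per core, and a load of 8 is 4.0 per core.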

I am using apache in prefork mode (yes, I know, but php5 module is not threadsafe...) - with this configuration:

<IfModule mpm_prefork_module>
        StartServers              5
        MinSpareServers           5
        MaxSpareServers          10
        MaxRequestWorkers       150
        MaxConnectionsPerChild    0
</IfModule>

At the global configuration level I have this:

WSGIDaemonProcess web-deploy processes=30 threads=5 display-name=wd-wsgi-daemon
WSGIProcessGroup web-deploy


Process tree (using htop) shows this:

  1  [||||||||||                                                  12.8%]     Tasks: 81, 289 thr; 1 running
  2  [||||||||                                                     9.3%]     Load average: 6.25 6.54 5.99
  Mem[||||||||||||||||||||||||||||||||||||||||||||||||||||||5110/9996MB]     Uptime: 72 days, 03:19:58
  Swp[|                                                       10/1021MB]

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
    1 root       20   0 90576  3288  2468 S  0.0  0.0  0:04.96 /sbin/init
46975 root       20   0  424M 20424 12792 S  0.0  0.2  0:02.06 ├─ /usr/sbin/apache2 -k start
51598 www-data   20   0  429M 24048 12744 S  0.0  0.2  0:12.50 │  ├─ /usr/sbin/apache2 -k start
51561 www-data   20   0  429M 24132 12744 S  0.0  0.2  0:12.29 │  ├─ /usr/sbin/apache2 -k start
47453 www-data   20   0  432M 27444 13568 S  0.0  0.3  0:40.11 │  ├─ /usr/sbin/apache2 -k start
47006 www-data   20   0 1414M  164M  7772 S  0.0  1.7  1:20.43 │  ├─ wd-wsgi-daemon    -k start
47214 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:14.50 │  │  ├─ wd-wsgi-daemon    -k start
47205 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:13.08 │  │  ├─ wd-wsgi-daemon    -k start
47204 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:12.73 │  │  ├─ wd-wsgi-daemon    -k start
47203 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:07.11 │  │  ├─ wd-wsgi-daemon    -k start
47202 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:11.95 │  │  ├─ wd-wsgi-daemon    -k start
47201 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:01.48 │  │  ├─ wd-wsgi-daemon    -k start
47200 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:00.00 │  │  ├─ wd-wsgi-daemon    -k start
26045 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:00.08 │  │  ├─ wd-wsgi-daemon    -k start
26043 www-data   20   0 1414M  164M  7772 S  0.0  1.7  0:00.08 │  │  └─ wd-wsgi-daemon    -k start
47005 www-data   20   0 1478M  147M  7108 S  0.0  1.5  1:19.79 │  ├─ wd-wsgi-daemon    -k start
47212 www-data   20   0 1478M  147M  7108 S  0.0  1.5  0:17.69 │  │  ├─ wd-wsgi-daemon    -k start
47211 www-data   20   0 1478M  147M  7108 S  0.0  1.5  0:09.00 │  │  ├─ wd-wsgi-daemon    -k start

Strangely there are 9 threads in each wsgi-daemon (I would expect 8 including the 3 used for process control),
ref: http://blog.dscpl.com.au/2014/02/use-of-threading-in-modwsgi-daemon-mode.html

WSGI daemons use quite a lot of memory - and I think we have too many wsgi-daemon processes. The reason for the many processes is that
many of the python services use other network services (PostgreSQL, LDAP, memcache), and I have read that the threads share a number
of common resources in this respect.

When using top, it looks like apache processes are using the cpu-resources, wsgi-daemon processes don't even show up in the top of the list.

Apache handles approx. 75 requests/second with the above shown cpu-load.

After reverting to the previous configuration (no wsgi-daemons), htop shows this:

  1  [||||||                                                       8.6%]     Tasks: 51, 36 thr; 1 running
  2  [|                                                            0.7%]     Load average: 1.87 2.20 3.28
  Mem[|||||||||||||||||||||||||||||||||||||||||||           1917/9996MB]     Uptime: 72 days, 03:59:56
  Swp[|                                                       11/1021MB]

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
39518 www-data   20   0 1008M  126M 14552 S  1.0  1.3  0:25.87 /usr/sbin/apache2 -k start
39405 www-data   20   0 1080M  129M 15280 S  1.0  1.3  0:24.94 /usr/sbin/apache2 -k start
39517 www-data   20   0 1068M  131M 14392 S  1.0  1.3  0:24.60 /usr/sbin/apache2 -k start
39560 www-data   20   0 1139M  131M 15116 S  1.0  1.3  0:26.76 /usr/sbin/apache2 -k start
39470 root       20   0 94320  4884  2812 R  0.0  0.0  0:14.37 htop
39514 www-data   20   0 1150M  141M 14392 S  0.0  1.4  0:21.83 /usr/sbin/apache2 -k start
39387 www-data   20   0 1032M  147M 17844 S  0.0  1.5  0:26.35 /usr/sbin/apache2 -k start
39468 www-data   20   0 1145M  135M 14400 S  0.0  1.4  0:25.96 /usr/sbin/apache2 -k start
39467 www-data   20   0 1019M  147M 16700 S  0.0  1.5  0:24.94 /usr/sbin/apache2 -k start
39466 www-data   20   0 1145M  144M 15688 S  0.0  1.5  0:24.90 /usr/sbin/apache2 -k start


Does anyone have a clue, or just a hint to what to do next?

Best Regards,
Lars

Graham Dumpleton

Mar 3, 2016, 5:59:29 PM
to mod...@googlegroups.com
What number of requests/sec does your web site need to handle (as opposed to what you are getting)?

What are the average response times for requests?

What long running requests do you get? How long and how often?

Is the code your request handlers run primarily CPU or I/O bound?

What other Apache modules for other languages are you loading? Eg., mod_python, mod_php etc.

My initial impression is that you are creating way more daemon processes than you need. Creating too many can actually make things worse and be detrimental to memory usage and performance.

The resident size of the Apache child worker processes suggests you aren't disabling the creation of the Python interpreter in the Apache child processes.


You are also using a very old mod_wsgi version. More recent versions control memory usage better, working around some oddities in how Apache handles memory and buffering. Using such an old mod_wsgi version is not recommended, and it isn't worth digging much further into this unless you are on a recent version.

Can you possibly upgrade to newer mod_wsgi?

Graham

Lars Westermann

Mar 4, 2016, 5:44:13 AM
to modwsgi
Hi Graham

Thanks for your very prompt reply! :)

I have just investigated our most recent logfiles (in the old setup - embedded mode) - here are the figures:

Requests/second: Actual: 5-15 req/sec, Expected: Up to 50 req/sec
Response time: < 10ms: 50%, 10-100ms: 45%, > 100ms: 5% (longest response time encountered today: 560ms)
Almost entirely wsgi requests - the long running requests are all http POST to a WSGI service.
This gives us a cpu-load of around 2.5 on a 2 core HyperV Ubuntu 12.04
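
As a rough cross-check of these figures, Little's law (average concurrency = arrival rate x average response time) estimates how many requests are in flight at once. This sketch assumes, pessimistically, that every request in a bucket takes that bucket's upper bound, and uses the 560ms maximum for the slowest 5%. The function names are illustrative.

```python
def mean_response_ms(buckets):
    """Weighted mean response time from (fraction_of_requests, time_ms) pairs."""
    total_fraction = sum(fraction for fraction, _ in buckets)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("bucket fractions must sum to 1.0")
    return sum(fraction * time_ms for fraction, time_ms in buckets)


def concurrent_requests(rate_per_sec, response_ms):
    """Little's law: average requests in flight = rate x time in system."""
    return rate_per_sec * response_ms / 1000.0


# Pessimistic upper bounds taken from the log summary above.
BUCKETS = [(0.50, 10), (0.45, 100), (0.05, 560)]

if __name__ == "__main__":
    mean_ms = mean_response_ms(BUCKETS)
    in_flight = concurrent_requests(50, mean_ms)
    print(f"mean response time at most {mean_ms:.0f} ms")
    print(f"requests in flight at 50 req/sec: about {in_flight:.1f}")
```

With these inputs the mean is at most about 78ms, so at the expected 50 req/sec only about four requests are in flight on average.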

Regarding CPU vs. I/O it will be a guess - 50/50, and the I/O is primarily network I/O to memcache (running on its own server),
ldap (running on two servers in master/slave configuration) and postgresql (running on two servers, one read/write and one read-only),
meaning very little disk I/O from the python services.

The server also serves PHP5 and a few static files as the application consists mainly of a number of WSGI services,
some of which are using simplesamlphp for authentication purposes.
We don't use mod_python.
I should mention that we do have a fair amount of <Directory> configurations in the apache configuration (approx. 80) - will that have an impact on the request handling in apache?

I have read your blog article regarding disabling the python interpreter in the apache child processes - and will include that in the next attempt.
I also saw one of your posts where you suggest a 1:1.5 ratio between apache threads and wsgi threads,
so maybe we should aim for 150 apache threads and 100 wsgi threads (10 procs and 10 threads each)?

We are looking into an upgrade of the server, but as Ubuntu 16.04 is not released yet we will try with 14.04, which includes mod_wsgi 4.3.0.
Would that be recent enough, or would it be wiser to go with the latest release (4.4.21)?
Do you know which version will go into 16.04?

Lars

Graham Dumpleton

Mar 4, 2016, 6:03:11 AM
to mod...@googlegroups.com
Quick response.

The minimum mod_wsgi version I would recommend is probably 4.4.12 (not a typo, I didn't mean 4.4.21). That release had an important memory-usage fix for a problem that could, in some cases, see the memory use of Apache child worker processes blow out even when using daemon mode.

That ratio of 1.5 relates more to using mod_wsgi-express and running a single WSGI application. It wouldn't apply where you are running PHP at the same time. It doesn't matter if you have more capacity in the Apache child processes, as you will need it for your static files and PHP. Such a high ratio can become an issue when you get backlogging because the WSGI application is overwhelmed. Recent mod_wsgi versions have various timeouts you can set to deal with backlogging and help recover more quickly by timing out requests before they hit the daemon processes.

Running 10 threads per process is not good when you have a measure of CPU-boundedness.

Making a guess based on your figures, I would have suggested 10 processes each with 3 threads.

If you ignore the 5% outliers initially and assume that 100% of requests run at 100ms, then with 30 threads across all processes you still have a full capacity of 300 requests/sec.

Obviously you never want to run at 100% capacity, as the GIL would kill you before you actually got to the theoretical capacity. Working on the basis that you should never go over 30% capacity utilisation, that is still roughly 100 requests/sec. So if only 5% of requests are above 100ms, that is probably enough headroom for the longer-running requests, so long as they are long running because they are I/O bound.
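
The arithmetic above can be sketched as follows. The function names are illustrative, and the 30% ceiling is the rule of thumb from this post, not a mod_wsgi setting. Note that 30% of 300 is 90 requests/sec, which the post rounds to about 100.

```python
def theoretical_capacity(processes, threads_per_process, response_ms):
    """Requests/sec if every thread is always busy serving response_ms requests."""
    total_threads = processes * threads_per_process
    return total_threads * 1000.0 / response_ms


def usable_capacity(capacity, utilisation_percent):
    """Requests/sec when only utilisation_percent of the capacity is used."""
    return capacity * utilisation_percent / 100.0


if __name__ == "__main__":
    full = theoretical_capacity(processes=10, threads_per_process=3, response_ms=100)
    usable = usable_capacity(full, utilisation_percent=30)
    print(f"theoretical capacity: {full:.0f} requests/sec")
    print(f"at 30% utilisation:   {usable:.0f} requests/sec")
```

Either figure is comfortably above the 50 requests/sec that the site is expected to handle.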

An important thing if testing with that configuration: don't just hammer the site so that it is overloaded. Aim for your 50 requests/sec using your load testing tool. Then see what CPU usage you get for each process and let me know.
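
If no load testing tool is at hand, a paced client along these lines could generate a steady request rate instead of hammering the server. This is a minimal sketch using only the Python standard library; the function names and the default worker count are assumptions rather than anything from this thread, and it only issues plain GET requests.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def send_offsets(rate_per_sec, duration_sec):
    """Offsets in seconds from the start at which each request is sent."""
    total = int(rate_per_sec * duration_sec)
    return [i / rate_per_sec for i in range(total)]


def fetch(url, timeout=5.0):
    """Issue one GET; return (HTTP status or None on failure, elapsed seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts and HTTP error responses.
        status = None
    return status, time.monotonic() - started


def run_paced_load(url, rate_per_sec=50, duration_sec=60, workers=20):
    """Send requests at a fixed rate and return a list of (status, elapsed)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for offset in send_offsets(rate_per_sec, duration_sec):
            delay = start + offset - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # wait until this request's scheduled time
            futures.append(pool.submit(fetch, url))
        return [future.result() for future in futures]
```

Calling `run_paced_load` with the URL of one of your own services (none is given in this thread) and `rate_per_sec=50` spreads the requests 20ms apart. The worker pool only keeps up with that pace while responses stay well under `workers / rate_per_sec` seconds on average.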

I will read through your message again on the weekend and see if there is anything else I want to mention. It is late now for me and I want to sleep.

Graham

Lars Westermann

Mar 15, 2016, 9:36:22 AM
to modwsgi
Hi Graham

Thank you for your very valuable comments and suggestions. They are MUCH appreciated!

We ended up building a new server (Ubuntu 14.04) with apache 2.4.7 and custom built mod_wsgi 4.4.21 (along with php-fpm).
For our backend we tuned the ldap servers (increased the cache size) and put pgbouncer in front of our postgresql databases.

By using your suggested webserver configuration (10 processes, 3 threads each) we are able to handle much more than we need to -
in fact our performance went up by a factor of 40 or more. The 1-minute cpu-load in the busy hour now averages around 0.1-0.2, where it
previously would be around 2.5. Cpu-load is now more in line with cpu-utilization, which is what we were aiming for. We are
handling 20-30 reqs/sec 50% of the time, and 30-50 reqs/sec 20% of the time - so that looks really great. The cpu-usage
has also switched from being spent in apache to mod_wsgi. We haven't seen cpu-usage above 20%.

Best Regards,
Lars

Graham Dumpleton

Mar 15, 2016, 7:40:09 PM
to mod...@googlegroups.com
That sounds like a great start.

Refining things further gets a lot harder, unfortunately.

I was working on some inbuilt performance metrics monitoring for mod_wsgi, but it is still stuck out on a branch and I haven't gone back and merged the changes into the main working branch. If we had that, we could start to look more closely at per-request CPU utilisation and the impact of the GIL by comparing it to process-wide CPU utilisation. Other metrics those changes provide, such as capacity utilisation, could help dial in the best configuration even more.

I really need to get back to checking the state of that work and merging it back in.