Apache High Memory Usage With mod_wsgi


Zohaib Ahmed Hassan

Dec 5, 2020, 12:15:54 AM
to modwsgi
We have an EC2 instance (4 vCPU, 16 GB RAM) running an Apache server with the event MPM behind an AWS ELB (application load balancer). This server serves only images requested by our other applications; for most applications we use CloudFront for caching, but one app sends requests directly to the server. Apache memory usage now reaches 70% every day and does not come down, so we have to restart the server every time. Earlier, with the old Apache 2.2, the worker MPM, and no load balancer, we did not have this issue. I have tried different configurations for the event MPM and Apache, but it is not working. Here is apache2.conf:

    
    Timeout 120              # also tried 300
    KeepAlive On
    MaxKeepAliveRequests 100
    KeepAliveTimeout 45      # tried values from 1 second to 300


Here are the load balancer settings:

 - HTTP and HTTPS listeners
 - Idle timeout is 30 seconds

MPM event:

    <IfModule mpm_event_module>
        StartServers             2
        MinSpareThreads          50
        MaxSpareThreads          75
        ThreadLimit              64
        #ServerLimit             400
        ThreadsPerChild          25
        MaxRequestWorkers        400
        MaxConnectionsPerChild   10000
    </IfModule>

 1. When I change MaxRequestWorkers to 150 with MaxConnectionsPerChild 0, RAM usage reaches 47 percent, the system health checks fail, and a new instance is launched by the auto scaling group. It seems the worker limit is reached, which had already happened when this instance was running with 8 GB of RAM.
 2. Our other servers, which run just a simple Django site and Django REST Framework APIs, work fine with the default MPM and Apache values configured at installation.
 3. I have also tried the configuration with KeepAliveTimeout set to 2, 3, and 5 seconds, but it did not help.
 4. I have also followed this AWS Premium Support article [1]; it worked somewhat better, but memory usage still does not come down.

Here is the recent error log:

    [Fri Dec 04 07:45:21.963290 2020] [mpm_event:error] [pid 5232:tid 139782245895104] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
    [Fri Dec 04 07:45:22.964362 2020] [mpm_event:error] [pid 5232:tid 139782245895104] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
    [Fri Dec 04 07:45:23.965432 2020] [mpm_event:error] [pid 5232:tid 139782245895104] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase ServerLimit.
    ... (the same message repeats every second through 07:45:35)

Here is the top output for the Apache processes:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
     3296 www-data  20   0 3300484 469824  58268 S   0.0  2.9   0:46.46 apache2
     2544 www-data  20   0 3359744 453868  58292 S   0.0  2.8   1:24.53 apache2
     1708 www-data  20   0 3357172 453524  58208 S   0.0  2.8   1:02.85 apache2
      569 www-data  20   0 3290880 444320  57644 S   0.0  2.8   0:37.53 apache2
     3655 www-data  20   0 3346908 440596  58116 S   0.0  2.7   1:03.54 apache2
     2369 www-data  20   0 3290136 428708  58236 S   0.0  2.7   0:35.74 apache2
     3589 www-data  20   0 3291032 382260  58296 S   0.0  2.4   0:50.07 apache2
     4298 www-data  20   0 3151764 372304  59160 S   0.0  2.3   0:18.95 apache2
     4523 www-data  20   0 3140640 310656  58032 S   0.0  1.9   0:07.58 apache2
     4623 www-data  20   0 3139988 242640  57332 S   3.0  1.5   0:03.51 apache2

What is wrong in the configuration that is causing the high memory usage?


  [1]: https://aws.amazon.com/premiumsupport/knowledge-center/apache-backend-elb/

Graham Dumpleton

Dec 5, 2020, 12:18:15 AM
to mod...@googlegroups.com
What is the mod_wsgi part of the Apache configuration?

Need to know if you are using embedded mode or daemon mode and how it is set up.

Also, what is the request throughput to the Django application, and what are the average and worst case response times?

Graham


Graham Dumpleton

Dec 5, 2020, 12:22:22 AM
to mod...@googlegroups.com
Also, in addition to what I already asked, what version of mod_wsgi is being used?

Graham

Zohaib Ahmed Hassan

Dec 5, 2020, 4:35:25 AM
to mod...@googlegroups.com
Thanks for the response. Here are the details:

 1. The mod_wsgi version is 4.5.7.
 2. It is used in embedded mode.
 3. Basically this app receives images in the request, crops them, and returns the result; the average time it takes is around 3 to 5 seconds.


Graham Dumpleton

Dec 5, 2020, 6:58:58 AM
to mod...@googlegroups.com
What about request throughput? That is, the requests/sec it currently handles, and how many concurrent requests it handles at a time.

Graham

Zohaib Ahmed Hassan

Dec 5, 2020, 9:51:28 PM
to mod...@googlegroups.com
I don't know exactly what the throughput per second is. It varies; in the peak hours it is around 5 req/sec, but the chart below can help you understand the request throughput as well.
One more thing: I have tested 300 concurrent requests against it using a script and it handles them well, but after the requests complete the memory usage stays at its peak as before. If it was at 20 percent and went up to 45 percent during the concurrent requests, it stays at 45.

Zohaib Ahmed Hassan | Senior DevOps Engineer

(attachment: imagelb.PNG — request throughput chart)

Zohaib Ahmed Hassan

Dec 7, 2020, 2:39:46 AM
to modwsgi
I also get this issue sometimes:

    [Mon Dec 07 07:04:22.142767 2020] [core:warn] [pid 1836:tid 139752646228928] AH00045: child process 2807 still did not exit, sending a SIGTERM
    [Mon Dec 07 07:04:24.144831 2020] [core:warn] [pid 1836:tid 139752646228928] AH00045: child process 1847 still did not exit, sending a SIGTERM
    [Mon Dec 07 07:04:24.144875 2020] [core:warn] [pid 1836:tid 139752646228928] AH00045: child process 2807 still did not exit, sending a SIGTERM
    [Mon Dec 07 07:04:26.146928 2020] [core:warn] [pid 1836:tid 139752646228928] AH00045: child process 1847 still did not exit, sending a SIGTERM
    [Mon Dec 07 07:04:26.146967 2020] [core:warn] [pid 1836:tid 139752646228928] AH00045: child process 2807 still did not exit, sending a SIGTERM
    [Mon Dec 07 07:04:28.149026 2020] [core:error] [pid 1836:tid 139752646228928] AH00046: child process 1847 still did not exit, sending a SIGKILL
    [Mon Dec 07 07:04:28.149092 2020] [core:error] [pid 1836:tid 139752646228928] AH00046: child process 2807 still did not exit, sending a SIGKILL

Graham Dumpleton

Dec 7, 2020, 11:27:00 PM
to mod...@googlegroups.com
Been trying to catch up on other stuff the last few days, which is why this response is delayed.

Over the years I have seen a number of people doing this exact same thing you are: performing image manipulation on an uploaded image and then returning the result. For one reason or another, the final outcome always seemed to be that you are better off using a backend queuing system such as Celery to handle the image manipulation. In other words, remove the processing of the images from your web application processes.

There are a few reasons why this is the case.

The first is that images and image manipulation can use a lot of transient memory. Especially when using multithreading in your web application with Python, this can result in high peak memory usage for the process. This is because you might get a whole bunch of requests come in at the same time, so processing of them overlaps. The memory consumption will blow out to the maximum required to support that number all being processed at the same time. When done, although the memory is released back for use by other parts of the application, the damage has already been done and your application will keep the overall high memory reservation. The end result is that most of the time you will have lots of unused memory held by the process, with it only being used when you get concurrent requests again.

The second problem is that image manipulation can be CPU intensive. In a multithreaded application, depending on how well the image manipulation library works and how it handles locking of the global interpreter lock, in the worst case parts of that image processing will be forced to be serialised, resulting in requests being blocked and requests taking longer than they would if the processes were single threaded. In other words, image manipulations done in different threads interfere with each other and they all suffer.

The third is that if you are using embedded mode of mod_wsgi, you can see problems with the per-request thread memory pool of the Apache worker processes (in which the Python code is running) blowing out due to large response sizes. In the old days of Apache, up to 8MB could be held in the per-request thread memory pool and only memory above that limit would actually be released. Thus if you have a lot of threads per worker process, that means 8MB of memory stays reserved for each worker thread. In more recent Apache versions the sample configuration that comes with Apache drops this to 2MB, but if the distro has removed that setting from the original Apache sample configuration, or you remove it, then I believe it defaults back to 8MB.

Using a backend Celery task system avoids the first two issues, as the work is done in a separate process, and that process could even be recycled after every task, so you avoid the problem of unused memory hanging around reserved. The Celery processes are also single threaded, eliminating Python global interpreter lock issues.

The third problem above can be lessened by ensuring the Apache configuration directive for setting the per-request memory pool size is actually set, and lowering the value if necessary. How you configure the Apache MPM settings can also affect this.

In general though, the first recommendation is always to avoid using mod_wsgi embedded mode at all, and use daemon mode instead. This avoids various problems caused by the Apache MPM choice and settings.

So if you can change to Celery in the short term, switch to daemon mode instead.

In doing this, ensure that embedded mode is disabled completely by setting:

    WSGIRestrictEmbedded On

Also reduce the per-request thread pool size. Where the Apache worker processes are only acting as a proxy to the mod_wsgi daemon processes, the value I set in the mod_wsgi-express configuration is:

    ThreadStackSize 262144

Thus 0.25MB per thread instead of 2MB or 8MB.
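
For reference, a minimal daemon mode setup along those lines might look something like the sketch below. The script path, process group name, and process/thread counts are hypothetical placeholders for illustration, not recommended values:

    # Prevent the application from ever running in embedded mode.
    WSGIRestrictEmbedded On

    # One daemon process group for the application (illustrative sizing only).
    WSGIDaemonProcess imageapp processes=2 threads=15 display-name=%{GROUP}

    # Delegate the WSGI script to that daemon process group.
    WSGIScriptAlias / /srv/imageapp/wsgi.py process-group=imageapp application-group=%{GLOBAL}

    # The Apache worker threads now only proxy to the daemon processes,
    # so their per-thread stacks can be kept small.
    ThreadStackSize 262144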

Another dangerous setting you were using that would have caused lots of problems when using embedded mode was:

    MaxKeepAliveRequests 100

This would be causing Apache to restart your application processes too frequently, resulting in higher CPU usage due to the high start-up cost. In mod_wsgi-express I don't set this at all.

Next problem is:

    KeepAliveTimeout 45

In mod_wsgi-express, I set this to 2 seconds. By having such a high value you risk problems, especially when using the worker MPM, although the event MPM can have its own issues. By having it lower, you may not need as many Apache worker processes and threads.

The question now is why you were restarting after 100 requests. Was this an attempt to keep memory usage down?

One of the consequences of this is that you would possibly see a lot of interrupted requests. That is what those warning messages about killing off processes are about. Apache will only wait so long for processes to shut down. Depending on how shutdown is managed, this can be as little as 5 seconds, but since you have long running requests, they can prevent a clean shutdown, so Apache kills the processes anyway, and that is why requests can be interrupted. You really want to avoid periodic restarts of the Apache child worker processes using that option.

If you do have a growing memory problem because of issues with your application code, there are various ways you can trigger restarts of the mod_wsgi daemon processes, and these self-initiated restarts allow for a graceful restart timeout. Thus for the WSGIDaemonProcess directive you can set the options:

    maximum-requests=100 graceful-timeout=120

So when 100 requests have been handled, a restart of the process will be signalled, but since the graceful timeout is set to 120 seconds, it will only be forcibly restarted after 120 seconds. In the interim, if the number of active requests being handled by the process drops to 0, the restart will be triggered at that point. This way it limits the interruption of active requests. You will still have issues if requests get blocked indefinitely, since the process then never reaches the point of having no active requests, but if that is occurring, and that is why you are restarting so frequently, you have bigger issues.

For the latter, if you are getting stuck requests, you want to look at the request-timeout option of WSGIDaemonProcess.
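
Putting those together, a WSGIDaemonProcess directive using these options might look like the following sketch (the group name, sizing, and timeout values are illustrative assumptions only):

    WSGIDaemonProcess imageapp processes=2 threads=15 \
        maximum-requests=100 graceful-timeout=120 request-timeout=60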

Anyway, for further guidance on setting up mod_wsgi daemon mode, I would suggest watching:


The defaults for mod_wsgi daemon mode are not the best options for historical reasons. The video talks about that and how mod_wsgi-express sets different defaults.

To start with, that is probably all I can suggest. Giving recommendations on tuning the Apache MPM settings and mod_wsgi daemon mode is harder to do at this point.

Summarising things: use Celery as an out-of-process means to handle the image manipulation. If you can't do that for now, try to switch to mod_wsgi daemon mode, as that will allow memory and CPU usage to be better controlled.

Graham

Graham Dumpleton

Dec 7, 2020, 11:53:29 PM
to mod...@googlegroups.com

On 8 Dec 2020, at 3:26 pm, Graham Dumpleton <graham.d...@gmail.com> wrote:

So if you can change to Celery in the short term, switch to daemon mode instead.

Meant to say "if you can't change" here.

The other thing to consider when using daemon mode is to split out the image processing to a separate set of daemon mode processes.

This is discussed in:


In your case you want to send all image processing into a daemon process group where you have multiple processes, each single threaded. You can handle everything else the application does in a normal multithreaded process.
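
As a rough sketch of that idea (the URL prefix, group names, and process/thread counts below are hypothetical), the image processing URLs can be delegated to their own set of single-threaded daemon processes while everything else stays in a multithreaded group:

    # Multithreaded group for the general application (illustrative sizing).
    WSGIDaemonProcess main processes=2 threads=15

    # Several single-threaded processes dedicated to image manipulation.
    WSGIDaemonProcess images processes=4 threads=1

    WSGIScriptAlias / /srv/imageapp/wsgi.py application-group=%{GLOBAL}

    # Default everything to the multithreaded group...
    WSGIProcessGroup main

    # ...but route only the image-processing URLs to the single-threaded group.
    <Location /images>
        WSGIProcessGroup images
    </Location>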

Graham

Zohaib Ahmed Hassan

Dec 8, 2020, 6:05:43 AM
to modwsgi
Thank you very much for the reply; your suggestions are very valuable to us. I will apply them one by one and let you know.

Graham Dumpleton

Dec 8, 2020, 6:08:03 AM
to mod...@googlegroups.com
Just understand that you will need to play with the processes/threads options of the WSGIDaemonProcess directive when you switch to daemon mode. So watch carefully how things are affected when you change settings, and feel free to come back explaining what you see with what settings, as I could then perhaps give more advice.
