Apache memory consumption

Kent

Apr 1, 2011, 4:32:23 PM

to modwsgi

I am hoping you might graciously suggest what we can do to reduce the
amount of memory being consumed by Apache.

We have a turbo gears type web server with 2GB ram which is running a
point of sale system for about 15 or 18 stores.

We are running mod_wsgi 2.5, but about to upgrade to 3.3.

Is it typical for one process (wsgi:rarch) to account for virtually
all the CPU and memory consumption while the remaining apache children
seem to not be doing much?

Is it typical for memory consumption to increase throughout a day?

I've been reading the release notes: was lazy initialisation of the
Python interpreter already present in version 2.5?

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  444 apache   25   0 1682m 1.1g  10m S 26.0 58.2  35:54.74 httpd
20869 root     18   0  122m  24m 1576 S  0.0  1.2   0:09.09 lfd
 4046 root     15   0 79712  14m 1788 S  0.0  0.7   0:03.64 perl
 3471 root     15   0  154m 8508 4384 S  0.0  0.4  16:00.12 snmpd
  445 apache   15   0  191m 8260 2968 S  0.0  0.4   0:02.11 httpd
 6957 apache   15   0  191m 8200 2968 S  0.0  0.4   0:01.23 httpd
  465 apache   15   0  191m 8088 2968 S  0.0  0.4   0:02.02 httpd
 5167 apache   15   0  190m 7988 2968 S  0.3  0.4   0:01.81 httpd
 6956 apache   15   0  190m 7868 2964 S  0.0  0.4   0:01.34 httpd
 5168 apache   15   0  190m 7556 2968 S  0.0  0.4   0:01.57 httpd
  468 apache   15   0  190m 7552 2968 S  0.0  0.4   0:02.25 httpd
  463 apache   15   0  190m 7544 2968 S  0.0  0.4   0:02.14 httpd
 5169 apache   15   0  190m 7540 2968 S  0.0  0.4   0:01.73 httpd
  464 apache   15   0  190m 7528 2968 S  0.0  0.4   0:02.19 httpd
 4723 apache   15   0  190m 7528 2968 S  0.0  0.4   0:01.79 httpd
  466 apache   15   0  190m 7492 2968 S  0.0  0.4   0:02.15 httpd
 6958 apache   15   0  190m 7480 2968 S  0.0  0.4   0:01.27 httpd
  467 apache   15   0  190m 7476 2968 S  0.0  0.4   0:02.08 httpd
  469 apache   15   0  190m 7476 2968 S  0.0  0.4   0:02.27 httpd
 2445 apache   15   0  190m 7476 2968 S  0.0  0.4   0:02.04 httpd
10984 apache   15   0  190m 7356 2912 S  0.0  0.4   0:00.39 httpd
  442 root     15   0  189m 7332 3240 S  0.0  0.4   0:00.07 httpd
10982 apache   15   0  190m 7316 2912 S  0.0  0.4   0:00.44 httpd
10983 apache   15   0  190m 7300 2900 S  0.0  0.4   0:00.50 httpd
14399 ntp      15   0 23384 5020 3896 S  0.0  0.2   0:10.94 ntpd

apache .conf file:
=========
LoadModule wsgi_module modules/mod_wsgi.so
AddHandler wsgi-script .wsgi
WSGIPythonHome /home/rarch/tg2env
WSGIPythonEggs /home/rarch/tg2env/lib/python-egg-cache
WSGIDaemonProcess rarch threads=15 display-name=%{GROUP} python-eggs=/home/rarch/tg2env/lib/python-egg-cache
WSGIProcessGroup rarch
WSGISocketPrefix run
WSGIRestrictStdout Off

# we'll make the root directory of the domain call the wsgi script
WSGIScriptAlias /tg /home/rarch/trunk/src/appserver/wsgi-config/wsgi-deployment.py

# make the wsgi script accessible
<Directory /home/rarch/trunk/src/appserver/wsgi-config>
Order allow,deny
Allow from all
</Directory>

<Location /tg/_debug>
AuthType Basic
AuthName "For your company's security, this link is for retailarchitects.com support only. Please copy the Server Debug link and email it to your administrator."
AuthUserFile /home/rarch/trunk/src/appserver/debugpasswd
Require valid-user
</Location>
================

Thank you so very much for any time you spare to point me in the right
direction, documentation regarding memory management with mod_wsgi,
etc.

Kent

Graham Dumpleton

Apr 2, 2011, 6:21:03 PM

to mod...@googlegroups.com

On 2 April 2011 07:32, Kent <jkent...@gmail.com> wrote:
> I am hoping you might gracefully suggest what we might be able to do
> to improve our problem of memory usage being consumed by apache.
>
> We have a turbo gears type web server with 2GB ram which is running a
> point of sale system for about 15 or 18 stores.
>
> We are running mod_wsgi 2.5, but about to upgrade to 3.3.
>
> Is it typical for one process (wsgi:rarch) to consume virtually all
> the CPU and memory consumption while the remaining apache children
> seem to not be doing much?

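With your configuration yes, that would be expected. WSGIDaemonProcess
defaults to a single process, and WSGIProcessGroup delegates all
requests for the application to it, so the other Apache child
processes only proxy requests through to that one daemon process.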

> Is it typical for memory consumption to increase throughout a day?

No, but that is going to be an issue with TurboGears, how you use it,
or your specific application code.

> I've been reading the release notes, was lazy initialisation of Python
> interpreter already existent with version 2.5?

That would reduce a little the memory used by the other processes, but
not your main fat one.

> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 444 apache 25 0 1682m 1.1g 10m S 26.0 58.2 35:54.74 httpd

> ...
>
>
> apache .conf file:

Is this seriously all you have in your apache.conf file, or have you
only shown the relevant bits?

I know some people promote the idea of throwing away the complete
Apache configuration and only adding back in the bits thought to be
necessary, but I disagree with that, as the underlying C code defaults
for configuration don't actually match what the Apache configuration
files set up, so removing the configuration delivered with Apache
could have unknown consequences.

> =========
> LoadModule wsgi_module modules/mod_wsgi.so
> AddHandler wsgi-script .wsgi

The AddHandler line is not needed if using WSGIScriptAlias.

> WSGIPythonHome /home/rarch/tg2env
> WSGIPythonEggs /home/rarch/tg2env/lib/python-egg-cache

You shouldn't need WSGIPythonEggs as you are using the python-eggs
option to the WSGIDaemonProcess directive below. The WSGIPythonEggs
directive only applies to embedded mode but you are delegating
everything to a daemon mode process.

> WSGIDaemonProcess rarch threads=15 display-name=%{GROUP} python-eggs=/
> home/rarch/tg2env/lib/python-egg-cache
> WSGIProcessGroup rarch
> WSGISocketPrefix run

I suspect that this value of WSGISocketPrefix is going to leave the
socket listener files in the wrong place. If you do need to override
it like that, it would be:

WSGISocketPrefix run/wsgi

With what you have, they would be left in the Apache root directory
rather than in the 'run' subdirectory, and without a 'wsgi' prefix on
the socket files.

> WSGIRestrictStdout Off
>
> # we'll make the root directory of the domain call the wsgi script
> WSGIScriptAlias /tg /home/rarch/trunk/src/appserver/wsgi-config/wsgi-
> deployment.py
>
> # make the wsgi script accessible
> <Directory /home/rarch/trunk/src/appserver/wsgi-config>
> Order allow,deny
> Allow from all
> </Directory>
>
> <Location /tg/_debug>
> AuthType Basic
> AuthName "For your company's security, this link is for
> retailarchitects.com support only. Please copy the Server Debug link
> and email it to your administrator."
> AuthUserFile /home/rarch/trunk/src/appserver/debugpasswd
> Require valid-user
> </Location>
> ================
>
> Thank you so very much for any time you spare to point me in the right
> direction, documentation regarding memory management with mod_wsgi,
> etc.

TurboGears is known to have a large base memory footprint to begin
with. The size of your process though appears to be the result of
application code performing caching and not purging the cache
properly. Alternatively, objects in the application are creating
reference cycles between objects which the Python garbage collector
can't break and so they hang around.
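
To illustrate the first possibility above (a cache that is never
purged), here is a minimal sketch of a size-bounded cache. The class
name BoundedCache and its max_entries parameter are hypothetical and
not part of TurboGears or mod_wsgi, and it is written for a current
Python 3 rather than the Python of this thread. The only point is that
evicting old entries lets the process size level off instead of
growing with every new key.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU-style cache that never holds more than max_entries items."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict least recently used entries once the limit is exceeded.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)
```

A per-session cache built this way still needs the session objects
themselves to be dropped when sessions expire; the bound only limits
what each one can hold.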

I would probably suggest you ask about your memory problems on the
TurboGears mailing list.

Other than that, the only time memory leaks usually come from Apache
itself is when mod_python is also loaded, but since only one process
has memory problems and certain other bits of configuration are
working, you don't appear to be doing that.

Graham

Kent

Apr 4, 2011, 10:26:34 AM

to modwsgi

First, Graham, thank you for taking some time; my responses below:

On Apr 2, 6:21 pm, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:

> On 2 April 2011 07:32, Kent <jkentbo...@gmail.com> wrote:
>
> > I am hoping you might gracefully suggest what we might be able to do
> > to improve our problem of memory usage being consumed by apache.
>
> > We have a turbo gears type web server with 2GB ram which is running a
> > point of sale system for about 15 or 18 stores.
>
> > We are running mod_wsgi 2.5, but about to upgrade to 3.3.
>
> > Is it typical for one process (wsgi:rarch) to consume virtually all
> > the CPU and memory consumption while the remaining apache children
> > seem to not be doing much?
>
> With your configuration yes, that would be expected.
>

After finding your Sydney slideshow presentation, I'm understanding
that if I were to set processes=2 threads=15, for example, I'd have 2
(fatter) processes which actually run the python wsgi app and to which
the other threads can delegate; is that understanding correct?

If so, I'm not sure I understand the purpose of the threads, since
wouldn't they need to effectively wait for a process anyway? Earlier,
I believed threads=15 (and processes=1) would allow me to have many
simultaneous requests processing in parallel. Can this one process
accept multiple requests and multitask them, and if so, then what
advantage is gained from processes=2 or higher (does it only make
sense with multi-core processor)?

> > Is it typical for memory consumption to increase throughout a day?
>
> No, but that is going to be an issue with TurboGears, how you use it,
> or your specific application code.
>
> > I've been reading the release notes, was lazy initialisation of Python
> > interpreter already existent with version 2.5?
>
> That would reduce a little the memory used by the other processes, but
> not your main fat one.
>
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 444 apache 25 0 1682m 1.1g 10m S 26.0 58.2 35:54.74 httpd
> > ...
>
> > apache .conf file:
>
> Is this serious all you have in your apache.conf file, or have you
> only shown the relevant bits?
>
> I know some people promote the idea of throwing away the complete
> Apache configuration and only adding back in the bits thought to be
> necessary, but I disagree with that as the underlying C code defaults
> for configuration don't actually match what Apache configuration files
> setup, so removing the configuration delivered with Apache could have
> unknown consequences.
>

No, this is just an included wsgi.conf file under conf.d. The Apache
provided conf file is largely unmodified.

> > =========
> > LoadModule wsgi_module modules/mod_wsgi.so
> > AddHandler wsgi-script .wsgi
>
> The AddHandler line is not needed if using WSGIScriptAlias.

makes sense.

>
> > WSGIPythonHome /home/rarch/tg2env
> > WSGIPythonEggs /home/rarch/tg2env/lib/python-egg-cache
>
> You shouldn't need WSGIPythonEggs as you are using python-eggs option
> to WSGIDaemonProcess directive below. The WSGIPythonEggs directive
> only applies to embedded mode but you are delegating everything to a
> daemon mode process.
>
> > WSGIDaemonProcess rarch threads=15 display-name=%{GROUP} python-eggs=/
> > home/rarch/tg2env/lib/python-egg-cache
> > WSGIProcessGroup rarch
> > WSGISocketPrefix run
>
> I suspect that this value of WSGISocketPrefix is going to live the
> socket listener files in the wrong place. If you do need to override
> it like that, it would be:
>
> WSGISocketPrefix run/wsgi
>
> With what you have, would be left in the Apache root directory and not
> in the 'run' subdirectory and not with a 'wsgi' prefix to the socket
> files.
>

I see that, thanks.

Yes, I cache things that make sense to cache, but I cache them as part
of 'session' objects, which I believed were being garbage collected;
maybe that is my problem. I wanted to see if this was typical of even
well-behaved WSGI apps running through Apache *because of an article I
read*, which reads:

"If you serve 99% static files and 1% dynamic files with Apache, each
httpd process will use from 3-20 megs of RAM (depending on your MOST
complex dynamic page).
This occurs because a process grows to accommodate whatever it is
serving, and NEVER decreases until that process dies. Unless you have
very few dynamic pages and major traffic fluctuation, most of your
httpd processes will soon take up an amount of RAM equal to the
largest dynamic script on your system. A very smart web server would
deal with this automatically. As it is, you have a few options to
manually improve RAM usage."

http://onlamp.com/pub/a/onlamp/2004/02/05/lamp_tuning.html

This article led me to hypothesize [hypothesise ;) ] that it would be
typical for any apache/wsgi server to slowly increase in RAM
consumption as more and more requests simultaneously requested
processing that required some bulk of RAM... if these occurred in
parallel, then the article suggests this RAM is NEVER returned to the
OS.

I can take further inquiry to the turbogears group if I can't resolve
my memory problems, but please first answer this: suppose my WSGI app
grabbed a large amount of RAM *and assume it properly disposed of it*:
would I see the RAM returned to the OS, or would the apache process
hold it indefinitely?

Graham Dumpleton

Apr 4, 2011, 6:28:36 PM

to mod...@googlegroups.com

On 5 April 2011 00:26, Kent <jkent...@gmail.com> wrote:
> First, Graham, thank you for taking some time; my responses below:
>
> On Apr 2, 6:21 pm, Graham Dumpleton <graham.dumple...@gmail.com>
> wrote:
>> On 2 April 2011 07:32, Kent <jkentbo...@gmail.com> wrote:
>>
>> > I am hoping you might gracefully suggest what we might be able to do
>> > to improve our problem of memory usage being consumed by apache.
>>
>> > We have a turbo gears type web server with 2GB ram which is running a
>> > point of sale system for about 15 or 18 stores.
>>
>> > We are running mod_wsgi 2.5, but about to upgrade to 3.3.
>>
>> > Is it typical for one process (wsgi:rarch) to consume virtually all
>> > the CPU and memory consumption while the remaining apache children
>> > seem to not be doing much?
>>
>> With your configuration yes, that would be expected.
>>
>
> After finding your Sydney slideshow presentation, I'm understanding
> that if I were to set processes=2 threads=15, for example, I'd have 2
> (fatter) processes which actually run the python wsgi app and to which
> the other threads can delegate; is that understanding correct?

In the context of your very fat web application, I don't think the
processes would appear any 'fatter' than they were already. Since you
are restricting it to two, rather than whatever number the Apache MPM
would allow in embedded mode, you have at least constrained it.

In other words, using multiple threads in a process will increase the
amount of base memory used by that process, but it isn't necessarily
that much that I would be labelling it 'fatter'.

> If so, I'm not sure I understand the purpose of the threads, since
> wouldn't they need to effectively wait for a process anyway? Earlier,
> I believed threads=15 (and processes=1) would allow me to have many
> simultaneous requests processing in parallel. Can this one process
> accept multiple requests and multitask them, and if so, then what
> advantage is gained from processes=2 or higher (does it only make
> sense with multi-core processor)?

Despite the presence of the GIL in Python, which allows only one
thread to run Python code at a time in a process, a potentially I/O
bound process like a web application gives ample opportunity for the
GIL to be released while code is waiting for I/O, such that an
effective level of concurrency can still be handled with a single
multithreaded process.

So, although using multiple processes across multiple CPUs can allow
you to harness the CPU power of the whole system, the nature of web
applications is such that you can still achieve a lot with a single
process.
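
As a rough illustration of that point, the sketch below runs 15
threads (matching threads=15) in one process, each standing in for a
request that spends its time waiting. The function names and the 0.2
second delay are made up for the example, and time.sleep is only a
stand-in for real blocking I/O such as a database call. Because the
GIL is released while each thread waits, the total time should be
close to one delay rather than fifteen.

```python
import threading
import time


def fake_request(delay):
    """Stand-in for a request handler that spends its time waiting on I/O."""
    time.sleep(delay)  # sleeping releases the GIL, as blocking I/O does


def run_concurrently(num_threads=15, delay=0.2):
    """Run num_threads simulated requests at once; return elapsed seconds."""
    threads = [
        threading.Thread(target=fake_request, args=(delay,))
        for _ in range(num_threads)
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - start


elapsed = run_concurrently()
print("15 simulated requests took %.2f seconds" % elapsed)
```

Requests that are CPU bound in pure Python code would not overlap this
way, which is where additional processes start to matter.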

Have a read of comments I make in the following about parallelisation
in Apache/mod_wsgi.

http://blog.dscpl.com.au/2007/09/parallel-python-discussion-and-modwsgi.html
http://blog.dscpl.com.au/2007/07/web-hosting-landscape-and-modwsgi.html

>> TurboGears is known to have a large base memory foot print to begin
>> with. The size of your process though appears to be the result of
>> application code performing caching and not purging the cache
>> properly. Alternatively, objects in application and creating reference
>> count cycles between objects which the Python garbage collector can't
>> break and so they hang around.
>>
>
> Yes, I cache things that make sense to cache, but cache them as part
> of 'session' objects which I believed to be being garbage collected,
> maybe that is my problem. I wanted to see if this was typical of even
> well behaved wsgi apps running thru apache *because of an article I
> read*, which reads:

Even if not explicitly caching, object cycles can still cause problems
for transient objects.
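
A minimal sketch of what an object cycle looks like, assuming a
hypothetical Node class. Automatic collection is switched off here
only so the effect is visible: reference counting alone never frees
the pair, and it takes a pass of the cyclic collector to reclaim them.
In current CPython the collector does handle plain cycles like this
one; before Python 3.4, cycles containing objects with a __del__
method were not collected and ended up in gc.garbage.

```python
import gc
import weakref


class Node:
    """Hypothetical object that holds a reference to a peer."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def make_cycle():
    """Create two objects that refer to each other; return a weak reference to one."""
    a = Node("a")
    b = Node("b")
    a.peer = b
    b.peer = a
    return weakref.ref(a)


gc.disable()  # keep the automatic collector out of the way for the demo
try:
    probe = make_cycle()
    # The local names are gone, but the cycle keeps both refcounts above zero.
    alive_before = probe() is not None
    gc.collect()  # an explicit pass of the cyclic collector
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```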

You might try seeing if you can get going:

http://pypi.python.org/pypi/Dozer

This will allow you to try and track where all the objects are being
created and what type they are.
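
If getting a tool like that running proves awkward, a very rough
approximation of per-type object counts can be had from the standard
library. This is not Dozer's API, just a sketch using gc.get_objects();
the function names are invented for the example, and it only sees
objects tracked by the garbage collector.

```python
import gc
from collections import Counter


def count_objects_by_type():
    """Return a Counter mapping type name to number of gc-tracked live objects."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the types whose counts grew the most between two snapshots."""
    diff = after - before  # Counter subtraction keeps only positive counts
    return diff.most_common(top)
```

Taking a snapshot, exercising the application for a while, and taking
another snapshot gives a list of the types that are accumulating:

```python
before = count_objects_by_type()
# ... handle some requests ...
after = count_objects_by_type()
print(growth(before, after))
```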

> "If you serve 99% static files and 1% dynamic files with Apache, each
> httpd process will use from 3-20 megs of RAM (depending on your MOST
> complex dynamic page).

That is a motherhood statement that has no practical usefulness and
would likely be totally meaningless to anything but the specific setup
and application the person was using. At a guess, I would say that
statement wasn't even made about Python web applications. Python web
applications tend to have much larger memory requirements.

> This occurs because a process grows to accommodate whatever it is
> serving, and NEVER decreases until that process dies. Unless you have
> very few dynamic pages and major traffic fluctuation, most of your
> httpd processes will soon take up an amount of RAM equal to the
> largest dynamic script on your system. A very smart web server would
> deal with this automatically. As it is, you have a few options to
> manually improve RAM usage."
>
> http://onlamp.com/pub/a/onlamp/2004/02/05/lamp_tuning.html
>
> This article lead me to hypothesize [hypothesise ;) ] that it would be
> typical for any apache/wsgi server to slowly increase in RAM
> consumption as more and more requests simultaneously requested
> processing that required some bulk of RAM... if these occurred in
> parallel, then the article suggests this RAM is NEVER returned to the
> OS.

The memory use should plateau though and shouldn't just keep growing.
If it keeps growing you have an object leak through bad caching or
object cycles.

> I can take further inquiry to the turbogears group if I can't resolve
> my memory problems, but please first answer this: suppose my WSGI app
> grabbed a large amount of RAM *and assume it properly disposed of it*:
> would I see the RAM returned to the OS, or would the apache process
> hold it indefinitely?

The simple answer is no, you wouldn't see memory returned to the
OS. There are some slight exceptions to this but only with recent
Python versions (not sure which, may even only be some of the Python
3.X versions). You shouldn't count on those exceptions though, as in
most cases you aren't likely to encounter them.
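
One way to check this on your own system is to watch the resident
size of a process around a large allocation. The sketch below assumes
Linux, since it reads /proc/self/status, and the function names and
allocation size are made up for the example. Whether the 'released'
figure drops at all depends on the Python version, the C allocator and
the allocation pattern, so no expected output is shown.

```python
def rss_kb():
    """Return the current resident set size in kB (Linux only), else None."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def measure():
    """Record resident size before, during and after a large allocation."""
    readings = {"start": rss_kb()}
    data = [str(i) * 10 for i in range(200000)]  # many small objects
    readings["allocated"] = rss_kb()
    del data  # the objects are freed from Python's point of view
    readings["released"] = rss_kb()
    return readings


print(measure())
```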

Graham

>> I would probably suggest you ask about your memory problems on the
>> TurboGear mailing list.
>>
>> Other than that, the only place that memory leaks usually come from
>> Apache itself are when mod_python is also loaded, but since only one
>> process has memory problems and certain other bits of configuration
>> are working, you don't appear to be doing that.
>>
>> Graham
>

Kent

Apr 7, 2011, 11:46:53 AM

to modwsgi

Just a word of thanks for all this excellent information, which has
been very helpful. I needed to fix Dozer a bit (somewhat broken as
is), but it's helpful and it appears I'm not leaking memory, just
needed more RAM I think...

Thanks again, I understand mod_wsgi much better now.