Re: [modwsgi] Memory leaks on Apache restart.


Graham Dumpleton

Mar 22, 2009, 9:02:33 PM
to mod...@googlegroups.com
2009/3/23 gert <gert.c...@gmail.com>:
>
> wsgi r1232 python 3.1 apache 2.2.11
>
> www      29747  0.0  0.8 229496  4160 ?        Sl   01:35   0:00 (wsgi:site1)         -k start
> www      29776  0.0  0.8   8268  4040 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> www      29777  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> www      29778  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> www      29779  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> www      29780  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
>
> x20 apache2ctl restart
>
> www      30550  0.0  1.3 231432  6352 ?        Sl   01:36   0:00 (wsgi:site1)         -k start
> www      30579  0.0  1.3  10204  6192 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> www      30580  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> www      30581  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> www      30582  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> www      30583  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
>
> Don't ask too many difficult questions please :-)

How can I not ask questions when it isn't obvious what you are asking
or pointing out? Since you didn't post the column headers, I don't even
know what each of the columns represents on your system.

Looking into my crystal ball I assume that you are possibly pointing
out that memory is still being leaked.

Even though that issue addresses a larger source of memory leakage,
the Python interpreter itself still leaks memory when Py_Finalize() is
called.

I actually find the comment by Mark Hammond in:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/7b8eef94aa2af6f7?hl=en#

quite disturbing. Namely:

"""Calling
Py_Initialize and Py_Finalize multiple times does leak (Python 3 has
mechanisms so this need to always be true in the future, but it is true
now for non-trivial apps."""

Unfortunately his grammar is a bit unclear, so I am not 100% sure what
he meant. I am not sure whether he meant to say that Python 3 will
always have memory leaks, or that it shouldn't, whereas older versions
of Python can.

If by design Python 3.0 is never going to clean up its memory properly
on exit, then we are all screwed: embedded mode will be useless and may
as well be removed, and mod_python will die for good. That would mean
that mod_wsgid, as described in the mod_wsgi roadmap, will be the only
viable way of running Python under Apache in the future.

I'll see if I can get Mark to clarify what he meant.

Graham

gert

Mar 22, 2009, 9:50:37 PM
to modwsgi
On Mar 23, 2:02 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> 2009/3/23 gert <gert.cuyk...@gmail.com>:
>
> > wsgi r1232 python 3.1 apache 2.2.11

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

> > www      29747  0.0  0.8 229496  4160 ?        Sl   01:35   0:00 (wsgi:site1)         -k start
> > www      29776  0.0  0.8   8268  4040 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> > www      29777  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> > www      29778  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> > www      29779  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
> > www      29780  0.0  0.8   8268  4032 ?        S    01:35   0:00 /usr/httpd/bin/httpd -k start
>
> > x20 apache2ctl restart

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

> > www      30550  0.0  1.3 231432  6352 ?        Sl   01:36   0:00 (wsgi:site1)         -k start
> > www      30579  0.0  1.3  10204  6192 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> > www      30580  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> > www      30581  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> > www      30582  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
> > www      30583  0.0  1.3  10204  6184 ?        S    01:36   0:00 /usr/httpd/bin/httpd -k start
>
> > Don't ask too many difficult questions please :-)
>
> How can I not ask questions when it isn't obvious what you are asking
> or pointing out? Since you didn't post the column headers, I don't even
> know what each of the columns represents on your system.
>
> Looking into my crystal ball I assume that you are possibly pointing
> out that memory is still being leaked.
>

Added the headers :-) Easy questions you may ask; I expected Graham to
go into malloc() mode.

gert

Mar 22, 2009, 10:23:16 PM
to modwsgi


On Mar 23, 2:02 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> 2009/3/23 gert <gert.cuyk...@gmail.com>:
>
> > wsgi r1232 python 3.1 apache 2.2.11

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

> Looking into my crystal ball I assume that you are possibly pointing
> out that memory is still being leaked.
>
> Even though that issue addresses a larger source of memory leakage,
> the Python interpreter itself still leaks memory when Py_Finalize() is
> called.
>
> I actually find the comment by Mark Hammond in:
>
>  http://groups.google.com/group/comp.lang.python/browse_frm/thread/7b8...
>
> quite disturbing. Namely:
>
> """Calling
> Py_Initialize and Py_Finalize multiple times does leak (Python 3 has
> mechanisms so this need to always be true in the future, but it is true
> now for non-trivial apps."""
>
> Unfortunately his grammar is a bit unclear, so I am not 100% sure what
> he meant. I am not sure whether he meant to say that Python 3 will
> always have memory leaks, or that it shouldn't, whereas older versions
> of Python can.
>
> If by design Python 3.0 is never going to clean up its memory properly
> on exit, then we are all screwed: embedded mode will be useless and may
> as well be removed, and mod_python will die for good. That would mean
> that mod_wsgid, as described in the mod_wsgi roadmap, will be the only
> viable way of running Python under Apache in the future.
>
> I'll see if I can get Mark to clarify what he meant.

Note that (wsgi:site1), which is the daemon process, increases by
exactly the same amount as the 5 embedded processes.
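
To put numbers on that, here is a rough sketch that works out the growth
between the two listings. It assumes the VSZ and RSS columns are in KiB
(the usual unit for ps on Linux) and that "x20" means exactly 20
restarts; the figures themselves are copied from the listings above.

```python
# Figures copied from the two ps listings in this thread.
# Assumption: VSZ and RSS are in KiB, and there were exactly 20 restarts.
RESTARTS = 20

BEFORE = {"daemon": {"vsz": 229496, "rss": 4160},
          "child": {"vsz": 8268, "rss": 4040}}
AFTER = {"daemon": {"vsz": 231432, "rss": 6352},
         "child": {"vsz": 10204, "rss": 6192}}


def growth(before, after, restarts):
    """Return (total, per-restart) growth for each process kind and column."""
    result = {}
    for kind in before:
        result[kind] = {}
        for column in before[kind]:
            total = after[kind][column] - before[kind][column]
            result[kind][column] = (total, total / restarts)
    return result


for kind, columns in growth(BEFORE, AFTER, RESTARTS).items():
    for column, (total, per_restart) in columns.items():
        print(f"{kind:6} {column}: +{total} KiB total, "
              f"{per_restart:.1f} KiB per restart")
```

VSZ grows by the same amount for the daemon process and the first
embedded child, which is what you would expect if the growth happens in
the Apache parent and is inherited by everything forked from it.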

Graham Dumpleton

Mar 22, 2009, 10:39:24 PM
to mod...@googlegroups.com
2009/3/23 gert <gert.c...@gmail.com>:

The important one to look at to gauge the rate of leakage is the Apache
parent process. So, if you can enable showing of PPID as well as PID,
you can more easily see which is the parent process of the wsgi process.
That will be the one whose rate of growth you want to compare.
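
As a rough sketch of how that could be done, the following parses the
output of something like `ps -o pid,ppid,vsz,rss,args` and picks out the
process that the others list as their parent. The sample rows, including
the PIDs and the parent's memory figures, are invented for illustration
and are not taken from gert's system.

```python
# Hypothetical sample in the layout of `ps -o pid,ppid,vsz,rss,args`.
# The PIDs and the parent row are made up for illustration.
SAMPLE = """\
  PID  PPID    VSZ   RSS COMMAND
 1000     1   8268  4100 /usr/httpd/bin/httpd -k start
 1001  1000 229496  4160 (wsgi:site1)         -k start
 1002  1000   8268  4040 /usr/httpd/bin/httpd -k start
"""


def parse_ps(text):
    """Parse ps output with PID, PPID, VSZ, RSS and COMMAND columns."""
    rows = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        pid, ppid, vsz, rss, command = line.split(None, 4)
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "vsz": int(vsz), "rss": int(rss),
                     "command": command})
    return rows


def find_parent(rows):
    """Return the listed process that another listed process has as PPID."""
    by_pid = {row["pid"]: row for row in rows}
    for row in rows:
        if row["ppid"] in by_pid:
            return by_pid[row["ppid"]]
    return None


parent = find_parent(parse_ps(SAMPLE))
print("Apache parent:", parent["pid"], "VSZ", parent["vsz"], "RSS", parent["rss"])
```

Running it before and after a batch of restarts and comparing the
parent's VSZ and RSS gives the figure that matters here.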

Anyway, as I said, for as long as Python leaks memory on Py_Finalize()
this is going to be an issue. Although third party C extension modules
might leak memory as well, they aren't loaded into the Apache parent, as
mod_wsgi doesn't provide a way of preloading additional modules into the
parent. That leaks can occur is one of the reasons I don't allow it.

All up, this is another reason why using daemon mode is a better default
way of doing things, as you don't need to restart the whole of Apache
just to restart a WSGI application.

Graham

Graham Dumpleton

Mar 23, 2009, 2:11:46 AM
to mod...@googlegroups.com
2009/3/23 Graham Dumpleton <graham.d...@gmail.com>:

Part of the discussion associated with:

http://bugs.python.org/issue1856

is pertinent to this problem.

One thing that is suggested is that the underlying data which exists
to support the simplified Python GIL cannot be torn down and has to
exist for the life of the process.

This might be reasonable if Py_Initialize() is called again straight
away, but when a restart in Apache occurs it will unload the mod_wsgi.so
file in the Apache parent process and thus also unload the Python
library. As a result, all the references to that preserved global data
are lost and cannot be reused. Thus when mod_wsgi.so is reloaded by
Apache the global variables are reset to nulls and Python thinks it
has to reinitialise the data.

This therefore is going to be one source of memory leaks that can
never be avoided. If there are similar instances where Python makes
the assumption that it can cache the data because Py_Initialize() may
be called again, and so never truly frees the memory, those will also
leak.
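
The mechanism can be illustrated with a toy model. This is not Python's
real internals, only a simulation under the assumptions described above:
a library keeps one static pointer to data it allocates once and never
frees, and unloading the library resets that pointer without releasing
the allocation. The class and function names are made up for the sketch.

```python
class Process:
    """Toy stand-in for a process whose heap outlives library unloads."""

    def __init__(self):
        self.heap = []  # blocks that were allocated and never freed


class Library:
    """Toy stand-in for a shared library holding one static pointer."""

    def __init__(self, process):
        self.process = process
        self.preserved = None  # static variable, null on a fresh load

    def initialize(self):
        # Allocate the long-lived data only if the static pointer is null.
        if self.preserved is None:
            block = object()
            self.process.heap.append(block)
            self.preserved = block

    def finalize(self):
        # By design the data is kept for reuse by a later initialize().
        pass


def restart_cycles(count, unload):
    """Run init/finalize cycles and return how many blocks were allocated."""
    process = Process()
    library = Library(process)
    for _ in range(count):
        library.initialize()
        library.finalize()
        if unload:
            # Unload and reload: statics are reset, the heap is untouched.
            library = Library(process)
    return len(process.heap)


print("library stays loaded:", restart_cycles(20, unload=False))
print("library reloaded each time:", restart_cycles(20, unload=True))
```

With the library kept loaded the data is allocated once and reused; with
an unload on every cycle a fresh block is allocated each time and the
old ones are orphaned.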

This is why having a daemon mode only option is better. In that case,
and for mod_wsgi 4.0, Python would be initialised in a separate monitor
process. On a restart that whole monitor process is also destroyed.
Thus you don't have this problem with memory leaks when calling
Py_Initialize() a second time, as that would never occur.

We are therefore almost at the point where for UNIX systems embedded
mode should be completely done away with or has to change
significantly. On Windows it doesn't matter, as Python is only
initialised in the Apache child process anyway and there is no cycling
of Py_Initialize()/Py_Finalize().

If getting rid of embedded mode entirely, the problem is how to
support WSGIAccessScript, WSGIAuthUserScript and WSGIAuthGroupScript
if I want to keep providing that option. The only option for that would
be to do what FASTCGI does and execute the operation in the daemon
process as well. It would slow things down doing that, but there is no
choice.
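
For context, the hooks in question are plain Python functions that
mod_wsgi calls during Apache's access and authentication phases. A
minimal sketch, assuming the function names given in the mod_wsgi access
control documentation (allow_access, check_password, groups_for_user);
the user table, group table and host list are hypothetical.

```python
# Hypothetical data for illustration only. A real script would consult a
# proper user database and would not store plain-text passwords.
USERS = {"alice": "wonderland"}
GROUPS = {"alice": ["admins"]}
ALLOWED_HOSTS = {"127.0.0.1", "::1"}


def allow_access(environ, host):
    """WSGIAccessScript hook: return True to let the client host through."""
    return host in ALLOWED_HOSTS


def check_password(environ, user, password):
    """WSGIAuthUserScript hook.

    Returns True or False for a known user, and None when the user is
    unknown to this script.
    """
    if user not in USERS:
        return None
    return USERS[user] == password


def groups_for_user(environ, user):
    """WSGIAuthGroupScript hook: return the groups the user belongs to."""
    # Assumption: an empty group name stands for "no groups".
    return GROUPS.get(user, [""])
```

Today these run inside the Apache child processes, which is why they
depend on embedded mode having an interpreter available there.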

The other thing that would similarly be a problem is
WSGIDispatchScript. This allows user (admin) provided Python code to
select the process group and application group for a specific WSGI
application dynamically.
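
A dispatch script might look roughly like the following. The function
names process_group and application_group are given from memory of the
mod_wsgi directive documentation and should be checked against it; the
host-to-group mapping, the fallback names and the use of SERVER_NAME as
the key are assumptions made for this sketch.

```python
# Hypothetical mapping from virtual host name to daemon process group.
PROCESS_GROUPS = {
    "site1.example.com": "site1",
    "site2.example.com": "site2",
}


def process_group(environ):
    """Choose the daemon process group that should handle the request."""
    host = environ.get("SERVER_NAME", "")
    return PROCESS_GROUPS.get(host, "site1")


def application_group(environ):
    """Choose the application group (sub interpreter) within the process."""
    # Assumption: one sub interpreter per host name, "default" otherwise.
    return environ.get("SERVER_NAME") or "default"
```

Because this code has to run before the request is handed to a daemon
process, it needs an interpreter in the Apache child process.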

The only choice is to not initialise Python in the Apache parent
process and instead delay initialisation until after the Apache child
server processes are created. This is actually what the experimental
WSGILazyInitialization directive in mod_wsgi 3.0 does. The difference
at the moment though is that the directive also causes initialisation
for daemon processes to be delayed. This all avoids the memory leaks
in the Apache parent process, which are in turn inherited by the Apache
child server processes and daemon processes, but in my tests it results
in all those processes then taking more memory, as some ability to
share data or rely on delayed copy on write is lost. So, you win one
way but lose in another.

Anyway, gert, I am sure you will enjoy having a play with the
WSGILazyInitialization directive, seeing how it affects your overall
memory usage figures, and verifying that it does eliminate the
memory leak problems in the Apache parent process and thus all other
processes.

Graham

gert

Mar 23, 2009, 4:31:18 PM
to modwsgi
On Mar 23, 7:11 am, Graham Dumpleton <graham.dumple...@gmail.com>
wrote:
> 2009/3/23 Graham Dumpleton <graham.dumple...@gmail.com>:
>
> > 2009/3/23 gert <gert.cuyk...@gmail.com>:
Pfff, I made it; glad you did not go realloc(on, my *ss).
Killing embedded mode you may, as long as you do this first:
http://groups.google.com/group/modwsgi/browse_thread/thread/c29dde8fbef68e0b
(the dynamic daemon process thing).

So I can expect a major update this Friday or so in the 'broken'
section of the repository :-)