How can I not ask questions when it isn't obvious what you are asking
or pointing out? Since you didn't post the labels, I don't even know
what each of the columns represents on your system.
Looking into my crystal ball I assume that you are possibly pointing
out that memory is still being leaked.
Even though that issue addresses a larger source of memory leakage,
the Python interpreter itself still leaks memory when Py_Finalize() is
called.
I actually find the comment by Mark Hammond in:
http://groups.google.com/group/comp.lang.python/browse_frm/thread/7b8eef94aa2af6f7?hl=en#
quite disturbing. Namely:
"""Calling
Py_Initialize and Py_Finalize multiple times does leak (Python 3 has
mechanisms so this need to always be true in the future, but it is true
now for non-trivial apps."""
Unfortunately his grammar is a bit unclear, so I am not 100% sure what
he meant. I can't tell whether he meant that Python 3 will always have
memory leaks, or that it shouldn't, whereas older versions of Python
can.
If by design Python 3.0 is never going to properly clean up its
memory on exit, then we are all screwed: embedded mode will be
useless and may as well be removed, and mod_python will also die for
good. This means that mod_wsgid, as described in the mod_wsgi roadmap,
will be the only viable way of running Python under Apache in the
future.
I'll see if I can get Mark to clarify what he meant.
Graham
The important one to look at to gauge the rate of leakage is the
Apache parent process. So, if you can enable showing of PPID as well
as PID, you can more easily see which is the parent process of the
wsgi process. That is the one whose rate of growth you want to compare.
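As a rough sketch of that check, the Python below parses output shaped
like `ps -A -o pid,ppid,rss,comm`, picks out the process that the other
Apache processes name as their PPID, and averages its growth across
restarts. The command name "httpd", the helper names and the sample
figures are all made up for illustration. Your ps may label or format
columns differently.

```python
def parse_ps(text):
    """Parse text shaped like `ps -A -o pid,ppid,rss,comm` into dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, rss, comm = line.split(None, 3)
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "rss_kb": int(rss), "comm": comm.strip()})
    return rows


def find_parent(rows, comm="httpd"):
    """Return the `comm` process that another `comm` process lists as PPID."""
    matching = {r["pid"]: r for r in rows if r["comm"] == comm}
    for row in matching.values():
        if row["ppid"] in matching:
            return matching[row["ppid"]]
    return None


def growth_per_restart(rss_samples_kb):
    """Average RSS growth (KB) between samples taken after each restart."""
    if len(rss_samples_kb) < 2:
        return 0.0
    return (rss_samples_kb[-1] - rss_samples_kb[0]) / (len(rss_samples_kb) - 1)


# Illustrative sample only, not real measurements.
SAMPLE = """\
  PID  PPID   RSS COMMAND
 1200     1  9400 httpd
 1201  1200  7100 httpd
 1202  1200  7150 httpd
 1300     1  2100 sshd
"""

parent = find_parent(parse_ps(SAMPLE))
print(parent["pid"], parent["rss_kb"])
```

Record the parent's RSS after each Apache restart and pass the list to
growth_per_restart() to get a per-restart figure.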
Anyway, as I said, as long as Python leaks memory on Py_Finalize()
this is going to be an issue. Although third party C extension modules
might leak memory as well, they aren't loaded into the Apache parent,
as I don't provide a way of preloading additional modules into the
parent. That leaks can occur is one of the reasons I don't allow it.
All up, this is another reason why using daemon mode is a better
default way of doing things, as you don't need to restart the whole of
Apache just to restart a WSGI application.
Graham
Part of the discussion associated with:
http://bugs.python.org/issue1856
is pertinent to this problem.
One thing that is suggested is that the underlying data which exists
to support the simplified Python GIL cannot be torn down and has to
exist for the life of the process.
This might be reasonable if Py_Initialize() is called straight away,
but when a restart in Apache occurs it will unload the mod_wsgi.so
file in the Apache parent process and thus also unload the Python
library. As a result, all the references to that preserved global data
are lost and cannot be reused. Thus when mod_wsgi.so is reloaded by
Apache, the global variables are reset to nulls and Python thinks it
has to reinitialise the data.
This is therefore going to be one source of memory leaks that can
never be avoided. If there are similar instances where Python assumes
that it can cache data because Py_Initialize() may be called again,
and so never truly frees the memory, those will also leak.
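The mechanism can be pictured with a toy model in Python. This is not
how the interpreter is actually implemented. ToyProcess, gil_state
and simulate() are hypothetical names. The model only captures the
idea that heap blocks live as long as the process, while a library's
global pointers are reset to null each time the library is unloaded
and loaded again.

```python
class ToyProcess:
    """Stand-in for the Apache parent: heap survives, library globals do not."""

    def __init__(self):
        self.heap = []           # blocks that live as long as the process
        self.lib_globals = None  # globals of the loaded library, None if unloaded

    def load_library(self):
        # Loading the shared library: its globals start out as nulls.
        self.lib_globals = {"gil_state": None}

    def unload_library(self):
        # Unloading: the global pointer is gone, the heap block is not.
        self.lib_globals = None

    def initialize(self):
        # Py_Initialize() analogue: allocate only if the global is null.
        if self.lib_globals["gil_state"] is None:
            block = object()
            self.heap.append(block)
            self.lib_globals["gil_state"] = block

    def finalize(self):
        # Py_Finalize() analogue: deliberately keeps gil_state for reuse.
        pass


def simulate(restarts, unload):
    """Return how many blocks are on the heap after the given restarts."""
    proc = ToyProcess()
    proc.load_library()
    proc.initialize()
    for _ in range(restarts):
        proc.finalize()
        if unload:
            proc.unload_library()
            proc.load_library()
        proc.initialize()
    return len(proc.heap)


print(simulate(3, unload=False))  # library stays loaded: block is reused
print(simulate(3, unload=True))   # library reloaded: one new block per restart
```

With the library kept loaded the preserved block is found and reused.
With an unload on every restart the pointer to it is lost each time,
so the count of orphaned blocks grows by one per restart.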
This is why having daemon mode as the only option is better. In that
case, and for mod_wsgi 4.0, Python would be initialised in a separate
monitor process. On a restart that whole monitor process is also
destroyed. Thus you don't have this problem with memory leaks when
calling Py_Initialize() a second time, as that would never occur.
We are therefore almost at the point where, for UNIX systems, embedded
mode should be completely done away with or has to change
significantly. On Windows it doesn't matter, as Python is only
initialised in the Apache child process anyway and there is no cycling
of Py_Initialize()/Py_Finalize().
If embedded mode is removed entirely, the problem is how to support
WSGIAccessScript, WSGIAuthUserScript and WSGIAuthGroupScript if I want
to keep providing that option. The only option for that would be to do
what FASTCGI does and execute the operation in the daemon process as
well. Doing that would slow things down, but there is no choice.
The other thing that would similarly be a problem is
WSGIDispatchScript. This allows user (admin) provided Python code to
select the process group and application group for a specific WSGI
application dynamically.
The only choice is to not initialise Python in the Apache parent
process and instead delay initialisation until after Apache child
server processes are created. This is actually what the experimental
WSGILazyInitialization directive in mod_wsgi 3.0 does. The difference
at the moment though is that the directive also causes initialisation
for daemon processes to be delayed. This all avoids the memory leaks
in the Apache parent process, which are in turn inherited by the
Apache child server processes and daemon processes, but in my tests it
results in all those processes taking more memory, as some ability to
share data or rely on delayed copy on write is lost. So, you win one
way but lose in another.
Anyway, gert, I am sure you will enjoy having a play with the
WSGILazyInitialization directive to see how it affects your overall
memory usage figures, as well as to verify that it does eliminate the
memory leak problems in the Apache parent process and thus all other
processes.
Graham