Apparent memory leak running CherryPy app


Pete H

Feb 26, 2008, 6:07:55 AM
to modwsgi
I seem to have a memory leak running my CherryPy 3 app under mod_wsgi
1.3. I'm asking here first because the symptoms I'm seeing only occur
when running with mod_wsgi, and not with mod_python or the CherryPy
HTTP server.

What I'm seeing with mod_wsgi is that for every new URI requested from
my application the memory shown by 'top' increases by 2-3MB.
Requesting the same URI multiple times doesn't result in this
increase.

My Apache (2.0.61 pre-fork) configuration in the VirtualHost stanza
looks like this:

WSGIScriptAliasMatch .* /home/sombshet/cgi-bin/run_wsgi.wsgi
WSGIDaemonProcess sombshet_wsgi user=sombshet group=sombshet home=/home/sombshet/cgi-bin
WSGIProcessGroup sombshet_wsgi
WSGIPassAuthorization On

My entry script looks like this:

import os
import sys
sys.stdout = sys.stderr
# Modify this line to suit your deployment
#sys.path.append(os.path.join(os.path.dirname(__file__), 'dbapp'))
import atexit

import cherrypy

import config
import dblib
import tools # Must have this or our tools won't be registered

# Set up site-wide config. Do this first so that,
# if something goes wrong, we get a log.
cherrypy.config.update({
    'log.screen': False,
    'log.error_file': os.path.join(os.path.dirname(__file__),
                                   'logs', 'cp_error_log'),
    'environment': 'production',
    # Turn off signal handlers when CP does not control the OS process
    'engine.SIGTERM': None,
    'engine.SIGHUP': None,
})

config.initialise_app()

atexit.register(dblib.DB_CONN.close)
application = cherrypy.tree


config.initialise_app basically creates the URI trees with their
configurations and attaches them to cherrypy.tree

The startup scripts to run with mod_python and CherryPy server look
much the same but with a cherrypy.engine.start() after the
config.initialise_app() line instead of the two lines shown here.

I have the feeling that I should be doing something after the
'application = cherrypy.tree' line?


gert

Feb 26, 2008, 1:05:45 PM
to modwsgi
We need Graham for this, but in the meantime can you try Apache
2.2.x and mod_wsgi 2.0r4 or svn?

gert

Feb 26, 2008, 1:07:52 PM
to modwsgi
Also, a mod_wsgi-only test would be helpful:

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)

    return [output]

gert

Feb 26, 2008, 2:38:40 PM
to modwsgi
And what Python version are you using?

Graham Dumpleton

Feb 26, 2008, 4:07:51 PM
to mod...@googlegroups.com
I'm confused: is this the mod_wsgi script file you are using or not?
If not and this is the mod_python one, please post the one for
mod_wsgi.

Also explain why you aren't following guidelines in:

http://code.google.com/p/modwsgi/wiki/IntegrationWithCherryPy

Namely, you should run in 'embedded' mode and not 'production'. There
are important differences with how CherryPy sets things up and not
running in 'embedded' mode may cause problems. Running in 'production'
doesn't mean it will run faster, 'embedded' mode is the correct mode
for mod_python and mod_wsgi.

Also get rid of the whole CherryPy config bit as 'embedded' mode does
most of that. Also can you not log to an alternate file and instead
let any output go through to Apache error log.

In other words, follow the recipe in the documentation instead and
then indicate what issues there are.

Graham

Graham Dumpleton

Feb 26, 2008, 4:14:08 PM
to mod...@googlegroups.com
One more thing. Don't use:

WSGIScriptAliasMatch .* /home/sombshet/cgi-bin/run_wsgi.wsgi

Use:

WSGIScriptAlias / /home/sombshet/cgi-bin/run_wsgi.wsgi

I'd have to check, but that in itself may be the problem, as what you
are using may be resulting in a separate instance of your application
for every URL, i.e., a separate interpreter instance for every URL.

Ensure you enable:

LogLevel info

The Apache error log will then show you lots of information about when
sub interpreters are being created. If you keep seeing new ones
created that is the problem.
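
One rough way to check this is to count the distinct interpreter names that show up in the error log. The sketch below assumes the info-level lines look like "mod_wsgi (pid=N): Create interpreter 'NAME'."; that wording and the helper name count_interpreters are assumptions, so adjust the pattern to whatever your log actually prints.

```python
import re
from collections import Counter

# Assumed shape of the info-level message; adjust to match your own error log.
CREATE_RE = re.compile(r"mod_wsgi \(pid=(\d+)\): Create interpreter '([^']*)'")


def count_interpreters(lines, pattern=CREATE_RE):
    """Count how often each interpreter name appears in 'create' log lines.

    `lines` is any iterable of strings, for example an open log file.
    `pattern` must capture the interpreter name in its second group.
    """
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Feeding it the open error log and printing the result of most_common() should show whether the number of distinct names grows with every new URL requested.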

Graham

Graham Dumpleton

Feb 26, 2008, 8:14:18 PM
to mod...@googlegroups.com
Have done tests and this is why you would have had problems.

The reason is that by default mod_wsgi will create sub interpreters
for each application. What is an application is determined by value of
SCRIPT_NAME. Problem is that for .* on pattern side like that,
SCRIPT_NAME is set to the complete URL. Thus mod_wsgi thinks that
every URL is a different application and creates a different sub
interpreter for it.

I am surprised, though, that CherryPy even worked, unless it
doesn't actually honour SCRIPT_NAME as mount point correctly and
instead falls back to REQUEST_URI anyway and relies on its own
internally configured base url parameter.
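
To see this for yourself, a small diagnostic WSGI script can be dropped in place of the real application. This is only a sketch: it reports SCRIPT_NAME, PATH_INFO and REQUEST_URI along with the mod_wsgi.process_group and mod_wsgi.application_group keys, printing None for any key the server does not set.

```python
def application(environ, start_response):
    """Report the values that determine which application a request maps to."""
    keys = (
        'SCRIPT_NAME',
        'PATH_INFO',
        'REQUEST_URI',
        'mod_wsgi.process_group',
        'mod_wsgi.application_group',
    )
    # One "key = value" line per entry; missing keys show up as None.
    lines = ['%s = %r' % (key, environ.get(key)) for key in keys]
    body = '\n'.join(lines).encode('utf-8')

    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

Requesting a few different URLs and comparing the reported SCRIPT_NAME and application group should show whether each URL is being treated as a separate application.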

Pete H

Feb 27, 2008, 12:02:25 PM
to modwsgi
Graham,

Thanks for your help. The problem was indeed fixed by replacing the
WSGIScriptAliasMatch line by the WSGIScriptAlias line you suggested. I
had noticed that I got a lot of sub-interpreters spawned, but the
significance of that didn't occur to me.

As for your other point,

Q Why am I not using 'embedded' mode?

A Because a) 'production' plus 'engine.SIGHUP': None and
'engine.SIGTERM': None is identical to 'embedded' and thus works, and
b) according to the comment in _cpconfig.py embedded mode is intended
'for use when cherrypy is embedded in another deployment stack' which
is not the case here. It seems to me to be better to use things for
their intended purpose, and if what I'm doing is equivalent _now_ that
may not always be the case and I shan't get burnt later if the config
for 'embedded' changes for some reason.


SCRIPT_NAME is set to an empty string by the time I get to set
application = cherrypy.tree

Pete

Brian Smith

Feb 27, 2008, 2:11:27 PM
to mod...@googlegroups.com
Graham Dumpleton wrote:
> One more thing. Don't use:
>
> WSGIScriptAliasMatch .* /home/sombshet/cgi-bin/run_wsgi.wsgi
>
> Use:
>
> WSGIScriptAlias / /home/sombshet/cgi-bin/run_wsgi.wsgi
>
> I'd have to check, but that in itself may be the problem as
> what you are using may be resulting an a separate instance of
> your application for every URL. ie., separate interpreter
> instance for every URL.

Maybe mod_wsgi should use (mod_wsgi.application_group, script_filename)
as the key for the mapping, instead of script_name. That way, multiple,
disjoint URLs can get mapped to the same application.
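
As a toy illustration of the difference between the two keys (this is not mod_wsgi's actual code; the URLs, the fixed group label, and the helper interpreter_keys are made up for the example), compare how many distinct keys a handful of requests to the same script produce:

```python
def interpreter_keys(requests, key_func):
    """Return the distinct keys; in this toy model each key is one interpreter."""
    return {key_func(req) for req in requests}


SCRIPT = '/home/sombshet/cgi-bin/run_wsgi.wsgi'

# Hypothetical requests under a '.*' match, where SCRIPT_NAME is the whole URL.
requests = [
    {'script_name': url, 'script_filename': SCRIPT, 'application_group': 'site'}
    for url in ('/', '/users', '/users/42', '/reports/2008')
]

by_script_name = interpreter_keys(requests, lambda r: r['script_name'])
by_filename = interpreter_keys(
    requests, lambda r: (r['application_group'], r['script_filename']))

print(len(by_script_name))  # 4: one key per distinct URL
print(len(by_filename))     # 1: every URL shares the same script file
```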

Otherwise, is WSGIScriptAliasMatch really needed? I admit that I use it
because I have dozens of testcase WSGI applications in a directory. But,
that is hardly typical usage. I wouldn't be sad if it was removed.

It seems like we should be able to replace ScriptAliasMatch with
WSGIScriptAliasMatch to switch from CGI to mod_wsgi. However, that is
almost always going to be a very bad idea, like in this example.
WSGIScriptAlias should be used in almost every case.

- Brian

gert

Feb 27, 2008, 3:21:06 PM
to modwsgi
And how am I supposed to do this without WSGIScriptAliasMatch?

WSGIScriptAliasMatch "^/([^/]+)/servlet" "/srv/trunk/wsgi/$1.py"

Graham Dumpleton

Feb 27, 2008, 3:32:43 PM
to mod...@googlegroups.com
On 28/02/2008, Pete H <pe...@ssbg.zetnet.co.uk> wrote:
> As for your other point,
>
> Q Why am I not using 'embedded' mode?
>
> A Because a) 'production' plus 'engine.SIGHUP': None and
> 'engine.SIGTERM': None is identical to 'embedded' and thus works, and
> b) according to the comment in _cpconfig.py embedded mode is intended
> 'for use when cherrypy is embedded in another deployment stack' which
> is not the case here.

Actually that is the case. Running inside mod_python or mod_wsgi is
exactly what is meant by embedding it in another deployment stack. :-)

Graham

Brian Smith

Feb 27, 2008, 4:11:09 PM
to mod...@googlegroups.com
gert wrote:
> WSGIScriptAliasMatch "^/([^/]+)/servlet" "/srv/trunk/wsgi/$1.py"

I'm not sure what you are trying to accomplish with that kind of
mapping, so I can only speculate. If you don't have many applications,
it is easy to just use WSGIScriptAlias for each one. If you do have a
lot of applications, then the above mapping will likely waste tons of
memory, so it would be better to combine the apps together into a bigger
one.

I'd also use mod_rewrite to get rid of the "/servlet" suffix on all your
URLs, as it seems to be getting in the way.
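
One possible shape for such a combined application, as a sketch only: a single WSGI entry point that routes on the first path segment using wsgiref.util.shift_path_info from the standard library. The names make_dispatcher and hello, and the 'hello' mount point, are hypothetical.

```python
from wsgiref.util import shift_path_info


def make_dispatcher(apps):
    """Build one WSGI app that routes on the first path segment."""
    def dispatcher(environ, start_response):
        # shift_path_info moves the first PATH_INFO segment onto SCRIPT_NAME
        # and returns it, so the chosen sub-app sees itself mounted there.
        name = shift_path_info(environ)
        app = apps.get(name)
        if app is None:
            body = b'Not Found'
            start_response('404 Not Found', [
                ('Content-Type', 'text/plain'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return dispatcher


def hello(environ, start_response):
    """Hypothetical sub-application that echoes where it is mounted."""
    text = 'Hello from %s, path %s' % (
        environ['SCRIPT_NAME'], environ['PATH_INFO'])
    body = text.encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


# A single entry point that could be mounted once with WSGIScriptAlias.
application = make_dispatcher({'hello': hello})
```

With this layout all sub-applications live in one script, so they share one interpreter instead of each needing its own.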

Regards,
Brian

Pete H

Feb 27, 2008, 4:58:46 PM
to modwsgi
OK, I stand corrected.

I thought this referred to cases such as Turbogears where CherryPy is
incorporated in another framework.

Pete

On Feb 27, 8:32 pm, "Graham Dumpleton" <graham.dumple...@gmail.com>
wrote: