Limit the memory of each daemon process ...

Mic Pringle

Sep 3, 2008, 10:51:54 AM

to modwsgi

Hi,

I'm looking to use mod_wsgi to implement a shared hosting scheme for
Django and was wondering if it is possible to set a limit/cap on the
memory used by each daemon child process, so that if one Django app
starts sucking up all available memory it won't affect any other
hosted sites or the Linux box itself?

This would also allow me to advertise a guaranteed memory allocation
for each hosted application.

Thanks in advance.

-Mic Pringle

Graham Dumpleton

Sep 3, 2008, 7:18:39 PM

to mod...@googlegroups.com

2008/9/4 Mic Pringle <micpr...@gmail.com>:

This issue exists to look at adding options for setting both CPU and
memory limits.

http://code.google.com/p/modwsgi/issues/detail?id=21

I haven't actually tested it, but in the interim the following may work for you:

WSGIDaemonProcess customer-1 <other options>
WSGIImportScript /some/path/setlimits.wsgi process-group=customer-1 application-group=%{GLOBAL}

The script /some/path/setlimits.wsgi would then contain:

import resource

# Cap the resident set size at 60 MB (soft and hard limit).
RSSMAX = 60*1024*1024
resource.setrlimit(resource.RLIMIT_RSS, (RSSMAX, RSSMAX))

That is, create a WSGIImportScript directive for each daemon process
group configured. The script indicated will be loaded and run when the
processes are created. You can then set limits in there.

For memory, an allocation that goes over the limit will fail. You might
look at other limits that could be set as well to control runaway
processes, like a CPU limit. Unfortunately, having the CPU soft limit
trigger a graceful shutdown of the process when it is reached would
require some work in mod_wsgi, as it would need to catch the
appropriate signal and recycle the process when it sees it.

If you try this, can you please provide feedback and indicate whether
it does indeed work as I expect it would.

Thanks.

Graham

William Dode

Sep 4, 2008, 2:48:50 AM

to mod...@googlegroups.com

On 03-09-2008, Mic Pringle wrote:
>
> Hi,
>
> I'm looking to use mod_wsgi to implement a shared hosting scheme for
> Django and was wondering if it was possible to set a limit/cap on the
> memory used by each daemon child process, so that if one Django app
> starts sucking up all available memory it won't affect any other sites
> hosted or the Linux box itself ?

One solution is to monitor the process with an external script using the
output of ps. It will be more flexible; for example, you could send a mail
to the webmaster before killing the process.

Munin can help you for that.
http://munin.projects.linpro.no

bye

--
William Dodé - http://flibuste.net
Informaticien Indépendant

Mic Pringle

Sep 4, 2008, 4:14:44 AM

to mod...@googlegroups.com

Hi Graham,

Thanks for the detailed response.

I will definitely get back to you with my findings, but this project
is still in a very early stage so it may be later rather than sooner.

On a side note, do you have any other recommendations/interesting
tid-bits that could be useful for setting up a shared hosting
environment ?

Thanks

-Mic

Clodoaldo

Sep 4, 2008, 6:27:59 AM

to mod...@googlegroups.com

2008/9/4 Mic Pringle <micpr...@gmail.com>:

>
> Hi Graham,
>
> Thanks for the detailed response.
>
> I will definitely get back to you with my findings, but this project
> is still in a very early stage so it may be later rather than sooner.
>
> On a side note, do you have any other recommendations/interesting
> tid-bits that could be useful for setting up a shared hosting
> environment ?

This thread has lots of info:

http://groups.google.com/group/modwsgi/browse_thread/thread/aae1bf2787e393f3

Regards, Clodoaldo

Graham Dumpleton

Sep 4, 2008, 7:12:58 AM

to mod...@googlegroups.com

2008/9/4 Clodoaldo <clodoal...@gmail.com>:

>
> 2008/9/4 Mic Pringle <micpr...@gmail.com>:
>>
>> Hi Graham,
>>
>> Thanks for the detailed response.
>>
>> I will definitely get back to you with my findings, but this project
>> is still in a very early stage so it may be later rather than sooner.
>>
>> On a side note, do you have any other recommendations/interesting
>> tid-bits that could be useful for setting up a shared hosting
>> environment ?
>
> This thread has lots of info:
>
> http://groups.google.com/group/modwsgi/browse_thread/thread/aae1bf2787e393f3

I never did finish up with the remaining installments on that thread.

Bad me. :-(

Graham

Brian Smith

Sep 4, 2008, 3:43:37 PM

to mod...@googlegroups.com

Mic Pringle wrote:
> I will definitely get back to you with my findings, but this
> project is still in a very early stage so it may be later
> rather than sooner.
>
> On a side note, do you have any other
> recommendations/interesting tid-bits that could be useful for
> setting up a shared hosting environment ?

You cannot securely do shared hosting for mutually untrusted users with
mod_wsgi unless you give each user their own Apache instance. Search the
mailing list archive for "fork", "exec", and "spawn" to read the previous
discussions about this. There are a lot of hosting providers doing that. I
believe at least one (WebFaction?) offers mod_wsgi hosting already.

Regards,
Brian

Brian Smith

Sep 4, 2008, 3:57:47 PM

to mod...@googlegroups.com

William Dode wrote:
> On 03-09-2008, Mic Pringle wrote:
> > I'm looking to use mod_wsgi to implement a shared hosting
> > scheme for Django and was wondering if it was possible to
> > set a limit/cap on the memory used by each daemon child
> > process, so that if one Django app starts sucking up all
> > available memory it won't affect any other sites hosted
> > or the Linux box itself ?
>
> One solution is to monitor the process with an external
> script using the output of ps. It will be more flexible, for
> example you could send a mail to the webmaster before killing
> the process.

I am very interested in this problem too. I've looked at it on and off for a
long time and it is really difficult. William's suggestion is probably the
best way to do it. Almost any kind of memory-limiting scheme is going to be
something that needs manual intervention by the sysadmin because memory use
patterns for applications differ wildly.

You have to be careful with how you count memory. If you sum the total RSS
of each process then you will be double-counting memory that is being shared
between processes. If you only sum total private RSS then you can end up
with situations where a set of applications is over the limit, but no single
process is "large" because a huge amount of mmaped/shared memory is being
used.
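The difference between these ways of counting can be inspected on Linux through /proc/<pid>/smaps. The following is a minimal sketch, not something proposed in the thread: it sums total RSS and private RSS for one process, and also the Pss field (proportional set size), which is an addition here and is only present on kernels that report it. The function names are hypothetical.

```python
import re

# Matches smaps lines such as "Rss:                 132 kB".
_FIELD = re.compile(
    r"^(Rss|Pss|Private_Clean|Private_Dirty):\s+(\d+) kB$", re.MULTILINE
)

def summarize_smaps(text):
    """Sum selected fields (in kB) over every mapping in an smaps dump."""
    totals = {"Rss": 0, "Pss": 0, "Private": 0}
    for name, value in _FIELD.findall(text):
        # Private_Clean and Private_Dirty together form the private RSS.
        key = "Private" if name.startswith("Private_") else name
        totals[key] += int(value)
    return totals

def process_memory(pid):
    """Read /proc/<pid>/smaps (Linux only; needs permission to read it)."""
    with open("/proc/%d/smaps" % pid) as handle:
        return summarize_smaps(handle.read())
```

Summing "Rss" across a customer's processes double-counts shared pages, summing "Private" ignores them entirely, and "Pss" splits each shared page among the processes mapping it, so the three totals bracket the cases described above.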

Also, you need to look at the cost of killing the process. If you kill the
process, will the application just restart and reload everything back into
memory? If so, you haven't saved any memory, and you will be killing your
disk buffer cache and your CPUs with all the churn.

Note that, according to the man page [1], setrlimit(RLIMIT_RSS,...) hasn't
worked since Linux 2.4.30.

[1] http://linux.die.net/man/2/setrlimit

Regards,
Brian

Brian Smith

Sep 4, 2008, 4:07:57 PM

to mod...@googlegroups.com

What operating system are you using? Solaris 10 (with Zones/Containers) is a
lot more capable than Linux in this area. But, even with Zones, getting
memory to be shared fairly between non-cooperative applications is
difficult. Even if you could limit a process/zone to a specific amount of
RSS, you still have to deal with the buffer cache contending with
applications for memory.

AFAIK, if you really need to ensure a memory allocation per customer on the
same system then you need to use Xen, VMware, or similar virtualization
technologies.

- Brian

Graham Dumpleton

Sep 4, 2008, 7:11:29 PM

to mod...@googlegroups.com

2008/9/5 Brian Smith <br...@briansmith.org>:

How true that is depends on what other Apache modules have been loaded.

If the Apache instance is dedicated to Python hosting and you are not
also running stuff like PHP and so have trimmed down the number of
Apache modules in use to the absolute minimum required and perhaps
don't use HTTPS, then you have eliminated most of the unknowns as far
as what has been inherited into your process space.

What this leaves you with at present is inherited log file descriptors
and the apache scoreboard. Most other stuff inherited from Apache
parent is probably pretty innocuous.

Closing off log file descriptors in daemon processes for virtual hosts
other than that which the daemon is for can eliminate that issue,
leaving only the scoreboard. All that needs is a bit of homework as to
whether it is safe to close the scoreboard off in daemon processes. This
may only be problematic because mod_wsgi still uses some of the
Apache output filters in daemon processes, and thus low-level Apache
functions. If they try to access the scoreboard, there may be an issue
if it is closed off.

One could also try and improve security further through measures
borrowed from suEXEC such as:

http://code.google.com/p/modwsgi/issues/detail?id=96

So, at this stage I don't believe it is a totally lost cause; it just
needs further auditing and a list of Apache modules drafted up which
could be used at the same time without causing an undue security risk.

Graham

Brian Smith

Sep 4, 2008, 11:58:19 PM

to mod...@googlegroups.com

Graham Dumpleton wrote:
> 2008/9/5 Brian Smith <br...@briansmith.org>:

> > You cannot securely do shared hosting for mutually untrusted users
> > with mod_wsgi unless you give each user their own Apache instance.
>

> How true that is depends on what other Apache modules have
> been loaded.
>
> If the Apache instance is dedicated to Python hosting and you
> are not also running stuff like PHP and so have trimmed down
> the number of Apache modules in use to the absolute minimum
> required and perhaps don't use HTTPS, then you have
> eliminated most of the unknowns as far as what has been
> inherited into your process space.

> What this leaves you with at present is inherited log file
> descriptors and the apache scoreboard. Most other stuff
> inherited from Apache parent is probably pretty innocuous.

As far as I understand, if the Apache child process has done anything with
any sensitive data (e.g. password files, authorization headers), or it has
processed any sensitive requests (whether or not they use HTTPS), then
remnants of that sensitive data will be available to every mod_wsgi
application that runs in that child process (in embedded mode) or that is
forked from that child process (in daemon mode). Using ctypes, an
application could constantly dump its heap to a file, and then grep the
dumped files for "Authorization:", "password", etc. I don't know offhand if
a mod_wsgi application can create core dumps programmatically, but if so
then the process is even easier.

> So, at this stage I don't believe it is a totally lost cause,
> it just needs further auditing and a list of Apache modules
> drafted up which could be used at the same time and not cause
> an undue security risk.

I never wanted to imply that it was a lost cause. That is why I suggested a
solution (one customer per Apache instance). Another possible solution would
be to create a new mode for mod_wsgi that did a fork+exec like most other
WSGI gateways (mod_fcgid, Phusion Passenger, mod_cgid). The first option
will create a very robust system that is very easy to audit; it has been
used by WebFaction and quite a few other hosts and it seems to work well.
The second option *might* be more memory efficient (probably not) but it
still has a large attack surface for vulnerabilities.

That said, I *do* think it is impossible--even theoretically--to secure the
current embedded and daemon modes for use by multiple uncooperative users.

Regards,
Brian

Graham Dumpleton

Sep 5, 2008, 12:24:35 AM

to mod...@googlegroups.com

2008/9/5 Brian Smith <br...@briansmith.org>:

You misunderstand how mod_wsgi daemon mode works. The daemon processes
are forked from the Apache parent process, not from the Apache child
worker process, and the Apache parent process never handles any actual
requests.

The only risk then is from data left over from initialisation of any
Apache modules, or the core, in the Apache parent process: stuff
like open log files for unrelated virtual hosts, the scoreboard used
to communicate between Apache processes, etc. That is why it is
important to look at ensuring stuff like that is closed off where it
can be.

Besides that, I don't believe there would be anything too bad in core
Apache modules that would be inherited from the Apache parent
process by daemon processes.

Even for authentication modules that connect to backend systems to
work, e.g. LDAP, I would expect that the connection would be created in
the Apache child worker process and not the parent, as each child
worker process would need its own connection. Thus you wouldn't have a
situation where a daemon process would inherit an open file descriptor
for that from the Apache parent process. I would obviously be concerned
where login credentials for the authentication system were stored in
the Apache configuration file, as that could be accessible, but then
most times the Apache configuration files are world readable anyway.

Obviously if using more complicated third party modules, there may be
issues. For example, PHP preloads lots of stuff in the context of the
Apache parent process before any child worker processes or daemon
processes are forked. This is why PHP is such a PITA in respect of
shared library conflicts.

>> So, at this stage I don't believe it is a totally lost cause,
>> it just needs further auditing and a list of Apache modules
>> drafted up which could be used at the same time and not cause
>> an undue security risk.
>
> I never wanted to imply that it was a lost cause. That is why I suggested a
> solution (one customer per Apache instance). Another possible solution would
> be to create a new mode for mod_wsgi that did a fork+exec like most other
> WSGI gateways (mod_fcgid, Phusion Passenger, mod_cgid). The first option
> will create a very robust system that is very easy to audit; it has been
> used by WebFaction and quite a few other hosts and it seems to work well.
> The second option *might* be more memory efficient (probably not) but it
> still has a large attack surface for vulnerabilities.

And I have in mind how a fork/exec model could be added later as
another option, but it would not replace what is there now. The main
reason for the fork/exec approach, though, would be to allow multiple
Python versions to be supported. It would still be very tightly
integrated and not be a FastCGI/SCGI-like system where the interface
is the socket protocol. The mod_wsgi module would still control both
halves, so no backend adapter would be required.

> That said, I *do* think it is impossible--even theoretically--to secure the
> current embedded and daemon modes for use by multiple uncooperative users.

Embedded mode, definitely agree. At the moment you seem to have a
misunderstanding about how daemon mode processes are set up, so let's
see on that point.

Graham

Mic Pringle

Sep 5, 2008, 4:26:40 AM

to mod...@googlegroups.com

Looks like I've touched on quite an interesting subject :-)

>Also, you need to look at the cost of killing the process. If you kill the
>process, will the application just restart and reload everything back into
>memory? If so, you haven't saved any memory, and you will be killing your
>disk buffer cache and your CPUs with all the churn.

(Well written) Django applications in particular should use a pretty
much consistent amount of memory, usually well under 60 MB. The reason
I'm looking to cap memory usage is for poorly written/tested apps
where they may fall into such things as infinite loops, poorly written
database access etc. It would be unfair if these applications were to
impact others. Killing and reloading the application would reduce
memory consumption in the short term, up until the code responsible
for the increase in memory usage is executed again. Then, perhaps
using something like Monit, you could kill the process a set number of
times before not restarting altogether and contacting the user. This
way you would limit churn.
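The "kill a set number of times, then stop restarting" policy can be sketched independently of any particular monitoring tool. This is only an illustration under assumed names and thresholds (RestartBudget, three kills per hour); it is not how Monit or mod_wsgi implement anything.

```python
import time

class RestartBudget:
    """Allow at most `max_kills` kills per `window` seconds for one application."""

    def __init__(self, max_kills=3, window=3600.0, clock=time.time):
        self.max_kills = max_kills
        self.window = window
        self.clock = clock
        self.kills = []

    def record_kill(self):
        """Note a kill; return True if the app may restart, False to suspend it."""
        now = self.clock()
        # Forget kills that have aged out of the window.
        self.kills = [t for t in self.kills if now - t < self.window]
        self.kills.append(now)
        return len(self.kills) <= self.max_kills
```

A monitoring script would call record_kill() each time it terminates a customer's process and, on a False result, disable the application and notify its owner instead of letting it restart again.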

>AFAIK, If you really need to ensure a memory allocation per customer on the
>same system then you need to use Xen, VMWare, or similar virtualization
>technologies.

I am looking to use Amazon's EC2 for this project, especially now they
have introduced persistent storage, so that means using any type of
virtualization software is a no go.

Also, allowing each user to have their own Apache instance, IMHO, is
not the best way to do shared hosting, in particular for Django. If
there is a way to be able to restart applications without requiring a
restart of Apache then that solution needs to be investigated
exhaustively before considering anything else. Also, because I am
solely concentrating on Django hosting at this time, I would be using
an extremely stripped back, lightweight Apache instance, no PHP, no
overhead of unused modules, only the bare essentials along with
mod_wsgi would be available. So that should cut out some of the
security worries mentioned.

I shall look forward to see what else this thread brings :-)

-Mic

Graham Dumpleton

Sep 5, 2008, 5:40:20 AM

to mod...@googlegroups.com

2008/9/5 Mic Pringle <micpr...@gmail.com>:

>
> Looks like I've touched on quite an interesting subject :-)
>
>>Also, you need to look at the cost of killing the process. If you kill the
>>process, will the application just restart and reload everything back into
>>memory? If so, you haven't saved any memory, and you will be killing your
>>disk buffer cache and your CPUs with all the churn.
>
> (Well written) Django applications in particular should use a pretty
> much consistent amount of memory, usually well under 60 MB. The reason
> I'm looking to cap memory usage is for poorly written/tested apps
> where they may fall into such things as infinite loops, poorly written
> database access etc. It would be unfair if these applications were to
> impact others. Killing and reloading the application would reduce
> memory consumption in the short term, up until the code responsible
> for the increase in memory usage is executed again. Then, perhaps
> using something like Monit, you could kill the process a set number of
> times before not restarting altogether and contacting the user. This
> way you would limit churn.

When using mod_wsgi daemon mode you don't need monit as mod_wsgi
itself does the process monitoring and will restart processes when
they die. Thus, if using external monitoring scripts to look at memory
usage, they just need to send a SIGTERM signal to the daemon process
and it will perform a shutdown. The mod_wsgi monitoring code in Apache
parent process will see it die and restart it.

In order to make it easier to identify mod_wsgi daemon processes, use
the display-name option to WSGIDaemonProcess. That way you can
uniquely name processes in the group specific to a user as they appear
in 'ps' output.
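An external monitoring script along these lines might look like the sketch below. It assumes that the display-name value appears in full as the command column of 'ps' (a long name may be truncated, in which case the exact match would need adjusting), that the script runs with enough privilege to signal the daemon processes, and that the names used here (parse_ps, over_limit, check_once, "wsgi-customer-1") are purely illustrative.

```python
import os
import signal
import subprocess

def parse_ps(output):
    """Parse headerless `ps` output into (pid, rss_kb, command) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), int(parts[1]), parts[2]))
    return rows

def over_limit(rows, display_name, limit_kb):
    """Return PIDs whose command equals display_name and whose RSS exceeds limit_kb."""
    return [pid for pid, rss, cmd in rows
            if cmd.strip() == display_name and rss > limit_kb]

def check_once(display_name, limit_kb):
    """Send SIGTERM to matching processes that are over the RSS limit."""
    raw = subprocess.check_output(
        ["ps", "-e", "-o", "pid=", "-o", "rss=", "-o", "args="]
    )
    rows = parse_ps(raw.decode("utf-8", "replace"))
    for pid in over_limit(rows, display_name, limit_kb):
        # The daemon shuts down on SIGTERM and mod_wsgi starts a replacement.
        os.kill(pid, signal.SIGTERM)
```

Run from cron or a loop, check_once("wsgi-customer-1", 60 * 1024) would recycle any process in that group whose RSS has grown past 60 MB. This is also the natural place to send a mail to the webmaster before the kill.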

>>AFAIK, If you really need to ensure a memory allocation per customer on the
>>same system then you need to use Xen, VMWare, or similar virtualization
>>technologies.
>
> I am looking to use Amazon's EC2 for this project, especially now they
> have introduced persistent storage, so that means using any type of
> virtualization software is a no go.
>
> Also, allowing each user to have their own Apache instance, IMHO, is
> not the best way to do shared hosting, in particular for Django. If
> there is a way to be able to restart applications without requiring a
> restart of Apache then that solution needs to be investigated
> exhaustively before considering anything else.

mod_wsgi daemon mode defaults to process reloading when the WSGI
script file is changed. Thus, without needing to restart the whole of
Apache, users can restart just the daemon processes for their own
application by touching the WSGI application script file.

For details see:

http://code.google.com/p/modwsgi/wiki/ReloadingSourceCode
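For users without shell access to 'touch', the same effect can be had from a deployment script. A minimal sketch, in which the function name and the example path are hypothetical:

```python
import os

def touch_wsgi_script(path):
    """Update the file's timestamps so mod_wsgi sees the script as changed."""
    # Passing None sets both access and modification time to the current time.
    os.utime(path, None)
```

Calling touch_wsgi_script("/some/path/customer-1/app.wsgi") after deploying new code should then cause the daemon processes for that application to be reloaded, as described on the wiki page above.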

Graham

Brian Smith

Sep 5, 2008, 11:09:45 AM

to mod...@googlegroups.com

Graham Dumpleton wrote:
> 2008/9/5 Brian Smith <br...@briansmith.org>:

> >> What this leaves you with at present is inherited log file
> >> descriptors and the apache scoreboard. Most other stuff inherited
> >> from Apache parent is probably pretty innocuous.

> > As far as I understand, if the Apache child process has
> > done anything with any sensitive data (e.g. password files,
> > authorization headers), or it has processed any sensitive
> > requests (whether or not they use HTTPS), then remnants
> > of that sensitive data will be available to every
> > mod_wsgi application that runs in that child process (in
> > embedded mode) or that is forked from that child process (in daemon
> > mode).

> You misunderstand how mod_wsgi daemon mode works. The daemon
> processes are forked from the Apache parent process not from
> the Apache child worker process and the Apache parent process
> never handles any actual requests.

Thank you for pointing out my mistake. I admit that I found the code that
deals with this hard to follow (mostly because I am not very knowledgeable
about Apache's API). The key is to realize that wsgi_hook_init is only
executed in the Apache parent process (apparently--this isn't called out
explicitly in the documentation), and that it calls wsgi_start_daemons.
wsgi_start_daemons calls wsgi_start_process to do the forking, which happens
in the parent process. The only other place wsgi_start_process is called is
in wsgi_manage_process; however, wsgi_manage_process can only be called in
response to a signal from the forked child process, so it too is only
executed in the parent process.

Now I understand why issue #33 (transient daemon processes) is tricky--you
need the parent process to fork a process in response to a request that is
being handled by a child process.

> > That said, I *do* think it is impossible--even theoretically--to
> > secure the current embedded and daemon modes for use by
> > multiple uncooperative users.
>
> Embedded mode, definitely agree. At the moment you seem to
> have a misunderstanding about how daemon mode processes are
> set up, so let's see on that point.

I agree that the daemon mode might not be impossible. And, I also agree that
mod_wsgi should close unnecessary file descriptors and do anything else it
can do to improve security of the daemon (and embedded) modes.

That said, even just the core of Apache (without any modules) is a lot to
audit. If someone is going to strip Apache down to the minimum number of
modules then the cost of running multiple separate Apache instances is going
to be very low--about the same as running one Apache instance. A front-end
proxy would be needed, but the cost of that proxy would be offset by the
ability to safely use embedded mode instead of daemon mode in the back-end
Apache servers--effectively, you are just replacing one proxy with another.
Again, that seems to be what WebFaction does.

Again, thank you for correcting my misunderstanding on the process
management.

Regards,
Brian

William Dode

Sep 20, 2008, 12:45:27 PM

to mod...@googlegroups.com

On 03-09-2008, Graham Dumpleton wrote:
>
> 2008/9/4 Mic Pringle <micpr...@gmail.com>:
>>
>> Hi,
>>
>> I'm looking to use mod_wsgi to implement a shared hosting scheme for
>> Django and was wondering if it was possible to set a limit/cap on the
>> memory used by each daemon child process, so that if one Django app
>> starts sucking up all available memory it won't affect any other sites
>> hosted or the Linux box itself ?
>>
>> This would also allow me to advertise a guaranteed memory allocation
>> for each hosted application.
>
> This issue exists to look at adding options for setting both CPU and
> memory limits.
>
> http://code.google.com/p/modwsgi/issues/detail?id=21
>
> I haven't actually tested it, but in the interim the following may work for you:
>
> WSGIDaemonProcess customer-1 <other options>
> WSGIImportScript /some/path/setlimits.wsgi process-group=customer-1
> application-group=%{GLOBAL}
>
> The script /some/path/setlimits.wsgi would then contain:
>
> import resource
>
> RSSMAX = 60*1024*1024
> resource.setrlimit(resource. RLIMIT_RSS, (RSSMAX, RSSMAX))

I just tried it, and now I remember that on Linux RLIMIT_RSS and
RLIMIT_DATA don't work; only RLIMIT_AS works.

http://www.haypocalc.com/wiki/Mémoire
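Given that finding, the setlimits.wsgi script quoted above could be adapted to use RLIMIT_AS. The following is a minimal, untested-in-mod_wsgi sketch; the helper names are hypothetical, and it clamps the request to the existing hard limit because an unprivileged process cannot raise it.

```python
import resource

def clamped_limits(requested, current_hard):
    """Return a (soft, hard) pair that never exceeds the existing hard limit."""
    if current_hard != resource.RLIM_INFINITY:
        requested = min(requested, current_hard)
    return (requested, requested)

def limit_address_space(max_bytes):
    """Cap this process's virtual address space (RLIMIT_AS) at max_bytes."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    limits = clamped_limits(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, limits)
    return limits
```

In setlimits.wsgi one would then call limit_address_space() at module level with a per-customer value. Note that RLIMIT_AS counts virtual address space, including mapped shared libraries, rather than resident memory, so the figure usually has to be set well above the intended RSS; the 60 MB used in the original example may well be too small for a Python process under this limit.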
