Dangers of excess memory usage in multithreaded systems.

Graham Dumpleton

Mar 10, 2008, 10:03:24 PM

to Trac Development

I know there is a separate thread about memory usage at the moment,
but the core of that seems to be looking at memory leaks.

Independent of that, I thought I would raise a general problem which,
from what I have seen, is all but ignored by Python web developers:
the memory usage implications of multithreading, as would be the case
under Apache worker MPM when using mod_python or mod_wsgi, with
fastcgi, and perhaps even with tracd (if it runs multithreaded).

Note that I am not talking here about the base per-thread memory
requirement for each thread stack, much as some seem to want to
blame that at times, but about the implications of concurrent threads
executing code within an application at the same time.

To explain, I include a quote from a message in which I discussed
this on the Python WEB-SIG.

"""The one area where memory usage can be a problem with Python web
applications and which is not necessarily understood well by most
people, is the risk of concurrent requests causing a sudden burst in
memory usage. Imagine a specific URL which needs a large amount of
transient memory, for example something which is generating PDFs using
reportlab and PIL. All is okay if the URL only gets hit by one request
at a time, but if multiple requests hit at the same time, then your
memory blows out considerably as each request needs the large amount
of transient memory at the same time and once allocated it will be
retained by the process.

So, if one was using worker MPM to keep down the number of overall
processes and memory usage, you run the risk of this sort of problem
occurring. One could stop it occurring by implementing throttling in
the application, that is, by putting locks on the specific URLs which
consume lots of transient memory so as to restrict the number of
concurrent requests, but frankly I have never heard of anyone actually
doing it.

The alternative is to use prefork MPM, or similar model, such that
there can only be one active request in the process at a time. But
then you need more processes to handle the same number of requests, so
overall memory usage is high again. For large sites however, which can
afford lots of memory, using prefork would be the better way to go as
it will at least limit the possibilities of individual processes
spiking memory usage unexpectedly, with memory usage being more
predictable."""

In the other thread on memory usage, there was the comment:

"""For what it's worth /query and /timeline also have large memory
footprints for larger result sets. """

It would seem that these operations in particular might be prone to
causing blowouts in memory if hit concurrently by multiple requests.
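
To make the throttling idea from the quote concrete, here is a minimal
sketch of a WSGI middleware that caps the number of requests allowed
to run concurrently under chosen URL prefixes. The class name
ThrottleMiddleware, its parameters, and the assumption that PATH_INFO
begins with /query or /timeline are all hypothetical choices for
illustration; this is not something Trac provides.

```python
import threading


class ThrottleMiddleware:
    """Hypothetical WSGI middleware: cap concurrent requests per URL prefix."""

    def __init__(self, app, prefixes, limit=1):
        self.app = app
        self.prefixes = tuple(prefixes)
        # At most `limit` throttled requests may run at the same time.
        self.semaphore = threading.BoundedSemaphore(limit)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(self.prefixes):
            # Cheap URLs bypass the throttle entirely.
            return self.app(environ, start_response)
        return self._throttled(environ, start_response)

    def _throttled(self, environ, start_response):
        # Generator: the slot is taken when the server starts iterating
        # and released once the body is exhausted or the server closes it.
        with self.semaphore:
            result = self.app(environ, start_response)
            try:
                yield from result
            finally:
                if hasattr(result, "close"):
                    result.close()
```

Wrapping would look something like
ThrottleMiddleware(application, ["/query", "/timeline"], limit=1).
Note that requests waiting for a slot still tie up a worker thread, so
this trades peak memory for latency on those URLs.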

Anyway, I know that this perhaps has nothing to do with the memory
leakage problems that people are trying to battle, but it may in
general be an issue for people trying to run any Python web
application in a memory constrained environment, whereby they are
using multithreading to keep the number of processes down.

I will be most interested in people's comments, and in whether you
see this as relevant to Trac or are already doing anything to counter
this sort of problem.

Graham
