Or rather - how to keep the server from blowing up. I've searched the
web before, but nothing I've found solves the problem.
Some background info: I'm running worldoflogs.com, a site that gets
around 100 concurrent requests during rush hour and is still growing
rather rapidly. Django powers the frontend and custom Java code does
the number crunching on the backend; Django runs under mod_wsgi.
We ran into trouble this week when all WSGI workers were busy serving
requests, Apache started queueing requests, and page load times bounced
between 0 and 20s. Easy fix: increase the number of processes, until you
run out of RAM. ps showed that each Python process took at least 100M,
half of them 150M+, so getting past 60 processes was a no-go.
I used Dozer to see if anything leaked between requests: nope. Okay...
no objects stay alive, yet memory usage rises after serving requests.
Running JMeter to fire an endless stream of requests at Apache raised
memory use from ~30M after 1 request to 200M+ after 250; I stopped the
benchmark there.
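Dozer inspects live objects; to watch the process footprint itself you can sample RSS between requests. A minimal stdlib sketch of what I mean (the `peak_rss` helper is illustrative, not part of Dozer; note `ru_maxrss` is kilobytes on Linux but bytes on macOS):

```python
import resource

def peak_rss():
    # Peak resident set size of this process so far: a monotonic
    # high-water mark, so compare samples taken across requests.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
junk = [object() for _ in range(200000)]  # stand-in for request allocations
after = peak_rss()
print(before, after)  # 'after' only ever grows as the footprint climbs
```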
Maybe Python doesn't free up memory from its heap? I know that Java
with the default GC options does that. Bingo. The following 4 lines
solved it for us:
import gc
class GcMiddleware(object):
    def process_request(self, request):
        gc.collect()
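For context on why a collect per request helps: CPython frees most objects immediately by reference counting, but anything caught in a reference cycle waits for the generational collector. A quick demonstration (class name is illustrative):

```python
import gc

class Node(object):
    def __init__(self):
        self.ref = self  # self-cycle: refcounting alone can never free this

gc.disable()             # mimic the long stretches between automatic runs
for _ in range(1000):
    Node()               # each instance becomes unreachable cyclic garbage
freed = gc.collect()     # explicit full collection runs even while disabled
gc.enable()
print(freed)             # counts the instances plus their __dict__s
```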
Yeah. It was that simple. Memory usage went from insane to a stable
50M, even after thousands of requests to a single worker. If you're
running out of RAM but have CPU cycles to spare, do a full GC before
every request. We had 85% idle time on the CPU, but RAM utilization
was at 80% and I don't dare raise the limits further; running into
swap kills the server instantly.
It's silly how much attention GC gets on Java and almost none on
Python, especially since on a server memory tends to be the problem
under load - if you go the multiprocess way instead of using threads.
That's the main reason we use Java on the backend: threads. This is
not Django's fault; it's just that Python tries to minimize GC time -
what's good for one app is poison for another, and Python's default GC
behavior is quite evil in this case.
This "solution" is quite crude, but tuning the garbage collector with
set_threshold is a pain in the backside. What I would like to see is
a simple collector, like Java's young generation: if the heap is full,
collect; if free memory after collection is <= min_free or >= max_free,
resize the heap to follow it.
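For what it's worth, the only knobs CPython does expose live in the gc module; the numbers below are illustrative, not a recommendation:

```python
import gc

# Defaults: a gen-0 scan every 700 net allocations; every 10th gen-0
# scan also scans gen 1; every 10th gen-1 scan is a full gen-2 pass.
print(gc.get_threshold())

# Crude tuning for a request-serving process: raise the gen-0 trigger
# so short-lived request objects die by refcounting instead of being
# promoted into the older generations.
gc.set_threshold(100000, 10, 10)
print(gc.get_threshold())
```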
With the thresholds at the default 700/10/10, we run into minor
collections all the time, promoting objects quickly from gen 0 to 1
to 2 and then requiring a full collection to get them out of there.
Getting the sizes right is impossible without an equivalent of
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps; objects
either get promoted too quickly or never, making any collection as
expensive as a full GC.
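The closest CPython comes to those JVM flags is the gc module's debug output and counters - no timestamps, and it goes to stderr:

```python
import gc

gc.set_debug(gc.DEBUG_STATS)  # log every collection to stderr
gc.collect()                  # emits "gc: collecting generation 2..." etc.
gc.set_debug(0)               # switch the logging back off

# Pending-object counts per generation, for polling instead of logging:
print(gc.get_count())         # e.g. (12, 0, 0)
```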
If someone has an idea how to keep memory usage at about the same
level with a lower CPU cost than a full GC on every request, please tell.