Or rather - how to keep the server from blowing up. I've searched the
web before, but nothing I've found solves the problem.
Some background info: I'm running worldoflogs.com, a site that gets
around 100 concurrent requests during rush hour and is still growing
rather rapidly. Django powers the frontend and custom Java code does
the number crunching on the backend; Django runs under mod_wsgi.
We ran into trouble this week when all WSGI workers were busy serving
requests, Apache started queueing requests, and page load times bounced
between 0 and 20s. Easy fix: increase the number of processes, until you
run out of RAM. ps showed that each Python process took at least 100M,
half of them 150M+, so getting past 60 processes was a no-go.
I used Dozer to see if anything leaked between requests: nope. Okay...
no objects stay alive, yet memory usage rises after serving requests.
Running JMeter to fire an endless stream of requests at Apache raised
memory use from ~30M after 1 request to 200M+ after 250; I stopped the
benchmark there.
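Dozer inspects live objects; to watch the process footprint itself you can sample RSS between requests. A minimal stdlib sketch of what I mean (the `peak_rss` helper is illustrative, not part of Dozer; note `ru_maxrss` is kilobytes on Linux but bytes on macOS):

```python
import resource

def peak_rss():
    # Peak resident set size of this process so far: a monotonic
    # high-water mark, so compare samples taken across requests.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
junk = [object() for _ in range(200000)]  # stand-in for request allocations
after = peak_rss()
print(before, after)  # 'after' only ever grows as the footprint climbs
```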
Maybe Python doesn't free up memory from its heap? I know that Java
with the default GC options does that. Bingo. The following 4 lines
solved it for us:
import gc
class GcMiddleware(object):
    def process_request(self, request):
        gc.collect()
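For context on why a collect per request helps: CPython frees most objects immediately by reference counting, but anything caught in a reference cycle waits for the generational collector. A quick demonstration (class name is illustrative):

```python
import gc

class Node(object):
    def __init__(self):
        self.ref = self  # self-cycle: refcounting alone can never free this

gc.disable()             # mimic the long stretches between automatic runs
for _ in range(1000):
    Node()               # each instance becomes unreachable cyclic garbage
freed = gc.collect()     # explicit full collection runs even while disabled
gc.enable()
print(freed)             # counts the instances plus their __dict__s
```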
Yeah. It was that simple. Memory usage went from insane to a stable
50M, even after thousands of requests to a single worker. If you're
running out of RAM but have CPU cycles to spare, do a full GC before
every request. We had 85% idle time on the CPU, but RAM utilization
was at 80% and I don't dare raise the limits further; running into
swap kills the server instantly.
It's silly how much attention GC gets on Java and almost none on
Python, especially since on a server memory tends to be the problem
under load - if you go the multiprocess way instead of using threads.
That's the main reason we use Java on the backend: threads. This is
not Django's fault; it's just that Python tries to minimize GC time -
what's good for one app is poison for another, and Python's default GC
behavior is quite evil in this case.
This "solution" is quite crude, but tuning the garbage collector with
set_threshold is a pain in the backside. What I would like to see is
a simple collector, like Java's young generation: if the heap is full,
collect; if free memory after collection is <= min_free or >= max_free,
resize the heap to follow it.
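For what it's worth, the only knobs CPython does expose live in the gc module; the numbers below are illustrative, not a recommendation:

```python
import gc

# Defaults: a gen-0 scan every 700 net allocations; every 10th gen-0
# scan also scans gen 1; every 10th gen-1 scan is a full gen-2 pass.
print(gc.get_threshold())

# Crude tuning for a request-serving process: raise the gen-0 trigger
# so short-lived request objects die by refcounting instead of being
# promoted into the older generations.
gc.set_threshold(100000, 10, 10)
print(gc.get_threshold())
```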
With the thresholds at the default 700/10/10, we run into minor
collections all the time, promoting objects quickly from gen 0 to 1
to 2 and then requiring a full collection to get them out of there.
Getting the sizes right is impossible without an equivalent of
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps; objects
either get promoted too quickly or never, making any collection as
expensive as a full GC.
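The closest CPython comes to those JVM flags is the gc module's debug output and counters - no timestamps, and it goes to stderr:

```python
import gc

gc.set_debug(gc.DEBUG_STATS)  # log every collection to stderr
gc.collect()                  # emits "gc: collecting generation 2..." etc.
gc.set_debug(0)               # switch the logging back off

# Pending-object counts per generation, for polling instead of logging:
print(gc.get_count())         # e.g. (12, 0, 0)
```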
If someone has an idea how to keep memory usage at about the same
level with a lower CPU cost than a full GC on every request, please tell.