Something is eating memory. How to track it down?


AndyB

Apr 7, 2009, 5:24:40 PM
to Django users
I've got a Django app that seems to eat up a lot of memory. I posted a
message on Stack Overflow and it got a little sidetracked into a
debate about the merits of WSGI and Apache in Worker MPM mode.

Webfaction have assured me that playing around with that kind of thing
is not going to make a significant difference and that my app is to
blame.

Googling around Python and memory profiling led me to heapy, but the
documentation for that is written in ancient Sumerian. Even this very
helpful page (http://blog.redinnovation.com/2008/03/07/debugging-django-memory-leak-with-trackrefs-and-guppy)
wasn't quite simple enough for my meagre understanding of Python
internals.

I am hoping for some way to list all the objects in memory. I'm sure I
can do this with heapy, but I don't really understand what it is
telling me.

Should I persevere or is there something a bit easier to understand
that I could be using?

Maybe I am just too dense for the task at hand :-(

Maybe it's getting late and I deserve a nice gin and tonic...
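One low-tech way to get a rough "list of all the objects in memory" without heapy is the standard library's gc module. The sketch below is a swapped-in alternative, not something proposed in this thread, and the function names are hypothetical: it counts live objects by type name so that two snapshots (say, before and after a suspect request) can be compared. Note that gc.get_objects() only returns objects the garbage collector tracks (containers such as lists, dicts and class instances), so plain strings and numbers will not show up. Written for Python 3.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, GC-tracked objects by type name."""
    gc.collect()  # clear unreachable cycles first so they don't inflate the counts
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=10):
    """Return the types whose instance counts grew the most between two snapshots."""
    # Counter subtraction keeps only positive differences, i.e. types that grew.
    return (after - before).most_common(limit)


if __name__ == "__main__":
    before = type_counts()
    suspect = [{"row": i} for i in range(10_000)]  # stand-in for a leaky request
    after = type_counts()
    for name, extra in growth(before, after):
        print(f"{name:20} +{extra}")
```

A type whose count keeps climbing across repeated requests is a much smaller haystack than "everything in memory".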

Christian Joergensen

Apr 7, 2009, 6:02:21 PM
to django...@googlegroups.com
AndyB wrote:
> I've got a Django app that seems to eat up a lot of memory. I posted a
> message on Stack Overflow and it got a little sidetracked into a
> debate about the merits of WSGI and Apache in Worker MPM mode.

First thing: make sure DEBUG is set to False. (With DEBUG on, Django
keeps a record of every SQL query it runs in memory.)

If that's not the problem, let me take a wild shot in the dark: are you
by any chance looping through a large queryset?

I recently had a process hog about 1.8 GB RAM when looping through a
queryset with approx. 350k items as:

for obj in Model.objects.all():
    do_something(obj)

I rewrote it to:

objs = Model.objects.all().values_list("id", flat=True)
for obj_id in objs:
    obj = Model.objects.get(pk=obj_id)
    do_something(obj)

... and my RAM usage was below 30 MB at all times.

> Maybe it's getting late and I deserve a nice gin and tonic...

Cheers,

--
Christian Joergensen
http://www.technobabble.dk
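A middle ground between caching all 350k objects and issuing one query per object is to fetch rows in primary-key order, one fixed-size batch at a time (often called keyset pagination). This is a swapped-in technique, not what Christian did; the helper name and the fetch_batch callback are hypothetical, and the Django adapter in the comment assumes an ordered primary key on the thread's placeholder Model.

```python
def iter_in_batches(fetch_batch, batch_size=1000):
    """Yield rows in primary-key order, holding at most one batch in memory.

    fetch_batch(after_pk, limit) must return up to `limit` rows whose pk is
    greater than `after_pk` (or the first rows when after_pk is None), sorted by pk.
    """
    last_pk = None
    while True:
        batch = fetch_batch(last_pk, batch_size)
        if not batch:
            return
        yield from batch
        if len(batch) < batch_size:
            return  # a short batch means there is nothing left to fetch
        last_pk = batch[-1].pk


# Hypothetical Django adapter for the placeholder `Model` (untested here):
#
# def fetch_model_batch(after_pk, limit):
#     qs = Model.objects.order_by("pk")
#     if after_pk is not None:
#         qs = qs.filter(pk__gt=after_pk)
#     return list(qs[:limit])
#
# for obj in iter_in_batches(fetch_model_batch):
#     do_something(obj)
```

With batch_size=1000, 350k rows would take roughly 350 queries instead of 350k, while only one batch of model instances is alive at any time.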

Alex Gaynor

Apr 7, 2009, 6:04:24 PM
to django...@googlegroups.com
On Tue, Apr 7, 2009 at 6:02 PM, Christian Joergensen <ma...@razor.dk> wrote:

> I rewrote it to:
>
> objs = Model.objects.all().values_list("id", flat=True)
> for obj_id in objs:
>     obj = Model.objects.get(pk=obj_id)
>     do_something(obj)
>
> ... and my RAM usage was below 30 MB at all times.

You also executed 350k SQL queries.  A better idea would be to start with:

for obj in Model.objects.all().iterator():
    do_something(obj)

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero

Christian Joergensen

Apr 7, 2009, 6:16:33 PM
to django...@googlegroups.com
Alex Gaynor wrote:
>> I recently had a process hog about 1.8 GB RAM when looping through a
>> queryset with approx. 350k items as:
>>
>> for obj in Model.objects.all():
>>     do_something(obj)
>>
>> I rewrote it to:
>>
>> objs = Model.objects.all().values_list("id", flat=True)
>> for obj_id in objs:
>>     obj = Model.objects.get(pk=obj_id)
>>     do_something(obj)
>>
>> ... and my RAM usage was below 30 MB at all times.
>>
>>
>
> You also executed 350k SQL queries. A better idea would be to start with:

Well, that in itself shouldn't cause that much RAM usage - just longer
execution time.

My guess, without looking at the queryset implementation, is that it
caches the items it has already passed over.

> for obj in Model.objects.all().iterator():
>     do_something(obj)

Thank you. I didn't know about that function. That is certainly prettier
than my "hack" :)

I assumed that was the default behaviour when iterating. There shouldn't
be any need to cache previous items, as there is (to my knowledge) no
way to retrieve previous items from a Python iterator.

Regards,
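Christian's guess matches Django's documentation: a QuerySet caches its results so that iterating it a second time doesn't hit the database again, while iterator() reads results without that QuerySet-level cache. The toy class below is not Django's implementation; it is a hypothetical stand-in (all names invented for illustration) showing why the cache keeps every row alive while the uncached path does not.

```python
class CachingResults:
    """Toy model of a result set that caches rows the first time it is iterated."""

    def __init__(self, source):
        self._source = source  # callable returning a fresh iterable, like running the query
        self._cache = None
        self.fetch_count = 0  # how many times the "query" ran

    def _fetch(self):
        self.fetch_count += 1
        yield from self._source()

    def __iter__(self):
        if self._cache is None:
            # Every row is stored, so all of them stay in memory as long as this object lives.
            self._cache = list(self._fetch())
        return iter(self._cache)

    def iterator(self):
        # Rows are handed out one at a time and nothing is retained here.
        return self._fetch()
```

The cache buys a free second iteration at the cost of holding every row; the iterator() path gives that up so memory stays flat. Depending on the database adapter, the driver itself may still load the full result set on the client side.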

andybak

Apr 7, 2009, 6:20:00 PM
to Django users
I've got no really huge tables; the entire db is under 6 MB and the
site isn't even public yet, so traffic is minimal. Memory just doesn't
seem to go down (until the process gets restarted by maxchild or by
me killing it).

I have been labouring under the assumption that memory gets freed
eventually and I could ignore the messy details (heck, why do you think
I'm not a C++ programmer?).

It's a bit of a needle in a haystack at this point. Surely I can do
better than blindly optimize things on the off-chance that I find the
root of the problem?

Andy
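Rather than optimising blindly, one option is to record which code path pushes a process's memory high-water mark up. This is a sketch, not something from the thread: peak_rss_kb and report_peak_growth are hypothetical names, it relies on the Unix-only resource module, and because the peak is per process it only attributes growth reliably when the worker handles one request at a time.

```python
import functools
import logging
import resource  # Unix-only standard library module
import sys

logger = logging.getLogger(__name__)


def peak_rss_kb():
    """Return this process's peak resident set size in kilobytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    return peak // 1024 if sys.platform == "darwin" else peak


def report_peak_growth(func):
    """Log whenever a call pushes the process's memory high-water mark up."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        before = peak_rss_kb()
        try:
            return func(*args, **kwargs)
        finally:
            grew_by = peak_rss_kb() - before
            if grew_by > 0:
                logger.warning("%s raised peak RSS by %d KB", func.__name__, grew_by)

    return wrapper
```

Decorating a suspect view function (the PDF one, for example) with @report_peak_growth and watching the log would show which requests set the high-water mark.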

Malcolm Tredinnick

Apr 7, 2009, 8:31:58 PM
to django...@googlegroups.com
On Tue, 2009-04-07 at 15:20 -0700, andybak wrote:
> I've got no really huge tables; the entire db is under 6 MB and the
> site isn't even public yet, so traffic is minimal. Memory just doesn't
> seem to go down (until the process gets restarted by maxchild or by
> me killing it).

That's completely normal behaviour for Unix-like memory management. Once
memory is freed, it is returned to the process for later reuse (so a
subsequent malloc() call by that process will reuse memory that it had
earlier freed). This helps avoid some causes of memory fragmentation and
allows for contiguous large blocks to be malloc-ed by other processes.

It's also precisely why long-running processes are often designed to
have their memory-using portions restart periodically. It's the act of
exiting the process at the system level that returns the memory to the
system pool.

Regards,
Malcolm


Andy Baker

Apr 8, 2009, 3:43:25 AM
to django...@googlegroups.com
Wow! So Python will eat RAM until restarted? That changes the way I am thinking about this problem.

So the trick will be tweaking the number of processes and the number of requests allowed between restarts. I know one item that requests a big chunk of memory is dynamic PDF generation.

So let's say this is the real RAM pig and eats up 80 MB. If I've got 3 processes, that could account for 240 MB on its own, which you say will never get given back? However, it should never exceed this, as once grabbed that RAM will be available to my app.

Even caching won't reduce this, as there is always the possibility of all three processes being hit at the same time with nothing in the cache.

So my memory usage will always statistically converge on:
(Amount of RAM used by most expensive request) x (Maximum simultaneous Django processes)

and nothing I do within Django itself can reduce this.

Does that sound about right?

Andy

Graham Dumpleton

Apr 8, 2009, 6:05:27 AM
to Django users


On Apr 8, 5:43 pm, Andy Baker <andy...@gmail.com> wrote:
> So my memory usage will always statistically converge on:
> (Amount of RAM used by most expensive request) x (Maximum simultaneous
> Django processes)

Technically it can be worse than that. The absolute worst case is when
multithreaded processes are used with a multiprocess web server. In that
case:

(Amount of RAM used by most expensive request) x (Maximum
simultaneous Django processes) x (Maximum simultaneous request threads
per process).

You would have to be unlucky to get that many requests for the same URL
at the same time, but it is technically possible.
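Graham's worst case written out, with M the memory used by the most expensive request, P the number of simultaneous Django processes and T the request threads per process. The first worked line uses Andy's figures from his earlier post; the five-threads line is a hypothetical added only for illustration.

```latex
\begin{aligned}
M_{\text{worst}} &= M_{\text{request}} \times P \times T \\
80\,\text{MB} \times 3 \times 1 &= 240\,\text{MB} \quad \text{(Andy's figures, single-threaded)} \\
80\,\text{MB} \times 3 \times 5 &= 1200\,\text{MB} \quad \text{(hypothetical: 5 threads per process)}
\end{aligned}
```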

> and nothing I do within Django itself can reduce this.
>
> Does that sound about right?

There are a few things one can do.

1. Exec a separate process just for purposes of generating the one
PDF. This way the memory usage is at least transient.

2. Have a backend service, communicated to via something like XML-RPC,
which performs the generation of the PDF, with requests potentially
queued up within it so only generating one PDF at a time.

3. Use Apache/mod_wsgi and use multiple daemon process groups, with
the memory-consuming URLs delegated to a process group of their own,
distinct from all the other URLs in the application. This means that
you can limit how many fat processes there are, and make them single
threaded so that maximum memory equates to what one PDF takes, not
several if there are multiple concurrent requests. You can also set the
process to recycle after fewer requests than other parts of the
application, or when idle for too long. For example, if your Django
application isn't thread safe anyway:

WSGIScriptAlias / /some/path/django.wsgi

WSGIDaemonProcess main processes=10 threads=1
WSGIDaemonProcess memory-hungry processes=1 threads=1 maximum-requests=10 inactivity-timeout=30

WSGIProcessGroup main

<Location /memory/hungry/url>
    WSGIProcessGroup memory-hungry
</Location>

If your application is thread safe, then you could instead have used:

WSGIDaemonProcess main processes=1 threads=10

Graham
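Graham's first option (exec a separate process just for the PDF) could look roughly like the following. It is a sketch under assumptions: run_isolated and the make_pdf.py script are hypothetical, and it uses Python's subprocess module rather than anything Graham specified.

```python
import subprocess
import sys


def run_isolated(*argv, timeout=120):
    """Run a fresh Python interpreter with `argv` and return its standard output.

    Whatever memory the child allocates goes back to the operating system when
    it exits, so the long-lived web server process never grows to hold it.
    """
    completed = subprocess.run(
        [sys.executable, *argv],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
        timeout=timeout,
    )
    return completed.stdout


# Hypothetical usage from a view: make_pdf.py is an invented script that
# writes the PDF to the given path.
# run_isolated("make_pdf.py", "--output", "/tmp/report.pdf")
```

The cost is interpreter start-up on every PDF, which is where Graham's second option (a long-running backend that queues jobs) may fit better if PDFs are requested often.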


Malcolm Tredinnick

Apr 8, 2009, 7:30:37 PM
to django...@googlegroups.com
On Wed, 2009-04-08 at 08:43 +0100, Andy Baker wrote:
> Wow! So Python will eat RAM until restarted? That changes the way I am
> thinking about this problem.

Well, that's not really accurate, as you realise further down. The
maximum amount of RAM used will not decrease. However, it won't increase
without bound unless you actually require using a larger simultaneous
amount.

[...]


> So my memory usage will always statistically converge on:
> (Amount of RAM used by most expensive request) x (Maximum simultaneous
> Django processes)

Provided the most expensive request is likely to occur frequently. In
many cases, the most expensive request could well be an outlier that
only happens infrequently. Statistically infrequent events are a smaller
concern, since they'll happen less than once, on average, between
process restarts.

>
> and nothing I do within Django itself can reduce this.

That's correct. It's not a really bad thing since, as I mentioned, it's
completely normal Unix process behaviour. And web servers already account
for that with settings like the maximum number of requests per child.
It's also sometimes one of the arguments for using multiple threads
instead of multiple processes (the memory allocator works at per-process
granularity, so memory freed by one thread can be reused by other
threads in the same process).

Definitely something to take into consideration, but it's manageable.

Regards,
Malcolm
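Malcolm's point about infrequent expensive requests can be made concrete with a little arithmetic. Assuming (a simplification) that each request is independently the expensive one with probability p, and that a process is recycled after max_requests requests (MaxRequestsPerChild in Apache, maximum-requests in mod_wsgi), the expected number of expensive requests per process lifetime is p times max_requests. The function name and the example numbers are hypothetical.

```python
def restart_window_risk(p, max_requests):
    """Return (expected count, probability of at least one) for a rare request
    between process restarts, assuming each request is independently the
    expensive one with probability p."""
    expected = p * max_requests
    at_least_once = 1 - (1 - p) ** max_requests
    return expected, at_least_once


expected, chance = restart_window_risk(p=0.001, max_requests=500)
print(f"expected {expected:.2f} per process lifetime, P(at least one) = {chance:.1%}")
```

With these made-up numbers the expected count is 0.5 (less than once, as Malcolm says), yet roughly 39% of process lifetimes would still see at least one expensive request, so some workers will still reach the high-water mark.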

andybak

Apr 9, 2009, 8:47:35 AM
to Django users
Thanks everyone. I might offload some big jobs into separate
processes. The background knowledge I've acquired from this thread is
going to help a lot.

I still have no flipping idea how to use heapy though!
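For anyone reading this on a current Python 3, the standard library's tracemalloc module (added in Python 3.4, long after this thread) answers "what is allocating memory?" more directly than heapy. The helper below is a hypothetical sketch, not part of heapy or Django: it snapshots allocations before and after one call and reports which source lines grew the most.

```python
import tracemalloc


def top_allocation_growth(func, *args, limit=5, **kwargs):
    """Call func and report the source lines whose allocations changed the most.

    Returns (result, stats) where stats are tracemalloc.StatisticDiff objects,
    sorted by the absolute size of the change, largest first.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func(*args, **kwargs)
        after = tracemalloc.take_snapshot()
    finally:
        if started_here:
            tracemalloc.stop()  # only stop tracing if this helper started it
    stats = after.compare_to(before, "lineno")  # group differences by file and line
    return result, stats[:limit]


if __name__ == "__main__":
    def build_rows():
        return [{"id": i, "name": str(i)} for i in range(50_000)]

    rows, top = top_allocation_growth(build_rows)
    for stat in top:
        print(stat)
```

Wrapping the PDF-generating function this way in a development shell, for instance, would point at the specific lines responsible for most of its memory.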
