Tracking Memory Leaks

John Mitchell

May 28, 1998


Does anyone have any tips for tracking down memory leaks? I have a
long-running application (an ad server) which gains 40M in a few hours
under moderate load.

Currently, I'm using three tools: 1) coffee (yum!), 2) a wrapper around
sys.getrefcount(obj) [enclosed], and 3) eatmemory(), which "serves" a few
hundred ads, then exits.

Does Python have an equivalent of Java's java.lang.Runtime.freeMemory()
call? More importantly, I wish Java had getrefcount() !


>>> dumpmem()
1255  __doc__             None
736   AdRequest           <class AdRequest at 1009cee8>
28    sys                 <module 'sys'>
27    sys                 <module 'sys'>
20    string              <module 'string'>
18    x                   AdRequest: {}
17    os                  <module 'os'>
15    time                <module 'time'>
14    mysite              <module 'mysite'>
14    HardServer          <class HardServer at 100a5df0>
13    _console            <StringIO instance at 1005de50>
13    __builtins__        <module '__builtin__'>
13    AdServer            <function AdServer at 100a0a18>
12    __name__            '__main__'
7     safe_proto          <function safe_proto at 101843f0>
6     info                n/a
6     index_html          <function safe_proto at 101843f0>
6     dumpmem             <function dumpmem at 1017cef8>
4     test2               <function test2 at 1017ced0>
4     test                <function test at 1017cf48>
4     standard_html       <function standard_html at 1017d010>
4     simple_html         <function simple_html at 1017cfe8>
4     show_all            <function show_all at 1017cf70>
4     show_adgroup_text   <function show_adgroup_text at 10184350>
4     show_adgroup_raw    <function show_adgroup_raw at 10184328>
4     show_adgroup_html   <function show_adgroup_html at 10184378>
4     req_uninstall       <function req_uninstall at 1017d088>
4     req_install         <function req_install at 1017d060>
4     proto_ad            <function proto_ad at 101843c8>
4     open_console        <function open_console at 101843a0>
4     main                <function main at 1017cf98>
4     fmt_times           <function fmt_times at 1017cfc0>
4     eatmemory           <function eatmemory at 1017cf20>
4     all_fmt             '\012<TR>\012<TD colspan=2 bgcolor="#c0c
4     ad_html             <function ad_html at 1017d038>


This dump shows that None is being used a lot, o'course, but also I have
700+ instances of AdRequest, when there should only be, oh, < 10. A leak!

Here's dumpmem() and my version of eatmemory():


def dumpmem():
    import sys
    info = []
    for k,v in vars().items() + globals().items():
        info.append( (sys.getrefcount(v), k, v) )
    info.sort() ; info.reverse()
    for count,varname,value in info:
        if `type(value)` in (
            "<type 'instance'>", "<type 'module'>",
            "<type 'class'>", "<type 'None'>", "<type 'function'>",
            "<type 'string'>"
            ):
            valuestr = repr(value)[:40]
        else:
            valuestr = "n/a"
        print '%d\t%-25s %s' % (count, varname, valuestr)

def eatmemory(count=100):
    for i in range(count):
        proto_ad() # <-- this is my main function
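
For anyone trying the same trick on a current interpreter, here is a rough Python 3 sketch of the same idea. The name refcount_table and its namespace argument are illustrative and not part of the code above. It returns rows instead of printing them, and every count includes the temporary references made by the call itself.

```python
import sys

def refcount_table(namespace, width=40):
    """Return (refcount, name, short repr) rows, highest refcount first."""
    rows = []
    # Copy the items first so the namespace cannot change size mid-iteration.
    for name, value in list(namespace.items()):
        rows.append((sys.getrefcount(value), name, repr(value)[:width]))
    # Sort by descending count, then by name for a stable listing.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows

if __name__ == "__main__":
    for count, name, text in refcount_table(globals()):
        print("%d\t%-25s %s" % (count, name, text))
```

Passing globals() gives roughly the module-level half of the dump shown earlier.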

- j


See Also:
sys.getrefcount doc:
http://www.python.org/doc/lib/module-sys.html

Vladimir Marangozov

May 28, 1998

John Mitchell wrote:
>
> Does anyone have any tips for tracking down memory leaks?

If the known (suggested) Python methods don't help, you may wish to
have a look at pymalloc (http://starship.skyport.net/~vlad/pymalloc/)
and see what it can do in such situations. With pymalloc you should
be able to see a limited dump of all unfreed blocks and identify who
allocated them (in the Python core / extensions sources).
Usually this helps in tracking down such problems.

If the problem persists and you don't have a tool like Purify, you could
try a more powerful debugging malloc library, such as those listed at:
http://www.cs.colorado.edu/~zorn/MallocDebug.html

--
Vladimir MARANGOZOV | Vladimir....@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Guido van Rossum

May 28, 1998

> Does anyone have any tips for tracking down memory leaks? I have a
> long-running application (an ad server) which gains 40M in a few hours
> under moderate load.

If you recompile all of Python with Py_TRACE_REFS defined, it can
print a list of all objects (not just the ones that are easily
accessible through modules) and their reference counts at the end of
the run. The disadvantage is that this only gives you their types,
not their name; the other side of this coin is that it finds nameless
leakage.
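
Without a special build, a rough approximation of that per-type listing is possible on current CPython 3 through the gc module. This is a swapped-in technique, not Py_TRACE_REFS: gc.get_objects() reports only the container objects the collector tracks, so plain strings and integers are missing. The helper names type_histogram and growth are made up for illustration.

```python
import gc
from collections import Counter

def type_histogram():
    """Count gc-tracked objects by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def growth(before, after, top=10):
    """Return the types whose counts grew most between two histograms."""
    # Counter subtraction keeps only the positive differences.
    delta = after - before
    return delta.most_common(top)
```

Taking one histogram before serving a batch of requests and another afterwards, then calling growth() on the pair, shows which types are piling up, though not which names hold them.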

--Guido van Rossum (home page: http://www.python.org/~guido/)


Greg Ewing

Jun 2, 1998

John Mitchell wrote:
>
> Does anyone have any tips for tracking down memory leaks?

Yes - put extreme pressure on Guido to implement
real garbage collection :-)

--
Greg Ewing, Computer Science Dept, | The address below is not spam-
University of Canterbury, | protected, so as not to waste
Christchurch, New Zealand | the time of Guido van Rossum.
gr...@cosc.canterbury.ac.nz

John Mitchell

Jun 2, 1998

> From Guido van Rossum <gu...@CNRI.Reston.Va.US>
> Date Thu, 28 May 1998 15:50:57 GMT
> Newsgroups comp.lang.python


>
> > Does anyone have any tips for tracking down memory leaks? I have a
> > long-running application (an ad server) which gains 40M in a few hours
> > under moderate load.
>
> If you recompile all of Python with Py_TRACE_REFS defined, it can
> print a list of all objects (not just the ones that are easily
> accessible through modules) and their reference counts at the end of
> the run.


I apologize for the silly question, but I'm not quite sure how to read the
memory-pool results. I've recompiled Python 1.4 with Py_TRACE_REFS
(thanks) -- this gives me sys.getobjects() -- but how do I see all the
values?

If I just print out everything, one of the objects refers to itself. I'd
like to do a "checkpoint" type of thing: run my program, initialize the
checkpoint, continue the program, then use the checkpointer object to print
out all new object references. Something like:

import sys

stringType = type('')

class MemoryState:
    def __init__(self):
        self.checkpoint()

    def checkpoint(self):
        self._check = sys.getobjects(0)

    def dump(self):
        for obj in sys.getobjects(0):
            if obj not in self._check:
                print obj


How can I dump all my objects, without getting into an infinite loop? Can
I somehow detect the recursive object, and skip it?
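
One way around the self-referential object, sketched for a current CPython 3 rather than the 1.4 build above: compare objects by identity (id) instead of with "in", which uses == and descends into containers, and report type names instead of printing the objects themselves. sys.getobjects() exists only in Py_TRACE_REFS builds, so this sketch substitutes gc.get_objects(), which sees only gc-tracked containers. The method name new_objects is an assumption of the sketch.

```python
import gc

class MemoryState:
    def __init__(self):
        self.checkpoint()

    def checkpoint(self):
        # Keep the snapshot alive so none of its ids can be reused by a
        # later object; the price is that these objects stay pinned.
        self._kept = gc.get_objects()
        self._ids = set(map(id, self._kept))

    def new_objects(self):
        """Return gc-tracked objects created since the last checkpoint."""
        # Exclude this object's own bookkeeping from the report.
        own = (id(self._kept), id(self._ids), id(self.__dict__))
        return [obj for obj in gc.get_objects()
                if id(obj) not in self._ids and id(obj) not in own]

    def dump(self):
        for obj in self.new_objects():
            # Type and address only: never repr() an unknown object here.
            print(type(obj).__name__, hex(id(obj)))
```

The result can include a few interpreter temporaries, so it is best read as a list of suspects rather than an exact count.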


thanks


- j


Michael Hudson

Jun 3, 1998

John Mitchell <jo...@magnet.com> writes:
> If I just print out everything, one of the objects refers to itself. I'd
> like to do a "checkpoint" type of thing: run my program, initialize the
> checkpoint, continue the program, use the checkpointer object to print out
> all new object references. Something like:

[snippity snip]


> How can I dump all my objects, without getting into an infinite loop? Can
> I somehow detect the recursive object, and skip it?

Python 1.5 deals with this automagically:
Python 1.5.1 (#5, May 21 1998, 09:12:23) [GCC egcs...]
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> l=[]
>>> l.append(l)
>>> l
[[...]]

HTH.

--
Michael Hudson
Jesus College
Cambridge
mw...@cam.ac.uk

John Mitchell

Jun 3, 1998

On 3 Jun 1998, Michael Hudson wrote:

> John Mitchell <jo...@magnet.com> writes:
> > How can I dump all my objects, without getting into an infinite loop? Can
> > I somehow detect the recursive object, and skip it?
>
> Python 1.5 deals with this automagically:
> Python 1.5.1 (#5, May 21 1998, 09:12:23) [GCC egcs...]
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> l=[]
> >>> l.append(l)
> >>> l
> [[...]]


Hmm: Python 1.5 (not 1.5.1) deals with this by crashing!
Thanks for the example infinite object. Maybe I'll upgrade.

- j

Milton L. Hankins

Jun 3, 1998
to John Mitchell

On Tue, 2 Jun 1998, John Mitchell wrote:

>...


> If I just print out everything, one of the objects refers to itself.

>...

This sounds like the root of your problem. In case you don't already
know, here's some information: Circular data structures are never
implicitly garbage collected in Python. You need to use the del operator
to manually force deletion.
--
Milton L. Hankins <>< Software Engineer, Raytheon Systems Company
John 3:16 || I can't speak for Raytheon Company, but I often try to
L-I-F-E || speak for my wife and my cat. The latter doesn't mind.


John Mitchell

Jun 3, 1998

On Wed, 3 Jun 1998, Milton L. Hankins wrote:

> On Tue, 2 Jun 1998, John Mitchell wrote:
>
> >...
> > If I just print out everything, one of the objects refers to itself.
> >...
>
> This sounds like the root of your problem. In case you don't already
> know, here's some information: Circular data structures are never
> implicitly garbage collected in Python. You need to use the del operator
> to manually force deletion.

Yes, but this is dumping out everything after a clean startup of Python --
one of the main structures (globals()? __builtins__?) refers to itself.

Didn't know "del obj" would force deletion. Sounds like a hack -- I'd
rather find the self-reference, remove it, and allow Python to do *all* GC
for me instead of relying on my poor soggy brain.


My original problem was this: in an ad server, per request, the number of
AdRequest instances went up by 7, when it should've stayed the same -- the
old requests would go away as new ones came in.

What was happening was basically this: I have a dictionary-like class
that "sanitizes" (trims, lowercases) any values you set on it. If you try
to set something it doesn't know about, it complains.

import string
from UserDict import UserDict

class AdRequest(UserDict):
    def __init__(self):
        UserDict.__init__(self)
        # each bound method stored here holds a reference back to self
        self._sanitizer = {'url':self.sanitize_url}

    def sanitize_url(self, key, value):
        self.data[key] = string.strip(value)

    def __setitem__(self, key, value):
        if not self._sanitizer.has_key(key):
            raise KeyError, "%s: not a valid key" % key
        self._sanitizer[key](key, value)

The problem was the self._sanitizer dictionary -- it holds bound methods,
and each bound method refers back to the same instance, so the AdRequest
instance's reference count never dropped to zero and it never went away.
Since I had 7 different "clean" keys, each AdRequest object created 7
references that were never GC'd...

Solution:

class AdRequestSanitizer:
    def url(self, value):
        return string.strip(value)

class AdRequest(UserDict):
    def __init__(self):
        UserDict.__init__(self)
        # create aggregate object:
        self._sanitizerObj = AdRequestSanitizer()

    def __setitem__(self, key, value):
        if not hasattr(self._sanitizerObj, key):
            raise KeyError, "%s: not a valid key" % key
        self.data[key] = getattr(self._sanitizerObj, key)(value)


I separated all the value-cleaning methods into a separate class, and
AdRequest automatically uses it. This is a better design anyway -- I can
use different Sanitizers as I see fit, I can use them w/o the original
AdRequest object, and of course garbage collection is now much cleaner.
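
The cycle and the fix can both be checked on a current CPython 3, which, unlike the Python of this thread, has a cycle collector; switching it off brings back pure reference counting. A minimal sketch, with made-up class names (CyclicRequest, Sanitizer, DelegatingRequest) and a hypothetical helper survives_del:

```python
import gc
import weakref

class CyclicRequest:
    def __init__(self):
        # The bound method holds a reference to self:
        # self -> dict -> bound method -> self.
        self._sanitizer = {"url": self.sanitize_url}

    def sanitize_url(self, value):
        return value.strip()

class Sanitizer:
    def url(self, value):
        return value.strip()

class DelegatingRequest:
    def __init__(self):
        # The helper knows nothing about the request, so no cycle forms.
        self._sanitizer = Sanitizer()

def survives_del(cls):
    """Report whether an instance of cls outlives its last name binding."""
    was_enabled = gc.isenabled()
    gc.disable()                     # reference counting only
    try:
        obj = cls()
        ref = weakref.ref(obj)
        del obj
        return ref() is not None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()                 # clean up the cycle created above
```

Under these assumptions survives_del(CyclicRequest) reports that the instance is still alive after del, while survives_del(DelegatingRequest) reports that it is gone.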


Using the dumpmem() procedure I posted previously, this class immediately
jumped out at me. That's what I call progress: spend a few minutes, write
some code, and BAM a design problem shows up. I fix it, kick myself for
being a dumbass, and all is well.

My current memory leak doesn't show up using dumpmem(), so I'm guessing
some instances are hogging lots of dictionaries, integers, strings, or
other non-instance objects. These items don't show up in dumpmem(), and I
haven't figured out how to use sys.getobjects() to track down my leak.


Other Pointers:

- Vladimir Marangozov's pymalloc:
http://starship.skyport.net/~vlad/pymalloc/

Fascinating visual and numeric information of how Python allocates and
frees memory. This is too low-level for what I want, but it may prove
useful later.

- Python: recompile with Py_TRACE_REFS defined (Guido)

This gives me sys.getobjects(), which is a list of all (named and
anonymous) objects outstanding. No documentation, but not hard to use,
and very powerful if I could figure out how to ignore infinite objects.


- j

Fredrik Lundh

Jun 4, 1998

>> Python 1.5 deals with this automagically:
>
>> Python 1.5.1 (#5, May 21 1998, 09:12:23) [GCC egcs...]
>> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>> >>> l=[]
>> >>> l.append(l)
>> >>> l
>> [[...]]
>
>Hmm: Python 1.5 (not 1.5.1) deals with this by crashing!
>Thanks for the example infinite object. Maybe I'll upgrade.

That was fixed in 1.5.1. Python 1.4 and 1.5 still have the old
behaviour.

Cheers /F
fre...@pythonware.com
http://www.pythonware.com

Tim Peters

Jun 4, 1998

[John Mitchell]
> ...

> Didn't know "del obj" would force deletion. Sounds like a hack --
> I'd rather find the self-reference, remove it, and allow Python to
> do *all* GC for me instead of relying on my poor soggy brain.

There's a confusion here, but rather than sort it out I'll simply Proclaim
that, no, del doesn't force anything to get recycled, ever, period. It just
removes a binding, and after that the same old reference counting mechanisms
may or *may not* free anything up.

E.g.,

class A:
    def __del__(s): print 'a deleted'
class B:
    def __del__(s): print 'b deleted'
a = A()
b = B()
a.x = b
b.x = a
print "doing del a"; del a
print "doing del b"; del b
print "going away"

That prints:

doing del a
doing del b
going away

So the circular trash never goes away.

But stick a "del a.x" before the first print, and the output changes to:

doing del a
doing del b
b deleted
a deleted
going away

That's nothing magic about del, though! The same thing would happen if you,
e.g., stuck "a.x = 3.14" before the print. Either way simply breaks the
circularity by removing the binding in a that points to b.

So the good news (or is it bad news <wink>?) is that there's nothing you can
do to *stop* "Python [doing] *all* GC for" you -- del is completely safe.
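
The same experiment can be replayed on a current CPython 3, with the caveat that modern Python does have a cycle collector, so this sketch turns it off to get the behaviour described above. The function name run_demo and the events list are inventions of the sketch; it records the messages instead of printing them.

```python
import gc

def run_demo(break_cycle):
    """Replay the a/b experiment; return the messages in order."""
    events = []

    class A:
        def __del__(self):
            events.append("a deleted")

    class B:
        def __del__(self):
            events.append("b deleted")

    was_enabled = gc.isenabled()
    gc.disable()                      # leave only reference counting
    try:
        a = A()
        b = B()
        a.x = b
        b.x = a
        if break_cycle:
            a.x = 3.14                # rebinding works as well as "del a.x"
        events.append("doing del a")
        del a
        events.append("doing del b")
        del b
        events.append("going away")
        return list(events)           # copy, so later cleanup cannot alter it
    finally:
        if was_enabled:
            gc.enable()
```

run_demo(False) should give the three-line output and run_demo(True) the five-line one, matching the two listings above.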

> [war story. about a circular class instance getting cleaned up by
> factoring the source of the circularity into a new class that the
> original class then used as a service]
> ...


> This is a better design anyway -- I can use different Sanitizers as
> I see fit, I can use them w/o the original AdRequest object, and of
> course garbage collection is now much cleaner.

Bravo, John! That's the right way (as if you were looking for confirmation
<wink>).

BTW, I've often tracked down my-fault leaks "simply" by modifying __init__
methods to insert id(self) in a global dict, and adding __del__ methods
that just do "del that_dict[id(self)]". Then the addresses of the things
that haven't yet been deleted are just that_dict.keys(). Every time I do
this, I swear I'll add a secret builtin function that maps back from
addresses to objects, but the source of the problem always reveals itself
before I follow through.

You could do it with a few lines of code, though -- and as an added benefit,
really tick Guido off <wink>.
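
A minimal Python 3 sketch of that bookkeeping, under the assumption that every class of interest can inherit from one small base class. The names live_objects and Tracked are hypothetical, and the AdRequest here is an empty stand-in; the dict maps each address to a class name so the survivors are at least identifiable by type.

```python
live_objects = {}   # id(instance) -> class name, for instances not yet deleted

class Tracked:
    def __init__(self):
        live_objects[id(self)] = type(self).__name__

    def __del__(self):
        # pop() rather than del: tolerate __init__ having failed early.
        live_objects.pop(id(self), None)

class AdRequest(Tracked):
    pass
```

After a test run, whatever is left in live_objects is the set of instances that never reached __del__.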

a-quick-hack-can-spare-a-day-of-thought-although-the-more-
you-do-it-the-less-effective-it-is-ly y'rs - tim
