Debugging IPython with pympler

42 views
Skip to first unread message

Brian Granger

unread,
Aug 25, 2009, 8:07:37 PM8/25/09
to pympl...@googlegroups.com
Hi,

I am doing some work on IPython and part of this is cleaning up our memory footprint.  I have done some very simple things and I am a bit confused.

First, I did a muppy.diff before and after instantiating IPython:

                                  types |   # objects |   total size
======================================= | =========== | ============
                                   list |        1254 |      1.18 MB
                                    str |       12272 |    904.17 KB
                                   dict |         792 |    391.03 KB
                                  tuple |        8040 |    352.01 KB
                                   code |        2210 |    212.18 KB
                                   type |         135 |     58.01 KB
                     wrapper_descriptor |         353 |     13.79 KB
                                StgDict |          35 |     11.16 KB
                     _ctypes.SimpleType |          24 |     10.31 KB
                                weakref |         239 |     10.27 KB
             builtin_function_or_method |         266 |      9.35 KB
                                    int |         791 |      9.27 KB
                    function (__init__) |         139 |      8.14 KB
  <class 'IPython.utils.ipstruct.Struct |          21 |      7.56 KB
                      getset_descriptor |         213 |      7.49 KB


This looks pretty scary as all I have done is create IPython - not use it.  I know IPython is a mess, but this seems a bit crazy.  How can I begin to get a grip on what is causing this?

Next, I am trying to see why IPython's __del__ method is not being called.  So I created an InteractiveBrowser to see who is holding refs to IPython.  I got this *massive* tree that appears to be completely recursive.  I am attaching a screen shot to show this.  It basically looks like we have self-referential lists all over the place.

Am I doing something wrong?  What should my next step be with pympler?

Cheers and thanks,

Brian
IPythonDebug.jpg

Robert Schuppenies

unread,
Aug 25, 2009, 10:00:31 PM8/25/09
to pympl...@googlegroups.com
Hi Brian.

Thanks for taking interest in the pympler project. Please know in advance, that
we are currently in the process of refactoring a lot of our three formerly
unrelated projects in order to make this more usable and easier to understand.


Brian Granger wrote:
> Hi,
>
> I am doing some work on IPython and part of this is cleaning up our
> memory footprint. I have done some very simple things and I am a bit
> confused.

Yeah, memory profiling can be quite confusing, because except for the most
simple applications, a lot of seemingly stuff will be involved.

> First, I did a muppy.diff before and after instantiating IPython:
>
> types | # objects | total size
> ======================================= | =========== | ============
> list | 1254 | 1.18 MB
> str | 12272 | 904.17 KB
> dict | 792 | 391.03 KB
> tuple | 8040 | 352.01 KB
> code | 2210 | 212.18 KB
> type | 135 | 58.01 KB
> wrapper_descriptor | 353 | 13.79 KB
> StgDict | 35 | 11.16 KB
> _ctypes.SimpleType | 24 | 10.31 KB
> weakref | 239 | 10.27 KB
> builtin_function_or_method | 266 | 9.35 KB
> int | 791 | 9.27 KB
> function (__init__) | 139 | 8.14 KB
> <class 'IPython.utils.ipstruct.Struct | 21 | 7.56 KB
> getset_descriptor | 213 | 7.49 KB
>
>
> This looks pretty scary as all I have done is create IPython - not use
> it. I know IPython is a mess, but this seems a bit crazy. How can I
> begin to get a grip on what is causing this?

This definitely looks like a lot of things that got into the game just for
loading. I am unfamiliar with IPython project, but taking a quick look into the
tarball it seems that IPython is quite a complex project. A couple of pointers,
though:

- Did you run muppy.diff before instantiating IPython? If not, make sure the
diff is zero (e.g., by running it twice). On the first invocation, muppy itself
will require some memory which might affect your output.

- It helps to understand the entry point of an invocation. Could you give me a
snippet of how you 'instantiated IPython'? A complete, but minimal example
would be best. Also, please add how you made use of pympler.

- A lot of memory goes into strings and lists. Maybe you can use
muppy.get_objects() to manually retrieve all object, then muppy.filter() to
filter out lists and/or strings and finally muppy.sort() to get the biggest.
Although for strings it looks the size just comes from the massive numbers, but
the lists are fairly large.
Now since these wouldn't exist by themselves, next it might be interesting who
actually refers to them, because these referrers are probably the ones causing
you the trouble. In order to do so, gc.get_referrers() can be helpful (gc is
the Python garbage collector module).

> Next, I am trying to see why IPython's __del__ method is not being
> called. So I created an InteractiveBrowser to see who is holding refs
> to IPython. I got this *massive* tree that appears to be completely
> recursive. I am attaching a screen shot to show this. It basically
> looks like we have self-referential lists all over the place.

This isn't necessarily your fault. This circular references can caused by a
couple of things, among them the sys module which knows other loaded modules
which know the sys module.
Looking closer, I think what you see here is the frame object of the
muppy.get_object() function. I wonder why you see it. This might be an issue
with muppy and seems like it doesn't have to do with IPython. Maybe this even
causes the large lists issue, but currently I would say rather not.

If you guide me through your steps, I might find the time to look at this
particular issue myself on Friday.


cheers,
robert

Brian Granger

unread,
Aug 26, 2009, 1:29:38 PM8/26/09
to pympl...@googlegroups.com
Robert,

Thanks for taking interest in the pympler project. Please know in advance, that
we are currently in the process of refactoring a lot of our three formerly
unrelated projects in order to make this more usable and easier to understand.

Thanks for your quick reply and help!  I spent some more time this morning following your suggestions.  Here is what I have found:

1.  The main hit comes from importing our main class:

    >>> tr.print_diff()

      types |   # objects |   total size
    ======= | =========== | ============

Now let's just import our class::

    >>> from IPython.core.iplib import InteractiveShell
    >>> tr.print_diff()

                           types |   # objects |   total size
    ============================ | =========== | ============
                             str |       10114 |    716.80 KB
                            list |         139 |    350.43 KB
                            dict |         634 |    344.68 KB
                           tuple |        7031 |    306.70 KB
                            code |        1952 |    185.82 KB
                            type |         119 |     51.13 KB

                         StgDict |          35 |     11.16 KB
              _ctypes.SimpleType |          24 |     10.31 KB
              wrapper_descriptor |         248 |      9.69 KB
                         weakref |         213 |      9.15 KB
      builtin_function_or_method |         255 |      8.96 KB
                  module(curses) |           1 |      6.97 KB
                 module(_curses) |           1 |      6.93 KB
             function (__init__) |         114 |      6.68 KB
               getset_descriptor |         187 |      6.57 KB

Actually creating an InteractiveShell has a much smaller impact and is nothing to worry about.

2.  Almost all the strings that show up at import time are docstrings from all the modules we import.  There are many dozens of modules, so this is not surprising.

3.  Most of the lists are small and not problems.  But, the biggest list is, the list of all objects that muppy itself creates when I call get_objects.  I don't know why this shows up, but it definitely does.

Question, is there a way to do get_objects on a diff, so I can looks at only the objects that are created after a certain step in my app?  That way, I can focus on the smaller # of objects that get created when our class is instantiated.

Thanks!

Brian

ps - if I you want to use IPython to debug this issue in muppy I can show you the steps I used to do this.  It is just using a dev branch of ipython.

 

Robert Schuppenies

unread,
Aug 27, 2009, 11:46:17 AM8/27/09
to pympl...@googlegroups.com
Brian Granger wrote:
> Now let's just import our class::
>
> >>> from IPython.core.iplib import InteractiveShell
> >>> tr.print_diff()
> types | # objects | total size
> ============================ | =========== | ============
> str | 10114 | 716.80 KB
> list | 139 | 350.43 KB
> dict | 634 | 344.68 KB
> tuple | 7031 | 306.70 KB
> code | 1952 | 185.82 KB
> type | 119 | 51.13 KB
> StgDict | 35 | 11.16 KB
> _ctypes.SimpleType | 24 | 10.31 KB
> wrapper_descriptor | 248 | 9.69 KB
> weakref | 213 | 9.15 KB
> builtin_function_or_method | 255 | 8.96 KB
> module(curses) | 1 | 6.97 KB
> module(_curses) | 1 | 6.93 KB
> function (__init__) | 114 | 6.68 KB
> getset_descriptor | 187 | 6.57 KB
>
> Actually creating an InteractiveShell has a much smaller impact and is
> nothing to worry about.

Good. It seems it often turns out that way. Luckily :)

> 3. Most of the lists are small and not problems. But, the biggest list
> is, the list of all objects that muppy itself creates when I call
> get_objects. I don't know why this shows up, but it definitely does.

Mh, looks like a bug. I will take a closer look at it.

> Question, is there a way to do get_objects on a diff, so I can looks at
> only the objects that are created after a certain step in my app? That
> way, I can focus on the smaller # of objects that get created when our
> class is instantiated.

This comes down to basically 3 steps:
1. Storing references to all objects before a snapshot,
2. Performing whatever action you want to do
3 Go through all objects after the snapshot and compare them to the
objects from before the snapshot.

Now you might already see the problem. If you have references to all
objects, the might not be garbage collected, although they would be
otherwise. This is usually helpful to track down memory leaks where you
have a presumably stable state, but new objects appear unexpectedly.
In addition, a comparison of all objects gets very expensive very fast,
depending on the number of objects.

I was working on a feature which allows to compare two object states
before and after an invocation, but I would call it experimental and you
won't find it in the docs. If you are interested, I can help you running
it for you app. But you've been warned :)

I am surprised by the large number of tuples that were created. Again, I
am not familiar with IPython, but it seems to make use of a lot of other
modules which may or may not be required but my guess is this is what
causes much of the memory usage.

Would it be possible to reproduce single steps involved in creating the
InteractiveShell without taking all of them? For example looking only at
individual modules used by the InteractiveShell, one by one. Memory
profiling definitely profits from a good understanding of the profiled
application which I am lacking in this case :|

Also, take a look at the asizeof module which summarizes the memory
usage of objects neatly. That might be helpful in understanding used
modules and their memory usage.

And as a last point, we are grateful for any feedback you give on the
usage, lack of features and pros and cons in general of Pympler :)


cheers,
robert

Ludwig Hähne

unread,
Sep 5, 2009, 8:17:25 AM9/5/09
to pympl...@googlegroups.com
Hi Robert,

Robert Schuppenies wrote on 26.08.2009 04:00:
> Looking closer, I think what you see here is the frame object of the
> muppy.get_object() function. I wonder why you see it. This might be an issue
> with muppy and seems like it doesn't have to do with IPython. Maybe this even
> causes the large lists issue, but currently I would say rather not.

I have had a similar issue when investigating large reference cycles in
SCons caused by frame objects. I think the problem here is that
get_objects gets a reference to its own stack frame which immediately
creates a cycle. I've just checked it:

>>> for o in gc.get_objects():
... if inspect.isframe(o):
... if inspect.currentframe() is o:
... print str(o)
...
<frame object at 0x87bc874>

Evidently, the problem with frame objects is that those are referencing
the parent stack frames and all local variables in each layer - thus the
reference tree gets very large. I'm not sure if this will suffice, but
it might help to filter the current frame object from the object list in
muppy and explicitly remove all references using del.

> if inspect.isframe(o) and inspect.currentframe() is o:
> del o

Hope this helps,
Ludwig

Reply all
Reply to author
Forward
0 new messages