Splitting the Heapmonitor

Ludwig Hähne

Dec 17, 2008, 6:51:10 AM
to pympl...@googlegroups.com
The heapmonitor module is way too big and deals with a number of
unrelated issues. I'd like to split it into three or more different
modules and get rid of the name "heapmonitor".

There are a number of functions that query the operating system for the
virtual size or working set size of the current process. I think it
would be nice to expose these functions to the user so that the
allocated virtual/physical memory of the Python process can be reported
in a platform independent way. Putting these functions in a new module
would concentrate the highly platform specific code in one file and
isolate the code that isn't related to anything else in Pympler anyway.
I still haven't found a good name for the module (or where to put it).
Suggestions?
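
To make the idea concrete, here is a minimal sketch of what such a module
could look like. It is not the heapmonitor code: the names memory_info,
_linux_statm and _peak_rss_getrusage are made up for illustration, and it
assumes only the standard library (/proc on Linux, the resource module on
Unix). Figures a platform cannot provide are reported as None.

```python
import os
import sys


def _linux_statm():
    """Return (virtual, resident) sizes in bytes from /proc, or None."""
    try:
        with open("/proc/self/statm") as f:
            fields = f.read().split()
        page = os.sysconf("SC_PAGE_SIZE")
        # The first two fields are total program size and resident set
        # size, both counted in pages.
        return int(fields[0]) * page, int(fields[1]) * page
    except (OSError, ValueError, IndexError):
        return None


def _peak_rss_getrusage():
    """Return the peak resident set size in bytes, or None."""
    try:
        import resource
    except ImportError:  # the resource module is Unix-only
        return None
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def memory_info():
    """Collect whatever figures this platform can provide."""
    info = {"vsz": None, "rss": None, "peak_rss": _peak_rss_getrusage()}
    statm = _linux_statm()
    if statm is not None:
        info["vsz"], info["rss"] = statm
    return info


if __name__ == "__main__":
    for key, value in memory_info().items():
        print(key, value)
```

Keeping each platform probe in its own private function means callers only
ever see memory_info(), and a Windows probe could be added later without
touching them.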

Another alien in the heapmonitor is the "garbage analyzer" API. What it
does: It takes the gc.garbage list and produces an annotated graph
(using graphviz) to easily spot what objects are responsible for
creating reference cycles. Leaves are eliminated to make it easier to
spot the actual cycles. This has been very helpful to avoid garbage
creation in SCons. Maybe this could be added to a tool in Muppy (if
there is one where this functionality might fit)? Otherwise I plan to
put it in a standalone module.
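
The approach described above can be sketched roughly as follows. This is
an illustration under assumptions, not the existing garbage analyzer: the
function names build_cycle_graph and to_dot are hypothetical, edges are
discovered with gc.get_referents, and the result is plain DOT text that
could be fed to graphviz.

```python
import gc


def build_cycle_graph(objects):
    """Return (edges, by_id) for references among `objects`, leaves pruned.

    `edges` maps the id of an object to the ids of the objects it refers
    to; only references that stay inside `objects` are kept.
    """
    by_id = {id(o): o for o in objects}
    edges = {
        oid: {id(r) for r in gc.get_referents(obj) if id(r) in by_id}
        for oid, obj in by_id.items()
    }
    # Repeatedly drop nodes without outgoing edges: they cannot be part
    # of a cycle, and removing them may expose new leaves.
    while True:
        leaves = {oid for oid, targets in edges.items() if not targets}
        if not leaves:
            break
        for oid in leaves:
            del edges[oid]
        for targets in edges.values():
            targets -= leaves
    return edges, by_id


def to_dot(edges, by_id):
    """Render the pruned graph as DOT text, labelling nodes by type name."""
    lines = ["digraph garbage {"]
    for oid in edges:
        lines.append('  n%d [label="%s"];' % (oid, type(by_id[oid]).__name__))
    for oid, targets in edges.items():
        for target in targets:
            lines.append("  n%d -> n%d;" % (oid, target))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    a = []
    b = [a]
    a.append(b)   # a <-> b form a cycle
    a.append([])  # a leaf hanging off the cycle
    print(to_dot(*build_cycle_graph([a, b, a[1]])))
```

In real use the input would be gc.garbage, typically after running with
gc.set_debug(gc.DEBUG_SAVEALL) so that all unreachable objects end up in
that list.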

The core of the heapmonitor should get a new name. It tracks and
profiles selected objects (preferably large numbers of them). So I guess
"profiler" would be a good name, but that could probably be assigned to
a module of Muppy as well.
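
As a rough picture of what "tracking and profiling selected objects"
means, consider the sketch below. The ObjectTracker class and its methods
are hypothetical and much simpler than the heapmonitor: it assumes weak
references to follow object lifetime and sys.getsizeof for shallow sizes.

```python
import sys
import time
import weakref


class ObjectTracker:
    """Hypothetical tracker recording shallow sizes of registered objects."""

    def __init__(self):
        self._refs = {}       # id -> (weak reference, type name)
        self.snapshots = []   # (timestamp, {type name: (count, bytes)})

    def track(self, obj):
        # Weak references let tracked objects die normally. Note that
        # some built-in types (list, dict, int, ...) cannot be weakly
        # referenced and would raise TypeError here.
        self._refs[id(obj)] = (weakref.ref(obj), type(obj).__name__)

    def snapshot(self):
        """Record and return per-type counts and sizes of live objects."""
        stats = {}
        for oid, (ref, name) in list(self._refs.items()):
            obj = ref()
            if obj is None:   # collected since it was registered
                del self._refs[oid]
                continue
            count, size = stats.get(name, (0, 0))
            stats[name] = (count + 1, size + sys.getsizeof(obj))
        self.snapshots.append((time.time(), stats))
        return stats


if __name__ == "__main__":
    class Node:
        pass

    tracker = ObjectTracker()
    nodes = [Node() for _ in range(1000)]
    for node in nodes:
        tracker.track(node)
    print(tracker.snapshot())
```

Comparing successive snapshots then shows how the number and total size of
tracked instances change over time.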

The last issue is the presentation logic. The most useful part (HTML
visualization) is actually the most messy (and virtually untested) part
of the heapmonitor. I have thought about switching to XML/XSLT but I'm
not sure whether this would be over-engineering for this specific task.
Anyway, at least some tests are in order, and maybe this should be
separated into a new module along with the remains of the heapmonitor
(profiler/tracker/?).

Comments welcome.
Ludwig

Robert Schuppenies

Dec 17, 2008, 11:58:13 AM
to pympl...@googlegroups.com
Ludwig Hähne wrote:
> The heapmonitor module is way too big and deals with a number of
> unrelated issues. I'd like to split it into three or more different
> modules and get rid of the name "heapmonitor".
>
> There are a number of functions that query the operating system for the
> virtual size or working set size of the current process. I think it
> would be nice to expose these functions to the user so that the
> allocated virtual/physical memory of the Python process can be reported
> in a platform independent way. Putting these functions in a new module
> would concentrate the highly platform specific code in one file and
> isolate the code that isn't related to anything else in Pympler anyway.
> I still haven't found a good name for the module (or where to put it).
> Suggestions?

Maybe just process?

> Another alien in the heapmonitor is the "garbage analyzer" API. What it
> does: It takes the gc.garbage list and produces an annotated graph
> (using graphviz) to easily spot what objects are responsible for
> creating reference cycles. Leaves are eliminated to make it easier to
> spot the actual cycles. This has been very helpful to avoid garbage
> creation in SCons. Maybe this could be added to a tool in Muppy (if
> there is one where this functionality might fit)? Otherwise I plan to
> put it in a standalone module.

This is a graphical representation, and as such a GUI package sounds
reasonable. But the GUI should probably be organized by
functionality, which we haven't clearly laid out yet. Muppy also has
one GUI feature, which could then be grouped with the dependency graph.
Also, I wonder whether a single type of GUI is suitable. I generally
prefer web-based GUIs, but I think an interactive one would add
quite some development overhead (Ajax and such).

> The core of the heapmonitor should get a new name. It tracks and
> profiles selected objects (preferably large numbers of them). So I guess
> "profiler" would be a good name, but that could probably be assigned to
> a module of Muppy as well.

I created another wiki page with a feature overview that I started a
couple of weeks back (see
http://code.google.com/p/pympler/wiki/MergerPotential). Maybe we can
use this to identify features that belong in one place and base the
name/module scheme on this.

> The last issue is the presentation logic. The most useful part (HTML
> visualization) is actually the most messy (and virtually untested) part
> of the heapmonitor. I have thought about switching to XML/XSLT but I'm
> not sure whether this would be over-engineering for this specific task.
> Anyway, at least some tests are in order, and maybe this should be
> separated into a new module along with the remains of the heapmonitor
> (profiler/tracker/?).

Sounds like another GUI/module. I suggest keeping it in mind but
postponing any coding until we have some idea which way we should go.

Ludwig Hähne

Dec 21, 2008, 8:27:39 AM
to pympl...@googlegroups.com
Robert Schuppenies wrote on 17.12.2008 17:58:
> Ludwig Hähne wrote:
>> The heapmonitor module is way too big and deals with a number of
>> unrelated issues. I'd like to split it into three or more different
>> modules and get rid of the name "heapmonitor".
>>
>> There are a number of functions that query the operating system for the
>> virtual size or working set size of the current process. I think it
>> would be nice to expose these functions to the user so that the
>> allocated virtual/physical memory of the Python process can be reported
>> in a platform independent way. Putting these functions in a new module
>> would concentrate the highly platform specific code in one file and
>> isolate the code that isn't related to anything else in Pympler anyway.
>> I still haven't found a good name for the module (or where to put it).
>> Suggestions?
>
> Maybe just process?

This may be a bit too general IMO but it'll do - and I don't have a
better idea anyway ;)

>> Another alien in the heapmonitor is the "garbage analyzer" API. What it
>> does: It takes the gc.garbage list and produces an annotated graph
>> (using graphviz) to easily spot what objects are responsible for
>> creating reference cycles. Leaves are eliminated to make it easier to
>> spot the actual cycles. This has been very helpful to avoid garbage
>> creation in SCons. Maybe this could be added to a tool in Muppy (if
>> there is one where this functionality might fit)? Otherwise I plan to
>> put it in a standalone module.
>
> This is a graphical representation, and as such a GUI package sounds
> reasonable. But the GUI should probably be organized by
> functionality, which we haven't clearly laid out yet. Muppy also has
> one GUI feature, which could then be grouped with the dependency graph.

I gave it some thought. I think it could be implemented as a derived
class of the Muppy RefBrowser module. It could also be abstracted from
illustrating only gc.garbage to a more general GraphBrowser
implementation by using the gc.garbage list as the root object. I'm
going to try it and see if this is doable. If it is feasible, we can
talk about where this module should reside.

> Also, I wonder whether a single type of GUI is suitable. I generally
> prefer web-based GUIs, but I think an interactive one would add
> quite some development overhead (Ajax and such).

Hmm, I still think a web-based representation is the way to go. A
good (traditional) GUI will also require some overhead. Anyway, we
should probably discuss this later, when we know what kinds of
representations (and interactions) have to be supported.

>> The core of the heapmonitor should get a new name. It tracks and
>> profiles selected objects (preferably large numbers of them). So I guess
>> "profiler" would be a good name, but that could probably be assigned to
>> a module of Muppy as well.
>
> I created another wiki page with a feature overview that I started a
> couple of weeks back (see
> http://code.google.com/p/pympler/wiki/MergerPotential). Maybe we can
> use this to identify features that belong in one place and base the
> name/module scheme on this.

This nicely illustrates what a merge could provide for each
module/feature. I think the ref-browser is a good place to start,
because it does not depend that much on other modules.

>> The last issue is the presentation logic. The most useful part (HTML
>> visualization) is actually the most messy (and virtually untested) part
>> of the heapmonitor. I have thought about switching to XML/XSLT but I'm
>> not sure whether this would be over-engineering for this specific task.
>> Anyway, at least some tests are in order, and maybe this should be
>> separated into a new module along with the remains of the heapmonitor
>> (profiler/tracker/?).
>
> Sounds like another GUI/module. I suggest keeping it in mind but
> postponing any coding until we have some idea which way we should go.

Yep.

Ludwig

Ludwig Hähne

Dec 27, 2008, 11:02:10 AM
to pympl...@googlegroups.com
FYI, I created a "splitheapmonitor" branch and started to add a number
of modules derived from functionality in the "classic" heapmonitor.
The branch will be temporarily broken when it comes to documentation and
tests. Once code, docs and tests are consistent again, we can
discuss the changes I made and eventually merge them back into the trunk.

Ludwig