Memory profiling possibilities in NYTProf

Jeremy K

Sep 11, 2009, 6:04:55 PM
to Devel::NYTProf Dev
So I made the perhaps unwise move of twitter-kvetching about NYTProf
and memory profiling [0], in which I said: "Perl needs better memory-
use profiling. NYTProf is *amazing* but only addresses CPU usage."

Tim Bunce popped up (hurrah for twitter search functions) and in the
best, politest, most open-source sort of way, invited me to put
up or shut up [1]: "@trochee Memory profiling of some kind is on the
roadmap for NYTProf, but won't happen soon without help. Want to
contribute?"

and yes, I would like to contribute. I do love Devel::NYTProf, but
AFAICT, there is no memory-use profiling. Followup conversations
(also twittered!) suggested that I should post here to raise the
question and spur Tim into laying out his plans for Devel::NYTProf
memory-profiling support.

WHAT:
What I'd really like from a good memory profiler is a *non-invasive*
way of finding out the size (perhaps only the maximum size?) and
duration (or at least if it was garbage-collected before termination)
of any variable, whether lexically- or package-scoped. Ideally,
objects could also report their entire size, but that might get
complicated with things like singletons and inside-out objects.

When I say non-invasive, I really mean "in the current spirit of
Devel::NYTProf", in that NYTProf does non-invasive CPU/time profiling
(can you imagine how much you'd have to litter your code with
Benchmark calls to get the same kind of information?). In this
respect, Devel::Size and Devel::Cycle are "invasive", because they
require explicit markup of code one might be interested in.
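
To make "invasive" concrete, here's roughly what you end up writing (a
sketch; build_cache() is just a stand-in for whatever builds the data
you suspect is bloated):

    use Devel::Size qw(size total_size);
    use Devel::Cycle;

    my %cache = build_cache();   # stand-in for your suspect data

    # Every measurement must be hand-placed at each suspect site:
    warn "cache shallow size: ", size(\%cache), " bytes\n";
    warn "cache total size:   ", total_size(\%cache), " bytes\n";

    # Cycle detection likewise only looks where you explicitly point it:
    find_cycle(\%cache, sub {
        my ($path) = @_;
        warn "found a reference cycle in the cache\n";
    });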

There are some good resources in packages like Devel::Size and
Devel::Cycle that might be folded into such a project. Also, I found
some preliminary work from Jonathan Rockway [2] that looks like it
might be worth giving a poke at.

HOW MUCH:
I'd like to help, with the caveat that I know very little about
perlguts, but I'm a generally savvy programmer and problem-solver.
But I do like tools, and NYTProf is one of my favorites!

--Jeremy

[0] http://twitter.com/trochee/status/3874077719
[1] http://twitter.com/timbunce/status/3884518663
[2] http://blog.jrock.us/articles/Memory%20Profiling%20part%201.pod

Tim Bunce

Sep 14, 2009, 6:27:43 AM
to develnyt...@googlegroups.com
On Fri, Sep 11, 2009 at 03:04:55PM -0700, Jeremy K wrote:
> WHAT:
> What I'd really like from a good memory profiler is a *non-invasive*
> way of finding out the size (perhaps only the maximum size?) and
> duration (or at least if it was garbage-collected before termination)
> of any variable, whether lexically- or package-scoped. Ideally,
> objects could also report their entire size, but that might get
> complicated with things like singletons and inside-out objects.
>
> When I say non-invasive, I really mean "in the current spirit of
> Devel::NYTProf", in that NYTProf does non-invasive CPU/time profiling
> (can you imagine how much you'd have to litter your code with
> Benchmark calls to get the same kind of information?). In this
> respect, Devel::Size and Devel::Cycle are "invasive", because they
> require explicit markup of code one might be interested in.
>
> There are some good resources in packages like Devel::Size and
> Devel::Cycle that might be folded into such a project. Also, I found
> some preliminary work from Jonathan Rockway [2] that looks like it
> might be worth giving a poke at.

Here's an outline of the landscape (so to speak) as I see it:

Use cases:
- find a memory leak
- see where memory is used in a large process
(code bloat as well as data)

What can be measured:
- process memory - fast but (very) fuzzy, see below
- allocator pools - XXX needs specially compiled perl?
- SV arenas - slow but accurate (though not complete)

When to measure:
- at sub call & return
- at end of process/profiling

Perl has its own memory allocation system with a pool of 'free' memory.
Perl only asks for more memory from the OS when there isn't a *suitably
sized* chunk in its own free pool. And when it does ask for more it asks
for a big chunk which it then manages itself. The key point is: *you
can't draw meaningful conclusions by detecting when perl asks the OS for
more memory*. (Jonathan: "What did I learn from this? Nothing.")
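
You can see that effect with something like this (Linux-only; I'm
assuming /proc/self/statm here, with the page size from POSIX):

    use POSIX ();

    # Resident set size in bytes, from /proc/self/statm (Linux-specific).
    sub rss_bytes {
        open my $fh, '<', '/proc/self/statm' or die $!;
        my (undef, $resident) = split ' ', scalar <$fh>;
        return $resident * POSIX::sysconf(POSIX::_SC_PAGESIZE());
    }

    my $before = rss_bytes();
    my @big    = (1) x 1_000_000;   # force a large allocation
    my $during = rss_bytes();
    @big = ();                      # "freed" - but only into perl's pool
    my $after  = rss_bytes();

    printf "before=%dK during=%dK after=%dK\n",
        map { $_ / 1024 } $before, $during, $after;
    # 'after' typically stays close to 'during': the chunks went back
    # to the allocator's free pool, not to the OS.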

The next level up in terms of detail is to ask perl's own allocator how
much memory it has allocated. If perl's own allocator is being used then
that information is available via the get_mstats() function.
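
From Perl level you can already poke at that via Devel::Peek's mstat(),
which wraps the same machinery (a sketch; it only reports anything
useful when perl was built with usemymalloc=y):

    use Config;
    use Devel::Peek qw(mstat);

    if ($Config{usemymalloc} eq 'y') {
        mstat("before");            # dumps bucket stats to stderr
        my %h = map { $_ => [ (0) x 100 ] } 1 .. 10_000;
        mstat("after building a big structure");
    }
    else {
        warn "perl is using the system malloc; no get_mstats() data\n";
    }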

The next level up is to iterate over the SV arenas and measure the
memory allocated to each. That's a little slower as there are lots of
SVs. It's also incomplete as not all memory is allocated via arenas
(OPs for example). I suspect this is the highest level that would be
practical for per-subcall profiling.
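
For a feel of what that arena walk yields, here's a Perl-level sketch
using Devel::Gladiator's walk_arena() together with Devel::Size (the
real thing would be C; shallow sizes are used to avoid double-counting):

    use Devel::Gladiator qw(walk_arena);
    use Devel::Size qw(size);

    my %bytes_by_type;
    my $all = walk_arena();         # arrayref of every live SV
    for my $sv (@$all) {
        # Bucket by type (or class, for blessed SVs); shallow size only.
        $bytes_by_type{ ref $sv } += size($sv);
    }
    @$all = ();                     # the snapshot keeps the SVs alive

    printf "%-12s %12d bytes\n", $_, $bytes_by_type{$_}
        for sort { $bytes_by_type{$b} <=> $bytes_by_type{$a} }
            keys %bytes_by_type;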

The next level up, and the most detailed of all, is to crawl over
higher-level data structures, like OP trees and packages, recursing to
try to find everything 'in context', including closures etc.
Obviously this would be expensive, somewhat like the cost of cloning the
interpreter when starting a perl thread.
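
A Perl-level taste of that crawl, limited to package globals (a sketch:
OP trees, pads, and closures are invisible from up here, and aliased or
exported symbols will be double-counted):

    use Devel::Size qw(total_size);

    sub package_bytes {
        my ($pkg, $seen) = @_;
        return 0 if $seen->{$pkg}++;
        no strict 'refs';
        my $bytes = 0;
        for my $name (keys %{"${pkg}::"}) {
            next if $name eq 'main::';      # %main:: contains itself
            if ($name =~ /^(.+)::\z/) {     # a nested package stash
                $bytes += package_bytes("${pkg}::$1", $seen);
                next;
            }
            my $gv   = \*{"${pkg}::${name}"};
            my $sref = *$gv{SCALAR};
            $bytes += total_size($sref) if $sref && defined $$sref;
            for my $slot (qw(ARRAY HASH CODE)) {
                my $ref = *$gv{$slot};
                $bytes += total_size($ref) if $ref;
            }
        }
        return $bytes;
    }

    printf "globals reachable from main:: %d bytes\n",
        package_bytes('main', {});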


Given the above, here's a vision of where I'd like us to get to:

Picture a treemap where the outermost rectangle represents data memory
usage reported by the OS. Within that we'd have top-level boxes that
represent:

1) an area for memory that the perl allocator regards as free
(possibly subdivided by chunk size).
2) an area for each package namespace level, like the current NYTProf
treemap, in which we could have boxes for memory used by:
2a) package globals
2b) package lexicals
2c) package subroutines
2c1) memory used by code - measured by traversing the OP trees
2c2) memory use by lexicals, including recursive pads
3) other data, subdivided by type (AV, HV, etc) & flags (tmp,pad etc)
(would be nice to identify ref loops)
4) the remaining 'dark memory' that we can't account for
   (ideally zero, if we can account for all memory the OS gave us)

We need a policy for dealing with reference counts > 1. For example,
when measuring the memory usage of a global variable that holds a
reference, keep following the reference(s) so long as the ref count == 1.
It wouldn't be reasonable to associate any 'deeper' data with the
specific global variable unless we made much more expensive checks
for ref loops (which we could do in future).
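
In code, that policy could look something like this (a Perl sketch
using the B module to read refcounts; conveniently, a refcount-1
referent can't be part of a ref loop, so no cycle check is needed):

    use B ();
    use Scalar::Util qw(reftype);
    use Devel::Size qw(size);

    # Memory "owned" by a variable: its own size plus anything it
    # references whose refcount is exactly 1 (we are the sole owner).
    sub owned_bytes {
        my ($thing) = @_;                 # a reference to the data
        my $bytes = size($thing);         # shallow size of the referent
        my $type  = reftype($thing) || '';
        # Take refs to the contained slots rather than copying values:
        # copying a reference would itself bump the referent's refcount.
        my @slots = $type eq 'HASH'  ? \(values %$thing)
                  : $type eq 'ARRAY' ? \(@$thing)
                  : $type eq 'SCALAR' || $type eq 'REF' ? \($$thing)
                  : ();
        for my $slot (@slots) {
            next unless ref $$slot;                       # plain value
            next if B::svref_2object($$slot)->REFCNT > 1; # shared: stop
            $bytes += owned_bytes($$slot);                # sole owner
        }
        return $bytes;
    }

    our %Cache = (rows => [1 .. 1000], meta => {hits => 0});
    printf "%%Cache owns about %d bytes\n", owned_bytes(\%Cache);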


So. What to do? Some suggestions:

Add a write_memory_usage() function. Call it from close_output_file().

In write_memory_usage() call get_mstats() (defined in perl's malloc.c)
and write the information returned to the data file in some reasonable
form. (See Perl_dump_mstats() for how to interpret the data.)
Need to handle the case where perl's malloc isn't being used.

In write_memory_usage() call some function, which you'll have to write,
to find out how much data memory the OS thinks perl has allocated.
Calling getrusage() and using the .ru_idrss struct element would work
for most non-linux unix systems. Linux can stat some /proc/... thingy.
A windows port will probably appear magically via someone like Jan.
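
A first stab at that function in pure Perl might look like this (a
sketch: BSD::Resource is the CPAN wrapper for getrusage(),
/proc/self/statm is one candidate for the /proc thingy, and on Linux
the kernel leaves ru_idrss at 0, hence the fallback):

    use POSIX ();

    sub os_data_bytes {
        if (eval { require BSD::Resource; 1 }) {
            # getrusage() list: (utime, stime, maxrss, ixrss, idrss, ...)
            my @ru = BSD::Resource::getrusage();
            return $ru[4] if $ru[4];  # ru_idrss (platform-dependent units)
        }
        if (open my $fh, '<', '/proc/self/statm') {
            # fields, in pages: size resident shared text lib data dirty
            my @f = split ' ', scalar <$fh>;
            return $f[5] * POSIX::sysconf(POSIX::_SC_PAGESIZE());
        }
        return undef;                 # e.g. Windows, until someone ports it
    }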

Getting those bits of info into the reports would be the next step.
A simple summary on the index page, plus a new page giving the
per-bucket details from get_mstats().

To get deeper than this we need to start walking arenas and the package
tree. I'm not very keen on adding that walking code to NYTProf. I think
it would be better to extend packages like Devel::Gladiator and
Devel::Size to expose visitor functions (in C) that will do the walking
and call a callback for each item visited. NYTProf (and other code)
could then use those functions.
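
To be concrete about the division of labour, the visitor interface
might look something like this from the caller's side (entirely
hypothetical: no walk_svs() exists in Devel::Gladiator today, this is
just the shape I have in mind):

    # HYPOTHETICAL API - a visitor-based walker that an extended
    # Devel::Gladiator (or Devel::Size) could expose from C:
    use Devel::Gladiator ();

    my %bytes_by_type;
    Devel::Gladiator::walk_svs(sub {
        my ($sv_type, $sv_bytes) = @_;   # invoked once per live SV
        $bytes_by_type{$sv_type} += $sv_bytes;
    });
    # NYTProf (or any other tool) then aggregates and reports as it likes.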


> HOW MUCH:
> I'd like to help, with the caveat that I know very little about
> perlguts, but I'm a generally savvy programmer and problem-solver.

You and Robin Smidsrød have shown particular interest in this, though
I'm sure there are many others who could and would help.
Hopefully between you all some progress can be made.

My priority is to get v3 released (once I get around to fixing the
exception-thrown-from-xsub issue - for which I think I have a simple
workaround) and then to work on java2perl6 / DBI for perl6.
So I won't be driving the memory profiling effort, but I will
certainly help.

> But I do like tools, and NYTProf is one of my favorites!

Mine too!

Tim.

p.s. See also
http://groups.google.com/group/develnytprof-dev/browse_frm/thread/1df4cba3001cd4e4#
http://perl.markmail.org/search/?q=measuring+memory+footprint+date%3A200906

p.p.s. Someone *really* needs to refactor the reporting code into modules.
Partly for our ongoing maintenance sanity, but mainly to enable others
to develop plug-in modules to perform extra kinds of reporting. I think
some sort of pluggable MVC approach is needed.

p.p.p.s. I'm happy to create a branch for this. Also, I wonder if moving
to git would help encourage more contribution.

Tim Bunce

Sep 15, 2009, 4:45:07 AM
to develnyt...@googlegroups.com
On Mon, Sep 14, 2009 at 11:27:43AM +0100, Tim Bunce wrote:
>
> The next level up in terms of detail is to ask perl's own allocator how
> much memory it has allocated. If perl's own allocator is being used then
> that information is available via the get_mstats() function.

An alternative and complementary approach: add a function that calls
get_mstats() and returns a simple total of allocated memory.
That would be fast enough to invoke on every perl subroutine call.

In the NYTProf subroutine profiler code, call that sub and record the
value, in the same way that it handles the time. In other words, call it
before and after the sub call and record the difference, and also
accumulate the difference into a global to be factored into the
calculation for sub calls higher up the call stack.

That would give us, for each subroutine calling location, the memory
growth/shrinkage caused by the sub, and caused by the subs that the sub
called. Like inclusive and exclusive time, we'd have inclusive and
exclusive memory allocation. Cool!
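
Prototyped in pure Perl, the bookkeeping might look like this (a
sketch: mem_used() reads /proc/self/statm as a stand-in for the fast
get_mstats() total, and real NYTProf would do all this in C):

    my %mem_stats;      # sub name => { incl =>, excl => } (mem_used units)
    my $child_mem = 0;  # growth attributed to deeper calls

    sub mem_used {      # stand-in for a cheap allocator-total query
        open my $fh, '<', '/proc/self/statm' or return 0;
        return +(split ' ', scalar <$fh>)[0];  # program size, in pages
    }

    sub profile_call {
        my ($name, $code) = @_;
        my $saved_child = $child_mem;
        $child_mem = 0;                       # collect our callees' growth
        my $before = mem_used();
        my @ret    = $code->();
        my $growth = mem_used() - $before;    # inclusive growth
        $mem_stats{$name}{incl} += $growth;
        $mem_stats{$name}{excl} += $growth - $child_mem;
        $child_mem = $saved_child + $growth;  # report up the call stack
        return @ret;
    }

    my @sorted = profile_call(big_sort => sub { sort { $a <=> $b } 1 .. 50_000 });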

Tim.

p.s. A separate and complementary feature would be an option to stream
sub call info out to the data file as sub calls happen. That would let
users track allocation and freeing over time, rather than just seeing
totals at the end.
