Here's an outline of the landscape (so to speak) as I see it:
Use cases:
- find a memory leak
- see where memory is used in a large process
(code bloat as well as data)
What can be measured:
- process memory - fast but (very) fuzzy, see below
- allocator pools - needs a perl built with perl's own malloc (-Dusemymalloc)?
- SV arenas - slow but accurate (though not complete)
When to measure:
- at sub call & return
- at end of process/profiling
Perl has its own memory allocation system with a pool of 'free' memory.
Perl only asks for more memory from the OS when there isn't a *suitably
sized* chunk in its own free pool. And when it does ask for more it asks
for a big chunk which it then manages itself. The key point is: *you
can't draw meaningful conclusions by detecting when perl asks the OS for
more memory*. (Jonathan: "What did I learn from this? Nothing.")
The next level up in terms of detail is to ask perl's own allocator how
much memory it has allocated. If perl's own allocator is being used then
that information is available via the get_mstats() function.
The next level up is to iterate over the SV arenas and measure the
memory allocated to each. That's a little slower as there are lots of
SVs. It's also incomplete as not all memory is allocated via arenas
(OPs for example). I suspect this is the highest level that would be
practical for per-subcall profiling.
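To make the arena walk concrete, here's a toy model in C. It is only a sketch: the struct layout is invented for illustration (real perl arenas hold SV heads with free slots chained through SvANY), but the shape of the traversal and its cost are the point — it's O(total slots), which is why it suits end-of-run measurement better than per-call profiling.

```c
#include <stddef.h>

#define SLOTS_PER_ARENA 4

/* Hypothetical model: each arena is a fixed block of slots, some live. */
typedef struct Slot { int in_use; size_t body_size; } Slot;
typedef struct Arena { struct Arena *next; Slot slots[SLOTS_PER_ARENA]; } Arena;

/* Walk every arena in the chain and sum memory for live slots:
 * the arena block itself plus each live slot's attached body.
 * Cost is proportional to the total number of slots, live or not. */
size_t arena_memory(const Arena *a)
{
    size_t total = 0;
    for (; a; a = a->next) {
        total += sizeof(Arena);            /* the arena block itself */
        for (int i = 0; i < SLOTS_PER_ARENA; i++)
            if (a->slots[i].in_use)
                total += a->slots[i].body_size;
    }
    return total;
}
```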
The next level up, and the most detailed of all, is to crawl over
higher-level data structures, like OP trees and packages recursing to
try to find everything 'in context', including closures etc.
Obviously this would be expensive, somewhat like the cost of cloning the
interpreter when starting a perl thread.
Given the above, here's a vision of where I'd like us to get to:
Picture a treemap where the outermost rectangle represents data memory
usage reported by the OS. Within that we'd have top-level boxes that
represent:
1) an area for memory that the perl allocator regards as free
(possibly subdivided by chunk size).
2) an area for each package namespace level, like the current NYTProf
treemap, in which we could have boxes for memory used by:
2a) package globals
2b) package lexicals
2c) package subroutines
2c1) memory used by code - measured by traversing the OP trees
2c2) memory used by lexicals, including recursive pads
3) other data, subdivided by type (AV, HV, etc) & flags (tmp,pad etc)
(would be nice to identify ref loops)
4) the remaining 'dark memory' that we can't account for
(ideally zero, if we can account for all the memory the OS gave us)
We need a policy for dealing with reference counts > 1. For example,
when measuring the memory usage of a global variable that holds a
reference, keep following the reference(s) so long as the ref count == 1.
It wouldn't be reasonable to associate any 'deeper' data with the
specific global variable unless we made much more expensive checks
for ref loops (which we could do in future).
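The "follow while the ref count is 1" policy can be sketched in C with a toy value type (not real SVs — the struct and function names here are invented for illustration):

```c
#include <stddef.h>

typedef struct Val {
    int refcnt;          /* how many things point at this value */
    size_t size;         /* bytes used by the value itself */
    struct Val *inner;   /* value it references, or NULL */
} Val;

/* Charge to a variable the size of everything reachable from it,
 * but stop as soon as a value is shared (refcnt > 1): shared data
 * can't fairly be attributed to a single owner without the more
 * expensive ref-loop checks mentioned above. */
size_t attributed_size(const Val *v)
{
    size_t total = 0;
    while (v) {
        total += v->size;
        if (v->inner && v->inner->refcnt == 1)
            v = v->inner;       /* sole owner: keep following */
        else
            break;              /* shared or absent: stop here */
    }
    return total;
}
```

So a chain of uniquely-referenced values is charged in full to the variable at its root, while anything with a second owner is left for other accounting (e.g. bucket 3 in the treemap above).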
So. What to do? Some suggestions:
Add a write_memory_usage() function. Call it from close_output_file().
In write_memory_usage() call get_mstats() (defined in perl's malloc.c)
and write the information returned to the data file in some reasonable
form. (See Perl_dump_mstats() for how to interpret the data.)
Need to handle the case where perl's malloc isn't being used.
In write_memory_usage() call some function, which you'll have to write,
to find out how much data memory the OS thinks perl has allocated.
Calling getrusage() and using the .ru_idrss struct element would work
for most non-linux unix systems. Linux can stat some /proc/... thingy.
A windows port will probably appear magically via someone like Jan.
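A minimal sketch of that OS-query function, assuming POSIX. ru_idrss is the integral unshared data size on many BSD-derived systems; Linux leaves it at 0, and one likely candidate for the /proc route is /proc/self/statm (the function name and the 4 KB page assumption are mine, not NYTProf's):

```c
#include <stdio.h>
#include <sys/resource.h>

/* Rough data-memory figure in KB as the OS sees it, or -1 on failure. */
long os_data_memory_kb(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    /* BSD-ish systems fill in ru_idrss; Linux reports 0 here. */
    if (ru.ru_idrss > 0)
        return (long)ru.ru_idrss;

    /* Linux fallback: second field of /proc/self/statm is resident pages. */
    {
        FILE *fp = fopen("/proc/self/statm", "r");
        long size_pages, resident_pages;
        if (fp && fscanf(fp, "%ld %ld", &size_pages, &resident_pages) == 2) {
            fclose(fp);
            return resident_pages * 4;  /* assumes 4 KB pages, for the sketch */
        }
        if (fp)
            fclose(fp);
    }
    return -1;
}
```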
Getting those bits of info into the reports would be the next step.
A simple summary on the index page. Plus a new page giving the
per-bucket details from get_mstats.
To get deeper than this we need to start walking arenas and the package
tree. I'm not very keen on adding that walking code to NYTProf. I think
it would be better to extend packages like Devel::Gladiator and
Devel::Size to expose visitor functions (in C) that will do the walking
and call a callback for each item visited. NYTProf (and other code)
could then use those functions.
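The visitor-style C interface might look something like this. The names are hypothetical — this is not Devel::Gladiator's or Devel::Size's actual API — but it shows the division of labour: the walking module owns the traversal, and callers like NYTProf supply a callback that decides what to record.

```c
#include <stddef.h>

/* Stand-in for whatever the walker iterates over (SVs, in reality). */
typedef struct Item { struct Item *next; size_t size; } Item;

/* Callback signature: one call per item visited, plus caller context. */
typedef void (*item_visitor)(const Item *item, void *ctx);

/* The walking module exposes this: visit every item exactly once.
 * NYTProf's callback might accumulate sizes per package, for example. */
void visit_items(const Item *head, item_visitor cb, void *ctx)
{
    for (const Item *it = head; it; it = it->next)
        cb(it, ctx);
}

/* Example callback: total up the sizes of everything visited. */
static void sum_cb(const Item *item, void *ctx)
{
    *(size_t *)ctx += item->size;
}
```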
> HOW MUCH:
> I'd like to help, with the caveat that I know very little about
> perlguts, but I'm a generally savvy programmer and problem-solver.
You and Robin Smidsrød have shown particular interest in this, though
I'm sure there are many others who could and would help.
Hopefully between you all some progress can be made.
My priority is to get v3 released (once I get around to fixing the
exception-thrown-from-xsub issue - for which I think I have a simple
workaround) and then to work on java2perl6 / DBI for perl6.
So I won't be driving the memory profiling effort, but I will
certainly help.
> But I do like tools, and NYTProf is one of my favorites!
Mine too!
Tim.
p.s. See also
http://groups.google.com/group/develnytprof-dev/browse_frm/thread/1df4cba3001cd4e4#
http://perl.markmail.org/search/?q=measuring+memory+footprint+date%3A200906
p.p.s. Someone *really* needs to refactor the reporting code into modules.
Partly for our ongoing maintenance sanity, but mainly to enable others
to develop plug-in modules to perform extra kinds of reporting. I think
some sort of pluggable MVC approach is needed.
p.p.p.s. I'm happy to create a branch for this. Also, I wonder if moving
to git would help encourage more contribution.
An alternative and complementary approach: add a function that calls
get_mstats() and returns a simple total of allocated memory.
That would be fast enough to invoke on every perl subroutine call.
In the NYTProf subroutine profiler code, call that sub and record the
value, in the same way that it handles the time. In other words, call it
before and after the sub call and record the difference, and also
accumulate the difference into a global to be factored into the
calculation for sub calls higher up the call stack.
That would give us, for each subroutine calling location, the memory
growth/shrinkage caused by the sub, and caused by the subs that the sub
called. Like inclusive and exclusive time, we'd have inclusive and
exclusive memory allocation. Cool!
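The bookkeeping above can be sketched in C. This is illustrative, not NYTProf's actual profiler code: alloc_total stands in for a fast get_mstats() total, the fixed-depth frame stack mirrors perl's call stack, and the function names are invented.

```c
#define MAX_DEPTH 64

static long alloc_total;            /* stand-in for the allocator's running total */
static long entry_total[MAX_DEPTH]; /* allocator total at each frame's entry */
static long kids_alloc[MAX_DEPTH];  /* memory attributed to callees, per frame */
static int depth;

/* On entry: snapshot the allocator total, reset the callee accumulator. */
void profile_sub_entry(void)
{
    entry_total[depth] = alloc_total;
    kids_alloc[depth] = 0;
    depth++;
}

/* On return: inclusive = growth since entry; exclusive = inclusive minus
 * what our callees accounted for; then push the inclusive figure up to
 * the caller's "kids" accumulator, just as the time profiler does. */
void profile_sub_return(long *incl, long *excl)
{
    depth--;
    *incl = alloc_total - entry_total[depth];
    *excl = *incl - kids_alloc[depth];
    if (depth > 0)
        kids_alloc[depth - 1] += *incl;
}
```

E.g. if an outer sub allocates 10 KB itself, an inner sub it calls allocates 5 KB, and the outer sub allocates 2 KB more after the call, the inner call reports 5/5 inclusive/exclusive and the outer reports 17 inclusive, 12 exclusive.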
Tim.
p.s. A separate and complementary feature would be an option to stream
sub call info out to the data file as sub calls happen. That would let
users track allocation and freeing over time, rather than just seeing
totals at the end.