Warren
I would have to ask: what's wrong with using gprof?
On a related topic: Has anyone tried using SystemTap with userspace
DTrace-compatible probes to measure anything about total kernel +
userspace usage of OCaml programs? i.e., this sort of thing:
http://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps
but with OCaml programs.
Rich.
--
Richard Jones
Red Hat
> On Sun, Feb 28, 2010 at 04:16:03PM -0800, Warren Harris wrote:
>> I would like to determine what percentage of my application's cpu
>> time
>> is spent in the garbage collector (for tuning purposes, but also just
>> to monitor the overhead). Is there any way to obtain this information
>> short of using gprof? Additional information provided by Gc.stat
>> would
>> be ideal, or perhaps a Gc.alarm that was called at the beginning of
>> the gc cycle, but neither of these seem to exist.
>
> I would have to ask: what's wrong with using gprof?
What's wrong with it is that it provides no way to monitor gc overhead
in an active service.
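For reference, what the stock runtime does expose today: `Gc.quick_stat` returns allocation and collection counters (though not wall-clock time spent in the GC), and a live service can sample it cheaply. A minimal sketch:

```ocaml
(* Sample the GC counters in a running service. Gc.quick_stat is cheap
   (it does not trigger a collection) but reports counts and words, not
   the time spent collecting. *)
let report_gc_counters () =
  let s = Gc.quick_stat () in
  Printf.printf "minor_collections=%d major_collections=%d minor_words=%.0f\n"
    s.Gc.minor_collections s.Gc.major_collections s.Gc.minor_words
```

This gives GC *activity*, which is the closest thing to overhead available without touching the runtime.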
On Tue, Mar 2, 2010 at 12:11 PM, Warren Harris <warrens...@gmail.com> wrote:
>
> On Mar 1, 2010, at 12:54 AM, Richard Jones wrote:
>
>> On Sun, Feb 28, 2010 at 04:16:03PM -0800, Warren Harris wrote:
>>>
>>> I would like to determine what percentage of my application's cpu time
>>> is spent in the garbage collector (for tuning purposes, but also just
>>> to monitor the overhead). Is there any way to obtain this information
>>> short of using gprof? Additional information provided by Gc.stat would
>>> be ideal, or perhaps a Gc.alarm that was called at the beginning of
>>> the gc cycle, but neither of these seem to exist.
>>
>> I would have to ask: what's wrong with using gprof?
>
> What's wrong with it is that it provides no way to monitor gc overhead in an
> active service.
I would have recommended using oprofile on linux, which I greatly
prefer to GCC's built-in profiling support for profiling C programs.
It has a low and tunable overhead, and because it's a sampling
profiler it doesn't perturb the results anywhere near as much as
standard profiling instrumentation.
Unfortunately last time I checked it had poor OCaml support (no
support for unwinding the OCaml call stack, so no context-sensitivity
in the profiles). That said, you probably don't need
context-sensitivity to determine the fraction of execution time spent
in the GC.
It might be worth a try...
Peter
You can have a look at:
http://ocamlviz.forge.ocamlcore.org
This allows you to instrument your code and watch GC activity. I think
that with a little help on the program side, you can be quite precise
about the GC without using gprof at all. This should also be more
lightweight than gprof.
Regards,
Sylvain Le Gall
>
> I would have recommended using oprofile on linux, which I greatly
> prefer to GCC's built-in profiling support for profiling C programs.
> It has a low and tunable overhead, and because it's a sampling
> profiler it doesn't perturb the results anywhere near as much as
> standard profiling instrumentation.
>
> Unfortunately last time I checked it had poor OCaml support (no
> support for unwinding the OCaml call stack, so no context-sensitivity
> in the profiles). That said, you probably don't need
> context-sensitivity to determine the fraction of execution time spent
> in the GC.
Peter - gprof with OCaml works quite well: http://caml.inria.fr/pub/docs/manual-ocaml/manual031.html
Warren
Sylvain,
Thanks! This looks very promising. I'll give it a try.
Warren
On Tue, Mar 2, 2010 at 3:08 PM, Warren Harris <warrens...@gmail.com> wrote:
>
> Peter - gprof with OCaml works quite well:
> http://caml.inria.fr/pub/docs/manual-ocaml/manual031.html
I'm fully aware of gprof and ocaml's support of profiling.
OCaml's profiling support works by adding calls to the _mcount library
function at the entry point to every compiled function, which takes
approximately 10 instructions on x86 (pushes and pops to save
registers, and a call instruction). The _mcount function records
function call counts, and is also responsible for producing the call
graph. Separately, the profile library samples the program counter at
some frequency, which lets us work out in which functions the program
is spending its time.
Using OCaml's profiling support has three problems:
1) programs compiled with profiling are slower, and
2) the profiling instrumentation itself distorts the resulting profile, and
3) the call graph accounting is inaccurate.
Let's discuss each of these in turn:
Problem (1) is simply that your program has extra overhead from all of
those _mcount calls, which occur on every function invocation. You
can't turn them off, and you can't make them happen less frequently.
It's an all-or-nothing proposition. It would be unusual to include
profiling instrumentation in a production system.
Problem (2) is a little more subtle. Recall that the profiling
instrumentation adds ~10 instructions to the start of each function,
regardless of its size. For a large function, this may be a negligible
overhead. For a small function, say one that was only 5 or 10
instructions in size to begin with, that is a substantial overhead.
Since we determine how much time is spent in each function by sampling
the program counter, small and frequently called functions will appear
to take relatively longer than larger functions in the resulting
profile. Small functions are common in OCaml code so we should see an
appreciable amount of distortion.
Problem (3) is a criticism of the _mcount mechanism in general. For
each function f(), the profiler knows (a) how long we spent executing
f() in total, and (b) how many times each of f()'s callers invoked
f(). We do not know how much time f() spent executing on behalf of any
given caller. If we assume that all of f()'s invocations took
approximately the same amount of time, then we can use the caller
counts to approximate the time spent executing f() on behalf of each
caller. However, the assumption that f() always takes approximately
the same amount of time is not necessarily a good one. I think it's an
especially bad assumption in a functional program.
These problems are avoided by using a sampling profiler like oprofile
or shark, which samples an _uninstrumented_ binary at a particular
frequency. Because the binary is unmodified, we can turn profiling on
and off on a running system, avoiding point (1); furthermore we can
adjust the sampling rate so profiling overhead is low enough to be
tolerable. Since there is no instrumentation added to the program, the
resulting profile does not suffer from the distortion of point (2).
Some profilers (e.g. shark on Mac OS X) can deal with point (3) as
well --- all we need to do is record a complete stack trace at
sampling time.
My point was that oprofile or one of its cousins (e.g. shark) is
probably adequate for your needs. You can set the sampling rate low
enough that your service can run more or less as normal. To determine
GC overhead, you simply need to look at the total amount of time spent
in the various GC functions of the runtime.
Peter
This said, I've wanted to measure GC overhead with it, and found it
lacking in that regard. If anyone finds a way to do this, I'm interested.
I've not done much with its tree viewer, and the hashtbl monitor only
indicated that the Hashtbl.t I was using had an amazingly horrible hash,
and was filling only 3% of its buckets, and had thousands of entries in
a few buckets. I tried to fix the hash, but ended up switching to a Map.
Lastly, the ability to mark certain values in memory and have it track
the total size and/or count of those values seems interesting, but it
gets quite slow. I've not had much luck with it.
Overall, good job. But is it going to die or stay maintained?
E
Thanks, this is excellent info. I've been using both gprof and shark
and understand the tradeoffs. I really was looking for a way to just
provide a simple live "gc overhead" number that we could graph along
with a bunch of other server health stats for our zenoss monitors.
Looks like I'd need to hack my runtime a bit to get this though.
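Short of hacking the runtime, one sketch of a live proxy metric: sample `Gc.quick_stat` on a timer and emit per-interval deltas of the collection counters. The `emit` function here is a placeholder for whatever feeds the monitoring system (Zenoss, in this case); it is not part of any library.

```ocaml
(* Build a sampler that reports how many minor/major collections ran
   since the previous sample. [emit] is a hypothetical hook into the
   monitoring system: it takes a metric name and an integer value. *)
let gc_activity_sampler emit =
  let last = ref (Gc.quick_stat ()) in
  fun () ->
    let now = Gc.quick_stat () in
    emit "gc.minor_collections"
      (now.Gc.minor_collections - (!last).Gc.minor_collections);
    emit "gc.major_collections"
      (now.Gc.major_collections - (!last).Gc.major_collections);
    last := now
```

This doesn't give a true time percentage, but graphed alongside CPU usage it shows when GC activity spikes, with no runtime modification.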
Warren
On 03-03-2010, Edgar Friendly <thele...@gmail.com> wrote:
> On 03/02/2010 06:09 PM, Warren Harris wrote:
>> On Mar 2, 2010, at 2:03 PM, Sylvain Le Gall wrote:
>>>
>>> You can have a look at:
>>> http://ocamlviz.forge.ocamlcore.org
>>>
>> Thanks! This looks very promising. I'll give it a try.
>>
>
> Overall, good job. But is it going to die or stay maintained?
>
Well, I hope it will stay maintained. At least the source code, bugs and
releases on the forge will stay there for a long time (I can promise
that much). And whenever the current developers become inactive,
OCamlCore.org administrators can transfer ownership to others (with
notice to the current owner, of course):
http://www.ocamlcore.org/philosophy/ (point 4)
But anyway, this kind of tool is targeted at debugging in the first
place. It is not a mandatory piece of a software/library. You can live
without it once you have finished debugging/profiling your program.
So I would say that long-term maintenance should not bother users for
now. It is actually something that is lightweight and that works. To my
mind this is enough to consider using it.
If a lot of people start using it, it is highly probable that it will
stay maintained.
Regards
2010/3/3 Warren Harris <warrens...@gmail.com>:
> Thanks, this is excellent info. I've been using both gprof and shark and
> understand the tradeoffs. I really was looking for a way to just provide a
> simple live "gc overhead" number that we could graph along with a bunch of
> other server health stats for our zenoss monitors.
So simply enable gprof on OCaml binaries and look at the total
fraction of time spent in OCaml GC functions!
http://caml.inria.fr/pub/ml-archives/caml-list/2003/01/e8ee9d44073ff9cb7d257fef86bc8f53.en.html
Best regards,
david
Here's what I use to measure GC overhead in my programs.
There's a small modification to the runtime, so as to track the time
spent in caml_minor_collection, and a helper ml module. It tracks and
prints the time spent between calls to the start() and stop() functions
of the helper module, as well as the number of collections, number of
bytes allocated, etc.
It is rather coarse-grained of course. I use it to profile the
different parts of a compiler: parsing, typing, optimizations, code
generation, etc.
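Olivier's patch itself isn't reproduced here, but the helper module he describes might look roughly like this. Everything below is a hypothetical sketch: it assumes a patched runtime exporting a primitive (here called `caml_gc_elapsed_time`) that returns the cumulative seconds spent in caml_minor_collection; no such primitive exists in the stock runtime.

```ocaml
(* Hypothetical helper, assuming a patched runtime that accumulates
   the seconds spent inside caml_minor_collection. The external below
   is NOT part of the standard OCaml runtime. *)
external gc_elapsed_time : unit -> float = "caml_gc_elapsed_time"

let t0 = ref 0.0
let gc0 = ref 0.0

let start () =
  t0 := Unix.gettimeofday ();
  gc0 := gc_elapsed_time ()

let stop label =
  let total = Unix.gettimeofday () -. !t0 in
  let gc = gc_elapsed_time () -. !gc0 in
  Printf.printf "%s: %.3fs total, %.3fs in GC (%.1f%%)\n"
    label total gc (100.0 *. gc /. total)
```

Bracketing each compiler phase with `start ()` / `stop "parsing"` then yields the per-phase GC percentage directly.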
--
Olivier
> I would like to determine what percentage of my application's cpu time
> is spent in the garbage collector (for tuning purposes, but also just
> to monitor the overhead). Is there any way to obtain this information
> short of using gprof? Additional information provided by Gc.stat would
> be ideal, or perhaps a Gc.alarm that was called at the beginning of
> the gc cycle, but neither of these seem to exist.
>
> Warren
In my code I have a cache structure that implements a least-recently-used
ordering of cached objects. As objects are used the list grows, and every
now and then I have to shrink it. It would be really nice if I could
shrink it just before every major GC cycle or compaction.
I also saw that Gc.alarm is called AFTER each GC cycle. But maybe that
doesn't matter, as after one cycle is before the next. Since the major
cycle is done in increments, the next cycle probably starts right after
the last one finishes, relative to the time a full cycle takes.
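The end-of-major-cycle hook mentioned here does exist as `Gc.create_alarm`, which runs a callback at the end of each major collection, effectively "just before" the next one. A minimal sketch, where `shrink_cache` stands in for the actual LRU-trimming function:

```ocaml
(* Run a callback at the end of every major GC cycle. [shrink_cache]
   is a placeholder for the real cache-trimming logic. *)
let shrink_cache () =
  print_endline "trimming LRU cache"

let alarm = Gc.create_alarm shrink_cache

(* Later, to stop the callbacks: *)
let () = Gc.delete_alarm alarm
```

Note the callback runs during a collection's finalization phase, so it should avoid heavy allocation itself.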
MfG
Goswin