> It's been a few years since I last profiled the kernel (probably a decade :-) but it was always a monolithic compiled kernel. Now I want to profile a module, but I'm not finding a lot of specific instructions as to how to do this. For example, how to get the loaded addresses for the modules taken into account, or what arguments need to be added to the compile of the module to make sure it has any code stubs that may be needed, etc.
>
> If you know anything about these subjects, or related ones (e.g. using
> the performance counters in the kernel/modules), I'd love to get your input and maybe turn out a doc on how to do this.
>
> This would be for -current (9) or 8.0R.
Hi Julian:
You might want to take a look at using the 'lockstat' command, in particular its '-I' option (see http://docs.sun.com/app/docs/doc/816-5212/6mbcdgk0m?a=view or lockstat(1)). lockstat depends on the dtrace and the pseudo driver ksyms bits in the kernel. The ksyms driver provides lockstat with a complete symbol table (including all the symbols of any loaded kernel modules). See http://people.freebsd.org/~sson/ksyms/ksyms.4.txt or ksyms(4).
See http://wiki.freebsd.org/DTrace for how to add the dtrace bits, and add 'device ksyms' to your kernel config file. You will need to load both the 'dtraceall' and 'ksyms' kernel modules before you can successfully use the 'lockstat' command.
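The setup described above might look roughly like the sketch below. It assumes a kernel already built with the DTrace options from the wiki page and with 'device ksyms'. The run helper, the DRY_RUN variable and the lockstat_profile function name are made up for illustration, and the 10 second default is arbitrary.

```shell
#!/bin/sh
# Sketch only: load the modules lockstat depends on, then profile.
# Assumes a kernel built with DTrace support and 'device ksyms'.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# lockstat_profile [seconds]  (hypothetical helper name)
lockstat_profile() {
    secs="${1:-10}"
    run kldload dtraceall
    run kldload ksyms
    # -I samples on the profiling interrupt; lockstat collects data
    # until the command it was given (here: sleep) exits.
    run lockstat -I sleep "$secs"
}
```

Running `DRY_RUN=1 lockstat_profile 30` prints the three commands without touching the system, which is a cheap way to check them before running as root.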
Best Regards,
-stacey.
1) If hwpmc is not compiled into your kernel, kldload hwpmc
2) Run pmcstat to begin taking samples (make sure that whatever you are
profiling is busy doing work first!):
pmcstat -S unhalted-cycles -O /tmp/samples.out
The -S option specifies what event you want to use to trigger
sampling. The unhalted-cycles is the best event to use if your
hardware supports it; pmc will take a sample every 64K non-idle CPU
cycles, which is basically equivalent to sampling based on time. If
the unhalted-cycles event is not supported by your hardware then the
instructions event will probably be the next best choice (although it's
nowhere near as good, as it will not be able to tell you, for example,
if a particular function is very expensive because it takes a lot of
cache misses compared to the rest of your program). One caveat with
the unhalted-cycles event is that time spent spinning on a spinlock or
adaptively spinning on a MTX_DEF mutex will not be counted by this
event, because most of the spinning time is spent executing an hlt
instruction that idles the CPU for a short period of time.
Modern Intel and AMD CPUs offer a dizzying array of events. They're
mostly only useful if you suspect that a particular kind of event is
hurting your performance and you would like to know what is causing
those events. For example, if you suspect that data cache misses are
causing you problems you can take samples on cache misses.
Unfortunately, on some of the newer CPUs (namely the Core2 family,
because that's what I'm doing most of my profiling on nowadays) I find
it difficult to figure out just what event to use to profile based on
cache misses. man pmc will give you an overview of pmc, and there are
manpages for every CPU family supported (e.g. man pmc.core2).
3) After you've run pmcstat for "long enough" (a proper definition of
long enough requires a statistician, which I most certainly am not,
but I find that for a busy system 10 seconds is enough), Control-C it
to stop it*. You can use pmcstat to post-process the samples into
human-readable text:
pmcstat -R /tmp/samples.out -G /tmp/graph.txt
The graph.txt file will show leaf functions on the left and their
callers beneath them, indented to reflect the callchain. It's not too
easy to describe and I don't have sample output available right now.
Another interesting tool for post-processing the samples is
pmcannotate. I've never actually used the tool before but it will
annotate the program's source to show which lines are the most
expensive. This of course needs unstripped modules to work. I think
that it will also work if the GNU "debug link" is in the stripped
module pointing to the location of the file with symbols.
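Steps 1 to 3 above can be strung together roughly as in the sketch below. The function names, the run helper and the DRY_RUN variable are invented for illustration. The -k option is included on the assumption that it names the directory where pmcstat looks for the kernel and its modules (defaulting to /boot/kernel), which matters when the modules being profiled live elsewhere; check pmcstat(8) on your release before relying on it.

```shell
#!/bin/sh
# Sketch only: the three steps as small shell functions.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: load hwpmc. If it is compiled in or already loaded, kldload
# reports an error; that is harmless here, so the status is ignored.
load_hwpmc() {
    run kldload hwpmc || true
}

# Step 2: system-wide sampling. Runs until interrupted with Control-C.
# start_sampling [event] [samples-file]
start_sampling() {
    event="${1:-unhalted-cycles}"
    samples="${2:-/tmp/samples.out}"
    run pmcstat -S "$event" -O "$samples"
}

# Step 3: turn the raw samples into a callgraph text file.
# make_callgraph [samples-file] [graph-file] [kernel-dir]
make_callgraph() {
    samples="${1:-/tmp/samples.out}"
    graph="${2:-/tmp/graph.txt}"
    kerneldir="${3:-/boot/kernel}"   # assumption: where kernel + .ko files live
    run pmcstat -R "$samples" -k "$kerneldir" -G "$graph"
}
```

A typical session would be load_hwpmc, then start_sampling while the workload is busy, Control-C, then make_callgraph. Passing "instructions" as the first argument to start_sampling covers hardware without the unhalted-cycles event.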
* Here's a tip I picked up from Joseph Koshy's blog: to collect
samples for a fixed period of time (say 1 minute), have pmcstat run the
sleep command:
pmcstat -S unhalted-cycles -O /tmp/samples.out sleep 60
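The same tip can be wrapped in a small function so the duration and the event are parameters. This is only a sketch: sample_for, run and DRY_RUN are hypothetical names, and the defaults simply mirror the command above.

```shell
#!/bin/sh
# Sketch only: bounded sampling using the "pmcstat ... sleep N" trick.
# Set DRY_RUN=1 to print the command instead of running it.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sample_for [seconds] [event] [samples-file]
sample_for() {
    secs="${1:-60}"
    event="${2:-unhalted-cycles}"
    samples="${3:-/tmp/samples.out}"
    # Sampling is still system-wide; sleep only bounds how long
    # pmcstat keeps collecting before it exits on its own.
    run pmcstat -S "$event" -O "$samples" sleep "$secs"
}
```

For example, `sample_for 10 instructions /tmp/instr.out` would take a 10 second sample using the instructions event.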
Thanks for all this.
BTW I just tried the old kgmon/gprof profiling as a control.
It appears that on amd64 it doesn't work: gprof can't read the file
that the kernel puts out. (useful!)