However, I'm having difficulty in understanding the results, or maybe
I'm just not finding it in the manual. How do I interpret the fields?
What are cnt/cntT/cntP/sys/sysr/sysP? Also, have you considered
generating output that can be visualized by kcachegrind?
|-java.lang.ref.ReferenceQueue.remove() cnt=41522/
cntT=9.223372036854776E15%/cntP=9.223372036854776E15%
sys=4167152.247ms/sysr=0.0ms/sysP=100.0%
| |-java.lang.ref.ReferenceQueue.remove() cnt=20761/
cntT=9.223372036854776E15%/cntP=50.0% sys=2083576.124ms/sysr=0.0ms/
sysP=50.0%
| |-java.lang.Object.wait() cnt=20761/
cntT=9.223372036854776E15%/cntP=50.0% sys=2083576.124ms/
sysr=2083576.124ms/sysP=50.0%
| <|-java.lang.ref.Finalizer$FinalizerThread.run() cnt=20761/
cntT=9.223372036854776E15%/cntP=50.0% sys=2083576.124ms/sysr=0.0ms/
sysP=50.0%
| <|-java.lang.ref.ReferenceQueue.remove() cnt=20761/
cntT=9.223372036854776E15%/cntP=50.0% sys=2083576.124ms/sysr=0.0ms/
sysP=50.0%
I'm glad that you have met your expectations with this profiler.
Although the next version seems to take a bit longer time than
expected, it is on the way out...
> However, I'm having difficulty in understanding the results, or maybe
> I'm just not finding it in the manual. How do I interpret the fields?
> What are cnt/cntT/cntP/sys/sysr/sysP? Also, have you considered
> generating output that can be visualized by kcachegrind?
cnt = sample count
cntT = percentage of the sample count compared to the total. the
actual live version contains a bug, it has been fixed in the SVN
version
cntP = percentage of the sample count compared to the parent node's
count. the same bug here
sys = Estimated system time. This is an experimental calculated value
that tried to give an time guess in the sampling profiler field,
because it looked good at first in our measurements. I wouldn't
suggest to take it serious now...
sysr = the 'real' time spent in the actual method, this is an other
estimate, same notes go here...
sysP = the percentage compared to the parent node's value. Given that
we are talking about the division of two estimated value, this is not
something you should highly rely on...
I'd use only the cnt value to look for performance bottlenecks. Please
let us know if you have further questions or need guidance.
Regards,
Istvan
I'm definitely interested to see what improvements you can add. I have
a java profiler that can pinpoint my performance bottlenecks, and the
results seem to be fairly non-perturbing. If I divide the number of
samples in non-blocked situations by the sampling rate, I should be
measuring the used CPU time. For CPU bound applications, the number is
within 25\% of correct.
I have a few improvements of my own to share.
I've made a generator to kcachegrind, which is a *really* nice open
source profile visualization tool. Check out the screen shots and the
website. My generator uses the undocumented 'output.method.filter' to
indicate functions to be completely removed from the profile, meaning
that the counts through these functions are subtracted from their
parents. Using this to filter thread sleep and thread blocking
functions, I can pinpoint where CPU is being used. 'Method.filter'
remains and functions by removing functions from appearing in the
profile, but accounts for their time in their parents.
Further integration is possible. For instance, I've not included
source code lines, which can be used to disambiguate overloaded
function names. Overloaded functions with the same name are merged
together; these might be distinguishable by using source code lines?
An example command line, generating a report on a profile of the
report generator is:
java -cp ~/source/Misc/oktech-profiler/hu.oktech.profiler/bin
hu.oktech.profiler.report.kcachegrind.KCachegrindReport input=tmp/
profiler/2010-04-09-08-10-24.dump.gz report.sampling.tree=true
'output.method.filter=-java.lang.Object.wait!,-
java.lang.Thread.sleep!,-sun.misc.Unsafe.park!,-
sun.nio.ch.FileDispatcher.read0!,-
sun.nio.ch.ServerSocketChannelImpl.accept!,-
java.net.SocketInputStream.socketRead0!,+.*' > /tmp/grind1.out &
Running this on my current codebase indicates that the report
generator spends about 80% of its time parsing strings and integers,
15% in decompressing, and 4% in generating the kcachegrind report.
(Contrast the output with using method.filter or no filter.)
I have a few other changes. As it only takes 10-20 minutes to collect
1gb of stacktraces in the dump file, I added a hack that gzips the
dump files on the fly giving 30:1 compression. The correct fix would
be a denser and redundancy-reducing format for dumping stack traces,
which would also make the report generator faster. I also added some
code to make the reader more robust to corrupt dump files, so I can
visualize the profile results before the program finishes running.
I will be submitting issues in the tracker with the individual
patches.
Scott