Comment on reporting methodology

Shri

Nov 14, 2009, 12:52:07 AM
to Ruby Benchmark Suite
Hi,

I had sent this to Antonio's email, but it may not have gotten past
the spam filters. Resending it here.

I work on the IronRuby project. Thanks for the work on the RBS and for
publishing numbers periodically. It's very useful to have a standard
suite for us to do performance tuning with.

I had a few comments on the methodology of the numbers you published
at http://antoniocangiano.com/2009/08/03/performance-of-ironruby-ruby-on-windows/
and suggestions to make the numbers more useful for future reports:

1. The large micro-benchmark table reports the “Total time which is
the runtime for the subset of benchmarks that were successfully
executed by all three implementations”. IMO, this total is a sum of
apples and oranges, and hence not a good number to report. For
example, if IronRuby took 10 times longer (compared to its current
time of 0.063) to run micro-benchmarks/bm_app_factorial.rb, it would
hardly change the total. However, even a 1% degradation in micro-
benchmarks/bm_lucas_lehmer.rb (with a current execution time of
159.078) would have a larger impact on the total. Since the different
rows cannot be compared to each other, it would be better to report a
ratio relative to one of the implementations (using negative ratios
when the target is slower than the baseline). The total would then be
a better number that gives equal weight to all the benchmarks.
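To illustrate the ratio scheme above, here is a rough sketch; the
target times are made up for illustration, the sign convention is my
reading of the suggestion (positive when the target is at least as
fast as the baseline, negative when slower), and this is not actual
RBS reporting code:

```ruby
# Per-benchmark ratios relative to a baseline implementation, so every
# benchmark carries equal weight in the aggregate regardless of its
# absolute runtime. Times are in seconds; target values are invented.
baseline = { "bm_app_factorial" => 0.063, "bm_lucas_lehmer" => 159.078 }
target   = { "bm_app_factorial" => 0.630, "bm_lucas_lehmer" => 160.669 }

ratios = baseline.keys.map do |name|
  b, t = baseline[name], target[name]
  # Positive ratio when the target is faster or equal, negative when
  # slower, per the convention suggested above.
  t <= b ? b / t : -(t / b)
end

average_ratio = ratios.sum / ratios.size
puts average_ratio
```

With these numbers, a 10x slowdown on the tiny factorial benchmark now
moves the aggregate just as much as a 10x slowdown on lucas_lehmer
would, which is the point of normalizing per row.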

2. Could you run IronRuby as “ir.exe -X:NoAdaptiveCompilation“? That
command-line option forces compilation to happen eagerly instead of
in the background. Using the option will give more stable numbers,
show the best performance IronRuby can deliver, and also yield ratios
that are independent of the number of iterations a benchmark is run
for. (Without that option, IronRuby initially interprets all methods
for faster startup.)

3. Besides reporting the average (arithmetic mean), could you also
report the median? This will filter out extreme cases where one
implementation happens to be blazingly fast or blazingly slow, and
give more weight to the numbers in the middle, which are more
representative of how average apps will perform.
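A quick sketch of the mean-vs-median point, with invented run times
(this is not RBS code, just an illustration of how one outlier skews
the mean but not the median):

```ruby
# Median of a list of timings: sort, then take the middle element
# (or the average of the two middle elements for an even count).
def median(times)
  sorted = times.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

runs = [0.95, 1.02, 1.04, 1.10, 9.87] # one blazingly slow outlier
puts "mean:   %.3f" % (runs.sum / runs.size) # dragged up by the outlier
puts "median: %.3f" % median(runs)           # stays near typical runs
```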

Hope the comments make sense. Let me know if you have any questions.

Regards,
Shri