Cool! I'll be glad to help test the MagLev part. There will probably be some unforeseen gotchas. I'll see if I can come up with some magic to factor out, or at least measure, the overhead on the level 0 tier measurements. Ilya Grigorik said he was concerned about startup times, so from my perspective it would be fine to include them even though MagLev's are probably longer.

-- Monty

----- Forwarded Message -----
From: "Wayne E. Seguin" <wayne...@gmail.com>
To: "Monty Williams" <monty.w...@gemstone.com>
Cc: "Wayne E. Seguin" <wayne...@gmail.com>
Sent: Friday, January 8, 2010 1:16:46 PM GMT -08:00 US/Canada Pacific
Subject: Re: RVM and MagLev
Monty,

After some discussion with Evan we have concluded that the internal timeout will always be more accurate than using an external timeout. When using an external timeout you end up including the VM startup times (which we do not want factored in; we want post-startup times). Additionally, you will record any other unknown process interactions, like process-scheduling interruptions. These items are removed and/or minimized by doing it all internally.

Does that make sense?

~Wayne

----- Original Message -----
From: "Wayne E. Seguin" <wayne...@gmail.com>
To: "Monty Williams" <monty.w...@gemstone.com>
Cc: "Wayne E. Seguin" <wayne...@gmail.com>
Sent: Friday, January 8, 2010 1:11:21 PM GMT -08:00 US/Canada Pacific
Subject: Re: RVM and MagLev
Thanks for bringing this to my attention. I am talking with Evan about it right now and will get back to you on it. Additionally, I am currently in the middle of adding MagLev to rvm :)

~Wayne

On Jan 08, 2010, at 15:46, Monty Williams wrote:

Do the "Tiers" scripts you run use the Rubinius compare.rb? If so, they may not be measuring exactly what you think.
Here is my observation:
The old RBS scripts ran in MRI and spawned a separate process which executed the code to be measured.
cmd = "#{timeout} -t #{limit} #{vm} #{runner} #{name} #{iterations} #{report} #{meter_memory}"
However, compare.rb executes the harness code in the system under test, so the total time reported includes an additional component beyond the actual code being benchmarked. This is most significant in the tier 0 tests.
I found this when a 3x difference between MagLev and RBX (rbx being faster) turned into a 10x difference when run using compare.rb.
It seems to me the prior RBS methodology was more accurate, and something similar should be used, or else just use "time ruby benchmark.rb" and let the OS be the leveler by keeping Ruby out of everything except the code under test.
What do you think? Or maybe you've already accounted for this?
-- Monty
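To make the distinction concrete, here is a minimal sketch (not the actual RBS or compare.rb code) contrasting external wall-clock timing around a spawned VM with internal, post-startup timing reported by the child itself. The fib workload, file handling, and VM choice are all illustrative assumptions.

```ruby
require 'benchmark'
require 'tempfile'

# Write a tiny benchmark script whose body prints its own post-startup
# timing; the recursive fib is just a stand-in workload.
bench = Tempfile.new(['bench_fib', '.rb'])
bench.write(<<~RUBY)
  def fib(n); n < 2 ? n : fib(n - 1) + fib(n - 2); end
  require 'benchmark'
  # Internal timing: measured inside the VM under test, after startup.
  puts Benchmark.realtime { fib(20) }
RUBY
bench.close

vm = 'ruby' # swap in the VM under test (rbx, maglev, ...)

# External timing: wall clock around the whole child process, so it
# includes VM startup plus any scheduler noise.
t0 = Time.now
internal = `#{vm} #{bench.path}`.to_f # child prints its post-startup time
external = Time.now - t0

puts format('external (incl. startup): %.4fs', external)
puts format('internal (post-startup):  %.4fs', internal)
```

The gap between the two numbers is exactly the startup-plus-harness component Monty describes; for a long-starting VM like MagLev it dominates short tier 0 runs.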
----- Original Message -----
From: "Wayne E. Seguin" <wayne...@gmail.com>
To: "Monty Williams" <monty.w...@gemstone.com>
Cc: "Wayne E. Seguin" <wayne...@gmail.com>
Sent: Thursday, January 7, 2010 6:07:37 PM GMT -08:00 US/Canada Pacific
Subject: Re: RVM and MagLev

I wrote a few scripts to generate those benchmarks and results. I am working on refining them and will be publishing them sometime soon. For now I intend to run the benchmarks every few days and post them to that site. We should have MagLev in rvm before the weekend is out, I believe.

~Wayne
What we do on the IronRuby/IronPython team, with our legacy internal perf infrastructure, is calibrate the iteration count for all new benchmarks so that the variance is below some low threshold. This is similar to your proposal of ensuring a minimum execution time, but more directly tied to reducing variance. Some benchmarks could do with less than 100 ms and some might need more than 100 ms to get the same low variance.
Btw, we also throw out the extreme end-points, as there do seem to be outliers even if the overall variance is low.
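The calibration idea above can be sketched roughly as follows. This is a hypothetical illustration, not the IronRuby team's actual infrastructure: grow the iteration count until the run-to-run coefficient of variation drops below a target, trimming the extreme end-points before computing statistics. The target, sample count, and workload are assumptions.

```ruby
require 'benchmark'

TARGET_CV = 0.05 # target coefficient of variation (assumption)
SAMPLES   = 7    # timed runs per calibration step (assumption)

# Time the stand-in workload SAMPLES times at a given iteration count.
def time_runs(iterations, samples)
  Array.new(samples) do
    Benchmark.realtime { iterations.times { |i| Math.sqrt(i) } }
  end
end

# Double the iteration count until variance is low enough (or we hit a cap).
def calibrate(max_iterations = 1 << 24)
  iterations = 1_000
  while iterations < max_iterations
    times   = time_runs(iterations, SAMPLES).sort
    trimmed = times[1..-2] # throw out the extreme end-points
    mean = trimmed.sum / trimmed.size
    var  = trimmed.sum { |t| (t - mean)**2 } / trimmed.size
    cv   = Math.sqrt(var) / mean
    return [iterations, cv] if cv < TARGET_CV
    iterations *= 2
  end
  [max_iterations, nil] # cap reached without hitting the target
end

iters, cv = calibrate
puts "calibrated to #{iters} iterations (CV #{cv ? format('%.3f', cv) : 'n/a'})"
```

Tying calibration to variance rather than a fixed minimum runtime means each benchmark runs just long enough to be stable, as described above.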
--
The GitHub project is located at http://github.com/acangiano/ruby-benchmark-suite
RBS has macro-benchmarks for RDoc and Rails. We could add more benchmarks using other real-world libraries, apps or gems. If we all add just a couple of benchmarks each, we can get a good collection. The next shootout could then also request library authors to contribute benchmarks for their gems, which will create an even larger suite of real-world code (though you would want to restrict it to the most popular gems for the suite to be considered relevant).
I may not be able to get to this for a couple of weeks, but can try to add a couple of macro-benchmarks after that.
Yeah asking the community for more would be good.
My next thought for a macro benchmark would be a Sinatra benchmark of some kind (since Sinatra seems like a common benchmark target and is runnable on more Ruby VMs than Rails is).
-rp