benchmark server almost ready


Chuck Remes

Jun 19, 2011, 9:36:54 AM
to rubini...@googlegroups.com, us...@jruby.codehaus.org
(Cross-posted to JRuby ML)

I am nearly done building out the benchmarking server for JRuby and Rubinius. I am using the benchmarks from the Rubinius project (rubinius/benchmark/core/*). There are around 90 benchmark files, each with a varying number of tests, so each run currently produces 522 reports.

I have several questions and concerns that I'll detail here. We can discuss them either here or on IRC; I have no preference.

Issues...

1. Benchmark names
The name given in the benchmark (x.report("this is the name")) acts as the database key for the codespeed server. For continuity, we do *not* want these names changing once they are in the system; otherwise old results will become decoupled from new results.

e.g. core/array/bench_append.rb, "append array" would be a different report from "Append array"

So, I'd like to "lock down" the benchmark files so the names on old benchmarks don't get changed without an extra sign-off.

Additionally, some of the names are a tad long (though very descriptive). The codespeed server wants to limit names to 30 chars (I modified the source to get beyond this), but we probably do want to cap the length at something reasonable like 100 chars; I pulled that number out of the air. A length limit improves readability on the codespeed website.
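To make the key issue concrete, here's a minimal sketch of a benchmark file. (It uses Ruby's stdlib Benchmark rather than the benchmark_suite gem the real files use, since the x.report label works the same way in both.)

```ruby
require 'benchmark'

# Minimal sketch of how a report label becomes the codespeed key.
# (The real files use the benchmark_suite gem; stdlib Benchmark is
# used here only because x.report takes a label string the same way.)
results = Benchmark.bm(20) do |x|
  # "append array" is the database key: changing it to "Append array"
  # would start a brand-new report series on the codespeed server.
  x.report("append array") do
    arr = []
    100_000.times { |i| arr << i }
  end
end
```

Renaming that label, even by one character of capitalization, orphans all the history stored under the old key.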


2. Number of runs (EC2 charges)
At this time I have it configured to do two runs for JRuby and two for Rubinius on each commit to their respective repositories. The JRuby runs are for '--client' and '--server', while the Rubinius runs are '-Xint' and JIT-enabled. Given the number of benchmarks, each commit takes around 1 hour to complete. Since EC2 charges by the instance hour consumed, this may get expensive.

Couple that with the fact that *every* commit triggers a run, and suddenly we see a bunch of runs queue up due to documentation changes, typo fixes, etc. Perhaps commits should be filtered so that those that do not contain any source file changes (*.java, *.rb, *.hpp, *.cpp, etc.) are ignored.
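A filter along those lines could be sketched like this. The extension list and the git invocation are illustrative assumptions on my part, not what the pipeline does today:

```ruby
# Sketch of the commit filter suggested above: skip benchmark runs for
# commits that touch no source files. The extension list and the git
# command are illustrative, not what benchmark_pipeline currently does.
SOURCE_EXTENSIONS = %w[.java .rb .hpp .cpp .c .h].freeze

def source_change?(changed_paths)
  changed_paths.any? { |path| SOURCE_EXTENSIONS.include?(File.extname(path)) }
end

def benchmarkable?(repo_dir, sha)
  # git diff-tree lists the files a commit touched, one per line
  paths = Dir.chdir(repo_dir) do
    `git diff-tree --no-commit-id --name-only -r #{sha}`.split("\n")
  end
  source_change?(paths)
end
```

A doc-only commit (README, *.txt, etc.) would then never enter the queue.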

Overall, I'm concerned that this benchmark server is going to be expensive.


3. Benchmark repository
I think the benchmarks need to be moved out from under the Rubinius repository and into a separate one. Also, they each need to be modified to work out-of-the-box with the benchmark_suite gem (they don't work with it now). I already know what changes need to be made, so this is relatively simple.

The concern becomes project ownership of these benchmarks. Should it be under the JRuby or Rubinius organizations? What happens when we add MRI or Maglev or IronRuby or ?? to the list of supported runtimes?

I suggest that the benchmarks be spun off to a new "ruby community" organization. This change will ensure more open access to runtimes that are not part of the Engine Yard ecosystem (if that's a concern to anyone) while also providing a semblance of impartiality.


4. Database & webserver performance
I need to get back to my "day job" now that this works. I don't have the time or skills to replace sqlite3 with a more enterprisey DB, nor do I know how to reconfigure Django to use apache/nginx/etc. instead of the default Python-based web server.

So, if this site is popular and gets lots of hits, it may fall over.

I'd like to put out a general call to the Ruby community for some volunteer(s) to step in and do this optimization. It's probably okay to wait and see whether the site actually does get crushed before spending any more time on it.


5. Github post-receive hook
The code that runs the benchmarks looks at commits in two ways. Upon startup it updates each repository and looks up the last 25 commits, enqueueing any it hasn't seen yet (tracked in a small sqlite db). After this initialization step, it starts listening for commits published from github.

So we'll need each project to add a post-receive hook that points at this server. It's super easy to do through the admin control panel. Both projects probably already have these set up for their CI servers.
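The hook handler's job is mostly payload parsing. Here's a sketch of that step; the field names assume GitHub's current post-receive format, which POSTs a 'payload' form field of JSON containing a "commits" array:

```ruby
require 'json'

# Sketch of what the post-receive endpoint does with each hook delivery.
# GitHub's post-receive hook POSTs a 'payload' form field of JSON with a
# "commits" array; the field names below assume that format.
def extract_commit_ids(payload_json)
  data = JSON.parse(payload_json)
  data.fetch("commits", []).map { |commit| commit["id"] }
end

# Each extracted id would then be checked against the small sqlite db of
# already-seen commits and enqueued for a benchmark run only if new.
```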

The code I wrote to handle all of this bookkeeping will be pushed to github early this week. The project name is currently 'benchmark_pipeline', so if you hate the name please speak up now. The code ain't perfect, but it is a decent foundation to work from. With a little refactoring it could probably be adopted by any project that wanted to set up its own codespeed server to track performance.

cr

Chuck Remes

Jun 19, 2011, 10:17:11 AM
to us...@jruby.codehaus.org, rubini...@googlegroups.com
On Jun 19, 2011, at 8:44 AM, Michael Klishin wrote:

> Chuck Remes wrote:


>> The concern becomes project ownership of these benchmarks. Should it be under the JRuby or Rubinius organizations? What happens when we add MRI or Maglev or IronRuby or ?? to the list of supported runtimes?
>>
>> I suggest that the benchmarks be spun off to a new "ruby community" organization. This change will ensure more open access to runtimes that are not part of the Engine Yard ecosystem (if that's a concern to anyone) while also providing a semblance of impartiality.

> Chuck,
>
> I think at least initially this project can reside under github.com/rubyspec umbrella. rubyspec is what (hopefully) all Ruby implementations collaborate on and many people in the Ruby community are aware of this fact. Vendor neutrality was also one of the rubyspec goals.
>
> Just an idea.

Good point. I like the idea.

cr

Chuck Remes

Jun 19, 2011, 10:19:16 AM
to us...@jruby.codehaus.org, rubini...@googlegroups.com
On Jun 19, 2011, at 8:36 AM, Chuck Remes wrote:

> 2. Number of runs (EC2 charges)
> At this time I have it configured to do two runs for JRuby and two for Rubinius on each commit to their respective repositories. The JRuby runs are for '--client' and '--server' while for Rubinius they are '-Xint' and JIT-enabled. Based on the number of benchmarks, each commit takes around 1 hour to complete. Since EC2 charges by the instance hour consumed, this may get expensive.
>
> Couple that with the fact that *every* commit causes a run and suddenly we see a bunch of runs queue up due to documentation changes, fixing typos, etc. Perhaps the commits need to be filtered so that those that do not contain any source file changes (*.java, *.rb, *.hpp, *.cpp, etc) will be ignored.
>
> Overall, I'm concerned that this benchmark server is going to be expensive.
>
>

> 5. Github post-receive hook

I suppose I should have written that we don't *need* to benchmark every commit. Like the pypy project, this could be a cron job that runs once per day and benchmarks the latest HEAD. That would certainly eliminate the concern over EC2 costs.

cr


Karol Hosiawa

Jun 19, 2011, 12:36:28 PM
to rubinius-dev


On Jun 19, 3:36 pm, Chuck Remes <cremes.devl...@mac.com> wrote:

> 1. Benchmark names
> The name given in the benchmark (x.report("this is the name")) acts as the database key for the codespeed server. For continuity, we do *not* want these names changing once they are in the system otherwise old results will become decoupled from new results.
>
> e.g. core/array/bench_append.rb, "append array" would be a different report from "Append array"
>
> So, I'd like to "lock down" the benchmark files so the names on old benchmarks don't get changed without an extra sign-off.
>
> Additionally, some of the names are a tad long (though very descriptive). The codespeed server wants to limit names to 30 chars (I modified the source to get beyond this) but we probably do want to cap the length at something reasonable like 100 chars. I pulled that out of the air. The length limit increases readability on the codespeed website.

Could we use the syntax tree of the do...end block in x.report("name")
do ... end instead of its name? Something along these lines:

Digest::MD5.hexdigest("proc {|i| i.to_i}".to_sexp.to_s)

would generate a usable key for the db. If the benchmark code changes,
the id will change too, but that's good, because it's not the same
benchmark anymore, right? (Renaming a variable would also change the
sexp signature and the id, but I guess that is less likely to happen
than renaming a report.) We would then be free to rename benchmarks
later, e.g. if there are many more benchmarks in the future, names
start to look similar, and we want to rename some slightly without
being limited by the number of chars.
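A runnable version of that idea on plain MRI might look like the following; stdlib Ripper stands in for Rubinius's String#to_sexp here, and the helper name is hypothetical:

```ruby
require 'ripper'
require 'digest/md5'

# Sketch of the sexp-digest key idea: hash the parse tree of the
# benchmark block so the db key survives report renames. Standard
# Ruby's Ripper substitutes for Rubinius's String#to_sexp here.
def benchmark_key(block_source)
  sexp = Ripper.sexp(block_source)
  Digest::MD5.hexdigest(sexp.to_s)
end
```

Identical block bodies hash to the same key regardless of the report name, while any code change (including, as noted above, a mere variable rename) produces a new key.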


>
> 2. Number of runs (EC2 charges)
> At this time I have it configured to do two runs for JRuby and two for Rubinius on each commit to their respective repositories. The JRuby runs are for '--client' and '--server' while for Rubinius they are '-Xint' and JIT-enabled. Based on the number of benchmarks, each commit takes around 1 hour to complete. Since EC2 charges by the instance hour consumed, this may get expensive.
>
> Couple that with the fact that *every* commit causes a run and suddenly we see a bunch of runs queue up due to documentation changes, fixing typos, etc. Perhaps the commits need to be filtered so that those that do not contain any source file changes (*.java, *.rb, *.hpp, *.cpp, etc) will be ignored.
>
> Overall, I'm concerned that this benchmark server is going to be expensive.
>

If the cost of an EC2 instance is going to be high and we're planning
things like post-commit hooks, maybe it's worth considering a dedicated
box? Prices in Europe start at 29 EUR/month.

Thanks
--
Karol Hosiawa

Michael Klishin

Jun 19, 2011, 9:44:27 AM
to us...@jruby.codehaus.org, rubini...@googlegroups.com
Chuck Remes wrote:

> The concern becomes project ownership of these benchmarks. Should it be under the JRuby or Rubinius organizations? What happens when we add MRI or Maglev or IronRuby or ?? to the list of supported runtimes?
>
> I suggest that the benchmarks be spun off to a new "ruby community" organization. This change will ensure more open access to runtimes that are not part of the Engine Yard ecosystem (if that's a concern to anyone) while also providing a semblance of impartiality.
Chuck,

I think at least initially this project can reside under github.com/rubyspec umbrella. rubyspec is what (hopefully) all Ruby implementations collaborate on and many people in the Ruby community are aware of this fact. Vendor neutrality was also one of the rubyspec goals.

Just an idea.


MK

http://github.com/michaelklishin
http://twitter.com/michaelklishin

Rob Heittman

Jun 19, 2011, 11:31:07 AM
to rubini...@googlegroups.com, us...@jruby.codehaus.org
I'm concerned that spawning an EC2 instance for each run may introduce a lot of noise into the resulting benchmarks. I've found that the hardware backing each instance can differ noticeably in characteristics and shared load from run to run, even within the same AZ. Every now and then a "bad," underperforming instance launch may raise spurious alarm bells. In a large operation across multiple instances, we always expect something like a 20% difference between the first instances to report a result and the last ones ... and are never surprised by a few failures.

The noise would be less for a single long-lived instance, and least for a dedicated piece of hardware with no competing loads. Not that I have one I could reliably volunteer ... but for this very specific mission, it seems like run-to-run improvement or degradation only really has meaning if as many other variables as we can manage are taken away. Maybe some Ruby activist has a nice old server spinning away in their data center, feeling lonely because all the fun has gone to the cloud?

Of course, I could be fretting over nothing -- this could be in the "try it and see" category.