Refactoring RBS


Brian Ford

Feb 17, 2009, 1:34:28 PM
to Ruby Benchmark Suite, Evan Phoenix, Joe Arnold
Hi folks,

I've imported the RBS files into the Rubinius repository and
refactored the structure quite a bit. You can have a look at the files
here (see the rbs and utils directories):

http://github.com/evanphx/rubinius/tree/74afe0f2cdedc06f716fcf9df98e425edfef78a7/benchmark
http://github.com/evanphx/rubinius/blob/74afe0f2cdedc06f716fcf9df98e425edfef78a7/rakelib/bench.rake

See the benchmark/utils/README for a more detailed explanation of the
layered approach I took to organizing the benchmarks.

Basically, the implementation being benchmarked should not be required
to actually implement a lot of fancy features like Threads. Only the
features being benchmarked should be required. The Bench runner needs
just a few things: classes, methods, instance variables, Time.now, and
the ability to open and write to a file. The harness/runner/framework
should never place too big a burden on the implementation. That is a
lesson well learned from working with RubySpec.

The one big outstanding issue to be resolved is a monitor script/
program for Windows. I outline a few solutions in the README. It's not
that big a problem and I'd be happy to maintain the monitor program.

Anyway, if you are interested in the approach I've used and someone
wants to port RBS along these lines, I think it would be a big
benefit. If you'd like me to, I can fork RBS and commit these changes
and someone can pull that in, or I could commit directly.

If you don't like this approach for fundamental reasons, no worries. I
would be curious to understand your reasoning. In its current state, I
don't see RBS being very useful for any but the most mature
implementations. Experimentation and benchmarking should go hand in
hand and the current format of the benchmarks really hampers that.
Also, having every benchmark file include chunks of the runner and
reporter, and require a bunch of files, just clutters up the
benchmarking code.

Cheers,
Brian

Roger Pack

Feb 18, 2009, 12:49:14 PM
to ruby-bench...@googlegroups.com
That's a good idea.
So you're saying basically move the "tests" to a separate file, kind of like

ITERATIONS.times {
  system("#{RUBY_VM} scriptname")
}
type of thing?

If so, one advantage is that it clears the garbage each time between runs.

A drawback is that how does jruby get much of a chance to warm up and
show the effects of its warmup by tracking the iteration speed?
[i.e. iteration 1 is slow, iteration 2 is faster, 3 faster, choose the fastest]

If we want to still allow jruby a nice warmup time we'd seem to want
to do that in every single script?

Should we assume the "running" ruby implementation is fully featured?

Thoughts?
-=r

Brian Ford

Feb 18, 2009, 1:09:20 PM
to Ruby Benchmark Suite
On Feb 18, 9:49 am, Roger Pack <rogerdp...@gmail.com> wrote:
> That's a good idea.
> So you're saying basically move the "tests" to a separate file, kind of like
>
> ITERATIONS.times {
>   system("#{RUBY_VM} scriptname")}
>
> type of thing?

You should take a look at the code. Follow the links I pasted. In
particular, look at benchmark/utils/bench.rb and some of the benchmark
files. Also, read benchmark/utils/README.

>
> If so, one advantage is that it clears the garbage each time between runs.
>
> A drawback is that how does jruby get much of a chance to warm up and
> show the effects of its warmup by tracking the iteration speed?
> [i.e. iteration 1 is slow, iteration 2 is faster, 3 faster, choose the fastest]

Not sure what you mean by JRuby not getting a chance to warm up. If
previous code run in other benchmarks has affected HotSpot's behavior
such that a particular benchmark's results are different, how reliable
is the benchmark at all?

I think my structure will give much more accurate and isolated
results. Also, the Bench class emits more information. The times array
is emitted in the order the times were captured, so you could post-
process and see what changes between runs. Also, max, min, mean,
median, and standard deviation are emitted to the YAML file. For
Rubinius, I'm going to add GC stats collected during the run. Other
implementations could add stats, too. That's the flexibility offered
by emitting a YAML file.
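
To illustrate the kind of report described (a rough sketch only; the
field names here are illustrative, not necessarily Bench's actual
output format):

require 'yaml'

# Sketch: compute the summary statistics described above and emit them
# as YAML. Purely illustrative of the idea, not Bench's actual code.
def emit_report(times)
  sorted = times.sort
  mean   = times.inject(0.0) { |sum, t| sum + t } / times.size
  stddev = Math.sqrt(times.inject(0.0) { |sum, t| sum + (t - mean) ** 2 } / times.size)

  { "times"  => times,          # in the order they were captured
    "min"    => sorted.first,
    "max"    => sorted.last,
    "mean"   => mean,
    "median" => sorted[sorted.size / 2],
    "stddev" => stddev }.to_yaml
end

puts emit_report([0.52, 0.41, 0.40, 0.39])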

>
> If we want to still allow jruby a nice warmup time we'd seem to want
> to do that in every single script?
>
> Should we assume the "running" ruby implementation is fully featured?

No, emphatically we should not assume this. As I already mentioned,
benchmarking should be available to help with experimentation. Only
the Ruby features needed in the benchmark should be required. The
Bench runner I wrote adds a bare minimum of additional features.

Cheers,
Brian

>
> Thoughts?
> -=r
>
> On Tue, Feb 17, 2009 at 11:34 AM, Brian Ford <bri...@gmail.com> wrote:
>
> > Hi folks,
>
> > I've imported the RBS files into the Rubinius repository and
> > refactored the structure quite a bit. You can have a look at the files
> > here (see the rbs and utils directories):
>
> >http://github.com/evanphx/rubinius/tree/74afe0f2cdedc06f716fcf9df98e4...
> >http://github.com/evanphx/rubinius/blob/74afe0f2cdedc06f716fcf9df98e4...

Brian Ford

Feb 18, 2009, 1:19:08 PM
to Ruby Benchmark Suite
Actually, I may have misunderstood you. Let's use more precise
terminology, consistent with the code we are talking about (the code I
refactored). The Ruby implementation that runs Rake should be
full-featured. But the Ruby implementation that runs bench.rb and the
benchmark files themselves should not be assumed to be full-featured.

There are 4 layers in my approach, from inside (the bench file) out
(rake):

1. the bench file itself; it does the work being timed.
2. the bench.rb (Bench) runner
3. the monitor program that exec's the implementation being
benchmarked, which runs bench.rb and the benchmark file(s).
4. rake tasks for conveniently running a whole directory or the whole
suite.

Only 1 and 2 are run by (or need to be run by) the implementation
being benchmarked.
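
As a rough illustration of layer 3 (not the actual monitor -- the
names, environment variables, and kill strategy are placeholders, and
this assumes a POSIX system), a monitor could look something like:

# Hypothetical sketch of a monitor: start the implementation under test
# running bench.rb on one benchmark file, and kill it if it runs too long.
vm         = ENV["VM"] || "ruby"          # implementation being benchmarked
limit      = (ENV["TIMEOUT"] || 300).to_i # seconds allowed per bench file
bench_file = ARGV[0]

pid = fork { exec(vm, "benchmark/utils/bench.rb", bench_file) }

deadline = Time.now + limit
until Process.waitpid(pid, Process::WNOHANG)
  if Time.now > deadline
    Process.kill("KILL", pid)   # abort the run
    Process.waitpid(pid)
    break
  end
  sleep 1
end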

Hope this clears up the confusion.

Brian

Charles Oliver Nutter

Feb 18, 2009, 1:41:38 PM
to ruby-bench...@googlegroups.com
Brian Ford wrote:
>> A drawback is that how does jruby get much of a chance to warm up and
>> show the effects of its warmup by tracking the iteration speed?
>> [i.e. iteration 1 is slow, iteration 2 is faster, 3 faster, choose the fastest]
>
> Not sure what you mean by jruby not getting a chance to warm up. If
> previous code run in other benchmarks has affected hotspot's behavior
> such that a particular benchmark's results are different, how reliable
> is the benchmark at all?

I think the problem here is that implementations with slow startup, or
that perform optimizations later in their runs (JRuby, IronRuby,
probably MagLev, and some day Rubinius) are going to be "starting from
scratch" if a short benchmark is launched anew for every iteration. It
penalizes implementations that exchange startup speed for long-term
execution speed.

Granted, I haven't looked at your modifications, but one key thing I
know Antonio was trying to avoid was startup/warmup time skewing results.

I also agree that the benchmarks themselves shouldn't assume a full
complement of Ruby features, but I'm not sure how interesting it is to
benchmark Ruby implementations that can't run the entire suite. It's
become obvious to me that it's that last 5% of Ruby features that end up
really biting you in the ass as far as performance, so running only 95%
of Ruby code is probably not indicative of eventual "complete impl"
performance.

- Charlie

Brian Ford

Feb 18, 2009, 2:13:52 PM
to Ruby Benchmark Suite
On Feb 18, 10:41 am, Charles Oliver Nutter <charles.nut...@sun.com>
wrote:
> Brian Ford wrote:
> >> A drawback is that how does jruby get much of a chance to warm up and
> >> show the effects of its warmup by tracking the iteration speed?
> >> [i.e. iteration 1 is slow, iteration 2 is faster, 3 faster, choose the fastest]
>
> > Not sure what you mean by jruby not getting a chance to warm up. If
> > previous code run in other benchmarks has affected hotspot's behavior
> > such that a particular benchmark's results are different, how reliable
> > is the benchmark at all?
>
> I think the problem here is that implementations with slow startup, or
> that perform optimizations later in their runs (JRuby, IronRuby,
> probably MagLev, and some day Rubinius) are going to be "starting from
> scratch" if a short benchmark is launched anew for every iteration. It
> penalizes implementations that exchange startup speed for long-term
> execution speed.

If warmup is an issue, then standard "warmup" code should be run, not
just whatever happens to have run before you get to a particular
bench. That is so murky as to be totally useless IMO.

This is simply solved by having Bench.run invoke a block passed to
Bench.warmup for a particular implementation.
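Something like this, say (purely hypothetical -- Bench has no such hook
today, and all the names here are invented):

# Hypothetical warmup hook. An implementation (or a per-VM config file)
# registers a block; the runner calls it, untimed, before the timed runs.
class Bench
  def self.warmup(&block)
    @warmup = block
  end

  def self.run_warmup(input)
    @warmup.call(input) if @warmup
  end
end

def do_the_work(n)
  n.times { [] }   # stand-in for whatever the bench file actually exercises
end

Bench.warmup do |input|
  10.times { do_the_work input }   # untimed passes to let a JIT settle
end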

I'm still not convinced on the warmup question, though. If code that
"warms up" the VM is not in the code path that the implementation
exercises for that particular bench, I don't see how it is relevant.
If it is, you can increase the iterations and see the times decrease
further into the run. That would seem to give relevant results.

Brian

Tim Bray

Feb 18, 2009, 2:28:50 PM
to ruby-bench...@googlegroups.com
So, I'm going to argue both for and against Charlie here:

On Wed, Feb 18, 2009 at 10:41 AM, Charles Oliver Nutter <charles...@sun.com> wrote:

I think the problem here is that implementations with slow startup, or
that perform optimizations later in their runs (JRuby, IronRuby,
probably MagLev, and some day Rubinius) are going to be "starting from
scratch" if a short benchmark is launched anew for every iteration. It
penalizes implementations that exchange startup speed for long-term
execution speed.

Right, but both data points are interesting; the cold-start and warm-start.  Like it or not, Ruby is among other things a *scripting* language; I always use MRI for quick command-line one-offs.  One of the really useful things a benchmark can tell us is how bad the startup performance penalty is for some implementations, and conversely, how well they can run once they've self-tuned a bit.  So I wouldn't want to discard either set of information.
 
I also agree that the benchmarks themselves shouldn't assume a full
complement of Ruby features, but I'm not sure how interesting it is to
benchmark Ruby implementations that can't run the entire suite.

It's interesting if you're building an implementation of Ruby.  That's one audience.  I'm in the other audience, a person who writes Ruby apps that assume a full working Ruby, and need information as to performance trade-offs between implementations.  I don't begrudge Brian and friends the use of RBS as a dev tool, as long as that doesn't compromise its utility as a real-world-ish performance reporter.  -T

Roger Pack

Feb 18, 2009, 2:39:31 PM
to ruby-bench...@googlegroups.com
On Wed, Feb 18, 2009 at 11:09 AM, Brian Ford <bri...@gmail.com> wrote:
>
> On Feb 18, 9:49 am, Roger Pack <rogerdp...@gmail.com> wrote:
>> That's a good idea.
>> So you're saying basically move the "tests" to a separate file, kind of like
>>
>> ITERATIONS.times {
>> system("#{RUBY_VM} scriptname")}
>>
>> type of thing?
>
> You should take a look at the code. Follow the links I pasted. In
> particular, look at benchmark/utils/bench.rb and some of the benchmark
> files. Also, read benchmark/utils/README.

I'll take a look :)
-=r

Brian Ford

Feb 18, 2009, 3:02:18 PM
to Ruby Benchmark Suite
On Feb 18, 11:28 am, Tim Bray <timb...@gmail.com> wrote:
> So, I'm going to argue both for and against Charlie here:
>
> On Wed, Feb 18, 2009 at 10:41 AM, Charles Oliver Nutter <charles.nut...@sun.com> wrote:
>
> > I think the problem here is that implementations with slow startup, or
> > that perform optimizations later in their runs (JRuby, IronRuby,
> > probably MagLev, and some day Rubinius) are going to be "starting from
> > scratch" if a short benchmark is launched anew for every iteration. It
> > penalizes implementations that exchange startup speed for long-term
> > execution speed.
>
> Right, but both data points are interesting; the cold-start and warm-start.
>  Like it or not, Ruby is among other things a *scripting* language; I always
> use MRI for quick command-line one-offs.  One of the really useful things a
> benchmark can tell us is how bad the startup performance penalty is for some
> implementations, and conversely, how well they can run once they've
> self-tuned a bit.  So I wouldn't want to discard either set of information.

Let's make these discussions concrete. It seems we're talking past
each other with at least two or three different meanings of "warmup".

The code is at the links I pasted, but just to save a click or two,
here are the relevant bits:

class Bench

  # ...

  def self.run(parameters, &block)
    bench.run(parameters, &block)
  end

  def reset
    @times = []
    @mean  = nil
  end

  # Time the block n times for each input; each timing is appended to
  # the times array in the order it was captured.
  def run(inputs)
    parameterized inputs do |input|
      n.times do
        start = Time.now

        yield input

        finish = Time.now
        times << finish - start
      end
    end
  end

  # For each input: reset the collected times, run the block, then sort
  # the times and write the report for that input.
  def parameterized(inputs)
    write_parameters inputs

    inputs.each do |input|
      reset
      self.parameter = input

      yield input

      @sorted = times.sort
      write_report
    end
  end

  # ...
end

# bm_my_neat_method.rb

def make_some_hay(n)
  # do it
end

Bench.run [10, 20, 30, 40] do |n|
  make_some_hay n
end

This does not benchmark executable startup time. There is a single,
separate benchmark for that.

What this does is run the worker method "iterations" times (n in the
Bench class) for each input in the inputs array passed to Bench.run.

The times are recorded and reported in the order they are generated.
The effect of "warmup" is something that can be post-processed from
the results (possibly JIT and GC jitter as well).
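
For example, a post-processing pass over the emitted YAML might be as
simple as this (the file name and the "times" key are assumptions):

require 'yaml'

# Sketch: read the per-iteration times back out of the YAML report and
# print them in order, so a warmup trend (or GC jitter) is visible.
report = YAML.load_file("bm_my_neat_method.yaml")
times  = report["times"]

times.each_with_index do |t, i|
  puts "iteration %2d: %.6f s" % [i + 1, t]
end
puts "first/last ratio: %.2f" % (times.first / times.last)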

If I were to rework these benches even further, I would force each
input to be a separate bench run, starting a new executable for each
"input" and iterating "iterations" times. But I tried to stay as close
to the existing benchmark files as possible.

The quality of the benchmarks is a whole other discussion. They need
substantial work as well.

>
> > I also agree that the benchmarks themselves shouldn't assume a full
> > complement of Ruby features, but I'm not sure how interesting it is to
> > benchmark Ruby implementations that can't run the entire suite.
>
> It's interesting if you're building an implementation of Ruby.  That's one
> audience.  I'm in the other audience, a person who writes Ruby apps that
> assume a full working Ruby, and need information as to performance
> trade-offs between implementations.  I don't begrudge Brian and friends the
> use of RBS as a dev tool, as long as that doesn't compromise its utility as
> a real-world-ish performance reporter.  -T

The rake bench task runs all the benchmarks just as before. Removing
the necessity for the benchmarked implementation to fully support
Gems, Timeout, and Thread in no way restricts the richness of what can
be benchmarked. It merely compartmentalizes it.

Brian

M. Edward (Ed) Borasky

Feb 18, 2009, 3:49:10 PM
to ruby-bench...@googlegroups.com
On Wed, Feb 18, 2009 at 11:28 AM, Tim Bray <tim...@gmail.com> wrote:
> It's interesting if you're building an implementation of Ruby. That's one
> audience. I'm in the other audience, a person who writes Ruby apps that
> assume a full working Ruby, and need information as to performance
> trade-offs between implementations. I don't begrudge Brian and friends the
> use of RBS as a dev tool, as long as that doesn't compromise its utility as
> a real-world-ish performance reporter. -T

1. If your app is sensitive to start-up time, that's a problem with
the app IMHO. :)

2. I would think with proper statistical summarization, one could use
the RBS as a first guess at which implementation is "fastest in
general overall". But if the application is performance-critical,
you'll need to do performance testing with the actual code anyhow,
rather than a benchmark suite. And once you do select an
implementation, you'll need to optimize the Ruby code. You can't use
"premature optimization is the root of all evil" as an excuse for not
doing performance engineering, at least not while *I'm* watching. :)
--
M. Edward (Ed) Borasky
http://www.linkedin.com/in/edborasky

I've never met a happy clam. In fact, most of them were pretty steamed.

Charles Oliver Nutter

Feb 18, 2009, 3:50:18 PM
to ruby-bench...@googlegroups.com
Brian Ford wrote:
> I'm still not convinced on the warmup question, though. If code that
> "warms up" the vm is not in the code path exercised for the
> implementation on that particular bench code, I don't see how it is
> relevant. If it is, you can increase the iterations and see the times
> decrease further into the run. That would seem to be giving relevant
> results.

Yeah, like I mentioned, I haven't looked at your refactoring yet...I was
just clarifying what Roger meant. It sounds like you understand, and
that the benchmark files themselves would still allow for appropriate
in-process warmup.

- Charlie

Charles Oliver Nutter

Feb 18, 2009, 3:52:44 PM
to ruby-bench...@googlegroups.com
Tim Bray wrote:
> Right, but both data points are interesting; the cold-start and
> warm-start. Like it or not, Ruby is among other things a *scripting*
> language; I always use MRI for quick command-line one-offs. One of the
> really useful things a benchmark can tell us is how bad the startup
> performance penalty is for some implementations, and conversely, how
> well they can run once they've self-tuned a bit. So I wouldn't want to
> discard either set of information.

Both data points are certainly interesting, though the data point about
"performance across a wide range of benchmarks" is probably on the same
level as "startup time". I think RBS's current handling of startup as a
separate benchmark is correct; we want to know the good and the bad of
both cold/startup performance and warm/runtime performance isolated
from one another.

> It's interesting if you're building an implementation of Ruby. That's
> one audience. I'm in the other audience, a person who writes Ruby apps
> that assume a full working Ruby, and need information as to performance
> trade-offs between implementations. I don't begrudge Brian and friends
> the use of RBS as a dev tool, as long as that doesn't compromise its
> utility as a real-world-ish performance reporter. -T

Fair enough; I tend to forget that aspect. I retract my statement about
benchmarking new impls being less interesting; it's more a matter of
perception in the community about those numbers and less about their
value in general.

- Charlie

Charles Oliver Nutter

Feb 18, 2009, 3:53:47 PM
to ruby-bench...@googlegroups.com
Brian Ford wrote:
> # bm_my_neat_method.rb
>
> def make_some_hay(n)
> # do it
> end
>
> Bench.run [10, 20, 30, 40] do |n|
> make_some_hay n
> end
>
> This does not benchmark executable startup time. There is a single,
> separate benchmark for that.

Looks fine to me.

- Charlie

roger...@gmail.com

Feb 21, 2009, 1:40:16 PM
to Ruby Benchmark Suite

> Let's make these discussions concrete. It seems we're talking past
> each other with at least two or three different meanings of "warmup".
>
> The code is at the links I pasted, but just to save a click or two,
> here's the relevant bits:
<snip>

Oh, I gotcha -- so you're suggesting removing rubygems and timeout from
the benchmark files themselves.
The only concern I have there is that I currently use rubygems to load
hitimes [if installed] for more accurate timings, and ruby-proc + WMI
[if installed] for determining RSS on Windows.
It looks something like
begin
  require 'rubygems'
  require 'hitimes'
rescue LoadError
end

in bench.rb

So my concern is that I would lose my precious hitimes! :)
It also attempts to parse /proc/pid/status to determine the RSS on
Linux. Would that also be a problem?
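
For reference, the /proc parsing amounts to something like this (a
sketch only; the real code may read a different field or handle
errors differently):

# Read the resident set size (in kB) from /proc/<pid>/status on Linux.
def linux_rss_kb(pid = Process.pid)
  File.foreach("/proc/#{pid}/status") do |line|
    return line.split[1].to_i if line =~ /^VmRSS:/
  end
  nil
rescue Errno::ENOENT
  nil   # no procfs here (e.g. Mac or Windows)
end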

re: threading [via the use of timeout].
Wait a minute -- we currently use threads for every single benchmark.
That is not good at all, since it only shows how well each benchmark
runs within a threaded environment [not as a single thread -- known to
be far faster for MRI]. I'll make the default timeout -1 [none] [but
still require the file timeout.rb]. Would that work?

So currently that would leave us with a benchmark style of: within the
same process, run this benchmark X times, one after the other.
It checks the RSS from within the running process after each
iteration.
So [in the case of MRI] the first runs might be faster [less garbage
kicking around] and, in the case of JRuby, the later runs might be
faster [warmup has occurred].


> The rake bench task runs all the benchmarks just as before. Removing
> the necessity for the benchmarked implementation to fully support
> Gems, Timeout, and Thread in no way restricts the richness of what can
> be benchmarked. It merely compartmentalizes it.

Thoughts?
-=r

Monty Williams

Feb 21, 2009, 4:08:29 PM
to ruby-bench...@googlegroups.com
/proc/pid/status isn't available on Macs. We need something there, too.

Brian Ford

Feb 21, 2009, 4:25:07 PM
to Ruby Benchmark Suite
On Feb 21, 10:40 am, "rogerdp...@gmail.com" <rogerdp...@gmail.com>
wrote:
> > Let's make these discussions concrete. It seems we're talking past
> > each other with at least two or three different meanings of "warmup".
>
> > The code is at the links I pasted, but just to save a click or two,
> > here's the relevant bits:
>
> <snip>
>
> Oh I gotcha so you're suggesting removal of rubygems and timeout from
> the benchmark files themselves.
> The only concern I have there is that I currently use rubygems to use
> hitimes [if installed] for more accurate timings, and ruby-proc + WMI
> [if installed] for determining RSS on windows.
> it looks something like
> begin
>  require 'rubygems'
>  require 'hitimes'
> rescue LoadError
> end
>
> in bench.rb
>
> So my concern is that I would lose my precious hitimes! :)
> It also attempts to parse /proc/pid/status to determine the RSS on
> linux.  Would that also be a problem?

The Bench runner could conditionally require other features, likely
configured with environment variables. Capturing times for runs is
orthogonal to aborting a process if it exceeds time limits. That is
why I separated the monitor from the runner. Also, as stated
previously, the bare minimum features should be required for simply
recording the timing of the work being benchmarked.
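
For example, the optional requires could be guarded by something like
this (the variable name and structure are only a guess at how it might
look, not what bench.rb does today):

# Only pull in optional niceties when asked for, so a bare implementation
# can still run the bench with nothing but core features.
if ENV["BENCH_EXTRAS"]
  begin
    require 'rubygems'
    require 'hitimes'   # higher-resolution timing, when available
  rescue LoadError
    # silently fall back to Time.now
  end
end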

It is also fine to capture additional stats besides times. For
example, I'm going to add GC stats for Rubinius. The YAML output is
flexible and accommodates extending in this way.

Brian

Roger Pack

Feb 21, 2009, 4:46:56 PM
to ruby-bench...@googlegroups.com
> The Bench runner could conditionally require other features, likely
> configured with environment variables. Capturing times for runs is
> orthogonal to aborting a process if it exceeds time limits. That is
> why I separated the monitor from the runner. Also, as stated
> previously, the bare minimum features should be required for simply
> recording the timing of the work being benchmarked.

We could add another environment variable "BARE_MINIMUM" or what not,
which doesn't require anything. Would that work?

re: macs
patches welcome, as I no longer have my Macbook Pro :)

Thoughts?
-=r

M. Edward (Ed) Borasky

Feb 21, 2009, 5:25:26 PM
to ruby-bench...@googlegroups.com
I don't have a Mac either, but I'm under the impression that the
"native" way to capture performance metrics is with DTrace, and I
think that should also work on Solaris. I *know* we have Solaris
people on this list. :)

Let me poke around and see if I can find something for Windows. :)

Roger Pack

Feb 21, 2009, 5:30:44 PM
to ruby-bench...@googlegroups.com
On Sat, Feb 21, 2009 at 3:25 PM, M. Edward (Ed) Borasky
<zzn...@gmail.com> wrote:
>
> I don't have a Mac either, but I'm under the impression that the
> "native" way to capture performance metrics is with DTrace, and I
> think that should also work on Solaris. I *know* we have Solaris
> people on this list. :)
>
> Let me poke around and see if I can find something for Windows. :)

my current answer for Windows is something like
  require 'rubygems'
  require 'sys/proctable'
  require 'time' # accommodate a sys-proctable 0.7.6 bug
  return Sys::ProcTable.ps(Process.pid).working_set_size
whether good or bad I know not :)
-=r

Roger Pack

Feb 26, 2009, 1:59:04 PM
to ruby-bench...@googlegroups.com
> It is also fine to capture additional stats besides times. For
> example, I'm going to add GC stats for Rubinius. The YAML output is
> flexible and accommodates extending in this way.


OK, I'll assume that BARE_BONES can attempt to capture RAM, if set
[i.e. it's not totally bare bones].
Anything else?
-=r

Roger Pack

Feb 28, 2009, 12:29:01 PM
to ruby-bench...@googlegroups.com
On Tue, Feb 17, 2009 at 11:34 AM, Brian Ford <bri...@gmail.com> wrote:
>
> Hi folks,
>
> I've imported the RBS files into the Rubinius repository and
> refactored the structure quite a bit. You can have a look at the files
> here (see the rbs and utils directories):
>
> http://github.com/evanphx/rubinius/tree/74afe0f2cdedc06f716fcf9df98e425edfef78a7/benchmark
> http://github.com/evanphx/rubinius/blob/74afe0f2cdedc06f716fcf9df98e425edfef78a7/rakelib/bench.rake
>
> See the benchmark/utils/README for a more detailed explanation of the
> layered approach I took to organizing the benchmarks.

Oh, I see what you mean now: have a surrounding script which does the
timeout for you.

Yeah, the layered approach looks way nicer. If I'm reading it
correctly, it should provide about the same functionality as what we
have here -- just with less duplicated code [plus working timeouts].
Patches are welcome. In the meantime I added a BARE_BONES=1 option to
it in a hacky, kludgey attempt to not have to refactor it as widely as
you have done.
Thanks for your work on this.
-=r

Brian Ford

Feb 28, 2009, 2:39:11 PM
to ruby-bench...@googlegroups.com
Yeah, I think the layered approach has many advantages. I'm going to
continue experimenting with and refactoring what I imported into the
Rubinius repo. Some things I think need to be done: add statistically
significant iteration counts, more data analysis options, and audit the
benchmarks to ensure they are timing what they claim to be.
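
One possible shape for "statistically significant iteration counts" (a
sketch of a stopping rule, not anything that exists in the suite yet;
the thresholds are arbitrary):

# Keep iterating until the relative standard error of the mean falls
# below a threshold, or a hard cap on iterations is reached.
def run_until_stable(max_iterations = 100, target_rel_err = 0.02)
  times = []
  max_iterations.times do
    start = Time.now
    yield
    times << Time.now - start

    next if times.size < 5
    mean = times.inject(0.0) { |sum, t| sum + t } / times.size
    var  = times.inject(0.0) { |sum, t| sum + (t - mean) ** 2 } / (times.size - 1)
    break if Math.sqrt(var / times.size) / mean < target_rel_err
  end
  times
end

times = run_until_stable { 100_000.times { "foo".reverse } }
puts times.size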

A bigger question is the goals of the RBS project. Whether it is just
for conducting shootouts, or if it is useful for experimentation, and
whether those goals conflict at all. One possible goal in constructing
benchmarks for RBS is to determine whether characteristics can be
defined that predict and/or identify the reasons for the difference in
micro vs macro performance on a particular implementation.

Cheers,
Brian


Charles Oliver Nutter

Feb 28, 2009, 3:03:16 PM
to ruby-bench...@googlegroups.com
Brian Ford wrote:
> A bigger question is the goals of the RBS project. Whether it is just
> for conducting shootouts, or if it is useful for experimentation, and
> whether those goals conflict at all. One possible goal in constructing
> benchmarks for RBS is to determine whether characteristics can be
> defined that predict and/or identify the reasons for the difference in
> micro vs macro performance on a particular implementation.

After reading this I'm convinced we should at least merge in JRuby's
bench/language benchmarks, since they go a long way toward identifying
execution-level bottlenecks and being useful for implementers, as well
as for users choosing an implementation.

The benchmarks basically go right down the list of AST nodes in a
typical Ruby parser and test each of them in various scenarios. This has
been extremely helpful for us in identifying compiler/interpreter
bottlenecks.

It's not 100% complete, mainly because there are a lot of different AST
nodes in Ruby. But it's a solid start.

http://svn.codehaus.org/jruby/trunk/jruby/bench/language/
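
For a flavor of what such a file looks like (an illustrative sketch in
the same spirit, not an actual file from that directory):

# Isolate a single language construct -- here a one-argument method
# call -- and time tight loops of it several times so trends show up.
def call_one(a)
  a
end

5.times do
  start = Time.now
  i = 0
  while i < 1_000_000
    call_one(i)
    i += 1
  end
  puts "1M calls: #{Time.now - start}s"
end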

- Charlie

M. Edward (Ed) Borasky

Feb 28, 2009, 10:49:37 PM
to ruby-bench...@googlegroups.com
On Sat, Feb 28, 2009 at 12:03 PM, Charles Oliver Nutter
<charles...@sun.com> wrote:
>
> Brian Ford wrote:
>> A bigger question is the goals of the RBS project. Whether it is just
>> for conducting shootouts, or if it is useful for experimentation, and
>> whether those goals conflict at all. One possible goal in constructing
>> benchmarks for RBS is to determine whether characteristics can be
>> defined that predict and/or identify the reasons for the difference in
>> micro vs macro performance on a particular implementation.

I think Tim Bray posted about this a week or so ago. Tim, feel free to
repost / correct my interpretation, but I thought there were two
classes of users for RBS -- Ruby interpreter / compiler implementers
and non-implementers. I guess my belief is that the implementers ought
to be the *principal* audience of the suite. Indeed, if the bulk of
the suite came from the KRI benchmark suite and more pieces are coming
from the JRuby benchmark suite, plus the one that I built (Hilbert
Matrix), it would seem to me that "all that's left to do" is beef up
the analysis tools.

For identifying the reasons for differences between benchmarks for a
single implementation and differences between implementations for a
single benchmark, on any given platform, one ought to be able to use
tools like "oprofile" (Linux), "dtrace" (Mac, Solaris, and IIRC BSD),
CodeAnalyst (AMD processors on Linux and Windows) and VTune (Intel
processors on at least Linux and Windows, and quite possibly Open
Solaris and BSD). I don't own an Intel processor, nor do I have any
platforms except 64-bit Windows Vista and Linux on AMD64 dual core
processors.

I personally think tweaking the "virtual machines" for AMD and Intel
64-bit chips, using the tools provided by AMD and Intel, plus other
technologies available on the web, is an effort worth doing, at least
for JRuby, Rubinius and KRI / 1.9.1. I'm willing to let all versions /
forks of MRI, all the 32-bit versions, and architectures other than
AMD64 / x86_64 be as slow as they are. :) Surely if someone can make a
business case for tuning them, they can afford to hire the developers,
but I think the "community" should do the tweaks I just outlined on
the three main operating systems: Linux, Windows and MacOS X.

I have the "oprofile" / Linux infrastructure pretty much built -- it
could stand some refactoring, but it's sitting up on RubyForge in the
"cougar" project.

svn checkout http://cougar.rubyforge.org/svn/trunk/PTR2 PTR2

I've had CodeAnalyst running but haven't done much with it, since it
isn't formally supported on my version of Linux and the Linux version
is pretty much just a wrapper around "oprofile". But I have it
installed on the Windows system and would use it there.

> After reading this I'm convinced we should at least merge in JRuby's
> bench/language benchmarks, since they go a long way toward identifying
> execution-level bottlenecks and being useful for implementers, as well
> as for users choosing an implementation.
>
> The benchmarks basically go right down the list of AST nodes in a
> typical Ruby parser and test each of them in various scenarios. This has
> been extremely helpful for us in identifying compiler/interpreter
> bottlenecks.
>
> It's not 100% complete, mainly because there's a lot of different AST
> nodes in Ruby. But it's a solid start.
>
> http://svn.codehaus.org/jruby/trunk/jruby/bench/language/
>
> - Charlie

Yes ... the more benchmarks we have that exercise known bottlenecks the better.

Roger Pack

Mar 2, 2009, 7:39:46 AM
to ruby-bench...@googlegroups.com
>> Brian Ford wrote:
>>> A bigger question is the goals of the RBS project. Whether it is just
>>> for conducting shootouts, or if it is useful for experimentation, and
>>> whether those goals conflict at all. One possible goal in constructing
>>> benchmarks for RBS is to determine whether characteristics can be
>>> defined that predict and/or identify the reasons for the difference in
>>> micro vs macro performance on a particular implementation.

>
> I think Tim Bray posted about this a week or so ago. Tim, feel free to
> repost / correct my interpretation, but I thought there were two
> classes of users for RBS -- Ruby interpreter / compiler implementers
> and non-implementers.

Yeah, I think it would be nice if RBS could satisfy both
roles -- comparing Ruby versions as well as helping implementers find
bottlenecks.

So these are tools for implementers? Do you suggest integrating them
into RBS, or is this more of a "here are tools, use them if you'd like"?
Take care.
-=r

Monty Williams

Mar 2, 2009, 11:53:40 AM
to ruby-bench...@googlegroups.com
I run the RBS on MagLev after every build just to make sure there are no significant changes. I'm switching to Brian Ford's version for that. I like the flexibility provided by the YAML output, and the cleanness of his layered approach. It's quite useful to me that any error conditions are preserved so I usually don't have to rerun broken ones manually to get an idea what happened.

I also run the original version since I think that's what most non-implementers would expect to see. I'd prefer not to maintain two versions of any benchmarks I come up with, though. If we add more output processors to Brian's, e.g. web pages & wiki markup, maybe it will drive the implementation more towards his approach.

At present I don't need Ed's analysis tools, but they could be useful down the road when we have time to focus on performance. MagLev generates hundreds of tuning statistics already, and they'll meet our needs for quite a while. Unfortunately, they're not portable to other implementations.

-- Monty

----- Original Message -----
From: "M. Edward (Ed) Borasky" <zzn...@gmail.com>
To: ruby-bench...@googlegroups.com
Sent: Saturday, February 28, 2009 7:49:37 PM GMT -08:00 US/Canada Pacific
Subject: [RBS] Re: Refactoring RBS


On Sat, Feb 28, 2009 at 12:03 PM, Charles Oliver Nutter
<charles...@sun.com> wrote:
>
> Brian Ford wrote:
>> A bigger question is the goals of the RBS project. Whether it is just
>> for conducting shootouts, or if it is useful for experimentation, and
>> whether those goals conflict at all. One possible goal in constructing
>> benchmarks for RBS is to determine whether characteristics can be
>> defined that predict and/or identify the reasons for the difference in
>> micro vs macro performance on a particular implementation.

I think Tim Bray posted about this a week or so ago. Tim, feel free to
repost / correct my interpretation, but I thought there were two
classes of users for RBS -- Ruby interpreter / compiler implementers
and non-implementers. I guess my belief is that the implementers ought

Charles Oliver Nutter

Mar 3, 2009, 11:37:03 AM
to ruby-bench...@googlegroups.com
M. Edward (Ed) Borasky wrote:
> I personally think tweaking the "virtual machines" for AMD and Intel
> 64-bit chips, using the tools provided by AMD and Intel, plus other
> technologies available on the web, is an effort worth doing, at least
> for JRuby, Rubinius and KRI / 1.9.1. I'm willing to let all versions /
> forks of MRI, all the 32-bit versions, and architectures other than
> AMD64 / x86_64 be as slow as they are. :) Surely if someone can make a
> business case for tuning them, they can afford to hire the developers,
> but I think the "community" should do the tweaks I just outlined on
> the three main operating systems: Linux, Windows and MacOS X.

For JRuby there's no tweaking necessary; Fixnum is always 64-bit, and
JRuby runs unmodified on 64-bit JVMs, gaining whatever optimizations
they include.

- Charlie
