I think the problem here is that implementations with slow startup, or
that perform optimizations later in their runs (JRuby, IronRuby,
probably MagLev, and some day Rubinius) are going to be "starting from
scratch" if a short benchmark is launched anew for every iteration. It
penalizes implementations that exchange startup speed for long-term
execution speed.
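Concretely, what those implementations need is a chance to warm up inside
one process before any timing starts. A minimal sketch of the idea (the
method name and iteration counts are hypothetical, not actual RBS code):

def bench_with_warmup(warmup_iters = 5, timed_iters = 10)
  warmup_iters.times { yield }   # untimed: let the VM optimize the hot code
  timed_iters.times.map do
    start = Time.now
    yield
    Time.now - start             # wall-clock seconds per hot iteration
  end
end

times = bench_with_warmup { 100_000.times { |i| i * i } }
puts "best hot iteration: #{times.min}s"

A fresh process per iteration never gets past the warmup phase, so the
numbers only ever show cold performance.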
Granted, I haven't looked at your modifications, but one key thing I
know Antonio was trying to avoid was startup/warmup time skewing results.
I also agree that the benchmarks themselves shouldn't assume a full
complement of Ruby features, but I'm not sure how interesting it is to
benchmark Ruby implementations that can't run the entire suite. It's
become obvious to me that it's that last 5% of Ruby features that end up
really biting you in the ass as far as performance, so running only 95%
of Ruby code is probably not indicative of eventual "complete impl"
performance.
- Charlie
> I think the problem here is that implementations with slow startup, or
> that perform optimizations later in their runs (JRuby, IronRuby,
> probably MagLev, and some day Rubinius) are going to be "starting from
> scratch" if a short benchmark is launched anew for every iteration. It
> penalizes implementations that exchange startup speed for long-term
> execution speed.
> I also agree that the benchmarks themselves shouldn't assume a full
> complement of Ruby features, but I'm not sure how interesting it is to
> benchmark Ruby implementations that can't run the entire suite.
I'll take a look :)
-=r
1. If your app is sensitive to start-up time, that's a problem with
the app IMHO. :)
2. I would think that with proper statistical summarization (e.g. a
geometric mean over normalized times -- see the sketch below), one could
use the RBS as a first guess at which implementation is "fastest in
general overall". But if the application is performance-critical,
you'll need to do performance testing with the actual code anyhow,
rather than a benchmark suite. And once you do select an
implementation, you'll need to optimize the Ruby code. You can't use
"premature optimization is the root of all evil" as an excuse for not
doing performance engineering, at least not while *I'm* watching. :)
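To sketch what I mean by statistical summarization (the numbers are made
up, not RBS output): normalize each benchmark against a reference
implementation and take the geometric mean, so no single benchmark
dominates the summary:

def geometric_mean(values)
  values.reduce(1.0) { |prod, v| prod * v } ** (1.0 / values.size)
end

# made-up seconds per benchmark for a reference impl and a candidate
ref       = { 'fib' => 2.0, 'pentomino' => 10.0, 'mandelbrot' => 4.0 }
candidate = { 'fib' => 1.0, 'pentomino' => 12.0, 'mandelbrot' => 2.0 }

ratios = ref.keys.map { |b| candidate[b] / ref[b] }
puts geometric_mean(ratios) # < 1.0 suggests faster "in general overall"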
--
M. Edward (Ed) Borasky
http://www.linkedin.com/in/edborasky
I've never met a happy clam. In fact, most of them were pretty steamed.
Yeah, like I mentioned, I haven't looked at your refactoring yet...I was
just clarifying what Roger meant. It sounds like you understand, and
that the benchmark files themselves would still allow for appropriate
in-process warmup.
- Charlie
Both data points are certainly interesting, though the data point about
"performance across a wide range of benchmarks" is probably on the same
level as "startup time". I think RBS's current handling of startup as a
separate benchmark is correct; we want to know the good and the bad of
both cold/startup performance and warm/runtime performance isolated
from one another.
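For what it's worth, the startup side can be isolated with something as
simple as timing a child interpreter that exits immediately, so boot cost
is all that's measured (a sketch, not the actual RBS startup benchmark):

require 'benchmark'

# everything measured here is interpreter boot cost; there is no workload
boot = Benchmark.realtime { system('ruby', '-e', '') }
puts "cold startup: #{boot}s"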
> It's interesting if you're building an implementation of Ruby. That's
> one audience. I'm in the other audience, a person who writes Ruby apps
> that assume a full working Ruby, and need information as to performance
> trade-offs between implementations. I don't begrudge Brian and friends
> the use of RBS as a dev tool, as long as that doesn't compromise its
> utility as a real-world-ish performance reporter. -T
Fair enough; I tend to forget that aspect. I retract my statement about
benchmarking new impls being less interesting; it's more a matter of
perception in the community about those numbers and less about their
value in general.
- Charlie
Looks fine to me.
- Charlie
We could add another environment variable, "BARE_MINIMUM" or whatnot,
which doesn't require anything. Would that work?
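Something along these lines, say (a sketch only; BARE_MINIMUM is just the
proposed name, nothing is committed):

# skip optional dependencies when the flag is set, so the harness can
# run on implementations that lack gem support entirely
unless ENV['BARE_MINIMUM']
  require 'rubygems'
  require 'sys/proctable' # used for memory measurements
end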
re: Macs
patches welcome, as I no longer have my MacBook Pro :)
Thoughts?
-=r
my current answer for Windows is something like
require 'rubygems'
require 'sys/proctable'
require 'time' # work around a bug in sys-proctable 0.7.6

def working_set_size
  Sys::ProcTable.ps(Process.pid).working_set_size # resident memory, in bytes
end
whether good or bad I know not :)
-=r
OK, I'll assume that BARE_BONES can still attempt to capture RAM if set
[i.e. it's not totally bare bones].
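Something like this, that is (a sketch; the helper name is made up): try
to capture memory, but degrade gracefully when the machinery isn't there:

def capture_ram
  require 'sys/proctable'
  Sys::ProcTable.ps(Process.pid).working_set_size
rescue LoadError
  nil # bare bones after all: no memory reading available
end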
Anything else?
-=r
Oh, I see what you mean now: have a surrounding script that does the
timeout for you.
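Roughly like this, I take it (a sketch of the idea, not our harness code):
the wrapper owns the clock and kills the child process if it overruns:

require 'timeout'

pid = Process.spawn('ruby', 'the_benchmark.rb') # child runs the real work
begin
  Timeout.timeout(300) { Process.wait(pid) }    # five-minute budget
rescue Timeout::Error
  Process.kill('KILL', pid) # benchmark overran; reap it and move on
  Process.wait(pid)
  warn 'benchmark timed out'
end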
Yeah the layered approach looks way nicer. If I'm reading it
correctly it should be about the same functionality as what we have
here--just less duplicated code [plus working timeouts]. Patches are
welcome. In the meantime, I added a BARE_BONES=1 option to it in a hacky,
kludgey attempt to avoid refactoring it as widely as you have done.
Thanks for your work on this.
-=r
After reading this I'm convinced we should at least merge in JRuby's
bench/language benchmarks, since they go a long way toward identifying
execution-level bottlenecks and are useful for implementers as well as
for users choosing an implementation.
The benchmarks basically go right down the list of AST nodes in a
typical Ruby parser and test each of them in various scenarios. This has
been extremely helpful for us in identifying compiler/interpreter
bottlenecks.
It's not 100% complete, mainly because there are a lot of different AST
nodes in Ruby. But it's a solid start.
http://svn.codehaus.org/jruby/trunk/jruby/bench/language/
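The scripts there look roughly like this (an illustrative sketch in the
same style, not copied from the suite): one AST node type per timed
block -- here, a bare while loop:

require 'benchmark'

5.times do
  puts Benchmark.measure {
    i = 0
    while i < 1_000_000
      i += 1
    end
  }
end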
- Charlie
I think Tim Bray posted about this a week or so ago. Tim, feel free to
repost / correct my interpretation, but I thought there were two
classes of users for RBS -- Ruby interpreter / compiler implementers
and non-implementers. I guess my belief is that the implementers ought
to be the *principal* audience of the suite. Indeed, if the bulk of
the suite came from the KRI benchmark suite and more pieces are coming
from the JRuby benchmark suite, plus the one that I built (Hilbert
Matrix), it would seem to me that "all that's left to do" is beef up
the analysis tools.
For identifying the reasons for differences between benchmarks for a
single implementation and differences between implementations for a
single benchmark, on any given platform, one ought to be able to use
tools like "oprofile" (Linux), "dtrace" (Mac, Solaris, and IIRC BSD),
CodeAnalyst (AMD processors on Linux and Windows) and VTune (Intel
processors on at least Linux and Windows, and quite possibly Open
Solaris and BSD). I don't own an Intel processor, nor do I have any
platforms except 64-bit Windows Vista and Linux on AMD64 dual core
processors.
I personally think tweaking the "virtual machines" for AMD and Intel
64-bit chips, using the tools provided by AMD and Intel, plus other
technologies available on the web, is an effort worth doing, at least
for JRuby, Rubinius and KRI / 1.9.1. I'm willing to let all versions /
forks of MRI, all the 32-bit versions, and architectures other than
AMD64 / x86_64 be as slow as they are. :) Surely if someone can make a
business case for tuning them, they can afford to hire the developers,
but I think the "community" should do the tweaks I just outlined on
the three main operating systems: Linux, Windows and MacOS X.
I have the "oprofile" / Linux infrastructure pretty much built -- it
could stand some refactoring, but it's sitting up on RubyForge in the
"cougar" project.
svn checkout http://cougar.rubyforge.org/svn/trunk/PTR2 PTR2
I've had CodeAnalyst running but haven't done much with it, since it
isn't formally supported on my version of Linux and the Linux version
is pretty much just a wrapper around "oprofile". But I have it
installed on the Windows system and would use it there.
> After reading this I'm convinced we should at least merge in JRuby's
> bench/language benchmarks, since they go a long way toward identifying
> execution-level bottlenecks and are useful for implementers as well as
> for users choosing an implementation.
>
> The benchmarks basically go right down the list of AST nodes in a
> typical Ruby parser and test each of them in various scenarios. This has
> been extremely helpful for us in identifying compiler/interpreter
> bottlenecks.
>
> It's not 100% complete, mainly because there are a lot of different AST
> nodes in Ruby. But it's a solid start.
>
> http://svn.codehaus.org/jruby/trunk/jruby/bench/language/
>
> - Charlie
Yes ... the more benchmarks we have that exercise known bottlenecks, the better.
>
> I think Tim Bray posted about this a week or so ago. Tim, feel free to
> repost / correct my interpretation, but I thought there were two
> classes of users for RBS -- Ruby interpreter / compiler implementers
> and non-implementers.
Yeah, I think it would be nice if RBS could satisfy both
roles--comparing Ruby versions as well as helping implementers find
bottlenecks.
So these are tools for implementers? Do you suggest integrating them
into RBS, or is this more of a "here are tools, use them if you'd like"?
Take care.
-=r
For JRuby there's no tweaking necessary; Fixnum is always 64-bit, and
JRuby runs unmodified on 64-bit JVMs, gaining whatever optimizations
they include.
- Charlie