Micro-nature of macro benchmarks (bm_norvig_spelling.rb)


Shri

Nov 14, 2009, 1:01:55 AM
to Ruby Benchmark Suite
In macro-benchmarks/bm_norvig_spelling.rb, MRI spends about 80% of the
time in the call to the function “words” shown below, which processes a
large string (0.5 MB or so):

def words(text)
  text.downcase.scan(/[a-z]+/)
end

Of the remaining time, 90% is spent in “train” which basically fills
up a Hash.

def train(features)
  model = Hash.new(1)
  features.each {|f| model[f] += 1 }
  return model
end

So this is really a micro-benchmark for String#scan, and to a lesser
degree, Hash#[] and Hash#[]=. The rest of the Ruby code hardly shows up in the
measurements. Should the benchmark be fixed such that most of the time
is spent in the other functions? The training phase could be moved to
a setup phase outside of the main benchmark loop.
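
A minimal sketch of the setup-phase idea, assuming a synthetic corpus: this is not the actual bm_norvig_spelling.rb source. SAMPLE_TEXT and the timed helper are made up for illustration, and the lookup loop is only a placeholder for the "other functions" that would remain in the timed region.

```ruby
require 'benchmark'

# Same helpers as quoted above.
def words(text)
  text.downcase.scan(/[a-z]+/)
end

def train(features)
  model = Hash.new(1)
  features.each { |f| model[f] += 1 }
  model
end

# Hypothetical helper: returns the block's value and its wall-clock seconds.
def timed
  result = nil
  seconds = Benchmark.realtime { result = yield }
  [result, seconds]
end

# Hypothetical stand-in for the large corpus the real benchmark reads.
SAMPLE_TEXT = "The quick brown fox jumps over the lazy dog. " * 10_000

# Setup phase: tokenize and train once, outside the timed region.
features, scan_time = timed { words(SAMPLE_TEXT) }
model, train_time   = timed { train(features) }

# Timed region: only work that uses the already-trained model.
_, lookup_time = timed do
  10_000.times { %w[the quick fox missing].each { |w| model[w] } }
end

total = scan_time + train_time + lookup_time
puts format("scan %.1f%%  train %.1f%%  lookups %.1f%%",
            100 * scan_time / total,
            100 * train_time / total,
            100 * lookup_time / total)
```

Printing the share of each phase makes it easy to see whether the setup work still dominates once it is timed separately.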

Ideally, the macro-benchmarks would not have any single function
accounting for more than 5% (say) of the entire execution time.
Otherwise, the benchmark measures a narrow aspect of the Ruby implementation
and is not really macro. Of course, it is separately useful to have
micro-benchmarks for individual library types and methods.

Regards
Shri

rogerdpack

Nov 16, 2009, 11:33:47 AM
to Ruby Benchmark Suite

> So this is really a micro-benchmark for String#scan, and to a lesser
> degree, Hash#[] and Hash#[]=. The rest of the Ruby code hardly shows up in the
> measurements. Should the benchmark be fixed such that most of the time
> is spent in the other functions? The training phase could be moved to
> a setup phase outside of the main benchmark loop.


Perhaps it originated from here?

http://norvig.com/spell-correct.html

One option would be to move it to micro-benchmarks.
Another might be to have two tests: one as is, and one that just shows the
latter half.
-r

Shri Borde

Nov 16, 2009, 7:13:19 PM
to ruby-bench...@googlegroups.com
As another example, in bm_list, for a high number of iterations, MRI spends about 80% of the time concatenating a huge string (the string representation of all the elements in the list). If I add a statement to cap the length of the string, the benchmark runs faster even though it is doing more computation. So this is another example of a macro-benchmark degenerating into a micro-benchmark for String#<<.
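
A rough sketch of that kind of check, assuming a made-up workload: build_list and stringify are hypothetical helpers, not the bm_list source, and "capping" is interpreted here as resetting the buffer once it grows past a limit.

```ruby
require 'benchmark'

# Hypothetical stand-in for the list workload; not the bm_list source.
def build_list(n)
  (1..n).to_a.reverse.sort
end

# Appends every element's string form to one buffer with String#<<.
# When max_len is given, the buffer is reset once it grows past that
# length; the extra length check is additional work per element.
def stringify(list, max_len = nil)
  buffer = "".dup
  list.each do |item|
    buffer << item.to_s << " "
    buffer = "".dup if max_len && buffer.length > max_len
  end
  buffer
end

list = build_list(200_000)

uncapped = Benchmark.realtime { stringify(list) }
capped   = Benchmark.realtime { stringify(list, 1_000) }

puts format("uncapped: %.4fs  capped at 1000 chars: %.4fs", uncapped, capped)
```

How the two timings compare will depend on the Ruby implementation and version, which is exactly why a hotspot like this can skew a benchmark that is meant to measure something else.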

I could modify the benchmark to remove such unintended hotspots. However, whether this is the right approach depends on whether the goal of these shootout benchmarks is to compare Ruby implementations or to compare different languages. If the goal is to compare Ruby implementations, then I could remove the unintended hotspots. However, a better approach would be to drop those benchmarks and instead write new benchmarks that use large existing real-world Ruby libraries (erb, rdoc, optparse, rexml, Date, pathname, Rails, etc). If the goal is to compare different languages, then removing the unintended hotspots is not the right solution.

I don't think duplicating the benchmark as both a micro and a macro benchmark is a good idea, as the benchmark is not a great one to begin with. Ideally, we would remove such benchmarks (or at least move them to a folder called "shootout" where they are considered neither micro nor macro) and add other, better micro and macro benchmarks, but that is not going to be easy. So assuming the main goal is to compare Ruby implementations, I will submit patches to remove the unintended hotspots. Let me know if I should pursue any other approach...

Antonio Cangiano

Nov 16, 2009, 8:12:23 PM
to ruby-bench...@googlegroups.com
On Mon, Nov 16, 2009 at 7:13 PM, Shri Borde <Shri....@microsoft.com> wrote:
> I will submit patches to remove the unintended hotspots.

What's your github username, Shri?
--
http://ThinkCode.TV - Screencast e videocorsi di programmazione
http://antoniocangiano.com - Zen and the Art of Programming
http://math-blog.com - Mathematics is wonderful!
Follow me on Twitter: http://twitter.com/acangiano
Author of "Ruby on Rails for Microsoft Developers" (Wrox, 2009)

Shri Borde

Nov 16, 2009, 8:16:34 PM
to ruby-bench...@googlegroups.com

It's “shri”. I can work with Jim Deville to figure out the right process. Thanks!

 


Evan Phoenix

Nov 16, 2009, 8:21:40 PM
to ruby-bench...@googlegroups.com
Just a quick note:

I'm currently working on a new suite which is a reorganization of the existing RBS plus the addition of more benchmarks. It will be released this week at RubyConf. It strives not to exercise every syntax element of an implementation, but rather to get a broader feel for the performance.

The biggest change is the organization into tiers. Each benchmark is examined strictly to see what exactly it exercises and put into a tier which properly represents how low level it is.

It will stress that performance in tier0, the most trivial benchmarks, does not always translate to performance in higher tiers, and that all of them must be run to begin to get an accurate picture of the performance of a system.

- Evan

Monty Williams

Nov 17, 2009, 8:00:00 AM
to ruby-bench...@googlegroups.com
Roger,

I thought I'd try your RDoc benchmark, but I can't seem to get it to run. Is it just me? Anyone else hitting problems?

I invoked it as:
$ rake bench:file FILE=benchmarks/rdoc/bm_rdoc_against_itself_rdoc.rb

Do I need to do anything else first? Later version of 1.9?
Here's what I got as a result:
$ cat RBS-ruby19-091117.044947.yaml
---
name: benchmarks/rdoc/bm_rdoc_against_itself_rdoc.rb
parameters:
- 1
ruby_ver: 1.9.1 2009-01-30 0 x86_64-linux; -O2 -g -Wall -Wno-parentheses; '--prefix=/usr/local/ruby1.9'
---
name: benchmarks/rdoc/bm_rdoc_against_itself_rdoc.rb
parameter: 1
status: "NoMethodError undefined method `<=>' for nil:NilClass"
---
name: benchmarks/rdoc/bm_rdoc_against_itself_rdoc.rb
status: Terminated for unknown reason

Do you have result times on any impls?

Thanks,
Monty

Monty Williams

Nov 17, 2009, 8:09:33 AM
to ruby-bench...@googlegroups.com
When I try it using MRI 1.8.6 or 1.8.7 I have this problem. If it works for other people I'll debug my environment, but thought I'd just check for prerequisites first.

$ cat results/rbs/RBS-ruby-091117.050656.yaml
---
name: benchmarks/rdoc/bm_rdoc_against_itself_rdoc.rb
parameters:
- 1
ruby_ver: 1.8.6 2008-08-11 287 x86_64-linux;-g -O2; '--prefix=/usr/local/ruby186p287'
---
name: benchmarks/rdoc/bm_rdoc_against_itself_rdoc.rb
parameter: 1
status: "Gem::LoadError Could not find RubyGem rdoc (>= 2.4)\n"
---
name: benchmarks/rdoc/bm_rdoc_against_itself_rdoc.rb
status: Terminated for unknown reason

Roger Pack

Nov 17, 2009, 10:46:07 AM
to ruby-bench...@googlegroups.com
> I'm currently working on a new suite which is a reorganization of the existing RBS plus the addition of more benchmarks. It will be released this week at RubyConf. It strives not to exercise every syntax element of an implementation, but rather to get a broader feel for the performance.


Also check my recent addition of rdoc benchmarks. They and the Rails
benchmarks are, I think, about the closest to reality of anything in there.
-r

Roger Pack

Nov 17, 2009, 11:46:29 AM
to ruby-bench...@googlegroups.com
> status: "Gem::LoadError Could not find RubyGem rdoc (>= 2.4)\n"

Hmm. Appears that my attempt to not require gems failed. Try it now
[there are a few more tests in there, now, too].

> do you have any result times for any impls?

Attached is 1.8.6 windows (i.e. about as bad as ruby can be)
-r

---
name: benchmarks/rdoc/bm_rdoc_against_itself_darkfish.rb
parameters:
- 1
ruby_ver: 1.8.6 2009-03-31 368 i386-mingw32;-g -O2 ; '--with-winsock2' '--disable-install-doc' '--enable-shared' '--prefix='
---
name: benchmarks/rdoc/bm_rdoc_against_itself_darkfish.rb
parameter: 1
iterations: 2
max: 30.9375
min: 26.65625
median: 30.9375
mean: 28.796875
standard_deviation: 2.140625
times:
- 30.9375
- 26.65625
memory_usages:
- 58273792
- 58449920
---
name: benchmarks/rdoc/bm_rdoc_against_itself_darkfish.rb
status: success
---
name: benchmarks/rdoc/bm_rdoc_against_itself_ri.rb
parameters:
- 1
ruby_ver: 1.8.6 2009-03-31 368 i386-mingw32;-g -O2 ; '--with-winsock2' '--disable-install-doc' '--enable-shared' '--prefix='
---
name: benchmarks/rdoc/bm_rdoc_against_itself_ri.rb
parameter: 1
iterations: 2
max: 54.3125
min: 37.71875
median: 54.3125
mean: 46.015625
standard_deviation: 8.296875
times:
- 54.3125
- 37.71875
memory_usages:
- 57831424
- 59006976
---
name: benchmarks/rdoc/bm_rdoc_against_itself_ri.rb
status: success
---
name: benchmarks/rdoc/bm_rdoc_core_darkfish.rb
parameters:
- 1
ruby_ver: 1.8.6 2009-03-31 368 i386-mingw32;-g -O2 ; '--with-winsock2' '--disable-install-doc' '--enable-shared' '--prefix='
---
name: benchmarks/rdoc/bm_rdoc_core_darkfish.rb
parameter: 1
iterations: 2
max: 552.296875
min: 518.34375
median: 552.296875
mean: 535.3203125
standard_deviation: 16.9765625
times:
- 518.34375
- 552.296875
memory_usages:
- 187084800
- 229449728
---
name: benchmarks/rdoc/bm_rdoc_core_darkfish.rb
status: success

Shri Borde

Nov 17, 2009, 2:44:59 PM
to ruby-bench...@googlegroups.com
Yup, rdoc and rails tests will be good.

I got the "Gem::LoadError Could not find RubyGem rdoc (>= 2.4)" error message with rdoc. Will try your fixes.

The Rails app has many dependencies at the moment. I will have to see which ones work with IronRuby. The MySQL gem is definitely a blocking issue, as there is currently no MySQL adapter for IronRuby (there are adapters for SQL Server and SQLite). It would be nice if there were an additional Rails app that did not depend on any native adapters (it could use an in-memory database to exercise the ActiveRecord logic, if such a thing exists).



Monty Williams

Nov 17, 2009, 3:10:43 PM
to ruby-bench...@googlegroups.com
It appears to work now.

nohup rake bench:file FILE=benchmarks/rdoc/bm_rdoc_against_itself_darkfish.rb

It produced lots of output and finished with a real report. I have to run to catch a plane right now, and won't be back in the States until tomorrow noon. But if I find problems I'll let you know later this week.

-- Monty

Shri Borde

Nov 17, 2009, 7:32:35 PM
to ruby-bench...@googlegroups.com
With the latest pull, I see diffs in some files that I have not touched, because of line-ending differences. For example, benchmarks/rdoc/ruby_trunk/symbian/configure.bat.

"git config core.autocrlf" is true (the default) for me. Roger, do you have this set to false? If so, could you please run "git config core.autocrlf true"? If everyone sets it to true, Git will only store \n, but Windows users will have \r\n in the checked-out files as expected, and Unix users will have \n as expected.

Shri

Roger Pack

Nov 18, 2009, 11:28:38 AM
to ruby-bench...@googlegroups.com
> "git config core.autocrlf" is true (the default) for me. Roger, do you have this set to false? If so, could you please run "git config core.autocrlf true"? If everyone sets it to true, Git will only store \n, but Windows users will have \r\n in the checked-out files as expected, and Unix users will have \n as expected.

Sure. Guess svn must have given me those files with Windows-style
line endings...

Feel free to clobber the line endings; they're not necessary for any
tests, AFAIK.

-r

Antonio Cangiano

Nov 24, 2009, 9:27:11 PM
to ruby-benchmark-suite
On Mon, Nov 16, 2009 at 8:21 PM, Evan Phoenix <ev...@fallingsnow.net> wrote:
> I'm currently working on a new suite which is a reorganization of the existing RBS plus the addition of more benchmarks. It will be released this week at RubyConf. It strives not to exercise every syntax element of an implementation, but rather to get a broader feel for the performance.

Was it released? Do you plan to replace the RBS with it or simply merge the code?

Shri, excellent suggestions.

Cheers,
Antonio