[erlang-questions] [ANN] Silly benchmarking


Garrett Smith

Apr 30, 2013, 9:44:48 AM
to Erlang Questions
This is not an announcement of anything -- but [ANN] seems to flag
"something I can maybe use" which does apply in this case :)

Occasionally I wonder, "what's faster"? It's not often, but it happens.

I've found the best way to answer this is to measure things.

So I have this silly project:

https://github.com/gar1t/erlang-bench

It's not rigorous but it's simple and I can experiment quickly with
different implementations. My goal is just to get a sense of things --
not to formally prove anything.

It's so trivial it's almost not worth sharing/reusing -- *however* it
may provide value as a distributed repository for what people are
interested in. As it's in github there's no ownership -- please feel
free to fork and use for your own concerns!

Garrett
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Michael Truog

Apr 30, 2013, 10:35:28 AM
to Garrett Smith, Erlang Questions
You might want to look at erlbench here https://github.com/okeuday/erlbench since it has the same basic purpose, and allows you to use different compilation methods now (through the makefile specifying an optimization level). The erlbench project is also ad-hoc, but it has been enough to produce results in the past.

The other option is trying to use basho_bench here https://github.com/basho/basho_bench, if you are testing key/value storage.

- Michael

Garrett Smith

Apr 30, 2013, 12:42:37 PM
to Michael Truog, Erlang Questions
On Tue, Apr 30, 2013 at 9:35 AM, Michael Truog <mjt...@gmail.com> wrote:
> You might want to look at erlbench here https://github.com/okeuday/erlbench since it has the same basic purpose, and allows you to use different compilation methods now (through the makefile specifying an optimization level). The erlbench project is also ad-hoc, but it has been enough to produce results in the past.
>
> The other option is trying to use basho_bench here https://github.com/basho/basho_bench, if you are testing key/value storage.

Yes, but you'll notice how *easy* it is to use erlang-bench, which is
nothing more than escript files with a 10 line include file.

I'm an extraordinarily lazy person :)

Though seriously, thanks for the references. If I was more concerned
about benchmark integrity, those might be good options -- but this is
just a sniff test approach to satisfy my curiosity about various
topics.

Jeremy Ong

Apr 30, 2013, 12:57:08 PM
to Garrett Smith, Erlang Questions
I'd be very interested if we got a wiki going on one of these projects
with community updated numbers of various benchmark runs.

For example:

keylists vs orddicts vs dicts
fold vs recursion
fibonacci
all the other various standard benchmarks
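
A comparison like the first one can be sketched in a few lines. This is only a rough illustration, not code from erlang-bench or erlbench: the module name `lookup_bench` and its `run/2` function are made up for the example, and it times nothing but key lookups on an otherwise idle VM.

```erlang
-module(lookup_bench).
-export([run/2]).

%% Build a keylist, an orddict and a dict holding the same Size entries,
%% then time Lookups key lookups against each one.
%% Returns [{Structure, Microseconds}].
run(Size, Lookups) ->
    Pairs = [{K, K * 2} || K <- lists:seq(1, Size)],
    Keys = [(I rem Size) + 1 || I <- lists:seq(1, Lookups)],
    Orddict = orddict:from_list(Pairs),
    Dict = dict:from_list(Pairs),
    [{keylist, time_lookups(fun(K) -> lists:keyfind(K, 1, Pairs) end, Keys)},
     {orddict, time_lookups(fun(K) -> orddict:find(K, Orddict) end, Keys)},
     {dict, time_lookups(fun(K) -> dict:find(K, Dict) end, Keys)}].

%% timer:tc/1 returns {ElapsedMicroseconds, Result};
%% lists:foreach/2 always returns ok.
time_lookups(Fun, Keys) ->
    {Micros, ok} = timer:tc(fun() -> lists:foreach(Fun, Keys) end),
    Micros.
```

Calling `lookup_bench:run(1000, 100000)` returns one timing per structure; the numbers only mean something relative to each other on the same machine.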

Loïc Hoguin

Apr 30, 2013, 12:59:20 PM
to Jeremy Ong, Erlang Questions
It's not very interesting unless the numbers are also available for
systems under load.
--
Loïc Hoguin
Erlang Cowboy
Nine Nines
http://ninenines.eu

Garrett Smith

Apr 30, 2013, 1:02:04 PM
to Jeremy Ong, Erlang Questions
This is sort of my thinking with the github project:

- Better than a wiki because you can run the files
- Easily forked and modified -- so people can share and modify code trivially
- Distributed is better than centralized!

There's probably a temptation to view anything like this as
authoritative -- it of course is not and cannot be (e.g. the three
stupid files I have are totally arbitrary). I think distributed source
like this is a good push-back for that temptation.

Garrett Smith

Apr 30, 2013, 1:03:18 PM
to Loïc Hoguin, Erlang Questions
Not very interesting to you. Of course you can write whatever you
like. For *me* I was curious about some relative performance
characteristics. No religion here.


Loïc Hoguin

Apr 30, 2013, 1:06:04 PM
to Garrett Smith, Erlang Questions
I'm sure *you* know these results might not be true for systems under
load, but others might not. Not having both could cause more harm than
good, and would not be interesting in that sense.


--
Loïc Hoguin

Jeremy Ong

Apr 30, 2013, 1:15:49 PM
to Loïc Hoguin, Erlang Questions
I think comparison benchmarks are still useful. Even against a system
with no other load, knowing that X does the equivalent thing that Y
does but faster could provide initial direction in the implementation
of something new at least.

Agreed that benchmarks can be harmful if they aren't viewed
objectively though. We just need a caveat that the user measures his
or her own application as well!

Garrett Smith

Apr 30, 2013, 1:20:16 PM
to Loïc Hoguin, Erlang Questions
You have a point. Though "load" would be pretty arbitrary -- I could
see some load-xxx.escript files that did terrible things while the
other scripts were run. But it would depend on what else is running on
your system. And your system architecture. And your system hardware.
Waiiit a minute, this just got real complicated!

So, don't make decisions unless you have reasonably complete
information, whatever that means :)

Fred Hebert

Apr 30, 2013, 1:35:09 PM
to Garrett Smith, Erlang Questions
A rather simple trick to simulate load is to have a few busy-looping
processes on the VM, and give them a higher priority than the rest. They
should eat time more than anything else and make it look like there's a
lot of contention for schedulers, with reduced execution time for what
you want to measure.

I've used it a few times, although I am not 100% sure it's super
representative of a real system being under load.
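
A minimal sketch of that trick, assuming nothing beyond standard OTP. The module name `fake_load` and its `start/1` and `stop/1` functions are invented for illustration:

```erlang
-module(fake_load).
-export([start/1, stop/1]).

%% Spawn N processes that spin forever at high priority.
%% Returns their pids so they can be stopped later.
start(N) ->
    [spawn_opt(fun spin/0, [{priority, high}]) || _ <- lists:seq(1, N)].

%% Kill the spinners unconditionally.
stop(Pids) ->
    lists:foreach(fun(Pid) -> exit(Pid, kill) end, Pids).

%% Tail-recursive busy loop: burns reductions and never blocks.
spin() ->
    spin().
```

Start the spinners, run the benchmark, then call `fake_load:stop/1`. It is probably wise to keep N below `erlang:system_info(schedulers_online)`, since high-priority spinners occupying every scheduler can starve normal-priority processes, the shell included.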

Garrett Smith

Apr 30, 2013, 1:36:30 PM
to Fred Hebert, Erlang Questions
I'm curious what the OTP team does (in the lab) to vet various
implementation options.

Michael Truog

Apr 30, 2013, 1:52:26 PM
to Garrett Smith, Erlang Questions
I like the idea of having a site like http://benchmarksgame.alioth.debian.org/ that is more specific to Erlang. Ideally we could test fun stuff, like comparing actors in Scala/Akka and JActor (https://github.com/laforge49/JActor) with Erlang. However, to keep stuff transparent and useful we would need to document hardware and avoid short runs that skew results. To have a serious site requires a monetary commitment and I am not sure who would be willing to pursue that (Ericsson might be the best organization to pursue this and show how Erlang shines). Either way, having the source code available is a requirement for making sure the tests are repeatable, so I think we can pursue that direction in the meantime.

One other benchmarking project I am aware of is here:
http://www.softlab.ntua.gr/release/bencherl/index.html

So there are probably many individual testing efforts that could be combined to provide definitive data for decision making. That probably requires discussion and coordination at a higher level.

Erik Søe Sørensen

Apr 30, 2013, 2:11:57 PM
to Erlang Questions

I think ecriterion should also be mentioned - it makes an honest attempt at verifying the reliability of the results.

Michael Truog

Apr 30, 2013, 2:32:56 PM
to Erik Søe Sørensen, Erlang Questions
Thank you for mentioning ecriterion (https://github.com/jlouis/ecriterion/), I hadn't seen that previously.  It doesn't seem to have any tests or analysis yet, but I think it would be great if we could get a common Erlang testing project going.

Richard Carlsson

Apr 30, 2013, 3:17:22 PM
to erlang-q...@erlang.org
And here's my unfinished benchmarking project:

https://github.com/richcarl/berk

/Richard


Motiejus Jakštys

Apr 30, 2013, 3:42:35 PM
to Richard Carlsson, erlang-q...@erlang.org
On Tue, Apr 30, 2013 at 8:17 PM, Richard Carlsson
<carlsson...@gmail.com> wrote:
> And here's my unfinished benchmarking project:
>
> https://github.com/richcarl/berk
>

This looks quite big and unusual compared to others.

Can you explain what you calibrate, and what is the purpose of sending
a message many times to a dead process? Does it have some well-defined
behavior in the virtual machine?

It was not obvious from a 5-minute peek at the source, so an answer
might save time for others.

Thanks,
Motiejus

Noah Diewald

Apr 30, 2013, 3:54:34 PM
to erlang-q...@erlang.org
I've always wished that they had a network of computers as one of the options
instead of just single computers.

David Mercer

Apr 30, 2013, 4:01:19 PM
to Loïc Hoguin, Jeremy Ong, Erlang Questions
On Tuesday, April 30, 2013, Loïc Hoguin wrote:

> It's not very interesting unless the numbers are also available for
> systems under load.

Wouldn’t you want benchmarks to eliminate confounding variables rather than add them?

If you wanted to benchmark under load, however, why not just put the system under the load you want to test under and then run your benchmarks? I would not expect the benchmarking framework to provide that load for you.

Cheers,

DBM

Richard Carlsson

Apr 30, 2013, 4:02:22 PM
to Motiejus Jakštys, erlang-q...@erlang.org
On 2013-04-30 21:42, Motiejus Jakštys wrote:
> On Tue, Apr 30, 2013 at 8:17 PM, Richard Carlsson
> <carlsson...@gmail.com> wrote:
>> And here's my unfinished benchmarking project:
>>
>> https://github.com/richcarl/berk
>>
>
> This looks quite big and unusual compared to others.
>
> Can you explain what you calibrate,

I started out wanting a really stable core measurement loop, and a way
to calibrate the system under test so that it could automatically set
good defaults for number of iterations, and be aware of the clock
precision and typical variance. As I went on, I started abstracting more
and more, and some ideas (like the "berps" - bogus erlang reductions per
second) got left as rough sketches, and there's nothing yet that
actually saves the result of a calibration and reuses it.

The parts that are working well are the runner/1, gather/1, and stats/1
functions, and the functions run/2, run_for/2, and calibrate/0, which
you can study to understand how the runner and gather functions work.
There's still some tweaking to be done - the calibration doesn't always
converge as well as I'd like.
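
To illustrate the general idea of a measurement loop that picks its own iteration count, here is a rough sketch. It is not berk's code: the module `timing_loop` and its functions are hypothetical, and it ignores clock precision and variance, which berk tries to account for.

```erlang
-module(timing_loop).
-export([calibrate/2, measure/2]).

%% Run Fun N times and return the total elapsed microseconds.
measure(Fun, N) ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, N) end),
    Micros.

%% Double the iteration count until a single measurement lasts at least
%% MinMicros, so that clock resolution is small next to the measured time.
%% Returns {Iterations, Microseconds} for the first run that is long enough.
calibrate(Fun, MinMicros) ->
    calibrate(Fun, MinMicros, 1).

calibrate(Fun, MinMicros, N) ->
    case measure(Fun, N) of
        Micros when Micros >= MinMicros -> {N, Micros};
        _TooShort -> calibrate(Fun, MinMicros, N * 2)
    end.

%% Call Fun N times, discarding its results.
repeat(_Fun, 0) -> ok;
repeat(Fun, N) -> Fun(), repeat(Fun, N - 1).
```

For example, `timing_loop:calibrate(fun() -> lists:seq(1, 100) end, 100000)` finds an iteration count that keeps the loop busy for at least a tenth of a second.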

> and what is the purpose of sending
> a message many times to a dead process? Does it have some well-defined
> behavior in the virtual machine?

No, it was just a way to set up a calibration loop that did something
very cheap but that the compiler couldn't optimize away. But that part
isn't very exciting - I'm not using it for anything yet. I just dropped
all of the code in a single module to keep it simple at this stage.

/Richard

Tim Watson

Apr 30, 2013, 5:26:33 PM
to Noah Diewald, erlang-q...@erlang.org
On 30 Apr 2013, at 20:54, Noah Diewald wrote:

> I've always wished that they had a network of computers as one of the options
> instead of just single computers.
>

https://github.com/nebularis/systest exists to address the issue of running tests against multiple nodes, although it's still in quite an early stage (no remote nodes via ssh support yet) and already in need of some refactoring. It currently supports running in standalone (or shell) mode and integrating with common_test (via the ct_hooks mechanism) and its goal is to manage test resources (e.g., a node, a cluster of nodes, etc) during a test run. There's nothing to prevent this from being used for benchmarks instead of integration tests.

You can see example (test) usage in https://github.com/rabbitmq/rabbitmq-test/tree/bug25421/multi-node, common_test integration via https://github.com/rabbitmq/rabbitmq-test/tree/bug25421/multi-node/test and SysTest API calls in https://github.com/rabbitmq/rabbitmq-test/blob/bug25421/multi-node/src/rabbit_ha_test_utils.erl

There are plenty of things wrong with the current implementation though, as it was put together in a bit of a rush. If you're interested in the dirty laundry and/or state of usefulness, https://github.com/nebularis/systest/wiki/The-Big-Refactor outlines some of the existing (structural) things that I'm in the process of fixing. The next big-ticket feature is an ssh-based layer to manage remote nodes during test runs. Nonetheless, we've found and fixed plenty of bugs using it to test clustering/HA scenarios.

Suggestions, comments and/or pull requests are welcome, though do bear in mind I'm going to rip up large swathes of it and rewrite them very soon. :)

Cheers,
Tim

Scott Lystig Fritchie

Apr 30, 2013, 7:14:38 PM
to Michael Truog, Erlang Questions
Michael Truog <mjt...@gmail.com> wrote:

mt> The other option is trying to use basho_bench here
mt> https://github.com/basho/basho_bench, if you are testing key/value
mt> storage.

Actually, you can use it to measure whatever you want, if you write a
callback module with new/1 and run/4 functions. Tracking throughput is
always good, but also tracking min, mean, median, 95th, 99th, 99_9th,
and maximum latency stats plus error rates is An Even Better Thing(tm).
Nice R and Gnuplot graphs are also very convenient.
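
A callback module along those lines might look roughly like the sketch below. The module name `my_bench_driver` and the in-memory dict it exercises are invented for illustration, and the return shapes (`{ok, State}` and `{error, Reason, State}`, with the key and value generators passed as zero-arity funs) reflect my understanding of the basho_bench driver convention, so check them against the bundled drivers before relying on them.

```erlang
-module(my_bench_driver).
-export([new/1, run/4]).

%% Called once per worker process; the returned term is that worker's state.
%% Here the "system under test" is just an in-memory dict.
new(_Id) ->
    {ok, dict:new()}.

%% Called once per operation. KeyGen and ValueGen are assumed to be
%% zero-arity funs that produce the next key and value.
run(put, KeyGen, ValueGen, Store) ->
    {ok, dict:store(KeyGen(), ValueGen(), Store)};
run(get, KeyGen, _ValueGen, Store) ->
    case dict:find(KeyGen(), Store) of
        {ok, _Value} -> {ok, Store};
        error -> {error, not_found, Store}
    end.
```

The config file would then name this module as the driver and list `put` and `get` with their weights in the operations entry, in the same way examples/null_test.config does for the null driver.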

For an example graph, see
http://www.snookles.com/scotttmp/basho_bench_null_summary.png ... It
used the bundled examples/null_test.config config, which tests
approximately nothing but basho_bench's internal overhead. :-)

That graph was generated on an 8 HT core MacBook Pro (with a fair amount
of other unrelated stuff running in parallel, CPU frequency changing due
to heat, etc etc.), using today's 'master' branch:

git clone git://github.com/basho/basho_bench.git
cd basho_bench
make
./basho_bench examples/null_test.config
make results

... and the graph is at tests/current/summary.png.

-Scott

Jesper Louis Andersen

May 1, 2013, 5:17:48 AM
to Scott Lystig Fritchie, Erlang Questions


On May 1, 2013, at 1:14 AM, Scott Lystig Fritchie <frit...@snookles.com> wrote:

>
> Actually, you can use it to measure whatever you want, if you write a
> callback module with new/1 and run/4 functions. Tracking throughput is
> always good, but also tracking min, mean, median, 95th, 99th, 99_9th,
> and maximum latency stats plus error rates is An Even Better Thing(tm).
> Nice R and Gnuplot graphs are also very convenient.
>

Also, bootstrapping the samples to track how much your median and mean vary across 100000 bootstrap runs could tell you how stable your measurement seems to be. R has a really nice multi-core bootstrapping module that I intended to call for this, but that project is dormant at the moment.
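
The bootstrap idea can be sketched in plain Erlang without R. This is an illustration rather than ecriterion's code: the module name `bootstrap` and its functions are hypothetical, it handles only the median, and it uses the `rand` module, which requires OTP 18 or later.

```erlang
-module(bootstrap).
-export([median/1, median_interval/2]).

%% Median of a non-empty list of numbers.
median(Samples) ->
    Sorted = lists:sort(Samples),
    Len = length(Sorted),
    Mid = Len div 2,
    case Len rem 2 of
        1 -> lists:nth(Mid + 1, Sorted);
        0 -> (lists:nth(Mid, Sorted) + lists:nth(Mid + 1, Sorted)) / 2
    end.

%% Draw Runs resamples (with replacement, same size as Samples), take the
%% median of each, and return the {2.5th, 97.5th} percentiles of those
%% medians. A narrow interval suggests a stable measurement.
median_interval(Samples, Runs) ->
    Tuple = list_to_tuple(Samples),
    Size = tuple_size(Tuple),
    Medians = lists:sort([median(resample(Tuple, Size))
                          || _ <- lists:seq(1, Runs)]),
    Lo = lists:nth(max(1, round(0.025 * Runs)), Medians),
    Hi = lists:nth(max(1, round(0.975 * Runs)), Medians),
    {Lo, Hi}.

%% Pick Size elements uniformly at random, with replacement.
resample(Tuple, Size) ->
    [element(rand:uniform(Size), Tuple) || _ <- lists:seq(1, Size)].
```

Feeding it the per-run timings from a benchmark, e.g. `bootstrap:median_interval(Timings, 10000)`, gives a quick feel for how much the median moves around.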

Jesper Louis Andersen
Erlang Solutions Ltd., Copenhagen