JMH vs Caliper: reference thread

Gleb Smirnov

Feb 1, 2014, 3:59:39 AM

to mechanica...@googlegroups.com

Hi All,

I have recently seen several discussions where people were trying to decide on which tool to use for benchmarking. Here's one of those.

I think there should exist some trusted reference material on the subject. A thread on mechanical-sympathy seems just fine. Hence, I kindly ask you to:

* List any and all problems and pitfalls that you believe either of the systems has;

* Share your relevant experience in using any of the systems;

* And generally anything that you think will help a party in doubt make the right choice.

The summary of the discussion should also make for an excellent blog post, I think.

Cheers,

Gleb

Aleksey Shipilev

Feb 1, 2014, 7:19:40 AM

to mechanica...@googlegroups.com

Full disclosure: I work for Oracle, and do Java performance work in

OpenJDK. I also develop and maintain JMH, and JMH is my 4th (I think)

benchmark harness. Hence, my opinion is biased, and I try to stay objective

because we've been in Caliper's shoes...

Disclaimer: I am not the only maintainer and developer for JMH. It was

developed with heavy contributions from both JRockit (where it came

originally) and HotSpot performance teams. Hence, when I say "we", I mean

many JMH contributors.

IMO, Caliper is not as bad for large benchmarks. In fact, Caliper feels just

like pre-JMH harnesses we had internally in Sun/BEA. And that is not a

coincidence, because Caliper benchmark interface is very intuitive and an

obvious one. The sad revelation that came upon me over the previous several

years is that the simplicity of benchmark APIs does not correlate with

benchmark reliability.

I don't follow Caliper development, and I'm not in position to bash Caliper,

so instead of claiming anything that Caliper does or does not do, let me

highlight the history of JMH redesigns over the years. That should help

to review other harnesses, since I can easily say "been there, tried that,

it's broken <in this way>". Most of the things can even be guessed from the

API choices the harness makes. If the API cannot provide the instruments to

avoid a pitfall, then it is very probable the harness makes no moves to avoid it

(except for the cases where magic dust is involved).

I tend to think this is a natural way for a benchmark harness to evolve, and

you can map this timeline back to your favorite benchmark harness. The

pitfalls are many and tough, the non-exhaustive "important list" is as follows:

A. Dynamic selection of benchmarks.

Since you don't know at "harness" compile time what benchmarks it would run,

the obvious choice would be calling the benchmark methods via Reflection.

Back in the days, this pushed us to accept the same "repetition" counter in

the method to amortize the reflective costs. This already introduces the

major pitfall about looping, see below.

But infrastructure-wise, harness then should intelligently choose the

repetition count. This almost always leads to calibrating mechanics, which

is almost always broken when loop optimizations are in effect. If one

benchmark is "slower" and autobalances with lower reps count, and another

benchmark is "faster" and autobalances with higher reps count, then

the optimizer has more opportunity to optimize the "faster" benchmark even further.

This moves us away from seeing how exactly the benchmark performs and

introduces another (hidden! and uncontrollable!) degree of freedom.
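To make the calibration hazard concrete, here is a minimal sketch of the kind of auto-balancing loop described above. It is not code from Caliper, JMH, or any other harness; `RepsCalibrator`, `calibrate`, and the injectable clock are hypothetical names used only for illustration.

```java
import java.util.function.IntConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch of a calibrating harness: keep doubling the repetition
// count until a single batch takes at least minNanos. Not taken from a real harness.
final class RepsCalibrator {
    private RepsCalibrator() {}

    static int calibrate(IntConsumer benchmark, LongSupplier nanoClock,
                         long minNanos, int maxReps) {
        int reps = 1;
        while (true) {
            long start = nanoClock.getAsLong();
            benchmark.accept(reps); // the benchmark runs its own loop of `reps` iterations
            long elapsed = nanoClock.getAsLong() - start;
            if (elapsed >= minNanos || reps >= maxReps) {
                return reps;
            }
            // Double the count, using long arithmetic to avoid int overflow.
            reps = (int) Math.min((long) maxReps, reps * 2L);
        }
    }
}
```

With a real clock one would pass `System::nanoTime`. The point of the sketch is that a cheaper benchmark settles on a larger repetition count than an expensive one, so the two end up running loops of different lengths, which is exactly the hidden degree of freedom mentioned above.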

In retrospect, the early days decision of JMH to generate synthetic

benchmark code around the method, which contains the loop (carefully chosen

by us to avoid the optimizations in current VMs -- separation of concerns,

basically), is paying off *very* nicely. We can then call that synthetic

stub via Reflection without even bothering about the costs.

...That is not to mention users can actually review the generated benchmark

code looking for the explanations for the weird effects. We do that

frequently as the additional control.

B. Loop optimizations.

This is by far my major shenanigan about almost every harness. What is the

usual answer to "My operation is very small, and timers' granularity/latency

is not able to catch the effect"? Why, yes, of course, wrap it in an indexed loop.

This mistake is painfully obvious, and real pain-in-the-back to prevent. We

even have the JMH sample to break the habit of people coming to build the

same style benchmarks:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_11_Loops.java

(BTW, the last time I tried Caliper a few years ago, it even refused to run

when calibration says the running time does not change when changing the

reps count. Well, THANK YOU, but I really WANT to run that benchmark!)
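For readers who have not seen the pattern, the hand-rolled repetition loop looks roughly like the sketch below. It is an illustration in plain Java, not the linked JMH sample; `NaiveLoopBench` and its methods are made-up names.

```java
// Illustration only: the hand-rolled "repetitions" loop that the text warns about.
final class NaiveLoopBench {
    private NaiveLoopBench() {}

    // The operation we would like to time: far too small for the timer on its own.
    static int add(int x, int y) {
        return x + y;
    }

    // Anti-pattern: once add() is inlined, the JIT is free to unroll the loop or
    // hoist the loop-invariant x + y, so elapsed / reps need not reflect one add().
    static int repeat(int reps, int x, int y) {
        int sum = 0;
        for (int i = 0; i < reps; i++) {
            sum += add(x, y);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int reps = 1; reps <= 1_000_000; reps *= 100) {
            long start = System.nanoTime();
            int result = repeat(reps, 1, 2);
            long elapsed = System.nanoTime() - start;
            System.out.println(reps + " reps: " + (double) elapsed / reps
                    + " ns/op, result " + result);
        }
    }
}
```

The per-operation figure printed here will typically change with the repetition count, which is a hint that the loop, and not `add`, is what is being measured.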

C. Dead-code elimination.

This is my favorite pet-peeve. It is remarkably hard to introduce the

side-effect to the benchmark which is both reliable and low-overhead.

Low-overhead parts really require the JVM expertise to get right, and

pushing that on to users is very, very dumb. JMH's Blackhole classes took

a significant amount of our time to implement correctly, and we are still doing

the tunings here and there to minimize their costs [to the extreme we are

thinking about the proper VM interface to consume the values]. Remarkably,

we can hide all that complexity behind the simple user interface, and let

users concentrate on their workloads. This is what good harnesses do.

Examples:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_08_DeadCode.java

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java

The usual ways to deal with DCE are broken in subtle ways:

a) Returning the value from the reflective call: JIT inflates the

reflective call, and inlines it as usual Java method, DCE ensues.

b) Writing the values in the fields: doing that in the loop means runtime

can only write the latest value, and DCE anything else; storing Objects in

fields usually entails GC store barriers; storing fields usually entails false

sharing with some other important data...

c) Accumulate the values in locals and print them: still allows loop

pipelining and partial DCE-ing; and also, good luck with Objects!

You might want to investigate which one your favorite harness is using.
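As a rough illustration of the problem, the sketch below contrasts a computation whose result is dropped with one whose result is published through a crude sink. `CrudeSink` and `DeadCodeDemo` are hypothetical names, and the sink is deliberately naive: it is not how JMH's Blackhole works, and it pays for a volatile store on every call.

```java
// Crude stand-in for a "blackhole", for illustration only. JMH's real Blackhole
// is considerably more elaborate; this one just publishes each value through
// a volatile field.
final class CrudeSink {
    private volatile double last;

    void consume(double value) {
        last = value; // in practice the JIT keeps volatile stores, but they are not free
    }

    double last() {
        return last;
    }
}

final class DeadCodeDemo {
    private double x = Math.PI; // non-final, so the input is not a compile-time constant

    // Result is discarded: the JIT may remove the whole computation.
    void measureWrong() {
        Math.log(x);
    }

    // Result escapes through the sink, so the computation has to happen.
    void measureWithSink(CrudeSink sink) {
        sink.consume(Math.log(x));
    }
}
```

Even this simple sink shows the trade-off from the list above: the side effect is reliable, but its cost may be comparable to the operation being measured.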

D. Constant foldings

As much as dead-code elimination is a buzzword in benchmarking community,

the symmetric effect is mostly overlooked. That is, DCE works by eliminating

the part of program graph because of the unclaimed outputs. But there is

also the optimization that eliminates the part of program graph because of

the predictable inputs. This JMH sample demonstrates the effect:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_10_ConstantFold.java

Avoiding this issue again requires JVM expertise, and it is cruel to push

users to do that. It takes a very careful design of benchmark loop to break the

load coalescing across loop iterations, when you *also* want to provide low

overhead for fine-grained benchmarks. We spent considerable amount of time

tuning up the measurement loops (and this is transparent for JMH users,

because you "just" recompile the benchmark code, and the new synthetic code

is being generated, voila).

When harness asks users to create benchmark loop on their own, it pushes

users to deal with the issue on their own as well. I can count the people

who have time, courage, and expertise to do this kind of code on the fingers

of one hand.
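The input side of the problem can be sketched in a few lines of plain Java. The class and method names below are hypothetical and only loosely modeled on the idea in the linked sample; whether a particular JIT actually folds the first method is an assumption, not a guarantee.

```java
// Illustration only: predictable inputs versus inputs read from mutable state.
final class ConstantFoldDemo {
    private static final double CONSTANT = Math.PI; // compile-time constant
    private double x = Math.PI;                     // read from instance state on every call

    // Input is predictable: the JIT may replace the call with a precomputed value.
    double foldable() {
        return Math.log(CONSTANT);
    }

    // Input comes from a non-final field, so it cannot be assumed constant.
    double fromState() {
        return Math.log(x);
    }
}
```

Both methods return the same number, which is what makes the effect easy to miss: only the measured time differs.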

E. Non-throughput measures

Now, when the payload is wrapped in the benchmark loop, it seems impossible

to collect any non-throughput metrics. The two most significant that we

learned through our internal JMH uses are: sampling the execution time, and

single-shot measurements.

Measuring the individual timings is very tough, because timer overheads can

be very painful, and there is also coordinated omission tidbits, yada-yada...

That is, without a smart scheme that samples only *some* invocations, you

will mostly drown in the timing overheads. It turns out, sampling is rather

easy to implement with harness which *already* generates the synthetic code.

This is why JMH's support for SampleTime was so clean and easy to implement.

(Success story: measuring FJP latencies on JDK 8 Streams)
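A minimal sketch of the sampling idea, assuming a fixed stride and an injectable clock, could look as follows. `SampledTimer` is a hypothetical helper, not JMH's SampleTime implementation, and it does nothing about coordinated omission.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: time only every sampleEvery-th invocation.
final class SampledTimer {
    private SampledTimer() {}

    // Runs op `invocations` times but reads the clock only around every
    // sampleEvery-th call, so timer overhead is paid on a fraction of the calls.
    static long[] sample(Runnable op, int invocations, int sampleEvery,
                         LongSupplier nanoClock) {
        if (invocations < 0 || sampleEvery <= 0) {
            throw new IllegalArgumentException("invocations >= 0 and sampleEvery > 0 required");
        }
        long[] samples = new long[invocations / sampleEvery];
        int next = 0;
        for (int i = 1; i <= invocations; i++) {
            if (i % sampleEvery == 0) {
                long start = nanoClock.getAsLong();
                op.run();
                samples[next++] = nanoClock.getAsLong() - start;
            } else {
                op.run(); // untimed invocation
            }
        }
        return samples;
    }
}
```

In real use the clock would be `System::nanoTime`. A fixed stride is the simplest choice; a real harness might prefer a randomized one so that sampling does not line up with periodic behavior in the workload.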

Measuring the single-invocation timings is needed for warmup studies: what's

the time to invoke the payload "in cold"? Again, once you generate the code

around the benchmark, it is easy to provide the proper timestamping. When

your harness implements multiple forks, it is very easy to have thousands

of "cold" invocations without leaving your coffee cold. What if your harness

requires reps count and requires calibration? Forget it.

The second-order concern is to provide the clean JVM environment for this

kind of run. In JMH, there is a separation between host JVM and the forked

JVM, where most of the infrastructural heavy-lifting like regexp

matching, statistics, printing, etc is handled in the host VM. The forked VM

fast-paths to "just" measure, not contaminating itself with most infra

stuff. This makes SingleShot benchmark modes very convenient in JMH.

(Success story: JDK 8 Lambda linkage/capture costs, and also JSR 292 things)

See the examples here:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_02_BenchmarkModes.java

It is educational to compile the benchmarks and look for the generated code

to see the loops we are generating for them (target/generated-sources/...)

F. Synchronize iterations

Everything gets significantly more complicated when you start to support

multi-threaded benchmarks. It is *not* enough to shove in the executor and

run the benchmark in multiple threads. The simplest issue everyone overlooks

is that starting/stopping threads is not instantaneous, and so you need to

care if all your worker threads are indeed started. More in this JMH example:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_17_SyncIterations.java

Without this, most of heavily-threaded benchmarks are way, way off the

actual results. We routinely saw >30% differences prior to introducing this

kind of workaround. The only other harness I know doing this is SPECjvm2008.
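The simplest form of the idea is a start barrier: no worker begins its work until every worker thread is actually running. The sketch below shows only that, with hypothetical names; as far as the linked sample describes it, JMH goes further and ramps threads up with unmeasured iterations, because waking threads from a blocking barrier is not instantaneous either.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical sketch: hold all workers at a barrier until every one has started.
final class SyncStart {
    private SyncStart() {}

    static void runTogether(int threads, Runnable work) {
        CyclicBarrier allStarted = new CyclicBarrier(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    allStarted.await(); // nobody proceeds until every worker is running
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                work.run();
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }
}
```

Without the barrier, the first worker may finish a large share of its work before the last one has even been scheduled, so the "multi-threaded" measurement partly reflects single-threaded execution.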

G. Multi-threaded sharing

Multi-threaded benchmarks are also interesting because they introduce

sharing. It is tempting to "just" make the benchmark object either shared

between the worker threads, or allocate completely distinct objects for

each worker thread. That's the obvious way to introduce sharing in the

benchmark API.

However, the reality begs to differ: in many cases, you want the

state-bearing objects to have *different* shareability domains. E.g. in many

concurrent benchmarks, I want to have the shared state which holds my

concurrent primitive to test, and a distinct state which keeps my scratch

data.

In JMH, it forces you to introduce @State:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java

...together with some clean way of injecting the state objects into the run,

since the default benchmark object is not the appropriate substitute (can't be

both shared and distinct).
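Stripped of any harness, the two shareability domains can be sketched in plain Java as below. The names are hypothetical, and this shows only the shape of the problem that @State addresses, not how JMH injects state objects.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: one shared object under test, one scratch object per worker.
final class SharedAndLocalState {
    private SharedAndLocalState() {}

    // Shared domain: a single instance seen by all workers (the primitive under test).
    static final class Shared {
        final AtomicLong counter = new AtomicLong();
    }

    // Thread-local domain: each worker gets its own instance (scratch data).
    static final class Scratch {
        final long[] buffer = new long[16];
    }

    static long run(int threads, int opsPerThread) {
        Shared shared = new Shared();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                Scratch scratch = new Scratch(); // created by the worker that uses it
                for (int op = 0; op < opsPerThread; op++) {
                    scratch.buffer[op % scratch.buffer.length] = shared.counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return shared.counter.get();
    }
}
```

A benchmark API that offers only "everything shared" or "everything per thread" cannot express this mix, which is the argument made above.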

H. Multi-threaded setup/teardown

States often require setup and teardown. It gets interesting for two

reasons: 1) in many cases, you don't want any non-worker thread to touch the

state object, and want only the worker threads to set up/tear down state objects,

like in the cases where you initialize thread-local structures or otherwise

care about NUMA and locality -- this calls for tricky lazy init schemes;

2) in many cases, you have to call setup/teardown on shared objects, which

means you need to synchronize workers, and you can't do that on hot-paths

with blocking the worker threads (schedulers kick in and ruin everything) -- this

calls for tricky busy-looping concurrency control.

Fortunately, it can be completely hidden under the API, like in JMH:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_06_FixtureLevel.java

I. False-god-damned-sharing

And of course, after you are done with all the API support for multi-threaded

benchmarks, you have to dodge some new unfortunate effects.

False-god-damned-sharing included. The non-exhaustive list where we got the

false sharing, and it affected our results is: 1) can't afford false sharing

on the "terminate" flag, which can be polled every nanosecond; 2) can't

afford false sharing in blackholes, because you deal with nanosecond-scale

events there; 3) can't afford false sharing in state objects, because you

know why; 4) can't afford false sharing in any other control structure which

is accessed by worker threads.

In JMH, we did a lot, scratch that, *A LOT* to avoid false sharing in the

infra code. We also automatically pad the state objects, providing at

least some level of protection for otherwise oblivious users.
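For illustration, manual padding around a hot field is commonly written as a small class hierarchy. The sketch below assumes 64-byte cache lines and a JVM that keeps superclass fields ahead of subclass fields; both are assumptions about the platform, not guarantees, and the class names are made up. It is not JMH's padding code.

```java
// Assumption: 64-byte cache lines, and a field layout that places superclass
// fields before subclass fields (an implementation detail, not a guarantee).
class LeftPad {
    long l0, l1, l2, l3, l4, l5, l6, l7; // 64 bytes in front of the hot field
}

class HotField extends LeftPad {
    volatile boolean terminate; // polled by every worker thread
}

final class PaddedFlag extends HotField {
    long r0, r1, r2, r3, r4, r5, r6, r7; // 64 bytes behind the hot field
}
```

A worker would poll `flag.terminate` in its loop. The padding fields are never read or written; they exist only to keep unrelated data off the cache line that holds the flag.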

J. Asymmetric benchmarks

Now that you take a breath after working hard dealing with all these issues,

you have to provide the support for the benchmarks which are asymmetric. I.e.

in the same run, you might want to have the benchmark methods executing

_different_ chunks of code, and measure them _distinctly_. Working example is

Nitsan's queuing experiments:

http://psy-lob-saw.blogspot.ru/2013/12/jaq-spsc-latency-benchmarks1.html

...but let me instead show the JMH example:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_15_Asymmetric.java

K. Inlining

The beast of the beasts: for many benchmarks, the performance differences

can only be explained by the inlining differences, which broke/enabled some

additional compiler optimizations. Hence, playing nice with the inliner is

essential for benchmark harness. Again, pushing users to deal with this

completely on their own is cruel, and we can ease their pain a bit.

JMH does two things: 1) It peels the hottest measurement loop in a separate

method, which provides the entry point for compilation, and the inlining

budget starts there; 2) @CompilerControl annotation to control inlining

in some known places (@GMB and Blackhole methods are forcefully inlined these

days, for example).

Of course, we have a sample for that:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java

BOTTOM-LINE:

----------------------------------

The benchmarking harness business is very hard, and very non-obvious. My own

experience tells me even the smartest people make horrible mistakes in them,

myself included. We try to get around that by fixing more and more things

in JMH as we discover more, even if that means significant API changes.

Please do not trust the names behind the projects: whether it's Google or

Oracle -- the only thing that matters is whether the projects are up to technical

challenges they face.

The job of a benchmark harness is to provide a reliable benchmarking

environment. It could go further than that (up to the point harness can

<strike>read mail</strike> submit results to GAE), but it is only prudent

if it gets its primary job done.

The issues above explain why I get all amused when people bring up trivial

things like IDE support and/or the ability to draw the graphs as the

deal-breaker things for benchmark harness choices. It's like looking at the

cold fusion reactor and deciding to run the coal power plant instead,

because the fusion reactor has an ugly shape, and is painted in a color you

don't particularly like.

-Aleksey.

Martin Thompson

Feb 1, 2014, 7:44:54 AM

to mechanica...@googlegroups.com

As someone who has written a lot of benchmarks over the years and made a lot of mistakes that result in character building experiences when they are bluntly, but rightly, pointed out by JVM engineers :-)

I'm finding that JMH is becoming my tool of choice. The more I use it the more I'm impressed by how correct it is. I've seen too many cases with my own benchmarks, and the likes of Caliper, where code got optimised to the point of being misleading because of things like loop unrolling, dead code elimination, or de-optimisations resulting in megamorphic dispatch.

However JMH could improve on the usability front. For example, it could be provided as a downloadable JAR and not require Maven to build it :-) Maybe even have an Ant task with ability to fail a build on a configurable deviation from a baseline. Performance testing needs to be part of continuous delivery pipeline so we have continuous performance profiling and testing.
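The core of such a build gate is a small comparison. The sketch below is a hypothetical helper, not an existing Ant task or JMH feature; it assumes throughput scores where higher is better and ignores measurement error, which a real gate would need to account for.

```java
// Hypothetical sketch of a baseline check for a CI build.
final class BaselineCheck {
    private BaselineCheck() {}

    // Returns true when the current throughput is more than `tolerance`
    // (e.g. 0.05 for 5%) below the recorded baseline.
    static boolean regressed(double baselineOpsPerSec, double currentOpsPerSec,
                             double tolerance) {
        if (baselineOpsPerSec <= 0 || tolerance < 0) {
            throw new IllegalArgumentException("baseline must be positive, tolerance non-negative");
        }
        return currentOpsPerSec < baselineOpsPerSec * (1.0 - tolerance);
    }
}
```

A build script could call `regressed` once per benchmark and fail the build when it returns true.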

Martin...

Aleksey Shipilev

Feb 1, 2014, 7:48:12 AM

to mechanica...@googlegroups.com

Thanks Martin,

On Saturday, February 1, 2014, 16:44:54 UTC+4, Martin Thompson wrote:

However JMH could improve on the usability front. For example, it could be provided as a downloadable JAR and not require Maven to build it :-)

JMH is at Maven Central for a few months now, see the update on OpenJDK page:

http://openjdk.java.net/projects/code-tools/jmh/

(you can even ask the Maven archetype to generate the benchmark project for you)

Maybe even have an Ant task with ability to fail a build on a configurable deviation from a baseline. Performance testing needs to be part of continuous delivery pipeline so we have continuous performance profiling and testing.

I will gladly accept such a task in mainline JMH workspace, subject to due OpenJDK contribution process :)

-Aleksey.

Norman Maurer

Feb 1, 2014, 7:48:33 AM

to Martin Thompson, mechanica...@googlegroups.com

You could grab the jar just from here:

http://central.maven.org/maven2/org/openjdk/jmh/

Martin...

--
Norman Maurer

Martin Thompson

Feb 1, 2014, 10:21:16 AM

to mechanica...@googlegroups.com

Maybe I need some sort of treatment to overcome the adverse reaction I seem to have to Maven. :-) I've just seen way way too many projects end up in library bloat as mvn makes it so easy to download and then depend on half the Internet. We have roach motel semantics with locks; I feel maven turns your project into a roach motel for external dependencies. Oh how I dream of simpler days and makefiles... I've good reasons to really dislike Maven. If you ever had the misfortune of having no choice but to script within it with Jelly you will know the pain as just one example.

BTW thanks for the great work on JMH! I'd be happy to help integrate it with CI pipelines when I get the time.

Marshall Pierce

Feb 1, 2014, 1:26:54 PM

to mechanica...@googlegroups.com, Martin Thompson

I have a general dislike of maven, though perhaps not as vehement as
Martin's. :)

This thread reminded me of how sad it made me to have to use maven the
last time I made a JMH project, so I threw together a demo project
showing how to use JMH with Gradle:
https://bitbucket.org/marshallpierce/gradle-jmh-demo

A Jenkins plugin that could enforce performance thresholds (and perhaps
generate pretty historical graphs) would be great. Maybe next weekend.

-Marshall

Kirk Pepperdine

Feb 1, 2014, 3:01:50 PM

to mechanica...@googlegroups.com

On Feb 1, 2014, at 4:21 PM, Martin Thompson <mjp...@gmail.com> wrote:

Maybe I need some sort of treatment to overcome the adverse reaction I seem to have to Maven. :-) I've just seen way way too many projects end up in library bloat as mvn makes it so easy to download and then depend on half the Internet. We have roach motel semantics with locks; I feel maven turns your project into a roach motel for external dependencies. Oh how I dream of simpler days and makefiles... I've good reasons to really dislike Maven. If you ever had the misfortune of having no choice but to script within it with Jelly you will know the pain as just one example.

+1000000

Henri Tremblay

Feb 1, 2014, 3:52:28 PM

to mechanica...@googlegroups.com

I tend to agree. Except that Maven isn't using Jelly since Maven 2 which went out years ago. But I can understand that the pain is still vivid (hey, let's do a programming language in xml!)

Martin Grajcar

Feb 1, 2014, 4:12:32 PM

to mechanica...@googlegroups.com

Actually, programming in XML is very productive... when your metric is the line count.

Back to the topic: I tried to convert one simple benchmark[1] from Caliper to JMH only to find out that there's nothing like com.google.caliper.Param. This is sort of showstopper when you want to measure how the branching probability influences the timing. Or is there some simple workaround?

[1]: http://stackoverflow.com/questions/19689214/strange-branching-performance

Aleksey Shipilev

Feb 1, 2014, 4:21:03 PM

to mechanica...@googlegroups.com

On Sunday, February 2, 2014, 1:12:32 UTC+4, Martin Grajcar wrote:

Back to the topic: I tried to convert one simple benchmark[1] from Caliper to JMH only to find out that there's nothing like com.google.caliper.Param. This is sort of showstopper when you want to measure how the branching probability influences the timing. Or is there some simple workaround?

We resist supporting first-class @Params in JMH, because it opens the significant can of worms (interaction with @State-s and asymmetric benchmarks, generally non-trivial logic of traversing the parameter space, representing parameters in stable machine-readable formats, overriding parameters from the command line, etc.). The "workaround" we have in JMH, or rather, the recommended way of doing this kind of thing is using JMH API, like in: https://github.com/shipilev/article-exception-benchmarks/blob/master/src/main/java/net/shipilev/perf/exceptions/ExceptionsVsFlagsBench.java (see Main there).
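To give a feel for the "traversing the parameter space" part alone, here is a minimal sketch that expands named parameter values into every combination. `ParamSpace` is a hypothetical helper, unrelated to the JMH API or to the linked benchmark; a driver could loop over the result and launch one run per combination.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expand parameter values into their Cartesian product.
final class ParamSpace {
    private ParamSpace() {}

    static List<Map<String, String>> expand(Map<String, List<String>> params) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start from one empty combination
        for (Map.Entry<String, List<String>> entry : params.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String value : entry.getValue()) {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(entry.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```

Even this toy version leaves open the questions listed above: how combinations interact with state objects, how they are named in machine-readable output, and how they are overridden from the command line.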

-Aleksey.

Aleksey Shipilev

Feb 1, 2014, 4:27:06 PM

to mechanica...@googlegroups.com

On Sunday, February 2, 2014, 1:21:03 UTC+4, Aleksey Shipilev wrote:

On Sunday, February 2, 2014, 1:12:32 UTC+4, Martin Grajcar wrote:
Back to the topic: I tried to convert one simple benchmark[1] from Caliper to JMH only to find out that there's nothing like com.google.caliper.Param. This is sort of showstopper when you want to measure how the branching probability influences the timing. Or is there some simple workaround?

We resist supporting first-class @Params in JMH, because it opens the significant can of worms (interaction with @State-s and asymmetric benchmarks, generally non-trivial logic of traversing the parameter space, representing parameters in stable machine-readable formats, overriding parameters from the command line, etc.). The "workaround" we have in JMH, or rather, the recommended way of doing this kind of thing is using JMH API, like in: https://github.com/shipilev/article-exception-benchmarks/blob/master/src/main/java/net/shipilev/perf/exceptions/ExceptionsVsFlagsBench.java (see Main there)

...or get suffixed benchmarks, like: https://github.com/shipilev/article-exception-benchmarks/blob/master/src/main/java/net/shipilev/perf/exceptions/StackTraceConstructBench.java; some people find it more convenient.

-Aleksey.

Georges Gomes

Feb 1, 2014, 4:27:58 PM

to mechanica...@googlegroups.com

Hi Aleksey,

Interesting thank you.

Is that also how you recommend to run the same bench through multiple implementations of the same interface?

I couldn't run the inheritance sample in any way... copy/paste works obviously but it's super painful and error prone.

Many thanks

GG

Aleksey Shipilev

Feb 1, 2014, 4:32:06 PM

to mechanica...@googlegroups.com

Hi Georges,

On Sunday, February 2, 2014, 1:27:58 UTC+4, Georges Gomes wrote:

Interesting thank you.
Is that also how you recommend to run the same bench through multiple implementations of the same interface?
I couldn't run the inheritance sample in anyway... copy/paste works obviously but it's super painful and error prone.

This does not work for you? http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_24_Inheritance.java

Otherwise, we sometimes build the benchmarks which read String property and instantiate proper implementation in @Setup, like: https://github.com/shipilev/dbpools-bench/blob/master/src/main/java/org/sample/Benchmark1Ex.java

-Aleksey.

Georges Gomes

Feb 1, 2014, 4:55:07 PM

to mechanica...@googlegroups.com

Thanks for the reply.

No, I can't make JMHSample_24_Inheritance.java run.

It just doesn't get processed into the benchmark files in the manifest.

If you think it should make the job I will dig into it and provide feedback.

The property technique is interesting but I think people would appreciate self-contained benchmarks.

I appreciate being able to call "jmh ^queues.*spsc.*" to run all benchmarks of queues spsc (for example).

Having to call main() makes some benchmarks not runnable like the others...

It doesn't matter for benchmarks that you are running during development of a particular piece.

But, like Martin, we are trying to integrate performance benchmarking (in other words JMH) in our development process, and having a single, self-contained way to call them is important for the build process and CI automation. So the convention for declaring and running JMH benchmarks is important.

That being said, I have been using JMH intensively in the past few weeks and I'm impressed with the quality and stability of results. So many things are done right. It's difficult (impossible) to go back.

My favorite detail that makes a lot of difference for a multi-thread bench: the sync mode

that warms up threads and only measures during "true" concurrent processing.

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_17_SyncIterations.java

JMH Examples are great, except the Sample_24_Inheritance that doesn't work :)

Just kidding, must be me somehow :)

Cheers

GG

Aleksey Shipilev

Feb 1, 2014, 5:02:34 PM

to mechanica...@googlegroups.com

On Sunday, February 2, 2014, 1:55:07 UTC+4, Georges Gomes wrote:

Thanks for the reply.
No I can make the JMHSample_24_Inheritance.java run.

It just don't process it up to the benchmark files in manifest.
If you think it should make the job I will dig into it and provide feedback.

Please get on jmh-dev: http://mail.openjdk.java.net/mailman/listinfo/jmh-dev, and we can follow up.

It doesn't matter for benchmarks that your are running during development of a particular piece.
But, like Martin, we are trying to integrate performance benchmarking (in other words JMH) in our development process and having a single way to call them, self contained is important for the build process and CI automation. So the convention for declaring and running jmh benchs are important.

I agree, but there are technicalities about @Param that make them hard to implement. Last time I tried almost a year ago, maybe it's time to try again.

-Aleksey.

tm jee

Feb 1, 2014, 7:01:20 PM

to mechanica...@googlegroups.com

Hi guys,

what about

http://latencyutils.github.io/

It is pretty good as well.

Georges Gomes

Feb 2, 2014, 1:51:56 AM

to mechanica...@googlegroups.com

Hi

Latencyutils is a great tool but it's only measuring latency (and correcting it as well).

You still need to write the benchmark. And that's where things are difficult to get right.

(just look at Aleksey's comments)

Gil will comment better than I do but, from my point of view, LatencyUtils targets measurement in live or simulated environments.

My colleague Jean-Philippe Bempel would say: "That's the only absolute truth!"

But, during the optimization process, measuring around a small piece of the code is more convenient.

That's where JMH and Caliper are helpful.

That being said, I do agree with Jean-Philippe, and a "real-life" full end-to-end benchmark is mandatory at the end or periodically.

Cheers

GG

Georges Gomes

Feb 2, 2014, 2:01:50 AM

to mechanica...@googlegroups.com

Privately

Thanks for your work. JMH is great.

Regarding @Param, if it's a hard problem for you then I'm useless :)

But if I can help in any way, test beta, write samples, etc...

Just let me know

Kind regards

GG

Georges Gomes

Feb 2, 2014, 2:02:39 AM

to mechanica...@googlegroups.com

(privately failed! haha)

ymo

Feb 2, 2014, 11:52:21 PM

to mechanica...@googlegroups.com

Wonder if anyone here has used http://www.faban.org ? It used to be a Sun tool IIRC.

Aleksey Shipilev

Feb 16, 2014, 9:41:22 AM

to mechanica...@googlegroups.com

On 02/02/2014 02:02 AM, Aleksey Shipilev wrote:
> It doesn't matter for benchmarks that your are running during
> development of a particular piece.
> But, like Martin, we are trying to integrate performance
> benchmarking (in other words JMH) in our development process and
> having a single way to call them, self contained is important for
> the build process and CI automation. So the convention for declaring
> and running jmh benchs are important.
>
>
> I agree, but there are technicalities about @Param that make them hard
> to implement. Last time I tried almost a year ago, maybe it's time to
> try again.

...and some 2 weeks later, here's the basic support for @Params in
JMH:
http://mail.openjdk.java.net/pipermail/jmh-dev/2014-February/000453.html

-Aleksey.

Chris Vest

Feb 16, 2014, 12:26:00 PM

to mechanica...@googlegroups.com

What if the thing I want to parameterise is the number of benchmark threads? Say I have a concurrent data structure, and I want to measure how different levels of concurrent access influence performance.

Cheers,

Chris

Aleksey Shipilev

Feb 16, 2014, 1:24:32 PM

to mechanical-sympathy

Remember I was telling about the "can of worms"? Here you go.

Use the API then, Luke, that's the swiss-army knife.

@Param is just the convenient short-cut.

-Aleksey.

Peter Hughes

Feb 18, 2014, 5:58:19 PM2/18/14

to mechanica...@googlegroups.com

For what it's worth, as a relative newcomer to the field, using Maven I was able to get a usable JMH project running from scratch in less than a minute or two. Granted, it didn't do much of anything, but I felt like a dissenting opinion should be offered ;)

JMH itself is a dream to use; the code samples are truly excellent for getting to grips with how to approach various scenarios. The only thing that has proved less-than-excellent so far is hunting down particular annotations. For instance, I only discovered @OperationsPerInvocation after looking at Nitsan's JAQ benchmarks - a central documentation of these would prove very useful (if one already exists, then apologies, although I couldn't find any mentioned on the JMH homepage)

- Peter

Aleksey Shipilev

Feb 18, 2014, 6:04:40 PM2/18/14

to mechanica...@googlegroups.com

On 02/19/2014 02:58 AM, Peter Hughes wrote:
> The only thing that has proved less-than-excellent so far is hunting
> down particular annotations. For instance, I only discovered
> @OperationsPerInvocation after looking at Nitsan's JAQ benchmarks - a
> central documentation of these would prove very useful (if one
> already exists, then apologies, although I couldn't find any
> mentioned on the JMH homepage)

We were thinking the samples will gradually introduce all the useful
annotations. But I agree, Javadocs should be published somewhere. It
will take some time to figure out...

Meanwhile, it seems useful to link the annotation folder:
http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-core/src/main/java/org/openjdk/jmh/annotations/

-Aleksey.

Rüdiger Möller

Apr 22, 2015, 3:45:24 PM4/22/15

to mechanica...@googlegroups.com

What makes me shy away from using JMH more often is the ceremony required to run a benchmark (setting up a Maven project and so on). It really would make a difference if I were able to run benchmarks from within the IDE quickly and ad hoc, similar to how unit tests can be run from IntelliJ. A simple entry point like JMH.runTest( Class, method, [options] ) would be great :-)

Aleksey Shipilev

Apr 22, 2015, 3:51:44 PM4/22/15

to mechanical-sympathy

But wait, there is a section "IDE Support" on JMH page...

There is also a link to IDEA plugin at the bottom of the same page...

And every JMH sample has the runnable main() method that is directly invoke-able...

Hm.

-Aleksey.

--
You received this message because you are subscribed to a topic in the Google Groups "mechanical-sympathy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mechanical-sympathy/m4opvy4xq3U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rüdiger Möller

Apr 22, 2015, 7:36:39 PM4/22/15

to mechanica...@googlegroups.com

Uuhh .. I admit I haven't looked into JMH for some time :-). Will have myself an update ...

pron

Apr 24, 2015, 9:08:06 AM4/24/15

to mechanica...@googlegroups.com

One more point in favor of JMH: its pluggable profilers, and, in particular, the awesome perfasm. It recently helped us pinpoint and fix some unnecessary cache misses in a hot data structure.

On Saturday, February 1, 2014 at 2:19:40 PM UTC+2, Aleksey Shipilev wrote:

Full disclosure: I work for Oracle, and do Java performance work in
OpenJDK. I also develop and maintain JMH, and JMH is my 4-th (I think)
benchmark harness. Hence, my opinion is biased, and I try to stay objective
because we've been in Caliper's shoes...

Disclaimer: I am not the only maintainer and developer for JMH. It was
developed with heavy contributions from both JRockit (where it came
originally) and HotSpot performance teams. Hence, when I say "we", I mean
many JMH contributors.

IMO, Caliper is not as bad for large benchmarks. In fact, Caliper feels just
like pre-JMH harnesses we had internally in Sun/BEA. And that is not a
coincidence, because Caliper benchmark interface is very intuitive and an
obvious one. The sad revelation that came upon me over the previous several
years is that the simplicity of benchmark APIs does not correlate with
benchmark reliability.

I don't follow Caliper development, and I'm not in a position to bash Caliper,
so instead of claiming anything that Caliper does or does not do, let me
highlight the history of JMH redesigns over the years. That should help
to review other harnesses, since I can easily say "been there, tried that,
it's broken <in this way>". Most of the things can even be guessed from the
API choices the harness makes. If the API cannot provide the instruments to
avoid a pitfall, then it is very probable the harness makes no moves to avoid
it (except for the cases where magic dust is involved).

I tend to think this is a natural way for a benchmark harness to evolve, and
you can map this timeline back to your favorite benchmark harness. The
pitfalls are many and tough; the non-exhaustive "important list" is as follows:

A. Dynamic selection of benchmarks.

Since you don't know at "harness" compile time what benchmarks it would run,
the obvious choice would be calling the benchmark methods via Reflection.
Back in the days, this pushed us to accept the same "repetition" counter in
the method to amortize the reflective costs. This already introduces the
major pitfall about looping, see below.

But infrastructure-wise, harness then should intelligently choose the
repetition count. This almost always leads to calibrating mechanics, which
is almost always broken when loop optimizations are in effect. If one
benchmark is "slower" and autobalances with a lower reps count, and another
benchmark is "faster" and autobalances with a higher reps count, then the
optimizer has more opportunity to optimize the "faster" benchmark even further.
This takes us away from seeing how exactly the benchmark performs and
introduces another (hidden! and uncontrollable!) degree of freedom.

In retrospect, the early days decision of JMH to generate synthetic
benchmark code around the method, which contains the loop (carefully chosen
by us to avoid the optimizations in current VMs -- separation of concerns,
basically), is paying off *very* nicely. We can then call that synthetic
stub via Reflection without even bothering about the costs.

...That is not to mention users can actually review the generated benchmark
code looking for the explanations for the weird effects. We do that
frequently as the additional control.

B. Loop optimizations.

This is by far my major complaint about almost every harness. What is the
usual answer to "My operation is very small, and timers' granularity/latency
is not able to catch the effect"? Why, yes, of course, wrap it in an indexed loop.
This mistake is painfully obvious, and a real pain in the back to prevent. We
even have a JMH sample to break the habit of people coming to build the
same style of benchmarks:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_11_Loops.java

(BTW, the last time I tried Caliper a few years ago, it even refused to run
when calibration says the running time does not change when changing the
reps count. Well, THANK YOU, but I really WANT to run that benchmark!)

C. Dead-code elimination.

This is my favorite pet-peeve. It is remarkably hard to introduce the
side-effect to the benchmark which is both reliable and low-overhead.
Low-overhead parts really require the JVM expertise to get right, and
pushing that on to users is very, very dumb. JMH's Blackhole classes took
a significant amount of our time to implement correctly, and we are still
tuning them here and there to minimize their costs [to the extreme that we are
thinking about a proper VM interface to consume the values]. Remarkably,
we can hide all that complexity behind the simple user interface, and let
users concentrate on their workloads. This is what good harnesses do.

Examples:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_08_DeadCode.java
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java

The usual ways to deal with DCE are broken in subtle ways:

a) Returning the value from the reflective call: JIT inflates the
reflective call, and inlines it as a usual Java method, DCE ensues.

b) Writing the values in the fields: doing that in the loop means the runtime
can only write the latest value, and DCE anything else; storing Objects in
fields usually entails GC store barriers; storing fields usually entails false
sharing with some other important data...

c) Accumulate the values in locals and print them: still allows loop
pipelining and partial DCE-ing; and also, good luck with Objects!

You might want to investigate which one your favorite harness is using.
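
To make the idea of a reliable, low-overhead side effect concrete, here is a minimal plain-Java sketch of a value consumer. It is not JMH's Blackhole implementation; the names `SinkSketch`, `consume`, and `work` are hypothetical, and the sketch only handles `int` values. It assumes that the JIT cannot prove the two volatile fields always differ, so it has to compute the consumed value in order to evaluate the comparison.

```java
final class SinkSketch {
    // Volatile reads cannot be folded away, so the comparison in consume()
    // has to be evaluated against the real, computed value.
    private volatile int i1 = 1;
    private volatile int i2 = 2;
    private int escaped;

    // Cheap consumer: the branch is never taken in practice (v cannot equal
    // both 1 and 2), so there is no store on the hot path.
    void consume(int v) {
        if (v == i1 & v == i2) {
            escaped = v;
        }
    }

    int escaped() {
        return escaped;
    }

    // Hypothetical payload standing in for the code under test.
    static int work(int x) {
        return x * 31 + 7;
    }

    public static void main(String[] args) {
        SinkSketch sink = new SinkSketch();
        for (int i = 0; i < 1_000_000; i++) {
            work(i);               // result unused: a candidate for elimination
            sink.consume(work(i)); // result consumed: has to be computed
        }
        System.out.println("escaped = " + sink.escaped());
    }
}
```

Note that this sketch does nothing about object values, false sharing on the sink's fields, or the loop around the payload, which is part of why the thread argues this belongs inside the harness.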

D. Constant foldings

As much as dead-code elimination is a buzzword in benchmarking community,
the symmetric effect is mostly overlooked. That is, DCE works by eliminating
the part of program graph because of the unclaimed outputs. But there is
also the optimization that eliminates the part of program graph because of
the predictable inputs. This JMH sample demonstrates the effect:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_10_ConstantFold.java
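
A minimal plain-Java sketch of the "predictable inputs" problem, loosely modelled on the idea in the linked sample but without any JMH dependency. The class and method names are hypothetical. It assumes the instance is reachable in a way the JIT cannot see through (in a real harness it would be a state object); if the object is a local that never escapes, the compiler may still fold the second case.

```java
final class ConstantFoldSketch {
    // A compile-time constant: the whole Math.log call on it may be
    // replaced by a precomputed result.
    private static final double CONSTANT_X = Math.PI;

    // A non-final instance field: the value has to be loaded on each call,
    // so the computation cannot be folded from this source alone.
    private double x = Math.PI;

    double foldable() {
        return Math.log(CONSTANT_X);
    }

    double notFoldable() {
        return Math.log(x);
    }

    public static void main(String[] args) {
        ConstantFoldSketch s = new ConstantFoldSketch();
        System.out.println(s.foldable() + " vs " + s.notFoldable());
    }
}
```

Both methods return the same number; the difference is only in how much of the work can disappear at compile time, which is exactly what makes the effect easy to miss.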

Avoiding this issue again requires JVM expertise, and it is cruel to push
users to do that. It takes a very careful design of the benchmark loop to break
the load coalescing across loop iterations, when you *also* want to provide low
overhead for fine-grained benchmarks. We spent a considerable amount of time
tuning up the measurement loops (and this is transparent for JMH users,
because you "just" recompile the benchmark code, and the new synthetic code
is generated, voila).

When a harness asks users to create the benchmark loop on their own, it pushes
users to deal with the issue on their own as well. I can count the people
who have the time, courage, and expertise to write this kind of code on the
fingers of one hand.

E. Non-throughput measures

Now, when the payload is wrapped in the benchmark loop, it seems impossible
to collect any non-throughput metrics. The two most significant that we
learned through our internal JMH uses are: sampling the execution time, and
single-shot measurements.

Measuring the individual timings is very tough, because timer overheads can
be very painful, and there are also coordinated omission tidbits, yada-yada...
That is, without a smart scheme that samples only *some* invocations, you
will mostly drown in the timing overheads. It turns out sampling is rather
easy to implement with a harness which *already* generates the synthetic code.
This is why JMH's support for SampleTime was so clean and easy to implement.
(Success story: measuring FJP latencies on JDK 8 Streams)
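
As a rough illustration of "sampling only some invocations", here is a plain-Java sketch that timestamps a random subset of calls and runs the rest untimed. It is not the code JMH generates; `SampleTimeSketch`, `run`, and the `mask` and `seed` parameters are hypothetical names chosen for this example. With `mask = 0xFF`, roughly one call in 256 is timed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;

final class SampleTimeSketch {
    // Runs the payload `invocations` times and returns the elapsed
    // nanoseconds of the randomly chosen, timed invocations only.
    static List<Long> run(Runnable payload, int invocations, int mask, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed);
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < invocations; i++) {
            if ((rnd.nextInt() & mask) == 0) {
                // Sampled invocation: pay the timer cost here only.
                long start = System.nanoTime();
                payload.run();
                samples.add(System.nanoTime() - start);
            } else {
                // Unsampled invocation: no timer calls at all.
                payload.run();
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // The payload's result is discarded here; a real benchmark would consume it.
        List<Long> samples = run(() -> Math.log(42.0), 1_000_000, 0xFF, 1L);
        System.out.println("timed " + samples.size() + " of 1000000 invocations");
    }
}
```

Random rather than fixed-stride selection is used so that the sampled calls are not correlated with any periodic behaviour in the payload.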

Measuring the single-invocation timings is needed for warmup studies: what's
the time to invoke the payload "in cold"? Again, once you generate the code
around the benchmark, it is easy to provide the proper timestamping. When
your harness implements multiple forks, it is very easy to have thousands
of "cold" invocations without leaving your coffee cold. What if your harness
requires reps count and requires calibration? Forget it.

The second-order concern is to provide the clean JVM environment for this
kind of run. In JMH, there is a separation between host JVM and the forked
JVM, where most of the infrastructural heavy lifting like regexp
matching, statistics, printing, etc. is handled in the host VM. The forked VM
fast-paths to "just" measure, not contaminating itself with most infra
stuff. This makes SingleShot benchmark modes very convenient in JMH.
(Success story: JDK 8 Lambda linkage/capture costs, and also JSR 292 things)

See the examples here:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_02_BenchmarkModes.java

It is educational to compile the benchmarks and look for the generated code
to see the loops we are generating for them (target/generated-sources/...)

F. Synchronize iterations

Everything gets significantly more complicated when you start to support
multi-threaded benchmarks. It is *not* enough to shove in the executor and
run the benchmark in multiple threads. The simplest issue everyone overlooks
is that starting/stopping threads is not instantaneous, and so you need to
care if all your worker threads are indeed started. More in this JMH example:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_17_SyncIterations.java

Without this, most heavily-threaded benchmarks are way, way off the
actual results. We routinely saw >30% differences prior to introducing this
kind of workaround. The only other harness I know of that does this is SPECjvm2008.
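
The start/stop problem can be sketched in plain Java as follows. This is an illustration of the idea only, not JMH's implementation; `SyncIterationsSketch` and `measure` are hypothetical names. Each worker keeps running the payload unmeasured until every worker is up, runs its measured portion, and then keeps the system loaded until every worker has finished measuring.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

final class SyncIterationsSketch {
    // Returns the total number of measured operations across all workers.
    static long measure(int threads, long opsPerThread, Runnable payload) {
        AtomicInteger ready = new AtomicInteger();
        AtomicInteger done = new AtomicInteger();
        LongAdder measured = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                ready.incrementAndGet();
                // Ramp-up: run unmeasured until every worker has started.
                while (ready.get() < threads) {
                    payload.run();
                }
                // Measured phase: a real harness would timestamp around this loop.
                for (long i = 0; i < opsPerThread; i++) {
                    payload.run();
                }
                measured.add(opsPerThread);
                done.incrementAndGet();
                // Ramp-down: keep the load up until every worker is finished.
                while (done.get() < threads) {
                    payload.run();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return measured.sum();
    }

    public static void main(String[] args) {
        System.out.println(measure(4, 100_000, () -> Math.sqrt(2.0)));
    }
}
```

The workers spin on the counters instead of blocking on a barrier, so that no thread is parked and rescheduled right at the edge of its measured phase.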

G. Multi-threaded sharing

Multi-threaded benchmarks are also interesting because they introduce
sharing. It is tempting to "just" make the benchmark object either shared
between the worker threads, or allocate completely distinct objects for
each worker thread. That's the obvious way to introduce sharing in the
benchmark API.

However, the reality begs to differ: in many cases, you want the
state-bearing objects to have *different* shareability domains. E.g. in many
concurrent benchmarks, I want to have the shared state which holds my
concurrent primitive to test, and a distinct state which keeps my scratch
data.

In JMH, this forced us to introduce @State:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java

...together with some clean way of injecting the state objects into the run,
since the default benchmark object is not the appropriate substitute (can't be
both shared and distinct).

H. Multi-threaded setup/teardown

States often require setup and teardown. It gets interesting for two
reasons: 1) in many cases, you don't want any non-worker thread to touch the
state object, and want only the worker threads to set up/tear down state
objects, like in the cases where you initialize thread-local structures or
otherwise care about NUMA and locality -- this calls for tricky lazy init
schemes; 2) in many cases, you have to call setup/teardown on shared objects,
which means you need to synchronize workers, and you can't do that on hot paths
by blocking the worker threads (schedulers kick in and ruin everything) -- this
calls for tricky busy-looping concurrency control.

Fortunately, it can be completely hidden under the API, like in JMH:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_06_FixtureLevel.java

I. False-god-damned-sharing

And of course, after you are done with all the API support for multi-threaded
benchmarks, you have to dodge some new unfortunate effects.
False-god-damned-sharing included. The non-exhaustive list of places where we
got false sharing, and it affected our results, is: 1) can't afford false sharing
on the "terminate" flag, which can be polled every nanosecond; 2) can't
afford false sharing in blackholes, because you deal with nanosecond-scale
events there; 3) can't afford false sharing in state objects, because you
know why; 4) can't afford false sharing in any other control structure which
is accessed by worker threads.

In JMH, we did a lot, scratch that, *A LOT* to avoid false sharing in the
infra code. We also automatically pad the state objects, providing at
least some level of protection for otherwise oblivious users.
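
One common way to pad a hot field by hand is to sandwich it between filler fields using a small class hierarchy. The sketch below shows that general technique for a "terminate" flag; it is not taken from JMH's sources, the class names are hypothetical, and the resulting layout is an assumption about how HotSpot typically orders fields (superclass fields first), not something the Java language guarantees.

```java
// Filler placed before the hot field: 8 longs = 64 bytes.
abstract class LeftPad {
    long l0, l1, l2, l3, l4, l5, l6, l7;
}

// The hot field lives in its own layer so the filler layers surround it.
abstract class HotFlag extends LeftPad {
    volatile boolean terminate; // polled constantly by worker threads
}

// Filler placed after the hot field: another 64 bytes.
final class PaddedFlagSketch extends HotFlag {
    long r0, r1, r2, r3, r4, r5, r6, r7;

    void stop() {
        terminate = true;
    }

    boolean isStopped() {
        return terminate;
    }

    public static void main(String[] args) {
        PaddedFlagSketch flag = new PaddedFlagSketch();
        flag.stop();
        System.out.println("stopped = " + flag.isStopped());
    }
}
```

The 64-byte figure assumes a 64-byte cache line; machines with larger lines or adjacent-line prefetching would need more filler.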

J. Asymmetric benchmarks

Now that you take a breath after working hard dealing with all these issues,
you have to provide the support for the benchmarks which are asymmetric. I.e.
in the same run, you might want to have the benchmark methods executing
_different_ chunks of code, and measure them _distinctly_. A working example is
Nitsan's queuing experiments:

http://psy-lob-saw.blogspot.ru/2013/12/jaq-spsc-latency-benchmarks1.html

...but let me instead show the JMH example:

http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_15_Asymmetric.java

K. Inlining

The beast of the beasts: for many benchmarks, the performance differences
can only be explained by the inlining differences, which broke/enabled some
additional compiler optimizations. Hence, playing nice with the inliner is
essential for benchmark harness. Again, pushing users to deal with this
completely on their own is cruel, and we can ease their pain a bit.

JMH does two things: 1) It peels the hottest measurement loop in a separate
method, which provides the entry point for compilation, and the inlining
budget starts there; 2) It provides the @CompilerControl annotation to control inlining
in some known places (@GMB and Blackhole methods are forcefully inlined these
days, for example).

Of course, we have a sample for that:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java

BOTTOM-LINE:
----------------------------------

The benchmarking harness business is very hard, and very non-obvious. My own
experience tells me even the smartest people make horrible mistakes in them,
myself included. We try to get around that by fixing more and more things
in JMH as we discover more, even if that means significant API changes.
Please do not trust the names behind the projects: whether it's Google or
Oracle -- the only thing that matters is whether the projects are up to the
technical challenges they face.

The job of a benchmark harness is to provide a reliable benchmarking
environment. It could go further than that (up to the point the harness can
<strike>read mail</strike> submit results to GAE), but that is only prudent
if it gets its primary job done.

The issues above explain why I get all amused when people bring up trivial
things like IDE support and/or the ability to draw the graphs as the
deal-breaker things for benchmark harness choices. It's like looking at the
cold fusion reactor and deciding to run the coal power plant instead,
because the fusion reactor has an ugly shape, and is painted in a color you
don't particularly like.

-Aleksey.

Roland Deschain

Jul 23, 2015, 4:10:37 PM7/23/15

to mechanical-sympathy, moru...@gmail.com

You should also check ScalaMeter (http://scalameter.github.io). It works for both Scala and Java, and has some powerful features.

ymo

Jul 24, 2015, 4:40:21 PM7/24/15

to mechanical-sympathy, moru...@gmail.com, roland....@gmail.com

Can anyone familiar with these say whether it is possible to:

1) Generate a load in a very *deterministic* manner in these benchmark tools ?

2) When they are not deterministic (meaning suffering from coordinated omission) what do they do ?

I have not found a single benchmark tool so far that can claim to be deterministic in its load generation on the jvm.

Jin Mingjian

Jul 25, 2015, 1:19:14 AM7/25/15

to mechanica...@googlegroups.com

Rüdiger, me too :) IDE support that still relies on Maven integration is not acceptable to some. I made a preliminary plain-jar version when JMH first came out, but Aleksey's diligent updates soon killed my idea. Now that I am back on my project, I find even more good things done in JMH by Aleksey. I plan to try maintaining a plain version of JMH again. Let's see if Aleksey leaves room for us :)

--

You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.

To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.

Nitsan Wakart

Jul 27, 2015, 7:44:24 AM7/27/15

to mechanica...@googlegroups.com, moru...@gmail.com, roland....@gmail.com

JMH has one type of load, which is "all-out"; it is very deterministically all-out.

You can construct a cost under load test using a per invocation @Setup method which sleeps/throttles the invocation, and use the sample benchmark mode to capture percentiles of random sampling measurements around the invocation (this is random omission, which is not an issue as it is not biased). The benchmark I just described will suffer from coordinated omission as there's no notion of schedule. Measurement has no 'intended' start time, so no correction for such is made.
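
To show what an "intended start time" correction would look like, here is a small simulated sketch in plain Java. It is not a JMH feature and uses no real clock: `IntendedStartSketch`, `latencies`, and the abstract time units are assumptions made for the example. Requests are scheduled every `interval` units on a single thread; the uncorrected latency is measured from when a request actually started, the corrected one from when it was supposed to start.

```java
import java.util.Arrays;

final class IntendedStartSketch {
    // Returns { uncorrected, corrected } latencies for each request.
    static long[][] latencies(long[] serviceTimes, long interval) {
        int n = serviceTimes.length;
        long[] uncorrected = new long[n];
        long[] corrected = new long[n];
        long clock = 0; // simulated time at which the previous request finished
        for (int i = 0; i < n; i++) {
            long intendedStart = i * interval;
            // A request cannot start before the previous one has finished.
            long actualStart = Math.max(clock, intendedStart);
            long end = actualStart + serviceTimes[i];
            uncorrected[i] = end - actualStart;   // ignores time spent waiting
            corrected[i] = end - intendedStart;   // includes the schedule slip
            clock = end;
        }
        return new long[][] { uncorrected, corrected };
    }

    public static void main(String[] args) {
        // One slow request (10 units) delays the two requests behind it.
        long[][] r = latencies(new long[] {1, 10, 1, 1}, 2);
        System.out.println(Arrays.toString(r[0])); // prints [1, 10, 1, 1]
        System.out.println(Arrays.toString(r[1])); // prints [1, 10, 9, 8]
    }
}
```

The uncorrected series reports a single slow request, while the corrected series shows that the two requests queued behind it were late as well, which is the effect coordinated omission hides.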

As with all missing features from OSS projects this should be viewed as an opportunity for contribution rather than an issue ;-)

There's no generic "load generation on the JVM" tool that I know of.

Some domain-specific load generators do tackle CO (e.g. YCSB post 0.2.0, Wrk2, Cassandra stress2, LDBC benchmarks for graph DBs, Gatling).
