However, JMH could improve on the usability front. For example, it could be provided as a downloadable JAR and not require Maven to build it :-)
Maybe even have an Ant task with the ability to fail a build on a configurable deviation from a baseline. Performance testing needs to be part of the continuous delivery pipeline so we have continuous performance profiling and testing.
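The "fail the build on a configurable deviation from a baseline" idea can be sketched in plain Java. This is a minimal sketch under stated assumptions: it is not part of JMH or Ant, the `BaselineGate` class is hypothetical, scores are assumed to be throughput-style numbers where higher is better, and how the baseline and current scores are loaded (file format, storage) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical CI gate (not part of JMH): compares throughput-style scores, where
// higher is better, against a stored baseline and lists every benchmark that
// regressed by more than the configured fraction, or that disappeared entirely.
final class BaselineGate {
    private final double maxRegression; // e.g. 0.05 allows up to a 5% drop

    BaselineGate(double maxRegression) {
        if (maxRegression < 0 || maxRegression >= 1) {
            throw new IllegalArgumentException("maxRegression must be in [0, 1)");
        }
        this.maxRegression = maxRegression;
    }

    /** Returns the names of benchmarks that should fail the build; an empty list means "pass". */
    List<String> regressions(Map<String, Double> baseline, Map<String, Double> current) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Double> e : baseline.entrySet()) {
            Double now = current.get(e.getKey());
            if (now == null) {
                failed.add(e.getKey() + " (missing)"); // a vanished benchmark fails too
            } else if (now < e.getValue() * (1.0 - maxRegression)) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }
}
```

A build step could exit with a non-zero status whenever the returned list is non-empty. Benchmark scores are noisy, so in practice the tolerance would also have to account for run-to-run variance, for example the error margin the harness reports alongside each score.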
You could just grab the jar from here:
Maybe I need some sort of treatment to overcome the adverse reaction I seem to have to Maven. :-) I've just seen way, way too many projects end up in library bloat, as mvn makes it so easy to download and then depend on half the Internet. We have roach motel semantics with locks; I feel Maven turns your project into a roach motel for external dependencies. Oh, how I dream of simpler days and makefiles... I have good reasons to really dislike Maven. If you ever had the misfortune of having no choice but to script within it using Jelly, you will know the pain, and that is just one example.
BTW, thanks for the great work on JMH! I'd be happy to help integrate it with CI pipelines when I get the time.
On Saturday, 1 February 2014 12:48:12 UTC, Aleksey Shipilev wrote:
Thanks Martin,
On Saturday, 1 February 2014, 16:44:54 UTC+4, Martin Thompson wrote:
However JMH could improve on the usability front. For example, it could be provided as a downloadable JAR and not require Maven to build it :-)
JMH has been at Maven Central for a few months now, see the update on the OpenJDK page:
(you can even ask the Maven archetype to generate the benchmark project for you)
Maybe even have an Ant task with ability to fail a build on a configurable deviation from a baseline. Performance testing needs to be part of continuous delivery pipeline so we have continuous performance profiling and testing.
I will gladly accept such a task in the mainline JMH workspace, subject to the due OpenJDK contribution process :)
-Aleksey.
Back to the topic: I tried to convert one simple benchmark from Caliper to JMH, only to find out that there's nothing like com.google.caliper.Param. This is sort of a showstopper when you want to measure how the branching probability influences the timing. Or is there some simple workaround?
On Sunday, 2 February 2014, 1:12:32 UTC+4, Martin Grajcar wrote:
Back to the topic: I tried to convert one simple benchmark from Caliper to JMH only to find out that there's nothing like com.google.caliper.Param. This is sort of showstopper when you want to measure how the branching probability influences the timing. Or is there some simple workaround?
We resist supporting first-class @Params in JMH, because it opens a significant can of worms (interaction with @State-s and asymmetric benchmarks, the generally non-trivial logic of traversing the parameter space, representing parameters in stable machine-readable formats, overriding parameters from the command line, etc.). The "workaround" we have in JMH, or rather, the recommended way of doing this kind of thing, is to use the JMH API, as in: https://github.com/shipilev/article-exception-benchmarks/blob/master/src/main/java/net/shipilev/perf/exceptions/ExceptionsVsFlagsBench.java (see Main there)
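A minimal sketch of the parameters-from-outside idea, under assumptions: the benchmark state reads its parameter from a system property, and a separate driver (for instance a `main` built on the JMH API) launches one run per value, passing a different `-D` setting to each forked JVM. Only the state side is shown, with no JMH dependency so it compiles on its own; the property name `bench.branchPercent` and the `BranchState` class are hypothetical, and this is not the code from the linked Main.

```java
import java.util.Random;

// Hypothetical benchmark state that takes its parameter from a system property
// instead of a first-class @Param. The driver (not shown) is assumed to start one
// forked run per value with -Dbench.branchPercent=<value>.
final class BranchState {
    static final String PROPERTY = "bench.branchPercent"; // hypothetical name

    final int branchPercent;
    final boolean[] taken;

    BranchState(int size, long seed) {
        // Integer.getInteger reads and parses a system property, with a default of 50.
        this.branchPercent = Integer.getInteger(PROPERTY, 50);
        if (branchPercent < 0 || branchPercent > 100) {
            throw new IllegalArgumentException(PROPERTY + " must be in [0, 100]");
        }
        Random rnd = new Random(seed);
        taken = new boolean[size];
        for (int i = 0; i < size; i++) {
            taken[i] = rnd.nextInt(100) < branchPercent; // true with ~branchPercent% probability
        }
    }

    /** The payload: a data-dependent branch whose predictability depends on branchPercent. */
    int payload() {
        int acc = 0;
        for (boolean t : taken) {
            if (t) acc += 3; else acc -= 1;
        }
        return acc; // returned so the work is not dead code
    }
}
```

Because every value gets its own JVM, the JIT profile from one branching probability cannot leak into the measurement of another, which is a useful property when the branch profile is exactly what is being measured.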
Interesting, thank you. Is that also how you recommend running the same benchmark against multiple implementations of the same interface? I couldn't get the inheritance sample to run in any way... Copy/paste obviously works, but it's super painful and error-prone.
Thanks for the reply. No, I can make JMHSample_24_Inheritance.java run; it just doesn't get processed through to the benchmark entries in the manifest. If you think it should do the job, I will dig into it and provide feedback.
It doesn't matter for benchmarks that you are running during development of a particular piece. But, like Martin, we are trying to integrate performance benchmarking (in other words, JMH) into our development process, and having a single, self-contained way to call them is important for the build process and CI automation. So the conventions for declaring and running JMH benchmarks are important.
You received this message because you are subscribed to a topic in the Google Groups "mechanical-sympathy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mechanical-sympathy/m4opvy4xq3U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Full disclosure: I work for Oracle and do Java performance work in OpenJDK. I also develop and maintain JMH, and JMH is my fourth (I think) benchmark harness. Hence, my opinion is biased, and I try to stay objective because we've been in Caliper's shoes...
Disclaimer: I am not the only maintainer and developer of JMH. It was developed with heavy contributions from both the JRockit (where it originally came from) and HotSpot performance teams. Hence, when I say "we", I mean many JMH contributors.
IMO, Caliper is not as bad for large benchmarks. In fact, Caliper feels just like the pre-JMH harnesses we had internally at Sun/BEA. And that is not a coincidence, because Caliper's benchmark interface is a very intuitive and obvious one. The sad revelation that came upon me over the previous several years is that the simplicity of a benchmark API does not correlate with benchmark reliability.
I don't follow Caliper development, and I'm not in a position to bash Caliper, so instead of claiming anything that Caliper does or does not do, let me highlight the history of JMH redesigns over the years. That should help in reviewing other harnesses, since I can easily say "been there, tried that, it's broken <in this way>". Most of these things can even be guessed from the API choices a harness makes. If the API cannot provide the instruments to avoid a pitfall, then it is very probable the harness makes no moves to avoid it (except for the cases where magic dust is involved).
I tend to think this is a natural way for a benchmark harness to evolve, and you can map this timeline back to your favorite benchmark harness. The pitfalls are many and tough; the non-exhaustive "important list" is as follows:
A. Dynamic selection of benchmarks.
Since you don't know at "harness" compile time which benchmarks it will run, the obvious choice is to call the benchmark methods via Reflection. Back in the day, this pushed us to accept the same "repetition" counter in the method to amortize the reflective costs.
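To make the reps-counter design concrete, here is a minimal, self-contained sketch of such a harness. It illustrates the approach just described, not JMH's code; `RepsHarnessSketch`, `UserBenchmarks`, and the `time` method-name prefix are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the reps-counter design described above; this is NOT how JMH works.
// Benchmarks are discovered at run time and invoked via reflection, so each one
// takes a repetition count and loops internally to amortize Method.invoke costs.
final class RepsHarnessSketch {

    // Hypothetical user benchmark written in the reps-counter style.
    static final class UserBenchmarks {
        int x = 1, y = 2;
        int lastSum; // crude sink: the harness never looks at it

        public void timeAdd(int reps) {
            int sum = 0;
            for (int i = 0; i < reps; i++) {
                sum += x + y; // x + y is loop-invariant: the JIT may hoist or collapse this loop
            }
            lastSum = sum;
        }
    }

    /** Invokes every public "time*"(int) method once with the given reps; returns average ns per rep. */
    static Map<String, Double> run(Object bench, int reps) {
        Map<String, Double> nsPerOp = new TreeMap<>();
        for (Method m : bench.getClass().getMethods()) {
            if (!m.getName().startsWith("time") || m.getParameterCount() != 1
                    || m.getParameterTypes()[0] != int.class) {
                continue;
            }
            try {
                long start = System.nanoTime();
                m.invoke(bench, reps); // one reflective call, amortized over 'reps' payload executions
                nsPerOp.put(m.getName(), (double) (System.nanoTime() - start) / reps);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("cannot run " + m.getName(), e);
            }
        }
        return nsPerOp;
    }
}
```

The only loop here is the one the user wrote, so whatever the JIT does to that loop is folded straight into the reported ns-per-rep figure.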
This already introduces the major pitfall with looping, see below. But infrastructure-wise, the harness then has to intelligently choose the repetition count. This almost always leads to calibration mechanics, which are almost always broken when loop optimizations are in effect. If one benchmark is "slower" and autobalances to a lower reps count, and another benchmark is "faster" and autobalances to a higher reps count, then the optimizer has more opportunity to optimize the "faster" benchmark even further. That keeps us from seeing exactly how the benchmark performs, and introduces another (hidden! and uncontrollable!) degree of freedom.
In retrospect, JMH's early decision to generate synthetic benchmark code around the method, which contains the loop (carefully chosen by us to avoid the optimizations in current VMs -- separation of concerns, basically), is paying off *very* nicely. We can then call that synthetic stub via Reflection without even bothering about the costs... Not to mention that users can actually review the generated benchmark code, looking for explanations of weird effects. We do that frequently as an additional control.
B. Loop optimizations.
This is by far my major gripe with almost every harness. What is the usual answer to "My operation is very small, and the timer's granularity/latency is not able to catch the effect"? Why, yes, of course: wrap it in an indexed loop. This mistake is painfully obvious, and a real pain in the back to prevent. We even have a JMH sample to break the habit of people coming in to build the same style of benchmarks:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_11_Loops.java
(BTW, the last time I tried Caliper a few years ago, it even refused to run when calibration said the running time did not change when changing the reps count. Well, THANK YOU, but I really WANT to run that benchmark!)
C. Dead-code elimination.
This is my favorite pet peeve.
It is remarkably hard to introduce a side effect into the benchmark that is both reliable and low-overhead. The low-overhead part really requires JVM expertise to get right, and pushing that onto users is very, very dumb. JMH's Blackhole classes took a significant amount of our time to implement correctly, and we are still tuning them here and there to minimize their costs [to the extreme that we are thinking about a proper VM interface to consume the values]. Remarkably, we can hide all that complexity behind a simple user interface, and let users concentrate on their workloads. This is what good harnesses do. Examples:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_08_DeadCode.java
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_09_Blackholes.java
The usual ways to deal with DCE are broken in subtle ways:
a) Returning the value from the reflective call: the JIT inflates the reflective call and inlines it as a usual Java method; DCE ensues.
b) Writing the values into fields: doing that in a loop means the runtime can write only the latest value and DCE everything else; storing Objects into fields usually entails GC store barriers; storing into fields usually entails false sharing with some other important data...
c) Accumulating the values in locals and printing them: this still allows loop pipelining and partial DCE-ing; and also, good luck with Objects!
You might want to investigate which one your favorite harness is using.
D. Constant folding.
As much as dead-code elimination is a buzzword in the benchmarking community, the symmetric effect is mostly overlooked. That is, DCE works by eliminating part of the program graph because of unclaimed outputs. But there is also the optimization that eliminates part of the program graph because of predictable inputs.
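The two symmetric effects can be put side by side in a small sketch. All names are hypothetical, and `Sink` is only a simplified illustration of one way to build a sink (compare the value against fields the compiler cannot predict), not JMH's actual Blackhole implementation.

```java
// Sketch contrasting dead-code elimination and constant folding. All names are
// hypothetical, and Sink is a simplified illustration of the blackhole idea,
// NOT JMH's actual Blackhole implementation.
final class DceAndFoldingSketch {

    /** Simplified sink: the compiler cannot prove the branch is dead, so v must be computed. */
    static final class Sink {
        // Volatile, so each consume() re-reads them; they hold different values,
        // hence the condition below is never true at run time.
        volatile double a = 1.0;
        volatile double b = 2.0;
        double leaked; // written only if v equals both a and b, which cannot happen

        void consume(double v) {
            if (v == a & v == b) { // non-short-circuit '&': both volatile reads always happen
                leaked = v;
            }
        }
    }

    static final double CONSTANT_INPUT = 42.0; // compile-time constant: a folding target
    double input = 42.0;                       // plain field: not a compile-time constant

    /** Wrong twice over: constant input (foldable) and unused result (dead code). */
    void measureWrong() {
        Math.log(CONSTANT_INPUT);
    }

    /** Better: non-constant input, and the result escapes to the caller. */
    double measureReturn() {
        return Math.log(input);
    }

    /** Better when there are several results: hand each one to the sink. */
    void measureSink(Sink sink) {
        sink.consume(Math.log(input));
        sink.consume(Math.log(input + 1.0));
    }
}
```

Returning the value only helps if the caller itself consumes it; as point a) notes, a reflective caller that discards the result can get inlined and the whole computation eliminated again.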
This JMH sample demonstrates the effect:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_10_ConstantFold.java
Avoiding this issue again requires JVM expertise, and it is cruel to push users to do that. It takes a very careful design of the benchmark loop to break load coalescing across loop iterations when you *also* want to provide low overhead for fine-grained benchmarks. We spent a considerable amount of time tuning the measurement loops (and this is transparent to JMH users, because you "just" recompile the benchmark code and the new synthetic code gets generated, voila).
When a harness asks users to create the benchmark loop on their own, it pushes users to deal with this issue on their own as well. I can count the people who have the time, courage, and expertise to write this kind of code on the fingers of one hand.
E. Non-throughput measures.
Now, when the payload is wrapped in the benchmark loop, it seems impossible to collect any non-throughput metrics. The two most significant ones we learned about through our internal JMH uses are: sampling the execution time, and single-shot measurements.
Measuring individual timings is very tough, because timer overheads can be very painful, and there are also coordinated omission tidbits, yada-yada... That is, without a smart scheme that samples only *some* invocations, you will mostly drown in timing overheads. It turns out sampling is rather easy to implement in a harness which *already* generates the synthetic code. This is why JMH's support for SampleTime was so clean and easy to implement. (Success story: measuring FJP latencies on JDK 8 Streams.)
Measuring single-invocation timings is needed for warmup studies: what's the time to invoke the payload "in cold"? Again, once you generate the code around the benchmark, it is easy to provide proper timestamping.
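Both timestamping schemes are easy to picture as the kind of loop a harness might generate around a payload. This is a minimal sketch, not JMH's generated code: `TimingModesSketch` and its methods are hypothetical, the payload is modelled as an `IntSupplier`, and timer granularity and coordinated omission are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Sketch of sampled and single-shot timing around a payload. Hypothetical names;
// NOT JMH's generated code. Timer granularity and coordinated omission are ignored.
final class TimingModesSketch {

    /**
     * Sampling: run the payload 'ops' times, but timestamp only every 'sampleEvery'-th
     * call, so the timer overhead is paid on a small fraction of invocations.
     * Returns the sampled per-call durations in nanoseconds.
     */
    static List<Long> sample(IntSupplier payload, int ops, int sampleEvery, int[] sink) {
        if (ops < 0 || sampleEvery <= 0) throw new IllegalArgumentException();
        List<Long> samples = new ArrayList<>();
        int acc = 0;
        for (int i = 0; i < ops; i++) {
            if (i % sampleEvery == 0) {
                long t0 = System.nanoTime();
                acc += payload.getAsInt();
                samples.add(System.nanoTime() - t0);
            } else {
                acc += payload.getAsInt(); // untimed invocation
            }
        }
        sink[0] = acc; // publish the results so the payload calls are not dead code
        return samples;
    }

    /** Single shot: time exactly one invocation, e.g. the very first ("cold") call in a fresh JVM. */
    static long singleShot(IntSupplier payload, int[] sink) {
        long t0 = System.nanoTime();
        sink[0] = payload.getAsInt();
        return System.nanoTime() - t0;
    }
}
```

A single-shot figure only describes a cold call if it really is the first invocation in that JVM, so collecting many such figures means running many fresh JVMs.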
When your harness implements multiple forks, it is very easy to have thousands of "cold" invocations without leaving your coffee cold. What if your harness requires a reps count and requires calibration? Forget it.
The second-order concern is providing a clean JVM environment for this kind of run. In JMH, there is a separation between the host JVM and the forked JVM, where most of the infrastructural heavy lifting, like regexp matching, statistics, printing, etc., is handled in the host VM. The forked VM fast-paths to "just" measure, not contaminating itself with most of the infra stuff. This makes the SingleShot benchmark mode very convenient in JMH. (Success story: JDK 8 Lambda linkage/capture costs, and also JSR 292 things.)
See the examples here:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_02_BenchmarkModes.java
It is educational to compile the benchmarks and look at the generated code to see the loops we generate for them (target/generated-sources/...).
F. Synchronize iterations.
Everything gets significantly more complicated when you start to support multi-threaded benchmarks. It is *not* enough to shove in an executor and run the benchmark in multiple threads. The simplest issue everyone overlooks is that starting/stopping threads is not instantaneous, so you need to make sure all your worker threads have indeed started. More in this JMH example:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_17_SyncIterations.java
Without this, most heavily-threaded benchmarks are way, way off the actual results. We routinely saw >30% differences before introducing this kind of workaround. The only other harness I know of that does this is SPECjvm2008.
G. Multi-threaded sharing.
Multi-threaded benchmarks are also interesting because they introduce sharing.
It is tempting to "just" make the benchmark object either shared between the worker threads, or to allocate completely distinct objects for each worker thread. That's the obvious way to introduce sharing in the benchmark API.
However, reality begs to differ: in many cases, you want the state-bearing objects to have *different* shareability domains. E.g. in many concurrent benchmarks, I want to have a shared state which holds the concurrent primitive under test, and a distinct state which keeps my scratch data.
In JMH, this forced the introduction of @State:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java
...together with some clean way of injecting the state objects into the run, since the default benchmark object is not an appropriate substitute (it can't be both shared and distinct).
H. Multi-threaded setup/teardown.
States often require setup and teardown. It gets interesting for two reasons: 1) in many cases, you don't want any non-worker thread to touch the state object, and want only the worker threads to set up/tear down state objects, as in the cases where you initialize thread-local structures or otherwise care about NUMA and locality -- this calls for tricky lazy-init schemes; 2) in many cases, you have to call setup/teardown on shared objects, which means you need to synchronize workers, and you can't do that on hot paths by blocking the worker threads (schedulers kick in and ruin everything) -- this calls for tricky busy-looping concurrency control.
Fortunately, it can be completely hidden under the API, like in JMH:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_06_FixtureLevel.java
I. False-god-damned-sharing.
And of course, after you are done with all the API support for multi-threaded benchmarks, you have to dodge some new unfortunate effects. False-god-damned-sharing included.
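Padding is the usual defence against false sharing: surround a hot field, such as a stop flag that every worker polls, with unused fields so that it does not share a cache line with other frequently written data. The sketch below is a best-effort version under stated assumptions (64-byte cache lines, and HotSpot's traditional practice of laying out superclass fields before subclass fields); the class names are hypothetical, and this is not JMH's padding code.

```java
// Best-effort padding through the class hierarchy; all names are hypothetical.
// HotSpot has traditionally laid out superclass fields before subclass fields, so
// 'terminate' should sit between two 56-byte blocks of unused longs. The JVM
// specification guarantees no field layout, and newer JVMs may pack fields
// differently, so check the real layout (e.g. with OpenJDK's JOL tool) before relying on it.
abstract class PadLeft {
    long p01, p02, p03, p04, p05, p06, p07;
}

abstract class HotFlag extends PadLeft {
    volatile boolean terminate; // polled by every worker on its hot path
}

final class PaddedTerminateFlag extends HotFlag {
    long p11, p12, p13, p14, p15, p16, p17;

    void requestStop() { terminate = true; }

    boolean isStopRequested() { return terminate; }
}

final class PaddingDemo {
    /** Spins a worker on the flag for roughly 'millis' ms; returns its loop count. */
    static long runWorker(PaddedTerminateFlag flag, long millis) {
        long[] iterations = new long[1];
        Thread worker = new Thread(() -> {
            long n = 0;
            while (!flag.isStopRequested()) {
                n++;
            }
            iterations[0] = n; // visible to the caller after join()
        });
        worker.start();
        try {
            Thread.sleep(millis);
            flag.requestStop();
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            flag.requestStop();
            throw new IllegalStateException("interrupted", e);
        }
        return iterations[0];
    }
}
```

The demo only shows the flag in use; observing the actual benefit of padding requires several threads writing to neighbouring data and a comparison against an unpadded variant.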
The non-exhaustive list of places where we got false sharing, and it affected our results: 1) can't afford false sharing on the "terminate" flag, which can be polled every nanosecond; 2) can't afford false sharing in blackholes, because you deal with nanosecond-scale events there; 3) can't afford false sharing in state objects, because you know why; 4) can't afford false sharing in any other control structure which is accessed by worker threads.
In JMH, we did a lot, scratch that, *A LOT* to avoid false sharing in the infra code. We also automatically pad the state objects, providing at least some level of protection for otherwise oblivious users.
J. Asymmetric benchmarks.
Now that you can take a breath after working hard on all these issues, you have to provide support for benchmarks which are asymmetric. I.e. in the same run, you might want to have benchmark methods executing _different_ chunks of code, and measure them _distinctly_. A working example is Nitsan's queuing experiments:
...but let me instead show the JMH example:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_15_Asymmetric.java
K. Inlining.
The beast of the beasts: for many benchmarks, the performance differences can only be explained by inlining differences, which broke/enabled some additional compiler optimizations. Hence, playing nice with the inliner is essential for a benchmark harness.
Again, pushing users to deal with this completely on their own is cruel, and we can ease their pain a bit. JMH does two things: 1) it peels the hottest measurement loop into a separate method, which provides the entry point for compilation, and the inlining budget starts there; 2) the @CompilerControl annotation controls inlining in some known places (@GMB (@GenerateMicroBenchmark) and Blackhole methods are forcefully inlined these days, for example).
Of course, we have a sample for that:
http://hg.openjdk.java.net/code-tools/jmh/file/f2e982b7c51b/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_16_CompilerControl.java
BOTTOM-LINE:
----------------------------------
The benchmarking harness business is very hard, and very non-obvious. My own experience tells me even the smartest people make horrible mistakes in them, myself included. We try to get around that by fixing more and more things in JMH as we discover more, even if that means significant API changes.
Please do not trust the names behind the projects: whether it's Google or Oracle -- the only thing that matters is whether the projects are up to the technical challenges they face.
The job of a benchmark harness is to provide a reliable benchmarking environment. It could go further than that (up to the point where the harness can <strike>read mail</strike> submit results to GAE), but that is only prudent once it gets its primary job done.
The issues above explain why I get all amused when people bring up trivial things like IDE support and/or the ability to draw graphs as deal-breakers for benchmark harness choices. It's like looking at a cold fusion reactor and deciding to run a coal power plant instead, because the fusion reactor has an ugly shape and is painted in a color you don't particularly like.
-Aleksey.