Benchmarking with JMH

425 views
Skip to first unread message

tm jee

unread,
Feb 5, 2014, 9:28:00 PM2/5/14
to mechanica...@googlegroups.com
Hello guys, 

Trying to benchmark Queue's performance using JHM. Appreciate some heads up whether i'm heading down the correct path, Tia. 

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode({Mode.AverageTime})
@Measurement(iterations=10 ,time=1, timeUnit=TimeUnit.SECONDS)
@Warmup(iterations=5, time=1, timeUnit=TimeUnit.SECONDS)
@Fork(2)
public class JmhMpscQueueBenchmark {

    @State(Scope.Group)
    public static class SpscState {
        volatile int c = 0;
        Queue q;
        @Setup(Level.Trial)
        public void setUp() {
            q = ...  ; // instantiates Queue
        }
        @TearDown(Level.Trial)
        public void tearDown() {
            q = null;
        }
    }


    @GenerateMicroBenchmark
    @Group("mpsc_2p")
    @GroupThreads(2)
    public void mpsc_2p_offer(SpscState s, BlackHole blackHole) {
        blackHole.consume(s.q.offer(s.c++));
    }

    @GenerateMicroBenchmark
    @Group("mpsc_2p")
    @GroupThreads(1)
    public void mpsc_2p_poll(SpscState s, BlackHole blackHole) {
        blackHole.consume(s.q.poll());
    }

@GenerateMicroBenchmark
    @Group("mpsc_3p")
    @GroupThreads(3)
    public void mpsc_3p_offer(SpscState s, BlackHole blackHole) {
        blackHole.consume(s.q.offer(s.c++));
    }

    @GenerateMicroBenchmark
    @Group("mpsc_3p")
    @GroupThreads(1)
    public void mpsc_3p_poll(SpscState s, BlackHole blackHole) {
        blackHole.consume(s.q.poll());
    }

   // keep going with increasing @GroupThreads in consumer 
}


The general idea is to use @Group and @GroupThreads to simulate concurrent access. Any idea if this is good enough ?

Also wondering what does fellow mechanical sympathyzers used to unit test concurrency ? Suggestions would be nice.

Cheers


Georges Gomes

unread,
Feb 6, 2014, 4:55:36 PM2/6/14
to mechanica...@googlegroups.com
Hi,

Benchmarking queues is a special exercice.
I highly recommend the blog of Nitsan Wakart.

In a nutshell, depending on the implementation of the queue, they are likely to be sensitive to certain situation. Queues always empty (consumer faster than producer) or queues always full (producer faster than consumer) are likely to create false sharing on the data of the queue. Similarly, queues perfectly balanced between producer and consumer could run faster because in ideal situation (very unlikely).

Rudiger Moller posted an extract of the Disruptor recently that summarize it well
Topic = Avoiding deadlocks in systems with bounded queues

Nice wrap up from disruptor documentation describing exactly the issues I had when fiddling with actors (bounded/unbounded Q based):

3.5 The Problems of Queues
 
Queues typically use either linked-lists or arrays for the underlying storage of elements.  If an in-memory queue is allowed to be unbounded then for many classes of problem it can grow unchecked until it reaches the point of catastrophic failure by exhausting memory.  This happens when producers outpace the consumers.  Unbounded queues can be useful in systems where the producers are guaranteed not to outpace the consumers and memory is a precious resource, but there is always a risk if this assumption doesn’t hold and queue grows without limit.  To avoid this catastrophic outcome, queues are commonly constrained in size (bounded).  Keeping a queue bounded requires that it is either array-backed or that the size is actively tracked. 
Queue implementations tend to have write contention on the head, tail, and size variables.  When in use, queues are typically always close to full or close to empty due to the differences in pace between consumers and producers.  They very rarely operate in a balanced middle ground where the rate of production and consumption is evenly matched.  This propensity to be always full or always empty results in high levels of contention and/or expensive cache coherence.  The problem is that even when the head and tail mechanisms are separated using different concurrent objects such as locks or CAS variables, they generally occupy the same cache-line. 
The concerns of managing producers claiming the head of a queue, consumers claiming the tail, and the storage of nodes in between make the designs of concurrent implementations very complex to manage beyond using a single large-grain lock on the queue.  Large grain locks on the whole queue for put and take operations are simple to implement but represent a significant bottleneck to throughput.  If the concurrent concerns are teased apart within the semantics of a queue then the implementations become very complex for anything other than a single producer – single consumer implementation. 
In Java there is a further problem with the use of queues, as they are significant sources of garbage.  Firstly, objects have to be allocated and placed in the queue.  Secondly, if linked-list backed, objects have to be allocated representing the nodes of the list.   When no longer referenced, all these objects allocated to support the queue implementation need to be re-claimed. 

In the JAQ project of Nitsan, you will find some interesting benchmarks.
Specially this one. Where you can tune the speed of consumer and producer and see effects on measurements.

Having made a lot of mistake myself in queue benchmarking, I will recommend to have a wide variety of benchmark testing different aspects and situations. And then look at how your changes affect all the benchmarks.
single threaded poll, offer, multi-threaded, concurrent, throughput, latency, etc... etc...

I have not contributed to JAQ yet but I'm planning to. I would encourage you to contribute to it because, ultimately, queues are something really basic but they require a lot of effort to get into high levels. Ultimately, we need that single place where we can get the right queues with high performance and where the benchmarking/testing is strong.

My 2 cents
Georges




--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

tm jee

unread,
Feb 7, 2014, 5:57:02 PM2/7/14
to mechanica...@googlegroups.com
Thanks George. 

It'd be really nice if JMH samples contains benchmark for queue-like structure that benchmark cases you mentioned (always full, always consumed empty, and the middle sweet spot). Can't seems to find that in the samples, so i sort of spit out this simplistic benchmark out of my very limited 3hrs experience of JMH. The articles you provided are cool, especially Nitsan's one that include a delay. Nice.

Looking forward to see ya contribution to JHM mate.

Cheers.
Reply all
Reply to author
Forward
0 new messages