Hi,
Benchmarking queues is a special exercise.
I highly recommend the blog of Nitsan Wakart.
In a nutshell, depending on the implementation, queues are likely to be sensitive to certain situations. Queues that are always empty (consumer faster than producer) or always full (producer faster than consumer) are likely to create false sharing on the queue's data. Conversely, a queue that is perfectly balanced between producer and consumer can look faster than it ever would in production, because that ideal situation is very unlikely in practice.
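To see what I mean, here is a toy harness (class name, sizes and delays are mine, and a real measurement should use JMH, not this): slow down one side and you push the same queue into the always-empty or always-full regime, and the numbers move accordingly.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Toy harness, NOT a proper benchmark: an artificial delay on one side pushes the
// queue into the "always empty" or "always full" regime, and the measured rate
// changes a lot even though the queue code is identical.
public final class QueueRegimeDemo {

    public static void main(String[] args) throws InterruptedException {
        run("always empty (slow producer)", 1_000, 0);
        run("always full  (slow consumer)", 0, 1_000);
    }

    static void run(String label, long producerDelayNs, long consumerDelayNs)
            throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong consumed = new AtomicLong();

        Thread producer = new Thread(() -> {
            long i = 0;
            while (!Thread.currentThread().isInterrupted()) {
                try { queue.put(i++); } catch (InterruptedException e) { return; }
                if (producerDelayNs > 0) LockSupport.parkNanos(producerDelayNs);
            }
        });
        Thread consumer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try { queue.take(); } catch (InterruptedException e) { return; }
                consumed.incrementAndGet();
                if (consumerDelayNs > 0) LockSupport.parkNanos(consumerDelayNs);
            }
        });

        producer.start();
        consumer.start();
        TimeUnit.SECONDS.sleep(5);
        producer.interrupt();
        consumer.interrupt();
        producer.join();
        consumer.join();
        System.out.printf("%s: ~%d ops/sec%n", label, consumed.get() / 5);
    }
}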
Rudiger Moller posted an extract of the Disruptor documentation recently that summarizes it well:
Topic = Avoiding deadlocks in systems with bounded queues
Nice wrap-up from the Disruptor documentation describing exactly the issues I had when fiddling with actors (bounded/unbounded Q based):
3.5 The Problems of Queues
Queues typically use either linked-lists or arrays for the underlying storage of elements. If an in-memory queue is allowed to be unbounded then for many classes of problem it can grow unchecked until it reaches the point of catastrophic failure by exhausting memory. This happens when producers outpace the consumers. Unbounded queues can be useful in systems where the producers are guaranteed not to outpace the consumers and memory is a precious resource, but there is always a risk if this assumption doesn’t hold and the queue grows without limit. To avoid this catastrophic outcome, queues are commonly constrained in size (bounded). Keeping a queue bounded requires that it is either array-backed or that the size is actively tracked.
Queue implementations tend to have write contention on the head, tail, and size variables. When in use, queues are typically always close to full or close to empty due to the differences in pace between consumers and producers. They very rarely operate in a balanced middle ground where the rate of production and consumption is evenly matched. This propensity to be always full or always empty results in high levels of contention and/or expensive cache coherence. The problem is that even when the head and tail mechanisms are separated using different concurrent objects such as locks or CAS variables, they generally occupy the same cache-line.
The concerns of managing producers claiming the head of a queue, consumers claiming the tail, and the storage of nodes in between make the designs of concurrent implementations very complex to manage beyond using a single large-grain lock on the queue. Large grain locks on the whole queue for put and take operations are simple to implement but represent a significant bottleneck to throughput. If the concurrent concerns are teased apart within the semantics of a queue then the implementations become very complex for anything other than a single producer – single consumer implementation.
In Java there is a further problem with the use of queues, as they are significant sources of garbage. Firstly, objects have to be allocated and placed in the queue. Secondly, if linked-list backed, objects have to be allocated representing the nodes of the list. When no longer referenced, all these objects allocated to support the queue implementation need to be re-claimed.
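To make the "large grain lock" point above concrete, here is roughly what that simplest design looks like (a sketch, class name is mine): trivially correct and bounded, but every offer and poll serializes on the same monitor, so it becomes the bottleneck as soon as more than one thread is involved.

import java.util.ArrayDeque;

public final class CoarseLockQueue<E> {
    private final ArrayDeque<E> elements;
    private final int capacity;

    public CoarseLockQueue(int capacity) {
        this.capacity = capacity;
        this.elements = new ArrayDeque<>(capacity);
    }

    // Bounded: refuses the element instead of growing without limit.
    public synchronized boolean offer(E e) {
        if (elements.size() == capacity) return false;
        elements.addLast(e);
        return true;
    }

    // Returns null when empty.
    public synchronized E poll() {
        return elements.pollFirst();
    }
}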
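And here is a rough sketch of the idiom the faster designs use for the cache-line problem described above: split the producer and consumer indices into different classes and pad around them so the two threads stop invalidating each other's cache line on every operation. All the names, the inheritance-based padding and the 64-byte-line assumption are mine; JAQ and the Disruptor do this properly. Note also that the element array is pre-allocated, so unlike a linked queue there is no per-element node garbage.

abstract class PaddedQueueFields<E> {
    protected final E[] buffer;
    protected final int mask;

    @SuppressWarnings("unchecked")
    PaddedQueueFields(int capacity) {          // capacity must be a power of two
        buffer = (E[]) new Object[capacity];
        mask = capacity - 1;
    }
}

abstract class ProducerIndexField<E> extends PaddedQueueFields<E> {
    protected long p01, p02, p03, p04, p05, p06, p07;  // padding before the producer index
    protected volatile long producerIndex;              // written only by the producer
    ProducerIndexField(int capacity) { super(capacity); }
}

abstract class ConsumerIndexField<E> extends ProducerIndexField<E> {
    protected long p11, p12, p13, p14, p15, p16, p17;  // keeps the two indices on different lines
    protected volatile long consumerIndex;              // written only by the consumer
    ConsumerIndexField(int capacity) { super(capacity); }
}

// Lamport-style single-producer/single-consumer queue on top of the padded fields.
public final class SpscPaddedQueue<E> extends ConsumerIndexField<E> {
    protected long p21, p22, p23, p24, p25, p26, p27;  // padding after the consumer index

    public SpscPaddedQueue(int capacity) { super(capacity); }

    public boolean offer(E e) {                 // producer thread only
        long p = producerIndex;
        if (p - consumerIndex == buffer.length) {
            return false;                       // full
        }
        buffer[(int) (p & mask)] = e;
        producerIndex = p + 1;                  // volatile write publishes the element
        return true;
    }

    public E poll() {                           // consumer thread only
        long c = consumerIndex;
        if (c == producerIndex) {
            return null;                        // empty
        }
        int idx = (int) (c & mask);
        E e = buffer[idx];
        buffer[idx] = null;                     // allow GC of the element
        consumerIndex = c + 1;
        return e;
    }
}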
In the JAQ project of Nitsan, you will find some interesting benchmarks. Especially this one, where you can tune the speed of consumer and producer and see the effects on the measurements.
Having made a lot of mistakes myself in queue benchmarking, I recommend having a wide variety of benchmarks testing different aspects and situations, and then looking at how your changes affect all of them:
single-threaded poll and offer, multi-threaded, concurrent, throughput, latency, etc.
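For example, a rough JMH skeleton for just one of those cases, cross-thread offer/poll throughput on a bounded queue. The queue choice, the backoff-by-yield and the names are mine; JAQ ships far more careful versions of this, and note that the backoff policy itself skews the numbers, which is exactly the kind of trap I mean.

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Group;
import org.openjdk.jmh.annotations.GroupThreads;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Group)
@BenchmarkMode(Mode.Throughput)
public class QueueThroughputBench {

    private static final Integer ONE = 1;   // avoid boxing in the hot loop

    Queue<Integer> queue;

    @Setup
    public void setUp() {
        queue = new ArrayBlockingQueue<>(1 << 15);
    }

    @Benchmark
    @Group("spsc")
    @GroupThreads(1)
    public void offer() {
        if (!queue.offer(ONE)) {
            Thread.yield();   // back off when full; failed offers still count as invocations
        }
    }

    @Benchmark
    @Group("spsc")
    @GroupThreads(1)
    public void poll(Blackhole bh) {
        Integer e = queue.poll();
        if (e == null) {
            Thread.yield();   // back off when empty
            return;
        }
        bh.consume(e);
    }
}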
I have not contributed to JAQ yet, but I'm planning to. I would encourage you to contribute too: queues are something really basic, yet they require a lot of effort to get to a high level of performance. Ultimately, we need a single place where we can get the right high-performance queues, backed by strong benchmarking and testing.
My 2 cents
Georges