MPSC + backoff arena

195 views
Skip to first unread message

Benjamin Manes

unread,
Apr 10, 2015, 3:25:28 AM4/10/15
to lock...@googlegroups.com
I thought I'd share an enhancement I made in my Java port of the "Non-intrusive MPSC node-based queue" [1].

I added a combining arena as a backoff strategy [2] based on the idea from an elimination stack [3] which cancels opposing operations when contention is detected. In the case of a queue, the arena is used by producers to transfer work as Dmitriy's algorithm can insert a sublist as easily as a single element. Another way of viewing this enhancement is as an adaption of the flat combining technique [4].

Adding an arena was a simple addition that adds no cost when uncontended, and doubles the throughput when contended.

Cheers,
Ben

Dmitry Vyukov

unread,
Apr 10, 2015, 3:28:17 AM4/10/15
to lock...@googlegroups.com
On Fri, Apr 10, 2015 at 7:51 AM, Benjamin Manes <ben....@gmail.com> wrote:
I thought I'd share an enhancement I made in my Java port of the "Non-intrusive MPSC node-based queue" [1].

I added a combining arena as a backoff strategy [2] based on the idea from an elimination stack [3] which cancels opposing operations when contention is detected. In the case of a queue, the arena is used by producers to transfer work as Dmitriy's algorithm can insert a sublist as easily as a single element. Another way of viewing this enhancement is as an adaption of the flat combining technique [4].

Adding an arena was a simple addition that adds no cost when uncontended, and doubles the throughput when contended.


Hi Benjamin,

This looks interesting. But where is the code? Or can you describe the details of the arena combining algorithm?
 

Benjamin Manes

unread,
Apr 10, 2015, 6:36:40 AM4/10/15
to lock...@googlegroups.com
The code in the linked github repo, more specifically at:

The queue can be constructed where either the producer waits until its element(s) are added or completes if handed off. I saw a few obvious ways to implement the busy waiting and chose the simplest because I think most usages of an mpsc queue will keep it small. If that assumption is wrong, the per-node field could be removed at the cost of adding complexity to the arena coordination logic.

I plan on reworking my EliminationStack to add combining. I don't particularly like the approach taken by the DECS paper (http://www.cs.bgu.ac.il/~hendlerd/papers/DECS.pdf). I think that I have a simpler scheme that will work better, and I'll try to implement that soon.

Dmitry Vyukov

unread,
Apr 10, 2015, 7:29:43 AM4/10/15
to lock...@googlegroups.com
On Fri, Apr 10, 2015 at 1:36 PM, Benjamin Manes <ben....@gmail.com> wrote:
> The code in the linked github repo, more specifically at:
> https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/SingleConsumerQueue.java

Nice!

The non-linearizable version probably does not implement the Queue
interface, because it is not a drop-in replacement for other
linearizable queues. Just a note.
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "Scalable Synchronization Algorithms" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to lock-free+...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/lock-free/27512a81-9c29-430c-b6cf-a534a36a908f%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.



--
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net

Benjamin Manes

unread,
Apr 10, 2015, 7:46:53 AM4/10/15
to lock...@googlegroups.com
True, but since its specialized and warns about being less general purpose I think its an okay violation of the interface. Opt'ing in would be acknowledging the assumptions and made clear through a factory method.

I also made a slight change compared to the original algorithm. The producer will always emit a volatile store after it has appended to the queue. The original uses a lazy store, which other ports retain. Unfortunately in micro-benchmarks this results in heap explosion due to the lack of visibility disallowing the consumer to drain the queue. That's not a concern in applications, of course, but I decided to err on the safe side. The arena and other application behavior would probably make the lazy set optimization negligible.

Ben Manes

unread,
Apr 18, 2015, 11:08:55 AM4/18/15
to lock...@googlegroups.com
fyi,

I implemented a concurrent stack with elimination and combining backoff arena. The approach taken is much less complex than the one described in the DECS paper, and also simpler than my original elimination stack. It appears to perform quite well and the code is linked below. I think this is a pretty nice technique and wish it was more popular in concurrency libraries.



Reply all
Reply to author
Forward
0 new messages