Bounded multiple producers multiple consumers queue for .NET

217 views
Skip to first unread message

Alexandr Nikitin

unread,
Sep 29, 2016, 6:55:26 AM9/29/16
to mechanical-sympathy
Hi all,

I've tried to port the the MPMC queue algorithm by Dmitry Vyukov to .NET. I would appreciate hearing your opinion on it.

Mendel Monteiro-Beckerman

unread,
Oct 5, 2016, 12:43:54 PM10/5/16
to mechanical-sympathy
You could type the queue and the Cell struct so that the user can transfer structs without boxing.  As ConcurrentQueue is also typed it would lead to a better comparison.

Also, in the dequeue you can cheat a bit and not call the Cell constructor by setting the fields of the Cell directly in the array (like you do in the enqueue)

Dan Eloff

unread,
Oct 5, 2016, 7:26:44 PM10/5/16
to mechanica...@googlegroups.com
One of the beautiful things about Dmitry's queue is you can easily make a MPSC SPMC or SPSC queue out of it with minor code changes. Instead of doing a compare exchange on enqueue/dequeue positions, you can just use regular loads and stores on the single producer/consumer side. This is possible because enqueue doesn't touch dequeue pos and visa versa.  Even although it isn't actually an SPSC queue design, it performs very competitively in that role. So you get 4 good bounded queue implementations for the price of one. With a little creativity you can even re-use the same class, parameterized with the enqueue/dequeue position implementation. Just careful not to use polymorphic code to accomplish that, unless you can establish that the compiler is able to compile it away. That's long been my Swiss Army Knife queue solution.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Avi Kivity

unread,
Oct 6, 2016, 3:00:51 AM10/6/16
to mechanica...@googlegroups.com

It's very interesting, but doesn't it suffer from blocking?  If producer P1 gets scheduled out between its two atomic writes, then all consumers will not be able to dequeue anything, even though other producers were successfully able to queue their data.  Maybe it's not an issue for throughput-oriented workloads.

To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.

Alexandr Nikitin

unread,
Oct 6, 2016, 3:54:07 AM10/6/16
to mechanica...@googlegroups.com
The initial intention was to implement the MPMC queue for reference types. I'm still playing with generic type version and value types version (mostly with struct sizes). 
Regarding the Cell constructor and cheating, I do that on purpose:
* in enqueue I access the array two times so that the write won't be reordered and dequeue won't see the sequence before the element.
* In dequeue the order doesn't matter, and basically the Cell constructor is translated to two movs

That part looks like this:
00007ffe`162b0b67 498bc8          mov     rcx,r8 ;read to result var
00007ffe`162b0b6a e891325f5f      call    clr!JIT_CheckedWriteBarrier (00007ffe`758a3e00)
00007ffe`162b0b6f 8d443701        lea     eax,[rdi+rsi+1] ;sequence = pos + bufferMask + 1
00007ffe`162b0b73 33d2            xor     edx,edx ;null
00007ffe`162b0b75 8903            mov     dword ptr [rbx],eax ;write sequence
00007ffe`162b0b77 48895308        mov     qword ptr [rbx+8],rdx ;write null

You can take a look at the assembly code here: https://github.com/alexandrnikitin/MPMCQueue.NET#assembly

Alexandr Nikitin

unread,
Oct 6, 2016, 4:09:19 AM10/6/16
to mechanical-sympathy
Yes, technically it's a blocking queue. If a producer managed to increment the enqueue position but didn't increment the sequence then all consumers will be blocked till the increment happens.
Any reliable way to measure nanoseconds latency? :)

Avi Kivity

unread,
Oct 6, 2016, 8:21:06 AM10/6/16
to mechanica...@googlegroups.com

Do you mean that the latency is bounded by a few nanoseconds?  That only happens if the producer thread is never preempted.

To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.

Alexandr Nikitin

unread,
Oct 6, 2016, 8:34:17 AM10/6/16
to mechanical-sympathy
Yes, correct, the benchmark is synthetic. It takes ~50ns to enqueue and then dequeue an item on full throttle. Here's the benchmark results of the (worst) case: empty queue, always polling consumers, always pushing producers. https://github.com/alexandrnikitin/MPMCQueue.NET#benchmarks
I just didn't come up with a reliable benchmark for latency yet. It would be great to have it.
Reply all
Reply to author
Forward
0 new messages