Simple and efficient bounded MPMC queue

Daniel Eloff

unread,

Jan 23, 2012, 3:24:47 PM1/23/12

to lock...@googlegroups.com

So I'm looking at Dmitriy's bounded MPMC queue. It is indeed as simple as it gets when it comes to a lock-free algorithm, the acid test being that it's simple enough that even I can understand it.

I noticed that if the CAS on enqueue_pos fails, the loop repeats, and re-checks the same cell it was just denied due to contention. If the other thread did not update sequence it can try to CAS the enqueue_pos again which will always fail. If the other did update sequence then it drops into the else clause which updates pos from enqueue_pos.

It will work, but why have that extra loop iteration under contention? Couldn't you just write it as:

for (;;)

{

size_t pos = enqueue_pos_.load(std::memory_order_relaxed);

cell = &buffer_[pos & buffer_mask_];

size_t seq = cell->sequence_.load(std::memory_order_acquire);

intptr_t dif = (intptr_t)seq - (intptr_t)pos;

if (dif == 0)

{

if (enqueue_pos_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed))

break;

}

else if (dif < 0)

return false;

}

Which should always update pos after every CAS failure?

I may be missing something critical to my understanding of the algorithm.

Cheers,

Dan

Dmitriy V'jukov

unread,

Jan 23, 2012, 11:48:38 PM1/23/12

to lock...@googlegroups.com

C1x/C++11 compare_exchange does update pos (comperand) on failure.
Moreover, most likely it does it in a more efficient way than a naive
reload:
https://groups.google.com/d/msg/golang-nuts/zCk41s_RL3A/1AgR0e9gX2kJ

--
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net

Daniel Eloff

unread,

Jan 24, 2012, 3:24:13 PM1/24/12

to lock...@googlegroups.com

Ah, I understand. Unfortunately something difficult for me when reading c++ is mutable reference parameters - it's impossible to tell that they are actually out params without knowing the function signature. I like how the Google style guide makes out parameters pointers to make it more obvious to the reader.

Good job on this queue btw, I'm planning to use it in a work-stealing scheduler (which normally would use a faster SPMC queue) However, I want to allow for multiple producers so I can experiment with NUMA-aware scheduling. Schedule work on one of the threads on the processor closest to the data, preferentially steal work from other threads on the same processor and only when failing that steal from remote processors.

At least I think that should yield better results under heavy load than a randomized clik/tbb style algorithm.

Cheers,

Dan

Dmitriy V'jukov

unread,

May 14, 2012, 6:43:42 AM5/14/12

to lock...@googlegroups.com

That's a difficult question. With some help from end developer, I think, NUMA-aware scheduling will be faster/scalable.
Take a look at QuickThread library:
http://www.quickthreadprogramming.com/
You may find the author on Intel Software Network.

Reply all

Reply to author

Forward