Re: Question about your MPMC code

Skip to first unread message

Dmitry Vyukov

Nov 10, 2018, 8:24:41 PM11/10/18
On Sat, Nov 10, 2018 at 4:37 PM Guang Han <> wrote:
> Hi Dmitry,
> I came across your MPMC code on 1024 cores website. It is very well written with awesome performance. It has the best performance than any other MPMC queue i can find online. I have one quick question I hope you can help me with. Thanks!!
> Basically in the enqueue method, enqueue_pos_ is loaded again using relaxed order when diff is larger than 0, which indicates that its load at the beginning of the enqueue method (pos = enqueue_pos_.load(relaxed)) does not obtain the latest value of enqueue_pos_. How to ensure that the load inside the for loop will indeed return the latest value, what if it always returns the same value as the load at the beginning of the enqueue method? Then the thread will keep looping until the end of its scheduling quota?
> Thanks!
> Guang
> bool enqueue(T const& data)
> {
> cell_t* cell;
> size_t pos = enqueue_pos_.load(std::memory_order_relaxed);
> for (;;)
> {
> cell = &buffer_[pos & buffer_mask_];
> size_t seq = cell->sequence_.load(std::memory_order_acquire);
> intptr_t dif = (intptr_t)seq - (intptr_t)pos;
> if (dif == 0)
> {
> if (enqueue_pos_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed))
> break;
> }
> else if (dif < 0)
> return false;
> else
> pos = enqueue_pos_.load(std::memory_order_relaxed);
> }
> cell->data_ = data;
> cell-> + 1, std::memory_order_release);
> return true;
> }

Hi Guang,

C/C++ standards guarantee that stores become visible to loads eventually:

C++, 1.10/25:
An implementation should ensure that the last value (in modification
order) assigned by an atomic or synchronization operation will become
visible to all other threads in a finite period of time.

In practice this is provided by cache coherence. All modern processors
are cache coherent, so a load can't observe stale data at all.

Dmitry Vyukov

Nov 10, 2018, 9:31:58 PM11/10/18
On Sat, Nov 10, 2018 at 6:16 PM Guang Han <> wrote:
> Thanks for your quick response, Dmitry! Do you mean that relaxed load can't observe stale data at all? I think sometimes it might observe stale data because it is a relaxed load and it may not get the latest value written by other threads(cores) right away.

Barriers/fences don't affect this in any way. If data has not yet
propagated no load will observe it, if it has propagated any load will
observe it. Barriers/fences affect only mutual ordering of memory
accesses, see:

> But i do agree with you that in practice I never see the following happens.
> "else
> pos = enqueue_pos_.load(std::memory_order_relaxed);"
> When you say "eventually" or "finite period of time", normally how long it will take for typical cache coherence algorithms? A couple of microseconds?

I think it's more like tens of nanoseconds.
And from some point of view it's actually immediate. Because it a
distributed system with finite speed of data propagation, event occurs
when you observe it by definition, there is no other way to define it
(think of relativity theory and speed of light). If you observe it, it
happened. If you don't see it, it did not happen yet.
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
Reply all
Reply to author
0 new messages