Don't understand why this is a data race?

87 views
Skip to first unread message

Joe Best-Rotheray

unread,
Apr 2, 2016, 4:02:47 AM4/2/16
to Relacy Race Detector
Hi there,

I got this output when testing my bounded lock free MPMC queue:


As seen here:

  1. [81] 3: acquire fence, in LockFreeMPMCQueue<unsigned int,32,64>::try_dequeue, lockfreempmcqueue.h(71)
  2. [82] 3: <010419C0> load, value=01041BD4, in LockFreeMPMCQueue<unsigned int,32,64>::try_dequeue, lockfreempmcqueue.h(72)
  3. [83] 3: <01041BF4> load, value=0, in LockFreeMPMCQueue<unsigned int,32,64>::try_dequeue, lockfreempmcqueue.h(72)
  4. [84] 3: DATA RACE (data race detected), in LockFreeMPMCQueue<unsigned int,32,64>::try_dequeue, lockfreempmcqueue.h(72)
The data race seems to be about <01041BF4>, but earlier on when it is written:

[43] 0: <01041BF4> store, value=0, in LockFreeMPMCQueue<unsigned int,32,64>::try_enqueue, lockfreempmcqueue.h(37)

So the value being loaded is correct, I can't see why this is a race?

Any help would be much appreciated!

Cheers
Joe

Dmitry Vyukov

unread,
Apr 2, 2016, 4:09:15 AM4/2/16
to rel...@googlegroups.com
Hi Joe,

The store was done by thread 0, and the racing load is done by thread
3. These memory accesses are not synchronized this causes data race.
The fact that the load happened to produce the right value is
irrelevant, it could as well produce a wrong value and in general any
data race renders behavior of the program as undefined. You need to
add some synchronization between the store and the load to ensure that
the load always yields the correct value.


--
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net

Joe Best-Rotheray

unread,
Apr 2, 2016, 9:29:03 AM4/2/16
to Relacy Race Detector
Doesn't this release fence synchronise with the acquire fence?

[55] 0: release fence, in LockFreeMPMCQueue<unsigned int,32,64>::try_enqueue, lockfreempmcqueue.h(47)

Here's the actual code I'm testing http://pastebin.com/AXwa4zE1

Dmitry Vyukov

unread,
Apr 2, 2016, 10:51:46 AM4/2/16
to rel...@googlegroups.com
Release/acquire fences take effect only if you do an atomic store
before the release store and an atomic load (which loads the value
stored by the store) after the acquire fence. That is, the following:

// thread 1
data = 1;
ready.store(true, memory_order_release);

// thread 2
if (ready.load(memory_order_acquire))
assert(data == 1);

is equivalent to:

data = 1;
atomic_thread_fence(memory_order_release);
ready.store(true, memory_order_relaxed);

// thread 2
if (ready.load(memory_order_relaxed)) {
atomic_thread_fence(memory_order_acquire);
assert(data == 1);
}

The fences do not provide any synchronization on their own.
> --
> You received this message because you are subscribed to the Google Groups
> "Relacy Race Detector" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to relacy+un...@googlegroups.com.
> To post to this group, send email to rel...@googlegroups.com.
> Visit this group at https://groups.google.com/group/relacy.
> For more options, visit https://groups.google.com/d/optout.

Joe Best-Rotheray

unread,
Apr 2, 2016, 11:22:55 AM4/2/16
to Relacy Race Detector
So are you saying that:

data = 1; 
atomic_thread_fence(memory_order_release); 
ready.store(true, memory_order_relaxed); 

// thread 2 
if (ready.load(memory_order_relaxed)) { 
  atomic_thread_fence(memory_order_acquire); 
  assert(data == 1); 

contains a data race?

Dmitry Vyukov

unread,
Apr 2, 2016, 12:02:48 PM4/2/16
to rel...@googlegroups.com
This is not a data race.

What's in your code serves the purpose of the 'ready' variable?
From the log, the 'data' variable is 01041BF4.

Joe Best-Rotheray

unread,
Apr 2, 2016, 12:21:17 PM4/2/16
to Relacy Race Detector
"data = 1"
[43] 0: <01041BF4> store, value=0, in LockFreeMPMCQueue<unsigned int,32,64>::try_enqueue, lockfreempmcqueue.h(37)
release fence
[55] 0: release fence, in LockFreeMPMCQueue<unsigned int,32,64>::try_enqueue, lockfreempmcqueue.h(47)
ready = true
[57] 0: <01041B40> atomic store, value=3, (prev value=2), order=relaxed, in LockFreeMPMCQueue<unsigned int,32,64>::try_enqueue, lockfreempmcqueue.h(48)

if( ready )
[77] 3: <01041B40> atomic load, value=4, order=relaxed, in LockFreeMPMCQueue<unsigned int,32,64>::try_dequeue, lockfreempmcqueue.h(55)
acquire fence
[81] 3: acquire fence, in LockFreeMPMCQueue<unsigned int,32,64>::try_dequeue, lockfreempmcqueue.h(71)
"assert data == 1"
[83] 3: <01041BF4> load, value=0, in LockFreeMPMCQueue<unsigned int,32,64>::try_dequeue, lockfreempmcqueue.h(72)

Dmitry Vyukov

unread,
Apr 5, 2016, 2:42:17 AM4/5/16
to rel...@googlegroups.com
Atomic store does not synchronize with atomic load in this case.

See C++ 1.10/8:

Certain library calls synchronize with other library calls performed
by another thread. For example, an atomic store-release synchronizes
with a load-acquire that takes its value from the store (29.3). [
Note: Except in the specified cases, reading a later value does not
necessarily ensure visibility as described below. Such a requirement
would sometimes interfere with efficient implementation. — end note ]

The problem is that store stores 3, but load loads 4.

To fix this you need to ensure transitive synchronization. Thread that
stores 3 synchronizes with thread that stores 4, which in turn
synchronizes with load of 4. As the result store of 3 will be
synchronized with load of 4.

Something along the lines of using memory_order_acquire here:
while( m_tail_2($).load( rl::mo_relaxed ) != tail )

Joe Best-Rotheray

unread,
Apr 5, 2016, 4:10:16 AM4/5/16
to Relacy Race Detector
Wow, just when I think I'm figuring out this concurrency stuff O.o

I googled "reading a later value does not necessarily ensure visibility" and found that listing 5.11 of C++ Concurrency In Action covers this.

Thanks for all the help Dmitry :)

Dmitry Vyukov

unread,
Apr 5, 2016, 4:15:59 AM4/5/16
to rel...@googlegroups.com
You are welcome.

Note there is a caveat to this caveat:

1.10/7
A release sequence headed by a release operation A on an atomic object
M is a maximal contiguous subsequence of side effects in the
modification order of M, where the first operation is A, and every
subsequent operation
— is performed by the same thread that performed A, or
— is an atomic read-modify-write operation.

And reading a value stored by a side effect of any operation in the
/release sequence/ does provide synchronization.

Joe Best-Rotheray

unread,
Apr 5, 2016, 5:34:12 AM4/5/16
to Relacy Race Detector
So I could potentially solve this by writing to m_tail_2 with a store-release, reading with load-acquire, and only doing this with a RMW operation. E.g.

replace

    1.         while( m_tail_2($).load( rl::mo_relaxed ) != tail )
    1.         {
    2.             rl::yield(1, $);
    3.         }
    4.  
    5.         rl::atomic_thread_fence( rl::mo_release, $ );
    6.         m_tail_2($).store( tail + 1, rl::mo_relaxed );

    with something like:

    expected = tail;
    while( !m_tail_2.compare_exchange_strong( expected, tail + 1, rl::mo_release, rl::mo_acquire ) )
    {
       rl::yield(1, $);
       expected = tail;
    }

    I decided to exclusively use standalone fences for synchronisation as I find them easier to understand than store-release and load-acquire, but it looks like that was a bad idea! :)

    Joe Best-Rotheray

    unread,
    Apr 5, 2016, 8:29:00 AM4/5/16
    to Relacy Race Detector
    Having looked at this some more, it seems the problem is that m_tail_2 is loaded in try_dequeue, but its value is used to determine if there are values to be read or not, it is not modified from try_dequeue, so I can't make sure that all operations on m_tail_2 are RMW operations. I think perhaps I should go back to the drawing board with this one, and have a per-item "ready" variable, something like this:

    class Queue
    {
    struct Cell
    {
       T data;
       std::atomic<bool> ready;
    };
    Cell* m_data;
    // etc

    Dmitry Vyukov

    unread,
    Apr 10, 2016, 4:15:03 AM4/10/16
    to rel...@googlegroups.com
    On Tue, Apr 5, 2016 at 2:29 PM, Joe Best-Rotheray <spect...@gmail.com> wrote:
    > Having looked at this some more, it seems the problem is that m_tail_2 is
    > loaded in try_dequeue, but its value is used to determine if there are
    > values to be read or not, it is not modified from try_dequeue, so I can't
    > make sure that all operations on m_tail_2 are RMW operations. I think
    > perhaps I should go back to the drawing board with this one, and have a
    > per-item "ready" variable, something like this:
    >
    > class Queue
    > {
    > struct Cell
    > {
    > T data;
    > std::atomic<bool> ready;
    > };
    > Cell* m_data;
    > // etc
    > }


    Check out this queue:
    http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue

    Joe Best-Rotheray

    unread,
    Apr 11, 2016, 8:28:00 AM4/11/16
    to Relacy Race Detector
    I actually ended up with a very similar implementation to this. Obviously just having a bool didn't work, and like you I used an int which increased on every read and write.
    Reply all
    Reply to author
    Forward
    0 new messages