Question about relaxed memory ordering

mvor...@gmail.com

Feb 23, 2019, 3:44:40 PM

Is it possible for this simple program to print false, given that the compiler or CPU can reorder around the relaxed atomic? Or is my understanding off?

#include <atomic>
#include <iostream>
#include <iomanip>
#include <thread>
using namespace std;

int main(int argc, char** argv)
{
    atomic_bool flag{false};
    bool data{false};

    thread t1([&]() {
        data = true;
        flag.store(true, memory_order_relaxed);
    });

    thread t2([&]() {
        while(flag.load(memory_order_relaxed) == false);
        cout << "data = " << boolalpha << data << endl;
    });

    t1.join();
    t2.join();

    return 0;
}

Chris Vine

Feb 23, 2019, 5:09:28 PM

On Sat, 23 Feb 2019 12:44:33 -0800 (PST)
mvor...@gmail.com wrote

Yes it is possible, and because of that technically you also have
undefined behaviour as regards the read and write of 'data', which is
not an atomic type.

Chris M. Thomasson

Feb 23, 2019, 5:22:14 PM

Big time. This is a massive data race on 'bool data'.

mvor...@gmail.com

Feb 23, 2019, 6:10:07 PM

The data race goes away if I change it to store(release) and load(acquire), or SC on both, correct?

Chris M. Thomasson

Feb 23, 2019, 6:20:17 PM

Correct. :^)
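
For reference, here is a minimal sketch of the release/acquire variant being confirmed above. The helper name publish_and_read and the local variable seen are made up for illustration and do not appear in the thread; everything else is plain standard library.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the thread): returns the value of `data`
// that the reader observes after it has seen the flag set.
bool publish_and_read()
{
    std::atomic<bool> flag{false};
    bool data{false};   // plain, non-atomic payload
    bool seen{false};   // written by t2 only, read after join()

    std::thread t1([&]() {
        data = true;                                  // A: write the payload
        flag.store(true, std::memory_order_release);  // B: publish it
    });

    std::thread t2([&]() {
        // Once this acquire load reads the value stored at B it
        // synchronizes with B, so A happens-before the read below.
        while (!flag.load(std::memory_order_acquire)) { }
        seen = data;   // ordered after A, so this is not a data race
    });

    t1.join();
    t2.join();
    return seen;
}
```

With this ordering the reader can only observe data == true, so a caller such as `bool ok = publish_and_read();` should always get true. Using the default memory_order_seq_cst on both operations would give the same guarantee for this pattern.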

Chris Vine

Feb 23, 2019, 7:23:53 PM

Yes, but it is best to avoid using an acquire operation on the flag
itself if spinning, because on weakly ordered architectures that can
cause a barrier instruction to be emitted on each iteration of the
while loop. This would do what you want:

    std::atomic<bool> flag{false};
    bool data{false};

    std::thread t1([&]() {
        data = true;
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(true, std::memory_order_relaxed);
    });

    std::thread t2([&]() {
        while(!flag.load(std::memory_order_relaxed));
        std::atomic_thread_fence(std::memory_order_acquire);

        cout << "data = " << boolalpha << data << endl;
    });

    t1.join();
    t2.join();

    return 0;

Chris M. Thomasson

Feb 23, 2019, 11:25:22 PM

:^D

Using standalone fences is my habit of choice. You sort of intermingled
it here but nice nonetheless:

https://groups.google.com/d/topic/lock-free/A1nzcMBGRzU/discussion

Chris M. Thomasson

Feb 23, 2019, 11:39:19 PM

On 2/23/2019 8:25 PM, Chris M. Thomasson wrote:
> :^D
>
> Using standalone fences is my habit of choice. You sort of intermingled
> it here but nice nonetheless:
>
> https://groups.google.com/d/topic/lock-free/A1nzcMBGRzU/discussion
>
>

Fwiw, programming on the SPARC is basically what got me into this habit.
Wrt my comment about intermingling, I accidentally read your release
barrier as being part of the atomic itself, not standalone. Sorry for
any confusion, Chris. ;^o

Chris Vine

Feb 24, 2019, 6:05:52 AM

On Sat, 23 Feb 2019 20:39:09 -0800
"Chris M. Thomasson" <invalid_chris_t...@invalid.com>
wrote:

As I read the standard[1] I think this "intermingling" works:

    std::atomic<bool> flag{false};
    bool data{false};

    std::thread t1([&]() {
        data = true;
        flag.store(true, std::memory_order_release);
    });

    std::thread t2([&]() {
        while(!flag.load(std::memory_order_relaxed));
        std::atomic_thread_fence(std::memory_order_acquire);
        cout << "data = " << boolalpha << data << endl;
    });

    t1.join();
    t2.join();

    return 0;

However, like you I don't do that because it looks a bit odd to me.

[1] §29.8/4 of C++11 says "An atomic operation A that is a release
operation on an atomic object M synchronizes with an acquire fence B if
there exists some atomic operation X on M such that X is sequenced
before B and reads the value written by A or a value written by any
side effect in the release sequence headed by A."

Chris M. Thomasson

Feb 24, 2019, 5:01:54 PM

On 2/24/2019 3:05 AM, Chris Vine wrote:
> On Sat, 23 Feb 2019 20:39:09 -0800
> "Chris M. Thomasson" <invalid_chris_t...@invalid.com>
> wrote:
>
> As I read the standard[1] I think this "intermingling" works:

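Indeed it does. Afaict, for this pattern a release store on a
std::atomic<T> does the job of a release fence followed by a relaxed
store, although a standalone fence is technically the stronger of the
two since it is not tied to one atomic object.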

>     std::atomic<bool> flag{false};
>     bool data{false};
>
>     std::thread t1([&]() {
>         data = true;
>         flag.store(true, std::memory_order_release);
>     });
>
>     std::thread t2([&]() {
>         while(!flag.load(std::memory_order_relaxed));
>         std::atomic_thread_fence(std::memory_order_acquire);
>         cout << "data = " << boolalpha << data << endl;
>     });
>
>     t1.join();
>     t2.join();
>
>     return 0;
>
> However, like you I don't do that because it looks a bit odd to me.

Totally agreed. At least it is guaranteed to work in the std.

> [1] §29.8/4 of C++11 says "An atomic operation A that is a release
> operation on an atomic object M synchronizes with an acquire fence B if
> there exists some atomic operation X on M such that X is sequenced
> before B and reads the value written by A or a value written by any
> side effect in the release sequence headed by A."
>

Right. I read this as saying that a std::atomic operation with release
or acquire ordering can stand in for the matching
std::atomic_thread_fence. However, once we use anything other than
relaxed on a std::atomic operation, it can "hide" where the actual
barrier is executed, or placed in the code.

For instance, a std::atomic::exchange with acquire will, in effect,
execute the barrier _after_ the atomic RMW of the exchange occurs,
whereas an exchange with release will execute the barrier _before_ the
RMW occurs. Using a standalone fence allows the user to place it exactly
where it is needed. It is more complex, but can be more efficient...

Well, sometimes the standalone fence can be more flexible, and at least
a lot more "informative" wrt showing exactly where the barriers actually
need to be.
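
To make the placement point concrete, here is a rough sketch of a spinlock that uses only relaxed atomics plus standalone fences. The names fence_spinlock and count_with_lock are invented for this illustration and are not from the thread; treat it as one possible way to write it, not a tuned implementation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical spinlock, for illustration only.
class fence_spinlock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Relaxed RMW: no ordering is attached to the exchange itself,
        // so failed spin iterations carry no barrier.
        while (locked_.exchange(true, std::memory_order_relaxed)) { }
        // Acquire barrier, placed explicitly, once, after the lock is won.
        std::atomic_thread_fence(std::memory_order_acquire);
    }
    void unlock() {
        // Release barrier, placed explicitly before the store that
        // frees the lock.
        std::atomic_thread_fence(std::memory_order_release);
        locked_.store(false, std::memory_order_relaxed);
    }
};

// Two threads increment a plain int under the lock and the total is returned.
int count_with_lock(int per_thread)
{
    fence_spinlock lock;
    int counter = 0;   // non-atomic, protected by the lock
    auto work = [&]() {
        for (int i = 0; i < per_thread; ++i) {
            lock.lock();
            ++counter;
            lock.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

For example, `count_with_lock(10000)` should return 20000. Writing lock() as a single exchange with memory_order_acquire and unlock() as a store with memory_order_release would also be correct; the fence version just spells out where the barriers sit relative to the RMW.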