In the latest draft standard, I tried to look up the behavior of atomic read-modify-write operations with specific memory_order settings. In particular, I am focusing on the fetch_add and fetch_sub operations.
In the standard (Draft N4606 2016-07-12) I can find:
1.10.1: atomic read-modify-write operations, which have special characteristics
I could not find the details of the 'special characteristics' in that chapter; however, it is stated that:
29.3: Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
29.6.5: (Regarding fetch_key)
Effects: Atomically replaces the value pointed to by object or by this with the result of the computation applied to the value pointed to by object or by this and the given operand. Memory is affected according to the value of order. These operations are atomic read-modify-write operations (1.10).
Returns: Atomically, the value pointed to by object or by this immediately before the effects.
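To make the 29.3 guarantee concrete, here is a minimal sketch of what it gives you on its own: even with memory_order_relaxed, concurrent fetch_add calls never lose an update, because each one reads the latest value in the modification order. The helper name relaxed_increment_total and its parameters are hypothetical, chosen only for this illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each perform `per_thread` relaxed
// increments on one shared counter; returns the final counter value.
long relaxed_increment_total(int threads, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Relaxed is still an atomic read-modify-write: no increment
                // is lost. It does not order any other memory accesses.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();  // thread completion synchronizes with join()
    }
    return counter.load(std::memory_order_relaxed);
}
```

Calling relaxed_increment_total(4, 10000) should return 40000. Note that this says nothing about the visibility of other, non-atomic data touched by those threads.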
In Herb Sutter's lecture 'atomic<> Weapons - The C++ Memory Model and Modern Hardware', a reference counting / shared_ptr example was given:
Thread1 - Increment (inside, say smart_ptr copy ctor):
control_block_ptr = other->control_block_ptr;
control_block_ptr->refs.fetch_add(1, memory_order_relaxed);
Thread 2 - Decrement (inside, say, smart_ptr dtor):
if (control_block_ptr->refs.fetch_sub(1, memory_order_acq_rel) == 0) {
    delete control_block_ptr;
}
Note: The typo (==0 should be ==1) is not the point of discussion here.
Another example is the Boost 1.62 reference counting example (http://www.boost.org/doc/libs/1_62_0_b2/doc/html/atomic/usage_examples.html), which shows this decrement code:
void intrusive_ptr_release(const X * x)
{
    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
        boost::atomic_thread_fence(boost::memory_order_acquire);
        delete x;
    }
}
I am wondering why memory_order_relaxed is not used in the decrement examples. 29.3 guarantees that an atomic read-modify-write operation shall read the last value. Since only atomic read-modify-write operations are used in these examples, all uses of refcount are guaranteed to read the last value in the modification order.
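For reference, here is a minimal sketch of what the release decrement plus acquire fence adds beyond 29.3, under my own assumptions: the Node type, its payload field, and the function last_owner_sees_payload are hypothetical and not taken from the lecture or from Boost. 29.3 covers only the counter itself. The release/acquire pair is what makes one owner's earlier plain writes to the object happen before the last owner's read and delete. With memory_order_relaxed on the decrement, the payload access below would be a data race.

```cpp
#include <atomic>
#include <thread>

// Hypothetical refcounted object with non-atomic data.
struct Node {
    std::atomic<int> refs{2};  // two owners, one per thread below
    int payload = 0;           // plain data, written by one owner
};

// Returns the payload value observed by whichever thread drops the last reference.
int last_owner_sees_payload() {
    Node* node = new Node;
    int seen = -1;  // written only by the last owner, read after both joins

    auto release = [&seen](Node* n) {
        // Release: this thread's earlier writes to *n are published
        // together with the decrement.
        if (n->refs.fetch_sub(1, std::memory_order_release) == 1) {
            // Acquire fence: the last owner now sees every write published
            // by the other owners' release decrements.
            std::atomic_thread_fence(std::memory_order_acquire);
            seen = n->payload;
            delete n;
        }
    };

    std::thread a([&] { node->payload = 42; release(node); });
    std::thread b([&] { release(node); });
    a.join();
    b.join();
    return seen;
}
```

Whichever thread turns out to be the last owner, the function should return 42: either thread a reads its own write, or thread b's fence synchronizes with thread a's release decrement.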
--
| From: ronaldho...@gmail.com Sent: Wednesday, October 12, 2016 4:17 AM To: ISO C++ Standard - Discussion Reply To: std-dis...@isocpp.org Subject: [std-discussion] What is the behavior of memory_order_relaxed with atomic read-modify-write operations? |
The above reordering is not possible because your myInt.load and cout
part cannot be reordered past the release-ordered fetch_sub.
Nothing that precedes a release can be reordered after it, and nothing that follows an acquire can be reordered before it. Reordering is possible in the other direction.
"Release" is like releasing a mutex, and "acquire" is like acquiring one. So if you have
    acquire mutex;
    r1 = x;
    release mutex;
in thread 1, and
    acquire mutex;
    x = 5;
    release mutex;
in thread 2, it's not allowed for the r1 = x to be reordered after the
release and for x = 5 to be reordered before the acquire. If it were
allowed, mutexes would be useless.
If that is true, this example does not demonstrate it. What observable behavior changes when x=5 (store) is reordered before the acquire in T2 and r1=x (load) is reordered after the release in T1?
r1 = x in thread 1 is sequenced before "release mutex". If thread 1 takes
the mutex first, "release mutex" in thread 1 synchronizes-with "acquire
mutex" in thread 2. "acquire mutex" in thread 2 is sequenced before
"x = 5". Therefore, r1 = x happens before x = 5.
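The same happens-before chain can be sketched without a mutex. This is only an illustration under my own assumptions: a release store and an acquire load on a hypothetical flag named released stand in for "release mutex" and "acquire mutex", which also forces thread 1 to go first. The function name read_before_write is likewise made up.

```cpp
#include <atomic>
#include <thread>

// Returns the value thread 1 read from x.
int read_before_write() {
    int x = 0;    // plain, non-atomic variable
    int r1 = -1;
    std::atomic<bool> released{false};

    std::thread t1([&] {
        r1 = x;  // sequenced before the release store
        released.store(true, std::memory_order_release);  // stands in for "release mutex"
    });
    std::thread t2([&] {
        // Stands in for "acquire mutex": wait until thread 1 has released.
        while (!released.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        x = 5;  // sequenced after the acquire load
    });

    t1.join();
    t2.join();
    return r1;
}
```

Because r1 = x happens before x = 5, the function should return 0, and the two plain accesses to x do not race. If either ordering is weakened to memory_order_relaxed, that chain is gone and the accesses to x become a data race.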