What is the behavior of memory_order_relaxed with atomic read-modify-write operations?


ronaldho...@gmail.com

unread,
Oct 12, 2016, 4:17:26 AM10/12/16
to ISO C++ Standard - Discussion

In the latest draft standard, I tried to look up the behavior of atomic read-modify-write operations with specific memory_order settings. In particular I am focusing on the fetch_add and fetch_sub statements.


In the standard (Draft N4606 2016-07-12) I can find:


1.10.1: atomic read-modify-write operations, which have special characteristics

The details of the 'special characteristics' I could not find in the chapter, however it is stated that:


29.3: Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.


29.6.5: (Regarding fetch_key)

Effects: Atomically replaces the value pointed to by object or by this with the result of the computation applied to the value pointed to by object or by this and the given operand. Memory is affected according to the value of order. These operations are atomic read-modify-write operations (1.10).

Returns: Atomically, the value pointed to by object or by this immediately before the effects.


In the lecture of Herb Sutter 'atomic<> Weapons - The C++ Memory Model and Modern Hardware' a reference counting / shared_ptr example was given:


Thread 1 - Increment (inside, say, a smart_ptr copy ctor):

   control_block_ptr = other->control_block_ptr;
   control_block_ptr->refs.fetch_add(1, memory_order_relaxed);

Thread 2 - Decrement (inside, say, a smart_ptr dtor):

   if (control_block_ptr->refs.fetch_sub(1, memory_order_acq_rel) == 0) {
     delete control_block_ptr;
   }


Note: the known typo (the == 0 should be == 1) is not the topic of this discussion.


Another example is the boost 1_62 reference counting example (http://www.boost.org/doc/libs/1_62_0_b2/doc/html/atomic/usage_examples.html)

where this decrement code is shown:

void intrusive_ptr_release(const X * x)
  {
    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
      boost::atomic_thread_fence(boost::memory_order_acquire);
      delete x;
    }
  }



I am wondering why memory_order_relaxed is not used in the decrement examples. 29.3 guarantees that an atomic read-modify-write operation shall read the last value. Since only read-modify-write atomic operations are used in these examples, all uses of refcount are guaranteed to read the last value in the modification order.


Question 1: Why is the memory order release needed in the fetch_sub? There is no code that can be reordered incorrectly and 29.3 guarantees that the last value will always be read.

Question 2: Why is the acquire needed before the delete?
For the decrement a condition is used, so the delete of the atomic reference may never be moved earlier than the atomic read-modify-write operation by an optimization pass. Therefore the atomic read-modify-write will always happen before the deletion of the reference counter. Assuming no other code is present in the given samples, why would the acquire be needed? Is this not ensured by read-read coherence etc.?



Andrey Semashev

unread,
Oct 12, 2016, 4:58:09 AM10/12/16
to std-dis...@isocpp.org
On 10/12/16 11:17, ronaldho...@gmail.com wrote:
>
> Another example is the boost 1_62 reference counting example
> (http://www.boost.org/doc/libs/1_62_0_b2/doc/html/atomic/usage_examples.html)
>
> where this decrement code is shown:
>
> void intrusive_ptr_release(const X * x)
> {
> if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
> boost::atomic_thread_fence(boost::memory_order_acquire);
> delete x;
> }
> }
>
>
>
> I am wondering why memory_order_relaxed is not used in the
> decrement examples? 29.3 guarantees that the read-modify-write atomic
> operation shall read the last value. Since only read-modify-write atomic
> operations are used in these examples, all uses of refcount are
> guaranteed to read the last value in the modification order.
>
>
> Question 1: Why is the memory order release needed in the fetch_sub?
> There is no code that can be reordered incorrectly and 29.3 guarantees
> that the last value will always be read.

The release semantics ensure that other (non-atomic) memory accesses
preceding the decrement are not reordered after the decrement. For
instance, if you made some changes to the object, you want these changes
to become visible to other threads before you release your reference to
the object, so that if another thread releases its last reference and
calls the destructor, it has an up-to-date view of the object.

> Question 2: Why is the acquire needed before the delete?

This is to guarantee that the destructor's memory accesses are not
reordered before the reference counter decrement. Again, this makes sure
the destructor operates on an up-to-date view of the object.

> For the decrement a condition is used so the delete of the atomic
> reference may never be moved earlier that the atomic read-modify-write
> operation in an optimization process.

Subsequent memory operations can be reordered before a non-acquire
atomic operation, by both the compiler and the CPU. In other words, had
there been no acquire fence, some contents of the object could have been
loaded from memory speculatively before the atomic decrement.
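
An alternative to the release decrement plus standalone acquire fence shown in the Boost example is to fold both sides into a single acq_rel decrement. A minimal sketch (the ControlBlock type and release function here are illustrative, not from the thread):

```cpp
#include <atomic>

struct ControlBlock {
    std::atomic<int> refs{1};
    // ... payload would live here ...
};

// Hypothetical release function: acq_rel combines the release
// (publish this thread's writes before the decrement) and the
// acquire (observe every other thread's published writes before
// running the destructor).
void release(ControlBlock* cb) {
    if (cb->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete cb;  // last reference gone: safe to destroy
    }
}
```

The acq_rel form pays the acquire cost on every decrement, whereas the Boost release + fence idiom only pays it on the final one; which is faster depends on the target architecture.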

Faisal Vali

unread,
Oct 12, 2016, 7:17:08 AM10/12/16
to std-dis...@isocpp.org
Regardless of how the hardware creates the illusion (i.e.
flush/fence instructions operating at different levels of
architectural granularity), the memory model only allows us to reason
about whether some sequence of code S1 (which modifies a set of memory
locations {ML}) was executed by thread-of-execution T1 before some
other sequence of code S2 (i.e. about the state of {ML}, but *only*
from the point of view of S2's thread-of-execution T2) if a 'happens
before' relationship is established between S1 and S2 during some
execution of your system. The modification order of a memory location
does not establish a happens-before relationship between S1 and S2, so
you need to establish one using a store-release and a load-acquire
(yes, I know this is non-intuitive - I've struggled with it myself).
In this case S1 and S2 are lexically the same sequence of instructions,
but if they are executed by different threads of execution, then we
need S1 to have 'happened before' S2 for our system to work - hence
the store-release (during S1's execution by T1) and the load-acquire
(during S2's execution by T2).



ronaldho...@gmail.com

unread,
Oct 12, 2016, 8:05:57 AM10/12/16
to ISO C++ Standard - Discussion, fai...@gmail.com
Thank you for your responses.

With normal atomic load and store operations, I would fully agree. However, for atomic read-modify-write operations, this statement is made by the standard:
1.10.1: atomic read-modify-write operations, which have special characteristics
29.3: Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

This would imply that "happens before" is already guaranteed for these operations, regardless of the memory order used.

Faisalv

unread,
Oct 12, 2016, 8:24:38 AM10/12/16
to std-dis...@isocpp.org
You're welcome to pursue this line of reasoning, but for what it's worth, I've asked SG1 a similar question and the answer is no - modification order does not imply 'happens before'.


Anthony Williams

unread,
Oct 12, 2016, 8:52:19 AM10/12/16
to std-dis...@isocpp.org
On 12/10/16 13:24, Faisalv wrote:
> You're welcome to pursue this line of reasoning But for what it's worth
> I've asked sg1 a similar question and the answer is no - modification
> order does not imply 'happens before'

Correct. The happens-before relationship is about what is guaranteed
with respect to *other* memory locations.

Thread 1: store to X, RMW on Y.
Thread 2: RMW on Y, load from X.

If thread 1's RMW on Y does not have (at least) "release" ordering
and/or thread 2's RMW on Y does not have (at least) "acquire" ordering,
then even if thread 2's RMW is later in the modification order than
thread 1's, the compiler/processor/cache is not required to ensure
visibility of the store to X from thread 1 to the load in thread 2.

However, relaxed RMW ops can form part of a "release sequence" --- a
store-release followed by a series of RMW ops from arbitrary threads,
followed by a load-acquire of the last value written, is still a
release-acquire pairing.
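
That release-sequence point can be sketched as follows (names are illustrative; the interesting case is when the acquire load observes the value written by the relaxed RMW):

```cpp
#include <atomic>

std::atomic<int> counter{0};
int payload = 0;  // non-atomic data published via the counter

void writer() {
    payload = 42;
    counter.store(1, std::memory_order_release);  // heads a release sequence
}

void middle() {
    // A relaxed RMW, possibly on another thread, continues the
    // release sequence headed by the store above.
    counter.fetch_add(1, std::memory_order_relaxed);
}

void reader() {
    // If this acquire load reads the value written by the relaxed RMW,
    // it still synchronizes with writer()'s store-release, so reading
    // payload afterwards is data-race free and must see 42.
    if (counter.load(std::memory_order_acquire) == 2) {
        int v = payload;
        (void)v;
    }
}
```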

Anthony
--
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++11 thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Andrey Semashev

unread,
Oct 12, 2016, 9:07:10 AM10/12/16
to std-dis...@isocpp.org
No, I don't think it implies that. See [intro.races]/4 for the
definition of the modification order.

I believe what the standard says here is that there is a modification
order for a given atomic object, and this order respects the "happens
before" rule. However, that order does not include other (atomic or not)
memory accesses. Also, there is no "happens before" relation between
relaxed operations on the same atomic object performed by different
threads because there is no "inter-thread happens before"
([intro.races]/9) relation between the two operations. As such these
operations are indeterminately ordered.

In the reference counting case, if two threads are decrementing the
counter concurrently from 2 to 0, there is no guarantee which thread
reaches 0. There is only the guarantee that 0 will be reached in the
end.
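
That "reached exactly once" guarantee holds even with fully relaxed ordering, because RMWs always read the latest value in the modification order. A small sketch that exercises it (all names hypothetical):

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> refs{0};
std::atomic<int> zero_observed{0};  // how many threads saw the 1 -> 0 step

void release_one() {
    // Relaxed is enough for the counting itself: exactly one thread's
    // fetch_sub reads 1 (and writes 0) in the modification order.
    if (refs.fetch_sub(1, std::memory_order_relaxed) == 1)
        zero_observed.fetch_add(1, std::memory_order_relaxed);
}

int run(int n) {
    refs.store(n);
    zero_observed.store(0);
    std::vector<std::thread> ts;
    for (int i = 0; i < n; ++i)
        ts.emplace_back(release_one);
    for (auto& t : ts)
        t.join();
    return zero_observed.load();  // 1, regardless of which thread won
}
```

Note this only demonstrates the counting property; as discussed below, relaxed ordering is still insufficient once the winning thread actually destroys the shared object.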

Tony V E

unread,
Oct 12, 2016, 9:52:11 AM10/12/16
to ISO C++ Standard - Discussion
"Assuming no other code is present..."

There is always other code. The acquire/release is for the other code that is touching other (related) memory, i.e. whatever the pointer is pointing to.



Tony V E

unread,
Oct 12, 2016, 9:52:14 AM10/12/16
to ISO C++ Standard - Discussion, fai...@gmail.com
It only implies that the atomic operations happen in some order; it does not order all memory.


ronaldho...@gmail.com

unread,
Oct 12, 2016, 10:17:31 AM10/12/16
to ISO C++ Standard - Discussion
Again, thank you all for replying. Please forgive my tenaciousness on this subject.

> In the reference counting case, if two threads are decrementing the counter concurrently from 2 to 0, there is no guarantee which thread reaches 0. There is only the guarantee that 0 will be reached in the end.

Is this not what you want? For shared_ptr, the destructor may run on any thread, so as long as the count reaches 0 and the destructor is called exactly once, it behaves correctly. Let's assume this decrement code:

void intrusive_ptr_release(const X * x)
  {
    if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
      delete x;
    }
  }

Can we assume that the == 1 will only evaluate to true once? I would feel yes, because RMW atomics are guaranteed to read the last value in the modification order.
Can we assume that delete will always be called after evaluation of the fetch_sub? I would feel yes, since it depends on the == 1 condition and therefore may not be reordered before the fetch_sub.

Would this imply that this code is correct? (With the only side effect that it is less determined on which thread the destructor of the shared_ptr would be called, but it would be called only once.)

If my reasoning is incorrect, I would appreciate a counter-example. What goes wrong and why?

Tony V E

unread,
Oct 12, 2016, 10:21:13 AM10/12/16
to ISO C++ Standard - Discussion
If the decrement is relaxed, the thread that went from 2 -> 1 could still be accessing x while the 1 -> 0 thread is deleting it.


Anthony Williams

unread,
Oct 12, 2016, 10:22:33 AM10/12/16
to std-dis...@isocpp.org
On 12/10/16 15:17, ronaldho...@gmail.com wrote:
> Again, thank you all for replying. Please forgive my tenaciousness on
> this subject.
>
>> In the reference counting case, if two threads are decrementing the
> counter concurrently from 2 to 0, there is no guarantee which thread
> reaches 0. There is only the guarantee that 0 will be reached in the end.
>
> Is this not what you want? For shared_ptr, the destructor will typically
> run on any thread, so as long as it reached 0 and the destructor is
> called it behaves ok. Let's assume this decrement code:
>
> void intrusive_ptr_release(const X * x)
> {
> if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
> delete x;
> }
> }
>
> Can we assume that the == 1 will only evaluate to true once? I would
> feel yes, cause RMW atomics are guaranteed to read the last value in the
> modification order
> Can we assume that delete will always be called after evaluation of the
> fetch_sub? I would feel yes, since it depends on the == 1 condition and
> therefore may not be reordered before the fetch_sub.

The answers are "yes" and "yes".

> Would this implies that this code is correct? (With the only side effect
> that it is less defined on which thread the destructor of the shared_ptr
> would be called, but it would only be called once)

No, your code is not correct; there is still a problem.

Suppose threads 1&2 both hold references to an X object, and the ref
count is 2.

Thread 1 modifies the X object, and then calls intrusive_ptr_release.
Thread 2 also calls intrusive_ptr_release.

Suppose thread 2's call to intrusive_ptr_release is after that from
thread 1 in the modification order of x->refcount_. Thread 2 is
therefore the thread that will see x->refcount_ drop to zero, and thus
call the destructor.

However, there is no synchronization edge between the modification to X
made in thread 1 and the destructor call in thread 2. Consequently there
is a data race, and undefined behaviour: thread 2 may destroy the object
before thread 1's write, so thread 1 is writing to a destroyed object.
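
A corrected variant of the relaxed snippet, using the release-decrement plus acquire-fence idiom from the Boost example quoted earlier (the X layout here is an assumption for illustration):

```cpp
#include <atomic>

struct X {
    mutable std::atomic<int> refcount_{1};
    int data = 0;
};

void intrusive_ptr_release(const X* x) {
    // release: publishes this thread's writes to *x before the decrement
    if (x->refcount_.fetch_sub(1, std::memory_order_release) == 1) {
        // acquire fence: makes every other thread's released writes
        // visible to this thread before the destructor runs
        std::atomic_thread_fence(std::memory_order_acquire);
        delete x;
    }
}
```

With this, thread 1's write to the object happens-before thread 2's destructor call in Anthony's scenario, closing the data race.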

Andrey Semashev

unread,
Oct 12, 2016, 10:32:12 AM10/12/16
to std-dis...@isocpp.org
On 10/12/16 17:17, ronaldho...@gmail.com wrote:
> Again, thank you all for replying. Please forgive my tenaciousness on
> this subject.
>
>> In the reference counting case, if two threads are decrementing the
> counter concurrently from 2 to 0, there is no guarantee which thread
> reaches 0. There is only the guarantee that 0 will be reached in the end.
>
> Is this not what you want?

Yes, that is enough guarantee to implement reference counting. But that
in turn means that each thread has to ensure its modifications of the
object are visible to other threads (because you don't know which thread
will call the destructor).

> Let's assume this decrement code:
>
> void intrusive_ptr_release(const X * x)
> {
> if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
> delete x;
> }
> }
>
> Can we assume that the == 1 will only evaluate to true once?

Yes.

> Can we assume that delete will always be called after evaluation of the
> fetch_sub?

No. As I said earlier, both the compiler and the CPU are allowed to
reorder (parts of) the destructor before the decrement.

For instance, the compiler might decide that the beginning of the
destructor could be sped up if its initial loads are performed
beforehand so that the relevant data is already in the cache by the time
fetch_sub completes.

Even if the compiler doesn't do that, the CPU can speculatively predict
that the branch is taken before the atomic decrement completes and begin
executing the destructor using stale data from cache.

The same can happen in the reverse direction - your stores to the object
contents just before you release the reference can be reordered after
the fetch_sub. In this case the destructor executed in another thread
will not see those modifications.

Anthony Williams

unread,
Oct 12, 2016, 10:45:44 AM10/12/16
to std-dis...@isocpp.org
On 12/10/16 15:32, Andrey Semashev wrote:
> On 10/12/16 17:17, ronaldho...@gmail.com wrote:
>> Again, thank you all for replying. Please forgive my tenaciousness on
>> this subject.
>>
>>> In the reference counting case, if two threads are decrementing the
>> counter concurrently from 2 to 0, there is no guarantee which thread
>> reaches 0. There is only the guarantee that 0 will be reached in the end.
>>
>> Is this not what you want?
>
> Yes, that is enough guarantee to implement reference counting. But that
> in turn means that each thread has to ensure its modifications of the
> object are visible to other threads (because you don't know which thread
> will call the destructor).
>
>> Let's assume this decrement code:
>>
>> void intrusive_ptr_release(const X * x)
>> {
>> if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
>> delete x;
>> }
>> }
>>
>> Can we assume that the == 1 will only evaluate to true once?
>
> Yes.
>
>> Can we assume that delete will always be called after evaluation of the
>> fetch_sub?
>
> No. As I said earlier, both the compiler and the CPU are allowed to
> reorder (parts of) the destructor before the decrement.

It's complicated. The fetch_sub is sequenced-before the delete on the
same thread, so from the point of view of that thread the delete must
occur after the fetch_sub.

However, from the POV of *other* threads, then there is no ordering
constraint, and the effects may become visible in either order.

Andrey Semashev

unread,
Oct 12, 2016, 10:49:59 AM10/12/16
to std-dis...@isocpp.org
Even in the same thread, the program is only required to behave "as if"
the delete is called after the fetch_sub. This allows reordering some
parts of the destructor with the fetch_sub in the actual generated code,
as long as the destructor does not observe the effects of that
reordering.

> However, from the POV of *other* threads, then there is no ordering
> constraint, and the effects may become visible in either order.

Absolutely.

Thiago Macieira

unread,
Oct 12, 2016, 11:00:01 AM10/12/16
to std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 15:22:30 CEST, Anthony Williams
wrote:
> However, there is no synchronization edge between the modification to X
> made in thread 1 and the destructor call in thread 2. Consequently there
> is a data race, and undefined behaviour: thread 2 may destroy the object
> before thread 1's write, so thread 1 is writing to a destroyed object.

Shouldn't this be called out as a bug in the class X itself?

If we're operating on a refcounted object from multiple threads and one of
those threads modifies the object, it should apply the appropriate
synchronisation mechanisms to ensure there are no data races. That is,
shouldn't X::modify() do a release and X::~X() do an acquire?

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

ronaldho...@gmail.com

unread,
Oct 12, 2016, 11:08:10 AM10/12/16
to ISO C++ Standard - Discussion
Ok, thanks for the clarification. I wrote a small code snippet to try to understand:

Assume 2 Threads with the refcount at 2:

Thread 1:
if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
  delete x;
}

Thread 2:
x->myInt = 4;
if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
  delete x;
}

The compiler / processor could reorder the code of Thread 2 to:

if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
  x->myInt = 4;
  delete x;
} else {
  x->myInt = 4;
}

Now if Thread 1 calls the destructor, Thread 2 could cause an invalid write.

So this means that the code must at least be:

Thread 1:
if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
  delete x;
}

Thread 2:
x->myInt = 4;
if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
  delete x;
}

Now the reordering above is not allowed, so that's good.

Still one point I don't understand: Why is the acquire needed? This still seems hard to grasp.

Andrey Semashev

unread,
Oct 12, 2016, 11:23:59 AM10/12/16
to std-dis...@isocpp.org
Assume for instance that x is a pointer to the following class X:

class X
{
public:
    int myInt;

    ~X()
    {
        std::cout << myInt << std::endl;
    }
};

Now your code with release semantics could be transformed to the following:

Thread 1:
int prefetchedMyInt = x->myInt;
if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
    std::cout << prefetchedMyInt << std::endl;
    ::operator delete (x); // free raw storage
}

Thread 2:
x->myInt = 4;
if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
    delete x;
}

Here, your assignment "x->myInt = 4" can be lost.

Andrey Semashev

unread,
Oct 12, 2016, 11:29:13 AM10/12/16
to std-dis...@isocpp.org
On 10/12/16 17:59, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 15:22:30 CEST, Anthony Williams
> wrote:
>> However, there is no synchronization edge between the modification to X
>> made in thread 1 and the destructor call in thread 2. Consequently there
>> is a data race, and undefined behaviour: thread 2 may destroy the object
>> before thread 1's write, so thread 1 is writing to a destroyed object.
>
> Shouldn't this be called out as a bug in the class X itself?
>
> If we're operating on a refcounted object from multiple threads and one of
> those threads modifies the object, it should apply the appropriate
> synchronisation mechanisms to ensure there are no data races. That is,
> shouldn't X::modify() do a release and X::~X() do an acquire?

The object X may not be designed to know that it is reference counted.
You typically don't expect to have to synchronize with other threads in
the destructor because it should only be called in one thread. And while
there should be synchronization protecting X::modify(), there typically
isn't one for a destructor.

Faisal Vali

unread,
Oct 12, 2016, 1:01:55 PM10/12/16
to std-dis...@isocpp.org
On Wed, Oct 12, 2016 at 7:52 AM, Anthony Williams <antho...@gmail.com> wrote:
> On 12/10/16 13:24, Faisalv wrote:
>> You're welcome to pursue this line of reasoning But for what it's worth
>> I've asked sg1 a similar question and the answer is no - modification
>> order does not imply 'happens before'
>
> Correct. The happens-before relationship is about what is guaranteed wrt
> to *other* memory locations.
>
> Thread 1: store to X, RMW on Y.
> Thread 2: RMW on Y, load from X.
>
> If thread 1's RMW on Y does not have (at least) "release" ordering
> and/or the thread 2's RMW on Y does not have (at least) "acquire"
> ordering then even if thread 2's RMW is later in modification order than
> thread 1's, then the compiler/processor/cache is not required to ensure
> visibility of the store to X from thread 1 to the load in thread 2.
>


> However, relaxed RMW ops can form part of a "release sequence" --- a
> store release followed by a series of RMW ops from random threads,
> followed by a load acquire of the last value written is still a
> release-acquire pairing.
>

Nice - I think your last paragraph might clarify for me why the
fetch_add can be relaxed and still sync up with any subsequent acquire
that might otherwise result in a delete of the pointer (and since the
assumption is that 'other' is valid and prevents the ref-count from
dropping to a deletable level, that sequence of instructions is safe?)


Thiago Macieira

unread,
Oct 12, 2016, 1:53:30 PM10/12/16
to std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 18:29:09 CEST, Andrey Semashev
wrote:
> > If we're operating on a refcounted object from multiple threads and one of
> > those threads modifies the object, it should apply the appropriate
> > synchronisation mechanisms to ensure there are no data races. That is,
> > shouldn't X::modify() do a release and X::~X() do an acquire?
>
> The object X may not be designed to know that it is reference counted.
> You typically don't expect to have to synchronize with other threads in
> the destructor because it should only be called in one thread. And while
> there should be synchronization protecting X::modify(), there typically
> isn't one for a destructor.

Fair enough. Then the thread synchronisation needs to be external to it. If
you're using a non-thread-safe object in multiple threads, then your code must
perform the synchronisation. In this case, what prevents two threads from
calling X::modify() at the same time? If you do have a lock, then the lock
performs the memory synchronisation.

If you don't have a lock, then the reference counting won't prevent it and you
should really not be calling modify() in the first place, from any thread.

The usual case of reference counting is that the object is immutable
while more than one reference is active. In that case, there's no need
to perform an acquire barrier, since no thread can have modified the
object since it was originally synchronised to the thread that will
delete it.

The release barrier may still be required, depending on what is done with the
destruction. It's not required for a regular delete, as new/delete/malloc/free
perform the synchronisation, but it may be needed if you're implementing your
own free list.

Thiago Macieira

unread,
Oct 12, 2016, 1:57:06 PM10/12/16
to std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 08:08:10 CEST,
ronaldho...@gmail.com wrote:
> Thread 1:
> if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
> delete x;
> }
>
> Thread 2:
> x->myInt = 4;
> if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
> delete x;
> }
>
> Now the reordering above is not allowed, so that's good.
>
> Still one point I don't understand: Why is the acquire needed? This still
> seems hard to grasp.

Because of that x->myInt = 4. If the int is written to from Thread 2, but
Thread 1 wants to delete, it needs to acquire all the modifications made from
Thread 2. Otherwise, the object x may be seen in an inconsistent state.

I still maintain that the problem is the x->myInt = 4 in the first place
without an extra synchronisation mechanism. The reference counting is not
sufficient to guarantee synchronisation of a shared object. If you want to
modify x, acquire a lock.

Andrey Semashev

unread,
Oct 12, 2016, 2:09:19 PM10/12/16
to std-dis...@isocpp.org
On 10/12/16 20:53, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 18:29:09 CEST, Andrey Semashev
> wrote:
>>> If we're operating on a refcounted object from multiple threads and one of
>>> those threads modifies the object, it should apply the appropriate
>>> synchronisation mechanisms to ensure there are no data races. That is,
>>> shouldn't X::modify() do a release and X::~X() do an acquire?
>>
>> The object X may not be designed to know that it is reference counted.
>> You typically don't expect to have to synchronize with other threads in
>> the destructor because it should only be called in one thread. And while
>> there should be synchronization protecting X::modify(), there typically
>> isn't one for a destructor.
>
> Fair enough. Then the thread synchronisation needs to be external to it. If
> you're using a non-thread-safe object in multiple threads, then your code must
> perform the synchronisation. In this case, what prevents two threads from
> calling X::modify() at the same time? If you do have a lock, then the lock
> performs the memory synchronisation.

I think you missed what I wrote above about the destructor. Even if you
do perform external synchronization for X::modify(), you don't do that
for the destructor - mostly because you don't know when it is called
(i.e. when the reference counter drops to zero).

You could say that the (thread-safe) reference counting does that
synchronization for you by having the necessary acquire fence.

Thiago Macieira

unread,
Oct 12, 2016, 2:16:34 PM10/12/16
to std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 21:09:15 CEST, Andrey Semashev
wrote:
> > Fair enough. Then the thread synchronisation needs to be external to it.
> > If
> > you're using a non-thread-safe object in multiple threads, then your code
> > must perform the synchronisation. In this case, what prevents two threads
> > from calling X::modify() at the same time? If you do have a lock, then
> > the lock performs the memory synchronisation.
>
> I think you missed what I wrote above about the destructor. Even if you
> do perform external synchronization for X::modify(), you don't do that
> for the destructor - mostly because you don't know when it is called
> (i.e. when the reference counter drops to zero).

I didn't, because you can't delete the object while the other thread has a
lock in place. That means you need to wait for it to release the lock before
the current thread calls delete and that's a happens-before.

> You could say that the (thread-safe) reference counting does that
> synchronization for you by having the necessary acquire fence.

Hmm... right. But reference counting only provides thread safety for
the lifetime of the object. All operations must still be atomic if you
don't use a lock.

Andrey Semashev

unread,
Oct 12, 2016, 2:22:12 PM10/12/16
to std-dis...@isocpp.org
On 10/12/16 21:16, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 21:09:15 CEST, Andrey Semashev
> wrote:
>>> Fair enough. Then the thread synchronisation needs to be external to it.
>>> If
>>> you're using a non-thread-safe object in multiple threads, then your code
>>> must perform the synchronisation. In this case, what prevents two threads
>>> from calling X::modify() at the same time? If you do have a lock, then
>>> the lock performs the memory synchronisation.
>>
>> I think you missed what I wrote above about the destructor. Even if you
>> do perform external synchronization for X::modify(), you don't do that
>> for the destructor - mostly because you don't know when it is called
>> (i.e. when the reference counter drops to zero).
>
> I didn't, because you can't delete the object while the other thread has a
> lock in place. That means you need to wait for it to release the lock before
> the current thread calls delete and that's a happens-before.

You don't have to acquire a lock to release the reference to the object.
You won't delete the object not because of the lock but because that
other thread holds a reference to the object.

If you manage references to the object by acquiring a lock then you
don't need atomic reference counter.

Thiago Macieira

unread,
Oct 12, 2016, 2:32:02 PM10/12/16
to std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 21:22:08 CEST, Andrey Semashev
wrote:
> You don't have to acquire a lock to release the reference to the object.
> You won't delete the object not because of the lock but because that
> other thread holds a reference to the object.
>
> If you manage references to the object by acquiring a lock then you
> don't need atomic reference counter.

Right.

But what kind of access can you legitimately have to a shared object that is
only protected by an atomic reference counter?

I argue that you can only do atomic operations, like incrementing a use
counter. If any operation is done that reads two values, then there needs to
be a memory barrier to ensure happens-before. That barrier either is inside
the class, or in the code using it. Either way, there's a barrier.

Andrey Semashev

Oct 12, 2016, 2:42:31 PM
to std-dis...@isocpp.org
On 10/12/16 21:31, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 21:22:08 CEST, Andrey Semashev wrote:
>> You don't have to acquire a lock to release the reference to the object.
>> You won't delete the object not because of the lock but because that
>> other thread holds a reference to the object.
>>
>> If you manage references to the object by acquiring a lock then you
>> don't need atomic reference counter.
>
> Right.
>
> But what kind of access can you legitimately have to a shared object that is
> only protected by an atomic reference counter?
>
> I argue that you can only do atomic operations, like incrementing a use
> counter.

Not necessarily. A thread may be guaranteed to have exclusive access to
the object by external means (e.g. a thread creates and initializes the
object; no other thread knows about the object at this point, so no lock
needs to be acquired). Granted, releasing such an object into shared usage
usually involves a fence at some point.

Tony V E

Oct 12, 2016, 3:15:01 PM
to Thiago Macieira, Peter Dimov
The lock protecting modification is not synchronized-with the refcount atomic. 

So if the last thread doesn't grab the lock, the delete still doesn't see the right data.

I'm not sure what that implies for this thread, but it sounds like a problem. 

I think Peter Dimov is the only person I trust to answer questions about shared_ptr atomic ops. CC'd


Sent from my BlackBerry portable Babbage Device
  Original Message  
From: Thiago Macieira
Sent: Wednesday, October 12, 2016 2:32 PM
To: std-dis...@isocpp.org
Reply To: std-dis...@isocpp.org
Subject: Re: [std-discussion] What is the behavior of memory_order_relaxed with atomic read-modify-write operations?

Thiago Macieira

Oct 12, 2016, 4:20:39 PM
to std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 21:42:27 CEST, Andrey Semashev wrote:
> Not necessarily. A thread may be guaranteed to have exclusive access to
> the object by external means (e.g. a thread creates and initializes the
> object; no other thread knows about the object at this point, no lock
> needs to be acquired). Granted, releasing such object into shared usage
> usually involves a fence at some point.

Exactly. I'm not interested in the single-thread case. I'm also not interested
in the read-only multi-thread case. I'm wondering about a multi-thread case
that modifies the object.

Thiago Macieira

Oct 12, 2016, 4:21:55 PM
to std-dis...@isocpp.org, Tony V E, Peter Dimov
On Wednesday, October 12, 2016, at 15:14:58 CEST, Tony V E wrote:
> The lock protecting modification is not synchronized-with the refcount
> atomic.
>
> So if the last thread doesn't grab the lock, the delete still doesn't see
> the right data.

True, but my point is that the code that will delete should have grabbed the
lock in the first place.

>
> I'm not sure what that implies for this thread, but it sounds like a
> problem.
>
> I think Peter Dimov is the only person I trust to answer questions about
> shared_ptr atomic ops. CC'd


Peter Dimov

Oct 12, 2016, 5:15:13 PM
to Tony V E, std-dis...@isocpp.org
As Andrey and Anthony have already explained, when you have

shared_ptr<X> p1( new X );

in thread 1, then pass it to thread 2 (as p2), which modifies *p2 and
drops its reference, after which thread 1 drops its reference, then for ~X
executed in thread 1 to see the changes made in thread 2, the reference
count decrement inside ~shared_ptr needs to establish a synchronizes-with
relationship.

Since X is only modified from a single thread (thread 2), there is no need
for it to contain synchronization code.

Yes, there is nothing that would prevent X from being modified from more than
one thread. This is independent of shared_ptr. Just don't do that. shared_ptr's
job is to ensure that if you play by the rules and do not introduce data
races on X, it won't introduce any of its own, either.


Peter Dimov

Oct 12, 2016, 5:21:53 PM
to Thiago Macieira, std-dis...@isocpp.org, Tony V E
Thiago Macieira wrote:

> True, but my point is that the code that will delete should have grabbed
> the lock in the first place.

As a general rule, if you're locking in the destructor you're doing
something wrong, although the specific case where you reference count using
relaxed operations is an exception. But usually, if the destructor can race
with another function that can take the lock, you're already dead, and if
you only need the lock for visibility reasons, _usually_ the primitive that
you use for lifetime management (such as shared_ptr) or interthread
communication should ensure the visibility.

Thiago Macieira

Oct 12, 2016, 6:37:04 PM
to Peter Dimov, std-dis...@isocpp.org, Tony V E
On Thursday, October 13, 2016, at 00:21:17 CEST, Peter Dimov wrote:
In this case, the lock is outside the object, so yes you should grab it before
checking whether your thread is the one that should delete.

Peter Dimov

Oct 12, 2016, 7:06:51 PM
to Thiago Macieira, std-dis...@isocpp.org, Tony V E
Thiago Macieira wrote:

> In this case, the lock is outside the object, so yes you should grab it
> before checking whether your thread is the one that should delete.

"This case" as far as I can understand refers to something like

// thread 1

lock external mutex;
p1->modify();
unlock;

// thread 2

lock external mutex;
p2->modify();
unlock;

and you're saying that the destruction of p1 or p2 needs to take the lock.
Well it does need to take the lock if the reference count doesn't
synchronize, and does not need to take the lock if the reference count does
synchronize. And not taking the lock on dropping a reference (which can
happen implicitly on scope exit) is much more convenient. The separation of
concerns is: I worry about what I do to the pointee; shared_ptr worries
about what it does to the pointee (implicitly invoking its destructor when
the last reference is dropped). So I synchronize my accesses with the lock
as above, and shared_ptr synchronizes its accesses using the count.

Message has been deleted

ronaldho...@gmail.com

Oct 13, 2016, 3:20:43 AM
to ISO C++ Standard - Discussion, thi...@macieira.org, tvan...@gmail.com, pdi...@mmltd.net
Guys, thank you for all the interesting information. This topic is really insightful to me.

Given modify() running together with the destructor, I believe you can create scenarios that cause incorrect behavior even with simple types.

Consider a ref-counted X that holds a std::atomic<int> member myInt.

Assume 2 threads, with the refcount at 2:

Thread 1:
if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
    delete x;
}

Thread 2:
if (x->myInt.load(memory_order_relaxed) == 0)
    std::cout << "Some statement" << std::endl;
x->myInt.store(42, memory_order_relaxed);
if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
    delete x;
}

A valid reordering that an optimizer / processor could do internally is:

Thread 2:
if (x->refcount_.fetch_sub(1, memory_order_relaxed) == 1) {
    if (x->myInt.load(memory_order_relaxed) == 0)
        std::cout << "Some statement" << std::endl;
    x->myInt.store(42, memory_order_relaxed);
    delete x;
} else {
    if (x->myInt.load(memory_order_relaxed) == 0)
        std::cout << "Some statement" << std::endl;
    x->myInt.store(42, memory_order_relaxed);
}

In this case, a flow is possible where the store is performed on an already deleted x, which should not happen.

So I agree with Andrey: this is incorrect, and a relaxed fetch_sub is not sufficient.

And clearly this needs acquire / release semantics, since otherwise either the load or the store could be reordered past the fetch_sub.

With this, I no longer believe the Boost example to be correct:
void intrusive_ptr_release(const X * x)
  {
    if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
      boost::atomic_thread_fence(boost::memory_order_acquire);
      delete x;
    }
  }

Given my example above, an invalid reordering is still possible for Thread 2, assuming the compiler / processor performs a transformation like this:

x->myInt.store(42, memory_order_relaxed); // may no longer be reordered below the fetch_sub, due to the release
if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
    if (x->myInt.load(memory_order_relaxed) == 0)
        std::cout << "Some statement" << std::endl;
    boost::atomic_thread_fence(boost::memory_order_acquire);
    delete x;
} else {
    if (x->myInt.load(memory_order_relaxed) == 0) // still allowed to be reordered below the fetch_sub, with the risk of x being deleted before this statement
        std::cout << "Some statement" << std::endl;
}

What do you guys think? Is the Boost example indeed incorrect from a theoretical perspective?

(Update: Deleted previous post and corrected the code example)

Andrey Semashev

Oct 13, 2016, 4:53:37 AM
to std-dis...@isocpp.org
On 10/13/16 10:20, ronaldho...@gmail.com wrote:
>
> With this, I no longer believe the Boost example to be correct:
> (http://www.boost.org/doc/libs/1_62_0_b2/doc/html/atomic/usage_examples.html)
> void intrusive_ptr_release(const X * x)
> {
> if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
> boost::atomic_thread_fence(boost::memory_order_acquire);
> delete x;
> }
> }
>
> Given my example above, an invalid reordering is still possible for
> Thread 2, assuming the compiler / processor does a transformation as such:
> x->myInt.store(42, memory_order_relaxed); // This may no longer be
> re-ordered below the fetch_sub due to the release
> if (x->refcount_.fetch_sub(1, memory_order_release) == 1) {
> if (x->myInt.load(memory_order_relaxed) == 0)
> std::cout << "Some statement" << std::endl;
> std::atomic_thread_fence(boost::memory_order_acquire);
> delete x;
> }
> else {
> if (x->myInt.load(memory_order_relaxed) == 0) // This is still
> allowed to be reordered below the fetch_sub with the risk of x being
> deleted before this statement
> std::cout << "Some statement" << std::endl;
> }
>
> What do you guys think? Is the Boost example indeed incorrect from a
> theoretical perspective?

The above reordering is not possible because your myInt.load and cout
part cannot be reordered past the release-ordered fetch_sub.

ronaldho...@gmail.com

Oct 13, 2016, 6:59:58 AM
to ISO C++ Standard - Discussion

The above reordering is not possible because your myInt.load and cout
part cannot be reordered past the release-ordered fetch_sub.


Why would this hold for the myInt.load? It does not carry a dependency on the refcount itself, and isn't it only write operations that may not be reordered past a release? Shouldn't the fetch_sub have acquire (or acquire/release) semantics for reads to be prevented from reordering past it?

Andrey Semashev

Oct 13, 2016, 7:08:18 AM
to std-dis...@isocpp.org
On 10/13/16 13:59, ronaldho...@gmail.com wrote:
>
> The above reordering is not possible because your myInt.load and cout
> part cannot be reordered past the release-ordered fetch_sub.
>
> Why would this be for the myInt.load?

Release fences prevent prior memory accesses from being reordered after
the fence. That includes all memory accesses, reads and writes, even
non-atomic ones.

Peter Dimov

Oct 13, 2016, 7:34:49 AM
to ronaldho...@gmail.com, ISO C++ Standard - Discussion, thi...@macieira.org, tvan...@gmail.com
ronaldholthuizen wrote:
> With this, I no longer believe the Boost example to be correct:
> (http://www.boost.org/doc/libs/1_62_0_b2/doc/html/atomic/usage_examples.html)
> void intrusive_ptr_release(const X * x)
> {
> if (x->refcount_.fetch_sub(1, boost::memory_order_release) == 1) {
> boost::atomic_thread_fence(boost::memory_order_acquire);
> delete x;
> }
> }

It's correct. Note that the fetch_sub uses memory_order_RELEASE, not
_relaxed. What this means is that all decrements have release semantics,
except the last one, which has effectively acq_rel semantics because of the
additional acquire fence.

So the last decrement synchronizes-with all previous decrements.

This is, in fact, one of the motivating examples for atomic_thread_fence.

ronaldho...@gmail.com

Oct 14, 2016, 2:54:58 AM
to ISO C++ Standard - Discussion, ronaldho...@gmail.com, thi...@macieira.org, tvan...@gmail.com, pdi...@mmltd.net
@ Andrey:
> Release fences prevent prior memory accesses from being reordered after the fence. That includes all memory accesses, reads and writes, even non-atomic ones.

Are you sure this statement is correct? If I look at cppreference.com, this explanation is given: (http://en.cppreference.com/w/cpp/atomic/memory_order)
A store operation with this memory order performs the release operation: no writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below).

These statements contradict each other.

@ Peter
I understand your reasoning, but in the example I gave, the issue is that read statements on the shared_ptr contents in the thread not deleting the object may be reordered below the fetch_sub. Since that thread only has release semantics, this is valid, therefore making the code technically incorrect?

Out of interest I looked up the actual refcount implementation used in Boost shared_ptr, and there the fetch_sub is used with acquire/release semantics, making their real code stricter than the example on their website.

Peter Dimov

Oct 14, 2016, 8:16:04 AM
to ronaldho...@gmail.com, ISO C++ Standard - Discussion, ronaldho...@gmail.com, thi...@macieira.org, tvan...@gmail.com
ronaldholthuizen wrote:
> @ Andrey:
> > Release fences prevent prior memory accesses from being reordered after
> > the fence. That includes all memory accesses, reads and writes, even
> > non-atomic ones.
>
> Are you sure this statement is correct?

It's correct. Nothing gets reordered after a release (and before an
acquire). Reordering is possible in the other direction.

"Release" is like releasing a mutex, and "acquire" is like acquiring one. So
if you have

acquire mutex;
r1 = x;
release mutex;

in thread 1, then

acquire mutex;
x = 5;
release mutex;

in thread 2, it's not allowed for the r1 = x to be reordered after the
release and for x = 5 to be reordered before the acquire. If it were
allowed, mutexes would be useless.

Andrey Semashev

Oct 14, 2016, 11:28:46 AM
to std-dis...@isocpp.org
On 10/14/16 09:54, ronaldho...@gmail.com wrote:
> @ Andrey:
>> Release fences prevent prior memory accesses from being reordered
> after the fence. That includes all memory accesses, reads and writes,
> even non-atomic ones.
>
> Are you sure this statement is correct? If I look at cppreference.com,
> this explanation is given:
> (http://en.cppreference.com/w/cpp/atomic/memory_order)
> A store operation with this memory order performs the *release
> operation*: no writes in the current thread can be reordered after this
> store. All writes in the current thread are visible in other threads
> that acquire the same atomic variable (see Release-Acquire ordering
> below) and writes that carry a dependency into the atomic variable
> become visible in other threads that consume the same atomic (see
> Release-Consume ordering below).
>
> These statements contradict each other.

I'm sure that my statement was correct. As Peter suggested, otherwise
mutexes would not be able to work. See [intro.races]/3, in particular
this sentence:

Informally, performing a release operation on A forces prior side
effects on *other memory locations* to become visible to other threads
that later perform a consume or an acquire operation on A.

(Emphasis added)

The quoted text from cppreference is also correct, although slightly
incomplete as it only talks about writes wrt. release fences and loads
wrt. acquire fences. In fact both release and acquire fences prevent
reordering loads and stores in the corresponding direction.

ronaldho...@gmail.com

Oct 14, 2016, 1:45:52 PM
to ISO C++ Standard - Discussion
Hey,

I checked Herb Sutter's presentation in detail again, and indeed you guys are right!

This also means that I finally fully understand the rest of the statements you were making. The world makes sense again.

It indeed appears that the cppreference website has incomplete statements. I will follow up on this to try and correct it, as this is confusing to a lot of people.

Thanks again, this was a great and insightful discussion for me.

Best regards,

Ronald

Sergey Zubkov

Oct 14, 2016, 2:43:17 PM
to ISO C++ Standard - Discussion, ronaldho...@gmail.com, thi...@macieira.org, tvan...@gmail.com, pdi...@mmltd.net


On Friday, October 14, 2016 at 8:16:04 AM UTC-4, Peter Dimov wrote:
Nothing gets reordered after a release (and before an acquire.) Reordering is possible in the other direction.

"Release" is like releasing a mutex, and "acquire" is like acquiring one. So if you have

acquire mutex;
r1 = x;
release mutex;

in thread 1, then

acquire mutex;
x = 5;
release mutex;

in thread 2, it's not allowed for the r1 = x to be reordered after the
release and for x = 5 to be reordered before the acquire. If it were
allowed, mutexes would be useless.


If that is true, this example does not demonstrate it. What observable behavior changes when x=5 (store) is reordered before the acquire in T2 and r1=x (load) is reordered after the release in T1? 



Sergey Zubkov

Oct 14, 2016, 3:13:39 PM
to ISO C++ Standard - Discussion, ronaldho...@gmail.com, thi...@macieira.org, tvan...@gmail.com, pdi...@mmltd.net

in thread 2, it's not allowed for the r1 = x to be reordered after the
release and for x = 5 to be reordered before the acquire. If it were
allowed, mutexes would be useless.


If that is true, this example does not demonstrate it. What observable behavior changes when x=5 (store) is reordered before the acquire in T2 and r1=x (load) is reordered after the release in T1? 



Never mind, I see it would introduce a data race on the non-atomic x. I wonder how it follows from 1.10, though.

Peter Dimov

Oct 14, 2016, 3:25:29 PM
to Sergey Zubkov, ISO C++ Standard - Discussion, ronaldho...@gmail.com, thi...@macieira.org, tvan...@gmail.com
Sergey Zubkov wrote:

> nevermind, I see it would introduce a data race on the non-atomic x..
> wonder how it follows from 1.10, though.

r1 = x in thread 1 is sequenced before "release mutex". "release mutex" in
thread 1 synchronizes-with "acquire mutex" in thread 2. "acquire mutex" in
thread 2 is sequenced before "x = 5". Therefore, r1 = x happens before x =
5.

Sergey Zubkov

Oct 14, 2016, 3:39:41 PM
to ISO C++ Standard - Discussion, cubb...@gmail.com, ronaldho...@gmail.com, thi...@macieira.org, tvan...@gmail.com, pdi...@mmltd.net


On Friday, October 14, 2016 at 3:25:29 PM UTC-4, Peter Dimov wrote:

r1 = x in thread 1 is sequenced before "release mutex". "release mutex" in
thread 1 synchronizes-with "acquire mutex" in thread 2. "acquire mutex" in
thread 2 is sequenced before "x = 5". Therefore, r1 = x happens before x =
5.

Indeed. Thanks. Will try to make it clearer in other parts of that cppreference page as well (it's my fault it focused on visibility of side effects too much).