C++0x: memory_order_acq_rel vs memory_order_seq

Dmitriy V'jukov

unread,

May 1, 2008, 2:20:11 PM5/1/08

to

I am trying to figure out difference between memory_order_acq_rel and
memory_order_seq_cst memory order in C++0x.

As I understand memory_order_seq_cst is similar to volatile keyword in
Java, i.e.:

std::atomic_bool b1, b2;
std::atomic_store_explicit(&b1, true, std::memory_order_seq_cst);
// #StoreLoad is emitted by compiler
if (std::atomic_load(&b2, std::memory_order_seq_cst))
//...

Right?

But what is memory_order_acq_rel? And when I need it? Can you provide
some example?

Dmitriy V'jukov

Anthony Williams

unread,

May 1, 2008, 4:45:49 PM5/1/08

to

"Dmitriy V'jukov" <dvy...@gmail.com> writes:

> I am trying to figure out difference between memory_order_acq_rel and
> memory_order_seq_cst memory order in C++0x.

seq_cst implies Sequential Consistency: *all* seq_cst accesses to *all*
variables form a single total order for any given run of the application: it
is as-if the threads were reduced to a direct interleaving of instructions.

acq_rel provides *pairwise* ordering between two threads. It is possible for
everything to follow acq_rel semantics, and two threads see modifications to
two variables in different orders.

See slides 8-12 in the presentation I did on C++0x threads at ACCU 2008:

<http://www.justsoftwaresolutions.co.uk/threading/future-of-concurrency-accu-2008-slides.html>

> But what is memory_order_acq_rel? And when I need it? Can you provide
> some example?

Sequential Consistency can impose high synchronization costs on machines with
many CPUs due to the requirement for a single total order. The pairwise
synchronization of acq_rel can be a significant performance optimization in
those cases.

Anthony
--
Anthony Williams | Just Software Solutions Ltd
Custom Software Development | http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL

Dmitriy V'jukov

unread,

May 2, 2008, 6:49:19 AM5/2/08

to

On 2 май, 00:45, Anthony Williams <anthony_w....@yahoo.com> wrote:

> "Dmitriy V'jukov" <dvyu...@gmail.com> writes:
> > I am trying to figure out difference between memory_order_acq_rel and
> > memory_order_seq_cst memory order in C++0x.
>
> seq_cst implies Sequential Consistency: *all* seq_cst accesses to *all*
> variables form a single total order for any given run of the application: it
> is as-if the threads were reduced to a direct interleaving of instructions.
>
> acq_rel provides *pairwise* ordering between two threads. It is possible for
> everything to follow acq_rel semantics, and two threads see modifications to
> two variables in different orders.
>
> See slides 8-12 in the presentation I did on C++0x threads at ACCU 2008:
>

> <http://www.justsoftwaresolutions.co.uk/threading/future-of-concurrenc...>

>
> > But what is memory_order_acq_rel? And when I need it? Can you provide
> > some example?
>
> Sequential Consistency can impose high synchronization costs on machines with
> many CPUs due to the requirement for a single total order. The pairwise
> synchronization of acq_rel can be a significant performance optimization in
> those cases.

Ok. Thank you. So memory_order_acq_rel is what usually called "full
fence" (mfence on x86). And memory_order_seq_cst also enforces total
order between *all* modifications in the system (locked instruction on
x86).

Hmmm... Now I am trying to figure out when memory_order_acq_rel is
insufficient, and one need memory_order_seq_cst. I can't imagine any
example straight off. But there must be some substantial reasons for
inclusion memory_order_seq_cst into standard. Can you provide some
example with memory_order_seq_cst?

Dmitriy V'jukov

Anthony Williams

unread,

May 2, 2008, 7:11:47 AM5/2/08

to

"Dmitriy V'jukov" <dvy...@gmail.com> writes:

> On 2 май, 00:45, Anthony Williams <anthony_w....@yahoo.com> wrote:
>> "Dmitriy V'jukov" <dvyu...@gmail.com> writes:
>> > I am trying to figure out difference between memory_order_acq_rel and
>> > memory_order_seq_cst memory order in C++0x.
>>
>> seq_cst implies Sequential Consistency: *all* seq_cst accesses to *all*
>> variables form a single total order for any given run of the application: it
>> is as-if the threads were reduced to a direct interleaving of instructions.
>>
>> acq_rel provides *pairwise* ordering between two threads. It is possible for
>> everything to follow acq_rel semantics, and two threads see modifications to
>> two variables in different orders.
>>
>> See slides 8-12 in the presentation I did on C++0x threads at ACCU 2008:
>>
>> <http://www.justsoftwaresolutions.co.uk/threading/future-of-concurrenc...>
>>
>> > But what is memory_order_acq_rel? And when I need it? Can you provide
>> > some example?
>>
>> Sequential Consistency can impose high synchronization costs on machines with
>> many CPUs due to the requirement for a single total order. The pairwise
>> synchronization of acq_rel can be a significant performance optimization in
>> those cases.
>
>
> Ok. Thank you. So memory_order_acq_rel is what usually called "full
> fence" (mfence on x86). And memory_order_seq_cst also enforces total
> order between *all* modifications in the system (locked instruction on
> x86).

x86 is a bad architecture to use for examples, since there are too many
implicit fences, but I think you understand. I believe that the differences
are particularly apparent on architectures like PPC or alpha where
synchronization is explicit.

It is also worth noting that seq_cst only enforces a total order of seq_cst
operations: relaxed operations are still unordered, and a seq_cst operation is
treated as acq_rel by any other acquire/release/acq_rel operations.

> Hmmm... Now I am trying to figure out when memory_order_acq_rel is
> insufficient, and one need memory_order_seq_cst. I can't imagine any
> example straight off. But there must be some substantial reasons for
> inclusion memory_order_seq_cst into standard. Can you provide some
> example with memory_order_seq_cst?

seq_cst is included, and is the default, because it is easiest to reason
about. If you looked at my slides, many people will find slide 11 hard to deal
with: the two reader threads see different orders of events.

Atomic ops are hard anyway, but non-SC atomics are an order of magnitude
harder.

Dmitriy V'jukov

unread,

May 2, 2008, 7:31:36 AM5/2/08

to

On 2 май, 15:11, Anthony Williams <anthony_w....@yahoo.com> wrote:

> > Ok. Thank you. So memory_order_acq_rel is what usually called "full
> > fence" (mfence on x86). And memory_order_seq_cst also enforces total
> > order between *all* modifications in the system (locked instruction on
> > x86).
>
> x86 is a bad architecture to use for examples, since there are too many
> implicit fences, but I think you understand.

I hope :)

> It is also worth noting that seq_cst only enforces a total order of seq_cst
> operations: relaxed operations are still unordered, and a seq_cst operation is
> treated as acq_rel by any other acquire/release/acq_rel operations.
>
> > Hmmm... Now I am trying to figure out when memory_order_acq_rel is
> > insufficient, and one need memory_order_seq_cst. I can't imagine any
> > example straight off. But there must be some substantial reasons for
> > inclusion memory_order_seq_cst into standard. Can you provide some
> > example with memory_order_seq_cst?
>
> seq_cst is included, and is the default, because it is easiest to reason
> about. If you looked at my slides, many people will find slide 11 hard to deal
> with: the two reader threads see different orders of events.
>
> Atomic ops are hard anyway, but non-SC atomics are an order of magnitude
> harder.

I agree that CS is much easier to reason about because it's like
interleaving of all operations. But, in my opinion, it's... strange
reason for inclusion into standard... there must be at least some use
cases for CS...

I think that situation on slide 11 is quite... unrealistic. Well, not
situation is unrealistic. Unrealistic that mentioned output can break
some real algorithm. Or I'm not right? I'm trying to figure out some
use cases for CS...

So reasoning of WG21 is "Use seq_cst until you don't understand slide
11 of Anthony Williams' Presentation. Then forget about seq_cst and
use acq_rel" :)

Dmitriy V'jukov

unread,

May 2, 2008, 7:41:54 AM5/2/08

to

On 2 май, 15:31, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:

> I agree that CS is much easier to reason about because it's like
> interleaving of all operations. But, in my opinion, it's... strange
> reason for inclusion into standard... there must be at least some use
> cases for CS...
>
> I think that situation on slide 11 is quite... unrealistic. Well, not
> situation is unrealistic. Unrealistic that mentioned output can break
> some real algorithm. Or I'm not right? I'm trying to figure out some
> use cases for CS...

Put it this way. If you are implementing your own atomics library for
those who understand slide 11, will you include memory_order_seq_cst
or not? If yes, what is your reasoning?

Dmitriy V'jukov

Anthony Williams

unread,

May 2, 2008, 8:55:24 AM5/2/08

to

"Dmitriy V'jukov" <dvy...@gmail.com> writes:

I think that providing a layer that is easier to reason about *is* a good use
case. It is also the same level of guarantee provided by the Java 1.5 memory
model in the absence of data races, if I understand correctly.

http://java.sun.com/docs/books/jls/third_edition/html/memory.html

> I think that situation on slide 11 is quite... unrealistic. Well, not
> situation is unrealistic. Unrealistic that mentioned output can break
> some real algorithm. Or I'm not right? I'm trying to figure out some
> use cases for CS...

Non-SC atomics can be very hard to use correctly. The members of the memory
model group (of which I'm only a peripheral member) have had long discussions
about the meaning of several examples. If the experts have to have long
discussions, we need a simpler layer.

> So reasoning of WG21 is "Use seq_cst until you don't understand slide
> 11 of Anthony Williams' Presentation. Then forget about seq_cst and
> use acq_rel" :)

;-)

That's the simplest example I could think of to demonstrate the point. The
consequences can be quite far-reaching, though, especially if you have other
threads in the mix. You're currently working on lock-free stuff, so you should
be well-aware of that.

Anthony Williams

unread,

May 2, 2008, 9:00:15 AM5/2/08

to

"Dmitriy V'jukov" <dvy...@gmail.com> writes:

Yes.

Non-SC atomics are hard, even for experts. IIRC, at one of the memory model
group meetings, Paul McKenney said that he might use SC atomics for
prototyping code, and then selectively change some of them to acquire/release
semantics or even relaxed operations in order to optimise the code, once it
was working.

Dmitriy V'jukov

unread,

May 2, 2008, 9:21:27 AM5/2/08

to

On 2 май, 17:00, Anthony Williams <anthony_w....@yahoo.com> wrote:

> > Put it this way. If you are implementing your own atomics library for
> > those who understand slide 11, will you include memory_order_seq_cst
> > or not? If yes, what is your reasoning?
>
> Yes.
>
> Non-SC atomics are hard, even for experts. IIRC, at one of the memory model
> group meetings, Paul McKenney said that he might use SC atomics for
> prototyping code, and then selectively change some of them to acquire/release
> semantics or even relaxed operations in order to optimise the code, once it
> was working.

It's very good point. And good use case.

I think that inverse process can take place too. If user defines
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS macro then I will ignore all
memory order parameters and use memory_order_seq_cst instead.
So if some unit-test starts failing, then user can try to define
FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS first just to see whether it's
problem with fine-grained memory ordering or it's problem in algorithm
itself.
What do you think?

Dmitriy V'jukov

Anthony Williams

unread,

May 2, 2008, 9:30:15 AM5/2/08

to

"Dmitriy V'jukov" <dvy...@gmail.com> writes:

> I think that inverse process can take place too. If user defines
> FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS macro then I will ignore all
> memory order parameters and use memory_order_seq_cst instead.
> So if some unit-test starts failing, then user can try to define
> FORCE_SEQ_CST_FOR_ALL_ATOMIC_OPERATIONS first just to see whether it's
> problem with fine-grained memory ordering or it's problem in algorithm
> itself.
> What do you think?

That's a good idea. The standard doesn't require that the fine-grained
ordering actually be more fine-grained, so it would still be a compliant
library.

C++0x: memory_order_acq_rel vs memory_order_seq_cst

Dmitriy V'jukov

Anthony Williams

Dmitriy V'jukov

Anthony Williams

Dmitriy V'jukov

Dmitriy V'jukov

Anthony Williams

Anthony Williams

Dmitriy V'jukov

Anthony Williams