Ordering guarantee for CAS operation

Xinwei (Mason) Fu

Nov 5, 2021, 3:04:06 PM
to pmem
Hi, all!

I have one question about the ordering guarantee for CAS operation.

Currently, my code looks like this:

// suppose a and b are allocated in the PM.
std::atomic<...> a;
std::atomic<...> b;
......
a.compare_exchange_strong(old_a, new_a);
FLUSH(&a);
FENCE();
b.compare_exchange_strong(old_b, new_b);
FLUSH(&b);
FENCE();

I want to guarantee that the write to "a" is persisted before "b" is changed.

Does the "b.compare_exchange_strong(old_b, new_b);" instruction itself contain a FENCE() to guarantee the persistence ordering?

In other words, suppose I write the code like this:

a.compare_exchange_strong(old_a, new_a);
FLUSH(&a);
// I assume b.CAS can enforce a fence before writing, so I delete the FENCE here.
b.compare_exchange_strong(old_b, new_b);
FLUSH(&b);
FENCE();

Will the new code still guarantee that the write to "a" is persisted before "b" is changed?

Thank you,
-Xinwei

--
Xinwei (Mason) Fu
Ph.D. candidate, Computer Science, 2016-?
Virginia Tech

ppbb...@gmail.com

Nov 8, 2021, 7:18:57 AM
to pmem
Hi,

Purely from the language perspective: no, I don't believe you can assume that omitting a fence in this code would be safe. I'm not a C++ expert, but I assume that the C++11 atomic semantics are defined with respect to the atomic operations on the atomic objects. AFAIK, the C++ memory model does not define flushing.

Now, in practice, using atomic *stores* with sequentially consistent ordering will likely result in an mfence on x86. But you are asking about a sequence of compare-and-swap operations - which will compile down to two `lock cmpxchg` in a row. Intel Software Developer’s Manuals (SDM) state that "Locked operations are atomic with respect to all other memory operations and all externally visible events." (8.1.2.2) and "Locked instructions have a total order." (8.2.2). I recommend reading Chapter 8 from the SDM if you want to know the details. But still - for C++ atomics this relies on compiler-defined behavior.

Piotr
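
For reference, here is a minimal sketch of the conservative version of the sequence from the question, with both fences kept. The names pm_flush, pm_fence and update_a_then_b are hypothetical stand-ins for the FLUSH and FENCE macros in the original post, not part of any library. It assumes x86-64 and uses CLFLUSH only because that intrinsic is available without extra compiler flags; real persistent-memory code would more likely use CLWB or CLFLUSHOPT. On other targets the flush is a no-op placeholder.

```cpp
#include <atomic>
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

// Hypothetical helper standing in for FLUSH() from the question.
inline void pm_flush(const void* addr) {
#if defined(__x86_64__) || defined(_M_X64)
    _mm_clflush(addr);  // flush the cache line containing addr
#else
    (void)addr;         // assumption: placeholder only, no flush on other targets
#endif
}

// Hypothetical helper standing in for FENCE() from the question.
inline void pm_fence() {
#if defined(__x86_64__) || defined(_M_X64)
    _mm_sfence();       // store fence
#else
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Updates a, then b, keeping an explicit fence between the two steps.
// Returns true only if both compare-and-swap operations succeeded.
inline bool update_a_then_b(std::atomic<std::uint64_t>& a,
                            std::uint64_t old_a, std::uint64_t new_a,
                            std::atomic<std::uint64_t>& b,
                            std::uint64_t old_b, std::uint64_t new_b) {
    if (!a.compare_exchange_strong(old_a, new_a)) return false;
    pm_flush(&a);
    pm_fence();  // kept explicitly rather than relying on the next CAS
    if (!b.compare_exchange_strong(old_b, new_b)) return false;
    pm_flush(&b);
    pm_fence();
    return true;
}
```

Keeping the explicit fence costs one extra instruction on the update path, but the code then does not depend on how a particular compiler lowers compare_exchange_strong.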

Jan K

Nov 8, 2021, 9:32:07 AM
to ppbb...@gmail.com, pmem
> I'm not a C++ expert but I assume that the C++11 atomic semantics are defined with respect to the atomic operations on the atomic objects.

C++ defines memory ordering among both atomic and ordinary
load/stores/read-modify-writes. Upon each operation on atomics the
programmer selects whether ordinary loads or stores can be reordered
with this operation. Operations on atomics are ordered among
themselves regardless of requested memory ordering.
(And there's an atomic_thread_fence if you want to order loads/stores
without any synchronizing operation.)
Have a look here: https://en.cppreference.com/w/cpp/atomic/memory_order

> But still - for C++ atomics this relies on compiler-defined behavior.

A C++11-compliant compiler is required to respect the requested
memory ordering, both when optimising code and when producing machine code.

> AFAIK The C++ memory model does not define flushing.

I also don't think C++ defines "flushing" – but why would a programmer
care about it? C++ restricts what one thread can see when another
thread issues several operations, abstracting from where the data
sits.
C++ standards so far are not compatible with pmem – I believe that the
main problem is that reading a data item without writing it beforehand
is considered "undefined behaviour", and that happens when you read
from pmem.

Regards,
Jan
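
A small sketch of the atomic_thread_fence usage mentioned above, assuming the classic message-passing pattern. The names payload, ready, producer and consumer_try_read are made up for illustration. Note that this only orders what another thread can observe; it says nothing about when data reaches persistent media.

```cpp
#include <atomic>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag used to publish the payload

void producer() {
    payload = 42;
    // Release fence: the write to payload cannot be reordered
    // after the relaxed store to ready below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Returns the payload if the flag has been observed, otherwise -1.
int consumer_try_read() {
    if (!ready.load(std::memory_order_relaxed)) return -1;
    // Acquire fence: pairs with the release fence in producer(), so the
    // read of payload below sees the value written before the flag was set.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}
```

In a real program producer() and consumer_try_read() would run on different threads, with the consumer retrying until it gets something other than -1.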

ppbb...@gmail.com

Nov 8, 2021, 10:22:07 AM
to pmem
Hi Jan,

On Monday, November 8, 2021 at 15:32:07 UTC+1, Jan K wrote:
>> I'm not a C++ expert but I assume that the C++11 atomic semantics are defined with respect to the atomic operations on the atomic objects.
>
> C++ defines memory ordering among both atomic and ordinary
> load/stores/read-modify-writes. Upon each operation on atomics the
> programmer selects whether ordinary loads or stores can be reordered
> with this operation. Operations on atomics are ordered among
> themselves regardless of requested memory ordering.
> (And there's an atomic_thread_fence if you want to order loads/stores
> without any synchronizing operation.)
> Have a look here: https://en.cppreference.com/w/cpp/atomic/memory_order

I didn't mean to imply that the memory ordering rules are only relevant for atomic variables.
What I meant is that "flush" is not an atomic operation as defined by C++. Here's a quote from https://www.cplusplus.com/reference/atomic/memory_order/:
"All atomic operations produce well-defined behavior with respect to an atomic object when multiple threads access it"
This means that, technically speaking, there could be an implementation for some platform where the atomic (visibility) semantics have nothing to do with persistence semantics and, I think, it would still be standard-compliant (again, not a C++ expert).



>> But still - for C++ atomics this relies on compiler-defined behavior.
>
> A C++11-compliant compiler is required to respect the requested
> memory ordering, both when optimising code and when producing machine code.
>
>> AFAIK The C++ memory model does not define flushing.
>
> I also don't think C++ defines "flushing" – but why would a programmer
> care about it? C++ restricts what one thread can see when another
> thread issues several operations, abstracting from where the data
> sits.
> C++ standards so far are not compatible with pmem – I believe that the
> main problem is that reading a data item without writing it beforehand
> is considered "undefined behaviour", and that happens when you read
> from pmem.

Typically, I wouldn't expect programmers to care about this. But, for this question, I think this is important because it asks whether or not a specific C++ function `compare_exchange_strong` has a certain specific side-effect on the memory subsystem.
And, I believe, the answer is that this is compiler-defined for x86. Maybe I'm being pedantic, but I think it's an important distinction.

Xinwei (Mason) Fu

Nov 10, 2021, 3:16:51 PM
to pmem
Hi, Piotr and Jan!

Thank you so much for your answers!

I have a follow-up question.

Let's set aside the high-level C++11 or LLVM view.
Suppose we have a code snippet at the x86_64 assembly level like this:

Store(X)
Flush(X)
(no fence here)
CAS(Y)

My understanding is that CAS(Y) may guarantee the completion of Store(X), i.e., Store(X) is performed, drained from the store buffer, and merged into the cache.
My question is:
Can CAS(Y) also guarantee the completion of the asynchronous flush operation Flush(X)?

Thanks,
-Xinwei


Wu, Dennis

Nov 10, 2021, 8:00:52 PM
to Xinwei (Mason) Fu, pmem

It looks like the CAS is independent of the preceding instructions. Check the CAS description: it only keeps the CAS operation itself atomic; it does not ensure that the previous store has finished. I think FENCE() is still needed.

ppbb...@gmail.com

Nov 15, 2021, 6:04:01 AM
to pmem
Hi,

Like I said earlier, the official docs state that locked operations observe a total order. Compare-and-swap is such an operation when used with the lock prefix (which is pretty much all the time).

Piotr