Re: TSan, fences and CAS failure ordering

Dmitry Vyukov

Mar 24, 2016, 10:10:57 AM
to Kuba Brecka, Anna Zaks, Devin Coughlin, thread-s...@googlegroups.com
+thread-sanitizer mailing list

On Thu, Mar 24, 2016 at 2:50 PM, Kuba Brecka <jbr...@apple.com> wrote:
> Hi Dmitry!
>
> I’ve just seen some false positives from TSan due to missing support of some
> C++11 atomics, namely fences and atomic_compare_exchange’s “failure memory
> ordering”. In compiler-rt, there is a FIXME for fences:
>
> static void AtomicFence(ThreadState *thr, uptr pc, morder mo) {
>   // FIXME(dvyukov): not implemented.
>   __sync_synchronize();
> }
>
> and a comment about fmo in CAS:
>
> static bool AtomicCAS(ThreadState *thr, uptr pc,
>     volatile T *a, T *c, T v, morder mo, morder fmo) {
>   (void)fmo;  // Unused because llvm does not pass it yet.
>   ...  // rest of code doesn’t use ‘fmo’ at all.
> }
>
> Are these two things “just not implemented yet” or is there something hard
> about adding support for these? Are fences and CAS+fmo used so rarely that
> you don’t see false positives due to them? Why isn’t the fmo parameter
> correctly passed to AtomicCAS()?


Hi Kuba,

CAS failure order should not lead to false positives, because we use
the success order, which must not be weaker than the failure order.
The comment looks like it is from ancient times, when tsan was not part
of the llvm repository. So I guess we can now use fmo.
I think we can support fmo if we do the vector clock operations after the
CAS itself, when we know the result. This can remove some false
negatives.

Standalone fences are more tricky. Any implementation that I can think
of increases the cost of relaxed atomic operations, because the fences
turn them into synchronization operations. In KernelThreadSanitizer we
have to support them, and we do some dirty hacks to reduce the
additional cost (see KT_TAME_COUNTER_LIMIT in
https://github.com/google/ktsan/blob/tsan/mm/ktsan/sync_atomic.c). I
know how to reduce the cost of a release fence to zero (again, assuming somewhat
imprecise modelling). But I don't know how to get rid of the acquire
fence cost.

Our two major code bases virtually don't use standalone fences, so we
don't see any false positives due to fences. Do you see the false
positive in a real app? I guess we will need to support them sooner or
later. We can always turn them off with a flag in our setup.

Kuba Brecka

Mar 24, 2016, 10:20:27 AM
to Dmitry Vyukov, Anna Zaks, Devin Coughlin, thread-s...@googlegroups.com
On 24 Mar 2016, at 15:10, Dmitry Vyukov <dvy...@google.com> wrote:

> Hi Kuba,
>
> CAS failure order should not lead to false positives, because we use
> the success order, which must not be weaker than the failure order.

I’m seeing the false positive in:

    std::atomic_compare_exchange_strong_explicit(&ptr, &expected, value, std::memory_order_release, std::memory_order_acquire);

Here, on failure, the success ordering doesn’t give me the “acquire” I need.  Or is this not a valid use of atomic_compare_exchange_strong_explicit?

> The comment looks like it is from ancient times, when tsan was not part
> of the llvm repository. So I guess we can now use fmo.

By looking at the IR generated by the above call, it seems to me that the fail order is lost somewhere:

    %1 = cmpxchg i64* ... release monotonic

Why is the fail order “monotonic” here?

Kuba

Dmitry Vyukov

Mar 24, 2016, 10:23:41 AM
to Kuba Brecka, Anna Zaks, Devin Coughlin, thread-s...@googlegroups.com
Yeah, it's invalid code on your side. Failure order can't be stronger
than success order.
You need (std::memory_order_acq_rel, std::memory_order_acquire).

Kuba Brecka

Mar 24, 2016, 10:35:18 AM
to Dmitry Vyukov, Anna Zaks, Devin Coughlin, thread-s...@googlegroups.com

> On 24 Mar 2016, at 15:23, Dmitry Vyukov <dvy...@google.com> wrote:
>
> On Thu, Mar 24, 2016 at 3:20 PM, Kuba Brecka <jbr...@apple.com> wrote:
>>
>> std::atomic_compare_exchange_strong_explicit(&ptr, &expected, value,
>> std::memory_order_release, std::memory_order_acquire);
>
>
> Yeah, it's invalid code on your side. Failure order can't be stronger
> than success order.
> You need (std::memory_order_acq_rel, std::memory_order_acquire).

Aah, I see.

So memory_order_acquire is considered *stronger* than memory_order_release? And memory_order_release is *weaker* than memory_order_acquire? I always considered these two to be incomparable...

Kuba

Dmitry Vyukov

Mar 24, 2016, 10:50:19 AM
to Kuba Brecka, Anna Zaks, Devin Coughlin, thread-s...@googlegroups.com
20.7.2.5/32
Requires: failure shall not be memory_order_release,
memory_order_acq_rel, or stronger than success.

Kuba Brecka

Mar 24, 2016, 12:41:20 PM
to Dmitry Vyukov, Anna Zaks, Devin Coughlin, thread-s...@googlegroups.com

> On 24 Mar 2016, at 15:49, Dmitry Vyukov <dvy...@google.com> wrote:
>
> 20.7.2.5/32
> Requires: failure shall not be memory_order_release,
> memory_order_acq_rel, or stronger than success.

Sure. I’m just curious about the definition of “stronger” here.

Kuba

rjmc...@gmail.com

Mar 24, 2016, 7:03:38 PM
to thread-sanitizer, dvy...@google.com, ga...@apple.com, dcou...@apple.com, jbr...@apple.com
Chandler, JF Bastien, Tim Northover, and I discussed this on LLVM IRC, and the result was:

1. The standard doesn't clearly specify what "stronger" means.  However, we all agreed that the sensible interpretation is that memory orderings are a lattice, not a totally ordered set, and the relationships are:

  relaxed < (release | (consume < acquire)) < acq_rel < seq_cst

2. Given that two orderings aren't necessarily comparable, it isn't clear whether the requirement is (failure <= success) or !(success < failure).

However, it is extremely likely that this will be resolved as merely a deficiency in the current wording.  In particular, it is clear that there are architectures where some incomparable pairs can be compiled much more efficiently than the nearest comparable pair; most notably, on AArch64 (release, consume) can be much more efficient than (acq_rel, consume).  So I would strongly encourage you to handle these cases correctly.

John.

Dmitry Vyukov

Mar 25, 2016, 6:42:39 AM
to rjmc...@gmail.com, thread-sanitizer, Anna Zaks, Devin Coughlin, Kuba Brecka
On Thu, Mar 24, 2016 at 11:05 PM, <rjmc...@gmail.com> wrote:
> On Thursday, March 24, 2016 at 9:41:20 AM UTC-7, Kuba Brecka wrote:
>>
>>
>> > On 24 Mar 2016, at 15:49, Dmitry Vyukov <dvy...@google.com> wrote:
>> >
>> > 20.7.2.5/32
>> > Requires: failure shall not be memory_order_release,
>> > memory_order_acq_rel, or stronger than success.
>>
>> Sure. I’m just curious about the definition of “stronger” here.
>
>
> Chandler, JF Bastien, Tim Northover, and I discussed this on LLVM IRC, and
> the result was:
>
> 1. The standard doesn't clearly specify what "stronger" means. However, we
> all agreed that the sensible interpretation is that memory orderings are a
> lattice, not a totally ordered set, and the relationships are:
>
> relaxed < (release | (consume < acquire)) < acq_rel < seq_cst
>
> 2. Given that two orderings aren't necessarily comparable, it isn't clear
> whether the requirement is (failure <= success) or !(success < failure).

20.7.2.5/32 explicitly rules out memory_order_release and
memory_order_acq_rel, so I don't see any ambiguity.

> However, it is extremely likely that this will be resolved as merely a
> deficiency in the current wording. In particular, it is clear that there
> are architectures where some incomparable pairs can be compiled much more
> efficiently than the nearest comparable pair; most notably, on AArch64
> (release, consume) can be much more efficient than (acq_rel, consume). So I
> would strongly encourage you to handle these cases correctly.

This is not necessarily possible to implement portably. Consider
Itanium-style atomic operations, where you supply the ordering constraint
to the instruction itself (st.rel, ld.acq). Unless you pass both the
failure and success memory orders to the instruction, the instruction
won't be able to omit the acquire part in the success case. And on the compiler
level you obviously don't know whether the CAS will fail or not, so you
have to promote (release, acquire) to (acq_rel, acquire) anyway.

It seems to me that what you want is memory_order_con_rel. Why not? It
is useful in other cases (e.g. when you exchange a pointer with
another pointer). (The fact that no existing compiler supports consume
aside.)

Kuba Brecka

Mar 25, 2016, 8:48:37 AM
to rjmc...@gmail.com, thread-sanitizer, Dmitry Vyukov, Anna Zaks, dcou...@apple.com
The combinations (release, acquire) and (release, consume) currently seem to get weakened by LLVM to (release, relaxed):

void foo() {
    std::atomic_compare_exchange_strong_explicit(&ptr, &expected, 0l, std::memory_order_release, std::memory_order_acquire);
}

$ clang++ -std=c++11 a.cpp -S -o - -emit-llvm -O1
...
  %2 = cmpxchg i64* getelementptr inbounds (%"struct.std::__1::atomic", %"struct.std::__1::atomic"* @ptr, i64 0, i32 0, i32 0, i32 0), i64 %1, i64 0 release monotonic
...

So it’s not just TSan not supporting this, but also LLVM.

BTW, Clang should warn about invalid combinations.

Kuba

Dmitry Vyukov

Mar 25, 2016, 8:50:47 AM
to Kuba Brecka, rjmc...@gmail.com, thread-sanitizer, Anna Zaks, Devin Coughlin
On Fri, Mar 25, 2016 at 1:48 PM, Kuba Brecka <jbr...@apple.com> wrote:
> The combination (release, acquire) and (release, consume) seem to currently
> get weakened by LLVM to (release, relaxed):
>
> void foo() {
> std::atomic_compare_exchange_strong_explicit(&ptr, &expected, 0l,
> std::memory_order_release, std::memory_order_acquire);
> }
>
> $ clang++ -std=c++11 a.cpp -S -o - -emit-llvm -O1
> ...
> %2 = cmpxchg i64* getelementptr inbounds (%"struct.std::__1::atomic",
> %"struct.std::__1::atomic"* @ptr, i64 0, i32 0, i32 0, i32 0), i64 %1, i64 0
> release monotonic
> ...
>
> So it’s not just TSan not supporting this, but also LLVM.
>
> BTW, Clang should warn about invalid combinations.

Agree.

John McCall

Mar 25, 2016, 11:40:52 AM
to Dmitry Vyukov, thread-sanitizer, Anna Zaks, Devin Coughlin, Kuba Brecka


On Friday, March 25, 2016, Dmitry Vyukov <dvy...@google.com> wrote:

> 20.7.2.5/32 explicitly rules out memory_order_release and
> memory_order_acq_rel, so I don't see any ambiguity.

It rules them out on the failure ordering, not the success ordering.  Trust me, the committee sees ambiguity here.

> This is not necessarily possible to implement portably. Consider
> Itanium-style atomic operations, where you supply the ordering constraint
> to the instruction itself (st.rel, ld.acq). Unless you pass both the
> failure and success memory orders to the instruction, the instruction
> won't be able to omit the acquire part in the success case.

Of course it's possible to implement portably.  Most architectures are not able to take optimal advantage of memory orderings and have to fall back on something stronger in at least some cases.  Itanium, for example, has to implement consume with acquire because IIRC it doesn't have a dependency-ordering rule.  That doesn't mean that consume is useless or unimplementable.  (Well, other than the dependency rule as standardized being unimplementable.)

> And on the compiler
> level you obviously don't know whether the CAS will fail or not, so you
> have to promote (release, acquire) to (acq_rel, acquire) anyway.

The compiler is actually quite capable of reasoning about whether the CAS failed or not, since there's presumably a branch on the result (and it could introduce a new one if it really wanted to).

> It seems to me that what you want is memory_order_con_rel. Why not? It
> is useful in other cases (e.g. when you exchange a pointer with
> another pointer). (The fact that no existing compiler supports consume
> aside.)

I agree that con_rel would be useful for some of the other operations, like unconditional exchange.  It is nonetheless stronger than I require.

John. 


--
I suppose you'd like my real thoughts as well.

Dmitry Vyukov

Mar 25, 2016, 11:57:09 AM
to John McCall, thread-sanitizer, Anna Zaks, Devin Coughlin, Kuba Brecka
Why is (con_rel, consume) stronger than (release, consume)?
The compiler will have to do the consume part regardless of success/failure.
And at the AArch64 level, consume is a no-op (provided that the compiler did its
part).

John McCall

Mar 25, 2016, 12:26:43 PM
to Dmitry Vyukov, thread-sanitizer, Anna Zaks, Devin Coughlin, Kuba Brecka
For consume, you are correct that it doesn't matter, because if the exchange succeeds the loaded value is dropped (it is not written back to the expected slot) and so it cannot have any dependent uses.  This is a rule that can be applied even on an architecture or compiler that doesn't otherwise implement consume.

For acquire, it creates an additional point of synchronization.  This potentially affects code generation — you could otherwise implement the compare-exchange as an ordinary load, a comparison, and an acquire barrier only in the failure case, which might be profitable depending on hardware or if the exchange is overwhelmingly likely to succeed — but also causes significant problems for the optimizer, which now must reload global memory even along the success path.

John.

Dmitry Vyukov

Mar 25, 2016, 12:31:44 PM
to John McCall, thread-sanitizer, Anna Zaks, Devin Coughlin, Kuba Brecka
OK, I see, this is for a split LL/SC implementation.
But we still need to wait for the standard committee to legalize this,
and for LLVM support.

John McCall

Mar 25, 2016, 12:35:29 PM
to Dmitry Vyukov, thread-sanitizer, Anna Zaks, Devin Coughlin, Kuba Brecka
Yeah, an implementation with only a combined cmpxchg instruction has to
strengthen to the memory orderings directly supported by the instruction.

> But we still need to wait for the standard committee to legalize this
> and for LLVM support.

Fair enough.  I'll forward you any early indications of this conversation.

John. 

j...@chromium.org

May 23, 2018, 12:12:52 PM
to thread-sanitizer
Hi all,

John brought this up with me a while ago, which led me to write a paper on this; it was accepted into C++17.

The requirement is now only:
  Requires: The failure argument shall not be memory_order_release nor memory_order_acq_rel.
This clause is now gone:
  The failure argument shall be no stronger than the success argument.

Do you think tsan could be updated to reflect this change?

Thanks,

JF

Kuba Mracek

May 23, 2018, 12:25:23 PM
to thread-s...@googlegroups.com
Hi JF!

If what I posted in 2016 is still the case...

    std::atomic_compare_exchange_strong_explicit(&ptr, &expected, value, std::memory_order_release, std::memory_order_acquire);

By looking at the IR generated by the above call, it seems to me that the fail order is lost somewhere:

    %1 = cmpxchg i64* ... release monotonic

...then I believe there is no bug in TSan, but rather a bug in LLVM or Clang which mis-generates the IR for the above-mentioned pair of memory orderings in a CAS. Can you file a bug on LLVM?

Kuba

-- 
You received this message because you are subscribed to the Google Groups "thread-sanitizer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to thread-sanitiz...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John McCall

May 23, 2018, 2:34:20 PM
to thread-sanitizer
On Wed, May 23, 2018 at 12:25 PM, Kuba Mracek <mra...@apple.com> wrote:

> Hi JF!
>
> If what I posted in 2016 is still the case...
>
>     std::atomic_compare_exchange_strong_explicit(&ptr, &expected, value, std::memory_order_release, std::memory_order_acquire);
>
> By looking at the IR generated by the above call, it seems to me that the fail order is lost somewhere:
>
>     %1 = cmpxchg i64* ... release monotonic
>
> ...then I believe there is no bug in TSan, but rather a bug in LLVM or Clang which mis-generates the IR for the above-mentioned pair of memory orderings in a CAS. Can you file a bug on LLVM?

My memory is that TSan mis-handled IR that used a fail order that wasn't weaker than the success order.  It's quite possible that it was necessary to generate that IR manually (e.g. in the Swift frontend) instead of using Clang's builtins, though.

John.



Kuba Mracek

May 24, 2018, 2:58:26 PM
to thread-sanitizer, John McCall, j...@chromium.org
You're right:

template<typename T>
static bool AtomicCAS(ThreadState *thr, uptr pc,
    volatile T *a, T *c, T v, morder mo, morder fmo) {
  (void)fmo;  // Unused because llvm does not pass it yet.
...

Kuba


JF Bastien

May 24, 2018, 3:00:25 PM
to Kuba Mracek, John McCall, thread-sanitizer
On Thu, May 24, 2018 at 11:58 AM Kuba Mracek <mra...@apple.com> wrote:

> You're right:
>
> template<typename T>
> static bool AtomicCAS(ThreadState *thr, uptr pc,
>     volatile T *a, T *c, T v, morder mo, morder fmo) {
>   (void)fmo;  // Unused because llvm does not pass it yet.
> ...

That comment is pretty out of date. IIRC failure order was added to LLVM IR around 2013.

Dmitry Vyukov

May 26, 2018, 6:48:36 AM
to thread-s...@googlegroups.com, Kuba Mracek, John McCall
Hi,

So what's the action item here? Properly support failure order in
tsan's implementation of compare_exchange?

John McCall

May 26, 2018, 4:26:40 PM
to Dmitry Vyukov, thread-sanitizer, Kuba Mracek
On Sat, May 26, 2018 at 6:48 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>
> On Thu, May 24, 2018 at 9:00 PM, JF Bastien <j...@chromium.org> wrote:
> >
> > On Thu, May 24, 2018 at 11:58 AM Kuba Mracek <mra...@apple.com> wrote:
> >>
> >> You're right:
> >>
> >> template<typename T>
> >> static bool AtomicCAS(ThreadState *thr, uptr pc,
> >>     volatile T *a, T *c, T v, morder mo, morder fmo) {
> >>   (void)fmo;  // Unused because llvm does not pass it yet.
> >
> >
> > That comment is pretty out of date. IIRC failure order was added to LLVM IR
> > around 2013.
>
> So what's the action item here? Properly support failure order in
> tsan's implementation of compare_exchange?

I think so.  And clang should be fixed to honor failure orders properly in the builtin, which should just be a matter of doing a more complete switch in `emitAtomicCmpXchgFailureSet`.

John.

Dmitry Vyukov

Jun 5, 2018, 5:12:41 AM
to John McCall, thread-sanitizer, Kuba Mracek
Re clang, there is a similar issue in bug tracker already:
https://bugs.llvm.org/show_bug.cgi?id=33332
Not sure if need a new one or not.


To clarify: it's now legal to do compare_exchange(..., relaxed, seq_cst),
and formally it means exactly that -- a successful CAS is only
relaxed, while a failed CAS is seq_cst. Is my understanding correct?

JF Bastien

Jun 5, 2018, 5:15:29 AM
to thread-sanitizer, John McCall, Kuba Mracek
Yes, it's now legal and those would be the semantics. Implementations can of course strengthen their guarantees.

This almost feels like a defect in C++11 / C++14, and it seems like we could just loosen libc++ and clang's handling of cmpxchg memory order as C++17 does. 

Dmitry Vyukov

Jun 6, 2018, 2:39:12 PM
to thread-sanitizer, John McCall, Kuba Mracek

John McCall

Jun 9, 2018, 10:19:15 PM
to JF Bastien, thread-sanitizer, Kuba Mracek
On Tue, Jun 5, 2018 at 2:15 AM, JF Bastien <j...@chromium.org> wrote:
I agree that we should just treat this as a defect.

John.