Hi Todd,
Here are two problems:
1. Tsan is bad at handling stand-alone memory barriers. I don't know
how to implement support for them without a performance hit on _all_
relaxed atomic operations. It can be a severe hit for the programs we
are interested in.
2. I don't think we want to support memory models and atomic
interfaces other than C/C++ atomics. Hopefully, C/C++ atomics are the
dominant interface for native synchronization. As for other
interfaces, some of them are [mostly] equivalent to C/C++ atomics, and
others are way too expressive (like
https://groups.google.com/d/msg/comp.programming.threads/PidOpfQUEb8/VRG25OSVgKAJ).
The correct way to express a seqlock in C/C++ is:
// writer
atomic_store_explicit(&seq, seq+1, memory_order_relaxed);
atomic_thread_fence(memory_order_release);
atomic_store_explicit(&data[0], ..., memory_order_relaxed);
...
atomic_store_explicit(&data[N], ..., memory_order_relaxed);
atomic_thread_fence(memory_order_release);
atomic_store_explicit(&seq, seq+1, memory_order_relaxed);
// reader
atomic_load_explicit(&seq, memory_order_relaxed);
atomic_thread_fence(memory_order_acquire);
d0 = atomic_load_explicit(&data[0], memory_order_relaxed);
...
dN = atomic_load_explicit(&data[N], memory_order_relaxed);
atomic_thread_fence(memory_order_acquire);
atomic_load_explicit(&seq, memory_order_relaxed);
Unfortunately, tsan won't understand this, as it does not understand
stand-alone memory fences at all.
If tsan ever does support them, it will support these C/C++ atomic
fences rather than store-store/load-load fences.
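For reference, the fence-based pattern above can be written out as compilable C11. This is only a minimal sketch under some assumptions that are not in the pseudocode: a single writer thread, a fixed payload of two ints, and made-up names (fence_seqlock, FENCE_SEQLOCK_WORDS, fence_seqlock_init, fence_seqlock_write, fence_seqlock_try_read). The reader is a single attempt that reports whether the snapshot was consistent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FENCE_SEQLOCK_WORDS 2 /* hypothetical payload size */

/* Hypothetical type: seq is even when data is stable, odd during a write. */
typedef struct {
    atomic_uint seq;
    atomic_int data[FENCE_SEQLOCK_WORDS];
} fence_seqlock;

void fence_seqlock_init(fence_seqlock *s) {
    atomic_init(&s->seq, 0u);
    for (int i = 0; i < FENCE_SEQLOCK_WORDS; i++)
        atomic_init(&s->data[i], 0);
}

/* Assumption: only one thread ever calls this (no writer-side locking). */
void fence_seqlock_write(fence_seqlock *s, const int *src) {
    unsigned seq0 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    assert((seq0 & 1u) == 0u); /* no write may already be in progress */
    atomic_store_explicit(&s->seq, seq0 + 1u, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    for (int i = 0; i < FENCE_SEQLOCK_WORDS; i++)
        atomic_store_explicit(&s->data[i], src[i], memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->seq, seq0 + 2u, memory_order_relaxed);
}

/* One read attempt; returns true if dst holds a consistent snapshot. */
bool fence_seqlock_try_read(fence_seqlock *s, int *dst) {
    unsigned seq0 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    for (int i = 0; i < FENCE_SEQLOCK_WORDS; i++)
        dst[i] = atomic_load_explicit(&s->data[i], memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    unsigned seq1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    /* Valid only if no write started or finished while we were reading. */
    return seq0 == seq1 && (seq0 & 1u) == 0u;
}
```

A caller would typically spin on it, e.g. while (!fence_seqlock_try_read(&s, buf)) { }, and use buf only once the call returns true.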
And here is a way to express a seqlock that is correct, is understood
by tsan, and adds no overhead on x86:
// writer
atomic_store_explicit(&seq, seq+1, memory_order_relaxed);
atomic_store_explicit(&data[0], ..., memory_order_release);
...
atomic_store_explicit(&data[N], ..., memory_order_release);
atomic_store_explicit(&seq, seq+1, memory_order_release);
// reader
atomic_load_explicit(&seq, memory_order_acquire);
d0 = atomic_load_explicit(&data[0], memory_order_acquire);
...
dN = atomic_load_explicit(&data[N], memory_order_acquire);
atomic_load_explicit(&seq, memory_order_relaxed);
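The fence-free version can be spelled out the same way. Again this is a sketch rather than a definitive implementation: it assumes a single writer and a fixed payload of two ints, and the names (ra_seqlock, RA_SEQLOCK_WORDS, ra_seqlock_init, ra_seqlock_write, ra_seqlock_try_read) are invented for illustration. The memory orders follow the pseudocode above one for one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RA_SEQLOCK_WORDS 2 /* hypothetical payload size */

/* Hypothetical type: seq is even when data is stable, odd during a write. */
typedef struct {
    atomic_uint seq;
    atomic_int data[RA_SEQLOCK_WORDS];
} ra_seqlock;

void ra_seqlock_init(ra_seqlock *s) {
    atomic_init(&s->seq, 0u);
    for (int i = 0; i < RA_SEQLOCK_WORDS; i++)
        atomic_init(&s->data[i], 0);
}

/* Assumption: only one thread ever calls this (no writer-side locking). */
void ra_seqlock_write(ra_seqlock *s, const int *src) {
    unsigned seq0 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    assert((seq0 & 1u) == 0u); /* no write may already be in progress */
    atomic_store_explicit(&s->seq, seq0 + 1u, memory_order_relaxed);
    /* Release stores on the data take the place of the first fence. */
    for (int i = 0; i < RA_SEQLOCK_WORDS; i++)
        atomic_store_explicit(&s->data[i], src[i], memory_order_release);
    atomic_store_explicit(&s->seq, seq0 + 2u, memory_order_release);
}

/* One read attempt; returns true if dst holds a consistent snapshot. */
bool ra_seqlock_try_read(ra_seqlock *s, int *dst) {
    unsigned seq0 = atomic_load_explicit(&s->seq, memory_order_acquire);
    /* Acquire loads on the data take the place of the second fence. */
    for (int i = 0; i < RA_SEQLOCK_WORDS; i++)
        dst[i] = atomic_load_explicit(&s->data[i], memory_order_acquire);
    unsigned seq1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
    return seq0 == seq1 && (seq0 & 1u) == 0u;
}
```

On x86, release stores and acquire loads compile to plain MOV instructions, which is where the "no overhead" comes from; on weaker architectures the per-element ordering may cost more than the two fences would.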