Those are some good points.
It is /relatively/ easy to tell if you have no data races in your code
even if you are not strict about using C++11/C11 atomics or
implementation-specific atomics. If your code does not access
potentially shared objects that are bigger than the hardware's write
size, and it only uses plain whole-object reads or writes to them (it
doesn't assume that "x += 1;" is atomic), then you are not going to get
data races. When two 32-bit cores both try to write to the same 32-bit
memory address, one core's write will simply land first - you are not
going to get a mixed write unless you have a rather unusual and
specialised system.
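To make those restrictions concrete, here is the sort of pattern that
stays within them (a sketch of mine, with made-up names - "volatile" is
used purely so the compiler actually performs each access):

#include <stdint.h>

volatile uint32_t stop_request;  /* no wider than the hardware's write size */

void controller(void)            /* one context only ever writes the flag */
{
    stop_request = 1;            /* a single 32-bit store - cannot be torn */
}

void worker(void)                /* another context only ever reads it */
{
    if (stop_request) {
        /* shut down cleanly */
    }
}

Each context does only whole-object loads or stores - never a
read-modify-write like "stop_request++;".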
You might have noticed quite a few "if"s there - you have to be quite
restricted in your code to avoid the possibility of data races (as
defined by the C++ standard). And without additional guarantees from
your hardware, such as a total ordering on volatile accesses, avoiding
data races alone is not going to be enough to be useful. If your system
is relatively simple - like a single-core microcontroller - then you
/do/ have such guarantees, and "volatile" can be enough. The OP,
however, does not have such hardware - and "volatile" is not going to
cut it.
Proving that you don't have any more general "race conditions" or
improper synchronisation is a lot harder.
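A classic illustration of that (my own sketch, not from any code under
discussion) is a check-then-act sequence - every individual access is
fine, but the combination is not:

#include <stdint.h>

#define MAX 100u

volatile uint32_t count;

void try_add(void)
{
    if (count < MAX) {       /* read */
        count = count + 1;   /* separate read and write - another context
                                can change count in between, so MAX can be
                                overshot even though no access is torn */
    }
}

No single access is a problem, yet the overall behaviour is still a
race condition in the everyday sense of the term.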
>
> In any machine I have ever come across, using a volatile scalar which
> is atomic at the hardware level, such as an int, has identical effect
> to using the equivalent atomic variable with relaxed memory ordering.
> The code emitted by the compiler is identical.
>
Agreed.
I can imagine machines for which that is not the case - perhaps using
caches that are not kept coherent with other cpus' caches. This would
make some aspects of programming significantly harder, but could make
the hardware a lot simpler (and therefore faster and/or cheaper). I'd
expect to see it only in quite specialised systems.
> If you need lock-free synchronization for your volatile ints, you can
> use fences just as you can use fences with atomic ints with relaxed
> memory ordering. (If you are using locks to synchronize, volatile
> or atomic variables are unnecessary and a pessimization - hence the
> "happens before" in the text I have quoted.) So Bonita's use may be
> fine.
>
> The question is: now that we have C and C++ standards which provide
> atomic scalars with relaxed memory ordering, why are you using a
> volatile int at all? The answer "because I don't want to rewrite my
> code unnecessarily" seems to me to be a reasonable answer, provided the
> program is indeed adequately synchronized by some other means such as
> fences, so that it does not contain a race condition in common parlance.
>
Volatile accesses have certain advantages over atomics - even relaxed
atomics. Let us restrict ourselves to the world of single-core systems,
as is typical for microcontrollers - since when you have multi-core
systems you will be needing atomics and locks (correctness trumps
efficiency and convenience every time, and we know volatile is not
enough for most purposes in such systems). We'll assume a 32-bit system
for convenience.
Such systems are often asymmetric in their threading - you have a
hierarchy. If you have an RTOS, you have layers of threads with
strictly controlled priorities. A higher priority thread can pre-empt a
lower priority thread, but not vice versa - though this can be
complicated by priority boosting at times. Above that, you have layers
of interrupts at different priorities, usually even more strictly
prioritised.
Imagine a timer interrupt function that tracks time as a 64-bit counter.
The "global_timer" variable is only ever written within that interrupt
function. It can be declared "int64_t global_timer;", and incremented
as "global_timer++;". This is safe - the interrupt is the only writer,
and nothing that reads the variable can pre-empt it mid-increment - so
there is no need for atomics, volatile, or anything else; these would
be pessimisations. For code that reads this value from another thread
or context, you need to be smarter. Here you /do/ need volatile
accesses.
#define volatileAccess(v) *((volatile typeof((v)) *) &(v))
(Forgive the gcc'ism and C style - in C++, you'd make a template but it
doesn't affect the principle.)
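For context, the writing side might look something like this (a sketch
only - the handler name and how it gets hooked up to the timer are my
assumptions):

#include <stdint.h>

int64_t global_timer;        /* only ever written by the timer interrupt */

void timer_isr(void)
{
    /* Nothing that reads global_timer can pre-empt this handler, so a
       plain increment is fine - no volatile or atomics needed here. */
    global_timer++;
}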
You can read your global timer in various ways, such as:
disable_interrupts();
int64_t now = volatileAccess(global_timer);
enable_interrupts();
or
int64_t now = volatileAccess(global_timer);
int64_t now2 = volatileAccess(global_timer);
while (now != now2) {
    now = now2;
    now2 = volatileAccess(global_timer);
}
(You can also break this last one into separate high and low words to be
slightly more efficient.)
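A sketch of that split-word version (assuming a little-endian 32-bit
target, so word [0] is the low half and word [1] is the high half;
aliasing the int64_t through uint32_t pointers is a common embedded
idiom, though strictly you would want a union or the appropriate
compiler flags):

uint32_t *parts = (uint32_t *) &global_timer;
uint32_t hi, lo, hi2;

do {
    hi  = volatileAccess(parts[1]);   /* high word */
    lo  = volatileAccess(parts[0]);   /* low word */
    hi2 = volatileAccess(parts[1]);   /* high word again */
} while (hi != hi2);   /* retry if an increment carried into the high
                          word while we were reading */

int64_t now = (int64_t) (((uint64_t) hi << 32) | lo);

If the high word is unchanged across the two reads, the low word was
read while the high word was stable, so the combined value is one the
counter actually held.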
Both of these are much more efficient than a relaxed 64-bit atomic
access, at least as such accesses are often implemented. Even more
importantly, both of them /work/ - unlike some implementations of
atomic accesses I have seen (such as
<https://gcc.gnu.org/wiki/Atomic/GCCMM?action=AttachFile&do=view&target=libatomic.c>),
which rely on spinlocks.
A key point with volatile is that you can take a normal object and
apply volatile accesses to it only when you need them, as the macro
above does. You can't do that with atomic objects - every access to
them is atomic - which limits your flexibility to use your knowledge of
the program to pick different access types for different balances of
efficiency and synchronisation control. It's true
that a relaxed atomic load or store is going to be efficient for small
enough sizes - and the cost is just the ugly and verbose syntax. For
larger sizes, it's a different matter. If your code is within a
critical section (due to a lock, or interrupt control) and you know it
cannot possibly clash with other access to the same object, there is a
huge efficiency difference between using a normal access to the object,
and using an atomic access (even a relaxed one).
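To illustrate that last point (a sketch, reusing the same hypothetical
disable_interrupts()/enable_interrupts() primitives as above): inside
the critical section the object can be treated as a perfectly ordinary
one, and the compiler is free to combine and optimise the accesses.

#include <stdint.h>

typedef struct { int64_t samples[16]; uint32_t count; } buffer_t;
buffer_t shared_buf;

void add_sample(int64_t s)
{
    disable_interrupts();    /* nothing else can touch shared_buf now */
    shared_buf.samples[shared_buf.count] = s;   /* plain accesses - no
                                per-access atomicity or ordering cost */
    shared_buf.count++;
    enable_interrupts();
}

Doing the same with a big atomic object would typically drag in library
calls or locks for every single access, none of which is needed here.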