It means that proper synchronization/atomicity is not supported on certain SMP systems
(depending on how their memory bus / cache interconnect is designed).
The real issue is that reads from and writes to main memory happen in a very non-deterministic order on modern systems,
which can be quite different from the order dictated by the source code.
- An aggressive compiler will liberally reorder memory loads and stores in the generated machine code if it can determine
that this doesn't change the semantics of the program (this is why using the "restrict" keyword can have a significant
impact on performance). You can instruct it to avoid this reordering by using "volatile" variables and/or
compiler-specific "barriers".
- After that, the CPU itself may also reorder the memory accesses (when converting the machine code into micro-ops or
whatever) that are issued to the cache (the CPU never talks directly to main memory), though it does so very conservatively.
- Then, the cache will speculatively load cache lines from main memory, even before the CPU requests the corresponding
data, and only write the cache lines back to main memory well after the CPU performed the corresponding memory stores.
In a multi-core setup, each CPU has its own cache, which may perform speculative loads / delayed writes at different times.
Atomic operations, by definition, need to read/write values from/to main memory, which is the only memory shared by all CPUs.
Without some sort of cache coherency support, this cannot work reliably in SMP. Fortunately, multi-core architectures
deal with this through specific hardware machine instructions.
These are usually described as "read barrier" and "write barrier".
In a nutshell (though real implementation details are a lot weirder than that):
- a "read barrier" ensures that a loaded value comes from main memory, and not a speculated or populated cache line.
- a "write barrier" ensures that a writes goes directly to main memory.
Given today's architectures, they are significantly slower than the corresponding cached read or write.
The rules are: if you need to share data through main memory, use a read barrier before reading the
data, and a write barrier when writing it. Always pair a read barrier with a corresponding write barrier.
These are also called "atomic acquire" and "atomic release" semantics in the literature.
Note that a "cmpxchg" hardware instruction also usually corresponds to a "full barrier" (i.e. both read and write).
These special machine instructions only exist on CPU architectures that support multi-core operation (hence not on ARMv5;
I'm unsure about ARMv6).
The point of the description message is that we need to add specific functions to the C library to perform
atomic_read() and atomic_write(), then modify our code to use them wherever appropriate.
On ARMv5, these functions would be simple loads/stores, and would not change the generated machine code at all.
Hope this makes it clear. Here are a few links to help you go further.
http://en.wikipedia.org/wiki/Memory_barrier
http://people.redhat.com/drepper/cpumemory.pdf