I'm struggling with some concepts related to memory ordering and barriers. After searching many web pages, I still haven't found an accurate explanation, so I'd like to ask here in the hope that someone can help. Thanks.
In sections C.3 and C.4 of Paul E. McKenney's perfbook, two structures, the store buffer and the invalidate queue, are presented as the root cause of why we need write and read barriers. The same explanation can also be found on Wikipedia:
https://en.wikipedia.org/wiki/MESI_protocol#Store_Buffer

> As a result, memory barriers are required. A store barrier will flush the store buffer, ensuring all writes have been applied to that CPU's cache. A read barrier will flush the invalidation queue, thus ensuring that all writes by other CPUs become visible to the flushing CPU. Furthermore, memory management units do not scan the store buffer, causing similar problems. This effect is visible even in single threaded processors.
This sounds self-consistent: under this view, "memory ordering" means the order in which memory accesses become visible to other cores, and memory barriers are the tools that enforce that visibility order.
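To check my understanding of this model, here is a minimal sketch of the usual write-barrier/read-barrier pairing. It assumes C11 `<stdatomic.h>`, and it assumes that a release fence plays the role of the write barrier and an acquire fence the role of the read barrier; the names `payload`, `ready`, `publish` and `try_consume` are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, published through the flag (hypothetical name) */
static atomic_bool ready;  /* flag the consumer checks (hypothetical name) */

/* Producer side: store the data first, then the flag.
 * The release fence stands in for the write barrier: the payload
 * store must become visible before the flag store does. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer side: load the flag first, then the data.
 * The acquire fence stands in for the read barrier: the payload
 * load must not be satisfied with a value older than the flag. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

In the intended use, one core calls `publish(42)` while another core polls `try_consume(&v)`; if my understanding is right, once the poll returns true, `v` must be 42.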
But as we all know, instructions can also be executed out of order within _one_ core, which is an intrinsic kind of reordering. For example, consider this code:
```
if (a == 1) {
    if (b == 1) {
        xxx
    }
}
```
If the core executes the load of `b` first and the load of `a` afterwards, this is also a "reordering". Can memory barriers deal with this? One explanation that seems reasonable to me is that instructions retire in order even though they execute out of order, so from the perspective of other cores `a` is loaded first and then `b`. Is this explanation correct? And if a design doesn't use a store buffer or invalidation queues, are memory barrier instructions useless?
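To make the question concrete, here is a minimal sketch of the two variants I have in mind, assuming C11 `<stdatomic.h>` and assuming an acquire fence is the right stand-in for a read barrier. The function names `both_set_relaxed` and `both_set_ordered` are made up, and the `xxx` body is reduced to returning `true`.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int a, b;  /* the two variables from the snippet above */

/* No barrier: both loads are relaxed, so as far as I can tell nothing
 * in the language forbids the load of b from being satisfied before
 * the load of a. */
bool both_set_relaxed(void)
{
    if (atomic_load_explicit(&a, memory_order_relaxed) == 1) {
        if (atomic_load_explicit(&b, memory_order_relaxed) == 1)
            return true;
    }
    return false;
}

/* With a barrier: the acquire fence between the two loads is where I
 * would expect a read barrier to go if the load of a has to be
 * ordered before the load of b. */
bool both_set_ordered(void)
{
    if (atomic_load_explicit(&a, memory_order_relaxed) == 1) {
        atomic_thread_fence(memory_order_acquire);
        if (atomic_load_explicit(&b, memory_order_relaxed) == 1)
            return true;
    }
    return false;
}
```

On a single thread both functions behave the same; my question is whether another core that writes `a` and `b` could ever tell them apart on a machine without a store buffer or invalidation queues.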
RISC-V is an open architecture, so I hope I can get a more accurate explanation here. Thanks!
Thanks,
Hao Lee