Sequence 1:
1. var = new_data
2. CLWB(var)
3. SFENCE

Sequence 2:
1. var = new_data
2. SFENCE
3. CLWB(var)
4. SFENCE
Jungsik Choi
PhD student
College of Software
Sungkyunkwan University
ch...@skku.edu
CLWB is implicitly ordered with older stores executed by the logical processor to the same address.
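Given that ordering rule, the two flush sequences at the top of the thread provide the same crash guarantee: the SFENCE between the store and the CLWB adds nothing. A toy enumeration can make this concrete (a sketch, not a hardware model; the drain rules in the docstring are my assumptions, not architectural statements):

```python
OLD, NEW = "old", "new"

def crash_states(seq):
    """Possible persistent values of var if a crash hits after any
    prefix of the instruction sequence.  Toy model: once a store, a
    CLWB, and a later SFENCE have all executed, the new value is
    guaranteed persistent; before that point the line may or may not
    have drained on its own, so both values are possible."""
    states = set()
    for cut in range(len(seq) + 1):
        done = seq[:cut]
        if "store" not in done:
            states.add(OLD)
            continue
        flushed = "clwb" in done
        fenced = flushed and "sfence" in done[done.index("clwb") + 1:]
        if fenced:
            states.add(NEW)
        else:
            states.update({OLD, NEW})  # may have drained, may not
    return states

seq1 = ["store", "clwb", "sfence"]
seq2 = ["store", "sfence", "clwb", "sfence"]
print(crash_states(seq1) == crash_states(seq2))  # True: same guarantee
```

Both sequences admit exactly the same set of post-crash states, which is why the extra fence is redundant.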
char buff[64]
buff[13]=5
CLWB(&buff[0])
SFENCE
Is it possible that the order is swapped (assuming the two addresses are in the same cache line)?
Ziv
But I think it is more likely you were asking something like this:

/* assume the cache line starts off containing zeros */
buff[0] = 1
buff[8] = 1
CLWB buff
SFENCE

Assuming you have taken steps to prevent the compiler from reordering the stores, then a younger store will not pass an older store to the same cache line. If my example code gets interrupted by a crash, on recovery either buff is all zeros, only buff[0] is 1, or both buff[0] and buff[8] are 1. It is not possible for buff[8] to be 1 and buff[0] to be zero.

Once again, this is only true because they are in the same cache line. If buff crosses multiple cache lines, the ordering property I described doesn't hold.
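One way to check that reasoning is to enumerate which cache-line states a crash can expose, given only that the stores persist in program order (a sketch of the argument above, not a hardware model):

```python
def recoverable_states(stores, initial):
    """States a crash can expose if the line may persist after any
    prefix of the in-order stores (a younger store never passes an
    older one to the same cache line)."""
    states = []
    for cut in range(len(stores) + 1):
        line = dict(initial)
        for loc, val in stores[:cut]:
            line[loc] = val
        states.append(line)
    return states

# buff[0] = 1 then buff[8] = 1, on a line that starts as all zeros
states = recoverable_states([("buff[0]", 1), ("buff[8]", 1)],
                            {"buff[0]": 0, "buff[8]": 0})
print(states)
# The state buff[8] == 1 with buff[0] == 0 never appears.
```

Only three of the four conceivable states are reachable, matching the recovery cases listed above.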
The 8-byte atomicity I wrote about in the ;login: article is talking about failure atomicity, or the size of a store that is guaranteed to be atomic in the face of failure. For example, if you store 8 bytes (aligned) and the system experiences a power failure while that store is in flight, on reboot the contents of that location will be either the old 8 bytes or the new 8 bytes, but not a partial update. Anything larger and this is no longer guaranteed, so if you did the two store instructions that it would take for buff[0] and buff[8], it is possible that only one of them happens in the face of failure.

What I was talking about in the post below is the ordering of multiple stores to the same cache line, not atomicity.
So, do I get this right: when I flush a cache line, it is persisted in chunks of 8 bytes in no particular order, but it is guaranteed that stores to that cache line since the last flush do not get reordered?
Intel hardware won't reorder writes to the same cache line, so you can at least reason that the failure cases are a) no write is done, b) first write is done but second isn't and c) both are done. But it's not possible that the second is done whilst the first isn't.
A cache line is flushed atomically. You just can't completely control when: it may be flushed spontaneously by the hardware between any other instructions if there is cache memory pressure. This is radically different from block I/O, where the memory-to-storage writeback occurs only when you explicitly request it.
Does that mean that a cache line reaches the persistence domain either as a whole or not at all, when it is flushed? I couldn't find that in the documentation.
It is important to distinguish between what is *likely* to happen and what is *architecturally guaranteed* to happen. In normal operation of a system, it is likely that data is delivered to the DIMMs as full cache lines that are not torn by a power failure on most CPUs. Right now, the only architectural guarantees are what I stated: that an 8-byte store is failure atomic, and that a younger store won't pass an older store to the same cache line.

In future CPUs, a new instruction called MOVDIR64B will provide a 64-byte failure-atomic store to persistent memory. You can read about this instruction in the SDM on intel.com if you are interested. Until then, you should only be depending on the 8-byte atomicity.

So when Jonathan said that "a cache line is flushed atomically", that was not correct?
The architectural guarantees are what I summarized above. A cache line is not failure atomic until MOVDIR64B is available.

Ok. And that a younger store won't pass an older store to the same cache line is also guaranteed on the DIMM?
-andy
I feel that somehow two different kinds of guarantees got mixed up here.

If I get it right, one set of guarantees is the memory model - the rules for what one CPU may see when another CPU accesses the same data concurrently.

For instance, when one thread executes "write 1 to x; read y" and another thread executes "write 1 to y; read x", then x86_64 is fine with returning the initial value (say 0) in both threads, contrary to expectations. And the x86_64 memory model disallows observing stores in a different order than the one they were issued in.
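That two-thread example can be spelled out by brute-forcing a toy TSO model in which each thread's store sits in a private store buffer and reaches shared memory at some later point (a sketch; the drain choices are the assumptions, not a full model of the architecture):

```python
outcomes = set()
# Thread 0: x = 1; r0 = y      Thread 1: y = 1; r1 = x
# Each store may or may not have drained from its store buffer to
# shared memory by the time the other thread performs its read.
for x_drained in (False, True):      # is x = 1 visible to thread 1?
    for y_drained in (False, True):  # is y = 1 visible to thread 0?
        r0 = 1 if y_drained else 0
        r1 = 1 if x_drained else 0
        outcomes.add((r0, r1))
print(sorted(outcomes))
```

The outcome (0, 0) is among the results: both stores can still be sitting in their store buffers when the reads execute, which is exactly the "contrary to expectations" case described above.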
But these guarantees only hold while the system is up; I believe they say nothing about what carries across a crash.

The other set of guarantees is what will be present in the persistent memory after a failure. And here, as far as I understand, the only guarantee is that aligned 8-byte chunks are either written fully into the persistent memory, or not at all.
Am I correct that after issuing two stores 8 bytes apart, one can never observe only the second store alone while the system is running, but if a failure does happen, it is allowed that only the second store is present in pmem?

E.g., when a thread executes "write 1 to x; write 1 to y", then with no failure, executing "read y; read x" can yield "0,0", "0,1" or "1,1", but if a failure occurs, then (after restart, obviously) it can also yield "1,0"?
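That distinction can be made concrete by enumerating which stores have landed, with the ordering guarantee toggled on and off (a sketch under the guarantees discussed in this thread; the unordered case assumes x and y sit in *different* cache lines, since within one line the ordering guarantee would still hold after a crash):

```python
def visible_states(ordered):
    """Pairs (y, x) that can be observed.  ordered=True models the
    live x86 view (or same-cache-line persistence): the younger store
    to y never lands before the older store to x.  ordered=False
    models post-crash pmem contents across cache lines, where only
    8-byte failure atomicity applies."""
    out = set()
    for x_done in (0, 1):
        for y_done in (0, 1):
            if ordered and y_done and not x_done:
                continue  # y cannot be visible while x is not
            out.add((y_done, x_done))  # printed as (y, x) as in the question
    return out

print(sorted(visible_states(True)))   # [(0, 0), (0, 1), (1, 1)]
print(sorted(visible_states(False)))  # the crash case adds (1, 0)
```

The ordered enumeration yields exactly the "0,0", "0,1", "1,1" outcomes from the question, and dropping the ordering guarantee adds the "1,0" state.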