MESI and 'atomicity'


Peter Veentjer

Nov 25, 2019, 11:49:53 AM11/25/19
to mechanica...@googlegroups.com
I have a question about MESI.

My question isn't about atomic operations, but about an ordinary write to the same cacheline done by two CPUs.

If a CPU does a write, the write is placed on the store buffer.

Then the CPU will send an invalidation request (RFO) to the other cores for the given cacheline if the cacheline isn't in the Exclusive or Modified state, and once the acknowledgements from the other CPUs have been received, the write is allowed to move from the store buffer into the L1 cache.

My confusion is about the 'atomic' behavior from requesting ownership until the change is written to the cacheline in the L1 cache. What prevents another CPU from doing the same directly after the first CPU has requested ownership? In other words, what prevents another CPU from getting lucky and stealing the cacheline after the acknowledgements to the first CPU have been received, but before the first CPU writes to the L1 cache?

I guess that the first CPU will just ignore any competing bus transactions as long as it has not completed the write. There is a ton of information about MESI, but I could not find a lot of sensible information about this behavior.
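The write flow described above can be sketched as a toy state machine. This is a deliberate simplification (one modelled cacheline, invented class and function names, no distinction between snooping and directory protocols), just to make the RFO-then-commit sequence concrete:

```python
# Toy model of MESI write handling: a write that misses Exclusive/Modified
# triggers an RFO (Read For Ownership), which invalidates the line in all
# other caches before the write commits. Names are illustrative only.

M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

class Cache:
    def __init__(self):
        self.state = I  # state of the single modelled cacheline

def write(writer, others):
    """Commit a write in `writer`, sending an RFO to `others` if needed."""
    if writer.state not in (M, E):
        # RFO: every other cache acknowledges and invalidates its copy.
        for c in others:
            c.state = I
    writer.state = M  # the write lands in L1 with the line Modified

core0, core1 = Cache(), Cache()
core0.state = core1.state = S      # both start with a Shared copy
write(core0, [core1])              # core0 wins the race first
print(core0.state, core1.state)    # Modified Invalid
write(core1, [core0])              # core1's RFO then yanks the line back
print(core0.state, core1.state)    # Invalid Modified
```

Note that in this toy model the RFO and the write are a single Python function call, which is exactly the 'atomicity' the question is asking about: on real hardware those are separate steps with a window in between.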

Vitaly Davidovich

Nov 25, 2019, 2:20:42 PM11/25/19
to mechanical-sympathy
On Mon, Nov 25, 2019 at 11:50 AM Peter Veentjer <pe...@hazelcast.com> wrote:
I have a question about MESI.

My question isn't about atomic operations, but about an ordinary write to the same cacheline done by two cores.

If a CPU does a write, the write is placed on the store buffer.

Then the CPU will send an invalidation request (RFO) to the other cores for the given cacheline if the cacheline isn't in the Exclusive or Modified state, and once the acknowledgements from the other cores have been received, the write is allowed to move from the store buffer into the L1 cache.

My confusion is about the 'atomic' behavior from requesting ownership until the change is written to the cacheline in the L1 cache. What prevents another core from doing the same directly after the first core has requested ownership?
Nothing prevents it :) In fact, that's what one will get if they hammer writes to shared memory across cores - an RFO storm. 
So what prevents another core from getting lucky and yanking away the cacheline after the acknowledgements to the first core have been received, but before the first core writes to the L1 cache (so the first core briefly owned the cacheline, but had it yanked out from under it before it could complete its write)?
Core 1 must ack the RFO from Core 2 - it can't just yank ownership away, it's a cooperative protocol. 

I guess that the first CPU will just ignore any competing bus transactions as long as it has not completed the write. There is a ton of information about MESI, but I could not find a lot of sensible information about this behavior.
What likely happens is Core 2's RFO will sit in Core 1's "invalidate" (I've seen this referred to via other names as well) queue.  Once Core 1 commits the write from the store buffer to L1D, it can then reply to Core 2's RFO and send the updated cacheline (data) along the way.  At least on Intel, RFOs and moving writes from the store buffer to L1D happen after the store instruction retires, so it's fairly late in the process and I'd imagine the window of time between RFO ack and moving data to L1D is fairly small.
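The window described here can be sketched as follows. This is a hand-wavy model, not real hardware signalling: the point is only the ordering, i.e. that a competing RFO sits in the invalidate queue and is not acknowledged until the pending store has been committed to L1D:

```python
from collections import deque

# Sketch of the window described above: Core 1 owns the line and has a
# pending store; Core 2's RFO sits in Core 1's invalidate queue until the
# store has been committed from the store buffer to L1D; only then is the
# RFO acked and the updated data forwarded. Names are illustrative.

class Core1:
    def __init__(self):
        self.store_buffer = deque(["x=1"])  # pending store, ownership held
        self.l1d = {}
        self.invalidate_queue = deque()

    def receive_rfo(self, line):
        self.invalidate_queue.append(line)  # not acked yet: write goes first

    def drain(self):
        # Commit pending stores to L1D *before* servicing queued RFOs.
        while self.store_buffer:
            var, val = self.store_buffer.popleft().split("=")
            self.l1d[var] = int(val)
        acks = []
        while self.invalidate_queue:
            line = self.invalidate_queue.popleft()
            acks.append((line, dict(self.l1d)))  # ack + forward updated line
        return acks

core1 = Core1()
core1.receive_rfo("line_of_x")  # Core 2's competing RFO arrives early
acks = core1.drain()
print(acks)                     # [('line_of_x', {'x': 1})]
```

Core 2 therefore never observes the line without Core 1's write in it; the queue defers the handover rather than preventing the request.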


alarm...@gmail.com

Jul 29, 2020, 3:12:45 AM7/29/20
to mechanical-sympathy
To add to the excellent answer above: this mechanism is called cache locking.

So when the cache line is loaded, it is also locked, and while it is locked the CPU won't respond to cache-coherence requests from other CPUs until it has written to the cache line and unlocked it.

Travis Downs

Dec 15, 2020, 4:05:44 AM12/15/20
to mechanical-sympathy
One could usefully distinguish between a vanilla RFO and an "RFO prefetch".

An RFO prefetch would be when the core sends the RFO before it is ready to commit the store to cache (generally, before the store is at the head of the store buffer, i.e., next in line). It might still send an RFO prefetch early in this case because your scenario usually doesn't happen, and if it waited until each store was at the head of the queue before processing it, no memory-level parallelism would be possible for stores. This RFO prefetch could be triggered when the store-address (STA) part of the store executes, i.e., when its address is calculated, or it could be triggered by some component that looks at the upcoming entries in the store buffer and issues RFO prefetches for them.

In the case of an RFO prefetch, the line could be lost before the core is ready to commit the store, as you have described. Usually this does not happen, because most lines are not heavily contended (or contended at all), but it could. It only causes a performance problem, not a forward-progress one, because the core can simply ask for the line again.

The second type of RFO, what I call "vanilla", would occur when the store is at the head of the store queue. In this case, the store can be committed as soon as the line is received in the exclusive state, so there is "no time" for another core to interrupt the process (in practice, it may not be instantaneous, but the core can either temporarily ignore or NACK incoming requests for this line from other cores).
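The difference between the two RFO flavours can be sketched like this. The "stealing schedule" is an invented parameter purely for illustration; it stands in for however many times contending cores happen to win the line back before the store reaches the head of the store buffer:

```python
# Sketch contrasting the two RFO flavours described above. With an "RFO
# prefetch" the line may be stolen again before the store is ready to
# commit, costing a retry; with a "vanilla" RFO the store commits as soon
# as ownership arrives, so there is no window in which to lose the line.

def commit_with_prefetch(steal_attempts):
    """Return how many RFOs were needed before the store could commit."""
    rfos = 0
    owned = False
    while not owned:
        rfos += 1           # (re-)request ownership, possibly early
        owned = True        # line arrives in Exclusive state
        if steal_attempts:  # another core yanks the line before commit
            steal_attempts -= 1
            owned = False   # only a performance loss: just ask again
    return rfos

def commit_vanilla():
    # RFO issued with the store at the head of the store buffer:
    # the store commits the moment the line arrives, so one RFO suffices.
    return 1

print(commit_with_prefetch(steal_attempts=2))  # 3 RFOs: stolen twice
print(commit_vanilla())                        # 1 RFO, nothing to steal
```

This also makes the forward-progress point visible: the prefetch path retries a bounded number of times per theft rather than deadlocking.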