--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
who's to say the line hasn't been "stolen"/invalidated by the time you're ready to write to it?
Yes, I can see how it was confusing, particularly since we started talking about true inter-instruction OoO execution, speculative loads, etc. I think part of the confusion is also that the thread talks about an "extra read before writing" to the cacheline without elaborating that it means a single instruction.
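To make the single-instruction point concrete, here's a sketch (mine, not from the thread) contrasting an atomic read-modify-write, where the core takes the line exclusive once and holds it for the whole operation, with a separate load followed by a store, where the line can be invalidated in between - which is exactly the "stolen" window the quoted question worries about:

```cpp
#include <atomic>

std::atomic<long> counter{0};

// Single-instruction RMW (e.g. x86 `lock xadd`): the core acquires the
// cacheline in Exclusive/Modified state up front and holds it for the
// duration, so there is no window for another core to "steal" it mid-update.
void rmw_increment() {
    counter.fetch_add(1, std::memory_order_relaxed);
}

// Separate load then store: the load may bring the line in Shared state,
// and the store later needs an RFO. Between the two, another core can
// invalidate the line - and concurrent increments can be lost entirely.
void load_then_store_increment() {
    long v = counter.load(std::memory_order_relaxed);
    counter.store(v + 1, std::memory_order_relaxed);
}
```

(The second version is shown only to illustrate the window; it is not a correct concurrent increment.)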
Ok, with that out of the way ... I don't know how the lookahead would work; right now the execution engine "looks backwards" to determine what's already in flight, detect dependencies, etc. This goes the other way: the CPU sees a load and now wants to know whether there's a store to the same line later - not possible, since we're processing instructions as a stream. Ok, so we issue the load right now and it enters the load queue for servicing. At some point we see a store to that cacheline - do we then check for pending loads? Do we check every store for this? Mark the matching load requests as exclusive? Or the loads have already left the queue (possibly retired), and we're back to today's scenario - issue an RFO and drop the store into the store buffer.
I'm not a hardware engineer, so take this with a grain of salt, but I don't see this being practical or efficient. We do want to minimize interconnect traffic, but we must not create other hazards/stalls/inefficiencies in the process. Perhaps hardware will have better answers/implementations as core counts increase, but I doubt that having many writers to the same memory will ever scale well.
sent from my phone