Non-temporal Loads in PMem

Lawrence Benson

May 4, 2021, 2:24:57 PM
to pmem
Hey,

while playing around a bit, I've started to wonder how Intel implements non-temporal loads with PMem. There is a lot out there regarding non-temporal stores that bypass the cache hierarchy, but surprisingly little information on loads. 

The Intel Software Developer's Manual states that non-temporal loads only apply to Write-Combining (WC) memory. I'm aware that Optane has a write-combining buffer, but I don't know whether Optane is classified as WC memory. The manual uses graphics memory as an example.

So I'm wondering whether a non-temporal load (e.g., via _mm512_stream_load_si512) actually bypasses the cache when reading data, or whether the non-temporal hint is ignored and the data is pulled through the cache hierarchy. I'd appreciate any pointers to documentation that clarifies this behavior, or any information on this in general. Or does anyone know a way to verify cache residency with a short C/C++ program?

Best,
Lawrence  

Andy Rudoff

May 4, 2021, 4:01:02 PM
to pmem
Hi Lawrence,

As you discovered, NT loads apply only to WC memory, and that is typically used only in specialized situations like graphics memory. Optane PMem is mapped write-back (WB), so the loads will go through the cache. That's the reason you really only hear us talking about NT stores when we're talking about PMem...

-andy
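
On the question of checking cache residency from a short C program: the following is a minimal timing sketch, not something posted in the thread. It assumes an x86-64 CPU with SSE4.1 and GCC or Clang. The idea is to flush a cache line, touch it with the load under test, and then time a second, ordinary read; if that second read is fast, the first load left the line in the cache. The names load_kind, probe_once, and median_probe are made up for illustration, and it uses the 128-bit _mm_stream_load_si128 rather than the 512-bit intrinsic so that it runs without AVX-512.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <immintrin.h>  /* _mm_clflush, _mm_mfence, _mm_lfence, _mm_stream_load_si128 */
#include <x86intrin.h>  /* __rdtsc, __rdtscp */

/* Hypothetical names, for illustration only. */
typedef enum { LOAD_NONE, LOAD_REGULAR, LOAD_NT } load_kind;

static volatile long long sink;  /* keeps the NT load from being optimized away */

/* Time one ordinary 8-byte read of *p, in TSC reference cycles. */
static uint64_t time_read(const volatile uint64_t *p) {
    unsigned aux;
    _mm_mfence();
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    (void)*p;                        /* the read being timed */
    uint64_t end = __rdtscp(&aux);   /* waits for the read to complete */
    _mm_lfence();
    return end - start;
}

/* Flush the line, optionally touch it with the load under test,
 * then time a follow-up read of the same line. */
__attribute__((target("sse4.1")))
static uint64_t probe_once(void *line, load_kind kind) {
    _mm_clflush(line);
    _mm_mfence();
    if (kind == LOAD_REGULAR) {
        (void)*(volatile uint64_t *)line;
    } else if (kind == LOAD_NT) {
        __m128i v = _mm_stream_load_si128((__m128i *)line);  /* MOVNTDQA */
        sink = _mm_cvtsi128_si64(v);
    }
    _mm_mfence();
    return time_read((const volatile uint64_t *)line);
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median follow-up latency over `trials` runs; the median damps
 * interrupts and other timing noise. Returns 0 if allocation fails. */
uint64_t median_probe(void *line, load_kind kind, int trials) {
    assert(((uintptr_t)line & 15) == 0 && trials > 0);  /* MOVNTDQA needs 16-byte alignment */
    uint64_t *t = malloc((size_t)trials * sizeof *t);
    if (!t) return 0;
    for (int i = 0; i < trials; i++) t[i] = probe_once(line, kind);
    qsort(t, (size_t)trials, sizeof *t, cmp_u64);
    uint64_t m = t[trials / 2];
    free(t);
    return m;
}
```

To use it, call median_probe three times on the same 64-byte-aligned address inside the mapped PMem region (or an ordinary DRAM buffer as a baseline), once each with LOAD_NONE, LOAD_REGULAR, and LOAD_NT, and compare the medians. LOAD_NONE gives the uncached latency and LOAD_REGULAR the cached one. If the LOAD_NT figure sits near LOAD_REGULAR, the NT hint did not bypass the cache, which is what the reply above would lead one to expect for WB-mapped memory. Absolute cycle counts vary by machine, so only the relative gap is meaningful.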