Hi,
I have a few questions regarding the DRAM directory cache in Sniper:
1) Is my understanding of the implementation correct? The coherence directory resides in DRAM
and tracks the sharers of the entire system memory. The directory cache is a separate
structure that exploits locality in coherence-directory accesses. Finally, inter-socket coherence is enforced
by the DRAM directory, while intra-socket coherence is enforced by the shared LLC (via snooping?)
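To make sure I'm describing the same structure, here is a minimal sketch of my mental model (generic Python pseudocode, not Sniper's actual classes; all names here are my own invention):

```python
# Sketch of a two-level coherence directory: a small directory cache
# in front of the full in-DRAM directory. All names are hypothetical.

class DramDirectory:
    """Full directory in DRAM: one sharer set per cache line."""
    def __init__(self):
        self.sharers = {}  # line address -> set of socket IDs

    def lookup(self, addr):
        return self.sharers.setdefault(addr, set())

class DirectoryCache:
    """Caches recently used directory entries to exploit locality."""
    def __init__(self, backing, capacity=4):
        self.backing = backing
        self.capacity = capacity
        self.entries = {}  # addr -> sharer set; a hit avoids a DRAM access

    def lookup(self, addr):
        if addr in self.entries:
            return self.entries[addr], True       # directory-cache hit
        sharers = self.backing.lookup(addr)       # miss: go to DRAM directory
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # naive eviction
        self.entries[addr] = sharers
        return sharers, False
```

i.e., a directory-cache hit avoids the DRAM directory access entirely, which is the cost I'm trying to account for below.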
2) If the above picture is correct, do the DRAM costs reported by dumpstats.py (dram-remote and dram-local for
multi-socket designs) include the overhead of accessing the coherence directory? Are these costs included in
td-access, which is used by llcstack.py?
3) Based on the discussion in the thread
link, it seems that the average LLC miss cost (uncore-totaltime) does not
account for miss overlapping. My observations corroborate this: uncore-totaltime > barrier-global.time.
Does Sniper track LLC miss costs (in particular, coherence-directory accesses) after accounting for miss overlapping?
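For reference, this is roughly the back-of-the-envelope comparison behind that observation (a sketch under my assumptions: I treat uncore-totaltime as the sum of per-miss latencies and the barrier time as wall-clock; the numbers below are made up):

```python
# If misses never overlapped, the summed per-miss latency could not
# exceed the wall-clock time spent waiting on them, so a ratio > 1
# implies overlapped misses. Values are hypothetical, in nanoseconds.

def implied_overlap(uncore_totaltime_ns, wallclock_ns):
    """Average number of outstanding misses implied by the two stats."""
    return uncore_totaltime_ns / wallclock_ns

ratio = implied_overlap(uncore_totaltime_ns=3.0e9, wallclock_ns=1.2e9)
print(f"average outstanding misses: {ratio:.2f}")  # -> 2.50 for these inputs
```

A ratio well above 1 is what I see in my runs, hence the question about overlap-aware accounting.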
My use case is to identify the overhead of coherence (accessing the coherence directory, forwarding data requests
to owners, invalidating sharers, etc.) and plot it in the cycle stack. I think the cycle stack already captures most
of the coherence overhead (through L1_S, L2_S, etc.) but not the latency of accessing the DRAM directory. Any
pointers on tracking this would be appreciated.
Also, I would appreciate any resources that explain the coherence protocol implementation used in Sniper. It looks
like the protocol follows Section 8.6.1 of "A Primer on Memory Consistency and Cache Coherence".
Thanks,
Vignesh