Hello,
The Intel forum post is about Intel architectures from 2013 (SandyBridge to Haswell). On those architectures the FP rate events did not count retired instructions but issued/executed ones, which leads to overcounting whenever an instruction has to be rescheduled in the reservation station while it waits for data.
LIKWID measures the memory traffic directly at the memory controllers on Intel Cascadelake SP. In my tests these measurements are quite accurate [1]. Don't get confused by the undercounting for small sizes: the heuristics that decide when to keep non-modified data in the L3 victim cache are quite clever. I can't remember any case where the events overcounted.
CAS_COUNT_RD: "Counts all CAS (Column Access Select) read commands issued to DRAM on a per channel basis. CAS commands are issued to specify the address to read or write on DRAM, and this event increments for every read. This event includes underfill reads due to partial write requests. This event counts whether AutoPrecharge (which closes the DRAM Page automatically after a read/write) is enabled or not."
CAS_COUNT_WR: "Counts all CAS (Column Address Select) commands issued to DRAM per memory channel. CAS commands are issued to specify the address to read or write on DRAM, and this event increments for every write. This event counts whether AutoPrecharge (which closes the DRAM Page automatically after a read/write) is enabled or not."
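To make the link between these two events and the reported memory traffic concrete, here is a minimal sketch of the arithmetic, assuming each CAS command transfers one 64-byte cache line and that the per-channel counts are summed over all channels. The function name `memory_traffic` and the example counts are hypothetical and only for illustration; they are not LIKWID code or measured values.

```python
# Sketch only: converts per-channel CAS counts into data volume and bandwidth.
# Assumption: one CAS command moves one 64-byte cache line.
CACHE_LINE_BYTES = 64


def memory_traffic(cas_count_rd, cas_count_wr, runtime_s):
    """Return read/write/total bandwidth in MByte/s and data volume in GByte.

    cas_count_rd, cas_count_wr: per-channel event counts (hypothetical input).
    runtime_s: measurement time in seconds.
    """
    read_bytes = sum(cas_count_rd) * CACHE_LINE_BYTES
    write_bytes = sum(cas_count_wr) * CACHE_LINE_BYTES
    total_bytes = read_bytes + write_bytes
    return {
        "read_MBytes_per_s": read_bytes / runtime_s / 1.0e6,
        "write_MBytes_per_s": write_bytes / runtime_s / 1.0e6,
        "total_MBytes_per_s": total_bytes / runtime_s / 1.0e6,
        "data_volume_GBytes": total_bytes / 1.0e9,
    }


if __name__ == "__main__":
    # Made-up counts for six memory channels over one second.
    rd = [1_000_000] * 6
    wr = [500_000] * 6
    for name, value in memory_traffic(rd, wr, 1.0).items():
        print(f"{name}: {value:.3f}")
```

Because the counts come from the memory controllers, reads served from the L3 never show up here, which is why small working sets appear to undercount.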
Best,
Thomas