Hello LIKWID developers,
I have two newbie questions. I will put the shorter one first.
#1
Suppose I have an AMD system and an Intel system, and I measure the data movement of the same applications on both. How much variation should I expect in the measured data movement? I am seeing about 20%, which seems reasonable to me given compiler differences. If you think this variation is too large, please let me know.
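To be explicit about what I mean by "variation", here is a minimal sketch of the calculation, assuming a symmetric relative difference (the difference divided by the mean of the two values). The function name and the numbers are placeholders for illustration, not my measurements.

```python
def relative_difference(volume_a_gbytes: float, volume_b_gbytes: float) -> float:
    """Symmetric relative difference: |a - b| divided by the mean of a and b."""
    mean = (volume_a_gbytes + volume_b_gbytes) / 2.0
    if mean == 0.0:
        raise ValueError("both data volumes are zero")
    return abs(volume_a_gbytes - volume_b_gbytes) / mean


# Placeholder data volumes in GBytes, NOT measured values.
amd_volume_gbytes = 110.0
intel_volume_gbytes = 90.0
print(f"{relative_difference(amd_volume_gbytes, intel_volume_gbytes):.1%}")
```

Dividing by one system's value instead of the mean would give a slightly different percentage, so the choice of reference matters when comparing numbers.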
#2 (This is the longer question)
I am double-checking to make sure I am not off by a factor of *two* in my data movement and memory bandwidth measurements.
Here are the machine specs:
n.a.romero@sal-hyperplane01:~/miniAMR_WL/mpi/1sphere_likwid_amd_64$ likwid-topology
--------------------------------------------------------------------------------
CPU name: AMD EPYC 7313 16-Core Processor
CPU type: AMD K19 (Zen3) architecture
CPU stepping: 1
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets: 2
Cores per socket: 16
Threads per core: 2
...
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains: 2
--------------------------------------------------------------------------------
Domain: 0
Processors: ( 0 32 1 33 2 34 3 35 4 36 5 37 6 38 7 39 8 40 9 41 10 42 11 43 12 44 13 45 14 46 15 47 )
Distances: 10 32
Free memory: 253541 MB
Total memory: 257840 MB
--------------------------------------------------------------------------------
Domain: 1
Processors: ( 16 48 17 49 18 50 19 51 20 52 21 53 22 54 23 55 24 56 25 57 26 58 27 59 28 60 29 61 30 62 31 63 )
Distances: 32 10
Free memory: 233706 MB
Total memory: 257987 MB
--------------------------------------------------------------------------------
And the relevant performance groups:
n.a.romero@sal-hyperplane01:~/miniAMR_WL/mpi/1sphere_likwid_amd_64$ likwid-perfctr -a
MEM2 Main memory bandwidth in MBytes/s (channels 4-7)
...
MEM1 Main memory bandwidth in MBytes/s (channels 0-3)
Here is my question:
It doesn't look like I can measure MEM1 and MEM2 at the same time, but for the most part the measurements appear to be equal in value.
The total data movement of my application should be equal to the sum of the data volume reported by MEM1 + MEM2, correct?
Here is the part that is tripping me up:
Is the total memory bandwidth rate also equal to the sum of memory bandwidth rates reported by MEM1 and MEM2 groups? Or is it the average?
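To make the two interpretations concrete, here is a small sketch of the arithmetic. It assumes that MEM1 and MEM2 cover disjoint sets of channels (as their descriptions suggest) and that the two separate runs do the same work in about the same time. `GroupRun` and `combine` are hypothetical helpers, and the numbers are placeholders, not my measurements.

```python
from dataclasses import dataclass


@dataclass
class GroupRun:
    """Result of one run measured with a single group (MEM1 or MEM2)."""
    data_volume_gbytes: float  # data volume reported by the group
    runtime_s: float           # runtime of that run in seconds

    @property
    def bandwidth_mbytes_s(self) -> float:
        # 1 GByte = 1000 MBytes in decimal units
        return self.data_volume_gbytes * 1000.0 / self.runtime_s


def combine(mem1: GroupRun, mem2: GroupRun):
    """Return (total volume, summed bandwidth, averaged bandwidth)."""
    total_volume = mem1.data_volume_gbytes + mem2.data_volume_gbytes
    summed_bw = mem1.bandwidth_mbytes_s + mem2.bandwidth_mbytes_s
    averaged_bw = summed_bw / 2.0
    return total_volume, summed_bw, averaged_bw


# Placeholder values, NOT measured data.
mem1 = GroupRun(data_volume_gbytes=50.0, runtime_s=10.0)  # channels 0-3
mem2 = GroupRun(data_volume_gbytes=48.0, runtime_s=10.0)  # channels 4-7
volume, bw_sum, bw_avg = combine(mem1, mem2)
print(f"total volume: {volume} GBytes")
print(f"sum of bandwidths: {bw_sum} MBytes/s")
print(f"average of bandwidths: {bw_avg} MBytes/s")
```

Under the disjoint-channel assumption, the sum would be the total across all eight channels and the average would only describe one half of them, so the two readings differ by exactly the factor of two I am worried about.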
If it's the sum, then the values appear to exceed what likwid-bench can obtain, which I would expect to be the upper limit for an application bound by main memory bandwidth.
To be specific:
If I am running my application with 8 MPI ranks on 1 socket, I would not expect the combined MEM1 + MEM2 memory bandwidth of the application to exceed the value reported by:
likwid-bench -t copy_mem -w S1:1GB:8
(I am assuming my application is bound by main memory bandwidth -- which might be incorrect)
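The sanity check I have in mind can be sketched as follows. It treats the likwid-bench result as a ceiling and expresses the application's bandwidth as a fraction of it. The function name and both bandwidth values are placeholders chosen for illustration, not numbers from my runs.

```python
def fraction_of_ceiling(app_bandwidth_mbytes_s: float,
                        bench_bandwidth_mbytes_s: float) -> float:
    """Application bandwidth as a fraction of the benchmark bandwidth."""
    if bench_bandwidth_mbytes_s <= 0.0:
        raise ValueError("benchmark bandwidth must be positive")
    return app_bandwidth_mbytes_s / bench_bandwidth_mbytes_s


# Placeholder values in MBytes/s, NOT measured data.
app_bw = 9800.0     # combined MEM1 + MEM2 bandwidth of the application
bench_bw = 14000.0  # MByte/s reported by the likwid-bench copy_mem run

ratio = fraction_of_ceiling(app_bw, bench_bw)
if ratio > 1.0:
    print(f"application exceeds the benchmark by {ratio - 1.0:.1%}")
else:
    print(f"application reaches {ratio:.1%} of the benchmark")
```

A ratio above 1.0 would suggest either that I am combining the groups incorrectly or that the benchmark configuration is not a true upper bound for my access pattern.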
Thanks again for your help,
-- Nichols A. Romero, Ph.D.