Hello,
I am trying to measure the Memory Data Volume of an application using LIKWID. The reported values do not match what I would expect as I vary the number of MPI ranks, so I'm hoping somebody on this group / email list might have some insight.
I am using LIKWID v5.2.2, and have collected data on 2 separate systems. I am also using the LIKWID Marker API to collect data for specific regions of the application, and register each marker region with LIKWID_MARKER_REGISTER immediately after calling LIKWID_MARKER_INIT. For the purposes of this request for help, I’ll focus on a single marker region which wraps the following code:
LIKWID_MARKER_START(mystr);
// Pack this rank's sub-block of a into the contiguous buffer d->d2_chunk.
int64_t ch_indx=0;
int dims_size=pencil_dims[0]*pencil_dims[1]*pencil_dims[2];
for(int i0=d2_array_start[0];i0<d2_array_start[0]+local_sizes[0];i0++){
    for(int i1=d2_array_start[1];i1<d2_array_start[1]+local_sizes[1];i1++){
        for(int i2=d2_array_start[2];i2<d2_array_start[2]+local_sizes[2];i2++){
            // Row-major index into the pencil array.
            int64_t local_indx=pencil_dims[2]*(pencil_dims[1]*i0+i1) + i2;
            assert(local_indx < dims_size);
            assert(ch_indx <chunk_size && ch_indx >= 0 && local_indx>=0 && local_indx < dims_size);
            d->d2_chunk[ch_indx]=a[local_indx];
            ch_indx++;
        }
    }
}
LIKWID_MARKER_STOP(mystr);
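For reference, the instrumentation pattern around this region can be sketched as below. This is a minimal, self-contained sketch and not the actual TestDfft source: the function names pack_subarray and pack_with_markers, the region name "pack", and the double element type are placeholders I made up for illustration. The no-op fallback macros are only there so the sketch compiles without LIKWID; the real build defines LIKWID_PERFMON and links against LIKWID.

```c
#include <assert.h>
#include <stdint.h>

/* Without -DLIKWID_PERFMON the marker macros become no-ops, so this
   sketch builds on a machine that does not have LIKWID installed. */
#ifdef LIKWID_PERFMON
#include <likwid-marker.h>
#else
#define LIKWID_MARKER_INIT
#define LIKWID_MARKER_REGISTER(tag)
#define LIKWID_MARKER_START(tag)
#define LIKWID_MARKER_STOP(tag)
#define LIKWID_MARKER_CLOSE
#endif

/* Hypothetical helper: copy a 3D sub-block of the row-major array a
   into the contiguous buffer chunk. Returns the number of elements copied.
   The element type (double) is an assumption. */
int64_t pack_subarray(const double *a, double *chunk, int64_t chunk_size,
                      const int dims[3], const int start[3], const int sizes[3])
{
    int64_t dims_size = (int64_t)dims[0] * dims[1] * dims[2];
    int64_t ch_indx = 0;

    LIKWID_MARKER_START("pack");
    for (int i0 = start[0]; i0 < start[0] + sizes[0]; i0++) {
        for (int i1 = start[1]; i1 < start[1] + sizes[1]; i1++) {
            for (int i2 = start[2]; i2 < start[2] + sizes[2]; i2++) {
                int64_t local_indx = (int64_t)dims[2] * ((int64_t)dims[1] * i0 + i1) + i2;
                assert(local_indx >= 0 && local_indx < dims_size);
                assert(ch_indx >= 0 && ch_indx < chunk_size);
                chunk[ch_indx] = a[local_indx];
                ch_indx++;
            }
        }
    }
    LIKWID_MARKER_STOP("pack"); /* every START needs a matching STOP */
    return ch_indx;
}

/* Hypothetical driver showing the full marker lifecycle:
   init, register, measured region, close. */
int64_t pack_with_markers(const double *a, double *chunk, int64_t chunk_size,
                          const int dims[3], const int start[3], const int sizes[3])
{
    LIKWID_MARKER_INIT;
    LIKWID_MARKER_REGISTER("pack");
    int64_t copied = pack_subarray(a, chunk, chunk_size, dims, start, sizes);
    LIKWID_MARKER_CLOSE;
    return copied;
}
```

In the real application the init and close calls happen once per process, not around every call to the packing routine; they are grouped here only to keep the sketch short.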
System 1
$likwid-topology
--------------------------------------------------------------------------------
CPU name: Intel(R) Xeon(R) Silver 4215 CPU @ 2.50GHz
CPU type: Intel Cascadelake SP processor
Running on 1 rank with likwid-mpirun -np 1 -g MEM -m ./TestDfft 1 600 yields the following Memory Data volumes
When running on more than 1 rank, each rank gets a subset of the overall domain. The marker region shown above loops over every grid cell on each rank, summing to the full domain. In other words, running on 1 rank has a total of 216,000,000 loop iterations. Running the same problem on 8 ranks, each rank has 27,000,000 loop iterations, with the sum across all 8 ranks equaling 216,000,000 loop iterations. So, I would expect the “Sum” of the Memory data volume to be approximately equal across varying rank counts.
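To make that expectation concrete, here is a rough back-of-the-envelope sketch of the arithmetic. It assumes the 600 argument is the grid size per dimension (600^3 = 216,000,000, which matches the iteration count above) and that the domain splits evenly across ranks. The helper names are hypothetical, and the element size is a parameter because it depends on the actual data type. The traffic estimate counts one read of a and one write of d2_chunk per iteration, plus an optional extra read for write-allocate on the store; it is a simple streaming model, not something LIKWID reports.

```c
#include <assert.h>
#include <stdint.h>

/* Total loop iterations for a cubic grid with n cells per dimension. */
int64_t total_iterations(int64_t n)
{
    return n * n * n;
}

/* Iterations per rank, assuming the domain divides evenly (hypothetical helper). */
int64_t iterations_per_rank(int64_t total, int ranks)
{
    assert(ranks > 0);
    return total / ranks;
}

/* Rough memory traffic in bytes for a copy loop: one load stream and one
   store stream, plus one extra load stream if the store triggers a
   write-allocate. elem_bytes is an assumption supplied by the caller. */
int64_t estimated_copy_bytes(int64_t iterations, int64_t elem_bytes, int write_allocate)
{
    int64_t streams = write_allocate ? 3 : 2;
    return iterations * elem_bytes * streams;
}
```

Under this model the sum over ranks is independent of the rank count: 8 ranks at 27,000,000 iterations each give the same estimate as 1 rank at 216,000,000 iterations, which is why I expected the "Sum" column to stay roughly constant.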
However, running on 8 ranks with likwid-mpirun -np 8 -g MEM -m ./TestDfft 1 600 yields the following Memory Data volumes
The marker region shows much less data volume (approximately 14% of the 1-rank case), but the data volume for the overall simulation is similar.
This same behavior is seen on different hardware as well.
System 2
$likwid-topology
--------------------------------------------------------------------------------
CPU name: AMD EPYC 7313 16-Core Processor
CPU type: AMD K19 (Zen3) architecture
Running on 1 rank with likwid-mpirun -np 1 -g MEM1 -m ./TestDfft 1 600 and likwid-mpirun -np 1 -g MEM2 -m ./TestDfft 1 600 and summing the data volumes between those 2 runs yields
Running on 8 ranks with likwid-mpirun -np 8 -g MEM1 -m ./TestDfft 1 600 and likwid-mpirun -np 8 -g MEM2 -m ./TestDfft 1 600 and summing the data volumes between those 2 runs yields
Again, the marker region shows much less data volume (approximately 15% of the 1-rank case), but the data volume for the overall simulation is similar.
I thought this might have something to do with cache performance as the number of ranks varies, so I looked at the cache data volumes (likwid-mpirun -np 1 -g CACHES -m ./TestDfft 1 600). However, the cache Data Volumes are similar between the 1-rank and 8-rank runs (see attached; I couldn't get the table to paste inline).
Does anybody have any thoughts on why the total Memory Data Volume is conserved, but this marker region has significantly different Memory Data Volume as the number of MPI ranks varies?
Thank you!
Matt