Hi,
I am using the MEM group on an Ampere Altra (Arm Neoverse N1). I have two observations and then a more general question.
Observations:
1. If I use the MEM group on an Intel architecture, I get a value of 0 for most of the MPI ranks and a non-zero value for only one or two ranks. Does the number of ranks with non-zero values correspond to the number of unique sockets being used? The Ampere Altra has only one socket, yet every MPI rank reports a non-zero value.
2. On the Ampere Altra, summing the MEM group values over all MPI ranks gives a memory bandwidth that far exceeds the theoretical maximum. The sum was somewhere over 400 GB/s.
Is this a bug, and should I file a GitHub issue for it?
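To make the check in observation 2 concrete, here is a minimal sketch of the comparison. The function name, the rank count, the per-rank readings, and the peak value are all hypothetical placeholders, not measured data.

```python
def compare_to_peak(per_rank_gbs, peak_gbs):
    """Sum per-rank bandwidth readings and compare the total to a peak value.

    per_rank_gbs: one MEM group bandwidth reading (GB/s) per MPI rank.
    peak_gbs: theoretical peak memory bandwidth of the node (GB/s).
    Returns (total, ratio); a ratio above 1.0 means the summed readings
    exceed what the hardware could deliver.
    """
    if peak_gbs <= 0:
        raise ValueError("peak_gbs must be positive")
    total = sum(per_rank_gbs)
    return total, total / peak_gbs


# Hypothetical example: 80 ranks, each reporting 5.0 GB/s, on a node
# whose peak is assumed to be 200.0 GB/s.
readings = [5.0] * 80
total, ratio = compare_to_peak(readings, 200.0)
print(f"sum = {total} GB/s, ratio to peak = {ratio}")
```

A ratio well above 1.0, as in this made-up case, is the kind of result that prompts the question above.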
General question:
Suppose I have a dual-socket system with N hardware threads and a total memory bandwidth of B. Does the maximum achievable memory bandwidth scale linearly with the number of ranks in the program?
In other words,
achievable memory bandwidth = (# ranks / N) x B
modulo some NUMA effects which may or may not be severe.
So if I am using only half the hardware threads, is 0.5 x B the best that a benchmark like STREAM could achieve?
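Written out as code, the linear model I am asking about looks like the sketch below. It only encodes the hypothesis in the formula above, not a claim that real hardware behaves this way; the function name and the example numbers are hypothetical, and NUMA effects are ignored.

```python
def linear_bandwidth_estimate(ranks, hw_threads, total_bw):
    """Hypothesized achievable bandwidth: (ranks / N) x B.

    ranks: number of MPI ranks in the program.
    hw_threads: N, the number of hardware threads on the node.
    total_bw: B, the total memory bandwidth of the node (e.g. in GB/s).
    """
    if hw_threads <= 0:
        raise ValueError("hw_threads must be positive")
    if not 0 <= ranks <= hw_threads:
        raise ValueError("ranks must be between 0 and hw_threads")
    return ranks / hw_threads * total_bw


# Hypothetical example: 64 hardware threads, B = 200.0 GB/s, half of them used.
print(linear_bandwidth_estimate(32, 64, 200.0))
```

Under this model, using half the hardware threads yields exactly half of B, which is the 0.5 x B case in the question.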
Thanks
-- Nichols A. Romero, Ph.D.