I have a hybrid application that I want to measure with likwid-perfctr. Before making the hybrid measurements, I want to measure a pure multi-threaded and a pure MPI version of it. Measuring the multi-threaded version works fine, but I am running into problems with the MPI version: likwid does not seem to measure the cores I ask it to measure.
From the documentation I gathered that there are 3 main ways of measuring pure MPI applications:
1. mpirun -np 32 likwid-perfctr -c S0:0-31 -g CACHE -o mpi_%h.txt ./a.out (from the likwid-perfctr documentation)
2. likwid-mpirun -np 4 -pin S0:0_S0:1_S0:2_S0:3 -g CACHE ./a.out > mpi.txt (from the likwid-mpirun documentation and TutorialMPI)
3. mpirun -np 1 likwid-perfctr -c S0:0 -g CACHE ./a.out : -np 1 likwid-perfctr -c S0:1 -g CACHE ./a.out : … (from TutorialMPI)
Some background information before I explain my problem. I work on a two-socket machine with 64 cores per socket. Hyper-threading is enabled, but I only want one process per core. In physical indexing, core 0 has hardware threads 0 and 128, core 1 has threads 1 and 129, and so on; in logical indexing, core 0 has threads 0 and 1, core 1 has threads 2 and 3, and so on. I do the CPU binding via hwloc inside my code, so I only need likwid for measuring. I use LIKWID 5.2.
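To make the two numbering schemes concrete, here is a small sketch of the conversion between them on exactly this topology (2 sockets × 64 cores × 2 SMT threads; the helper names are my own for illustration, not from any likwid or hwloc API):

```python
NUM_CORES = 128          # 2 sockets x 64 cores
SMT_OFFSET = NUM_CORES   # in physical indexing, the SMT sibling of thread t is t + 128

def physical_threads_of_core(core):
    """Physical indexing: core c owns hardware threads c and c + 128."""
    return (core, core + SMT_OFFSET)

def logical_threads_of_core(core):
    """Logical (compact) indexing: core c owns hardware threads 2c and 2c + 1."""
    return (2 * core, 2 * core + 1)

def logical_to_physical(lid):
    """Map a logical hardware-thread ID to its physical ID on this machine."""
    core, smt = divmod(lid, 2)
    return core + smt * SMT_OFFSET

print(physical_threads_of_core(1))   # (1, 129)
print(logical_threads_of_core(1))    # (2, 3)
print(logical_to_physical(3))        # 129
```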
In this example I want to run 32 processes, all of them on S0 on cores 0-31. Later I will probably also want other configurations.
When using the first option, I get the error "WARN: Selected affinity domain S0 has only -1 hardware threads, but selection string evaluates to 32 threads. This results in multiple threads on the same hardware thread." and the run crashes. I find this weird because domain S0 does contain 64 cores (likwid-topology confirms this). When I omit the S0 and just write "-c 0-31", the code runs, but the output file sometimes shows that the correct hardware threads were measured and sometimes that HWThreads 64-95 were measured instead, and some of those measurements are just a zero or a dash. It also prints "INFO: You are running LIKWID in a cpuset with 128 CPUs. Taking given IDs as logical ID in cpuset", so I assume I should use the logical CPU IDs on the command line. When I do that, it again sometimes measures the correct cores and sometimes completely different ones. Is measuring the wrong cores just a weird side effect of something else I am doing, and should I simply redo the measurement until the correct cores show up in the output file? Also, if I use "-o mpi%r.txt" instead of "-o mpi%h.txt" I get a file per rank; am I right to assume that with the latter the output of all ranks is just combined into one file?
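As a debugging aid, I also cross-check the binding independently of likwid by printing each process's actual affinity mask from inside the application. A minimal sketch of such a check (Python's os.sched_getaffinity, Linux-only; in the real MPI code the equivalent would be sched_getaffinity(2)):

```python
import os

# Linux-only: the set of CPUs (kernel numbering, restricted to the current
# cpuset) that this process is currently allowed to run on.
allowed = os.sched_getaffinity(0)
print(f"pid {os.getpid()} may run on CPUs: {sorted(allowed)}")

# With one process pinned per core, this set should contain exactly the
# hardware thread(s) the process was bound to.
```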
When using the second option with 32 processes and thread group "S0" with IDs 0-31, it always measures the cores 0, 65, 2, 67, 4, 69, 6, 71, 8, 73, 10, 75, 12, 77, 14, 79, 16, 81, 18, 83, 20, 85, 22, 87, 24, 89, 26, 91, 28, 93, 30, 95.
When using the thread group "N" with IDs 0,2,4,6,…, the measured cores are 0, 66, 4, 70, 8, 74, 12, 78, 16, 82, 20, 86, 24, 90, 28, 94, 32, 98, 36, 102, 40, 106, 44, 110, 48, 114, 52, 118, 56, 122, 60, 126.
When using no thread group with IDs 0,2,4,6,…, the same cores as with "N" are measured, which makes sense since both are logical orderings, but they are still not the ones I specified on the command line.
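The deviation is at least regular: in both cases the measured list is exactly the requested list with 64 added to every second entry, i.e. every other process lands on an ID offset by 64. This is my own observation from the numbers above, not anything from the likwid documentation; a small check:

```python
def shift_every_second(ids, offset=64):
    """Add `offset` to every second requested ID (positions 1, 3, 5, ...)."""
    return [i + offset if pos % 2 else i for pos, i in enumerate(ids)]

# Second option: requested S0:0-31, measured 0, 65, 2, 67, ...
measured_s0 = [0, 65, 2, 67, 4, 69, 6, 71, 8, 73, 10, 75, 12, 77, 14, 79,
               16, 81, 18, 83, 20, 85, 22, 87, 24, 89, 26, 91, 28, 93, 30, 95]
assert shift_every_second(range(32)) == measured_s0

# "N" / no thread group: requested 0, 2, 4, ..., 62, measured 0, 66, 4, 70, ...
measured_n = [0, 66, 4, 70, 8, 74, 12, 78, 16, 82, 20, 86, 24, 90, 28, 94,
              32, 98, 36, 102, 40, 106, 44, 110, 48, 114, 52, 118, 56, 122,
              60, 126]
assert shift_every_second(range(0, 64, 2)) == measured_n

print("both measured lists = requested lists with +64 on every second entry")
```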
When using the third option, pretty much the same happens as with the second option.
I realize this question is really long, and I am thankful to everyone who read it. So, am I missing or misunderstanding something when it comes to measuring specific cores? The first option does measure the correct cores sometimes, but since it does not do so always, it has made me doubtful of the results (even though they were in the expected range and could very well be right). Thank you for your responses.