Thank you, Bin, for the quick response!
1. To generate the traces, we ran the workloads with these commands:
Stressapp: ./stressapptest -s 20 -M 256 -m 8
Multiload: ./multiload -n 2 -t 16 -m 512M
Multichase: ./multiload -n 5 -t 16 -m 512M -l memcpy-line
When running stressapptest, we observed 27 trace files where we expected 8. For multiload and multichase, we observed one extra trace file: we ran 16 threads but got 17 traces.
2. Yes, we ran perf stat in user mode with the :u modifier, i.e. perf stat -e instructions:u ./stressapptest -s 20 -M 256 -m 8.
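To cross-check the trace-file counts from point 1 independently of our profiler, one option is to count the distinct thread IDs the kernel reports for the workload while it runs. Below is a minimal sketch, assuming Linux and its /proc/<pid>/task directory (one entry per thread). The function name observed_thread_ids is hypothetical, and because it polls, very short-lived threads can be missed, so the result is a lower bound. The count includes the main thread.

```python
import os
import subprocess
import sys
import time


def observed_thread_ids(cmd, interval=0.01):
    """Run cmd and collect every thread ID seen under /proc/<pid>/task.

    Linux-only sketch: polls the task directory of the direct child process
    every `interval` seconds. Threads that start and exit between two polls
    are missed, and threads of grandchild processes are not counted.
    """
    proc = subprocess.Popen(cmd)
    task_dir = f"/proc/{proc.pid}/task"
    seen = set()
    while proc.poll() is None:
        try:
            # Each subdirectory name is the TID of one live thread.
            seen.update(int(tid) for tid in os.listdir(task_dir))
        except FileNotFoundError:
            # The process was reaped between poll() and listdir().
            break
        time.sleep(interval)
    proc.wait()
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_threads.py ./multiload -n 2 -t 16 -m 512M
    tids = observed_thread_ids(sys.argv[1:])
    print(f"distinct threads observed (including main): {len(tids)}")
```

Comparing this number with the trace-file count (for example, 16 worker threads plus the main thread versus 17 traces) could help narrow down where the extra traces come from.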
We have tested many workloads from the SPEC CPU 2017 benchmarks and AI workloads such as Llama2 and did not observe this issue: the instruction count matches our profiler, and the number of threads matches the number of trace files generated.
Please let us know if you have any suggestions or if there is any further information we can provide.