When generating traces for certain workloads such as stressapp, the number of trace files generated does not match the number of threads the application is run with: extra trace files are produced. In addition, the total number of instructions in the trace files does not match the count reported by perf stat; the trace files contain fewer instructions. So we get more trace files than threads, but fewer instructions than expected.
We currently observe this issue for stressapp and multiload. For most other applications, the number of trace files equals the number of threads and the instruction counts are correct. We check the number of instructions in the trace files with our own instruction counter tool. The issue appears to occur on both x86 and ARM platforms.
The traces are generated with:
/bin64/drrun -t drcachesim -offline -outdir . -- ./application
and preprocessed with:
/bin64/drrun -t drcachesim -indir tracefile
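For anyone trying to reproduce the file-count mismatch, the check can be scripted along these lines. This is only a minimal sketch: the helper names `count_trace_files` and `check_thread_count` are hypothetical, and the glob pattern for per-thread trace files is an assumption that has to be adapted to the actual file names in the output directory.

```python
from pathlib import Path


def count_trace_files(trace_dir, pattern="*"):
    """Count regular files under trace_dir (recursively) matching pattern.

    The pattern is an assumption: set it to whatever the per-thread
    trace files are actually called in your output directory.
    """
    return sum(1 for p in Path(trace_dir).rglob(pattern) if p.is_file())


def check_thread_count(trace_dir, expected_threads, pattern="*"):
    """Return (files_found, surplus), where surplus = files_found - expected_threads.

    A positive surplus corresponds to the "extra trace files" symptom
    described in this report.
    """
    found = count_trace_files(trace_dir, pattern)
    return found, found - expected_threads
```

Calling `check_thread_count(outdir, n_threads, pattern)` on the output directory after a run gives the number of surplus files, which can then be compared between an affected workload and one that behaves correctly.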