Hi,
I tried your suggestion of using DRX_COUNTER_REL_ACQ. However, both with and without DRX_COUNTER_64BIT, the measured opcode counts are inaccurate for any number of threads greater than 1: the counted instructions always undercount the real number of executed instructions, and as I increase the number of threads the results get farther and farther from what is expected. I am running some inline GCC assembly (volatile), so I am pretty sure the instructions are being executed.
I have an equivalent set of assembly code for x86 that I tested in the same way as the AARCH64 one, and in that case the results are accurate even when scaling up the number of threads (and using 64-bit counters). So it seems to be an issue with the AARCH64 counting.
Any idea how I could fix this (could it be a problem on my side?), or should I just open an issue on GitHub?
Thank you for your time,
José Morgado