Usually one does not compare executions of the entire test-suite, but
looks for which programs have regressed. In that scenario only relative
changes per program matter, so μs are only compared to μs and seconds
only to seconds.
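For that kind of per-program comparison the test-suite also ships a
helper script (utils/compare.py in the test-suite checkout, if I
remember the path correctly) that prints the exec_time of each program
in two result files together with the relative change, e.g.:

  test-suite/utils/compare.py baseline.json patched.json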
> In any case, it would at least be great if the JSON data contained the time unit per test,
> but that is not happening either.
What do you mean? Don't you get the exec_time per program?
> Do you think that the lack of time-unit info is a problem? If yes, do you like the
> solution of adding the time unit to the JSON, or do you want to propose an alternative?
You could also normalize the time unit that is emitted to JSON to s or ms.
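If you only want to post-process an existing result file (rather than
changing what the test-suite emits), something along these lines should
also work, assuming lit's tests[].metrics.exec_time layout and that the
MicroBenchmarks entries are in μs:

  jq '(.tests[] | select((.name | test("MicroBenchmarks")) and .metrics.exec_time != null) | .metrics.exec_time) |= (. / 1e6)' out.json > out_seconds.json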
>
> The second question has to do with re-running the benchmarks: I do
> cmake + make + llvm-lit -v -j 1 -o out.json .
> but if I try to run the latter another time, it just does/shows nothing. Is there any reason
> that the benchmarks can't be run a second time? Could I somehow run them a second time?
Running the programs a second time did work for me in the past.
Remember to write the output to another file, or the previous .json
will be overwritten.
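E.g. for the second run:

  llvm-lit -v -j 1 -o out2.json .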
> Lastly, slightly off-topic but while we're on the subject of benchmarking,
> do you think it's reliable to run with -j <number of cores>? I'm a little bit afraid of
> the shared caches (because misses should be counted in the CPU time, which
> is what is measured in "exec_time" AFAIU)
> and any potential multi-threading that the tests may use.
It depends. You can run in parallel, but then you should increase the
number of samples (executions) appropriately to counter the increased
noise. Depending on how many cores your system has, it might not be
worth it; instead, try to make the system as deterministic as possible
(single thread, thread affinity, no background processes, perf instead
of timeit, avoiding context switches, etc.). To avoid systematic bias
from the same cache-sensitive programs always running in parallel, use
the --shuffle option.
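As a rough sketch of such a setup (the core number, file names and
options are only an illustration, adjust them to your machine):

  cmake -DTEST_SUITE_USE_PERF=ON <other cmake options> <path/to/test-suite>
  make
  taskset -c 2 llvm-lit -j 1 --shuffle -o run1.json .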
Michael
Also, depending on what you are trying to achieve (and what your platform target is), you could enable perf counter collection.
> Btw, when using perf (i.e., TEST_SUITE_USE_PERF in cmake), it seems that perf runs both during the
> build (i.e., make) and the run (i.e., llvm-lit) of the tests. It's not important, but do you happen to know
> why this happens?
You know the unit of time from the top-level folder: MicroBenchmarks
is in microseconds (because Google Benchmark reports microseconds);
everything else is in seconds.
That might be confusing when you don't know about it, but once you do,
there is no ambiguity.
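If you want to check which entries of a result file fall under the μs
rule, something like this should do (again assuming lit's tests[].name
layout):

  jq -r '.tests[].name | select(test("MicroBenchmarks"))' out.json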