Just wanted to share an idea for cancelling out the effect of machine noise across separate runs of a set of benchmarks on the same (or similarly spec'd) machine (see my side project repo here: https://github.com/bensanmorris/benchmark_monitor).
The scenario / idea:
1 - developer creates a set of benchmarks for their product
2 - developer triggers a run of the benchmarks
3 - (for each benchmark) google benchmark internally spins up a "control benchmark" that runs in parallel (on a background thread) with the developer's benchmark. The control benchmark simply increments a counter (a google benchmark counter); the code for the control benchmark never changes.
4 - when the developer's benchmark finishes, the control benchmark counter associated with it is recorded as a metric against that benchmark, i.e. counters["developer_benchmark_name_control_counter"] = control_benchmark_counter_value
5 - the benchmark run (.json) is saved into a benchmark history folder
6 - the developer at some point makes a code change and CI triggers a build (on the same machine or an identically spec'd machine)
7 - steps 2 -> 5 are repeated for the new build
8 - let's say the machine (on which the benchmarks are running) is having a bad day and is running at 90% of its previous speed. We would expect (in theory) the control benchmark to also run at 0.9x, so it would record 0.9x of the previous run's control counter
9 - we then perform a step change analysis, but first apply a pre-processing step that scales the developer's benchmark metrics by the control counter value (choosing any one run's counter value as the baseline relative to which all others are scaled)
10 - we then perform step change detection using a technique similar to the one in my repo
I've added this as an idea to my project's repo. It seems like it could work in theory, but as I haven't tried it yet it might not work in practice. If you take a look at my repo's memory_counters branch (something else google benchmark could provide), you'll see the basic idea: spin up a worker thread that runs in parallel with a benchmark (a bit hacky as it's work in progress).
What do you think?