WIP gathering some early feedback.
Several issues that keep me from understanding the scheduled benchmarks:
- The highly dynamic bot_platforms.py has evolved so many custom branches for each platform that it's impossible to follow
- Repeats can be defined in multiple places: in BenchmarkConfig, via .Repeat() overrides, in the Benchmark itself, or in the sharding config
- With configurable crossbench benchmarks, we run the risk of accidentally creating new benchmarks.
Solution:
- Create per-benchmark CSV files with the needed metadata and explicit repeat configs.
- Engineers think in terms of benchmarks and benchmark variants, so this should be much easier to keep track of.
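To make the proposal concrete, here is a minimal sketch of what such a per-benchmark CSV and its loader could look like. The column names (benchmark, variant, platform, repeats) are illustrative assumptions, not an existing format:

```python
import csv
import io

# Hypothetical per-benchmark CSV layout; the columns are assumptions,
# chosen to show one row per benchmark variant with an explicit repeat count.
EXAMPLE_CSV = """\
benchmark,variant,platform,repeats
speedometer3,default,mac-m1,3
speedometer3,no-fieldtrials,mac-m1,3
jetstream2,default,android-pixel6,1
"""

def load_benchmark_configs(text):
    """Parse the CSV into a list of dicts, one per benchmark variant."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["repeats"] = int(row["repeats"])  # repeats are explicit, per row
    return rows

configs = load_benchmark_configs(EXAMPLE_CSV)
print(len(configs))  # 3
```

With one row per variant, the full set of scheduled benchmarks is visible in one place instead of being spread across config overrides.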
Possible Downside:
- Complex inline flags might have to be escaped (but the solution is typically to create a custom benchmark with fixed flags instead)
For the purpose of getting a list of benchmarks and where they run, I'd recommend not using bot_platforms.py as the data source. Instead, use the JSON files from tools/perf/core/shard_maps, which are the actual data used to control waterfall. Those JSON files should be much easier to handle.
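As a sketch of how those shard map files could be consumed, the snippet below inverts a shard map into a benchmark-to-shards listing. The schema here (numeric shard keys mapping to a "benchmarks" dict, plus non-numeric metadata keys) is an assumption; check the actual files under tools/perf/core/shard_maps for the real layout:

```python
import json

def benchmarks_per_shard(shard_map_text):
    """Map each benchmark name to the list of shards it runs on.

    Assumes shard-map JSON keyed by shard index strings, each entry holding
    a "benchmarks" dict; non-numeric keys (e.g. metadata) are skipped.
    """
    shard_map = json.loads(shard_map_text)
    result = {}
    for shard, entry in shard_map.items():
        if not shard.isdigit():  # skip metadata entries
            continue
        for benchmark in entry.get("benchmarks", {}):
            result.setdefault(benchmark, []).append(int(shard))
    return result

example = ('{"0": {"benchmarks": {"speedometer3": {}}},'
           ' "1": {"benchmarks": {"speedometer3": {}, "jetstream2": {}}}}')
print(benchmarks_per_shard(example))
# {'speedometer3': [0, 1], 'jetstream2': [1]}
```

Since these JSON files are what actually drives the waterfall, a listing derived this way cannot drift from what is really scheduled.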
For the longer-term strategy of how to control the schedule, we need some more thinking. The current strategy has historical reasons: it was designed to automatically discover all Telemetry benchmarks and schedule them on all our bots unless specifically overridden. The idea is that after creating a new benchmark, it automatically gets scheduled; the developer doesn't need to do anything beyond running a Python script, unless they don't want it to run everywhere, in which case they need to exclude it. However, this model was never carried over to crossbench-based benchmarks, and it's worth considering whether it still makes sense going forward.
There are primarily two ways of repeating. The shard config causes the benchmark to run on multiple shards, and is generally the preferred way: running on multiple shards provides some parallelism and smooths out some device variation. The other methods all cause the same benchmark or story to run multiple times on the same shard, similar to passing --repeat to crossbench; in general, they should only be used when the desired number of repeats exceeds the number of shards available. (I suppose we could make the number of repeats global, and let the scheduler automatically calculate how to split repeats within vs. across shards.)
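The parenthetical idea above, a single global repeat count that the scheduler splits across and within shards, could be sketched as follows. The function name and signature are illustrative:

```python
import math

def split_repeats(total_repeats, available_shards):
    """Split a global repeat count into (shards used, repeats per shard).

    Prefers spreading across shards for parallelism and device-variation
    smoothing; falls back to in-shard repeats only for the excess beyond
    the available shards.
    """
    shards_used = min(total_repeats, available_shards)
    in_shard_repeats = math.ceil(total_repeats / shards_used)
    return shards_used, in_shard_repeats

print(split_repeats(3, 8))   # (3, 1): 3 shards, one run each
print(split_repeats(10, 4))  # (4, 3): 4 shards, up to 3 runs each
```

With a rule like this, benchmark owners would only ever specify one number, and the current ambiguity of repeat definitions living in several places would go away.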