Hi everyone,
I've been following this discussion closely. I completely agree with Oscar's assessment that ASV struggles with SymPy's development workflow—especially its inability to isolate cold-cache performance and effectively visualize asymptotic scaling.
Inspired by Oscar's mention of PR #29094 (Snapshot testing), I realized that a heavier DevOps tool isn't the answer. Instead, we need a lightweight, math-aware harness.
To test this idea, I built a quick Proof of Concept (PoC) and submitted it as a Draft PR here:
https://github.com/sympy/sympy_benchmarks/pull/124
This Python prototype calls clear_cache() before each timing run so it accurately measures cold-cache time, and it compares SymPy's Poly with NumPy across different polynomial degrees.
Interestingly, when I pushed the test up to N=200, I noticed significant timing spikes at certain values of N across all runners, likely due to OS noise. This demonstrates in practice why a naive single-pass runner isn't enough, and why we need a custom harness that can execute multiple passes and manage GC to filter out flakiness. I have attached the asymptotic plot in the PR description.
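To make the idea concrete, here is a minimal sketch of the kind of multi-pass, cold-cache timing loop I have in mind. It is illustrative only, not the PoC's actual harness: the function name time_cold, the pass count, and the example workload are my own choices, and the median is just one simple way to damp OS noise.

```python
import gc
import statistics
import time

from sympy import Poly, symbols
from sympy.core.cache import clear_cache


def time_cold(fn, passes=5):
    """Time fn with a cold SymPy cache.

    Runs several passes, clearing the cache and pausing the garbage
    collector before each one, and returns the median sample so a
    single OS-induced spike doesn't dominate the result.
    """
    samples = []
    for _ in range(passes):
        clear_cache()   # force cold-cache conditions for every pass
        gc.disable()    # keep the collector from firing mid-measurement
        try:
            start = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - start)
        finally:
            gc.enable()
    return statistics.median(samples)


if __name__ == "__main__":
    x = symbols("x")
    n = 50  # polynomial degree under test (hypothetical workload)
    t = time_cold(lambda: Poly(sum(i * x**i for i in range(1, n)), x) ** 2)
    print(f"degree {n}: {t:.4f}s (median of cold-cache passes)")
```

A real harness would of course sweep N, record all samples rather than a single summary statistic, and emit data for the asymptotic plot, but this shows why multiple passes plus GC control are the core of the design.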
As I begin drafting my GSoC proposal, I have a quick question. Given that the previous bot had security vulnerabilities and our focus is shifting toward this custom snapshot harness (including the results webpage and comparison views), should rebuilding the CI commenting bot be excluded from this 175-hour project, or is a secure PR reporter still a high priority? And do you think this is a good starting point, or are there other areas I should focus on?
I would love to hear your feedback on whether this math-centric prototype aligns with your vision.
Best regards,
Chengxi Meng