[GSOC 2026] Benchmarks and performance


PRAYAG V

Mar 1, 2026, 8:57:24 AM
to sympy
I am interested in the same project that I applied to last year, i.e. Benchmarks and performance.

I am planning to:
- Move the benchmarks from the sympy repo to the benchmarks repo.

- Add as many benchmarks as possible; many of them can be derived from existing performance issues and slow tests.

- Currently the benchmarks run in GitHub Actions, which can be noisy at times; instead we could use self-hosted runners. Examples of self-hosted environments are DigitalOcean (https://www.digitalocean.com/open-source), which is free for open source, and OSU OSL (https://osuosl.org/services/hosting/). I found these options from a quick search, and I have tested a self-hosted runner in a GitHub Codespace.

- Add regression and improvement summaries as PR comments. I asked about this on the Scientific Python forum (https://discuss.scientific-python.org/t/how-are-you-handling-benchmark-regression-reporting-on-prs/2225), as suggested in https://github.com/sympy/sympy/issues/23085, but got only one reply, and not a very useful one. A while ago I opened an issue in asv about adding --only-improved and --only-regressed flags to filter benchmark output. Nobody implemented it, so I opened a PR (https://github.com/airspeed-velocity/asv/pull/1575), but the maintainers suggested implementing it in asv_spyglass, which is where result manipulation happens, rather than adding similar flags to asv itself. I opened a PR there and it was merged (https://github.com/airspeed-velocity/asv_spyglass/pull/10). With that, we can now easily comment on PRs. I have implemented a rough version in my repo (https://github.com/vprayag2005/sympy/pull/3).

- Finally, the idea page mentions making performance improvements in the project. For that, I am considering implementing fraction-free QR, as suggested in https://github.com/sympy/sympy/pull/29040, and a fraction-free LU solver.
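As background for the fraction-free direction: the core idea (as in the Bareiss algorithm) is that every intermediate value stays an exact integer, because each elimination step divides exactly by the previous pivot, avoiding fraction blow-up. A minimal determinant sketch in plain Python (illustrative only, not SymPy's implementation):

```python
def bareiss_det(M):
    """Fraction-free determinant via the Bareiss algorithm.

    All intermediate entries remain integers: the division by the
    previous pivot at each step is guaranteed to be exact.
    """
    n = len(M)
    M = [row[:] for row in M]  # work on a copy
    prev = 1
    sign = 1
    for k in range(n - 1):
        if M[k][k] == 0:
            # Pivot: swap in a row with a nonzero entry in column k.
            for i in range(k + 1, n):
                if M[i][k] != 0:
                    M[k], M[i] = M[i], M[k]
                    sign = -sign
                    break
            else:
                return 0  # whole column is zero, so det = 0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Exact integer division: no Fractions ever appear.
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
            M[i][k] = 0
        prev = M[k][k]
    return sign * M[n - 1][n - 1]
```

The same exact-division trick is what a fraction-free LU or QR over ZZ would rely on.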

   
These are the things I am planning to implement as part of the GSoC project.

Any suggestions or guidance on how to approach the project would be appreciated.




PRAYAG V

Mar 1, 2026, 9:02:36 AM
to sympy
Also, https://github.com/sympy/sympy/issues/21374 mentions codspeed.io. I checked their docs and they only cover pytest. To confirm, I asked the Codspeed people and they said asv is not currently integrated.

Oscar Benjamin

Mar 1, 2026, 9:35:15 AM
to sy...@googlegroups.com
There used to be something that would comment on PRs with speed
differences although various people complained about the way that the
information was presented so that could have been improved. The main
problem was that it is difficult to make a GitHub Action that can
comment on a PR without security vulnerabilities. The previous Action
used for this was removed because someone pointed out a vulnerability
that could leak the GitHub secrets from the repo.

There are a few problems with benchmarking in sympy right now:

- Many of the things that are currently slow are too slow to run in
the benchmarks but those are the things where we should focus on
improving performance. Instead the benchmarks are all random things
that are not currently slow.
- The way that asv times things is not suitable for timing something
that uses a runtime cache as sympy expressions do. The timings
reported by asv are not reflective of the time that it actually takes
to do something with cold cache which is usually what matters.
- The asv tool itself is awkward to use when you just want to run a
benchmark and see the timings. It has been designed in a particular
way that I think is not very usable for sympy development.
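To make the cache point concrete: a runner can force a cold-cache measurement by clearing SymPy's global cache before each timed call, and compare that against warm-cache timings. A minimal sketch (the benchmark expression here is just an example, not an existing benchmark):

```python
import time

from sympy import expand, symbols
from sympy.core.cache import clear_cache

def time_once(func, cold=True):
    """Time a single call; clear SymPy's global cache first for a cold run."""
    if cold:
        clear_cache()
    start = time.perf_counter()
    func()
    return time.perf_counter() - start

x, y = symbols("x y")
bench = lambda: expand((x + y) ** 50)

# Best-of-5 to reduce scheduling noise; cold runs pay the cache-miss cost.
cold = min(time_once(bench, cold=True) for _ in range(5))
warm = min(time_once(bench, cold=False) for _ in range(5))
print(f"cold: {cold:.4f}s  warm: {warm:.4f}s")
```

asv's default timing loop reports something closer to the warm number, which is the less interesting one for sympy.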

I think I would like to see some new benchmark runner that is not asv
and that is more suitable for sympy development. Ideally what I would
want is in general a new test harness perhaps like the snapshot
testing discussed at:
https://github.com/sympy/sympy/pull/29094

It would be better if more of sympy's test suite was like that so that
the tests can be easily updated but also it can be possible to measure
performance regressions for each individual test. Maybe it would make
sense to do that in the sympy benchmarks repo and build up a big
test/benchmark suite like that.
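One possible shape for such a combined correctness-and-speed harness, purely illustrative (the `check_snapshot` name, JSON format, and tolerance policy are all assumptions, not an existing API): each test records the repr of its result and its timing on first run, then later runs flag output changes and timing regressions separately.

```python
import json
import pathlib
import time

def check_snapshot(name, func, path=pathlib.Path("snapshots.json"), tolerance=1.5):
    """Run func and compare its output and timing against a stored snapshot.

    First run: record {"output": repr(result), "time": elapsed} under name.
    Later runs: report "output changed" if the result differs, "regression"
    if the run is more than `tolerance` times slower, else "ok".
    """
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start

    data = json.loads(path.read_text()) if path.exists() else {}
    prev = data.get(name)
    if prev is None:
        data[name] = {"output": repr(result), "time": elapsed}
        path.write_text(json.dumps(data, indent=2))
        return "recorded"
    if prev["output"] != repr(result):
        return "output changed"
    if elapsed > prev["time"] * tolerance:
        return "regression"
    return "ok"
```

Updating a snapshot after an intentional change would then just mean deleting the stored entry and re-running, which is the easy-update property snapshot testing gives.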

I would like to have benchmarks where we can compare sympy with other
things e.g. sympy's Poly vs python-flint's equivalents or sympy vs
maxima or something. I would like to see plots of asymptotic
performance of many operations and comparison of outputs all
represented nicely in a website somehow.

You can see a previous attempt to make a big test suite for dsolve here:
https://github.com/sympy/kamke-test-suite
Ideally we should have curated test suites like that and measure both
correctness and speed over all examples and have good ways of
visualising the results.

--
Oscar
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.

PRAYAG V

Mar 6, 2026, 1:01:14 PM
to sympy
That sounds interesting. I hope you mean removing asv and building a custom runner in the style of snapshot testing, measuring both correctness and speed. We could have two modes for running benchmarks, cold and warm cache, and perhaps run each benchmark n times and take the average. The output could be written to JSON, and once a PR is merged the results would be added to the repo. We could also add filters such as regressed/improved to show only changed or unchanged benchmarks.

I will add important benchmarks that compare sympy with other libraries such as python-flint. On the website we can show asymptotic performance plots and improvement graphs for all benchmarks, using charting libraries. For users, we could show a graph of regressions and improvements instead of plain text or a table.
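The regressed/improved filter over stored JSON results could be as simple as comparing two mappings of benchmark name to time. A minimal sketch (the `{name: seconds}` format and the 10% threshold are assumptions for illustration, not an existing sympy format):

```python
def classify(before, after, threshold=0.10):
    """Bucket each benchmark as regressed, improved, or unchanged.

    before/after map benchmark name -> time in seconds; a benchmark is
    regressed/improved if its time changed by more than `threshold`.
    """
    report = {"regressed": [], "improved": [], "unchanged": []}
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            continue  # benchmark removed or renamed; skip
        ratio = new / old
        if ratio > 1 + threshold:
            report["regressed"].append(name)
        elif ratio < 1 - threshold:
            report["improved"].append(name)
        else:
            report["unchanged"].append(name)
    return report
```

A PR comment would then only render the regressed and improved buckets, which is essentially what the asv_spyglass flags expose.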
 

PRAYAG V

Mar 7, 2026, 3:38:39 AM
to sympy
Also, how should we scope this project: as a medium or a large project?

And should we consider self-hosted runners for running the benchmarks, since GitHub Actions can sometimes be noisy?

Oscar Benjamin

Mar 7, 2026, 5:57:03 AM
to sy...@googlegroups.com
Whether it is a medium or large project is really just a question for
you. How much time do you want to spend on a project?

Chengxi Meng

Mar 18, 2026, 4:49:01 PM
to sympy

Hi everyone,

I've been following this discussion closely. I completely agree with Oscar's assessment that ASV struggles with SymPy's development workflow—especially its inability to isolate cold-cache performance and effectively visualize asymptotic scaling.

Inspired by Oscar's mention of PR #29094 (Snapshot testing), I realized that a heavier DevOps tool isn't the answer. Instead, we need a lightweight, math-aware harness.

To test this idea, I built a quick Proof of Concept (PoC) and submitted it as a Draft PR here:

https://github.com/sympy/sympy_benchmarks/pull/124

This Python prototype forces clear_cache() to accurately measure cold-cache time and compares SymPy Poly with NumPy across different degrees.

Interestingly, when I pushed the test up to N=200, I noticed significant time spikes around some Ns across all runners, likely due to OS noise. This practically proves why a naive single-pass runner isn't enough, and why we need a custom harness that can execute multiple passes and manage GC to filter out flakiness. I have attached the asymptotic plot in the PR description.
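The multi-pass idea can be sketched simply: disable the garbage collector during timing so GC pauses don't land inside a measurement, and take the median across passes, which discards single-pass spikes that a mean would absorb. Illustrative only; the function name and pass count are arbitrary:

```python
import gc
import statistics
import time

def robust_time(func, passes=7):
    """Time func over several passes with GC disabled; return the median.

    The median is much less sensitive to OS scheduling spikes than the
    mean, and restoring GC afterwards keeps the harness side-effect free.
    """
    times = []
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        for _ in range(passes):
            start = time.perf_counter()
            func()
            times.append(time.perf_counter() - start)
    finally:
        if gc_was_enabled:
            gc.enable()
    return statistics.median(times)
```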

As I begin drafting my GSoC proposal, I have a quick question: since the previous bot had security vulnerabilities and our focus is shifting towards this custom snapshot harness (with a website and comparisons), should rebuilding the CI commenting bot be excluded from this 175-hour project, or is a secure PR reporter still a high priority? And do you think this is a good starting point, or are there other things I should focus on?

I would love to hear your feedback on whether this math-centric prototype aligns with your vision.

Best regards,

Chengxi Meng
