Contributing new SymPy benchmarks (comparative & non-time metrics)

Sam Brockie

Sep 28, 2022, 5:07:34 AM
to sympy
Hi All,

I'd like to begin adding some additional benchmarks to SymPy to help inform the code generation work that I'm doing as part of the CZI grant.

I'm aware of the benchmarks in the benchmarks repository. My understanding is that these are run using airspeed velocity as part of the CI, and track how the performance of a particular benchmark has changed relative to the most recent SymPy release and the master branch.

There are two other types of benchmark that I think might be useful:

1. Comparison of multiple ways to do equivalent computations

Below is a contrived example in which there are two functions, option_1 and option_2, that produce the same result but have different implementations.

>>> from sympy import Matrix, symbols
>>>
>>> def option_1(a, b):
...     return Matrix([a+b, a*b]).jacobian(Matrix([a, b]))
...
>>> def option_2(a, b):
...     return Matrix([[(a+b).diff(a), (a+b).diff(b)],
...                    [(a*b).diff(a), (a*b).diff(b)]])
...
>>> a, b = symbols("a, b")
>>> option_1(a, b) == option_2(a, b)
True

A benchmark in this case would time the execution of both option_1 and option_2 (for a range of inputs), compare the relative speeds, and report the differences. As this type of benchmark is not comparing the same benchmark across different SymPy versions, I believe that airspeed velocity may not be the best tool to use here.
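A minimal sketch of such a comparison, using only the standard library's timeit rather than pytest-benchmark. The helper name compare and the repeat count are illustrative assumptions, not existing SymPy tooling. The SymPy cache is cleared before every timed call so that later calls do not simply measure cache hits.

```python
import functools
import timeit

from sympy import Matrix, symbols
from sympy.core.cache import clear_cache


def option_1(a, b):
    return Matrix([a + b, a * b]).jacobian(Matrix([a, b]))


def option_2(a, b):
    return Matrix([[(a + b).diff(a), (a + b).diff(b)],
                   [(a * b).diff(a), (a * b).diff(b)]])


def compare(candidates, args, repeat=5):
    """Return {name: best time in seconds for one uncached call}.

    Hypothetical helper: each repeat runs clear_cache() as untimed setup,
    then times a single call of the candidate.
    """
    results = {}
    for name, func in candidates.items():
        timer = timeit.Timer(functools.partial(func, *args), setup=clear_cache)
        # The minimum over repeats is the estimate least affected by noise.
        results[name] = min(timer.repeat(repeat=repeat, number=1))
    return results


a, b = symbols("a, b")

# Only compare timings of implementations that agree on the result.
assert option_1(a, b) == option_2(a, b)

timings = compare({"option_1": option_1, "option_2": option_2}, (a, b))
fastest = min(timings.values())
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.0f} us ({seconds / fastest:.2f}x the fastest)")
```

Running this over a range of inputs would only require calling compare once per argument tuple and collecting the resulting dictionaries.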

I see this type of benchmark as being useful for: (1) determining which algorithm to use when implementing a new function or refactoring an existing function; and (2) ensuring that an implementation remains superior to alternatives as changes are made elsewhere in SymPy.

I have had success in the past implementing these sorts of benchmarks using pytest-benchmark. Is there currently anything similar anywhere in SymPy? Would the sympy/sympy_benchmarks repository be the best place to contribute PRs for these sorts of benchmarks? Does anyone have any differing opinions about how and where these should be implemented, or the value of this type of benchmark?

2. Measurement of non-time metrics

Below is another contrived example in which common subexpression elimination is used on an expression, y, and it is shown that the result of cse(y) involves fewer operations than the original expression.

>>> from sympy import count_ops, cse, exp, sin, symbols
>>>
>>> a, b = symbols("a, b")
>>> y = (sin(a/b) + (a/b) - exp(b)) * ((a/b) - exp(b))
>>>
>>> count_ops(y)
10
>>> count_ops(cse(y))
6

A benchmark in this case would count the number of operations in the return value from cse(y) and compare this to 6. Assuming that the implementation of the cse function has been changed, if the number of operations is six then we know that its performance hasn’t been changed by the refactor. If the count is greater than six a regression has taken place. If the count is less than six the performance of the function has been improved. Benchmarking for a range of inputs would obviously be required.

I see this type of benchmark as being useful for: (1) measuring SymPy’s performance in instances where timing code snippets isn’t necessarily the best, or only valuable, indicator of performance; and (2) ensuring regressions haven’t occurred during refactoring.

I believe this type of benchmark can be implemented using airspeed velocity’s track prefix. Or perhaps this type of benchmark would be best implemented as regression tests in the sympy/sympy repository’s test suite, comparing the non-time metrics to hard-coded values.
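As a rough sketch of the airspeed velocity route: asv records whatever number a function whose name starts with track_ returns, and an optional unit attribute labels the value in its reports. The function name track_cse_op_count and the module-level EXPR are hypothetical, and the expression is the contrived one from above.

```python
from sympy import count_ops, cse, exp, sin, symbols

a, b = symbols("a, b")
EXPR = (sin(a / b) + (a / b) - exp(b)) * ((a / b) - exp(b))


def track_cse_op_count():
    """Return the number of operations left after common subexpression elimination.

    asv stores the returned value for each benchmarked commit, so the count
    can be followed over time instead of an execution time.
    """
    return count_ops(cse(EXPR))


# Label asv shows next to the tracked value.
track_cse_op_count.unit = "operations"
```

Placed in a module of the benchmarks repository, a function like this would be collected alongside the existing time_ benchmarks.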

As before, is there currently anything similar anywhere in SymPy? Should PRs for these sorts of benchmarks be contributed to the sympy/sympy_benchmarks repository using airspeed velocity's track or as regression tests in the sympy/sympy repository? Does anyone have any differing opinions about how and where these should be implemented, or the value of this type of benchmark?

Sam

Jason Moore

Sep 29, 2022, 2:25:35 AM
to sy...@googlegroups.com
Hi Sam,

We used to have benchmarks (and maybe still do) in the main sympy repo, but these were essentially never run. We were working on transferring them to the sympy_benchmarks repo. After that repo was created, Bjorn, Aaron, and I used to run it on every commit and publish the web output using our own dedicated machines, but I don't think that happens anymore. More recently, Oscar connected it up to run on pairs of commits and output those results to new PRs.

Benchmarks that can work with airspeed velocity should go in the sympy_benchmarks repo. But your type 2 does fit nicely in the unit tests, and we have a handful of those in the main sympy repo. The key thing is that they end up being run by CI and that people see them and hopefully don't ignore regressions. If your type 2 benchmarks live in the main sympy repo as unit tests, a regression fails the test suite and can't be ignored. The airspeed velocity results in the PRs can more easily be ignored.

If it makes sense to add pytest-benchmark you can, but you'll have to get the machinery running in CI. Note that I don't think we actually use pytest yet (still an old fork of it).

Jason

Aaron Meurer

Sep 29, 2022, 6:33:35 PM
to sy...@googlegroups.com
On Wed, Sep 28, 2022 at 2:04 PM Sam Brockie <sbro...@g-tudelft.nl> wrote:
>
> Hi All,
>
> I'd like to begin adding some additional benchmarks to SymPy to help inform the code generation work that I'm doing as part of the CZI grant.
>
> I'm aware of the benchmarks in the benchmarks repository. My understanding is that these are run using airspeed velocity as part of the CI, and track how the performance of a particular benchmark has changed relative to the most recent SymPy release and the master branch.
>
> There are two other types of benchmark that I think might be useful:
>
> 1. Comparison of multiple ways to do equivalent computations
>
> Below is a contrived example in which there are two functions, option_1 and option_2, that produce the same result but have different implementations.
>
> >>> from sympy import Matrix, symbols
> >>>
> >>> def option_1(a, b):
> ...     return Matrix([a+b, a*b]).jacobian(Matrix([a, b]))
> ...
> >>> def option_2(a, b):
> ...     return Matrix([[(a+b).diff(a), (a+b).diff(b)],
> ...                    [(a*b).diff(a), (a*b).diff(b)]])
> ...
> >>> a, b = symbols("a, b")
> >>> option_1(a, b) == option_2(a, b)
> True
>
> A benchmark in this case would time the execution of both option_1 and option_2 (for a range of inputs), compare the relative speeds, and report the differences. As this type of benchmark is not comparing the same benchmark across different SymPy versions, I believe that airspeed velocity may not be the best tool to use here.
>
> I see this type of benchmark as being useful for: (1) determining which algorithm to use when implementing a new function or refactoring an existing function; and (2) ensuring that an implementation remains superior to alternatives as changes are made elsewhere in SymPy.
>
> I have had success in the past implementing these sorts of benchmarks using pytest-benchmark. Is there currently anything similar anywhere in SymPy? Would the sympy/sympy_benchmarks repository be the best place to contribute PRs for these sorts of benchmarks? Does anyone have any differing opinions about how and where these should be implemented, or the value of this type of benchmark?

For now let's just add these to the benchmarks repo
https://github.com/sympy/sympy_benchmarks. asv is currently limited in
what it is able to do, but we shouldn't let that stop us from writing
useful benchmarks. The important thing is to write the benchmark down,
in a way that it can at least be run in some capacity. Better tooling
around it, CI, etc. can come later.

There have been some recent discussions about improving it and other
benchmarking tooling among some other projects in the ecosystem, and
I'll make sure to keep you involved in the conversations.

The reason we have a separate benchmarking repo is that it makes it
easier to run benchmarks across different versions of SymPy. Also,
unlike tests, it doesn't really make sense to ship benchmarks with the
SymPy releases.

>
> 2. Measurement of non-time metrics
>
> Below is another contrived example in which common subexpression elimination is used on an expression, y, and it is shown that the result of cse(y) involves fewer operations than the original expression.
>
> >>> from sympy import count_ops, cse, exp, sin, symbols
> >>>
> >>> a, b = symbols("a, b")
> >>> y = (sin(a/b) + (a/b) - exp(b)) * ((a/b) - exp(b))
> >>>
> >>> count_ops(y)
> 10
> >>> count_ops(cse(y))
> 6
>
> A benchmark in this case would count the number of operations in the return value from cse(y) and compare this to 6. Assuming that the implementation of the cse function has been changed, if the number of operations is six then we know that its performance hasn’t been changed by the refactor. If the count is greater than six a regression has taken place. If the count is less than six the performance of the function has been improved. Benchmarking for a range of inputs would obviously be required.
>
> I see this type of benchmark as being useful for: (1) measuring SymPy’s performance in instances where timing code snippets isn’t necessarily the best, or only valuable, indicator of performance; and (2) ensuring regressions haven’t occurred during refactoring.
>
> I believe this type of benchmark can be implemented using airspeed velocity’s track prefix. Or perhaps this type of benchmark would be best implemented as regression tests in the sympy/sympy repository’s test suite, comparing the non-time metrics to hard-coded values.

I would say both things are useful. The main benefit of having it in
the asv benchmarks is that we can see how things changed over time,
whereas having it in the test suite prevents regressions.

One thing I would say for this sort of thing in asv is that it's
possible that the implementation of count_ops itself might change or
have changed. So it might be a good idea to write a simple version of
count_ops just for use in the benchmark.
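
One possible shape for such a benchmark-local counter is sketched below. The name simple_count_ops is hypothetical, and the counting rule (one operation per non-atomic node in the expression tree) is an assumption chosen for simplicity. It will not give the same numbers as SymPy's count_ops, but it depends only on how expressions are built from args.

```python
from sympy import Basic, cse, exp, sin, symbols


def simple_count_ops(obj):
    """Count the non-atomic nodes in one or more SymPy expression trees.

    Every operator or function node (Add, Mul, Pow, sin, ...) counts as one
    operation, however many arguments it has. Lists and tuples, such as the
    return value of cse, are summed over their items.
    """
    if isinstance(obj, (list, tuple)):
        return sum(simple_count_ops(item) for item in obj)
    if not isinstance(obj, Basic) or not obj.args:
        return 0  # symbols, numbers and non-SymPy objects
    return 1 + sum(simple_count_ops(arg) for arg in obj.args)


a, b = symbols("a, b")
y = (sin(a / b) + (a / b) - exp(b)) * ((a / b) - exp(b))

print("before cse:", simple_count_ops(y))
print("after cse: ", simple_count_ops(cse(y)))
```

Note that a/b is stored as Mul(a, Pow(b, -1)), so this counter sees two nodes where count_ops reports a single division.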

>
> As before, is there currently anything similar anywhere in SymPy? Should PRs for these sorts of benchmarks be contributed to the sympy/sympy_benchmarks repository using airspeed velocity's track or as regression tests in the sympy/sympy repository? Does anyone have any differing opinions about how and where these should be implemented, or the value of this type of benchmark?

I think it's valuable. We could do similar things with functions like
simplify(), and potentially even use it to track features being
implemented (e.g., how many of a suite of integrals is SymPy able to
compute across different versions). Again, asv is somewhat limited in
what it can do, but I'm hopeful that can be improved in the future.
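
A sketch of the integral-suite idea as an asv track_ benchmark. The four integrands, the helper is_solved, and the function name track_integrals_solved are all hypothetical. A real suite would be much larger and would need to include integrals that SymPy cannot yet compute.

```python
from sympy import Integral, cos, exp, integrate, sin, symbols

x = symbols("x")

# Hypothetical, deliberately tiny suite of integrands.
INTEGRANDS = [x**2, sin(x) * cos(x), x * exp(x), 1 / (1 + x**2)]


def is_solved(integrand):
    """Return True if integrate() produces a result with no unevaluated Integral."""
    try:
        result = integrate(integrand, x)
    except Exception:
        # A crash counts as a failure rather than aborting the whole suite.
        return False
    return not result.has(Integral)


def track_integrals_solved():
    """Number of suite integrals computed by the SymPy version under test."""
    return sum(is_solved(integrand) for integrand in INTEGRANDS)


# Label asv shows next to the tracked value.
track_integrals_solved.unit = "integrals"
```

Because the returned value is a count rather than a time, plotting it across versions would show features being gained or lost.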

Aaron Meurer

Aaron Meurer

Oct 6, 2022, 3:32:09 PM
to sy...@googlegroups.com
These discussions are happening publicly over at
https://github.com/airspeed-velocity/asv/issues/1219. I encourage
everyone here to join that discussion and note what you'd like to
see in the existing Python benchmarking tooling.

Aaron Meurer