Lambdify - performance using sympy as the module

sandona...@gmail.com
Sep 14, 2021, 6:12:40 PM
to sympy
Hello,
let's say I'd like to numerically evaluate a single SymPy function over an array, using sympy as the lambdify module. Curiously, passing in regular Python floats makes the evaluation much faster than passing in SymPy Float instances. I tried several SymPy functions, and they all follow this trend.

Why does this happen? Given that the lambdified function below always returns Float instances, I assumed the input was being sympified inside the SymPy function, which should make the evaluation with Python floats slower due to the conversion, not faster...

from sympy import *
import numpy as np
from mpmath import mpf

x = symbols("x")
func = sin(x)
# element-wise evaluation with SymPy itself as the numerical backend
f = np.frompyfunc(lambdify(x, func, "sympy"), 1, 1)
domain = np.linspace(0, 100, 1000)           # numpy float64
domain_mpf = [mpf(str(t)) for t in domain]   # mpmath mpf
domain_sympy = [Float(t) for t in domain]    # sympy Float

%timeit f(domain)
%timeit f(domain_mpf)
%timeit f(domain_sympy)

Aaron Meurer
Sep 14, 2021, 7:37:04 PM
to sympy
What are the results you get from these? For me the last one is
slightly faster, which makes sense because the inputs don't have to be
converted to SymPy Floats first.
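
As a rough sketch of what I mean (assuming the "sympy"-module lambdified function essentially calls sympy.sin directly):

from sympy import sin, Float

# a Python float is first sympified to a SymPy Float internally...
print(sin(0.5))         # 0.479425538604203, a SymPy Float
# ...whereas a Float argument can skip that conversion step
print(sin(Float(0.5)))  # same result, no float -> Float conversion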

Aaron Meurer


Davide Sandona'
Sep 15, 2021, 4:36:36 AM
to sy...@googlegroups.com
> What are the results you get from these? For me the last one is
> slightly faster, which makes sense because the inputs don't have to be
> converted to SymPy Floats first.

This chart summarizes my findings; each dot represents the evaluation time with the specified module and data type.
I suspect the results depend on the machine being used. This chart was generated on an old processor, so I'll
see what a modern processor produces.
[attachment: lambdify.png]

Davide.


Aaron Meurer
Sep 15, 2021, 5:22:37 AM
to sympy
On Wed, Sep 15, 2021 at 2:36 AM Davide Sandona' <sandona...@gmail.com> wrote:
> > What are the results you get from these? For me the last one is
> > slightly faster, which makes sense because the inputs don't have to be
> > converted to SymPy Floats first.
>
> This chart summarizes my findings; each dot represents the evaluation time with the specified module and data type.
> I suspect the results depend on the machine being used. This chart was generated on an old processor, so I'll
> see what a modern processor produces.
> [attachment: lambdify.png]

This seems consistent with what I would expect, although I'm a little surprised that mpmath is slower than SymPy.  

Aaron Meurer

Oscar Benjamin
Sep 15, 2021, 8:41:21 AM
to sympy
Running that here I get:

   ...: %timeit f(domain)
   ...: %timeit f(domain_mpf)
   ...: %timeit f(domain_sympy)
756 µs ± 58.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.67 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.46 ms ± 64.6 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

However, if I switch from %timeit to %time and run each in a separate process (closing and reopening isympy between each %time) I get:

   ...: %time ok = f(domain)
CPU times: user 920 ms, sys: 4 ms, total: 924 ms
Wall time: 921 ms

   ...: %time ok = f(domain_mpf)
CPU times: user 824 ms, sys: 8 ms, total: 832 ms
Wall time: 834 ms

   ...: %time ok = f(domain_sympy)
CPU times: user 856 ms, sys: 16 ms, total: 872 ms
Wall time: 877 ms

So now f(domain) is slowest where %timeit suggested that it was fastest. Also the time measured by %time is 2-3 orders of magnitude greater than the result from %timeit.

The reason that %time shows a much slower time than %timeit is that common operations in SymPy are cached. Since %timeit runs the statement in a loop and reports based on the fastest runs, it only ever measures the warm-cache time. Conversely, it never tells you the time to compute something *once*, which is in fact all that really matters in most situations.
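
A minimal sketch of this effect, using sympy.core.cache.clear_cache to force a cold start:

import time
from sympy import sin
from sympy.core.cache import clear_cache

clear_cache()  # start from a cold cache
t0 = time.perf_counter(); sin(0.5); cold = time.perf_counter() - t0
t0 = time.perf_counter(); sin(0.5); warm = time.perf_counter() - t0
# the second call is just a cache lookup, so it is far faster
print(f"cold: {cold:.2e} s, warm: {warm:.2e} s")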

The 3 millisecond timing difference that you are asking about here is dwarfed by the roughly 1 second that it actually takes to compute this result the first time. Most likely the time differences you see just come down to how efficient the cache lookups are for different types.

It is possible to run timings with the cache disabled (env var: SYMPY_USE_CACHE=no). However, SymPy is then often much slower, so those timings are not necessarily representative of the timings seen with the cache enabled, which is how SymPy is normally used.
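
For example (as far as I know the variable is read when sympy is first imported, so it has to be set before that):

import os
os.environ["SYMPY_USE_CACHE"] = "no"  # must be set before the first sympy import

import sympy  # sympy's cache decorator is now effectively a no-op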

By all means investigate timeit results when micro-optimising, but when you want to measure the real time it takes to do something, you need to use a fresh process and time the operation precisely once. Yes, that gives noisy results compared to timeit, but timeit eliminates the "noise" only to give a consistent yet meaningless measure. Alternatively, you can use %timeit with SYMPY_USE_CACHE=no, bearing in mind the caveat that SymPy is not normally used with that setting, so those results may be unrepresentative for different reasons.
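
A sketch of that fresh-process, measure-once approach, reusing the code from the first message:

import subprocess, sys, textwrap

# each run gets a brand-new interpreter, so the cache is always cold
script = textwrap.dedent("""
    import time
    import numpy as np
    from sympy import symbols, sin, lambdify

    x = symbols("x")
    f = np.frompyfunc(lambdify(x, sin(x), "sympy"), 1, 1)
    domain = np.linspace(0, 100, 1000)

    t0 = time.perf_counter()
    f(domain)
    print(f"cold-cache time: {time.perf_counter() - t0:.3f} s")
""")
subprocess.run([sys.executable, "-c", script])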

--
Oscar

Oscar Benjamin
Sep 15, 2021, 9:14:25 AM
to sympy
On Wed, 15 Sept 2021 at 13:41, Oscar Benjamin <oscar.j....@gmail.com> wrote:
> On Tue, 14 Sept 2021 at 23:12, sandona...@gmail.com <sandona...@gmail.com> wrote:
> > Hello,
> > let's say I'd like to numerically evaluate a single SymPy function over an array, using sympy as the lambdify module. Curiously, passing in regular Python floats makes the evaluation much faster than passing in SymPy Float instances. I tried several SymPy functions, and they all follow this trend.
>
> The 3 millisecond timing difference that you are asking about here is dwarfed by the roughly 1 second that it actually takes to compute this result the first time. Most likely the time differences you see just come down to how efficient the cache lookups are for different types.

If you want to see what is taking the time then use a profiler. From isympy/IPython you can use %prun, or if you want to run separate processes you can use python -m cProfile, but the latter also times how long it takes to import sympy, so it's only useful for things that take at least several seconds.
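
For reference, the command-line form looks like this (the script name here is just a placeholder):

python -m cProfile -s cumulative my_timing_script.py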

In the first run the time is taken by sin.eval and most of that by the extract_multiplicatively function. 

In [2]: %prun -s cumulative f(domain)

         2069604 function calls (2047615 primitive calls) in 1.585 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.585    1.585 {built-in method builtins.exec}
        1    0.001    0.001    1.585    1.585 <string>:1(<module>)
     1000    0.002    0.000    1.584    0.002 <lambdifygenerated-1>:1(_lambdifygenerated)
16989/1000    0.022    0.000    1.582    0.002 cache.py:69(wrapper)
     1000    0.014    0.000    1.579    0.002 function.py:450(__new__)
     1000    0.013    0.000    1.392    0.001 function.py:270(__new__)
     1000    0.012    0.000    1.270    0.001 trigonometric.py:266(eval)
     2997    0.029    0.000    0.988    0.000 expr.py:2183(extract_multiplicatively)
     7993    0.020    0.000    0.835    0.000 assumptions.py:460(getit)
     8993    0.336    0.000    0.722    0.000 facts.py:499(deduce_all_facts)
6993/2997    0.007    0.000    0.529    0.000 decorators.py:88(__sympifyit_wrapper)
     2997    0.010    0.000    0.525    0.000 numbers.py:1315(__truediv__)
      999    0.005    0.000    0.511    0.001 expr.py:2453(could_extract_minus_sign)
      999    0.001    0.000    0.505    0.001 expr.py:1662(as_coefficient)
      999    0.001    0.000    0.480    0.000 numbers.py:759(__truediv__)
      999    0.001    0.000    0.478    0.000 decorators.py:254(_func)
      999    0.001    0.000    0.476    0.000 decorators.py:129(binary_op_wrapper)
      999    0.003    0.000    0.474    0.000 expr.py:260(__truediv__)
     4996    0.015    0.000    0.459    0.000 assumptions.py:472(_ask)
      999    0.015    0.000    0.457    0.000 operations.py:46(__new__)
      999    0.037    0.000    0.408    0.000 mul.py:178(flatten)
     3997    0.005    0.000    0.357    0.000 assumptions.py:444(copy)
     3997    0.014    0.000    0.352    0.000 assumptions.py:432(__init__)
   198823    0.074    0.000    0.254    0.000 {built-in method builtins.all}
   570492    0.158    0.000    0.209    0.000 facts.py:533(<genexpr>)
     1999    0.032    0.000    0.126    0.000 sets.py:1774(__new__)
   183867    0.083    0.000    0.083    0.000 facts.py:482(_tell)
      999    0.007    0.000    0.078    0.000 evalf.py:1425(evalf)
    15987    0.023    0.000    0.077    0.000 sympify.py:92(sympify)
     1000    0.003    0.000    0.066    0.000 function.py:214(nargs)
     7996    0.026    0.000    0.065    0.000 compatibility.py:501(ordered)
     6994    0.015    0.000    0.063    0.000 numbers.py:1197(_new)
    29977    0.020    0.000    0.057    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
   391655    0.054    0.000    0.054    0.000 {method 'get' of 'dict' objects}
 1998/999    0.007    0.000    0.053    0.000 evalf.py:1332(evalf)
     1998    0.053    0.000    0.053    0.000 mul.py:449(_gather)
      999    0.005    0.000    0.046    0.000 evalf.py:781(evalf_trig)
   ...

Now a second run in the same process just shows 1000 cache lookups:

In [3]: %prun -s cumulative f(domain)

         2003 function calls in 0.002 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
        1    0.001    0.001    0.002    0.002 <string>:1(<module>)
     1000    0.001    0.000    0.001    0.000 <lambdifygenerated-1>:1(_lambdifygenerated)
     1000    0.001    0.000    0.001    0.000 cache.py:69(wrapper)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This is the second run when the inputs are Expr:

In [5]: %prun -s cumulative f(domain_sympy)
         5003 function calls in 0.014 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.014    0.014 {built-in method builtins.exec}
        1    0.007    0.007    0.014    0.014 <string>:1(<module>)
     1000    0.002    0.000    0.008    0.000 <lambdifygenerated-1>:1(_lambdifygenerated)
     1000    0.002    0.000    0.006    0.000 cache.py:69(wrapper)
     1000    0.001    0.000    0.004    0.000 numbers.py:1480(__hash__)
     1000    0.001    0.000    0.002    0.000 numbers.py:806(__hash__)
     1000    0.001    0.000    0.001    0.000 expr.py:126(__hash__)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The cache lookups here are slower because the pure-Python Expr.__hash__ methods are slower than numpy's float64.__hash__, which is implemented in C.
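
A quick micro-benchmark to isolate just the hashing cost (a sketch; exact numbers will vary by machine):

import timeit

setup = "import numpy as np; from sympy import Float; a = np.float64(1.5); b = Float(1.5)"
# float64.__hash__ is C-level; Float's hash goes through pure-Python Expr machinery
print(timeit.timeit("hash(a)", setup=setup))  # fast
print(timeit.timeit("hash(b)", setup=setup))  # noticeably slower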

Either way, this shows that the differences are all just to do with caching and are not representative of the actual time that it would take to compute these results once. The bulk of the time in the first run is ultimately consumed by deduce_all_facts, i.e. the old assumptions. Note that the old assumptions are themselves cached on each Basic instance, separately from the main cache, which can also affect timing results if you don't use a separate process for timings:

In [1]: e = sin(1)

In [2]: e._assumptions
Out[2]: {}

In [3]: e.is_real
Out[3]: True

In [4]: e._assumptions
Out[4]:
{'real': True,
 'positive': True,
 'finite': True,
 'infinite': False,
 'extended_positive': True,
 'extended_real': True,
 'commutative': True,
 'imaginary': False,
 'hermitian': True,
 'complex': True,
 'extended_nonnegative': True,
 'nonpositive': False,
 'negative': False,
 'extended_nonpositive': False,
 'extended_nonzero': True,
 'extended_negative': False,
 'zero': False,
 'nonnegative': True,
 'nonzero': True}

The deduce_all_facts function is the one that applies all of the implications relating these different assumption predicates, so that as soon as one predicate (e.g. positive=True) is known, all of the others can be stored in the _assumptions dict. Although this makes things faster on a second assumptions query, it in fact consumes the bulk of the time for computing many things in SymPy.

Assumptions queries that take place during automatic evaluation are often a cause of slowness in SymPy. The bottom of the profile report shows evalf being called 2000 times, i.e. twice per evaluation of the sin function. The evalf calls are part of the assumptions query and are reasonably fast for a simple sin(float), but can be very expensive for large expressions.


--
Oscar

Davide Sandona'
Sep 15, 2021, 9:19:53 AM
to sy...@googlegroups.com
Thanks Oscar for the wonderful clarification!

I reran my code with SYMPY_USE_CACHE=no, and now the results make much more sense.

[attachment: lambdify-no-cache.png]

Davide.



Aaron Meurer
Sep 16, 2021, 5:58:46 PM
to sympy
What is the difference between "numpy" and "scipy" in your graph? I wouldn't expect scipy to make any difference for sin or cos, which are NumPy functions.

Aaron Meurer

Aaron Meurer
Sep 16, 2021, 6:10:56 PM
to sympy
I suppose you might be using f = lambdify(x, sin(x), 'scipy'). This produces a function that calls scipy.sin instead of numpy.sin. However, scipy.sin is just a deprecated wrapper for np.sin, which would explain why it is a little slower. 

I'm a little unclear why it doesn't raise a deprecation warning when you call it in the lambdified function, but you can confirm that it is indeed using scipy.sin by looking at f.__globals__['sin']. In any case, you should generally not just use 'scipy' by itself, but rather ['numpy', 'scipy']. Actually lambdify already includes scipy by default, so there's no need to specify either. See https://github.com/sympy/sympy/issues/20294.
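
A short sketch of that check:

from sympy import symbols, sin, lambdify

x = symbols("x")
f = lambdify(x, sin(x), "scipy")
print(f.__globals__["sin"])  # which 'sin' the generated code actually calls

g = lambdify(x, sin(x))      # default modules: scipy/numpy are picked up automatically
print(g.__globals__["sin"])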

Aaron Meurer