On Sat, 19 Aug 2023 at 22:52, David Bailey <da...@dbailey.co.uk> wrote:
>
> On 16/08/2023 16:13, Oscar Benjamin wrote:
> > Hi all,
> >
> > I have written a blog post about recent work on SymPy and what I think
> > should be future development plans:
> >
> > https://oscarbenjamin.github.io/blog/czi/post1.html
> >
> > This is the first in a series of posts and in this one I describe what
> > I think are the core parts of SymPy and outline from a high level what
> > I think should be done going forwards. Subsequent posts will describe
> > specific components in more detail.
> >
> > Oscar
> >
> I do wonder slightly if pushing for higher speed in symbolic operations
> is worth it if it causes a lot of disruption, or distracts from other
> work that might be more useful.
I think it is worth noting that most of the changes I am proposing are
good ideas for reasons beyond just speed. I have to emphasise speed
because I have promised to improve it and need to write about that,
but the same changes also reduce bugs, make the codebase easier to
maintain, and add features that users want.
> Symbolic calculations are wonderful in the right circumstances, however,
> the cost of symbolic calculations grows pretty steeply with the size of
> the problem.
>
> Everything becomes clumsier, for example fractional coefficients get
> much more complicated after a few mathematical steps.
What you say is true but...
> That means that multiplying the raw computational speed by 10 (say) may
> only expand the pool of problems that can be solved in a reasonable time
> very slightly.
I think that you greatly underestimate the scale of the problem of
some things in SymPy being *much* slower than they could be. Here is a
simple example.
With SymPy 1.12 I will time computing the inverse of a matrix of
integers (note that I have gmpy2 installed in all cases, which does
affect the timings).
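As an aside, you can check which integer ground types SymPy is using.
This relies on an internal module (sympy.external.gmpy), so treat it as
a sketch that may differ between versions:

from sympy.external.gmpy import GROUND_TYPES
print(GROUND_TYPES)  # expected to print 'gmpy' when gmpy2 is installed, 'python' otherwise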
The randMatrix function creates a matrix of a given size having
integer entries from 0 to 99:
In [3]: print(repr(randMatrix(4)))
Matrix([
[ 4, 61, 23, 16],
[88, 22, 34, 57],
[84, 77, 47, 60],
[74, 33, 66, 32]])
Let's time with random matrices of size 1x1, 2x2, 5x5 etc up to 18x18
(Wall time is the thing to look at):
In [5]: for n in [1,2,5,10,15,16,17,18]:
   ...:     M = randMatrix(n)
   ...:     print(f'\ntime for inverse of {n}x{n} matrix of small integers (SymPy 1.12)')
   ...:     %time ok = M.inv()
   ...:
time for inverse of 1x1 matrix of small integers (SymPy 1.12)
CPU times: user 4.12 ms, sys: 17 µs, total: 4.13 ms
Wall time: 4.16 ms
time for inverse of 2x2 matrix of small integers (SymPy 1.12)
CPU times: user 3.7 ms, sys: 0 ns, total: 3.7 ms
Wall time: 3.72 ms
time for inverse of 5x5 matrix of small integers (SymPy 1.12)
CPU times: user 25.2 ms, sys: 0 ns, total: 25.2 ms
Wall time: 25 ms
time for inverse of 10x10 matrix of small integers (SymPy 1.12)
CPU times: user 60.3 ms, sys: 0 ns, total: 60.3 ms
Wall time: 60.1 ms
time for inverse of 15x15 matrix of small integers (SymPy 1.12)
CPU times: user 2.29 s, sys: 13.3 ms, total: 2.3 s
Wall time: 2.3 s
time for inverse of 16x16 matrix of small integers (SymPy 1.12)
CPU times: user 7.36 s, sys: 10 ms, total: 7.37 s
Wall time: 7.53 s
time for inverse of 17x17 matrix of small integers (SymPy 1.12)
CPU times: user 17.9 s, sys: 33.7 ms, total: 18 s
Wall time: 18 s
time for inverse of 18x18 matrix of small integers (SymPy 1.12)
CPU times: user 1min 30s, sys: 117 ms, total: 1min 30s
Wall time: 1min 30s
There is no way that it should be that slow. The asymptotic complexity
is also all wrong: the growth is exponential or worse.
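As a rough check of that claim (this is just back-of-the-envelope
arithmetic on the wall times shown above, not part of the timing run):

# ratios of consecutive wall times from the SymPy 1.12 run above (seconds)
times = {15: 2.3, 16: 7.53, 17: 18.0, 18: 90.0}
sizes = sorted(times)
for a, b in zip(sizes, sizes[1:]):
    print(f'{a}x{a} -> {b}x{b}: {times[b] / times[a]:.1f}x slower')

Each extra row and column multiplies the time by roughly 2.5-5x, which
is what you would expect from exponential (or worse) growth.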
With the current SymPy master branch it is:
In [24]: for n in [1,2,5,10,15,16,17,18]:
    ...:     M = randMatrix(n)
    ...:     print(f'\ntime for inverse of {n}x{n} matrix of small integers (master)')
    ...:     %time ok = M.inv()
    ...:
time for inverse of 1x1 matrix of small integers (master)
CPU times: user 34.8 ms, sys: 0 ns, total: 34.8 ms
Wall time: 34.4 ms
time for inverse of 2x2 matrix of small integers (master)
CPU times: user 1.58 ms, sys: 0 ns, total: 1.58 ms
Wall time: 1.59 ms
time for inverse of 5x5 matrix of small integers (master)
CPU times: user 3.15 ms, sys: 0 ns, total: 3.15 ms
Wall time: 3.16 ms
time for inverse of 10x10 matrix of small integers (master)
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 16 ms
time for inverse of 15x15 matrix of small integers (master)
CPU times: user 21.7 ms, sys: 0 ns, total: 21.7 ms
Wall time: 21.7 ms
time for inverse of 16x16 matrix of small integers (master)
CPU times: user 21.2 ms, sys: 0 ns, total: 21.2 ms
Wall time: 21.2 ms
time for inverse of 17x17 matrix of small integers (master)
CPU times: user 18.8 ms, sys: 0 ns, total: 18.8 ms
Wall time: 18.8 ms
time for inverse of 18x18 matrix of small integers (master)
CPU times: user 22.9 ms, sys: 0 ns, total: 22.9 ms
Wall time: 22.9 ms
For an 18x18 matrix master is about 4000x faster than SymPy 1.12, and
this ratio increases as you go to larger sizes. With SymPy 1.12 I have
to wait 1.5 minutes for an 18x18 matrix; with current master I can
invert a 300x300 matrix in that time:
In [6]: %time ok = randMatrix(300).inv()
CPU times: user 1min 29s, sys: 136 ms, total: 1min 29s
Wall time: 1min 29s
I estimate that with SymPy 1.12 the complexity of this operation is
something like O(3^n) and so the time to invert a 300x300 matrix would
be something like:
In [9]: 90*(3**(300-18)) # units of seconds
Out[9]:
318006751451727010731725715838119971940403301857387549847246646448097631356170140331478920672468737
83545669933976619042089991647160696810
That's about 1e120 times the age of the universe.
So no, some things really are very slow and can be made a lot faster.
That also means a significant increase in feasible problem size (e.g.
from 18x18 to 300x300 in this example). The particular operation shown
here can be made faster still, but hopefully it gives a sense of the
magnitude of the speedups that are possible.
Another example is this:
https://github.com/sympy/sympy/issues/19920
In that issue SymPy is essentially unable to compute the inverse of a
3x3 matrix with simple symbolic expressions as the entries:
In [19]: from sympy import symbols, Matrix
...:
...: a, b, c, d = symbols('a b c d')
...:
...: Z = Matrix([
...: [1/b + 1/a, -1/b, 0],
...: [ -1/b, 1/c + 1/b, -1/c],
...: [ 0, -1/c, 1/d + 1/c]])
...:
...: %time Zinv = Z.inv()
^C---------------------------------------------------------------------------
KeyboardInterrupt
I left that running for 5 minutes before giving up. It was previously
faster in older SymPy versions but then slowed down because of a
change in algorithm. The change of algorithm was motivated by speeding
up some other things though. The problem is that in the current
implementation it is difficult to choose the best algorithm because we
do not have any concept like "domains" that we can use to classify
arbitrary expressions and then consider which algorithm to use.
There is now a faster matrix class called DomainMatrix that does
exactly this: it uses domains to classify and handle different kinds
of expressions.
With current master you can create this matrix with the to_DM()
method:
In [20]: dZ = Z.to_DM()
In [21]: dZ
Out[21]: DomainMatrix({0: {0: (a + b)/(a*b), 1: -1/b},
                       1: {0: -1/b, 1: (b + c)/(b*c), 2: -1/c},
                       2: {1: -1/c, 2: (c + d)/(c*d)}}, (3, 3), ZZ(a,b,c,d))
The important part to notice is the domain ZZ(a,b,c,d), which says
that the entries are treated as rational functions (ratios of
polynomials) in the four symbols a, b, c and d. We can convert a
matrix to this format, compute the result, and convert back to an
ordinary Matrix:
In [22]: %time Zinv = Z.to_DM().inv().to_Matrix()
CPU times: user 75.5 ms, sys: 0 ns, total: 75.5 ms
Wall time: 73 ms
In [23]: print(repr(Zinv))
Matrix([
[(a*b + a*c + a*d)/(a + b + c + d),              (a*c + a*d)/(a + b + c + d),              a*d/(a + b + c + d)],
[      (a*c + a*d)/(a + b + c + d), (a*c + a*d + b*c + b*d)/(a + b + c + d),      (a*d + b*d)/(a + b + c + d)],
[              a*d/(a + b + c + d),              (a*d + b*d)/(a + b + c + d), (a*d + b*d + c*d)/(a + b + c + d)]])
Now the time goes from an hour to about 70 milliseconds, and this can
easily be made faster still. Currently on master this faster class is
not used by default in this case, although it is in the integer case
above. What is needed is to enable it for more cases.
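In the meantime the same to_DM() workaround can be applied by hand to
other operations. As a sketch (this assumes the DomainMatrix.det
method and the domain's to_sympy conversion; the exact API may differ
between versions):

dZ = Z.to_DM()                    # DomainMatrix over ZZ(a,b,c,d), as above
d = dZ.det()                      # determinant computed as a domain element
det_expr = dZ.domain.to_sympy(d)  # convert back to an ordinary SymPy expression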
> There is also the problem that complicated symbolic results are often
> not very useful. For example the solutions of a quadratic equation are
> well known and very useful. However, the corresponding solutions for
> cubic or quartic equations are really only of theoretical interest. It
> is even possible to get solutions for quintic equations (so I am told)
> by using theta functions. My point is, that these are of no practical use.
No, but we can still represent the roots of higher-degree polynomials
as RootOf:
In [15]: M = randMatrix(5)
In [16]: print(M.eigenvals())
{CRootOf(x**5 - 287*x**4 + 8804*x**3 + 352285*x**2 - 12562287*x + 173719704, 0): 1,
 CRootOf(x**5 - 287*x**4 + 8804*x**3 + 352285*x**2 - 12562287*x + 173719704, 1): 1,
 CRootOf(x**5 - 287*x**4 + 8804*x**3 + 352285*x**2 - 12562287*x + 173719704, 2): 1,
 CRootOf(x**5 - 287*x**4 + 8804*x**3 + 352285*x**2 - 12562287*x + 173719704, 3): 1,
 CRootOf(x**5 - 287*x**4 + 8804*x**3 + 352285*x**2 - 12562287*x + 173719704, 4): 1}
Knowing the minimal polynomial of each eigenvalue is sufficient (in
the computational algebra subsystem) to compute all of the
eigenvectors and do many more things without needing any quintic
formula.
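As a small illustration (separate from the matrix above), a RootOf is
a perfectly usable value: it can be evaluated numerically to arbitrary
precision and it knows its defining polynomial exactly, so the absence
of a radical formula is not a practical obstacle:

from sympy import CRootOf, Symbol

x = Symbol('x')
r = CRootOf(x**5 - x - 1, 0)  # the unique real root of x**5 - x - 1
print(r.evalf(30))            # arbitrary-precision numeric value (about 1.1673)
print(r.is_real)              # True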
At some point I will write something to explain why even the quadratic
formula should be avoided.
> Do you have some motivational examples of what can be achieved by moving
> from SymPy to SymEngine? (I just use SymEngine for comparison purposes)
As far as I know the advantages of using SymEngine are only about
speed: SymEngine does not have as many features as SymPy, but it has
many of the "core" parts and it is *much* faster for those.
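For comparison, here is a minimal sketch of what using SymEngine for a
core operation looks like from Python (this assumes the symengine
bindings are installed; the conversion back to SymPy may vary between
versions):

import symengine as se
from sympy import sympify

x, y = se.symbols('x y')
expr = se.expand((x + y)**10)  # core symbolic operation done by SymEngine
sympy_expr = sympify(expr)     # SymEngine expressions can typically be converted back to SymPy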
There are also reasons beyond speed for moving SymPy to a new design
of symbolic subsystem, although such a redesign would make things a
lot faster as well. I will explain more later.
If you are concerned about the time spent working on this, understand
that for many of these things much of the hard work has already been
done. There is already a faster implementation of e.g. linear algebra
(DomainMatrix) in SymPy, and most of the difficult code for it has
been written, but it is not yet used by default. Users can call it
directly if they want to (via to_DM()), but most users really want to
call high-level functions like solve, so what needs to be done is to
flip a few switches so that the fast code is used by default instead
of the slow code. Then things will be faster automatically for end
users. Here are three pull requests that show examples of this:
https://github.com/sympy/sympy/pull/25458 (charpoly)
https://github.com/sympy/sympy/pull/25443 (rref for ZZ/QQ)
https://github.com/sympy/sympy/pull/25452 (inv)
It might not be obvious when looking at those pull requests, but in
each case I had to limit the scope and argue carefully about what
operations may or may not be affected by the more limited change I was
proposing. I have to do this so that other SymPy developers can
understand and hopefully merge the changes, but actually it would be
better not to limit the scope in this way, because it means that we
are not improving all of the things that could be improved.
The problem, though, is that enabling the faster code fully would
change the form of the output in some cases and would "break" some
test cases in the test suite. People reviewing a pull request might
object to e.g. test cases being changed, but this is actually what
needs to happen. I have so far avoided the more disruptive changes
because I just wanted to get some pull requests merged that showed
significant speedups, so that at least some effect of the changes
would be visible.
To get other SymPy contributors on board with these changes I need to
ensure that they understand what the changes are and what their impact
is, which is partly why I am writing these blog posts. Hopefully, if
more developers have a good understanding of these things, it will be
easier for them to review the changes, and at least people will have a
better understanding of the reasons for them. Also, for some things it
is actually better if users understand how to use them directly rather
than expecting SymPy itself to figure out the best algorithm for them.
--
Oscar