Questions about the goals of CSymPy

Aaron Meurer

unread,
Apr 22, 2014, 12:13:02 PM4/22/14
to sy...@googlegroups.com
I have some high level questions about CSymPy.

- What are the goals of the project?

- What are the things that should definitely go in CSymPy?

- What are the things that should definitely not go in CSymPy?

- How will CSymPy be architected to allow things to happen in CSymPy
when they can, but fall back to SymPy when they cannot?

My main concern here is that CSymPy has no clear separation from
SymPy, and as a result it will end up growing larger and larger, until
it becomes an independent CAS (which is fine if that's the goal, but
my understanding was that it was supposed to be just a small fast
core).

In particular, if there is some feature of SymPy functions, how will
CSymPy be architected so that it can take advantage of it without
having to completely reimplement that function in C++?

For instance, a current goal of CSymPy is to implement trig functions.
But this can be quite complicated if you consider all the different
things you can do with trig functions. Without even thinking about
trig simplification, there are complicated evaluation issues (e.g.,
consider sin(pi/7).rewrite(sqrt) in SymPy). It would be a shame to
reimplement all this logic twice, especially when it is not needed for
performance.

Finally, I noticed that the CSymPy Python API doesn't adhere to SymPy
standards, e.g., it doesn't use .args. I think a first step in this
direction would be to make all CSymPy objects completely compatible
with SymPy. Probably a good idea would be to mixin Basic to all CSymPy
classes (is this possible?).
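
For reference, the kind of interface I mean is just the standard Basic
one (plain SymPy shown below; the argument order in the printout may vary):

from sympy import symbols
x, y = symbols("x y")
expr = x + 2*y
print expr.args                      # the subexpressions, e.g. (x, 2*y)
print expr.func(*expr.args) == expr  # True: the standard rebuild invariant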

Aaron Meurer

Ondřej Čertík

unread,
Apr 22, 2014, 1:05:15 PM4/22/14
to sympy
Hi Aaron,

Those are good questions. Here are the answers:

On Tue, Apr 22, 2014 at 10:13 AM, Aaron Meurer <asme...@gmail.com> wrote:
> I have some high level questions about CSymPy.
>
> - What are the goals of the project?

The goals of the project are:

* Be the fastest symbolic manipulation library, compared to other codes,
commercial or open source (Sage, GiNaC, Mathematica, ...).

* Extension/complement to SymPy

* If the above two goals allow, be able to also call it from other
languages easily and efficiently (Julia, Ruby, Mathematica, ...)

As to the technical solution: the core should be a C++ library, which can
depend on other compiled libraries if needed.
The core should not depend on Python or Ruby or Julia, but rather be
just one language, C++. That lowers the barrier
of entry significantly, compared to a big mix of C++, Cython and
Python, and makes it easier to make things fast
(you don't need to worry about Python at all). The Python (and other
languages) wrappers should be just thin
wrappers around the C++ core (=just better syntax).

There might be other technical solutions to this, but I know that I
can deliver the above goals with this solution
(and I failed to deliver with other solutions, like writing the core
in Cython). So that's why we do it this way.

Also, by being "just a C++ library", other people can use it in their
projects. I hope to get the interest of a much broader
community that way, which can contribute back (somebody will need fast
symbolic manipulation in Julia, so they
can just use CSymPy with Julia wrappers, and contribute improvements back).

>
> - What are the things that should definitely go in CSymPy?

At the moment: all things to make specific applications fast, in
particular PyDy. For that, it needs basic
manipulation, differentiation, series expansion (I think) and
matrices. That's all roughly either done, or
on the way. Of course, lots of polishing is needed.

>
> - What are the things that should definitely not go in CSymPy?

Things that don't need to be fast. Things like limits. Also things
that are in SymPy, where CSymPy can
be used as a drop-in replacement for the engine: PyDy, some stuff in
physics, and so on. There is no need
to rewrite PyDy in C++. Also, most users' code would stay in Python.
They can just optionally change
to CSymPy for some intensive calculation, then finish the thing with SymPy.

>
> - How will CSymPy be architected to allow things to happen in CSymPy
> when they can, but fall back to SymPy when they cannot?

Currently you can simply mix and match SymPy and CSymPy expressions.
So you simply
convert an expression to SymPy to do some advanced manipulation, and
convert to CSymPy
to do some fast manipulation. I am open to suggestions how to improve this.
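
Schematically, the workflow looks like the sketch below. The to_sympy()
helper is just a placeholder (it goes through strings and assumes the
printed form is parseable); what the real converters should look like is
exactly what the issue linked further down is about.

import sympy
import csympy

def to_sympy(e):                   # placeholder converter, not a real API
    return sympy.sympify(str(e))   # assumes str(e) parses on the SymPy side

x = csympy.Symbol("x")
fast = (x + 1)*(x - 1)             # cheap, heavy manipulation stays in CSymPy
expr = sympy.simplify(to_sympy(fast))   # hop to SymPy for advanced features
# ...and convert back with a to_csympy() counterpart when speed matters again.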

>
> My main concern here is that CSymPy has no clear separation from
> SymPy, and as a result it will end up growing larger and larger, until
> it becomes an independent CAS (which is fine if that's the goal, but
> my understanding was that it was supposed to be just a small fast
> core).

The goals are written above. I am myself concentrating on speed,
that's what I really
want to nail down. And then enough features so that it's useful for
all the people who
found SymPy slow. However, let's say somebody comes and reimplements the Gruntz
algorithm in CSymPy. Should we reject such a PR? My answer is that if
the code is nice,
maintainable, much faster than SymPy and has the same or similar
features, I am ok
with merging it. If the code is a mess, then not. But as I said, I am
spending my own
time on things which people need, and faster limits don't seem to be it.

>
> In particular, if there is some feature of SymPy functions, how will
> CSymPy be architected so that it can take advantage of it without
> having to completely reimplement that function in C++?

You can convert any expression back and forth, so you keep it in SymPy
if you want to have some particular feature. See also the conversation
and specific examples here:

https://github.com/sympy/csympy/issues/153

>
> For instance, a current goal of CSymPy is to implement trig functions.
> But this can be quite complicated if you consider all the different
> things you can do with trig functions. Without even thinking about
> trig simplification, there are complicated evaluation issues (e.g.,
> consider sin(pi/7).rewrite(sqrt) in SymPy). It would be a shame to
> reimplement all this logic twice, especially when it is not needed for
> performance.

Agreed. On the other hand, we really need very fast trig functions.
The functionality
that we need is simplifications like sin(2*pi) -> 0, differentiation
and series expansion.
This is needed for applications like PyDy.
Those are not difficult to implement. Things which get tricky are then
the advanced
simplifications and rewriting. But those I would do in SymPy, because
as you said,
those are typically not needed for performance.
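
To make the target concrete, this is the kind of behavior meant (shown
in SymPy; CSymPy needs the equivalent, just much faster):

from sympy import Symbol, sin, pi
x = Symbol("x")
print sin(2*pi)               # 0, automatic evaluation at special values
print sin(x).diff(x)          # cos(x)
print sin(x).series(x, 0, 6)  # x - x**3/6 + x**5/120 + O(x**6)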

>
> Finally, I noticed that the CSymPy Python API doesn't adhere to SymPy
> standards, e.g., it doesn't use .args. I think a first step in this
> direction would be to make all CSymPy objects completely compatible
> with SymPy. Probably a good idea would be to mixin Basic to all CSymPy
> classes (is this possible?).

The .args are not implemented yet in the Python wrappers, so I just
created an issue for it:

https://github.com/sympy/csympy/issues/162

I don't know if it's possible to mix Basic, but either way the
wrappers should be compatible.

Ondrej

Joachim Durchholz

unread,
Apr 22, 2014, 5:58:57 PM4/22/14
to sy...@googlegroups.com
Hm.

One:
> * Extension/complement to SymPy

Two:

> That lowers the barrier
> of entry significantly, compared to a big mix of C++, Cython and
> Python, makes it easier to make things fast
> (you don't need to worry about Python at all).

That's neither an extension nor a complement; it's a replacement.

> The Python (and other
> languages) wrappers should be just thin
> wrappers around the C++ core (=just better syntax).

I.e. replace the engine if not the API.

Not that I'm judging. I'm just pointing out perceived inconsistencies.

The more SymPy itself turns into a set of simplification rules, the
less significance this will have in the end.

>> - How will CSymPy be architected to allow things to happen in CSymPy
>> when they can, but fall back to SymPy when they cannot?
>
> Currently you can simply mix and match SymPy and CSymPy expressions.
> So you simply
> convert an expression to SymPy to do some advanced manipulation, and
> convert to CSymPy
> to do some fast manipulation. I am open to suggestions how to improve this.

The alternative would be to have a data structure that can be
manipulated from both the C++ and the Python side, but that's going to
be unnatural for at least one of the sides.

Note that the data structures can become large-ish, and if the
simplification becomes complicated there may be a lot of back-and-forth.
It's possible that mixed execution will be slow for some algorithms or
use cases for that reason.

I do not think that this can be determined in advance, it's something to
keep an eye out for during benchmarks.

>> My main concern here is that CSymPy has no clear separation from
>> SymPy, and as a result it will end up growing larger and larger, until
>> it becomes an independent CAS (which is fine if that's the goal, but
>> my understanding was that it was supposed to be just a small fast
>> core).
>
> The goals are written above. I am myself concentrating on speed,
> that's what I really
> want to nail down.

I'm somewhat sceptical about this.
A conversion to C++ will give a linear improvement.
Better algorithms can improve the big-Oh class.
Unless algorithmic improvements have been exhausted, this would be
premature optimization. (Are algorithmic improvements exhausted yet?)

> However, let's say somebody comes and reimplements the Gruntz
> algorithm in CSymPy. Should we reject such a PR? My answer is that if
> the code is nice,
> maintainable, much faster than SymPy and has the same or similar
> features, I am ok
> with merging it.

Hm. It does mean that those without a C++ background will be excluded
from maintaining it.
Which means that the code instantly loses a lot of maintainability.

It's the main reservation I'm having about what the CSymPy project is doing.
This is the usual problem with dual-language projects. "Pure Python" is
one selling point of SymPy, and I think it's an important one - the
point of having Open Source is that in times of need, you can take a
look at what's going wrong; a dual-language project eliminates
that advantage for a large fraction of users.

Ondřej Čertík

unread,
Apr 22, 2014, 7:11:53 PM4/22/14
to sympy
For the basic manipulation pretty much. One can get faster than the
current sympy,
see e.g. sympycore. But you don't get as fast as current CSymPy with
that approach.
The speedup is linear for O(n) problems. For O(n^3) problems (Gaussian
elimination [1]...)
the speedup is cubic. So it really matters for these kinds of problems.

>
>
>> However, let's say somebody comes and reimplements the Gruntz
>> algorithm in CSymPy. Should we reject such a PR? My answer is that if
>> the code is nice,
>> maintainable, much faster than SymPy and has the same or similar
>> features, I am ok
>> with merging it.
>
>
> Hm. It does mean that those without a C++ background will be excluded from
> maintaining it.
> Which means that the code instantly loses a lot of maintainability.
>
> It's the main reservation I'm having about what the CSymPy project is doing.
> This is the usual problem with dual-language projects. "Pure Python" is one
> selling point of SymPy, and I think it's an important one - the point of
> having Open Source is that in times of need, you can take a look at what's
> going wrong, having a dual-language project eliminates that advantage for a
> large fraction of users.

Right, that's why we are sticking to just one language, as opposed to having mixed
Python / C++ code, which would bring a lot more intricacies.
Yes, you have to learn the language in order to contribute, but that's
it. You don't
need to learn two languages and how to mix them. (The Python wrappers
that we have
are simple and thin, i.e. there is no logic there, so if worse comes
to worst, I can maintain them myself.)

SymPy will stay pure Python.

You raise some good points, but I don't know a better alternative. As
I said, I can deliver
using C++. I tried other approaches, and I failed to deliver using
those. That does not mean that
there is no better way of course, only that I don't know of any better way.

Ondrej


[1] http://en.wikipedia.org/wiki/Gaussian_elimination#Computational_efficiency

Ondřej Čertík

unread,
Apr 22, 2014, 7:24:07 PM4/22/14
to sympy
No, I think it is still linear even for O(n^3) problems. If it has n^3
operations
and each operation takes 100x less time with CSymPy, the whole calculation still
takes 100x less time with CSymPy.
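
Concretely: if one elementary operation takes t seconds in SymPy and t/100 in
CSymPy, an O(n^3) computation costs about n^3*t versus n^3*t/100, so the ratio
is the same constant 100 for every n; only the constant factor changes, not
the complexity class.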

Ondrej

Aaron Meurer

unread,
Apr 22, 2014, 8:06:22 PM4/22/14
to sy...@googlegroups.com
I think that's already too much. Why is the series expansion slow in
SymPy? Is it because the algorithms are slow? If so, then implementing
the same inefficient algorithms in CSymPy won't help. They will be
faster, but for large enough expressions they will still slow down. Is
it because the expression manipulation is slow? In that case, if
CSymPy has faster expression manipulation, then just use those
expressions, but use the SymPy series algorithms.

My points are:

- I think CSymPy should focus on making expression manipulation fast
(i.e., the things that are the inner loop of any symbolic algorithm).
It should not reimplement the symbolic algorithms themselves. Those
are implemented in SymPy. If they use the CSymPy objects instead of
the SymPy objects, they will be faster.

- I would focus more on making CSymPy interoperate with SymPy and less
on reimplementing things that are in SymPy in CSymPy. Once there is
interoperation, we can see what is still slow, and then (and only
then) implement it in C++.

Why do you need to implement those in C++? If the expression
manipulation is fast, then won't it be fine to have the actual
formulas/algorithms in SymPy?

> This is needed for applications like PyDy.
> Those are not difficult to implement. Things which get tricky are then
> the advanced
> simplifications and rewriting. But those I would do in SymPy, because
> as you said,
> those are typically not needed for performance.
>
>>
>> Finally, I noticed that the CSymPy Python API doesn't adhere to SymPy
>> standards, e.g., it doesn't use .args. I think a first step in this
>> direction would be to make all CSymPy objects completely compatible
>> with SymPy. Probably a good idea would be to mixin Basic to all CSymPy
>> classes (is this possible?).
>
> The .args are not implemented yet in the Python wrappers, so I just
> created an issue for it:
>
> https://github.com/sympy/csympy/issues/162
>
> I don't know if it's possible to mix Basic, but either way the
> wrappers should be compatible.

It's more than just args. Every SymPy object is expected to have all
the methods of Basic.

Aaron Meurer

>
> Ondrej
>

Aaron Meurer

unread,
Apr 22, 2014, 8:16:39 PM4/22/14
to sy...@googlegroups.com
That is true. I usually prefer to use faster algorithms.

But to be the devil's advocate, there are two issues with this line of thinking:

- Big O is basically useless. Consider the extreme effectiveness of
SAT solvers (which solve an NP-complete problem), or the difference
between the simplex and Khachiyan's algorithm, or AKS vs. more
efficient deterministic primality testing algorithms. Asymptotic
complexity is all fine, but at the end of the day, you don't care how
fast your algorithm is for increasingly large inputs, you care how
fast it is for *your* input.

- Faster algorithms have a complexity cost. You can get closer to the
metal in Python by being very careful about your use of data
structures, and avoiding things that are slow in Python (like function
calls), but the cost is high because you end up with code that is not
only harder to read and maintain, but harder to keep in its fast
state, because someone else who doesn't know all the little tricks
might come along and change things in a way that seems equivalent but
makes things slower.

With that being said, C++ is itself enough of a complexity cost that
doing this outweighs using it in many (most?) cases. (That's not just
a knock on C++; using any second language to Python brings a cost,
both because there are now two languages to think about, and because
of interoperability questions)

Aaron Meurer

>
>
>> However, let's say somebody comes and reimplements the Gruntz
>> algorithm in CSymPy. Should we reject such a PR? My answer is that if
>> the code is nice,
>> maintainable, much faster than SymPy and has the same or similar
>> features, I am ok
>> with merging it.
>
>
> Hm. It does mean that those without a C++ background will be excluded from
> maintaining it.
> Which means that the code instantly loses a lot of maintainability.
>
> It's the main reservation I'm having about what the CSymPy project is doing.
> This is the usual problem with dual-language projects. "Pure Python" is one
> selling point of SymPy, and I think it's an important one - the point of
> having Open Source is that in times of need, you can take a look at what's
> going wrong, having a dual-language project eliminates that advantage for a
> large fraction of users.

Brian Granger

unread,
Apr 22, 2014, 8:23:55 PM4/22/14
to sympy
I too feel that csympy should implement the absolute minimum possible
for it to be fast. All actual mathematical algorithms should remain in
sympy. Trying to pull lots of algorithms into csympy will fail - not
that many people want to write complex mathematical algorithms in C++.



--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgra...@calpoly.edu and elli...@gmail.com

Ondřej Čertík

unread,
Apr 22, 2014, 11:21:21 PM4/22/14
to sympy
My experience is that it will actually help to implement the same algorithm,
because there is a little overhead with any Python operation. So if you do
a lot of them (like in series expansion, where you create intermediate
expressions
and so on --- and even if we use CSymPy, there is overhead in the
Python wrappers,
so essentially you don't want to be calling them too often, if you *really* care
about performance), they will add up.

>
> My points are:
>
> - I think CSymPy should focus on making expression manipulation fast
> (i.e., the things that are the inner loop of any symbolic algorithm).
> It should not reimplement the symbolic algorithms themselves. Those
> are implemented in SymPy. If they use the CSymPy objects instead of
> the SymPy objects, they will be faster.
>
> - I would focus more on making CSymPy interoperate with SymPy and less
> on reimplementing things that are in SymPy in CSymPy. Once there is
> interoperation, we can see what is still slow, and then (and only
> then) implement it in C++.

The interoperability is important and that is mostly the job of the
Python wrappers.
The C++ API underneath can change a lot, i.e. if we figure out a
faster way to represent
things, we'll switch. I will spend lots of time making the
interoperability work.
This summer though, I want to think hard about raw speed and making it work
for PyDy.
Maybe, that depends on this issue:

https://github.com/sympy/csympy/issues/153

The problem is that once you start thinking about Python+C++ at once
and performance, things get complex quickly. It's much easier
to think in terms of C++ only and how to write the fastest possible algorithm
(that is hard enough!). This sets the bar. Then one should try to see
if it is possible to match this with Python. Not the other way round,
because you need to set the bar first.



On Tue, Apr 22, 2014 at 6:16 PM, Aaron Meurer <asme...@gmail.com> wrote:
> On Tue, Apr 22, 2014 at 4:58 PM, Joachim Durchholz <j...@durchholz.org> wrote:
>> Hm.
>>
>> One:
>>> * Extension/complement to SymPy
>>
>> Two:
>>
>>
>>> That lowers the barrier
>>> of entry significantly, compared to a big mix of C++, Cython and
>>> Python, makes it easier to make things fast
>>> (you don't need to worry about Python at all).
>>
>>
>> That's not an extension nor a complement, it's a replacement.
>>
>>
>>> The Python (and other
>>>
>>> languages) wrappers should be just thin
>>> wrappers around the C++ core (=just better syntax).
>>
>>
>> I.e. replace the engine if not the API.
>>
>> Not that I'm judging. I'm just pointing out perceived inconsistencies.
>>
>> The more SymPy itself turns into a set of simplification rules, the less
>> significance this will have in the end.
>>
>>
>>>> - How will CSymPy be architected to allow things to happen in CSymPy
>>>> when they can, but fall back to SymPy when they cannot?
>>>
>>>
>>> Currently you can simply mix and match SymPy and CSymPy expressions.
>>> So you simply
>>> convert an expression to SymPy to do some advanced manipulation, and
>>> convert to CSymPy
>>> to do some fast manipulation. I am open to suggestions how to improve
>>> this.
>>
>>
>> The alternative would be to have a data structure that can be manipulated
>> from both the C++ and the Python side, but that's going to be unnatural for
>> at least one of the sides.
>>
>> Note that the data structures can become large-ish, and if the
>> simplification becomes complicated there may be a lot of back-and-forth.
>> It's possible that mixed execution will be slow for some algorithms or use
>> cases for that reason.
>>
>> I do not think that this can be determined in advance, it's something to
>> keep an eye out for during benchmarks.
>>
>>
>>>> My main concern here is that CSymPy has no clear separation from
>>>> SymPy, and as a result it will end up growing larger and larger, until
>>>> it becomes an independent CAS (which is fine if that's the goal, but
>>>> my understanding was that it was supposed to be just a small fast
>>>> core).
>>>
>>>
>>> The goals are written above. I am myself concentrating on speed,
>>> that's what I really
>>> want to nail down.
>>
>>
>> I'm somewhat sceptical about this.
>> A conversion to C++ will give a linear improvement.
>> Better algorithms can improve the big-Oh class.
>> Unless algorithmic improvements have been exhausted, this would be premature
>> optimization. (Are algorithmic improvements exhausted yet?)
>
> That is true. I usually prefer to use faster algorithms.
>
> But to be the devil's advocate, there are two issues with this line of thinking:
>
> - Big O is basically useless. Consider the extreme effectiveness of
> SAT solvers (which solve an NP-complete problem), or the difference
> between the simplex and Khachiyan's algorithm, or AKS vs. more
> efficient deterministic primality testing algorithms. Asymptotic
> complexity is all fine, but at the end of the day, you don't care how
> fast your algorithm is for increasingly large inputs, you care how
> fast it is for *your* input.
>
> - Faster algorithms have a complexity cost. You can get closer to the
> metal in Python by being very careful about your use of data
> structures, and avoiding things that are slow in Python (like function
> calls), but the cost is high because you end up with code that is not
> only harder to read and maintain, but harder to keep in its fast
> state, because someone else who doesn't know all the little tricks
> might come along and change things in a way that seems equivalent but
> makes things slower.

Precisely. That's why it's good to stick to just one language, C++, and nail
the speed. That sets the bar. Then one can try to match the speed with Python,
which sometimes is possible.

>
> With that being said, C++ is itself enough of a complexity cost that
> doing this outweighs using it in many (most?) cases. (That's not just
> a knock on C++; using any second language to Python brings a cost,
> both because there are now two languages to think about, and because
> of interoperability questions)

Yes, that is again the reason to stick to just one language and only do thin
wrappers that allow using it as a black box.



On Tue, Apr 22, 2014 at 6:23 PM, Brian Granger <elli...@gmail.com> wrote:
> I too feel that csympy should implement the absolute minimum possible
> for it to be fast. All actual mathematical algorithms should remain in
> sympy. Trying to pull lots of algorithms into csympy will fail - not
> that many people want to write complex mathematical algorithms in C++.


I understand your worries. Both yours and Aaron's and other people's in
the SymPy community.
Let's be frank about this; here they are:

* keep things simple, maintainable, in Python, not introducing other languages

* especially not C++, which is notorious for being hilariously
complex. That's why we use Python, because it's better.

* danger of splitting a community by introducing a separate CAS
(=wasting our resources, attention, developers, and so on), and we've
been on this path before, e.g. with sympycore, or even Sage ---
any improvement to those codes does not benefit SymPy. That does not
mean that there is anything bad with those, but one has to try to
focus, in our case SymPy, and just get things done, make it a useful
library so that people can use it and benefit from it.

* that we should concentrate on features, and sacrifice some
reasonable speed (maybe 100x or so).

* We should not be reimplementing SymPy in C++, that would be a big waste.


I thought about all these very deeply. And I can promise that I will
do my absolute best to make this work, with the SymPy community. I
might not have the best answer to all the worries, but I know that we
need to set the bar:

* So implement trig functions in C++
* Benchmark against Mathematica, Sage, GiNaC
* Make sure it is as fast or faster
* See if we can match it with Python
* If so, great! If not, what is the penalty? 2x? 10x? 100x?

If we don't have this bar, then we miss on speed. And if we miss on
speed, then there will always be a reason why people would use other
software, because of speed. If, on the other hand, csympy is as fast
as the state of the art, then it fixes the problem. And it will be
integrated with SymPy well, so people can keep using SymPy.

Ondrej

Aaron Meurer

unread,
Apr 22, 2014, 11:52:41 PM4/22/14
to sy...@googlegroups.com
Exactly. There is a "little" overhead. Not a huge overhead. It matters
for the stuff that is in the inner loops, like addition and
multiplication of terms, but for whole algorithms, which might be
called only a few times (as opposed to a few hundred thousand times),
it doesn't make a difference.

This is all hypothetical without numbers (and btw, it would be awesome
if you could provide real numbers here), but suppose these imaginary
numbers were true:

SymPy: 1x
CSymPy with Python wrappers: 4x
Raw CSymPy: 5x

Then using CSymPy with Python would already be 4x faster than SymPy.
Now doing everything in raw CSymPy (C++) would only be 1.25x faster than that.

Now, if CSymPy integrates flawlessly, so that it just works (at least
as far as the user is concerned), there is little complexity cost of
CSymPy + Python. Definitely little enough to warrant the 4x speedup.
But as soon as you take that away, i.e., you implement more and more
in C++, or CSymPy differs enough from SymPy that the user needs to
care about it (which the more that is in CSymPy, the more likely this
is to happen), then the complexity cost skyrockets. Maybe 4x would
still be worth it here. But not 1.25x.

>So if you do
> a lot of them (like in series expansion, where you create intermediate
> expressions
> and so on --- and even if we use CSymPy, there is overhead in the
> Python wrappers,
> so essentially you don't want to be calling them too often, if you *really* care
> about performance), they will add up.

Sure, you could just implement a whole CAS in C++. That's what some
people have done already. But you have to factor in the costs of
everything, not just the speed. The costs of:

- How much more complicated the code is
- Code duplication (and all the associated issues that come with it)
- The additional overhead needed for CSymPy to interop with SymPy. The
more CSymPy does, the harder this is.

Is series expansion slow? Is it an inner loop (i.e., will it matter to
people if it is slow)? Is it simple to implement ('simple' being a
relative term of course; obviously no part of a CAS is completely
simple)? If the answer to any of those is "no", I think you should
seriously consider whether it's worth implementing.

Thanks for your frankness. So I think you do understand the issues.

>
> * keep things simple, maintainable, in Python, not introducing other languages
>
> * especially not C++, which is notorious for being hilariously
> complex. That's why we use Python, because it's better.

Well, there are also languages that are fast and not C++, but we can
have that discussion separately.

>
> * danger of splitting a community by introducing a separate CAS
> (=wasting our resources, attention, developers, and so on), and we've
> been on this path before, e.g. with with sympycore, or even Sage ---
> any improvement to those codes does not benefit SymPy. That does not
> mean that there is anything bad with those, but one has to try to
> focus, in our case SymPy, and just get things done, make it a useful
> library so that people can use it and benefit from it.
>
> * that we should concentrate on features, and sacrifice some
> reasonable speed (maybe 100x or so).
>
> * We should not be reimplementing SymPy in C++, that would be a big waste.

One of my worries is not listed here, which is that you are doing
things completely backwards from good software design with CSymPy,
which is getting speed first, and something that works later.

>
>
> I thought about all these very deeply. And I can promise that I will
> do my absolute best to make this work, with the SymPy community. I
> might not have the best answer to all the worries, but I know that we
> need to set the bar:
>
> * So implement trig functions in C++
> * Benchmark against Mathematica, Sage, GiNaC
> * Make sure it is as fast or faster
> * See if we can match it with Python
> * If so, great! If not, what is the penalty? 2x? 10x? 100x?

That is exactly what I want to know.

>
> If we don't have this bar, then we miss on speed. And if we miss on
> speed then there will always be reason why people would use other
> software, because of speed. If, on the other hand, csympy is as fast
> as state of the art, then it fixes the problem. And it will be
> integrated with SymPy well, people can keep using SymPy.
>
> Ondrej

I feel like without real numbers, we aren't going to get anywhere, so
maybe you could provide some benchmarks. I'm not convinced about
expand, because that relies pretty heavily on other things like
multinomial coefficient generation. I'd rather see a benchmark that
does the exact same expression manipulations everywhere.

Feel free to suggest a better one. I'm just coming up with this from
the seat of my pants, but something like

from sympy import Symbol  # or the csympy equivalent
x = Symbol('x')
a = x
c = 1
for i in range(1000):  # Replace with larger numbers if necessary
    a += c*i*x  # If CSymPy expressions are mutable, modify this accordingly
    c *= -1

This is a very basic test of how well terms are combined with
addition. It also creates a lot of terms, which might penalize a
system with too much caching (maybe it should be modified to not do
that; I don't know).

1000 should really be replaced with a much larger number, say
10000000. Basically, if a benchmark runs in under a second, I'm wary
of it. That means you're benchmarking things that might be irrelevant
for larger expressions, when the speed really matters. I suppose
really you should benchmark 10**N for some range of N and plot the
results.
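
Something like this rough sketch, for instance (assuming csympy exposes
Symbol and Integer like sympy does; swap the import to time SymPy instead):

import time
from csympy import Symbol, Integer  # or: from sympy import Symbol, Integer
x = Symbol("x")
for e in range(3, 8):               # N = 10**3 ... 10**7
    N = 10**e
    a, c = x, Integer(1)
    t0 = time.time()
    for i in range(N):
        a += c*i*x
        c *= -1
    print N, time.time() - t0       # plot these (N, seconds) pairs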

That's just one benchmark, that tests one thing. We need to benchmark
all the major aspects of the core. A few others that come to mind:

- x1*...*xn + x2*...*xn*x1 + ... (basically, test that large
multiplications in different orders are efficiently canonicalized to
the same thing)
- (some large addition) - (the same addition, in a different order)
- x**x**x**x**...**x (do some stuff with it; basically test that
highly nested expressions don't kill things. You could also test
sin(sin(sin(sin(...(x)...) or something)
- (Huge complicated expression in x).subs(x, y) (CSymPy implements
some kind of subs or xreplace natively, right?)

It actually would be nice to have some kind of benchmarking site for
SymPy, similar to http://speed.pypy.org/.

Aaron Meurer

Ondřej Čertík

unread,
Apr 23, 2014, 1:36:17 AM4/23/14
to sympy
The goal is to get something that works first, polished API later.
By "work" I mean fast and working for PyDy (at first).

Sure. Here is the code:

from csympy import var, Integer
#from sympy import var, Integer
var("x")
a = x
c = Integer(1)
N = 10**5
for i in range(N):
    a += c*i*x
    c *= -1
print a


SymPy:

$ time python a.py
-49999*x

real 0m35.262s
user 0m34.870s
sys 0m0.300s

CSymPy:

$ time python a.py
-49999x

real 0m0.860s
user 0m0.852s
sys 0m0.004s


So this particular one is 41x faster in CSymPy. You can modify this to
generate some long expressions, e.g.:

from csympy import var, Integer
#from sympy import var, Integer
var("x")
a = x
c = Integer(1)
N = 3000
for i in range(N):
    a += c*x**i
    c *= -1

SymPy:

$ time python a.py

real 0m37.890s
user 0m37.626s
sys 0m0.152s

CSymPy:

$ time python a.py

real 0m1.032s
user 0m1.020s
sys 0m0.012s


This one is 37x faster. Sage takes about 3s on this benchmark. And so on.

We need to also benchmark Mathematica and GiNaC.

>
> This is a very basic test of how well terms are combined with
> addition. It also creates a lot of terms, which might penalize a
> system with too much caching (maybe it should be modified to not do
> that; I don't know).
>
> 1000 should really be replaced with a much larger number, say
> 10000000. Basically, if a benchmark runs in under a second, I'm wary
> of it. That means you're benchmarking things that might be irrelevant
> for larger expressions, when the speed really matters. I suppose
> really you should benchmark 10**N for some range of N and plot the
> results.

Exactly, plotting the dependence on N is best.

Ondrej

>
> That's just one benchmark, that tests one thing. We need to benchmark
> all the major aspects of the core. A few others that come to mind:
>
> - x1*...*xn + x2*...*xn*x1 + ... (basically, test that large
> multiplications in different orders are efficiently canonicalized to
> the same thing)
> - (some large addition) - (the same addition, in a different order)
> - x**x**x**x**...**x (do some stuff with it; basically test that
> highly nested expressions don't kill things. You could also test
> sin(sin(sin(sin(...(x)...) or something)
> - (Huge complicated expression in x).subs(x, y) (CSymPy implements
> some kind of subs or xreplace natively, right?)
>
> It actually would be nice to have some kind of benchmarking site for
> SymPy, similar to http://speed.pypy.org/.
>
> Aaron Meurer
>

Aaron Meurer

unread,
Apr 23, 2014, 10:15:49 AM4/23/14
to sy...@googlegroups.com
And how fast is the raw C++, without the python wrappers?

Aaron Meurer

Aaron Meurer

unread,
Apr 23, 2014, 10:17:27 AM4/23/14
to sy...@googlegroups.com
And to be fair, we should probably benchmark Poly and ring with any
polynomial benchmark. They will show how fast pure python can be
(don't forget Python/gmpy/gmpy2 combinations).

Aaron Meurer

Mateusz Paprocki

unread,
Apr 23, 2014, 11:21:20 AM4/23/14
to sympy
Hi,
Comparing sympy.polys and sympy.core:

In [1]: R, x = ring("x", ZZ)

In [2]: y = Symbol("y")

In [3]: N, a, c = 10**5, x, ZZ(1)

In [4]: %time for i in range(N): a += c*i*x; c *= -1
CPU times: user 564 ms, sys: 4.85 ms, total: 569 ms
Wall time: 555 ms

In [5]: N, a, c = 10**5, y, Integer(1)

In [6]: %time for i in range(N): a += c*i*y; c *= -1
CPU times: user 20 s, sys: 133 ms, total: 20.1 s
Wall time: 20 s

>
> So this particular one is 41x faster in CSymPy. You can modify this to
> generate some long expressions, e.g.:
>
> from csympy import var, Integer
> #from sympy import var, Integer
> var("x")
> a = x
> c = Integer(1)
> N = 3000
> for i in range(N):
> a += c*x**i
> c *= -1
>
> SymPy:
>
> $ time python a.py
>
> real 0m37.890s
> user 0m37.626s
> sys 0m0.152s
>
> CSymPy:
>
> $ time python a.py
>
> real 0m1.032s
> user 0m1.020s
> sys 0m0.012s
>

Comparing sympy.polys and sympy.core:

In [1]: R, x = ring("x", ZZ)

In [2]: y = Symbol("y")

In [3]: N, a, c = 3000, x, ZZ(1)

In [4]: %time for i in range(N): a += c*x**i; c *= -1
CPU times: user 148 ms, sys: 4.3 ms, total: 152 ms
Wall time: 147 ms

In [5]: N, a, c = 3000, y, Integer(1)

In [6]: %time for i in range(N): a += c*y**i; c *= -1
CPU times: user 20.6 s, sys: 42.6 ms, total: 20.6 s
Wall time: 20.6 s

So, what's the difference between CSymPy's +=, *=, *, **, etc.
operators and SymPy's ones? Are they in-place? What are the underlying
data structures? Do they use the same set of rewrite rules? Do they
take assumptions into account? When comparing sympy.polys and
sympy.core, it's obvious that sympy.polys will be faster because it
simply does a lot less compared to sympy.core.

Mateusz

Aaron Meurer

unread,
Apr 23, 2014, 11:33:37 AM4/23/14
to sy...@googlegroups.com
Does it make a difference if you start out with c = 1 (rather than c =
Integer(1) or c = ZZ(1))?

Aaron Meurer

Ondřej Čertík

unread,
Apr 23, 2014, 11:35:59 AM4/23/14
to sympy
They don't use assumptions. I think assumptions should not be taken
into account in these manipulations, but rather handled using
refine().

For the sympy.polys in this example,
is it using the sparse polynomial representation? I.e. it stores the
symbols (x, y, z) and then maps their exponent tuples to coefficients
in a dictionary, e.g.:

3*x^2*z + 2*x -> {(2, 0, 1): 3, (1, 0, 0): 2}

?

In CSymPy, I am still implementing this data structure, as it is very
efficient, but it only works for polynomials. So at the moment, we
can't benchmark this yet.
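
As a toy illustration of that data structure (plain Python, not the actual
CSymPy or sympy.polys code): exponent tuples map to coefficients, and
addition is just a dictionary merge.

# 3*x**2*z + 2*x  ->  {(2, 0, 1): 3, (1, 0, 0): 2}
def poly_add(p, q):
    out = dict(p)
    for exps, coeff in q.items():
        out[exps] = out.get(exps, 0) + coeff
        if out[exps] == 0:
            del out[exps]           # drop cancelled terms
    return out

p = {(2, 0, 1): 3, (1, 0, 0): 2}    # 3*x**2*z + 2*x
q = {(1, 0, 0): -2}                 # -2*x
print poly_add(p, q)                # {(2, 0, 1): 3}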

The internal data structure for the CSymPy benchmark above is currently
std::unordered_map inside CSymPy::Add.
So a better benchmark is something like this then:

%time for i in range(N): a += c*x**(i*x); c *= -1

which fails with:

TypeError: int() argument must be a string or a number, not 'PolyElement'

if you try to use polynomials.

So in SymPy:

In [1]: from sympy import Symbol, Integer

In [2]: x = Symbol("x")

In [3]: N, a, c = 3000, x, Integer(1)

In [4]: %time for i in range(N): a += c*x**(i*x); c *= -1
CPU times: user 25.8 s, sys: 8 ms, total: 25.8 s
Wall time: 25.8 s

And CSymPy:

In [1]: from csympy import Symbol, Integer

In [2]: x = Symbol("x")

In [3]: N, a, c = 3000, x, Integer(1)

In [4]: %time for i in range(N): a += c*x**(i*x); c *= -1
CPU times: user 1.17 s, sys: 0 ns, total: 1.17 s
Wall time: 1.17 s


Ondrej

Mateusz Paprocki

unread,
Apr 23, 2014, 11:37:35 AM4/23/14
to sympy
Hi,

On 23 April 2014 17:33, Aaron Meurer <asme...@gmail.com> wrote:
> Does it make a difference if you start out with c = 1 (rather than c =
> Integer(1) or c = ZZ(1))?

ring() won't work with Integer, because there are no sympy ground
types any more. At most we can compare mpz with int.

Mateusz

Aaron Meurer

unread,
Apr 23, 2014, 11:38:13 AM4/23/14
to sy...@googlegroups.com
But if you allow arbitrary generators, as Poly does (I don't know if
it is possible to do this with ring()), then you can have any
expression, and just consider it as a polynomial. In this case, your
generators would be x**(i*x), or, if you are smart, x**x.

Aaron Meurer

Aaron Meurer

unread,
Apr 23, 2014, 11:39:18 AM4/23/14
to sy...@googlegroups.com
No, my point is, if you just use int(1), and require that each
operation do the coercion, does that slow it down any? It shouldn't
much, since it already has to coerce i, which is an int.

Aaron Meurer

Mateusz Paprocki

unread,
Apr 23, 2014, 11:47:02 AM4/23/14
to sympy
Hi,

On 23 April 2014 17:37, Mateusz Paprocki <mat...@gmail.com> wrote:
> Hi,
>
> On 23 April 2014 17:33, Aaron Meurer <asme...@gmail.com> wrote:
>> Does it make a difference if you start out with c = 1 (rather than c =
>> Integer(1) or c = ZZ(1))?
>

OK, so one could use EX domain to make ring() work with Integer, but
note that every operation will require a call to cancel() in this
scenario, e.g.:

In [1]: R, x = ring("x", EX)

In [2]: N, a, c = 10**5, x, Integer(1)

In [3]: %time for i in range(N): a += c*i*x; c *= -1
CPU times: user 6.6 s, sys: 56.3 ms, total: 6.65 s
Wall time: 6.54 s

Removing cancel() saves about 2 s.

Mateusz

Ondřej Čertík

unread,
Apr 23, 2014, 11:50:33 AM4/23/14
to sympy
On Wed, Apr 23, 2014 at 9:38 AM, Aaron Meurer <asme...@gmail.com> wrote:
> But if you allow arbitrary generators, as Poly does (I don't know if
> it is possible to do this with ring()), then you can have any
> expression, and just consider it as a polynomial. In this case, your
> generators would be x**(i*x), or, if you are smart, x**x.

Can you post code that does this? Let's benchmark it.

Ondrej

Mateusz Paprocki

unread,
Apr 23, 2014, 11:50:35 AM4/23/14
to sympy
Hi,
Yeah, but sympy.core does. Or maybe not in this scenario?

> For the sympy.polys in this example,
> is it using the sparse polynomial representation? I.e. it stores the
> symbols (x, y, z) and then maps their exponent tuples to coefficients
> in a dictionary, e.g.:
>
> 3*x^2*z + 2*x -> {(2, 0, 1): 3, (1, 0, 0): 2}
>

Sparse.

Aaron Meurer

unread,
Apr 23, 2014, 12:02:28 PM4/23/14
to sy...@googlegroups.com
Just use x = Poly(x**x). That's the dense representation. No idea how
to do this with ring(). Mateusz will have to show us. But the
polynomials you're creating in this benchmark are univariate and dense
anyway.
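
For example (a small illustration; I'm writing the expected output from
memory, so treat it as approximate):

from sympy import Symbol, Poly
x = Symbol("x")
p = Poly(x**x + 1)   # x**x is not polynomial in x, so it becomes the generator
print p.gens         # (x**x,)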

Aaron Meurer

Ondřej Čertík

unread,
Apr 23, 2014, 12:08:18 PM4/23/14
to sympy
On Wed, Apr 23, 2014 at 10:02 AM, Aaron Meurer <asme...@gmail.com> wrote:
> Just use x = Poly(x**x). That's the dense representation. No idea how
> to do this with ring(). Mateusz will have to show us. But the
> polynomials you're creating in this benchmark are univariate and dense
> anyway.

We should also be plotting the dependence on N, as different data
structures have different behaviors,
for example hash table (unordered_map, or dict in Python) vs.
red-black trees (std::map).

Ultimately though, and that's the main issue, instead of concentrating
on these artificial benchmarks, I am concentrating
on real world applications, thus PyDy. If PyDy could be done with
sympy.polys, then that would be good, but
I am afraid it can't, as it has stuff like sin, cos, unevaluated
functions like f1(t), and so on.

Ondrej

Matthew Rocklin

unread,
Apr 23, 2014, 5:43:50 PM4/23/14
to sy...@googlegroups.com

Just wanted to pop in and say that I really like this conversation.

Question: how fast could sympy core be if we were to pull out some of the assumptions logic until a final step at the end? What stops core from reaching polys speed?


Aaron Meurer

unread,
Apr 23, 2014, 5:52:44 PM4/23/14
to sy...@googlegroups.com
I think the polys are fast because they can assume that the expressions
are polynomials, which is a much simpler data structure than a general
mathematical expression.

Aaron Meurer

Ondřej Čertík

unread,
Apr 23, 2014, 5:54:57 PM4/23/14
to sympy
On Wed, Apr 23, 2014 at 3:52 PM, Aaron Meurer <asme...@gmail.com> wrote:
> I think the polys are fast because it can assume that the expressions
> are polynomials, which is a much simpler data structure than a general
> mathematical expression.

That's exactly right. If you want a general data structure but without
assumptions and as fast as possible,
in Python, look here:

https://github.com/certik/sympyx

There is also a cythonized version. It's fast (much faster than SymPy),
but still slower than CSymPy.

I don't know how to make it any faster, in Python or Cython.

Ondrej

Matthew Rocklin

unread,
Apr 23, 2014, 6:19:22 PM4/23/14
to sy...@googlegroups.com
Perhaps a good target workflow would be something like the following

from sympy import *
from csympy import *

... do work as usual ...

In other words it would be nice for csympy to take over from sympy where it has functionality, but for sympy-proper to fill in all of the holes.  Ondrej, is this on your roadmap at all?

It gets weird of course when you have something like Expr(...) + CExpr(...)

Aaron Meurer

unread,
Apr 23, 2014, 6:27:21 PM4/23/14
to sy...@googlegroups.com
Well that would only affect what you do in your own workspace. Much
better would be if sympy transparently imported and used csympy
wherever relevant when it was available, so that sympy.Symbol would be
csympy.Symbol, and so on. Then everywhere in sympy would use csympy.

But this requires full interoperability between sympy and csympy
(which is one of the things I have not seen yet). I think the next
task for csympy should be:

- Patch sympy/core/symbol.py to use csympy.Symbol (so that from
sympy.core.symbol import Symbol gives csympy.Symbol; a rough sketch of
this follows the list)
- Run the tests
- Fix whatever fails
- Repeat with further csympy objects
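
Roughly this kind of thing (just a sketch of the idea, not a working patch;
handling assumptions and the rest of the Symbol API is the real work):

# in sympy/core/symbol.py (sketch)
try:
    from csympy import Symbol   # use the fast C++-backed Symbol if installed
except ImportError:
    pass                        # fall back to the existing pure Python Symbol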

And always do this with every csympy object that is written. We
perhaps need a more unified way to automatically replace anything in
sympy with csympy (especially since Ondrej still hasn't provided a
clear separation between what will and will not be in csympy, at least
in my mind).

Then we can stop making these artificial benchmarks and performance
speculations, and just see what really is slow, and what really speeds
things up, and, just as importantly, what doesn't speed things up
enough to warrant the complexity cost.

I would focus on this rather than PyDy specifically. It may be
possible to use csympy in PyDy without doing this, but it will be a
wasted effort if you eventually do this and then the PyDy specific
implementation can be scrapped.

Aaron Meurer

Ondřej Čertík

unread,
Apr 23, 2014, 6:31:15 PM4/23/14
to sympy
On Wed, Apr 23, 2014 at 4:27 PM, Aaron Meurer <asme...@gmail.com> wrote:
> Well that would only affect what you do in your own workspace. Much
> better would be if sympy transparently imported and used csympy
> wherever relevant when it was available, so that sympy.Symbol would be
> csympy.Symbol, and so on. Then everywhere in sympy would use csympy.

That's a good idea. There is of course the issue with assumptions,
attached to the Symbols...

>
> But this requires full interoperability between sympy and csympy
> (which is one of the things I have not seen yet). I think the next
> task for csympy should be:
>
> - Patch sympy/core/symbol.py to use csympy.Symbol (so that from
> sympy.core.symbol import Symbol gives csympy.Symbol)
> - Run the tests
> - Fix whatever fails
> - Repeat with further csympy objects
>
> And always do this with every csympy object that is written. We
> perhaps need a more unified way to automatically replace anything in
> sympy with csympy (especially since Ondrej still hasn't provided a
> clear separation between what will and will not be in csympy, at least
> in my mind).
>
> Then we can stop making these artificial benchmarks and performance
> speculations, and just see what really is slow, and what really speeds
> things up, and, just as importantly, what doesn't speed things up
> enough to warrant the complexity cost.
>
> I would focus on this rather than PyDy specifically. It may be
> possible to use csympy in PyDy without doing this, but it will be a
> wasted effort if you eventually do this and then the PyDy specific
> implementation can be scrapped.

The goal of using CSymPy in PyDy is precisely to do this switch
(single line of code)
in PyDy only, not the core of SymPy. Because that's a lot simpler (no
need to worry
about assumptions) and PyDy has tons of tests. And it means doing a real application
where speed matters.

After that, do this in sympy.core --- that requires tackling
assumptions and maybe a few
other things.

Ondrej

Ondřej Čertík

unread,
Apr 23, 2014, 6:36:42 PM4/23/14
to sympy
So that we are not just talking, here is the actual code:

https://github.com/sympy/sympy/pull/7397

It's simple, and we are still trying to make it simpler (=more compatible
with SymPy), until it is really just one line. There are some technical issues,
like our Function("f"): it would correspond to creating a class at runtime in
C++, which C++ doesn't support, so I have a function called
function_symbol("f", x) instead, and so on. This could be abstracted in the
Python wrappers and so on. This
will all be fixed eventually.

Aaron, I think you want to have full compatibility with SymPy immediately. But
there could be a significant speed/efficiency cost, and I need to make
sure that the
speed of the C++ core is optimal.

The time frame is roughly this summer, when we implement enough features
to be useful for PyDy and do some serious benchmarking on real world problems.
After that it would be a good time to review the Python API to make sure
it works well
with SymPy and start integrating it more. For now I just keep things
simple to get the job done.

Ondrej

Jason Moore

unread,
Apr 23, 2014, 6:42:31 PM4/23/14
to sy...@googlegroups.com
Yeah, all I'm planning to do on that PR is to make a hack that inserts csympy data types into the core mechanics objects (Vectors, Dyadics, etc). The nice thing about the vector and mechanics module is that we only use symbols, functions, differentiation, addition, multiplication, basic trig functions, etc. Really a small subset of functionality that sympy offers. This will give a simpler test to see what it takes to have csympy work seamlessly. I'll try to spend some time this weekend to update that PR with Ondrej's latest features in csympy.

Matthew Rocklin

unread,
Apr 23, 2014, 6:43:11 PM4/23/14
to sy...@googlegroups.com
New assumptions strikes again!

I hope that by trying to interface SymPy with another project we expose and fix some of the warts in SymPy's current design.  Things like Function threw me off when I melded SymPy with LogPy (my little logic programming system) or with Theano.  It'll be nice for others to run into these.

My hope is that if/when we remove old assumptions that we'll be able to simplify a lot of the core, removing metaclasses, and hopefully imposing a few more strict invariants.

SymPy is pretty awesome but it'd be nice if the trees were simpler.  That would make it more pleasurable to interoperate with other systems.


Matthew Rocklin

unread,
Apr 23, 2014, 6:43:47 PM4/23/14
to sy...@googlegroups.com
PyDy sounds like the perfect proof of concept test for CSymPy


Matthew Rocklin

unread,
Apr 23, 2014, 6:51:32 PM4/23/14
to sy...@googlegroups.com
I generally agree with Aaron and Brian that I'd like to see CSymPy be as small as possible and focus more on SymPy-CSymPy integration.  My opinion is that performant cores should be small.

That being said I think that it's generally a good thing for Ondrej to go off and screw around with a C++ implementation in isolation.  Attaching himself to the existing SymPy infrastructure would probably bog down development.  Even O(n) performance gains are a good thing, and this is something about which Ondrej knows quite a lot.

I also think that, if CSymPy can make a compelling case for integration (perhaps through a PyDy solution), then at least some of the integration effort should come from the SymPy side, changing itself to adhere to something more CSymPy friendly (e.g. changing how Function works).  I think that working on interoperation is good for SymPy in the long run.

Ondrej, I think that folks would generally be more comfortable if we better understood the strategy for eventual SymPy-CSymPy integration.  Right now I don't have a good handle on this.  From far away it's not apparent that this is a major design goal of CSymPy.

Aaron Meurer

unread,
Apr 23, 2014, 6:52:18 PM4/23/14
to sy...@googlegroups.com
Good, now we are hitting real concrete issues.

>
> It's simple, and we are still trying to make it more simple (=more compatible
> with SymPy), until it is really just one line. There are some technical issues,
> like our Function("f") would correspond to just a raw class in C++, except that
> C++ doesn't support that, so I have a function called function_symbol("f", x),
> and so on. This could be abstracted away in the Python wrappers. This
> will all be fixed eventually.

This is something that should be fixed. See
https://github.com/sympy/sympy/issues/4787.

In fact, one of the reasons I suggested this is so that we could see
what the limitations are in the SymPy API (it's not perfect, and
shouldn't be assumed so). This is one of them. But it won't be good if
we "fix" the SymPy API and then much later try to integrate CSymPy and
discover that the real issues were not fixed at all.

In short, to succeed, CSymPy and SymPy should know about each other
sooner rather than later.

>
> Aaron, I think you want to have full compatibility with SymPy immediately. But
> there could be a significant speed/efficiency cost, and I need to make
> sure that the
> speed of the C++ core is optimal.

Yes, because otherwise you are optimizing what you think should be
fast, rather than what you know should be fast from hard data.

>
> The time frame is roughly this summer, when we implement enough features
> to be useful for PyDy and do some serious benchmarking on real world problems.
> After that it would be a good time to review the Python API to make sure
> it works well
> with SymPy and start integrating it more. For now I just keep things
> simple to get the job done.

But how do you know what should be fast and what isn't important
unless you have some kind of integration from the start? It doesn't
have to be perfect, but it should be enough to run the actual code
against CSymPy so that you can see if real things are getting faster,
and which real things are still slow.

Aaron Meurer

Aaron Meurer

unread,
Apr 23, 2014, 6:55:15 PM4/23/14
to sy...@googlegroups.com
Just to clarify, I am all for small, hacky, "proof of concept"
integration. You shouldn't try to get it right the first time. My
point is only that you should start trying early, so that your design
can be guided by the issues that will come up.



On Wed, Apr 23, 2014 at 5:51 PM, Matthew Rocklin <mroc...@gmail.com> wrote:
> I generally agree with Aaron and Brian that I'd like to see CSymPy be as
> small as possible and focus more on SymPy-CSymPy integration. My opinion is
> that performant cores should be small.
>
> That being said I think that it's generally a good thing for Ondrej to go
> off and screw around with a C++ implementation in isolation. Attaching
> himself to the existing SymPy infrastructure would probably bog down
> development. Even O(n) performance gains are a good thing, and this is
> something about which Ondrej knows quite a lot.
>
> I also think that, if CSymPy can make a compelling case for integration
> (perhaps through a PyDy solution), then at least some of the integration
> effort should come from the SymPy side, changing itself to adhere to
> something more CSymPy friendly (e.g. changing how Function works). I think
> that working on interoperation is good for SymPy in the long run.
>
> Ondrej, I think that folks would generally be more comfortable if we better
> understood the strategy for eventual SymPy-CSymPy integration. Right now I
> don't have a good handle on this. From far away it's not apparent that this
> is a major design goal of CSymPy.

Yes, I 100% agree with all of this. Exactly what I've been trying to get at.

Ondřej Čertík

unread,
Apr 23, 2014, 7:29:18 PM4/23/14
to sympy
On Wed, Apr 23, 2014 at 4:51 PM, Matthew Rocklin <mroc...@gmail.com> wrote:
> I generally agree with Aaron and Brian that I'd like to see CSymPy be as
> small as possible and focus more on SymPy-CSymPy integration. My opinion is
> that performant cores should be small.
>
> That being said I think that it's generally a good thing for Ondrej to go
> off and screw around with a C++ implementation in isolation. Attaching
> himself to the existing SymPy infrastructure would probably bog down
> development. Even O(n) performance gains are a good thing, and this is
> something about which Ondrej knows quite a lot.
>
> I also think that, if CSymPy can make a compelling case for integration
> (perhaps through a PyDy solution), then at least some of the integration
> effort should come from the SymPy side, changing itself to adhere to
> something more CSymPy friendly (e.g. changing how Function works). I think
> that working on interoperation is good for SymPy in the long run.
>
> Ondrej, I think that folks would generally be more comfortable if we better
> understood the strategy for eventual SymPy-CSymPy integration. Right now I
> don't have a good handle on this. From far away it's not apparent that this
> is a major design goal of CSymPy.

I understand that this makes you guys a little nervous and uneasy. I don't
have all the answers myself. I am committing to making this work. I am
specifically *not* committing to strictly adhering to the current sympy,
because the goal of csympy is the absolute best performance that we can get
for the *real* problems that people are solving, like PyDy. It might be that
the current SymPy can be interfaced as-is, but it might also be that some
changes to SymPy would be needed. I just don't know yet.

Once this works great for PyDy, I will start working hard on making it work
for SymPy, first by trying to do the Python wrappers in a way that works well.
It might be that something needs changes on the SymPy side.

I am also saying that right now is a little too early to have specific
discussions about the Function interface and other issues, because I first
need to benchmark the C++ core and try a couple of different ways to optimize
things. Only after things settle a bit will it be time to see how best to
interface with SymPy.

Obviously it roughly works even now, and I try to keep things compatible
where possible.

We'll make this work, I am confident of that.

Ondrej

Joachim Durchholz

unread,
Apr 24, 2014, 12:36:44 AM4/24/14
to sy...@googlegroups.com
On 23.04.2014 01:24, Ondřej Čertík wrote:
>> For the basic manipulation pretty much. One can get faster than the
>> current sympy,
>> see e.g. sympycore. But you don't get as fast as current CSymPy with
>> that approach.
>> The speedup is linear for O(n) problems. For O(n^3) problems (Gaussian
>> elimination [1]...)
>> the speedup is cubic. So it really matters for these kind of problems.
>
> No, I think it is still linear even for O(n^3) problems. If it has n^3
> operations
> and each operations takes 100x less with CSymPy, the whole calculation still
> takes 100x less with CSymPy.

That was my knee-jerk reaction, too.
But then it depends on what you mean by "speedup is cubic": if it means
"reduce O(N³) to O(1)", that's obviously absurd; if it means "we have
O(N³), so the payoff will be ~cubic with input size", then it's actually a
correct statement.
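
(To put illustrative numbers on it: for N = 100, an O(N³) elimination performs
on the order of 10^6 symbolic operations, so a constant 100x per-operation
speedup turns, say, 100 s into 1 s. The speedup factor stays at 100x, but the
absolute time saved grows as N³.)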

Joachim Durchholz

unread,
Apr 24, 2014, 12:45:04 AM4/24/14
to sy...@googlegroups.com
On 23.04.2014 05:52, Aaron Meurer wrote:
> One of my worries is not listed here, which is that you are doing
> things completely backwards from good software design with CSymPy,
> which is getting speed first, and something that works later.

Now this is an optimization project, so the point is kind of moot.

However, it's doing optimization backwards. You don't go ahead and
optimize first, you go ahead and benchmark, then identify bottlenecks,
then reimplement that stuff in a faster fashion, then benchmark again to
get feedback on how successful that was.
I'm missing the latter though.

Joachim Durchholz

unread,
Apr 24, 2014, 12:49:37 AM4/24/14
to sy...@googlegroups.com
On 23.04.2014 23:54, Ondřej Čertík wrote:
> On Wed, Apr 23, 2014 at 3:52 PM, Aaron Meurer <asme...@gmail.com> wrote:
>> I think the polys are fast because it can assume that the expressions
>> are polynomials, which is a much simpler data structure than a general
>> mathematical expression.
>
> That's exactly right. If you want a general data structure but without
> assumptions and as fast as possible,
> in Python, look here:
>
> https://github.com/certik/sympyx
>
> There is also cythonized version. It's fast (much faster than SymPy),
> but still slower than CSymPy.

What are the speed differences?
Are the three versions doing the same thing?

Joachim Durchholz

unread,
Apr 24, 2014, 1:09:05 AM4/24/14
to sy...@googlegroups.com
On 24.04.2014 01:29, Ondřej Čertík wrote:
> the goal
> of csympy is the absolute best performance that we can get for the *real*
> problems that people are solving, like PyDy.

I'm at odds with absolute goals like that.
They're good for proof-of-concept code. It's good to know how far some
envelope can be pushed.
However, once code goes into production, other factors like (among others)
maintainability, redundancy, testability etc. come into play.
Normally you'll have to strike a trade-off of some kind, and what you'll
get is specifically *not* "absolute best performance". Hey, as long as
you aren't coding in assembly, you're not going to get the absolute best
performance anyway; there will always be some 10% or 20% you don't get.
Trying to get the maximum out of C++ isn't actually that interesting, and
the art is to see the diminishing returns and know when to stop.

Joachim Durchholz

unread,
Apr 24, 2014, 1:18:35 AM4/24/14
to sy...@googlegroups.com
On 23.04.2014 02:16, Aaron Meurer wrote:
> On Tue, Apr 22, 2014 at 4:58 PM, Joachim Durchholz <j...@durchholz.org> wrote:
>>> The goals are written above. I am myself concentrating on speed,
>>> that's what I really want to nail down.
>>
>> I'm somewhat sceptical about this.
>> A conversion to C++ will give a linear improvement.
>> Better algorithms can improve the big-Oh class.
>> Unless algorithmic improvements have been exhausted, this would be premature
>> optimization. (Are algorithmic improvements exhausted yet?)
>
> That is true. I usually prefer to use faster algorithms.
>
> But to be the devil's advocate, there are two issues with this line of thinking:
>
> - Big O is basically useless.

Um... well, no, it does have its relevance, particularly if your code is
supposed to scale - and SymPy certainly should scale at least in some
areas (such as increasing matrix sizes).

You're right that linear factors and constant overheads must not be ignored.
It's important in SymPy already :-)
... and still, the point stands: Improving the Python side of things
algorithmically, even if it reduces "just" constant overheads and linear
factors, would still be very useful, and they'd come without the costs
associated with having a second programming language in the system.

My biggest fear is that CSymPy will result in a fast but somewhat
brittle system, with all the associated frustrations (mainly for Ondrej
so the damage is limited but still).
We'll see.

Ondřej Čertík

unread,
Apr 24, 2014, 2:09:14 AM4/24/14
to sympy
That's exactly how I developed csympy, except that I benchmarked it on
artificial problems. I got it to speeds similar to Mathematica and GiNaC
(sometimes faster, sometimes slower). Now my goal is to benchmark on real
systems, get a little more exposure, and repeat the process.

Ondrej

mario

unread,
Apr 24, 2014, 2:10:26 AM4/24/14
to sy...@googlegroups.com
Ondřej, you wrote:
> For the sympy.polys in this example,
> is it using the sparse polynomial representation? I.e. it stores the
> symbols (x, y, z) and then stores their powers and coefficients
> in a dictionary, e.g.:
>
> 3*x^2*z + 2*x -> {3: (2, 0, 1), 2: (1, 0, 0)}
>
> ?
>
> In CSymPy, I am still implementing this datastructure, as it is very
> efficient, but it only works for polynomials. So at the moment, we
> can't benchmark this yet.

In the dictionary in that example you meant {(2,0,1): 3, (1,0,0):2}
It is not fast with many variables; typically many of the exponents are zero, so
it is not efficient to iterate on an array of exponents, which are mostly zero.
It is faster to use a data structure keeping only the nonzero exponents, like
the ETuple used in PolyDict in Sage. The code for ETuple is short and simple.
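
For illustration, here is a rough Python sketch of that idea (the helper names
to_sparse and sparse_mul are made up; this is not Sage's actual ETuple code):
a monomial stores only its nonzero exponents as (variable index, exponent)
pairs, so multiplying monomials only touches the variables that actually occur.

def to_sparse(exponents):
    # (2, 0, 1) -> ((0, 2), (2, 1)): keep only the nonzero exponents
    return tuple((i, e) for i, e in enumerate(exponents) if e != 0)

def sparse_mul(m1, m2):
    # multiply two monomials by adding exponents; only nonzero entries are visited
    d = dict(m1)
    for i, e in m2:
        d[i] = d.get(i, 0) + e
    return tuple(sorted(d.items()))

# 3*x^2*z * 2*x -> 6*x^3*z (coefficients handled separately)
m1 = to_sparse((2, 0, 1))   # x^2*z
m2 = to_sparse((1, 0, 0))   # x
assert sparse_mul(m1, m2) == ((0, 3), (2, 1))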

Ondřej Čertík

unread,
Apr 24, 2014, 2:18:42 AM4/24/14
to sympy
On Thu, Apr 24, 2014 at 12:10 AM, mario <mario....@gmail.com> wrote:
> Ondřej, you wrote:
> For the sympy.polys in this example,
> is it using the sparse polynomial representation? I.e. it stores the
> symbols (x, y, z) and then stores their powers and coefficients
> in a dictionary, e.g.:
>
> 3*x^2*z + 2*x -> {3: (2, 0, 1), 2: (1, 0, 0)}
>
> ?
>
> In CSymPy, I am still implementing this datastructure, as it is very
> efficient, but it only works for polynomials. So at the moment, we
> can't benchmark this yet.
>
> In the dictionary in that example you meant {(2,0,1): 3, (1,0,0):2}

Ah, yes, of course.

> It is not fast with many variables; typically many of the exponents are
> zero, so
> it is not efficient to iterate on an array of exponents, which are mostly
> zero.
> It is faster to use a data structure keeping only the nonzero exponents,
> like
> the ETuple used in PolyDict in Sage. The code for ETuple is short and
> simple.

Here is the documentation:

http://www.sagemath.org/doc/reference/polynomial_rings/sage/rings/polynomial/polydict.html

Thanks for the tips. In CSymPy, here is the implementation for sparse
polynomial multiplication:

https://github.com/sympy/csympy/blob/master/src/rings.cpp#L63

and 'umap_vec_mpz' is defined here:

https://github.com/sympy/csympy/blob/master/src/dict.h#L67

If you have tips on how to implement ETuple in C++ and improve upon this
data structure, please let me know.
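
For concreteness, the idea in Python terms is roughly the following (just a
sketch for illustration, with a made-up poly_mul helper, not the actual CSymPy
C++ code): a polynomial is a dict mapping exponent tuples to coefficients, and
multiplication adds the exponent tuples pairwise.

def poly_mul(p, q):
    # p, q: dicts mapping exponent tuples to coefficients,
    # e.g. 3*x^2*z + 2*x -> {(2, 0, 1): 3, (1, 0, 0): 2}
    r = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            c = r.get(e, 0) + c1*c2
            if c:
                r[e] = c
            elif e in r:
                del r[e]      # drop terms that cancel
    return r

p = {(2, 0, 1): 3, (1, 0, 0): 2}    # 3*x^2*z + 2*x
q = {(1, 0, 0): 1, (0, 0, 0): -1}   # x - 1
assert poly_mul(p, q) == {(3, 0, 1): 3, (2, 0, 1): -3, (2, 0, 0): 2, (1, 0, 0): -2}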

Ondrej

F. B.

unread,
Apr 24, 2014, 10:10:53 AM4/24/14
to sy...@googlegroups.com

On Thursday, April 24, 2014 12:43:11 AM UTC+2, Matthew wrote:
Things like Function threw me off when I melded SymPy with LogPy (my little logic programming system) or with Theano.  It'll be nice for others to run into these.

By the way, is there any implementation of MiniKanren in C++? Maybe in that case one could try to write some algorithms in a rule-based format (as in MiniKanren), then either load these rules through SymPy and LogPy, or through CSymPy and some other parsing library (or maybe even by template metaprogramming). But I'm just speculating.

Ondřej Čertík

unread,
Apr 24, 2014, 11:06:05 AM4/24/14
to sympy
CSymPy:

In [1]: from csympy import Symbol, Integer

In [2]: x = Symbol("x")

In [3]: N, a, c = 3000, x, Integer(1)

In [4]: %time for i in range(N): a += c*x**i; c *= -1
CPU times: user 1.04 s, sys: 0 ns, total: 1.04 s
Wall time: 1.04 s

sympyx Cython:

In [1]: from sympy_pyx import Symbol, Integer

In [2]: x = Symbol("x")

In [3]: N, a, c = 3000, x, Integer(1)

In [4]: %time for i in range(N): a += c*x**i; c *= -1
CPU times: user 10.8 s, sys: 0 ns, total: 10.8 s
Wall time: 10.8 s

sympyx Python:

In [1]: from sympy_py import Symbol, Integer

In [2]: x = Symbol("x")

In [3]: N, a, c = 3000, x, Integer(1)

In [4]: %time for i in range(N): a += c*x**i; c *= -1
CPU times: user 1min 42s, sys: 4 ms, total: 1min 42s
Wall time: 1min 42s

SymPy:

In [1]: from sympy import Symbol, Integer

In [2]: x = Symbol("x")

In [3]: N, a, c = 3000, x, Integer(1)

In [4]: %time for i in range(N): a += c*x**i; c *= -1
CPU times: user 25.5 s, sys: 20 ms, total: 25.6 s
Wall time: 25.6 s


So I guess sympyx is doing something stupid. I think it's the loop:

class Add(Basic):

    ...

    @classmethod
    def canonicalize(cls, args):
        use_glib = 0
        if use_glib:
            from csympy import HashTable
            d = HashTable()
        else:
            d = {}
        num = Integer(0)
        for a in args:
            if a.type == INTEGER:
                num += a
            elif a.type == ADD:
                for b in a.args:
                    if b.type == INTEGER:
                        num += b
                    else:
                        coeff, key = b.as_coeff_rest()
                        if key in d:
                            d[key] += coeff
                        else:
                            d[key] = coeff
            ...

So it loops over the dictionary in Add, which is inefficient. In
CSymPy, we copy the dictionary and add to it directly:

RCP<const Basic> add(const RCP<const Basic> &a, const RCP<const Basic> &b)
{
    CSymPy::umap_basic_int d;
    RCP<const Number> coef;
    RCP<const Basic> t;
    ...
    } else if (CSymPy::is_a<Add>(*a)) {
        coef = (rcp_static_cast<const Add>(a))->coef_;
        d = (rcp_static_cast<const Add>(a))->dict_;
        if (is_a_Number(*b)) {
            iaddnum(outArg(coef), rcp_static_cast<const Number>(b));
        } else {
            RCP<const Number> coef2;
            Add::as_coef_term(b, outArg(coef2), outArg(t));
            Add::dict_add_term(d, coef2, t);
        }



> Are the three versions doing the same thing?

Good question. I thought they were, but apparently not, as I just found out:
sympyx is using a list for the arguments of Add, instead of a dictionary. I
also just tried sympycore:

In [1]: from sympycore import Symbol, Calculus
SparsepolyHead.evalf does not define documentation string

In [2]: x = Symbol("x")

In [3]: N, a, c = 3000, x, Calculus(1)

In [4]: %time for i in range(N): a += c*x**i; c *= -1
CPU times: user 52 ms, sys: 0 ns, total: 52 ms
Wall time: 50.2 ms


So this sets the bar for this benchmark --- but one should note that
sympycore's "a" is mutable. If you do this:

In [4]: %time for i in range(N): a = a + c*x**i; c *= -1
CPU times: user 548 ms, sys: 144 ms, total: 692 ms
Wall time: 693 ms


Then it's closer. It is still faster than CSymPy, though, so there is room for
improvement. I am glad I tried this; sympycore must be using better
algorithms. Though on this benchmark:

In [1]: from sympycore import Symbol, Calculus
SparsepolyHead.evalf does not define documentation string

In [2]: x = Symbol("x")

In [3]: y = Symbol("y")

In [4]: z = Symbol("z")

In [5]: w = Symbol("w")

In [6]: e = ((x+y+z+w)**15+w)*(x+y+z+w)**15

In [7]: e
Out[7]: Calculus('(y + x + z + w)**15*(w + (y + x + z + w)**15)')

In [8]: %time f = e.expand()
CPU times: user 5.02 s, sys: 144 ms, total: 5.16 s
Wall time: 5.16 s


vs

In [1]: from csympy import var

In [2]: var("x y z w")
Out[2]: (x, y, z, w)

In [3]: e = ((x+y+z+w)**15+w)*(x+y+z+w)**15

In [4]: %time f = e.expand()
CPU times: user 2 s, sys: 4 ms, total: 2 s
Wall time: 2 s


So more time needs to be spent on this to see why sympycore is faster on
some things but slower on others.

Ondrej

Ondřej Čertík

unread,
Apr 24, 2014, 11:16:59 AM4/24/14
to sympy
On Thu, Apr 24, 2014 at 9:06 AM, Ondřej Čertík <ondrej...@gmail.com> wrote:
> On Wed, Apr 23, 2014 at 10:49 PM, Joachim Durchholz <j...@durchholz.org> wrote:
>> On 23.04.2014 23:54, Ondřej Čertík wrote:
>>
>>> On Wed, Apr 23, 2014 at 3:52 PM, Aaron Meurer <asme...@gmail.com> wrote:
>>>>
>>>> I think the polys are fast because it can assume that the expressions
>>>> are polynomials, which is a much simpler data structure than a general
>>>> mathematical expression.
>>>
>>>
>>> That's exactly right. If you want a general data structure but without
>>> assumptions and as fast as possible,
>>> in Python, look here:
>>>
>>> https://github.com/certik/sympyx
>>>
>>> There is also cythonized version. It's fast (much faster than SymPy),
>>> but still slower than CSymPy.
>>
>>
>> What are the speed differences?
>
> CSymPy:
>
> In [1]: from csympy import Symbol, Integer
>
> In [2]: x = Symbol("x")
>
> In [3]: N, a, c = 3000, x, Integer(1)
>
> In [4]: %time for i in range(N): a += c*x**i; c *= -1
> CPU times: user 1.04 s, sys: 0 ns, total: 1.04 s
> Wall time: 1.04 s

I've done this benchmark in C++ (https://github.com/sympy/csympy/pull/163):

for (int i = 0; i < N; i++) {
    a = add(a, mul(c, pow(x, integer(i))));
    c = mul(c, integer(-1));
}


$ ./add1
625ms
number of terms: 2998


Aaron, so this is the current overhead of the Python wrappers ---
almost 1.6x slower.
Now the time is comparable to sympycore, but since sympycore is a mix
of Python and C,
it shows that things can be done faster. I need to look into this further.

Matthew Rocklin

unread,
Apr 24, 2014, 11:19:31 AM4/24/14
to sy...@googlegroups.com

> By the way, is there any implementation of MiniKanren in C++? Maybe in that case one could try to write some algorithms in a rule-based format (as in MiniKanren), then either load these rules through SymPy and LogPy, or through CSymPy and some other parsing library (or maybe even by template metaprogramming). But I'm just speculating.

Yeah, I've given thought to low-level approaches to tree transformation.  That's the sort of algorithm that could interface with SymPy Trees pretty naturally.

I think that the answer is to implement just a few parts in C++; SymPy really doesn't need a full logic programming system.  We only need things like multi-pattern unification and AC unification.

Note that there are mature efficient projects that do this (and lots of other things) already.  Elan and Stratego/XT come to mind.  They're heavy dependencies though.

Joachim Durchholz

unread,
Apr 24, 2014, 1:06:35 PM4/24/14
to sy...@googlegroups.com
On 24.04.2014 17:19, Matthew Rocklin wrote:
> SymPy
> really doesn't need a full logic programming system.

That would be far too horrible performance-wise.

> We only need things
> like multi-pattern unification and AC unification.

I do not know what exactly these are.
I read that AC unification can be polynomial, which sounds like a
potential performance bottleneck; do we really need it?

> Note that there are mature efficient projects that do this (and lots of
> other things) already. Elan and Stratego/XT come to mind. They're heavy
> dependencies though.

I'm not sure what the consensus on dependencies is, but I myself think
that a dependency is okay if we can either make sure that a specific,
tested version is used, or if we expect the APIs to be stable and the
need to fork for bug fixes or extensions is nonexistent.

Matthew Rocklin

unread,
Apr 24, 2014, 1:58:18 PM4/24/14
to sy...@googlegroups.com

>> We only need things
>> like multi-pattern unification and AC unification.
>
> I do not know what exactly these are.
> I read that AC unification can be polynomial, which sounds like a potential performance bottleneck; do we really need it?

Given expression (like sin(2*x) + cos(2*x)) we need to match against many possible patterns (like, every known trig identity).  It's possible to store all of the patterns in a good data structure so that we can check them all simultaneously (see Trie for something similar).  We need to do this matching in a way that is aware of commutative operators (like +).  Naive solutions to this are very slow.  There exists sophisticated solutions.
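
To make the idea concrete, here is a toy sketch in Python (the rules list,
index and rewrite_once helper are all made up for illustration; this is
nothing like the sophisticated solutions, and in particular it ignores the
hard commutative/associative part entirely): group the patterns by the head of
their left-hand side, so for a given expression only the patterns that could
possibly apply are tried.

from collections import defaultdict
from sympy import Wild, S, sin, cos, symbols

x = symbols('x')
a = Wild('a')

rules = [
    (sin(a)**2 + cos(a)**2, S.One),
    (sin(2*a), 2*sin(a)*cos(a)),
]

index = defaultdict(list)
for lhs, rhs in rules:
    index[lhs.func].append((lhs, rhs))   # group patterns by outermost operation

def rewrite_once(expr):
    # only try patterns whose head matches, instead of scanning every rule
    for lhs, rhs in index.get(expr.func, ()):
        m = expr.match(lhs)
        if m is not None:
            return rhs.xreplace(m)
    return expr

print(rewrite_once(sin(2*x)))   # 2*sin(x)*cos(x)
print(rewrite_once(cos(2*x)))   # unchanged: no rule is indexed under cos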

>> Note that there are mature efficient projects that do this (and lots of
>> other things) already.  Elan and Stratego/XT come to mind.  They're heavy
>> dependencies though.
>
> I'm not sure what the consensus on dependencies is, but I myself think that a dependency is okay if we can either make sure that a specific, tested version is used, or if we expect the APIs to be stable and the need to fork for bug fixes or extensions is nonexistent.

When I say heavy I mean that these are straight research projects.  They're very good but there is virtually no distribution support here.

Ondřej Čertík

unread,
Apr 24, 2014, 6:52:54 PM4/24/14
to sympy
On Thu, Apr 24, 2014 at 11:58 AM, Matthew Rocklin <mroc...@gmail.com> wrote:
>>
>> > We only need things
>>>
>>> like multi-pattern unification and AC unification.
>>
>>
>> I do not know what exactly these are.
>> I read that AC unification can be polynomial, which sounds like a
>> potential performance bottleneck; do we really need it?
>
>
> Given expression (like sin(2*x) + cos(2*x)) we need to match against many
> possible patterns (like, every known trig identity). It's possible to store
> all of the patterns in a good data structure so that we can check them all
> simultaneously (see Trie for something similar). We need to do this
> matching in a way that is aware of commutative operators (like +). Naive
> solutions to this are very slow. There exists sophisticated solutions.

For trig simplification, there is a paper that I think uses some systematic way
to simplify it. So it might be that for that you don't need to check
all the trig
identities.

Ondrej

Joachim Durchholz

unread,
Apr 25, 2014, 2:46:32 AM4/25/14
to sy...@googlegroups.com
On 25.04.2014 00:52, Ondřej Čertík wrote:
> On Thu, Apr 24, 2014 at 11:58 AM, Matthew Rocklin <mroc...@gmail.com> wrote:
>>>
>>>> We only need things
>>>>
>>>> like multi-pattern unification and AC unification.
>>>
>>>
>>> I do not know what exactly these are.
>>> I read that AC unification can be polynomial, which sounds like a
>>> potential performance bottleneck; do we really need it?
>>
>> Given expression (like sin(2*x) + cos(2*x)) we need to match against many
>> possible patterns (like, every known trig identity).

I'm assuming that this is what was meant by "multi-pattern unification".
It's essentially a search in a potentially infinite decision tree. I
think that's already polynomial, possibly even undecidable.

>> It's possible to store
>> all of the patterns in a good data structure so that we can check them all
>> simultaneously (see Trie for something similar). We need to do this
>> matching in a way that is aware of commutative operators (like +). Naive
>> solutions to this are very slow. There exists sophisticated solutions.

Can't we deal with commutativity by normalizing the tree?
E.g. in polynomials, it's already commonplace to put the higher exponents
to the left; set up a total order over expressions that includes this
and similar conventions.
Similar for associative operators; there, we turn A+(B+C) into
plus(A,B,C) and don't have to worry about burdening the unification
algorithm with semantic subtleties.
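
A minimal sketch of that normalization, on made-up tuple "expressions" rather
than anything SymPy actually uses (normalize_add is a hypothetical helper):
flatten nested sums and sort the arguments under a fixed total order, so the
matcher never has to reason about commutativity itself.

def normalize_add(expr):
    # expressions are nested tuples like ('+', A, ('+', B, C))
    if not isinstance(expr, tuple) or expr[0] != '+':
        return expr
    args = []
    for arg in expr[1:]:
        arg = normalize_add(arg)
        if isinstance(arg, tuple) and arg[0] == '+':
            args.extend(arg[1:])          # associativity: A+(B+C) -> +(A, B, C)
        else:
            args.append(arg)
    return ('+',) + tuple(sorted(args, key=repr))   # commutativity: canonical order

print(normalize_add(('+', 'C', ('+', 'B', 'A'))))   # ('+', 'A', 'B', 'C')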

> For trig simplification, there is a paper that I think uses some systematic way
> to simplify it. So it might be that for that you don't need to check
> all the trig
> identities.

I think that's the general approach. Do not exhaust the decision tree,
follow a search strategy.
Applies to trig, integrals, polynomial simplification, and probably almost
all areas of mathematics.

Can this be captured as a simple priority value on each substitution rule?

Matthew Rocklin

unread,
Apr 25, 2014, 11:02:19 AM4/25/14
to sy...@googlegroups.com

>> I do not know what exactly these are.
>> I read that AC unification can be polynomial, which sounds like a
>> potential performance bottleneck; do we really need it?
>>
>> Given expression (like sin(2*x) + cos(2*x)) we need to match against many
>> possible patterns (like, every known trig identity).
>
> I'm assuming that this is what was meant by "multi-pattern unification".
> It's essentially a search in a potentially infinite decision tree. I think that's already polynomial, possibly even undecidable.

The large decision tree search problem comes after you determine which patterns might possibly match.  That is also a fun problem.  To be explicit: given an expression and a knowledge base of patterns, first see which patterns are valid, then choose which patterns to apply.  Both of these operations are interesting.

>> It's possible to store
>> all of the patterns in a good data structure so that we can check them all
>> simultaneously (see Trie for something similar).  We need to do this
>> matching in a way that is aware of commutative operators (like +).  Naive
>> solutions to this are very slow.  There exists sophisticated solutions.
>
> Can't we deal with commutativity by normalizing the tree?
> E.g. in polynomials, it's already commonplace to put the higher exponents to the left; set up a total order over expressions that includes this and similar conventions.
> Similar for associative operators; there, we turn A+(B+C) into plus(A,B,C) and don't have to worry about burdening the unification algorithm with semantic subtleties.

For commutativity, yes, for associativity and commutativity no.  

Match x + B to A+B+C.  This problem is typically solved for a single pattern by performing a bipartite matching.  It's more complex when there exist several patterns and you don't want to check each of them linearly.
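
To make the problem concrete, here is a deliberately naive sketch (ac_match is
a made-up helper; this is not how any real system does it): the pattern is a
bag of fixed terms plus one trailing wildcard that absorbs the leftovers, and
we simply try every assignment of subject terms to the fixed pattern terms.
This is exactly the brute-force approach that becomes very slow.

from itertools import permutations

def ac_match(pattern_terms, subject_terms):
    # pattern_terms: fixed terms followed by a single trailing wildcard name
    *fixed, wild = pattern_terms
    results = []
    for perm in permutations(range(len(subject_terms)), len(fixed)):
        if all(subject_terms[i] == f for i, f in zip(perm, fixed)):
            leftover = tuple(t for j, t in enumerate(subject_terms) if j not in perm)
            results.append({wild: leftover})
    return results

# match  x + B  against  A + B + C  (x is the wildcard)
print(ac_match(('B', 'x'), ('A', 'B', 'C')))   # [{'x': ('A', 'C')}]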

> For trig simplification, there is a paper that I think uses some systematic way
> to simplify it. So it might be that for that you don't need to check
> all the trig
> identities.

Yeah, so this is actually in SymPy.  Chris Smith already implemented this in fu.  Note that this doesn't use pattern matching, but it does search through a tree of small transformations.
 
    >>> fu(sin(x)/cos(x))  # default objective function
    tan(x)
    >>> fu(sin(x)/cos(x), measure=lambda x: -x.count_ops()) # maximize op count
    sin(x)/cos(x)

I used the strategies module to pull out the tree search stuff.  See either sympy.strategies or github.com/logpy/strategies/
It is used in fu simp here:

https://github.com/sympy/sympy/blob/821fc389bcb7285555f6ec46c32afa38faceeab1/sympy/simplify/fu.py#L1597


> I think that's the general approach. Do not exhaust the decision tree, follow a search strategy.
> Applies to trig, integrals, polynomial simplification, and probably almost all areas of mathematics.
>
> Can this be captured as a simple priority value on each substitution rule?

You can get pretty far with a simple priority value.  Often times greedy solutions don't produce optimal results.  My solution to this is to follow a greedy solution but then allow backtracking, allowing the user to ask for extra results.  
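
A tiny sketch of that idea (the tree, cost values and search helper are made
up; this is not the actual sympy.strategies API): explore the rewrite tree
depth-first, greedily-best branch first, but as a generator, so the first
value is the greedy result and asking for more values amounts to backtracking.

tree = {'start': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1']}
cost = {'a': 1, 'b': 2, 'a1': 5, 'a2': 3, 'b1': 0}

def search(node):
    children = sorted(tree.get(node, []), key=cost.get)   # greedy ordering
    if not children:
        yield node                                        # leaf: a final result
        return
    for child in children:                                # best-first, then backtrack
        yield from search(child)

results = search('start')
print(next(results))   # 'a2': the greedy path ('a' looks cheaper than 'b') ends here
print(next(results))   # 'a1': backtracking within the greedy branch
print(next(results))   # 'b1': backtracking to the other branch, which is actually best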

Ondřej Čertík

unread,
Apr 25, 2014, 12:01:41 PM4/25/14
to sympy
This is impressive!! It can do stuff like this:

In [3]: cos((x+y)/2) * sin((x-y)/2)
Out[3]: sin(x/2 - y/2)*cos(x/2 + y/2)

In [4]: fu(_)
Out[4]: sin(x)/2 - sin(y)/2

In [7]: sec(x) / (2*sin(x))
Out[7]: sec(x)/(2*sin(x))

In [8]: fu(_)
Out[8]: 1/sin(2*x)


I had no idea that this was implemented. That's very exciting.

Ondrej

>
> I used the strategies module to pull out the tree search stuff. See either
> sympy.strategies or github.com/logpy/strategies/
> It is used in fu simp here
>
> https://github.com/sympy/sympy/blob/821fc389bcb7285555f6ec46c32afa38faceeab1/sympy/simplify/fu.py#L1597
>
>> I think that's the general approach. Do not exhaust the decision tree,
>> follow a search strategy.
>> Applies to trig, integrals, polynom simplification, and probably almost
>> all areas of mathematics.
>>
>> Can this be captured as a simple priority value on each substitution rule?
>
>
> You can get pretty far with a simple priority value. Often times greedy
> solutions don't produce optimal results. My solution to this is to follow a
> greedy solution but then allow backtracking, allowing the user to ask for
> extra results.
>

Matthew Rocklin

unread,
Apr 25, 2014, 12:04:51 PM4/25/14
to sy...@googlegroups.com
Yeah, it's a pretty swell algorithm.  Kudos to Chris Smith.

It's also a great use case for the strategies / decision tree stuff.  I'm happy to go over this with anyone if they're interested.


F. B.

unread,
Apr 25, 2014, 3:28:49 PM4/25/14
to sy...@googlegroups.com


On Friday, April 25, 2014 5:02:19 PM UTC+2, Matthew wrote:

I used the strategies module to pull out the tree search stuff.  See either sympy.strategies or github.com/logpy/strategies/
It is used in fu simp here



I was thinking of a possible scenario to continue the development of CSymPy. Suppose that a mathematical algorithm is implemented by a rule file, which defines rules for LogPy or Strategies. SymPy could be given the ability to parse that "rule" file, i.e. SymPy works together with LogPy and/or Strategies to perform the defined algorithm. In the meantime, there could be an implementation of LogPy and/or Strategies through C++ template metaprogramming, which implies that the rule file is parsed at compile time and efficient code is generated, still at compile time, into the CSymPy object files/binary. Template metaprogramming is Turing complete, so it should be possible to implement LogPy and/or Strategies through it.

There could be a slow, unoptimized version through SymPy + LogPy/Strategies for development, and then the fast version of the algorithm when CSymPy is compiled.

Do you think this is a plausible way to go?