51 views

Skip to first unread message

Apr 18, 2022, 11:45:39 PM4/18/22

to sympy, syme...@googlegroups.com

Hi,

I am CCing the symengine list also. Let me say a few words about the design and goals of SymEngine. It's a result of many years of experience with SymPy and implementing high performance code in general. The goal is the fastest possible symbolic manipulation. So we do not want to introduce slowdowns just to get a different API, rather we should figure out how to create an API that works for everybody, while keeping the speed.

The design is, as Isuru wrote below, quite close to optimal: SymEngine does not do any unnecessary work. The C++ classes just "pass through" the data, which must be prepared ahead of time. In order to get good speed, it needs to store the data in data structures like a hash table, which restricts what it can represent, in return for good speed. It seems that covers 90% of use cases. If you need to represent things differently than what SymEngine can do (such as `x+x` instead of `2*x`), you can always use SymPy, or we can implement some slower representations, but I think that should not be the default.

One idea for improvement is to represent all SymEngine classes in some kind of a description language, similar to what we use for LFortran here: https://gitlab.com/lfortran/lfortran/-/blob/8761fcee2bbc4cb924bf65f5d576541e58bb8b08/src/libasr/ASR.asdl, and generate all the C++ classes automatically from it. This has many advantages (much less code to maintain, consistent code, easy to add a new symbolic function/object, etc.) and almost no downsides. One can then also experiment with changing how things are represented.

Regarding the internal representation itself --- if anyone knows a faster design, please let me know!

* One idea is not to use reference counted pointers at all, and just represent everything by value. That means if you take a term from an Add, it will get copied. When Add goes out of scope, it deallocates all its terms. Whether this approach is overall faster is unclear, but it might be quite a bit faster if you don't need to reuse subexpressions too much.

* Another orthogonal idea is to represent all symbols like "x" in a separate structure (symbol table) and only reference them from expressions like x^3 + 2x^2+sin(x)...; The issue becomes again with memory management, but there are probably a few ways forward.

* SymEngine was design to operate fast on general expressions. One can of course get faster speed if you know that you are dealing with a polynomial or other more specific structure, but this is an orthogonal idea. It seems just getting the general case working really fast is worth it.

If anyone is interested in discussing this more, I am happy to have a video meeting where we can discuss more. Just let me know!

Ondrej

On Fri, Apr 15, 2022, at 7:53 PM, Isuru Fernando wrote:

> Hi Oscar,

>

> Here's a few things that are different in SymEngine than SymPy.

>

>> As an example it would be trivial in SymPy to make substitutions

> involving large expressions and many replacements much faster with a

> small redesign. The problem though is backwards compatibility: each

> expression class can implement its own _eval_subs method so for

> backwards compatibility the subs implementation must recurse in the

> same slow way once throughout the whole tree for each replacement that

> is to be made. (There are ways to work around this but the design

> makes it harder than it should be.)

>

> In SymEngine, classes do not have such a method and therefore cannot

> extend for eg: subs function.

> However the subs function itself (SubsVisitor class technically) can be

> extended.

>

>> This is a small example of a more general problem. Using classes means

> you have to use the interfaces of the class which means that efficient

> algorithms needing something not provided by that public interface are

> impossible. Even worse both SymPy and SymEngine allow the

> *constructors* of those classes to do nontrivial work without

> providing any good way to override that. This means that you can't

> even represent a symbolic expression without allowing arbitrary code

> execution: it's impossible to optimize higher-level algorithms if you

> have so little control over execution from the outside.

>

> No, SymEngine does not do anything in the constructor of the C++ class

> itself. (For eg: class "Add"). We have different functions (For eg:

> function "add")

> that do complicated functionality to create the C++ class. A user is

> allowed

> to create the C++ class directly and we assume that the data structure

> that the

> user passed is internally consistent in Release mode and we check user

> input in Debug mode. This allows the user to do low level optimizations

> when they know that going through the complicated function is

> unnecessary.

>

>> There are ways that this can be improved in both SymPy and SymEngine

> but I also think that for many important operations the basic design

> used by these is limiting in a big-O sense: the design constrains what

> algorithms can be used. SymEngine is faster than SymPy in a brute

> force sense by using C++ rather than Python but it would be possible

> to make something both faster and more flexible if a different design

> was used at a basic level.

>

> I disagree about this in SymEngine. For eg: SymPy's extensibility

> of classes prevents optimizations that SymEngine can do because symengine

> doesn't have an equivalence of `_eval_subs` or similar.

> Some of the optimizations in SymEngine like Aaron said leads to better

> big-O complexities. i.e. the ratio between the two modules is not always a

> constant as input size increases.

>

> Isuru

>

>

>

> On Thu, Apr 14, 2022 at 1:26 PM Oscar Benjamin

> <oscar.j....@gmail.com> wrote:

>> On Thu, 14 Apr 2022 at 18:16, Tirthankar Mazumder

>> <greenw...@gmail.com> wrote:

>> >

>> > On Thursday, April 14, 2022 at 10:04:28 PM UTC+5:30 Oscar wrote:

>> >>

>> >> On Tue, 12 Apr 2022 at 19:26, Matthias Köppe <matthia...@gmail.com> wrote:

>> >>

>> >> > - status and plans regarding SymEngine

>> >>

>> >> I don't know about the status of SymEngine. I can't say that I can see

>> >> any significant work happening on the SymPy side to integrate

>> >> SymEngine any further with SymPy. My personal view is that for faster

>> >> symbolics a different approach is needed in general but SymEngine

>> >> seems to have the same design flaws as SymPy itself in that respect.

>> >

>> > Hi, as someone who is relatively new to the SymPy + SymEngine project, and wants to work on SymEngine as a part of their GSoC, I would appreciate it if you could elaborate a bit on what design flaws you are referring to.

>>

>> The basic design of the way that symbolic expressions are represented

>> in SymPy and SymEngine is through the use of classes to represent

>> different types of expression e.g. there is a class Pow and an

>> expression like x**y is represented by an instance of that class

>> created as Pow(x, y).

>>

>> Using classes like this makes it hard to extend the system with new

>> types of symbolic expression e.g. with SymEngine you can't make a new

>> kind of symbolic expression without writing C++ code so if you are

>> using SymEngine from Python then it isn't extensible. With SymPy you

>> could at least make your own symbolic expression class in Python but

>> then it still doesn't work so well because there are so many places in

>> the codebase where particular types of expression are special-cased

>> meaning that any new symbolic expression type cannot be handled by

>> functions like solve, integrate etc.

>>

>> The other problem with using classes is that it means that the basic

>> data structure that is used to represent expressions is fixed from the

>> outside. In SymPy and SymEngine that data structure is a tree and all

>> algorithms recurse through that tree. More efficient data structures

>> for some given operation can't be used because the implementation of

>> each symbolic expression type requires that you always use instances

>> of the class meaning that you always have to have a tree and always

>> have to recurse in the same way.

>>

>> As an example it would be trivial in SymPy to make substitutions

>> involving large expressions and many replacements much faster with a

>> small redesign. The problem though is backwards compatibility: each

>> expression class can implement its own _eval_subs method so for

>> backwards compatibility the subs implementation must recurse in the

>> same slow way once throughout the whole tree for each replacement that

>> is to be made. (There are ways to work around this but the design

>> makes it harder than it should be.)

>>

>> This is a small example of a more general problem. Using classes means

>> you have to use the interfaces of the class which means that efficient

>> algorithms needing something not provided by that public interface are

>> impossible. Even worse both SymPy and SymEngine allow the

>> *constructors* of those classes to do nontrivial work without

>> providing any good way to override that. This means that you can't

>> even represent a symbolic expression without allowing arbitrary code

>> execution: it's impossible to optimise higher-level algorithms if you

>> have so little control over execution from the outside.

>>

>> It's very hard later to change the public interfaces of the expression

>> classes because as soon as any downstream code has subclassed your

>> classes you are bound by backwards compatibility. (Subclassing across

>> the boundaries of different software projects leads to strong

>> coupling.)

>>

>> There are ways that this can be improved in both SymPy and SymEngine

>> but I also think that for many important operations the basic design

>> used by these is limiting in a big-O sense: the design constrains what

>> algorithms can be used. SymEngine is faster than SymPy in a brute

>> force sense by using C++ rather than Python but it would be possible

>> to make something both faster and more flexible if a different design

>> was used at a basic level.

>>

>> --

>> Oscar

>>

>> --

>> You received this message because you are subscribed to the Google Groups "sympy" group.

>> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com <mailto:sympy%2Bunsu...@googlegroups.com>.

>> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAHVvXxTXJRY%3DtJ%3DPCtkNBgmd_nWmWd-wm9ax_w0zcPtXe_gynA%40mail.gmail.com.

>

> --

> You received this message because you are subscribed to the Google

> Groups "sympy" group.

> To unsubscribe from this group and stop receiving emails from it, send

> an email to sympy+un...@googlegroups.com.

> To view this discussion on the web visit

> https://groups.google.com/d/msgid/sympy/CA%2B01voM%2Byu65B%2Brtv5HQEVoxkvF95Gf27L8%3DH7N7ZbJQWUbcUg%40mail.gmail.com

> <https://groups.google.com/d/msgid/sympy/CA%2B01voM%2Byu65B%2Brtv5HQEVoxkvF95Gf27L8%3DH7N7ZbJQWUbcUg%40mail.gmail.com?utm_medium=email&utm_source=footer>.

I am CCing the symengine list also. Let me say a few words about the design and goals of SymEngine. It's a result of many years of experience with SymPy and implementing high performance code in general. The goal is the fastest possible symbolic manipulation. So we do not want to introduce slowdowns just to get a different API, rather we should figure out how to create an API that works for everybody, while keeping the speed.

The design is, as Isuru wrote below, quite close to optimal: SymEngine does not do any unnecessary work. The C++ classes just "pass through" the data, which must be prepared ahead of time. In order to get good speed, it needs to store the data in data structures like a hash table, which restricts what it can represent, in return for good speed. It seems that covers 90% of use cases. If you need to represent things differently than what SymEngine can do (such as `x+x` instead of `2*x`), you can always use SymPy, or we can implement some slower representations, but I think that should not be the default.

One idea for improvement is to represent all SymEngine classes in some kind of a description language, similar to what we use for LFortran here: https://gitlab.com/lfortran/lfortran/-/blob/8761fcee2bbc4cb924bf65f5d576541e58bb8b08/src/libasr/ASR.asdl, and generate all the C++ classes automatically from it. This has many advantages (much less code to maintain, consistent code, easy to add a new symbolic function/object, etc.) and almost no downsides. One can then also experiment with changing how things are represented.

Regarding the internal representation itself --- if anyone knows a faster design, please let me know!

* One idea is not to use reference counted pointers at all, and just represent everything by value. That means if you take a term from an Add, it will get copied. When Add goes out of scope, it deallocates all its terms. Whether this approach is overall faster is unclear, but it might be quite a bit faster if you don't need to reuse subexpressions too much.

* Another orthogonal idea is to represent all symbols like "x" in a separate structure (symbol table) and only reference them from expressions like x^3 + 2x^2+sin(x)...; The issue becomes again with memory management, but there are probably a few ways forward.

* SymEngine was design to operate fast on general expressions. One can of course get faster speed if you know that you are dealing with a polynomial or other more specific structure, but this is an orthogonal idea. It seems just getting the general case working really fast is worth it.

If anyone is interested in discussing this more, I am happy to have a video meeting where we can discuss more. Just let me know!

Ondrej

On Fri, Apr 15, 2022, at 7:53 PM, Isuru Fernando wrote:

> Hi Oscar,

>

> Here's a few things that are different in SymEngine than SymPy.

>

>> As an example it would be trivial in SymPy to make substitutions

> involving large expressions and many replacements much faster with a

> small redesign. The problem though is backwards compatibility: each

> expression class can implement its own _eval_subs method so for

> backwards compatibility the subs implementation must recurse in the

> same slow way once throughout the whole tree for each replacement that

> is to be made. (There are ways to work around this but the design

> makes it harder than it should be.)

>

> In SymEngine, classes do not have such a method and therefore cannot

> extend for eg: subs function.

> However the subs function itself (SubsVisitor class technically) can be

> extended.

>

>> This is a small example of a more general problem. Using classes means

> you have to use the interfaces of the class which means that efficient

> algorithms needing something not provided by that public interface are

> impossible. Even worse both SymPy and SymEngine allow the

> *constructors* of those classes to do nontrivial work without

> providing any good way to override that. This means that you can't

> even represent a symbolic expression without allowing arbitrary code

> execution: it's impossible to optimize higher-level algorithms if you

> have so little control over execution from the outside.

>

> No, SymEngine does not do anything in the constructor of the C++ class

> itself. (For eg: class "Add"). We have different functions (For eg:

> function "add")

> that do complicated functionality to create the C++ class. A user is

> allowed

> to create the C++ class directly and we assume that the data structure

> that the

> user passed is internally consistent in Release mode and we check user

> input in Debug mode. This allows the user to do low level optimizations

> when they know that going through the complicated function is

> unnecessary.

>

>> There are ways that this can be improved in both SymPy and SymEngine

> but I also think that for many important operations the basic design

> used by these is limiting in a big-O sense: the design constrains what

> algorithms can be used. SymEngine is faster than SymPy in a brute

> force sense by using C++ rather than Python but it would be possible

> to make something both faster and more flexible if a different design

> was used at a basic level.

>

> I disagree about this in SymEngine. For eg: SymPy's extensibility

> of classes prevents optimizations that SymEngine can do because symengine

> doesn't have an equivalence of `_eval_subs` or similar.

> Some of the optimizations in SymEngine like Aaron said leads to better

> big-O complexities. i.e. the ratio between the two modules is not always a

> constant as input size increases.

>

> Isuru

>

>

>

> On Thu, Apr 14, 2022 at 1:26 PM Oscar Benjamin

> <oscar.j....@gmail.com> wrote:

>> On Thu, 14 Apr 2022 at 18:16, Tirthankar Mazumder

>> <greenw...@gmail.com> wrote:

>> >

>> > On Thursday, April 14, 2022 at 10:04:28 PM UTC+5:30 Oscar wrote:

>> >>

>> >> On Tue, 12 Apr 2022 at 19:26, Matthias Köppe <matthia...@gmail.com> wrote:

>> >>

>> >> > - status and plans regarding SymEngine

>> >>

>> >> I don't know about the status of SymEngine. I can't say that I can see

>> >> any significant work happening on the SymPy side to integrate

>> >> SymEngine any further with SymPy. My personal view is that for faster

>> >> symbolics a different approach is needed in general but SymEngine

>> >> seems to have the same design flaws as SymPy itself in that respect.

>> >

>> > Hi, as someone who is relatively new to the SymPy + SymEngine project, and wants to work on SymEngine as a part of their GSoC, I would appreciate it if you could elaborate a bit on what design flaws you are referring to.

>>

>> The basic design of the way that symbolic expressions are represented

>> in SymPy and SymEngine is through the use of classes to represent

>> different types of expression e.g. there is a class Pow and an

>> expression like x**y is represented by an instance of that class

>> created as Pow(x, y).

>>

>> Using classes like this makes it hard to extend the system with new

>> types of symbolic expression e.g. with SymEngine you can't make a new

>> kind of symbolic expression without writing C++ code so if you are

>> using SymEngine from Python then it isn't extensible. With SymPy you

>> could at least make your own symbolic expression class in Python but

>> then it still doesn't work so well because there are so many places in

>> the codebase where particular types of expression are special-cased

>> meaning that any new symbolic expression type cannot be handled by

>> functions like solve, integrate etc.

>>

>> The other problem with using classes is that it means that the basic

>> data structure that is used to represent expressions is fixed from the

>> outside. In SymPy and SymEngine that data structure is a tree and all

>> algorithms recurse through that tree. More efficient data structures

>> for some given operation can't be used because the implementation of

>> each symbolic expression type requires that you always use instances

>> of the class meaning that you always have to have a tree and always

>> have to recurse in the same way.

>>

>> As an example it would be trivial in SymPy to make substitutions

>> involving large expressions and many replacements much faster with a

>> small redesign. The problem though is backwards compatibility: each

>> expression class can implement its own _eval_subs method so for

>> backwards compatibility the subs implementation must recurse in the

>> same slow way once throughout the whole tree for each replacement that

>> is to be made. (There are ways to work around this but the design

>> makes it harder than it should be.)

>>

>> This is a small example of a more general problem. Using classes means

>> you have to use the interfaces of the class which means that efficient

>> algorithms needing something not provided by that public interface are

>> impossible. Even worse both SymPy and SymEngine allow the

>> *constructors* of those classes to do nontrivial work without

>> providing any good way to override that. This means that you can't

>> even represent a symbolic expression without allowing arbitrary code

>> execution: it's impossible to optimise higher-level algorithms if you

>> have so little control over execution from the outside.

>>

>> It's very hard later to change the public interfaces of the expression

>> classes because as soon as any downstream code has subclassed your

>> classes you are bound by backwards compatibility. (Subclassing across

>> the boundaries of different software projects leads to strong

>> coupling.)

>>

>> There are ways that this can be improved in both SymPy and SymEngine

>> but I also think that for many important operations the basic design

>> used by these is limiting in a big-O sense: the design constrains what

>> algorithms can be used. SymEngine is faster than SymPy in a brute

>> force sense by using C++ rather than Python but it would be possible

>> to make something both faster and more flexible if a different design

>> was used at a basic level.

>>

>> --

>> Oscar

>>

>> --

>> You received this message because you are subscribed to the Google Groups "sympy" group.

>> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com <mailto:sympy%2Bunsu...@googlegroups.com>.

>> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAHVvXxTXJRY%3DtJ%3DPCtkNBgmd_nWmWd-wm9ax_w0zcPtXe_gynA%40mail.gmail.com.

>

> --

> You received this message because you are subscribed to the Google

> Groups "sympy" group.

> To unsubscribe from this group and stop receiving emails from it, send

> an email to sympy+un...@googlegroups.com.

> To view this discussion on the web visit

> https://groups.google.com/d/msgid/sympy/CA%2B01voM%2Byu65B%2Brtv5HQEVoxkvF95Gf27L8%3DH7N7ZbJQWUbcUg%40mail.gmail.com

> <https://groups.google.com/d/msgid/sympy/CA%2B01voM%2Byu65B%2Brtv5HQEVoxkvF95Gf27L8%3DH7N7ZbJQWUbcUg%40mail.gmail.com?utm_medium=email&utm_source=footer>.

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu