Dataclasses for symbolic computation

S.Y. Lee

Dec 13, 2022, 3:22:25 AM
to sympy
Dataclasses were introduced in Python 3.7.

Usually, the basic objects of symbolic computation (terms) are based on syntactic equality between objects, so there is an isomorphism from the term algebra (or a homomorphism, if there is eager evaluation). This is something dataclasses can easily implement.

For example, dataclass(frozen=True) readily gives you immutable objects with built-in syntactic equality, a term-to-term substitution operation, and default printing, which makes them less of a black box than ordinary Python objects.
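As a rough illustration of the points above, here is a minimal sketch assuming a toy term language with hypothetical classes `Sym`, `Add` and `Mul` (not SymPy classes). `@dataclass(frozen=True)` supplies immutability, structural `__eq__`/`__hash__` and a readable `__repr__`; `subs` is a hypothetical helper that rebuilds terms with `dataclasses.replace`.

```python
from dataclasses import FrozenInstanceError, dataclass, fields, is_dataclass, replace

@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

def subs(term, old, new):
    """Hypothetical syntactic substitution: replace every `old` in `term` by `new`."""
    if term == old:  # generated __eq__ compares class and fields
        return new
    if is_dataclass(term):
        # Rebuild the node with substituted fields; replace() returns a new frozen instance.
        changes = {f.name: subs(getattr(term, f.name), old, new) for f in fields(term)}
        return replace(term, **changes)
    return term  # atoms such as ints and strings are left alone

x, y = Sym("x"), Sym("y")
e = Add(Mul(2, x), x)
print(subs(e, x, y))  # Add(left=Mul(left=2, right=Sym(name='y')), right=Sym(name='y'))

try:
    e.left = 0  # frozen: assignment is rejected
except FrozenInstanceError:
    print("immutable")
```

Because equality and hashing are structural, two separately built copies of `e` compare equal and can be used interchangeably as dict keys.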

It is easy to implement a symbolic computation library with dataclasses without implementing everything from scratch. This could be considered for a future refactor of SymPy, or we could encourage people to implement new symbolic objects in this paradigm rather than sticking to the old architecture.

There are alternatives such as typing.NamedTuple or typing.TypedDict, and there is some debate over which has less overhead. However, I assume they are feasible solutions if you need an untyped calculus. (There are cases where an untyped calculus is a more natural model for the problem, for example if we have to dynamically create huge numbers of classes with different names.)
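A small sketch of the two alternatives, with hypothetical class names. One caveat worth knowing: `typing.NamedTuple` instances are ordinary tuples, so equality ignores the class, which is a problem when syntactic equality is the point. For creating many classes at runtime, `dataclasses.make_dataclass` works, and its generated `__eq__` only compares instances of exactly the same class. `make_head` is a hypothetical helper.

```python
from dataclasses import make_dataclass
from typing import NamedTuple

class AddNT(NamedTuple):
    left: object
    right: object

class MulNT(NamedTuple):
    left: object
    right: object

# NamedTuples compare as plain tuples, so Add(1, 2) and Mul(1, 2) look "equal".
print(AddNT(1, 2) == MulNT(1, 2))  # True

def make_head(name):
    """Hypothetical helper: create a new frozen term class at runtime."""
    return make_dataclass(name, [("args", tuple)], frozen=True)

heads = {f"F{i}": make_head(f"F{i}") for i in range(3)}
F0, F1 = heads["F0"], heads["F1"]

print(F0((1, 2)) == F1((1, 2)))  # False: dataclass __eq__ requires the same class
print(F0((1, 2)))                # F0(args=(1, 2))
```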

Aaron Meurer

Dec 13, 2022, 5:21:59 PM
to sy...@googlegroups.com
Data classes are a nice syntactic convenience, and it's useful to have
them if you are using something like that so you don't have to rewrite
all the boilerplate. But I've never really found the "objects
representing a tree of expressions" as being the hard part of symbolic
computation. A competent Python programmer could easily rewrite all
the logic of dataclass (or the most basic logic of Basic) from scratch
in an hour.
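To make the "in an hour" point concrete, here is a minimal hand-written sketch of that core logic, assuming a hypothetical base class `Node` that mirrors the `func`/`args` convention of SymPy's `Basic` (it is not SymPy's implementation):

```python
class Node:
    """Hypothetical minimal expression node: a head (the class) plus a tuple of args."""
    __slots__ = ("args", "_hash")

    def __init__(self, *args):
        self.args = args
        # Precompute the hash; nodes are treated as immutable by convention only.
        self._hash = hash((type(self), args))

    @property
    def func(self):
        return type(self)

    def __eq__(self, other):
        if not isinstance(other, Node):
            return NotImplemented
        return type(self) is type(other) and self.args == other.args

    def __hash__(self):
        return self._hash

    def __repr__(self):
        return f"{type(self).__name__}({', '.join(map(repr, self.args))})"

class Add(Node):
    __slots__ = ()

class Mul(Node):
    __slots__ = ()

class Symbol(Node):
    __slots__ = ()

def preorder(expr):
    """Yield expr and its subexpressions, parents before children."""
    yield expr
    if isinstance(expr, Node) and not isinstance(expr, Symbol):
        for arg in expr.args:
            yield from preorder(arg)

x = Symbol("x")
e = Add(Mul(2, x), x)
print(e)                        # Add(Mul(2, Symbol('x')), Symbol('x'))
print(e.func(*e.args) == e)     # True: rebuilding from func and args
```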

The only real challenges are 1) making things as syntactically nice as
possible. I think SymPy does pretty well here, although there are
places where things could be improved. And 2) the overall performance.
From what I've heard, dataclasses aren't particularly performant, so I
don't think they specifically are a good choice in that regard.
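If you want to check the performance claim on your own machine, a micro-benchmark along these lines is one way to do it; the classes are hypothetical and no particular result is implied. One known cost is that a frozen dataclass's generated `__init__` has to assign each field through `object.__setattr__`, because normal attribute assignment is blocked.

```python
import timeit
from dataclasses import dataclass

@dataclass(frozen=True)
class FrozenAdd:
    left: object
    right: object

class SlotsAdd:
    """Hand-written equivalent using __slots__ (no frozen enforcement)."""
    __slots__ = ("left", "right")

    def __init__(self, left, right):
        self.left = left
        self.right = right

def bench(cls, number=200_000):
    """Seconds taken to construct `number` instances of cls."""
    return timeit.timeit(lambda: cls(1, 2), number=number)

if __name__ == "__main__":
    for cls in (FrozenAdd, SlotsAdd):
        print(f"{cls.__name__}: {bench(cls):.3f}s")
```

Construction time is only one factor; equality, hashing and traversal costs matter just as much in a CAS.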

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/d9caad27-fb02-434a-b260-5ab80df91f9an%40googlegroups.com.

David Bailey

Dec 14, 2022, 12:23:02 PM
to sy...@googlegroups.com
On 13/12/2022 22:21, Aaron Meurer wrote:
> Data classes are a nice syntactic convenience, and it's useful to have
> them if you are using something like that so you don't have to rewrite
> all the boilerplate. But I've never really found the "objects
> representing a tree of expressions" as being the hard part of symbolic
> computation. A competent Python programmer could easily rewrite all
> the logic of dataclass (or the most basic logic of Basic) from scratch
> in an hour.
>
> The only real challenges are 1) making things as syntactically nice as
> possible. I think SymPy does pretty well here, although there are
> places where things could be improved. And 2) the overall performance.
> From what I've heard, dataclasses aren't particularly performant, so I
> don't think they specifically are a good choice in that regard.
>
As an outsider, I find it interesting to read discussions among insiders
about the ideal structure of SymPy and (perhaps) its long-term development.

My suspicion is that object-oriented programming is rarely as useful as
is claimed. I wonder if that is what you are implying here.

Put another way, what would be your preferred computer language if you
were starting SymPy again? There would obviously need to be an interface
to Python, but would you write the rest in C++?

C++ does contain object-oriented constructs (indeed, its classes can
inherit from more than one parent), but of course it can be used without
touching that complexity.

David

Sam Brockie

Dec 19, 2022, 8:58:46 AM
to sympy
> Put another way, what would be your preferred computer language if you
> were SymPy starting again? There would obviously need to be an interface
> to Python, but would you write the rest in C++?

There's the SymEngine project (https://github.com/symengine/symengine), which already does this :) SymPy itself, however, is, and should remain, a pure-Python computer algebra system.

I agree with what Aaron said above: SymPy does well at making things syntactically nice. From a pure performance perspective, however, SymPy's trees of OOP objects are almost certainly the wrong data structure for performant computer algebra. Oscar Benjamin and I (with Aaron and Jason Moore also present for parts) have discussed the merits of hash consing (https://en.wikipedia.org/wiki/Hash_consing), which gives really neat inherent caching, a smaller memory footprint, efficient expression traversal and topological sorting, and more. If there were a concerted effort to refactor the internals of SymPy in any way, my vote would strongly go towards using a different data structure for the implementation.
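A rough sketch of why hash consing helps with traversal and topological sorting, assuming a hypothetical `ExprTable` that stores each unique node exactly once in an append-only list (an illustration only, not a proposed SymPy design). Since a node can only refer to children that already exist, ascending node ids are automatically a topological order, and a shared subexpression is stored and evaluated once.

```python
import math

LEAVES = {"Int", "Sym"}  # leaf heads carry a literal payload; other heads carry child ids

class ExprTable:
    """Hypothetical hash-consed expression store: one entry per unique node."""

    def __init__(self):
        self.nodes = []  # node id -> (head, *payload)
        self.index = {}  # (head, *payload) -> node id

    def make(self, head, *payload):
        key = (head, *payload)
        node_id = self.index.get(key)
        if node_id is None:  # first time we see this node: append it
            node_id = len(self.nodes)
            self.nodes.append(key)
            self.index[key] = node_id
        return node_id

    def reachable(self, root):
        """Ids of all nodes under root, sorted; ascending id is a topological order."""
        seen, stack = {root}, [root]
        while stack:
            head, *payload = self.nodes[stack.pop()]
            if head not in LEAVES:
                for child in payload:
                    if child not in seen:
                        seen.add(child)
                        stack.append(child)
        return sorted(seen)

    def evaluate(self, root, env):
        """Evaluate bottom-up without recursion; each unique node is computed once."""
        values = {}
        for node_id in self.reachable(root):
            head, *payload = self.nodes[node_id]
            if head == "Int":
                values[node_id] = payload[0]
            elif head == "Sym":
                values[node_id] = env[payload[0]]
            elif head == "Add":
                values[node_id] = sum(values[c] for c in payload)
            elif head == "Mul":
                values[node_id] = math.prod(values[c] for c in payload)
            else:
                raise ValueError(f"unknown head {head!r}")
        return values[root]

t = ExprTable()
x = t.make("Sym", "x")
e = t.make("Add", t.make("Mul", x, x), t.make("Mul", x, x))  # x*x is stored once
print(len(t.nodes))              # 3
print(t.evaluate(e, {"x": 3}))   # 18
```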

Sam

Alan Bromborsky

Dec 19, 2022, 9:20:50 AM
to sy...@googlegroups.com

Many of the SymPy algorithms could be parallelized using parallel Python, which could lead to major speed improvements. The computer I am writing this on has a Ryzen 9 5900X CPU with 12 cores and 24 threads, and I think 4 cores and 8 threads is very common these days.
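A sketch of the kind of coarse-grained parallelism that could help, assuming the standard library's `concurrent.futures` as the tool (the message only says "parallel python"). It expands several independent polynomial powers in separate processes; the sparse-dict helpers `poly_mul` and `poly_pow` are hypothetical, not SymPy functions. Separate processes sidestep the GIL, but arguments and results have to be pickled, so this only pays off when each task is large.

```python
from concurrent.futures import ProcessPoolExecutor

def poly_mul(p, q):
    """Multiply sparse univariate polynomials stored as {exponent: coefficient}."""
    result = {}
    for i, a in p.items():
        for j, b in q.items():
            result[i + j] = result.get(i + j, 0) + a * b
    return result

def poly_pow(p, n):
    """Compute p**n by repeated squaring."""
    result, base = {0: 1}, p
    while n:
        if n & 1:
            result = poly_mul(result, base)
        base = poly_mul(base, base)
        n >>= 1
    return result

def expand_all(p, exponents, workers=4):
    """Expand p**n for each n in a worker process; the tasks are independent."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(poly_pow, [p] * len(exponents), exponents))

if __name__ == "__main__":  # required on platforms that spawn worker processes
    results = expand_all({0: 1, 1: 1}, [200, 300, 400])  # (1 + x)**n
    print([len(r) for r in results])  # [201, 301, 401]
```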

S.Y. Lee

Dec 21, 2022, 11:52:28 PM
to sympy
Hash consing looks interesting, and it seems as simple as making every object a singleton.
My question is how this would be implemented effectively on top of Python; for example, how do you keep the hash table from growing indefinitely with dead references?

Oscar Benjamin

Dec 22, 2022, 6:37:07 AM
to sy...@googlegroups.com
On Thu, 22 Dec 2022 at 04:52, S.Y. Lee <syle...@gmail.com> wrote:
>
> Hashcons looks interesting, but as simple as making every object as singletons
> I would have a question how this would be implemented effectively on top of python, for example, how to keep the hash table not grow infinitely with dead references.

You would use weakref:
https://docs.python.org/3/library/weakref.html#weakref.WeakValueDictionary

I will write some blog posts about this in the new year but here's a
simple demo:

```python
import weakref

# Maps args tuples to live Expr objects. Values are held weakly, so an
# entry disappears once nothing else references that Expr.
_all_expressions = weakref.WeakValueDictionary()

class Expr:
    def __new__(cls, *args):
        # Hash consing: reuse the existing object for identical args.
        expr = _all_expressions.get(args, None)
        if expr is not None:
            return expr
        expr = super().__new__(cls)
        _all_expressions[args] = expr
        return expr
    def __init__(self, *args):
        self.args = tuple(args)
    def __repr__(self):
        return f'{self.args[0]}{self.args[1:]}'

class Head:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __call__(*args):
        # args is (self, *operands), so the Head becomes args[0] of the Expr.
        return Expr(*args)

Add = Head('Add')
Mul = Head('Mul')

print('empty:', dict(_all_expressions))
# empty: {}

expr = Add(1, Mul(2, 3))

print('not empty:', dict(_all_expressions))
# not empty: {(Mul, 2, 3): Mul(2, 3), (Add, 1, Mul(2, 3)): Add(1, Mul(2, 3))}

del expr # clears the weakref dict (immediately in CPython, via reference counting)

print('empty again:', dict(_all_expressions))
# empty again: {}
```

--
Oscar

S.Y. Lee

Jan 19, 2023, 3:16:59 AM
to sympy
> Put another way, what would be your preferred computer language if you
> were SymPy starting again? There would obviously need to be an interface
> to Python, but would you write the rest in C++?

I've briefly studied TypeScript, because a lot of symbolic math becomes practical through frontend interfaces,
and I hoped to see it implemented better, directly on the frontend.

But I'd warn that JavaScript/TypeScript is very poor for symbolic computation,
simply because of its shallow design of equality (=== compares objects by reference, not by structure).

I'm not sure whether SymEngine developers run into weird struggles like that, but they possibly do.

On the other hand, I think functional languages like OCaml can be better for symbolic math;
just look at how easy it is to implement the basics of symbolic math in them.
Functional languages also often have more rigorous operational semantics,
so things that look correct in math and logic are more natural to express in them, and easy to make work.

https://stackoverflow.com/questions/52737089/ocaml-function-to-perform-differentiation

I think the only reason people gather around this "old and messy" SymPy rather than those functional languages is that
a lot of people studying the physical sciences or AI come to Python to use its mature libraries.
I believe that's the one and only reason,
and in any case other mainstream Python libraries are messy with objects too.

Ideally, I would like to see symbolic computation implemented as compositions of
terms, patterns, combinators, polynomials, logic, grammars, and that sort of thing.
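A tiny sketch of that style, assuming terms encoded as plain tuples `(head, *args)` with binary `Add`/`Mul`, and three hypothetical strategy combinators (`chain`, `exhaust`, `bottom_up`) that compose small rewrite rules into a simplifier. The names and encoding are illustrative only.

```python
def chain(*rules):
    """Apply each rule once, left to right."""
    def apply(term):
        for rule in rules:
            term = rule(term)
        return term
    return apply

def exhaust(rule):
    """Repeat rule until the term stops changing (a fixed point)."""
    def apply(term):
        while True:
            new = rule(term)
            if new == term:
                return term
            term = new
    return apply

def bottom_up(rule):
    """Rewrite the children first, then the node itself."""
    def apply(term):
        if isinstance(term, tuple):
            head, *args = term
            term = (head, *(apply(a) for a in args))
        return rule(term)
    return apply

def is_op(term, head):
    return isinstance(term, tuple) and term[0] == head

def add_zero(term):
    """x + 0 -> x and 0 + x -> x"""
    if is_op(term, "Add"):
        _, a, b = term
        if b == 0:
            return a
        if a == 0:
            return b
    return term

def mul_one(term):
    """x * 1 -> x and 1 * x -> x"""
    if is_op(term, "Mul"):
        _, a, b = term
        if b == 1:
            return a
        if a == 1:
            return b
    return term

def mul_zero(term):
    """x * 0 -> 0 and 0 * x -> 0"""
    if is_op(term, "Mul") and 0 in (term[1], term[2]):
        return 0
    return term

simplify = exhaust(bottom_up(chain(mul_zero, mul_one, add_zero)))

print(simplify(("Add", ("Mul", 1, "x"), ("Mul", "x", 0))))  # x
```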