Hi Daniel,
On Tue, Dec 29, 2020, at 12:29 PM, Daniel R. Shapero wrote:
>
> Hi all -- I was curious about how expressions are represented
> internally in symengine so I started browsing the source code. It looks
> like every expression is always canonicalized before it's stored in,
> say an Add or a Mul object.
Yes, that is correct. As an example, x+x would not be stored like that in Add, but rather first canonicalized into 2*x and stored in Mul.
> So for example (a + b) * (c + d) wouldn't
> be represented that way internally, it would first be expanded to a * c
> + ...etc.
In this particular example, (a + b) * (c + d) would be stored like that in Mul. But if you call `expand()` on it, then it will get expanded and stored in Add
> Is that impression correct or is there something I've missed?
> (Sorry if this is answered directly somewhere, I'm going off the code
> comments and the API docs.)
>
> I guessed naively beforehand that expressions get stored in more or
> less the same form as when they're created, and then a whole bunch of
> rewrite rules might get applied later. So (assuming I read the comments
> correctly) it surprised me to see that they get canonicalized eagerly.
> But I imagine that having a uniform internal representation must be
> hugely advantageous for other algorithms and general developer sanity
> preservation.
It is mostly like that. The only simplifications that always get done before hand are things like x+x->2*x, or x*x->x^2, or x-x->0. Anything more complicated get stored as is. This has the performance advantage that the internal representation of Add can be a dictionary and you can assume distinct terms.
Ondrej