internal representation

27 views
Skip to first unread message

Daniel R. Shapero

unread,
Dec 29, 2020, 2:32:59 PM12/29/20
to symengine

Hi all -- I was curious about how expressions are represented internally in symengine so I started browsing the source code. It looks like every expression is always canonicalized before it's stored in, say an Add or a Mul object. So for example (a + b) * (c + d) wouldn't be represented that way internally, it would first be expanded to a * c + ...etc. Is that impression correct or is there something I've missed? (Sorry if this is answered directly somewhere, I'm going off the code comments and the API docs.)

I guessed naively beforehand that expressions get stored in more or less the same form as when they're created, and then a whole bunch of rewrite rules might get applied later. So (assuming I read the comments correctly) it surprised me to see that they get canonicalized eagerly. But I imagine that having a uniform internal representation must be hugely advantageous for other algorithms and general developer sanity preservation.

Thanks and hope everyone had a good holiday,
Daniel

Ondřej Čertík

unread,
Dec 29, 2020, 5:31:34 PM12/29/20
to syme...@googlegroups.com
Hi Daniel,

On Tue, Dec 29, 2020, at 12:29 PM, Daniel R. Shapero wrote:
>
> Hi all -- I was curious about how expressions are represented
> internally in symengine so I started browsing the source code. It looks
> like every expression is always canonicalized before it's stored in,
> say an Add or a Mul object.

Yes, that is correct. As an example, x+x would not be stored like that in Add, but rather first canonicalized into 2*x and stored in Mul.

> So for example (a + b) * (c + d) wouldn't
> be represented that way internally, it would first be expanded to a * c
> + ...etc.

In this particular example, (a + b) * (c + d) would be stored like that in Mul. But if you call `expand()` on it, then it will get expanded and stored in Add

> Is that impression correct or is there something I've missed?
> (Sorry if this is answered directly somewhere, I'm going off the code
> comments and the API docs.)
>
> I guessed naively beforehand that expressions get stored in more or
> less the same form as when they're created, and then a whole bunch of
> rewrite rules might get applied later. So (assuming I read the comments
> correctly) it surprised me to see that they get canonicalized eagerly.
> But I imagine that having a uniform internal representation must be
> hugely advantageous for other algorithms and general developer sanity
> preservation.

It is mostly like that. The only simplifications that always get done before hand are things like x+x->2*x, or x*x->x^2, or x-x->0. Anything more complicated get stored as is. This has the performance advantage that the internal representation of Add can be a dictionary and you can assume distinct terms.

Ondrej
Reply all
Reply to author
Forward
0 new messages