Identifying repeated subexpressions in systems of equations

Luke

unread,

Jun 13, 2008, 2:45:02 PM6/13/08

to sy...@googlegroups.com

I'm working on some code that symbolically generates equations of
motion for mechanical systems. I would like the equations to be
computationally efficient in that they don't repeatedly calculate
quantities that have previously been calculated -- i.e. if
cos(theta)*sin(alpha)*L is a subexpression that occurs in multiple
places in an equation, it would be identified and computed once and
stored in an intermediate variable that then replaces the
subexpression everywhere it occurs in the equation. Does anybody know
a good algorithm for searching equations and building up a list of
subexpressions? There could be many levels of subexpressions -- in
the example above, cos(theta) and sin(theta) themselves would like be
used in other subexpressions, so they could be considered
subexpressions themselves.

I have used a symbolic manipulator that has this feature but it is a
small closed source project and I don't know how they did it. It is
very valuable though -- the difference in the file size for the right
hand sides of the differential equations for the system we are
studying is several orders of magnitude -- this makes integration
times *much* faster when this common subexpression replacement method
is used.

Any ideas or references on this subject?

Thanks,
~Luke

Ondrej Certik

unread,

Jun 14, 2008, 2:10:35 PM6/14/08

to sy...@googlegroups.com

Hi Luke!

Thanks for your interest. I am not sure if I understand right, but would
something like this solve the problem?

In [1]: var("theta")
Out[1]: θ

In [2]: e = k/sqrt(sin(theta) * cos(theta))

In [3]: e
Out[3]:
k
─────────────────────
⎽⎽⎽⎽⎽⎽⎽⎽ ⎽⎽⎽⎽⎽⎽⎽⎽
╲╱ cos(θ) *╲╱ sin(θ)

In [4]: e.subs({cos(theta): y, sin(theta): z})
Out[4]:
k
───────────
⎽⎽⎽ ⎽⎽⎽
╲╱ y *╲╱ z

Or are you looking for an algorithm to automatically collect common
subexpressions? For that, at least for me it'd help if you could be
more specific, i.e. give us some particular examples of functionality
that you'd like sympy to do but it can't. We'll then think how to
implement that.

Thanks,
Ondrej

Luke

unread,

Jun 15, 2008, 3:00:54 PM6/15/08

to sympy

Ondrej,
I know that Sympy has the capability to do symbolic replacement like
you described. I guess what I'm looking for is more an algorithm to
help identify and automatically collect common subexpressions so that
repeated quantities are only calculated once. For each repeated
subexpression in an equation, an intermediate variable would be
introduced and assigned the value of the subexpression, and then all
occurrences of the repeated subexpression in the equation would be
replaced with the intermediate variable. By generating symbolic
equations this way, when it comes time to actually use them with
numerical values (numerical integration of equations of motion, for
example), one avoids the repetitive computation of the same
quantity.

In regards to providing an example -- writing an algorithm to automate
the example you did by hand would be exactly what I am looking to do.
The equations I am dealing with can be several thousand lines long at
times, so things get extremely messy. However, within these
equations, many of the same quantities are used repeatedly, just like
in your example -- sin(theta) and cos(theta).

There is a program I use called Autolev, written by some people from
Stanford, and it does this introduction of intermediate variables very
effectively. Unfortunately, Autolev isn't open source. I am very
interested in writing an open source variant of Autolev, but this part
of the program is something that I know little about and am not sure
yet how to approach.

Thanks for the response and if you have any other ideas or references,
I'd love to hear them! I'm guessing I'm going to need to study up on
searching, sorting, and regular expressions.

Thanks,
~Luke

Ondrej Certik

unread,

Jun 15, 2008, 5:11:59 PM6/15/08

to sy...@googlegroups.com

Hi Luke,

On Sun, Jun 15, 2008 at 9:00 PM, Luke <hazel...@gmail.com> wrote:
>
> Ondrej,
> I know that Sympy has the capability to do symbolic replacement like
> you described. I guess what I'm looking for is more an algorithm to
> help identify and automatically collect common subexpressions so that
> repeated quantities are only calculated once. For each repeated
> subexpression in an equation, an intermediate variable would be
> introduced and assigned the value of the subexpression, and then all
> occurrences of the repeated subexpression in the equation would be
> replaced with the intermediate variable. By generating symbolic

Does it also go in several layers, e.g

sin(x) -> A
sqrt(A**2) -> B

etc.?

> equations this way, when it comes time to actually use them with
> numerical values (numerical integration of equations of motion, for
> example), one avoids the repetitive computation of the same
> quantity.
>
> In regards to providing an example -- writing an algorithm to automate
> the example you did by hand would be exactly what I am looking to do.

How should sympy know that you want to substitute for sin(x) and not
sqrt(sin(x)) ?

> The equations I am dealing with can be several thousand lines long at
> times, so things get extremely messy. However, within these
> equations, many of the same quantities are used repeatedly, just like
> in your example -- sin(theta) and cos(theta).

I don't understand which criteria you use to substitute the
subexpressions --- the most frequent subexpression?

>
> There is a program I use called Autolev, written by some people from
> Stanford, and it does this introduction of intermediate variables very
> effectively. Unfortunately, Autolev isn't open source. I am very
> interested in writing an open source variant of Autolev, but this part
> of the program is something that I know little about and am not sure
> yet how to approach.

Could you post here the input and output of autolev so that we can see
some examples? (if it is legal of course)
If you cannot, at least if you could describe your specific problem
you want to solve, i.e. the exact form
of the equations etc.

>
> Thanks for the response and if you have any other ideas or references,
> I'd love to hear them! I'm guessing I'm going to need to study up on
> searching, sorting, and regular expressions.

I am interested in what you want to do, especially if you want to
solve something in physics.

Your usage could improve sympy tree traversal
functionality/subsitution. SymPy's expression is basically a tree,
each element has .args property, that is basically a list. See
Basic.atoms()'s implementation, as you can see, it's just 9 lines of
code. So you'd just traverse it and compare/substitute stuff. Let us
know how the current interface feel and if you find some ways to
improve it or make it more intuitive, please write us too.

Ondrej

Robert Kern

unread,

Jun 15, 2008, 6:10:45 PM6/15/08

to sy...@googlegroups.com

On Sun, Jun 15, 2008 at 16:11, Ondrej Certik <ond...@certik.cz> wrote:
>
> Hi Luke,
>
> On Sun, Jun 15, 2008 at 9:00 PM, Luke <hazel...@gmail.com> wrote:
>>
>> Ondrej,
>> I know that Sympy has the capability to do symbolic replacement like
>> you described. I guess what I'm looking for is more an algorithm to
>> help identify and automatically collect common subexpressions so that
>> repeated quantities are only calculated once. For each repeated
>> subexpression in an equation, an intermediate variable would be
>> introduced and assigned the value of the subexpression, and then all
>> occurrences of the repeated subexpression in the equation would be
>> replaced with the intermediate variable. By generating symbolic
>
> Does it also go in several layers, e.g
>
> sin(x) -> A
> sqrt(A**2) -> B
>
> etc.?
>
>> equations this way, when it comes time to actually use them with
>> numerical values (numerical integration of equations of motion, for
>> example), one avoids the repetitive computation of the same
>> quantity.
>>
>> In regards to providing an example -- writing an algorithm to automate
>> the example you did by hand would be exactly what I am looking to do.
>
> How should sympy know that you want to substitute for sin(x) and not
> sqrt(sin(x)) ?

Actually, that example is not relevant since there is no repeated
subexpression. The idea is to reduce the amount of unnecessarily
repeated computation (usually numerical) when there are subexpressions
that pop up several times in the complete expression. For example,
Mathematica has an example of its common subexpression elimination:

http://www.wolfram.com/technology/guide/subexpressiondetection.html

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco

Robert Kern

unread,

Jun 15, 2008, 6:17:04 PM6/15/08

to sy...@googlegroups.com

On Sun, Jun 15, 2008 at 16:11, Ondrej Certik <ond...@certik.cz> wrote:

> I don't understand which criteria you use to substitute the
> subexpressions --- the most frequent subexpression?

Here's some help I got a long time ago when I asked about how to do
this with Python ASTs. I never followed up on it since I had no way
(at that time) to regenerate Python code from the modified ASTs.

http://groups.google.com/group/comp.lang.python/browse_thread/thread/de32cd3d1b5c3359/0c93c996bd1ad1e1

Ondrej Certik

unread,

Jun 15, 2008, 6:46:22 PM6/15/08

to sy...@googlegroups.com

Ah, I got it now, thanks!

Right, that'd be a very useful feature. i.e. when one has:

1/(sqrt(sin(x)+1)*sqrt(sin(x)+2))

then one can spare some CPU by first substituting for sin(x),
evaluating and then substituting back.
So the problem could be reformulated: evaluate the symbolic expression
(numerically) using the least amount of (expensive) evaluations.
Ok, that makes sense and it's very interesting.

> On Sun, Jun 15, 2008 at 16:11, Ondrej Certik <ond...@certik.cz> wrote:

>> I don't understand which criteria you use to substitute the
>> subexpressions --- the most frequent subexpression?
>

> Here's some help I got a long time ago when I asked about how to do
> this with Python ASTs. I never followed up on it since I had no way
> (at that time) to regenerate Python code from the modified ASTs.
>
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/de32cd3d1b5c3359/0c93c996bd1ad1e1

I see, you wanted to have sympy in 2003. :) I read the thread and
indeed many people were interested in it
and some even started to code something. Too bad noone has followed up
and created sympy back then ---
we would be much more advanced now.

Anyway, I created an issue for this:

http://code.google.com/p/sympy/issues/detail?id=891

Fredrik, you were investigating some approaches to evaluate
expressions quickly, do you have any thoughts on this?

Ondrej

Robert Kern

unread,

Jun 15, 2008, 8:37:01 PM6/15/08

to sy...@googlegroups.com

On Sun, Jun 15, 2008 at 17:46, Ondrej Certik <ond...@certik.cz> wrote:
>
> On Mon, Jun 16, 2008 at 12:10 AM, Robert Kern <rober...@gmail.com> wrote:

>> Actually, that example is not relevant since there is no repeated
>> subexpression. The idea is to reduce the amount of unnecessarily
>> repeated computation (usually numerical) when there are subexpressions
>> that pop up several times in the complete expression. For example,
>> Mathematica has an example of its common subexpression elimination:
>>
>> http://www.wolfram.com/technology/guide/subexpressiondetection.html
>
> Ah, I got it now, thanks!
>
> Right, that'd be a very useful feature. i.e. when one has:
>
> 1/(sqrt(sin(x)+1)*sqrt(sin(x)+2))
>
> then one can spare some CPU by first substituting for sin(x),
> evaluating and then substituting back.
> So the problem could be reformulated: evaluate the symbolic expression
> (numerically) using the least amount of (expensive) evaluations.

I probably wouldn't go as far as that. If you are evaluating the
expression over numpy arrays, it would also be good to trade off flops
with the memory taken up by temporaries. Also, for the problem I had
in 2003, I really wanted to generate code with the common
subexpressions broken up primarily for readability rather than flops.

Here's a pseudocode sketch of an algorithm that might work:

seen_subexp = set()
to_eliminate = []
for subtree in postorder_traversal(expr):
if subtree in seen_subexp and subtree not in to_eliminate:
to_eliminate.append(subtree)
seen_subexp.add(subtree)
replacements = []
for i, subtree in enumerate(to_eliminate):
name = 'x%s' % i
sym = Symbol(name)
replacements.append((sym, subtree))
expr = expr.subs(subtree, sym)
# WARNING: modifying iterated list in-place! I think it's fine,
# but there might be clearer alternatives.
for j in range(i+1, len(to_eliminate)):
to_eliminate[j] = to_eliminate.subs(subtree, sym)

The list of replacements (in order!) and the final expression
constitute the expressions that need to be calculated. This is, of
course, untested.

Friedrich Hagedorn

unread,

Jun 16, 2008, 2:25:00 AM6/16/08

to sy...@googlegroups.com

On Sun, Jun 15, 2008 at 12:00:54PM -0700, Luke wrote:
>
> I guess what I'm looking for is more an algorithm to
> help identify and automatically collect common subexpressions so that
> repeated quantities are only calculated once. For each repeated
> subexpression in an equation, an intermediate variable would be
> introduced and assigned the value of the subexpression, and then all
> occurrences of the repeated subexpression in the equation would be
> replaced with the intermediate variable. By generating symbolic
> equations this way, when it comes time to actually use them with
> numerical values (numerical integration of equations of motion, for
> example), one avoids the repetitive computation of the same
> quantity.

Only just an idea: Could you use the numerical resuluts in the cache
for that? So you didnt need to extract all the common subexpressions.

By,

Friedrich

Robert Kern

unread,

Jun 16, 2008, 2:28:04 AM6/16/08

to sy...@googlegroups.com

On Sun, Jun 15, 2008 at 17:46, Ondrej Certik <ond...@certik.cz> wrote:

> Anyway, I created an issue for this:
>
> http://code.google.com/p/sympy/issues/detail?id=891

I've attached some nominally working code to the issue. Playing with
it, I see some suboptimal results. First, it's quadratic, but I'm not
clever enough to think of a way around that. Second, sympy's default
representation of expressions isn't well-suited for this algorithm (or
the other way around). For example:

In [1]: from cse import cse

In [2]: cse((x-y)*(z-y) + sqrt((x-y)*(z-y)))
Out[2]:
⎛ ⎽⎽⎽⎽ ⎽⎽⎽⎽⎞
⎝[(x₀, -y), (x₁, x + x₀), (x₂, x₀ + z)], x₁*x₂ + ╲╱ x₁ *╲╱ x₂ ⎠

We see that -y gets picked up as term that should be pulled out
although it only shows up in the context of (x-y). For the typical use
case, subtraction is an atomic operation, and the representation of
(x-y) as Add(Mul(NegativeOne(-1), Symbol('y')), Symbol('x')) gets in
the way.

Although the input expression has both x1 and x2 under the same
sqrt(), it gets broken out before the cse() function gets to look at
it. It would be nice to stuff everything in that term under the same
sqrt() before cse() operates. This kind of thing pops up from time to
time in my experience.

So I think there needs to be a bit of a preprocessing step to optimize
the expression specifically for cse() before it does its magic.

Generalizing this to a list of expressions (e.g. my old use case) is
an exercise left to the reader.

Luke

unread,

Jun 16, 2008, 1:13:10 PM6/16/08

to sympy

So I think it has become clear what I'm looking to do. There are many
levels of subexpressions, and equations may be defined by many
different subexpressions. For those of you familiar with classical
mechanics, you are probably aware of the explosion of the number of
terms that arises as the number of rigid bodies increases. The
following is a snippet of code from an Autolev script that I've used
to formulate the equations of motion for a benchmark Whipple bicycle
model, with the interpreter response being indicated by an '->' and
the introduction of intermediate variables beginning with the letter
'Z':

(98)
%---------------------------------------------------------------------
%
(99) % ANGULAR
RELATIONSHIPS %
(100)
%---------------------------------------------------------------------
%
(101) % FRAME YAW
(102) SIMPROT(N,YF,3,q3)
-> (103) Z1 = COS(q3)
-> (104) Z2 = SIN(q3)
-> (105) N_YF = [Z1, -Z2, 0; Z2, Z1, 0; 0, 0, 1]

(106) % FRAME LEAN
(107) SIMPROT(YF,LF,1,q4)
-> (108) Z3 = COS(q4)
-> (109) Z4 = SIN(q4)
-> (110) YF_LF = [1, 0, 0; 0, Z3, -Z4; 0, Z4, Z3]

(111) % REAR WHEEL ROTATION
(112) SIMPROT(LF,R,2,q5)
-> (113) Z5 = COS(q5)
-> (114) Z6 = SIN(q5)
-> (115) LF_R = [Z5, 0, Z6; 0, 1, 0; -Z6, 0, Z5]

(116) % FRAME PITCH
(117) SIMPROT(LF,B,2,q6)
-> (118) Z7 = COS(q6)
-> (119) Z8 = SIN(q6)
-> (120) LF_B = [Z7, 0, Z8; 0, 1, 0; -Z8, 0, Z7]

(121) % INTERMEDIATE STEER FRAME
(122) SIMPROT(B,SF,2,lambda)
-> (123) Z9 = COS(lambda)
-> (124) Z10 = SIN(lambda)
-> (125) B_SF = [Z9, 0, Z10; 0, 1, 0; -Z10, 0, Z9]

(126) % STEERING ANGLE
(127) SIMPROT(SF,H,3,q7)
-> (128) Z11 = COS(q7)
-> (129) Z12 = SIN(q7)
-> (130) SF_H = [Z11, -Z12, 0; Z12, Z11, 0; 0, 0, 1]

(131) % HANDLEBAR MASS FRAME ROTATION
(132) SIMPROT(H,HF,2,-lambda)
-> (133) H_HF = [Z9, 0, -Z10; 0, 1, 0; Z10, 0, Z9]

(134) % FRONT WHEEL ROTATION
(135) SIMPROT(H,F,2,q8)
-> (136) Z13 = COS(q8)
-> (137) Z14 = SIN(q8)
-> (138) H_F = [Z13, 0, Z14; 0, 1, 0; -Z14, 0, Z13]

(139)
%---------------------------------------------------------------------
%
(140) % POSITION VECTORS
(141)
%---------------------------------------------------------------------
%
(142) % Locate the rear wheel center of mass
(143) P_NO_RO> = q1*N1> + q2*N2> - rrt*N3> - rr*LF3>
-> (144) P_NO_RO> = -rr*LF3> + q1*N1> + q2*N2> - rrt*N3>

(145) % Rear contact points
(146) % We need two points here because one is fixed to the wheel,
the
(147) % other is fixed to the ground. Later when we need to impose
the
(148) % nonholonomic constraint, we form the velocity by treating
RN as
(149) % a point FIXED in C
(150) P_RO_NR> = rr*LF3> + rrt*N3>
-> (151) P_RO_NR> = rr*LF3> + rrt*N3>

(152) P_RO_RN> = P_RO_NR>
-> (153) P_RO_RN> = rr*LF3> + rrt*N3>

(154) % Locate the frame center of mass
(155) P_RO_BO> = d4*B1> + d5*B3>
-> (156) P_RO_BO> = d4*B1> + d5*B3>

(157) % Locate the front wheel center of mass
(158) P_RO_FO> = d1*SF1> + d3*SF3> + d2*H1>
-> (159) P_RO_FO> = d2*H1> + d1*SF1> + d3*SF3>

(160) % Locate the fork center of mass
(161) P_RO_HO> = d1*SF1> + d6*HF1> + d7*HF3>
-> (162) P_RO_HO> = d6*HF1> + d7*HF3> + d1*SF1>

(163) % Front contact points
(164) % We need two points here because one is fixed to the wheel,
the
(165) % other is fixed to the ground. Later when we need to impose
the
(166) % nonholonomic constraint, we form the velocity by treating
FN as
(167) % a point FIXED in F
(168) P_FO_NF> = rf*UNITVEC(N3>-DOT(H2>,N3>)*H2>) + rft*N3>
-> (169) Z15 = Z9*Z11
-> (170) Z16 = Z10*Z11
-> (171) Z17 = Z9*Z12
-> (172) Z18 = Z10*Z12
-> (173) H_B = [Z15, Z12, -Z16; -Z17, Z11, Z18; Z10, 0, Z9]
-> (174) Z19 = Z4*Z8
-> (175) Z20 = Z3*Z8
-> (176) Z21 = Z4*Z7
-> (177) Z22 = Z3*Z7
-> (178) B_YF = [Z7, Z19, -Z20; 0, Z3, Z4; Z8, -Z21, Z22]
-> (179) Z23 = Z1*Z7 - Z2*Z19
-> (180) Z24 = Z1*Z19 + Z2*Z7
-> (181) Z25 = Z2*Z3
-> (182) Z26 = Z1*Z3
-> (183) Z27 = Z1*Z8 + Z2*Z21
-> (184) Z28 = Z2*Z8 - Z1*Z21
-> (185) B_N = [Z23, Z24, -Z20; -Z25, Z26, Z4; Z27, Z28, Z22]
-> (186) Z29 = Z15*Z23 - Z12*Z25 - Z16*Z27
-> (187) Z30 = Z12*Z26 + Z15*Z24 - Z16*Z28
-> (188) Z31 = Z4*Z12 - Z15*Z20 - Z16*Z22
-> (189) Z32 = Z18*Z27 - Z11*Z25 - Z17*Z23
-> (190) Z33 = Z11*Z26 + Z18*Z28 - Z17*Z24
-> (191) Z34 = Z4*Z11 + Z17*Z20 + Z18*Z22
-> (192) Z35 = Z9*Z27 + Z10*Z23
-> (193) Z36 = Z9*Z28 + Z10*Z24
-> (194) Z37 = Z9*Z22 - Z10*Z20
-> (195) H_N = [Z29, Z30, Z31; Z32, Z33, Z34; Z35, Z36, Z37]
-> (196) Z38 = 1 - Z34^2
-> (197) Z39 = Z38^0.5
-> (198) Z40 = 1/Z39
-> (199) Z41 = Z34/Z39
-> (200) Z42 = rf*Z41
-> (201) Z43 = rft + rf*Z40
-> (202) P_FO_NF> = -Z42*H2> + Z43*N3>

Comments:

Lines 98-138: Here is where the angular relationship of the various
rigid bodies and intermediate frames are defined. The 'SIMPROT'
command stands for Simple Rotation, and it essentially just generate a
rotation matrix, but notice that it automatically introduces
intermediate variables for the sin and cosine of the angle of
rotation.

Lines 139-162: The notation here is a little funky, but basically
'P_NO_RO>' denotes the position vector (hence the 'P'), from the point
NO, to the point RO. Autolev recognizes 'NO' as the origin of a
Newtonian reference frame that I declared earlier (not shown), and
'RO' as the center of mass location of the rigid body 'R' (also
declared earlier). Anything with a '>' at the end of it denotes that
it is a vector, and most of the vectors here are the body fixed
vectors of each rigid body (they are created automatically upon
declaration of a rigid body). Once the orientations of all of the
bodies has been declared, you can work with whichever coordinate
system is most convenient when describing the location of a center of
mass or a point.

Lines 163-202: This best illustrates the behavior I am trying to
achieve. Upon entering the definition of 'P_FO_NF>' (the position
vector from the center of the front wheel to the front wheel - ground
contact point), the Autolev interpreter generates 28 intermediate 'Z'
variables before finally forming the final expression. All of these
intermediate variables somehow involve the generalized coordinates of
the problem at hand (most happen to be angles) and hence they would
vary throughout the course of a numerical integration. For
comparison, if P_FO_NF> is expressed in the Newtonian frame and no
substitution is used, it looks like:

-> (224) P_FO_NF> = rf*(SIN(q4)*COS(q7)+SIN(q7)*COS(q4)*SIN(lambda
+q6))*(SIN(q7)
*COS(q3)*COS(lambda+q6)+SIN(q3)*(COS(q4)*COS(q7)-
SIN(q4)*SIN(q7)*SIN(lambda+q6))
)/(1-(SIN(q4)*COS(q7)+SIN(q7)*COS(q4)*SIN(lambda+q6))^2)^0.5*N1> +
rf*(SIN(q4)*C
OS(q7)+SIN(q7)*COS(q4)*SIN(lambda+q6))*(SIN(q3)*SIN(q7)*COS(lambda+q6)-
COS(q3)*(
COS(q4)*COS(q7)-SIN(q4)*SIN(q7)*SIN(lambda+q6)))/(1-
(SIN(q4)*COS(q7)+SIN(q7)*COS
(q4)*SIN(lambda+q6))^2)^0.5*N2> + (rft+rf*(1-
(SIN(q4)*COS(q7)+SIN(q7)*COS(q4)*SI
N(lambda+q6))^2)^0.5)*N3>

It is easy to see that there are many repeated quantities in the above
expression and it makes sense that they only be computed once. When
forming expressions for the velocities and accelerations, one can
imagine how long the expressions get after the application of the
chain rule.... many pages long. I don't know if the method Autolev is
identifying and creating intermediate variables is the best way to do
it because I have seen cases that don't seem to actually simplify the
equations much, but for the most part, it seems to work well. And, as
was mentioned by somebody else, there must be a trade off between
memory usage and flop usage. I the problems I have worked on, using
extra memory used has been worth the savings in flops by a large
factor. For example, the equations of motion for the Whipple bicycle
generated by Autolev are about 77kb, while those same equations
generated by another author's method are about 3.5mb, almost 2 orders
of magnitude larger. Needless to say, numerical integration times are
much faster with the smaller more compact representation of the
equations of motion.

Essentially what I need to do is parse every equation and identify
subexressions (and then parse those subexpression and identify
subsubexpressions....), and introduce variables for any that are used
more than once. If a subexpression is found, but doesn't occur more
than once in any of the preceding equations, but then later shows up
in another equation, it would then be introduced as an intermediate
variable. If they aren't used more than once, then it doesn't make
sense to introduce an intermediate variable for them, unless the goal
is simply just readability.

Any ideas on how to start parsing each equation and accumulating a
list of subexpressions?

I haven't had time to start coding this up in python yet, hopefully
I'll get started on this over the summer.

Thanks,
~Luke

Robert Kern

unread,

Jun 16, 2008, 3:14:32 PM6/16/08

to sy...@googlegroups.com

On Mon, Jun 16, 2008 at 12:13, Luke <hazel...@gmail.com> wrote:
> Essentially what I need to do is parse every equation and identify
> subexressions (and then parse those subexpression and identify
> subsubexpressions....), and introduce variables for any that are used
> more than once. If a subexpression is found, but doesn't occur more
> than once in any of the preceding equations, but then later shows up
> in another equation, it would then be introduced as an intermediate
> variable. If they aren't used more than once, then it doesn't make
> sense to introduce an intermediate variable for them, unless the goal
> is simply just readability.
>
> Any ideas on how to start parsing each equation and accumulating a
> list of subexpressions?

Look at the code I provided attached to the issue.

http://sympy.googlecode.com/issues/attachment?aid=3304519749492286715&name=cse.py

Luke

unread,

Jun 16, 2008, 5:41:03 PM6/16/08

to sympy

Robert,
The Mathematica link you provided is exactly what I'm trying to do.
I haven't tried your python code yet but after reading it I think it
should work great. I really appreciate your comments and your help!

Thanks,
~Luke

On Jun 16, 12:14 pm, "Robert Kern" <robert.k...@gmail.com> wrote:

> On Mon, Jun 16, 2008 at 12:13, Luke <hazelnu...@gmail.com> wrote:
> > Essentially what I need to do is parse every equation and identify
> > subexressions (and then parse those subexpression and identify
> > subsubexpressions....), and introduce variables for any that are used
> > more than once. If a subexpression is found, but doesn't occur more
> > than once in any of the preceding equations, but then later shows up
> > in another equation, it would then be introduced as an intermediate
> > variable. If they aren't used more than once, then it doesn't make
> > sense to introduce an intermediate variable for them, unless the goal
> > is simply just readability.
>
> > Any ideas on how to start parsing each equation and accumulating a
> > list of subexpressions?
>
> Look at the code I provided attached to the issue.
>

> http://sympy.googlecode.com/issues/attachment?aid=3304519749492286715...

Ondrej Certik

unread,

Jun 17, 2008, 3:11:43 AM6/17/08

to sy...@googlegroups.com

On Mon, Jun 16, 2008 at 11:41 PM, Luke <hazel...@gmail.com> wrote:
>
> Robert,
> The Mathematica link you provided is exactly what I'm trying to do.
> I haven't tried your python code yet but after reading it I think it

I did:

http://code.google.com/p/sympy/issues/detail?id=891#c3

> should work great. I really appreciate your comments and your help!

Thanks Robert and Fredrik for the code!

Robert's main loop is this:

for subtree in postorder_traversal(expr):
if subtree.args == ():
# Exclude atoms, since there is no point in renaming them.
continue
if subtree.args != () and subtree in seen_subexp and subtree

not in to_eliminate:
to_eliminate.append(subtree)
seen_subexp.add(subtree)

It's awesome. Then one just takes the to_eliminate list and substitute
it for x_0, x_1, ...

On Mon, Jun 16, 2008 at 8:28 AM, Robert Kern <rober...@gmail.com> wrote:
> On Sun, Jun 15, 2008 at 17:46, Ondrej Certik <ond...@certik.cz> wrote:
>> Anyway, I created an issue for this:
>>
>> http://code.google.com/p/sympy/issues/detail?id=891
>
> I've attached some nominally working code to the issue. Playing with
> it, I see some suboptimal results. First, it's quadratic, but I'm not
> clever enough to think of a way around that. Second, sympy's default
> representation of expressions isn't well-suited for this algorithm (or
> the other way around). For example:
>
> In [1]: from cse import cse
>
> In [2]: cse((x-y)*(z-y) + sqrt((x-y)*(z-y)))
> Out[2]:
> ⎛ ⎽⎽⎽⎽ ⎽⎽⎽⎽⎞
> ⎝[(x₀, -y), (x₁, x + x₀), (x₂, x₀ + z)], x₁*x₂ + ╲╱ x₁ *╲╱ x₂ ⎠
>
>
> We see that -y gets picked up as term that should be pulled out
> although it only shows up in the context of (x-y). For the typical use
> case, subtraction is an atomic operation, and the representation of
> (x-y) as Add(Mul(NegativeOne(-1), Symbol('y')), Symbol('x')) gets in
> the way.
>
> Although the input expression has both x1 and x2 under the same
> sqrt(), it gets broken out before the cse() function gets to look at
> it. It would be nice to stuff everything in that term under the same
> sqrt() before cse() operates.

Well, x-y is represented as Add(x, Mul(-1, y)) as you said. So -y is
Mul(-1,y), thus it is a common subexpression and thus it get
substituted,
I think everything is a-ok, isn't it?

So if you want to improve the algorithm, apply this trivial patch:

diff --git a/cse.py b/cse.py
--- a/cse.py
+++ b/cse.py
@@ -48,6 +48,9 @@ def cse(expr, prefix='x'):
if subtree.args == ():
# Exclude atoms, since there is no point in renaming them.
continue
+ if subtree.is_Mul and subtree.args[0].is_number:
+ # Exclude stuff like (-y)
+ continue
if subtree.args != () and subtree in seen_subexp and subtree

not in to_eliminate:
to_eliminate.append(subtree)
seen_subexp.add(subtree)

Now it works:

In [2]: cse((x-y)*(z-y) + sqrt((x-y)*(z-y)))
Out[2]:
⎛ ⎽⎽⎽⎽ ⎽⎽⎽⎽⎞

⎝[(x₀, x - y), (x₁, z - y)], x₀*x₁ + ╲╱ x₀ *╲╱ x₁ ⎠

> This kind of thing pops up from time to time in my experience.

It does, but it can be easily fixed, see above. I.e. fixing your
algorithm. :) The other option, as you said, is fixing sympy. We would
have to introduce a Sub class.
But I think it would just make things more complex. No?

>
> So I think there needs to be a bit of a preprocessing step to optimize
> the expression specifically for cse() before it does its magic.
>
> Generalizing this to a list of expressions (e.g. my old use case) is
> an exercise left to the reader.

Yep. Thanks again for the code, it will be included soon in sympy.
If you could Luke test it and maybe even prepare a patch to SymPy
together with a test, it'd be very much appreciated. :)

Ondrej

Robert Kern

unread,

Jun 17, 2008, 1:46:06 PM6/17/08

to sy...@googlegroups.com

It is certainly correct for that representation. However, my claim is
that this representation is suboptimal for the purposes to which one
wants to apply common subexpression elimination. All of the use cases
hold as important one of these two criteria:

1. flop count
2. readability

CSE on the default sympy representation pessimizes both of these. For
floating point evaluation, - is an atomic operation. You would not
generate code that breaks that up into a Mul and an Add.

x0 = -1*y
x1 = x + x0

versus

x1 = x - y

Readability is much the same.

> So if you want to improve the algorithm, apply this trivial patch:
>
> diff --git a/cse.py b/cse.py
> --- a/cse.py
> +++ b/cse.py
> @@ -48,6 +48,9 @@ def cse(expr, prefix='x'):
> if subtree.args == ():
> # Exclude atoms, since there is no point in renaming them.
> continue
> + if subtree.is_Mul and subtree.args[0].is_number:
> + # Exclude stuff like (-y)
> + continue
> if subtree.args != () and subtree in seen_subexp and subtree
> not in to_eliminate:
> to_eliminate.append(subtree)
> seen_subexp.add(subtree)
>
>
> Now it works:
>
> In [2]: cse((x-y)*(z-y) + sqrt((x-y)*(z-y)))
> Out[2]:
> ⎛ ⎽⎽⎽⎽ ⎽⎽⎽⎽⎞
> ⎝[(x₀, x - y), (x₁, z - y)], x₀*x₁ + ╲╱ x₀ *╲╱ x₁ ⎠
>
>
>> This kind of thing pops up from time to time in my experience.
>
> It does, but it can be easily fixed, see above. I.e. fixing your
> algorithm. :) The other option, as you said, is fixing sympy. We would
> have to introduce a Sub class.
> But I think it would just make things more complex. No?

For all of sympy, yes. However, I think we could keep these
transformations local to the cse module. You would have to implement
just enough of Sub to replace Add(x, Mul(-constant, y)) and go back
again.

Similarly, breaking up a power of a product into a product of powers
in the default sympy representation hides optimization opportunities.
CSE could work better with a different representation.

>> So I think there needs to be a bit of a preprocessing step to optimize
>> the expression specifically for cse() before it does its magic.

I think this is the correct approach rather than putting special cases
inside cse() itself. Preprocessing to a more tractable representation,
CSE, then postprocessing allows us to add optimizations incrementally
while keeping the core algorithm clean and testable. sympy in general
doesn't need to change outside of the cse module.

Ondrej Certik

unread,

Jun 18, 2008, 4:29:52 AM6/18/08

to sy...@googlegroups.com

> It is certainly correct for that representation. However, my claim is
> that this representation is suboptimal for the purposes to which one
> wants to apply common subexpression elimination. All of the use cases
> hold as important one of these two criteria:
>
> 1. flop count
> 2. readability
>
> CSE on the default sympy representation pessimizes both of these. For
> floating point evaluation, - is an atomic operation. You would not
> generate code that breaks that up into a Mul and an Add.
>
> x0 = -1*y
> x1 = x + x0
>
> versus
>
> x1 = x - y
>
> Readability is much the same.

Yes, I completely agree with this.

>> It does, but it can be easily fixed, see above. I.e. fixing your
>> algorithm. :) The other option, as you said, is fixing sympy. We would
>> have to introduce a Sub class.
>> But I think it would just make things more complex. No?
>
> For all of sympy, yes. However, I think we could keep these
> transformations local to the cse module. You would have to implement
> just enough of Sub to replace Add(x, Mul(-constant, y)) and go back
> again.

That's basically what my patch does. In terms of maintainability, I
think it's easier to maintain the two added lines, rather than a full
Sub class.
(Unless it doesn't cover all the corner cases, in which case we might
refactor it into a regular Sub class).

>
> Similarly, breaking up a power of a product into a product of powers
> in the default sympy representation hides optimization opportunities.
> CSE could work better with a different representation.

This also pops up from time to time. The current implementation does
what you'd do by hand in 90% of the cases, i.e. it does the right
thing.
I think we can patch Pow to do something like this:

e = Pow(a*b, 2, expand=False)
f = Pow(a*b, 2)

then

e = (a*b)**2
f = a**2 * b**2

and

e != f

We can also add methods like expand(), collect() etc. I think it
wouldn't break anything and it can be useful from time to time.

>
>>> So I think there needs to be a bit of a preprocessing step to optimize
>>> the expression specifically for cse() before it does its magic.
>
> I think this is the correct approach rather than putting special cases
> inside cse() itself. Preprocessing to a more tractable representation,
> CSE, then postprocessing allows us to add optimizations incrementally
> while keeping the core algorithm clean and testable. sympy in general
> doesn't need to change outside of the cse module.

Right. As I said above, if the only special case are those 2 lines in
my patch, then I think that's the best way. If, however, we realize we
need to add more special cases, then I'd suggest to implement the
other representation, for exactly the reasons you said.

Or to put it another way, if someone sends a patch with the things you
describe, it will surely go in (if it's robust + maintainable),
otherwise I suggest to push in your code + my 2 lines patch.

Ondrej

Robert Kern

unread,

Jun 18, 2008, 4:45:19 AM6/18/08

to sy...@googlegroups.com

On Wed, Jun 18, 2008 at 03:29, Ondrej Certik <ond...@certik.cz> wrote:

>>> It does, but it can be easily fixed, see above. I.e. fixing your
>>> algorithm. :) The other option, as you said, is fixing sympy. We would
>>> have to introduce a Sub class.
>>> But I think it would just make things more complex. No?
>>
>> For all of sympy, yes. However, I think we could keep these
>> transformations local to the cse module. You would have to implement
>> just enough of Sub to replace Add(x, Mul(-constant, y)) and go back
>> again.
>
> That's basically what my patch does.

Yes, but down inside the guts of the CSE algorithm. My preference is
to keep the core algorithm generic. It is currently entirely agnostic
to the actual contents of the tree.

> In terms of maintainability, I
> think it's easier to maintain the two added lines, rather than a full
> Sub class.

If that were the only optimization, I would agree. However, this
expression was the first I tested the algorithm on, and I found two
places for optimization. I think we'll find more. Preprocess, CSE,
then postprocess is more extensible, IMO, than adding special cases
inside the CSE. I think that the product-of-powers case can only
really be done by preprocessing, so we might as well do all of the
optimizations by preprocessing. Unit testing is easier by separating
the concerns of optimizing the representation and doing CSE. It opens
the possibility for the user to select the optimizations they need.

These are also things that can be added later, so having the core CSE
implementation could go in right away (pending unit tests and multiple
expressions).

Pearu Peterson

unread,

Jun 18, 2008, 5:03:04 AM6/18/08

to sympy

Hi,

In sympycore we have Verbatim class that holds expressions without
any evaluation. E.g. it can hold the expression 'x - y' in both
forms: Add(x, -y) and Sub(x, y). Converting to Verbatim and back
to the default representation is relatively cheap. May sympy should
implement a similar scheme in order to better support the cse
algorithm.

Pearu

Ondrej Certik

unread,

Jun 18, 2008, 5:13:13 AM6/18/08

to sy...@googlegroups.com

Thanks for the tip. BTW, you don't seem to have a class Sub in
sympycore, definitely not exported by default.

Ondrej

Pearu Peterson

unread,

Jun 18, 2008, 5:23:26 AM6/18/08

to sympy

On Jun 18, 9:13 am, "Ondrej Certik" <ond...@certik.cz> wrote:
> On Wed, Jun 18, 2008 at 11:03 AM, Pearu Peterson
>

> <pearu.peter...@gmail.com> wrote:
>
> > Hi,
>
> > In sympycore we have Verbatim class that holds expressions without
> > any evaluation. E.g. it can hold the expression 'x - y' in both
> > forms: Add(x, -y) and Sub(x, y). Converting to Verbatim and back
> > to the default representation is relatively cheap. May sympy should
> > implement a similar scheme in order to better support the cse
> > algorithm.
>
> Thanks for the tip. BTW, you don't seem to have a class Sub in
> sympycore, definitely not exported by default.

sympycore does not have classes Add, Mul, et as well:)
However, Sub is not exposed indeed (may be it should),
it is available as class method to ring algebras:
>>> Calculus.Sub(x, y)
Calculus('x - y')
>>> Calculus.Add(x, y)
Calculus('x + y')

Anyway, here's how Verbatim can be used in sympycore (the interface
has room for improvement, of course):
>>> from sympycore import *
>>> from sympycore.utils import SUB, ADD
>>> x,y,z,v,w=map(Symbol,'xyzvw')
>>> Verbatim(SUB, (x,y))
Verbatim('x - y')
>>> Verbatim(ADD, (x,-y))
Verbatim('x + -y')
>>> print Verbatim(SUB, (x,y)).as_tree()
Verbatim:
SUB[
SYMBOL[x]
SYMBOL[y]
]

Pearu

Ondrej Certik

unread,

Jun 18, 2008, 5:32:11 AM6/18/08

to sy...@googlegroups.com

Thanks for the example. Isn't just implementing a class Sub in SymPy
(+conversion methods) a better solution? It is like having the
Integral or Derivative class,
it also isn't used by default, but if the user requests it, he can have it.

I agree. But you know what Linus says[1], right :), so the current options are:

1) commit your patch as is and wait until someone fixes the x-y
problem in a general way
2) commit your patch + my patch + a good comment, that adding further
special cases are not acceptable and one should refactor it, and give
a reference to this thread,
3) do not commit anything, and wait until someone implements the general way

Each option has advantages and disadvantages, I think I myself am for
2), but if majority of you think we should just go the 1) or 3) way,
it's ok with me too.

Ondrej

[1] http://lkml.org/lkml/2000/8/25/132

Robert Kern

unread,

Jun 18, 2008, 5:50:03 AM6/18/08

to sy...@googlegroups.com

On Wed, Jun 18, 2008 at 04:32, Ondrej Certik <ond...@certik.cz> wrote:

> 1) commit your patch as is and wait until someone fixes the x-y
> problem in a general way
> 2) commit your patch + my patch + a good comment, that adding further
> special cases are not acceptable and one should refactor it, and give
> a reference to this thread,
> 3) do not commit anything, and wait until someone implements the general way
>
> Each option has advantages and disadvantages, I think I myself am for
> 2), but if majority of you think we should just go the 1) or 3) way,
> it's ok with me too.

I can probably implement multiple expression support + unit tests +
subtraction preprocessing in a few days time.

Ondrej Certik

unread,

Jun 18, 2008, 6:44:41 AM6/18/08

to sy...@googlegroups.com

On Wed, Jun 18, 2008 at 11:50 AM, Robert Kern <rober...@gmail.com> wrote:
>
> On Wed, Jun 18, 2008 at 04:32, Ondrej Certik <ond...@certik.cz> wrote:
>
>> 1) commit your patch as is and wait until someone fixes the x-y
>> problem in a general way
>> 2) commit your patch + my patch + a good comment, that adding further
>> special cases are not acceptable and one should refactor it, and give
>> a reference to this thread,
>> 3) do not commit anything, and wait until someone implements the general way
>>
>> Each option has advantages and disadvantages, I think I myself am for
>> 2), but if majority of you think we should just go the 1) or 3) way,
>> it's ok with me too.
>
> I can probably implement multiple expression support + unit tests +
> subtraction preprocessing in a few days time.

Excellent. In this case, let's go the 3) way.

Thanks!
Ondrej

Fredrik Johansson

unread,

Jun 18, 2008, 9:26:51 AM6/18/08

to sy...@googlegroups.com

I've implemented an evaluate=False option for Add, Mul, Pow and
functions (see attachment). This could be useful to suppress default
behavior like Sub(x,y) -> Add(x,Mul(-1,y)) for code generation etc. As
it happens, I need something like this for evalf testing as well.

Fredrik

evalfalse.patch

Ondrej Certik

unread,

Jun 18, 2008, 11:54:15 AM6/18/08

to sy...@googlegroups.com

Thanks, it's in:

http://hg.sympy.org/sympy/rev/c51330c0f3da

Ondrej

Robert Kern

unread,

Jun 19, 2008, 1:42:44 AM6/19/08

to sy...@googlegroups.com

Where should the code go in the package?

Ondrej Certik

unread,

Jun 19, 2008, 4:10:14 AM6/19/08

to sy...@googlegroups.com

On Thu, Jun 19, 2008 at 7:42 AM, Robert Kern <rober...@gmail.com> wrote:
>
> On Wed, Jun 18, 2008 at 05:44, Ondrej Certik <ond...@certik.cz> wrote:
>>
>> On Wed, Jun 18, 2008 at 11:50 AM, Robert Kern <rober...@gmail.com> wrote:
>
>>> I can probably implement multiple expression support + unit tests +
>>> subtraction preprocessing in a few days time.
>>
>> Excellent. In this case, let's go the 3) way.
>
> Where should the code go in the package?

Let's put it to sympy/simplify/

It is in fact a simplification.

Ondrej

Ondrej Certik

unread,

Jul 1, 2008, 11:38:56 AM7/1/08

to sy...@googlegroups.com

Any progress on this? I think it's time to make anothe release, let's
say on Monday. If you are busy, I propose to put the code that we have
in (with my patch and a comment how to proceed with fixing it
generally) and when you or anyone else finds time, it can be improved.

Ondrej

Robert Kern

unread,

Jul 1, 2008, 2:49:37 PM7/1/08

to sy...@googlegroups.com

On Tue, Jul 1, 2008 at 10:38, Ondrej Certik <ond...@certik.cz> wrote:
> Any progress on this? I think it's time to make anothe release, let's
> say on Monday. If you are busy, I propose to put the code that we have
> in (with my patch and a comment how to proceed with fixing it
> generally) and when you or anyone else finds time, it can be improved.

It's just about ready.

Robert Kern

unread,

Jul 6, 2008, 6:36:12 AM7/6/08

to sy...@googlegroups.com

On Tue, Jul 1, 2008 at 10:38, Ondrej Certik <ond...@certik.cz> wrote:

> Any progress on this? I think it's time to make anothe release, let's
> say on Monday. If you are busy, I propose to put the code that we have
> in (with my patch and a comment how to proceed with fixing it
> generally) and when you or anyone else finds time, it can be improved.

'tis done.

http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/sympy/

Ondrej Certik

unread,

Jul 6, 2008, 6:45:40 AM7/6/08

to sy...@googlegroups.com

On Sun, Jul 6, 2008 at 12:36 PM, Robert Kern <rober...@gmail.com> wrote:
>
> On Tue, Jul 1, 2008 at 10:38, Ondrej Certik <ond...@certik.cz> wrote:
>> Any progress on this? I think it's time to make anothe release, let's
>> say on Monday. If you are busy, I propose to put the code that we have
>> in (with my patch and a comment how to proceed with fixing it
>> generally) and when you or anyone else finds time, it can be improved.
>
> 'tis done.
>
> http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/sympy/

Wow, beautiful patches, thanks a lot!

I'll run couple tests and push it in.

While I am at the doc writing, I'll also write some intro about this
to our docs.

Ondrej

Robert Kern

unread,

Jul 6, 2008, 6:48:07 AM7/6/08

to sy...@googlegroups.com

You should also decide where the main cse() function ought to be
exposed. I punted, and didn't expose it in sympy.* or
sympy.simplify.*.

Ondrej Certik

unread,

Jul 6, 2008, 8:13:40 AM7/6/08

to sy...@googlegroups.com

> You should also decide where the main cse() function ought to be
> exposed. I punted, and didn't expose it in sympy.* or
> sympy.simplify.*.

I'd add the following patch:

# HG changeset patch
# User Ondrej Certik <ond...@certik.cz>
# Date 1215346162 -7200
# Node ID a899701f1e4f303dcc41fd4a1e00795936a035c5
# Parent d799b0e232f2d15f7c771aec318d30840506a681
Make cse accept a single expression as well.

diff --git a/sympy/simplify/cse.py b/sympy/simplify/cse.py
--- a/sympy/simplify/cse.py
+++ b/sympy/simplify/cse.py
@@ -1,7 +1,7 @@
""" Tools for doing common subexpression elimination.
"""

-from sympy import Symbol
+from sympy import Symbol, Basic
from sympy.utilities.iterables import postorder_traversal

import cse_opts
@@ -91,7 +91,7 @@ def cse(exprs, symbols=None, optimizatio

Parameters
----------
- exprs : list of sympy expressions
+ exprs : list of sympy expressions, or a single sympy expression
The expressions to reduce.
symbols : infinite iterator yielding unique Symbols
The symbols used to label the common subexpressions which are pulled
@@ -124,6 +124,9 @@ def cse(exprs, symbols=None, optimizatio
# manipulations of the module-level list in some other thread.
optimizations = list(cse_optimizations)

+ # Handle the case if just one expression was passed.
+ if isinstance(exprs, Basic):
+ exprs = [exprs]
# Preprocess the expressions to give us better optimization opportunities.
exprs = [preprocess_for_cse(e, optimizations) for e in exprs]

diff --git a/sympy/simplify/tests/test_cse.py b/sympy/simplify/tests/test_cse.py
--- a/sympy/simplify/tests/test_cse.py
+++ b/sympy/simplify/tests/test_cse.py
@@ -45,6 +45,13 @@ def test_cse_single():
assert substs == [(x0, x+y)]
assert reduced == [sqrt(x0) + x0**2]

+def test_cse_single2():
+ # Simple substitution, test for being able to pass the expression directly
+ e = Add(Pow(x+y,2), sqrt(x+y))
+ substs, reduced = cse.cse(e, optimizations=[])
+ assert substs == [(x0, x+y)]
+ assert reduced == [sqrt(x0) + x0**2]
+
def test_cse_not_possible():
# No substitution possible.
e = Add(x,y)

and export it using this patch:

# HG changeset patch
# User Ondrej Certik <ond...@certik.cz>
# Date 1215346276 -7200
# Node ID 049e4c8c531bacf82216959d2c6bfa004a662961
# Parent a899701f1e4f303dcc41fd4a1e00795936a035c5
Export cse() by default.

diff --git a/sympy/simplify/__init__.py b/sympy/simplify/__init__.py
--- a/sympy/simplify/__init__.py
+++ b/sympy/simplify/__init__.py
@@ -10,3 +10,5 @@ from rewrite import cancel, trim, apart
from rewrite import cancel, trim, apart

from sqrtdenest import sqrtdenest
+
+from cse import cse

Let me know what you think.

Ondrej

Robert Kern

unread,

Jul 6, 2008, 8:19:27 AM7/6/08

to sy...@googlegroups.com

On Sun, Jul 6, 2008 at 07:13, Ondrej Certik <ond...@certik.cz> wrote:
> diff --git a/sympy/simplify/__init__.py b/sympy/simplify/__init__.py
> --- a/sympy/simplify/__init__.py
> +++ b/sympy/simplify/__init__.py
> @@ -10,3 +10,5 @@ from rewrite import cancel, trim, apart
> from rewrite import cancel, trim, apart
>
> from sqrtdenest import sqrtdenest
> +
> +from cse import cse

I'd rename the module, in this case. It makes it difficult to import
the module from the sympy.simplify package. This is necessary if
someone needs to change the default optimization list at runtime.

Ondrej Certik

unread,

Jul 6, 2008, 8:55:49 AM7/6/08

to sy...@googlegroups.com

On Sun, Jul 6, 2008 at 2:19 PM, Robert Kern <rober...@gmail.com> wrote:
>
> On Sun, Jul 6, 2008 at 07:13, Ondrej Certik <ond...@certik.cz> wrote:
>> diff --git a/sympy/simplify/__init__.py b/sympy/simplify/__init__.py
>> --- a/sympy/simplify/__init__.py
>> +++ b/sympy/simplify/__init__.py
>> @@ -10,3 +10,5 @@ from rewrite import cancel, trim, apart
>> from rewrite import cancel, trim, apart
>>
>> from sqrtdenest import sqrtdenest
>> +
>> +from cse import cse
>
> I'd rename the module, in this case. It makes it difficult to import
> the module from the sympy.simplify package. This is necessary if
> someone needs to change the default optimization list at runtime.

That's right. How about this patch:

# HG changeset patch
# User Ondrej Certik <ond...@certik.cz>
# Date 1215346276 -7200

# Node ID 841260b1596e0adcc1aef4695f78c02fb14c870b
# Parent a899701f1e4f303dcc41fd4a1e00795936a035c5
cse module renamed to cse_main and cse_main.cse() exported by default.

diff --git a/sympy/simplify/__init__.py b/sympy/simplify/__init__.py
--- a/sympy/simplify/__init__.py
+++ b/sympy/simplify/__init__.py
@@ -10,3 +10,5 @@ from rewrite import cancel, trim, apart
from rewrite import cancel, trim, apart

from sqrtdenest import sqrtdenest
+

+from cse_main import cse
diff --git a/sympy/simplify/cse.py b/sympy/simplify/cse_main.py
rename from sympy/simplify/cse.py
rename to sympy/simplify/cse_main.py

diff --git a/sympy/simplify/tests/test_cse.py b/sympy/simplify/tests/test_cse.py
--- a/sympy/simplify/tests/test_cse.py
+++ b/sympy/simplify/tests/test_cse.py

@@ -1,19 +1,19 @@ import itertools
import itertools

-from sympy import Add, Mul, Pow, Symbol, sin, sqrt, symbols, sympify
-from sympy.simplify import cse, cse_opts
+from sympy import Add, Mul, Pow, Symbol, sin, sqrt, symbols, sympify, cse
+from sympy.simplify import cse_main, cse_opts

w,x,y,z = symbols('wxyz')
-x0,x1,x2 = list(itertools.islice(cse.numbered_symbols(), 0, 3))
+x0,x1,x2 = list(itertools.islice(cse_main.numbered_symbols(), 0, 3))
negone = sympify(-1)

def test_numbered_symbols():
- ns = cse.numbered_symbols(prefix='y')
+ ns = cse_main.numbered_symbols(prefix='y')
assert list(itertools.islice(ns, 0, 10)) == [Symbol('y%s'%i) for
i in range(0, 10)]
- ns = cse.numbered_symbols(prefix='y')
+ ns = cse_main.numbered_symbols(prefix='y')
assert list(itertools.islice(ns, 10, 20)) == [Symbol('y%s'%i) for
i in range(10, 20)]
- ns = cse.numbered_symbols()
+ ns = cse_main.numbered_symbols()
assert list(itertools.islice(ns, 0, 10)) == [Symbol('x%s'%i) for
i in range(0, 10)]

# Dummy "optimization" functions for testing.
@@ -24,59 +24,59 @@ def opt2(expr):
return expr*z

def test_preprocess_for_cse():
- assert cse.preprocess_for_cse(x, [(opt1, None)]) == x+y
- assert cse.preprocess_for_cse(x, [(None, opt1)]) == x
- assert cse.preprocess_for_cse(x, [(None, None)]) == x
- assert cse.preprocess_for_cse(x, [(opt1, opt2)]) == x+y
- assert cse.preprocess_for_cse(x, [(opt1, None), (opt2, None)]) == (x+y)*z
+ assert cse_main.preprocess_for_cse(x, [(opt1, None)]) == x+y
+ assert cse_main.preprocess_for_cse(x, [(None, opt1)]) == x
+ assert cse_main.preprocess_for_cse(x, [(None, None)]) == x
+ assert cse_main.preprocess_for_cse(x, [(opt1, opt2)]) == x+y
+ assert cse_main.preprocess_for_cse(x, [(opt1, None), (opt2,
None)]) == (x+y)*z

def test_postprocess_for_cse():
- assert cse.postprocess_for_cse(x, [(opt1, None)]) == x
- assert cse.postprocess_for_cse(x, [(None, opt1)]) == x+y
- assert cse.postprocess_for_cse(x, [(None, None)]) == x
- assert cse.postprocess_for_cse(x, [(opt1, opt2)]) == x*z
+ assert cse_main.postprocess_for_cse(x, [(opt1, None)]) == x
+ assert cse_main.postprocess_for_cse(x, [(None, opt1)]) == x+y
+ assert cse_main.postprocess_for_cse(x, [(None, None)]) == x
+ assert cse_main.postprocess_for_cse(x, [(opt1, opt2)]) == x*z
# Note the reverse order of application.
- assert cse.postprocess_for_cse(x, [(None, opt1), (None, opt2)]) == x*z+y
+ assert cse_main.postprocess_for_cse(x, [(None, opt1), (None,
opt2)]) == x*z+y

def test_cse_single():
# Simple substitution.
e = Add(Pow(x+y,2), sqrt(x+y))
- substs, reduced = cse.cse([e], optimizations=[])
+ substs, reduced = cse([e], optimizations=[])

assert substs == [(x0, x+y)]
assert reduced == [sqrt(x0) + x0**2]

def test_cse_single2():

# Simple substitution, test for being able to pass the expression directly

e = Add(Pow(x+y,2), sqrt(x+y))
- substs, reduced = cse.cse(e, optimizations=[])
+ substs, reduced = cse(e, optimizations=[])

assert substs == [(x0, x+y)]
assert reduced == [sqrt(x0) + x0**2]

def test_cse_not_possible():

# No substitution possible.
e = Add(x,y)

- substs, reduced = cse.cse([e], optimizations=[])
+ substs, reduced = cse([e], optimizations=[])
assert substs == []
assert reduced == [x+y]

def test_nested_substitution():
# Substitution within a substitution.
e = Add(Pow(w*x+y,2), sqrt(w*x+y))
- substs, reduced = cse.cse([e], optimizations=[])
+ substs, reduced = cse([e], optimizations=[])
assert substs == [(x0, w*x), (x1, x0+y)]
assert reduced == [sqrt(x1) + x1**2]

def test_subtraction_opt():
# Make sure subtraction is optimized.
e = (x-y)*(z-y) + sin((x-y)*(z-y))
- substs, reduced = cse.cse([e],
optimizations=[(cse_opts.sub_pre,cse_opts.sub_post)])
+ substs, reduced = cse([e],
optimizations=[(cse_opts.sub_pre,cse_opts.sub_post)])
assert substs == [(x0, z-y), (x1, x-y), (x2, x0*x1)]
assert reduced == [x2 + sin(x2)]

def test_multiple_expressions():
e1 = (x+y)*z
e2 = (x+y)*w
- substs, reduced = cse.cse([e1, e2], optimizations=[])
+ substs, reduced = cse([e1, e2], optimizations=[])

assert substs == [(x0, x+y)]

assert reduced == [x0*z, x0*w]

Ondrej

Robert Kern

unread,

Jul 6, 2008, 8:57:45 AM7/6/08

to sy...@googlegroups.com

On Sun, Jul 6, 2008 at 07:55, Ondrej Certik <ond...@certik.cz> wrote:
>
> On Sun, Jul 6, 2008 at 2:19 PM, Robert Kern <rober...@gmail.com> wrote:
>>
>> On Sun, Jul 6, 2008 at 07:13, Ondrej Certik <ond...@certik.cz> wrote:
>>> diff --git a/sympy/simplify/__init__.py b/sympy/simplify/__init__.py
>>> --- a/sympy/simplify/__init__.py
>>> +++ b/sympy/simplify/__init__.py
>>> @@ -10,3 +10,5 @@ from rewrite import cancel, trim, apart
>>> from rewrite import cancel, trim, apart
>>>
>>> from sqrtdenest import sqrtdenest
>>> +
>>> +from cse import cse
>>
>> I'd rename the module, in this case. It makes it difficult to import
>> the module from the sympy.simplify package. This is necessary if
>> someone needs to change the default optimization list at runtime.
>
> That's right. How about this patch:

Works for me.

Ondrej Certik

unread,

Jul 6, 2008, 9:35:58 AM7/6/08

to sy...@googlegroups.com

On Sun, Jul 6, 2008 at 2:57 PM, Robert Kern <rober...@gmail.com> wrote:
>
> On Sun, Jul 6, 2008 at 07:55, Ondrej Certik <ond...@certik.cz> wrote:
>>
>> On Sun, Jul 6, 2008 at 2:19 PM, Robert Kern <rober...@gmail.com> wrote:
>>>
>>> On Sun, Jul 6, 2008 at 07:13, Ondrej Certik <ond...@certik.cz> wrote:
>>>> diff --git a/sympy/simplify/__init__.py b/sympy/simplify/__init__.py
>>>> --- a/sympy/simplify/__init__.py
>>>> +++ b/sympy/simplify/__init__.py
>>>> @@ -10,3 +10,5 @@ from rewrite import cancel, trim, apart
>>>> from rewrite import cancel, trim, apart
>>>>
>>>> from sqrtdenest import sqrtdenest
>>>> +
>>>> +from cse import cse
>>>
>>> I'd rename the module, in this case. It makes it difficult to import
>>> the module from the sympy.simplify package. This is necessary if
>>> someone needs to change the default optimization list at runtime.
>>
>> That's right. How about this patch:
>
> Works for me.

Great. It's in.

Ondrej

Ondrej Certik

unread,

Jul 7, 2008, 6:08:40 AM7/7/08

to sy...@googlegroups.com, Stéfan van der Walt

The docs are here:

http://docs.sympy.org/modules/rewriting.html#module-sympy.simplify.cse_main

Stefan, what is the preferred way to document things in numpy? I mean
what kind of REST markup one should use in docstrings, so that it
looks nice in sphinx.

Looking here:

http://mentat.za.net/numpy/refguide/functions.xhtml

it seems like you are using stuff like

"
Paremeters
--------------
"

Also is this now the official numpy docs style guide:

http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/doc/example.py#L37

?

in the docstrings, so in this light, this patch:

http://hg.sympy.org/sympy/rev/2a85bc165b30

is not correct. But without it, the sphinx refuses to generate the
docs. How did you overcome this problem?

Ondrej

Stéfan van der Walt

unread,

Jul 7, 2008, 7:15:38 AM7/7/08

to Ondrej Certik, sy...@googlegroups.com

2008/7/7 Ondrej Certik <ond...@certik.cz>:

> The docs are here:
>
> http://docs.sympy.org/modules/rewriting.html#module-sympy.simplify.cse_main
>
> Stefan, what is the preferred way to document things in numpy? I mean
> what kind of REST markup one should use in docstrings, so that it
> looks nice in sphinx.
>
> Looking here:
>
> http://mentat.za.net/numpy/refguide/functions.xhtml
>
> it seems like you are using stuff like
>
> "

> Parameters

> --------------
> "
>
> Also is this now the official numpy docs style guide:
>
> http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/doc/example.py#L37
>
> ?
>
> in the docstrings, so in this light, this patch:
>
> http://hg.sympy.org/sympy/rev/2a85bc165b30
>
> is not correct. But without it, the sphinx refuses to generate the
> docs. How did you overcome this problem?

The NumPy docstrings are not Sphinx-compatible reST, mainly because we
use top-level constructs (such as Parameters) in nested environments.
We take all the docstrings and parse it through a translator before we
send it to Sphinx:

https://code.launchpad.net/~stefanv/scipy/numpy-refguide

The reference guide is not yet looking very professional, but I hope
to invest some time in its presentation this week.

Matplotlib followed a bit of a different tack, and used Sphinx
compatible markup, which is extracted using the `autodoc` feature.
They also have plotting and LaTeX directives (LaTeX being rendered by
their own mathtext):