GSOC - Step-by-Step calculations

104 views
Skip to first unread message

bl0cked...@gmail.com

unread,
Feb 28, 2014, 7:28:12 PM2/28/14
to sy...@googlegroups.com
Hi,

I'm an undergraduate computer science student. I'm interested in your GSOC project idea
to add "Step-by-Step" functionality to SymPy, and I have some good past experience relating
to implementing that exact specific feature in symbolic mathematics software.

Last summer I spent some time reading textbooks on symbol manipulation, Lisp, and AI,
and wrote a program called "Step-by-Step Derivative Calculator".
Here's a screenshot: https://sites.google.com/site/bl0ckeduserssoftware/easyderivsteps/screenshots/v12gui-linux.png
And the download is at: https://sites.google.com/site/bl0ckeduserssoftware/easyderivsteps/

(Some of the source code for the program is currently available, but not the
source code for the "step-by-step" feature).

I would be quite interested to spend this summer adding similar functionality to SymPy,
and will probably apply as a GSOC student in March, using my real name (my first name is Gabriel
and my last name starts with a C) rather than this pseudonym.

The approach I used and which I may consider attempting to use, perhaps somewhat
modified, in SymPy can be summarized as follows:

(1) some special code marks nodes in the expression tree that get modified between
two operations which are marked as user-presented "steps", and specially decorates
these so they are highlighted with a special colour in the user GUI display;

(2) a table of description strings is built which associates classes of symbolic operations
to parameterized description strings which aim to explain to the user what the operation did,
and include, through parameterization, the specific expression or variable that was modified.
In the case of my pet project, this was done using a custom little DSL for which I wrote a
built-in compiler. The source code for the case of the quotient rule (from calculus) was:

    D(?x, f / g) = (g * D(x, f) - f * D(x, g) ) / (g * g)
    ; Apply quotient rule on $ \\frac{d}{d{?x}} \\left[ \\frac{&f}{&g} \\right] $

Note that this contains LaTeX for the final rendering, mixed with a description string,
mixed with a symbolic rule for how the quotient rule works.

I hope to soon spend some time familiarizing myself with the SymPy source code.

Please tell me if you think my idea is interesting and would be willing to take
me as a GSOC student.

Best,

Bl0ckeduser

Matthew Rocklin

unread,
Feb 28, 2014, 8:29:49 PM2/28/14
to sy...@googlegroups.com
Hi Gabriel/bl0ckedusersoft,

Your step-by-step derivative calculator looks neat and hopefully the experience will help you.  I'm curious, why isn't the source code for the step-by-step feature released?

From what I understand the approach you took on your project seems reasonable (safe to say, given that it worked), and I'm envious of your ability to describe transformations as concisely as you do in your DSL.  I'm not certain that this same approach would work for SymPy though as there is a lot of pre-existing code to work around.  This code is valuable (many thousands of mathematician-hours) but also enforces some pretty serious constraints on what we can and can not do to SymPy.

As you suggest I encourage you to look around the SymPy source code, as you do this you might want to think about the following constraints:

1.  We'd like step-by-step to work on lots of code - SymPy is big, far too big for you change everything.
2.  We can't rewrite all of this code, nor can we expect other developers to implement step-by-step-style methods in the future.
3.  We can't significantly slow down SymPy or add anything to the expression trees.  Step-by-step is not a big enough feature to cause a major redesign.

So you're asked to move mountains (do a huge task) while not making a sound (not changing the code at all.)  We don't need someone to re-implement derivative code, we do need someone who can think about SymPy as a whole, learn how it manipulates expressions, and then inject a small amount of control code in the right places in order to optionally capture and record these manipulations as a side effect.

This is a deceptively hard project.  However I'm encouraged by your particular experience building rule based systems to manipulate expression trees.

Best,
-Matthew






--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To post to this group, send email to sy...@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/32aee09f-a6a4-494e-9e09-5242f74d23a5%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

bl0cked...@gmail.com

unread,
Feb 28, 2014, 10:40:24 PM2/28/14
to sy...@googlegroups.com
Hello Matthew

The source code for my program is not released because it needs some
better commenting and some cleaning-up -- I was trying to make
the program into an ad-supported proprietary program, which didn't work
out very well. I will probably eventually clean up and release the source,
since it's only about 100 or 200 new lines from the already
open-sourced portion of the program.

I understand and agree with the provision you have noted that the current
SymPy code shouldn't be modified very much. I hope to spend
some time soon perusing the SymPy source and trying to figure
out if there is any way to adapt it to track steps.

I've had some experiences evocative of this "moving mountains
without a sound" analogy you make where I spent lots of time
figuring out how some open-source code was structured just to
add what a priori seemed like a small trivial enhancement.
(I have been able to manage some GNOME utilities, but in the case
of GNU Gnash the code was simply too complicated for my skills).

I'll report back if I make any progress.

Best,

Bl0ckeduser

bl0cked...@gmail.com

unread,
Mar 2, 2014, 11:48:36 AM3/2/14
to sy...@googlegroups.com
Hello

I've made a little progress today.

I made a little program (https://raw.github.com/bl0ckeduser/sympy/master/test.py) that tracks
the stack frames of all calls to _eval_derivative and prints out their inputs and outputs,
as the Wiki suggested.

The output for that exact task looks something like this:

Expression :  cos(log(sqrt(a))
Differentiating w.r.t. a:
d/d( a ) [ a ] = 1        
d/d( a ) [ sqrt(a) ] = 1/(2*sqrt(a))
d/d( a ) [ log(sqrt(a)) ] = 1/(2*a)
d/d( a ) [ cos(log(sqrt(a))) ] = -sin(log(sqrt(a)))/(2*a)

This does what the wiki suggests, but it is not exactly clear
how consecutive steps relate to each other. Of course, this is not
the exact desired behaviour, as found in programs like Wolfram.

So I took things further by writing a little loop that looks for exact
substitutions between steps to try to write a full "step-by-step" solution
showing the original full expression as it evolves after every call to the derivative routine.
Unfortunately this doesn't work very well for many expressions because
there is some kind of simplification code being called somewhere that kills
exact substitutions, for example [ 1/(2*a) ] in one step gets multiplied
by [ -sin(log(sqrt(a))) ] in the next step, but it is not possible to see
the exact earlier node [ 1/(2*a) ] in the resulting expression [ -sin(log(sqrt(a)))/(2*a) ].
Perhaps there is some flag somewhere to recursively disable simplifications ?

Below is one case (with output slightly reformatted from my test program)
where this hacky loop did work relatively well:

D(cos(log(a**5) + sin(a))**2, a)
= -2 * D(log(a**5) + sin(a), a)  *  sin(log(a**5) + sin(a))  *  cos(log(a**5) + sin(a))
= -2 * ( D(log(a**5), a)  +  D(sin(a), a) )  *  sin(log(a**5) + sin(a))  *  cos(log(a**5) + sin(a))
= -2*(cos(a) + 5/a)  *  sin(log(a**5) + sin(a))  *  cos(log(a**5) + sin(a))

A combined chain and power rule is somewhat visible in the first two lines,
and a sum rule / linear distribution of differentiation is visible the second.
In the last step, cos(a) and log(a**5) are both differentiated.

I suspect another possible approach would be to somehow modify the code
so that it first outputs purely symbolic, not actually evaluated or applied, differentiation
nodes, displays them, and then proceeds to evaluate them.

I'm still quite unsure whether the overall task is approachable or not.

Best,

Bl0ckeduser

Matthew Rocklin

unread,
Mar 2, 2014, 12:45:33 PM3/2/14
to sy...@googlegroups.com
I find this very encouraging.

Some thoughts:

1.  Rather than accumulate a list of transformations is it possible to accumulate a tree?  Rather than infer which transformations were sub-transformations of others by looking for sub-expressions I wonder if we can collect this information directly.
2.  What other meta-data is available?  Would any of this be valuable for future presentation / human analysis?  In particular as you extend to other methods (you could try all of `_eval_*`)
3.  Automatic evaluation is another issue in SymPy that we would like to work on.  This project may overlap with that issue a bit.
4.  Perhaps there is some flag somewhere to recursively disable simplifications - see https://github.com/sympy/sympy/pull/2917

5.  I suspect another possible approach would be to somehow modify the code so that it first outputs purely symbolic, not actually evaluated or applied, differentiation nodes, displays them, and then proceeds to evaluate them. -- 

Yes, it would be very nice to have explicit stages before and after evaluation.  More or less SymPy does do this, you just need to track more functions than `_eval_derivative`.  Unfortunately we don't do this very consistently, but this could change.  If you want a somewhat clean example, and if you're familiar with linear algebra, then I recommend trying your system on the matrix expression module and the `__new__`,  `doit`, and `_eval_foo` methods (note that sympy matrix expressions is different from normal sympy matrices).  I'm personally biased towards this module, but I do think that it is more self-consistent than much of SymPy.  I often use it as a test-bed for projects like this.

>>> n = Symbol('n')
>>> X = MatrixSymbol('X', n, n)
>>> Y = MatrixSymbol('Y', n, n)
>>> (X + X + Y).T
2*X' + Y'

6.  How could this functionality be useful to people?  What are some applications?

Best,
-Matt


--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To post to this group, send email to sy...@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy.
Reply all
Reply to author
Forward
0 new messages