What is out there for SymPy code generation / optimizing compiler effort?

Anthony Scopatz

Oct 30, 2015, 5:24:06 PM

to sy...@googlegroups.com

Hello All,

As many of you probably know, earlier this month Aaron joined my research group at the University of South Carolina. He'll be working on adding / improving SymPy's capabilities with respect to being an optimizing compiler.

There are more details about this vision below, but right now we are doing a literature review of sorts, trying to figure out what (SymPy-specific) work is out there and what has already been done. Aaron et al. have started putting together a page on the wiki that compiles some of this information. If you know of anything that is not on this page, we'd really appreciate it if you could let us know.

We also would be grateful if you could let us know (publicly or privately) about any use cases that you might have for a symbolic optimizing compiler. There are many examples where different folks have done various pieces of this (chemreac, dengo, pydy, some stuff in pyne), but these examples tend to be domain specific. This effort is supposed to target a general scientific computing audience, and to do that we want to have as many possible scenarios in mind at the outset.

And of course, we'd love it if other folks dived in and helped us put this thing together :).

Thanks a million!

Be Well

Anthony

Vision

------------

Essentially, what we want to build is an optimizing compiler for symbolic mathematical expressions in order to solve simple equations, ODEs, PDEs, and perhaps more. This compiler should be able to produce very fast code, though the compiler itself may be expensive.

Ultimately, it is easy to imagine a number of backend targets, such as C, Fortran, LLVM IR, Cython, pure Python, etc. It is also easy to imagine a couple of meaningful frontends - SymPy objects (for starters) and LaTeX (which could then be parsed into SymPy).
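As a rough illustration of the "one frontend, several backends" idea, the sketch below uses code printers that SymPy already ships (`ccode`, `fcode`, `lambdify`). The expression and the variable names are made up for the example; this is not the proposed compiler, only what exists today.

```python
import sympy as sp

x, y = sp.symbols("x y")
expr = sp.sin(x) * y + x**2  # illustrative expression, not from the thread

# Backend 1: a C expression as a string.
c_src = sp.ccode(expr)

# Backend 2: a free-form Fortran 95 expression as a string.
f_src = sp.fcode(expr, standard=95, source_format="free")

# Backend 3: a pure Python callable built on the stdlib math module.
py_func = sp.lambdify((x, y), expr, modules="math")

print(c_src)
print(f_src)
print(py_func(2.0, 0.0))
```

All three outputs come from the same SymPy object, which is the property a customizable pipeline would sit in front of.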

We are aiming to have an optimization pipeline that is highly customizable (but with sensible defaults). This would allow folks to tailor the result to their problem or add their own problem-specific optimizations. There are likely different levels to this (such as at the level of a single expression vs. at full function scope). Some initial elements of this pipeline might include CSE, simple rule-based rewriting (like a/b/c -> a/(b*c) or a*exp(b*x) -> A*2^(B*x)), and replacing non-analytic sub-expressions with approximate expansions (Taylor, Padé, Chebyshev, etc.) out to an order computed based on floating point precision.
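A minimal sketch of the three pipeline elements named above, using only existing SymPy calls (`cse`, `subs`, `series`). The helper `taylor_terms`, the constant name `B`, and the radius of 0.5 are assumptions made for the example, and the stopping rule is a simple heuristic rather than a rigorous error bound.

```python
import math
import sys
import sympy as sp

a, b, x = sp.symbols("a b x")

# Step 1: common subexpression elimination over several expressions.
exprs = [sp.sin(a*x + b)**2 + sp.cos(a*x + b),
         sp.exp(a*x + b) * sp.sin(a*x + b)]
replacements, reduced = sp.cse(exprs)

# Step 2: rule-based rewrite a*exp(b*x) -> a*2**(B*x), where the new
# constant B = b/log(2) would be precomputed once outside the hot loop.
B = sp.Symbol("B")
original = a * sp.exp(b*x)
rewritten = original.subs(sp.exp(b*x), 2**(B*x))
constants = {B: b / sp.log(2)}

# Step 3: replace exp(x) by a Taylor polynomial whose length is chosen
# from the floating point precision (hypothetical helper, heuristic rule).
def taylor_terms(radius, eps=sys.float_info.epsilon):
    """Smallest n such that radius**n / n! drops below eps."""
    n = 1
    while radius**n / math.factorial(n) >= eps:
        n += 1
    return n

order = taylor_terms(0.5)
approx = sp.series(sp.exp(x), x, 0, order).removeO()

print(replacements, reduced)
print(rewritten, constants)
print(order, approx)
```

In a customizable pipeline each step would presumably be a separate pass that users can reorder, disable, or replace.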

That said, we aren't the only ones thinking in this area. The chemora (http://arxiv.org/pdf/1410.1764.pdf, h/t Matt Turk) code does something like the vision above but using Mathematica, for HPC applications only, and with an astrophysical bent.

I think a tool like this is important because it allows the exploration of more scientific models more quickly and with a higher degree of verification. The current workflow for most scientific modeling is to come up with a mathematical representation of the problem, have a human translate that into a programming language of choice, possibly test this translation, and then execute the model. This compiler aims to get rid of the time-constrained human in those middle steps. It won't tell you if the model is right or not, but you'll sure be able to pump out a whole lot more models :).

--

Asst. Prof. Anthony Scopatz
Nuclear Engineering Program
Mechanical Engineering Dept.
University of South Carolina
sco...@cec.sc.edu
Office: (803) 777-7629
Cell: (512) 827-8239

Aaron Meurer

Oct 30, 2015, 5:33:19 PM

to sy...@googlegroups.com

I would also love to hear from those of you who are using SymPy to do
code generation or would like to use SymPy to do code generation, what
is your wishlist for SymPy? What do you wish it could do that it can't
do or what do you wish it could do better?

Aaron Meurer

Denis Akhiyarov

Nov 2, 2015, 12:05:35 AM

to sympy

FEniCS includes some sort of Python-to-C++ compiler for PDE and FE problems.

Pyomo can convert optimization problems written in Python to AML modeling language.

Fast matrix exponentiation:

https://github.com/borzunov/cpmoptimize

Tim Lahey

Nov 2, 2015, 12:24:33 AM

to SymPy

I’d really like something along the lines of Allan Wittkopf’s work on code generation for ODEs in Maple. The paper is,

http://www.jnaiam.org/new/uploads/files/16985fffb53018456cf3506db1c5e42b.pdf

It automatically generates code, compiles it and uses it inside Maple. There’s a Maple implementation up at,

http://www.cecm.sfu.ca/~wittkopf/ToExternal.html

and there’s also,

http://www.cecm.sfu.ca/~wittkopf/dna.html

Basically, it’s for numerical solution of ODEs. The code compiled is just the evaluation of the ODEs which is then used in a normal ODE solver, but when the solver evaluates the ODEs, it’s using the compiled code.
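The same division of labour can be sketched in SymPy terms: the right-hand side is generated once from symbolic expressions, and a generic solver only ever calls the generated function. In this sketch `lambdify` stands in for a compiled routine, the fixed-step RK4 loop stands in for "a normal ODE solver", and the harmonic oscillator and the name `rk4` are assumptions chosen for illustration.

```python
import math
import sympy as sp

t, y0, y1, k = sp.symbols("t y0 y1 k")

# Harmonic oscillator as a first-order system: y0' = y1, y1' = -k*y0.
rhs_exprs = [y1, -k * y0]

# Generated right-hand side; a real pipeline would emit C or Fortran here.
rhs = sp.lambdify((t, y0, y1, k), rhs_exprs, modules="math")

def rk4(f, t0, y, t_end, steps, *params):
    """Classical fixed-step RK4 that only sees the callable f."""
    h = (t_end - t0) / steps
    tn = t0
    for _ in range(steps):
        k1 = f(tn, *y, *params)
        k2 = f(tn + h/2, *[yi + h/2*ki for yi, ki in zip(y, k1)], *params)
        k3 = f(tn + h/2, *[yi + h/2*ki for yi, ki in zip(y, k2)], *params)
        k4 = f(tn + h, *[yi + h*ki for yi, ki in zip(y, k3)], *params)
        y = [yi + h/6*(p + 2*q + 2*r + s)
             for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]
        tn += h
    return y

# Integrate from t = 0 to t = pi with k = 1, starting at (1, 0).
y_end = rk4(rhs, 0.0, [1.0, 0.0], math.pi, 1000, 1.0)
print(y_end)
```

Because the solver depends only on the call signature of `rhs`, swapping the lambdified function for a compiled one would not change the solver code.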

I believe this is now an option for the Maple numerical ODE solver. I first used this back in Maple 8 and 9 when it wasn’t. It’s very useful for physics problems.

Cheers,

Tim.

Björn Dahlgren

Nov 2, 2015, 3:13:15 PM

to sympy

On Friday, 30 October 2015 22:33:19 UTC+1, Aaron Meurer wrote:

> I would also love to hear from those of you who are using SymPy to do
> code generation or would like to use SymPy to do code generation, what
> is your wishlist for SymPy? What do you wish it could do that it can't
> do or what do you wish it could do better?

I took the liberty of answering some questions inline in the wiki.

The work that I've done with code generation using SymPy was motivated by either:

Solving systems of nonlinear equations numerically
Integrating (nonlinear) systems of ordinary differential equations

Since a picture is worth a thousand words I tried to summarize my efforts here (http://hera.physchem.kth.se/~bjorn/overview.png):

The repos are at https://github.com/bjodah

I've tried to summarize some of my more general experience here:

Using templates is almost a must, and preferably a powerful templating engine (e.g. Mako) rather than jinja2 or the like.
That said, I think the most common idioms that keep recurring should definitely be collected in a "template" library, along with convenience functions to render them from SymPy expressions (e.g. code that populates a Jacobian matrix, possibly banded).
Code generation can quite easily impede your speed of development.
Example: if you generate a Cython file, then for every change both the Cython file and the resulting C file have to be recompiled.
I tried to get distutils to only recompile what had changed and cache object files, but I couldn't get it to work, so I wrote pycompilation to do that instead.
Another solution would be to JIT-compile the code, but then it is much harder to adapt your code for various libraries used in HPC environments, for example.
Even if one uses code generation, it is nice to be able to use `lambdify`, so if the API can be made agnostic of that choice it's a great bonus.
See also our recent effort in implementing a faster lambdify in symengine: https://github.com/symengine/symengine.py/pull/11
(I believe we can do better still, perhaps using LLVM JITing for medium-sized expressions)
`Indexed` objects in SymPy have been quite useful for representing discretized data points, but they could definitely be improved further.
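To make the template idea concrete, here is a small sketch that renders C code populating a dense Jacobian from SymPy expressions. It uses the standard library's `string.Template` purely as a stand-in for a real engine such as Mako; the reaction chain, the C function name `jac`, and the argument layout are all assumptions for the example.

```python
from string import Template
import sympy as sp

# Illustrative system: first-order chain y0 -> y1 -> y2.
y = sp.symbols("y0:3")
k = sp.symbols("k1 k2")
f = sp.Matrix([-k[0]*y[0], k[0]*y[0] - k[1]*y[1], k[1]*y[1]])
J = f.jacobian(sp.Matrix(y))

# Hypothetical C signature; `out` is assumed to be zero-initialized.
TEMPLATE = Template("""\
void jac(const double *y, const double *k, double *out) {
$unpack
$body
}
""")

# Unpack the input arrays into the names used by the printed expressions.
unpack = [f"    const double {s} = y[{i}];" for i, s in enumerate(y)]
unpack += [f"    const double {s} = k[{i}];" for i, s in enumerate(k)]

# Emit one assignment per structurally non-zero entry (row-major layout).
body = []
n = J.cols
for i in range(J.rows):
    for j in range(J.cols):
        if J[i, j] != 0:
            body.append(f"    out[{i*n + j}] = {sp.ccode(J[i, j])};")

c_source = TEMPLATE.substitute(unpack="\n".join(unpack), body="\n".join(body))
print(c_source)
```

A banded variant would change only the index arithmetic in the loop, which is the kind of recurring idiom a shared template library could capture.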

I was quite surprised to hear that the compiler's CSE (common subexpression elimination) was so slow in one of Anthony Scopatz's projects; that is a nice motivation to do this in SymPy already.