What is out there for SymPy code generation / optimizing compiler effort?

813 views
Skip to first unread message

Anthony Scopatz

unread,
Oct 30, 2015, 5:24:06 PM10/30/15
to sy...@googlegroups.com
Hello All, 

As many of you probably know, earlier this month Aaron joined my research group at the University of South Carolina. He'll be working on adding / improving SymPy's capabilities with respect to being an optimizing compiler. 

There are more details about this vision below, but right now we are in the process of doing a literature review of sorts, and trying to figure out what (SymPy-specific) is out there. What has been done already. Aaron et al, have started putting together a page on the wiki that compiles some of this information. We'd really appreciate it if you know of anything that is not on this page if you could let us know.

We also would be grateful if you could let us know (publicly or privately) about any use cases that you might have for a symbolic optimizing compiler. There are many examples where different folks have done various pieces of this (chemreac, dengo, pydy, some stuff in pyne), but these examples tend to be domain specific. This effort is supposed to target a general scientific computing audience, and to do that we want to have as many possible scenarios in mind at the outset.  

 And of course, we'd love it if other folks dived in and helped us put this thing together :).

Thanks a million!
Be Well
Anthony

Vision
------------
Essentially, what we want to build is an optimizing compiler for symbolic mathematical expressions in order to solve simple equations, ODEs, PDEs, and perhaps more. This compiler should be able to produce very fast code, though the compiler itself may be expensive.

Ultimately, it is easy to imagine a number of backend targets, such as C, Fortran, LLVM IR, Cython, pure Python, etc. It is also easy to imagine a couple of meaningful frontends - SymPy objects (for starters) and LaTeX (which could then be parsed into SymPy). 

We are aiming to have an optimization pipeline that is highly customizable (but with sensible defaults). This would allow folks to tailor the result to their problem or add their own problem-specific optimizations. There are likely different levels to this (such as on an expression vs at full function scope). Some initial elements of this pipeline might include CSE, simple rule-based rewriting (like a/b/c -> a/(b*c) or a*exp(b*x) -> A*2^(B*x)), and replacing non-analytic sub-expressions with approximate expansions (taylor, pade, chebychev, etc) out to an order computed based on floating point precision. 

That said, we aren't the only ones thinking in this area. The chemora (http://arxiv.org/pdf/1410.1764.pdf, h/t Matt Turk) code does something like the vision above but using Mathematica, for HPC applications only, and with an astrophysical bent. 

I think a tool like this is important because it allows the exploration of more scientific models more quickly and with a higher degree of verification. The current workflow for most scientific modeling is to come up with a mathematical representation of the problem, a human then translates that into a programming language of choice, they may or may not test this translation, and then execution of that model. This compiler aims to get rid of the time-constrained human in those middle steps.  It won't tell you if the model is right or not, but you'll sure be able to pump out a whole lot more models :).


--

Asst. Prof. Anthony Scopatz
Nuclear Engineering Program
Mechanical Engineering Dept.
University of South Carolina
sco...@cec.sc.edu
Office: (803) 777-7629
Cell: (512) 827-8239
Check my calendar

Aaron Meurer

unread,
Oct 30, 2015, 5:33:19 PM10/30/15
to sy...@googlegroups.com
I would also love to hear from those of you who are using SymPy to do
code generation or would like to use SymPy to do code generation, what
is your wishlist for SymPy? What do you wish it could do that it can't
do or what do you wish it could do better?

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+un...@googlegroups.com.
> To post to this group, send email to sy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/CAPk-6T453AxDYt1UCmBj_7vrzr_HikC2U03UP%2Bzz5_RtDA9NDA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Denis Akhiyarov

unread,
Nov 2, 2015, 12:05:35 AM11/2/15
to sympy
FENICS includes some sort of Python to c++ compiler for PDE and FE problems.

Pyomo can convert optimization problems written in Python to AML modeling language.

Fast matrix exponentiation:

https://github.com/borzunov/cpmoptimize

Tim Lahey

unread,
Nov 2, 2015, 12:24:33 AM11/2/15
to SymPy
I’d really like something along the lines of Allan Wittkopf’s work on code generation for ODEs in Maple. The paper is,

http://www.jnaiam.org/new/uploads/files/16985fffb53018456cf3506db1c5e42b.pdf

It automatically generates code, compiles it and uses it inside Maple. There’s a Maple implementation up at,

http://www.cecm.sfu.ca/~wittkopf/ToExternal.html

and there’s also,

http://www.cecm.sfu.ca/~wittkopf/dna.html

Basically, it’s for numerical solution of ODEs. The code compiled is just the evaluation of the ODEs which is then used in a normal ODE solver, but when the solver evaluates the ODEs, it’s using the compiled code.

I believe this is now a option for the Maple numerical ODE solver. I first used this back in Maple 8 and 9 when it wasn’t. It’s very useful for physics problems.

Cheers,

Tim.

Björn Dahlgren

unread,
Nov 2, 2015, 3:13:15 PM11/2/15
to sympy


On Friday, 30 October 2015 22:33:19 UTC+1, Aaron Meurer wrote:
I would also love to hear from those of you who are using SymPy to do
code generation or would like to use SymPy to do code generation, what
is your wishlist for SymPy? What do you wish it could do that it can't
do or what do you wish it could do better?


I took the liberty to answer some questions inline in the wiki.

The work that I've done with codegeneration using SymPy was motivated by either:
  • Solving systems of nonlinear equations numerically
  • Integrating (nonlinear) systems of ordinary differential equations

Since a picture is worth a thousand words I tried to summarize my efforts here (http://hera.physchem.kth.se/~bjorn/overview.png):


The repos are at https://github/bjodah


I've tried to summarize some of my more general experince here:

  • Using templates is almost a must, and then rather a powerful templating engine (e.g. Mako) rather than jinja2 or the like.

    That said, I think the most common idioms that keep reoccurring should definitely be collected in a "template" library along with convenience functions to render those

    from SymPy expressions (e.g. code that populates a jacobian matrix - possibly banded).

  • Code generation can quite easily impede your speed of development.
    Example: you generate a Cython file, for every change both the Cython file and the resulting C file have to be recompiled.
    I tried to get distutils to only recompile what had changed and cache object files, I couldn't get it to work so I wrote pycompilation to do that instead.
    Another solution would be to JIT code but then it is much harder to adapt your code for various libraries used in HPC environments for example.
  • Even if one use code-generation it is nice to be able to use `lambdify` so if the API can be made agnostic of that choice it's a great bonus.
    See also our recent effort in implementing a faster lambdify in symengine: https://github.com/symengine/symengine.py/pull/11
    (I believe we can do better still, perhaps using LLVM jiting for medium sized expressions)
  • `Indexed` objects in sympy have been quite useful for representing discretized data points, but it could definitely be improved further.

I was quite surprised to hear that compiler CSE elimination was so slow in one of Anthony Scopatz's projects - that is a nice motivation to do this in SymPy already.



Aaron Meurer

unread,
Nov 2, 2015, 3:46:56 PM11/2/15
to sy...@googlegroups.com
On Mon, Nov 2, 2015 at 2:13 PM, Björn Dahlgren <bjo...@gmail.com> wrote:


On Friday, 30 October 2015 22:33:19 UTC+1, Aaron Meurer wrote:
I would also love to hear from those of you who are using SymPy to do
code generation or would like to use SymPy to do code generation, what
is your wishlist for SymPy? What do you wish it could do that it can't
do or what do you wish it could do better?


I took the liberty to answer some questions inline in the wiki.

Thanks.
 

The work that I've done with codegeneration using SymPy was motivated by either:
  • Solving systems of nonlinear equations numerically
  • Integrating (nonlinear) systems of ordinary differential equations

Since a picture is worth a thousand words I tried to summarize my efforts here (http://hera.physchem.kth.se/~bjorn/overview.png):


The repos are at https://github/bjodah

So are the items in the top image that aren't in the bottom still supported? 

I didn't see symvarsub. It looks like it is very similar to autowrap. Any reason it shouldn't be in SymPy itself? 


I've tried to summarize some of my more general experince here:

  • Using templates is almost a must, and then rather a powerful templating engine (e.g. Mako) rather than jinja2 or the like.

    That said, I think the most common idioms that keep reoccurring should definitely be collected in a "template" library along with convenience functions to render those

    from SymPy expressions (e.g. code that populates a jacobian matrix - possibly banded).

  • Code generation can quite easily impede your speed of development.
    Example: you generate a Cython file, for every change both the Cython file and the resulting C file have to be recompiled.
    I tried to get distutils to only recompile what had changed and cache object files, I couldn't get it to work so I wrote pycompilation to do that instead.
    Another solution would be to JIT code but then it is much harder to adapt your code for various libraries used in HPC environments for example.
  • Even if one use code-generation it is nice to be able to use `lambdify` so if the API can be made agnostic of that choice it's a great bonus.
    See also our recent effort in implementing a faster lambdify in symengine: https://github.com/symengine/symengine.py/pull/11
    (I believe we can do better still, perhaps using LLVM jiting for medium sized expressions)

In my view, lambdify *is* code generation.  lambdarepr generates Python/NumPy code from a SymPy expression, and lambdify "compiles" that code into a Python function (it also uses some namespace tricks on top of lambdarepr, which admittedly isn't the cleanest design). 

I don't know what lambdify means in terms of symengine. Is it the same thing? 

For LLVM, you can wrap your lambdified function with numba.jit or numba.vectorize. In my tests, that works quite well (in a very simple test that I did it ended up being just as fast as autowrap via f2py).

Aaron Meurer
  • `Indexed` objects in sympy have been quite useful for representing discretized data points, but it could definitely be improved further.

I was quite surprised to hear that compiler CSE elimination was so slow in one of Anthony Scopatz's projects - that is a nice motivation to do this in SymPy already.



--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To post to this group, send email to sy...@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy.

Björn Dahlgren

unread,
Nov 3, 2015, 4:13:44 AM11/3/15
to sympy


On Monday, 2 November 2015 21:46:56 UTC+1, Aaron Meurer wrote:


On Mon, Nov 2, 2015 at 2:13 PM, Björn Dahlgren <bjo...@gmail.com> wrote:


On Friday, 30 October 2015 22:33:19 UTC+1, Aaron Meurer wrote:
I would also love to hear from those of you who are using SymPy to do
code generation or would like to use SymPy to do code generation, what
is your wishlist for SymPy? What do you wish it could do that it can't
do or what do you wish it could do better?


I took the liberty to answer some questions inline in the wiki.

Thanks.
 

The work that I've done with codegeneration using SymPy was motivated by either:
  • Solving systems of nonlinear equations numerically
  • Integrating (nonlinear) systems of ordinary differential equations

Since a picture is worth a thousand words I tried to summarize my efforts here (http://hera.physchem.kth.se/~bjorn/overview.png):


The repos are at https://github/bjodah

So are the items in the top image that aren't in the bottom still supported? 

There should only have been one image, google-groups messed it up. The linked image (http://hera.physchem.kth.se/~bjorn/overview.png) is the correct one.
I'd like to support all which is there but I'm spread thin, and the earlier efforts should most probably be refactored (I've simply learnt things along the way).
 

I didn't see symvarsub. It looks like it is very similar to autowrap. Any reason it shouldn't be in SymPy itself? 

Well, since it depends on pycompilation and pycodeexport it would introduce dependencies unless they were incorporated as well.
And I suppose that incorporating pycompilation/pycodeexport would require testing on OS X and windows which I currently lack access to/cannot motivate spending the time myself.

 


I've tried to summarize some of my more general experince here:

  • Using templates is almost a must, and then rather a powerful templating engine (e.g. Mako) rather than jinja2 or the like.

    That said, I think the most common idioms that keep reoccurring should definitely be collected in a "template" library along with convenience functions to render those

    from SymPy expressions (e.g. code that populates a jacobian matrix - possibly banded).

  • Code generation can quite easily impede your speed of development.
    Example: you generate a Cython file, for every change both the Cython file and the resulting C file have to be recompiled.
    I tried to get distutils to only recompile what had changed and cache object files, I couldn't get it to work so I wrote pycompilation to do that instead.
    Another solution would be to JIT code but then it is much harder to adapt your code for various libraries used in HPC environments for example.
  • Even if one use code-generation it is nice to be able to use `lambdify` so if the API can be made agnostic of that choice it's a great bonus.
    See also our recent effort in implementing a faster lambdify in symengine: https://github.com/symengine/symengine.py/pull/11
    (I believe we can do better still, perhaps using LLVM jiting for medium sized expressions)

In my view, lambdify *is* code generation.  lambdarepr generates Python/NumPy code from a SymPy expression, and lambdify "compiles" that code into a Python function (it also uses some namespace tricks on top of lambdarepr, which admittedly isn't the cleanest design). 

Yes, in SymPy it definitely is. I am quite surprised how fast the lambdified expressions are (not construction though).


I don't know what lambdify means in terms of symengine. Is it the same thing? 

In symengine Isuru Fernando implemented a visitor using C++ lambda functions so there is no code generation there.
 

For LLVM, you can wrap your lambdified function with numba.jit or numba.vectorize. In my tests, that works quite well (in a very simple test that I did it ended up being just as fast as autowrap via f2py).

Yes, I am planning to do some real world benchmarks (for problems I've come across in my research) where I would compare sympy.lambdify, symengine.Lambdify and numba.jit(sympy.lambdify(...)), etc.
 

Nathan Woods

unread,
Nov 3, 2015, 10:46:39 AM11/3/15
to sympy
My use case for Sympy code generation is somewhat different from most, it seems. I'm primarily interested in numerical quadrature on a grid, which means a LOT of evaluations of similar functions. I'm only using lambdify currently, because the compile-time associated with codegen is worse than the Python call-back overhead. The real gotcha for me is that I need to do a complete Solve-Generate-Integrate loop many, many times, so the performance of the generation step matters a lot.

Nathan Woods

Björn Dahlgren

unread,
Nov 3, 2015, 11:56:43 AM11/3/15
to sympy


On Tuesday, 3 November 2015 16:46:39 UTC+1, Nathan Woods wrote:
My use case for Sympy code generation is somewhat different from most, it seems. I'm primarily interested in numerical quadrature on a grid, which means a LOT of evaluations of similar functions. I'm only using lambdify currently, because the compile-time associated with codegen is worse than the Python call-back overhead. The real gotcha for me is that I need to do a complete Solve-Generate-Integrate loop many, many times, so the performance of the generation step matters a lot.

Interesting. If you are only doing numerical (double precision) evaluations and are only using the lambdified expressions once, symengine should
be quite a bit faster. I wrote this shim which makes symengine.Lambdify work like sympy.lambdify:

https://gist.github.com/anonymous/7c6d7a5e80e1e5f6b39e

You can try to see if you see a speed up.
 

Denis Akhiyarov

unread,
Nov 3, 2015, 2:20:31 PM11/3/15
to sympy
my use case for sympy expressions is like this: I have some "arbitrary" correlations/metamodels as input  from text file to the sympy parser. These expressions are converted into functions using lambdify, and the backend is numpy only.

These definitions are then passed to scipy optimizer to perform non-linear regression on these equations with some "experimental" data to calculate the best estimate for tuning parameters.

This works pretty well for me currently and I'm interested in trying out the common correlations from zunzun library for best metamodels. Also ND chebyshev polynomials look promising.

Richard Brown

unread,
Nov 3, 2015, 9:41:52 PM11/3/15
to sympy
On Saturday, 31 October 2015 10:33:19 UTC+13, Aaron Meurer wrote:
I would also love to hear from those of you who are using SymPy to do
code generation or would like to use SymPy to do code generation, what
is your wishlist for SymPy? What do you wish it could do that it can't
do or what do you wish it could do better? 

I only saw this thread after my previous post. My research group uses MATLAB and C versions of the same ODE model to run on different platforms (MATLAB for testing a single component, C for running large numbers in parallel in an HPC environment). Because the models are pretty complex in terms of number of equations, variables and parameters, maintaining two versions of the code without making mistakes was a massive nightmare. I wrote a piece of software (not publicly available just yet, but will be soon) to allow the model to be specified once in ini files, and use Mako templates to create C and MATLAB versions of the model that can be interfaced to solvers. Sympy does the language translation of individual expressions.

At the moment I'm grappling with how to customise the names of symbols - e.g. if an 'x' is picked up from the ini file, how can I append a prefix to it when it is output from ccode or octave_code. See my post from today: https://groups.google.com/forum/#!topic/sympy/xCSQChSpcb8

cheers

Richard

Nathan Woods

unread,
Nov 10, 2015, 3:40:21 PM11/10/15
to sympy
Am I correct in noticing that symengine.py does not yet have Solve (or equivalent)? That's kind of a dealbreaker, unfortunately.

Ondřej Čertík

unread,
Nov 10, 2015, 5:33:23 PM11/10/15
to sympy
Hi Nate,

You can use sympy solve (i.e. by passing it symengine expressions it
will convert it to sympy expressions, and then you can convert back).
If you want to use a C++ solver due to speed reasons, then currently
we only have linear solvers implemented here:

https://github.com/symengine/symengine/blob/e5660d01b62e9f68e2e865dae83d8a453a5f2bdb/symengine/matrix.h#L58

You can use them from Python using the solve method:

https://github.com/symengine/symengine.py/blob/83c1a70cd2516a597a7a8a12ef0c0ce21ed825de/symengine/lib/symengine_wrapper.pyx#L1199

Let us know if you need faster non-linear solvers than the ones in
sympy, and which ones.

Ondrej
> https://groups.google.com/d/msgid/sympy/2b52bf0f-4396-4c7a-9dbc-372fc7666a54%40googlegroups.com.

Nathan Woods

unread,
Nov 10, 2015, 5:53:46 PM11/10/15
to sympy
I'm solving a number of different equations, but one example I've used (using sympy syntax) might be:

sympy.solve(x**2+y**2+z**2-1, x)

I'm definitely hoping for the speed improvements; that's the main reason not to simply use Sympy.

Matthew Rocklin

unread,
Nov 11, 2015, 3:40:54 PM11/11/15
to sy...@googlegroups.com
what is your wishlist for SymPy? 

ND-Arrays

Francesco Bonazzi

unread,
Nov 12, 2015, 4:06:20 AM11/12/15
to sympy


On Wednesday, 11 November 2015 21:40:54 UTC+1, Matthew wrote:
what is your wishlist for SymPy? 

ND-Arrays

nikolas

unread,
Dec 1, 2015, 1:30:45 PM12/1/15
to sympy
Hi all,
this is an exciting topic, I hope it's still alive!

A common use case for me is to automatically generate high dimensional symbolic ODEs (photonic circuit non-linear coupled mode equations).

One thing I have found is that lambdify does not easily allow for efficiently re-using common subexpressions. I have cooked up a fairly dirty (though surprisingly useful) little function that identifies common subexpressions and then uses nested sympy.lambdify calls (along with function closures) to precompute and then re-use common subexpressions. 
For my use cases it easily gives my a 2x or 3x speedup despite the additional function calling overhead.
I think Mathematica's Compile does similar things under the hood.

I think it would be most desirable if we could have methods to generate fast functions that operate in-place on numpy arrays (and perhaps even take an additional numpy "work" array to save on allocation inside the function).
Ideally, this would be based on c-function pointers that do not require the GIL and can also be passed to other compiled libraries.

Best,

Nik

Denis Akhiyarov

unread,
Dec 1, 2015, 1:43:09 PM12/1/15
to sympy
apparently someone tried using sympy as the backend for modelica compiler/parser:

Aaron Meurer

unread,
Dec 1, 2015, 3:17:41 PM12/1/15
to sy...@googlegroups.com
On Tue, Dec 1, 2015 at 12:30 PM, nikolas <nikola...@gmail.com> wrote:
> Hi all,
> this is an exciting topic, I hope it's still alive!

Definitely is. I'm actively working on this project.

>
> A common use case for me is to automatically generate high dimensional
> symbolic ODEs (photonic circuit non-linear coupled mode equations).
>
> One thing I have found is that lambdify does not easily allow for
> efficiently re-using common subexpressions. I have cooked up a fairly dirty
> (though surprisingly useful) little function that identifies common
> subexpressions and then uses nested sympy.lambdify calls (along with
> function closures) to precompute and then re-use common subexpressions.
> You can find it here: https://gist.github.com/ntezak/e1922acdd790e265963e
> For my use cases it easily gives my a 2x or 3x speedup despite the
> additional function calling overhead.
> I think Mathematica's Compile does similar things under the hood.

That's a good point. Lambdify does one thing quite well, but I think
most people are confused as to what exactly it does and what it should
and shouldn't do. That and it sort of sits apart from the rest of the
code generation in terms of architecture (and indeed it takes a bit of
insight to realize that lambdify is in fact just another type of code
generation + a "compilation" stage, and nothing more).

Instead of going for a functional approach to solve this, with lots of
lambda calculus and currying and whatnot (which in my opinion gets
confusing really fast), I think it would be better to extend lambdify
to print not lambda functions but just regular Python functions. Then
something like lambdify(x, sin(x) + cos(x) + (sin(x) + cos(x))**2,
cse=True) could produce

def _func(x):
_y = sin(x) + cos(x)
return _y + y**2

(by the way, it's generally a good idea to use cse() to take care of
common subexpressions, as it's going to be a little bit smarter than
your walk_exprs).

Something that I've been playing with a little bit is a CodeBlock
object, which would make it easier for the code printers to work with
and reason about blocks of code, rather than just single expressions
and assignments. That way you won't have to move up an abstraction
layer just to start thinking about common subexpression elimination.

>
> I think it would be most desirable if we could have methods to generate fast
> functions that operate in-place on numpy arrays (and perhaps even take an
> additional numpy "work" array to save on allocation inside the function).
> Ideally, this would be based on c-function pointers that do not require the
> GIL and can also be passed to other compiled libraries.

It's all a question of how to represent this in SymPy syntax. We have
Indexed and MatrixSlice, which are generally used to represent
in-place operations for code generation. I suppose to support what
you want we would need a more general NumPySlice object. Or would
Indexed satisfy your needs.

Once you have that, it's just a question of generating code for it.
SymPy already has a few ways of generating code to work on NumPy
arrays, such as autowrap, ufuncify, and lambdify (and you can use
numba.jit or numpy.vectorize on a lambdified function to make it
faster or even numba.jit(nogil=True) to make it run without the GIL).

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+un...@googlegroups.com.
> To post to this group, send email to sy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/b8663373-af4a-499e-bca9-e96a6433a6de%40googlegroups.com.

Liang

unread,
Feb 26, 2016, 7:59:56 PM2/26/16
to sympy
This is a great project.

Is there a plan to incorporate CSE in this tool? For my application, the equation is extremely long, typically containing several thousands of terms. The Fortran codes generated by codegen exceed the continuation line limit of Intel Fortran compiler and have to be manually edited. Many of the terms on the right-hand side can be combined together through CSE and the post-CSE formula can be easily fitted under the continuation line limit.

Jason Moore

unread,
Feb 27, 2016, 10:43:53 AM2/27/16
to sy...@googlegroups.com
Yes CSE is planned to be incorporated into the code generation. For now you will need to manually use CSE and then pass that info to the code printers and or code gen.

We did fix a line limit bug in the Fortran code: https://github.com/sympy/sympy/pull/7968 but it looks like it may only be for the call signature.

Aaron Meurer

unread,
Mar 1, 2016, 11:34:48 AM3/1/16
to sy...@googlegroups.com
Yes, CSE is an important optimization. Another useful optimization for
your case would be a function that automatically splits up larger
expressions (like splitting a large sum into many += assignments).

Aaron Meurer
> Visit this group at https://groups.google.com/group/sympy.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/9198f1ea-b0e8-4f54-b559-8d2e6cedae41%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages