Contributing to SymPy Paper


Ondřej Čertík

Apr 19, 2016, 6:53:13 PM
to sympy
Hi,

I would like to invite anybody to contribute to our paper about SymPy
and become an author. We use the authorship criteria that are written
in our README:

https://github.com/sympy/sympy-paper/blob/2a93d84a6f3447f8e15e24f02cedb6c27c299abd/README.md#authorship-criteria

In other words, to satisfy 1), you must contribute to SymPy in some
way (e.g. a good patch that is more than, say, fixing a typo in the
documentation); to satisfy 2), get involved with the development of
the sympy-paper repository (https://github.com/sympy/sympy-paper):
submit a patch there, write a section, or just review PRs. Finally,
you must also be willing to satisfy 3) and 4). Hopefully this is all
pretty clear, but if you have any questions about authorship,
please let me or Aaron know.

Once this paper is accepted, we will probably put it into SymPy's
README for people to cite, so I encourage everyone to get involved.

Ondrej

yueming liu

Apr 20, 2016, 2:29:29 PM
to sympy
I am not sure if it is too late to contribute to SymPy and the paper. I've been developing a private project called sympy-llvm, which uses just-in-time (JIT) compilation to compile SymPy expressions to native machine code in order to speed up their numerical evaluation. This is similar to existing approaches in SymPy such as subs/evalf, lambdify, ufuncify, and Theano (see http://docs.sympy.org/latest/modules/numeric-computation.html). The advantage of sympy-llvm is that it is faster than all of the existing methods in both compilation time and numerical evaluation time. Another advantage is that no Fortran or C/C++ source code generation is involved: runnable machine code is generated in memory using LLVM. (The attached figures show some comparison benchmarks; SMC_py stands for the sympy-llvm implementation.) Example applications are implemented, such as the numerical computation for models in PyDy. I'd like to make sympy-llvm public and integrate it into SymPy as an optional component for numerical computation, if possible.

As you all may know, projects like Google TensorFlow and Theano both use a symbolic-numeric approach to provide a human-friendly language interface and fast numerical computation. Of the CAS projects developed a couple of decades ago, such as Maple, Maxima, Mathematica, and Reduce (https://en.wikipedia.org/wiki/List_of_computer_algebra_systems), none has an 'in-memory' JIT compilation facility to bridge the gap between symbolic and numeric computation. I believe sympy-llvm as a component of SymPy would be a great enhancement to SymPy in the numerical computation field.

-Yueming Liu
benchmark-data-taylor-TCC.png
benchmark-data-sqrt-TCC.png

Jason Moore

Apr 20, 2016, 4:43:11 PM
to sy...@googlegroups.com
This sounds great. Note that we have recently merged some code that uses llvm to automatically JIT sympy expressions. Check out the master branch and search for the relevant pull requests. Maybe there is some overlap with your project.


Jason Moore

Apr 20, 2016, 4:45:37 PM
to sy...@googlegroups.com
Can you explain the graphs you attached? What do all the labels mean? Why would lambdify be faster than ufuncify?

Ondřej Čertík

Apr 20, 2016, 5:44:41 PM
to sympy
Hi Yueming,
That sounds very interesting. Jason and I were in fact discussing how
to speed up the compilation phase when the expression is large, since it
can take a long time (like a day to compile a single C file). Using
LLVM was one of the ideas.

If you are interested in contributing this to SymPy, you can share the
code somewhere and we can have a look at how best to integrate it.

Ondrej

yueming liu

Apr 20, 2016, 6:20:44 PM
to sympy
I've just made sympy-llvm public (https://github.com/yuemingl/sympy-llvm). The core is the file jit_compile.py. I think it would be simple to integrate into SymPy, subject to some modifications. There is still some work to be done; for example, support for the basic math functions is not complete, but it is trivial to add them. The JIT part is based on LLVMPY (http://www.llvmpy.org/); I can migrate it to llvmlite, since llvmpy is deprecated. I also looked at the code that uses llvmlite in SymPy. There is some overlap in functionality, but my work provides more functions and is well suited to numerical computations such as vectorized functions (compile one expression or a list of expressions, which can then be evaluated N times by passing array parameters in a single call of the compiled function).
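The vectorized calling convention described above can be sketched in plain Python. This is only a stand-in: Python's built-in compile() plays the role of the LLVM backend, and the jit_compile name and signature are illustrative assumptions, not the actual sympy-llvm API.

```python
def jit_compile(params, exprs):
    """Compile expression strings over the named parameters.

    Returns f such that f(col0, col1, ...) evaluates every expression
    once per "row", where each coli is an array of N parameter values,
    so one call performs N evaluations.
    """
    codes = [compile(e, "<expr>", "eval") for e in exprs]

    def f(*columns):
        out = []
        for row in zip(*columns):  # N evaluations in a single call
            env = dict(zip(params, row))
            out.append([eval(c, {}, env) for c in codes])
        return out

    return f

f = jit_compile(["x", "y"], ["x*y", "x + y"])
print(f([1.0, 2.0], [3.0, 4.0]))  # [[3.0, 4.0], [8.0, 6.0]]
```

The point of the interface is that the per-call overhead (argument marshalling, function dispatch) is paid once for N evaluations rather than N times.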

yueming liu

Apr 20, 2016, 6:30:53 PM
to sympy
The explanation of most of the labels can be found in the paper.

The SMC_xxx entries in the figures are similar implementations written in Python (SymPy+LLVMPY+LLVM) and C++ (GiNaC+LLVM). C++_O3 is the pure C++ implementation of the benchmark.

yueming liu

Apr 21, 2016, 1:22:13 PM
to sympy
I looked at the PRs yesterday; there is some overlap in the basic functions. However, I have a longer roadmap for the JIT compilation. Let me briefly list it out, so you can all consider whether it is worthwhile and suitable to put in SymPy.
The JIT compilation should provide class or function interfaces that wrap the details of llvmlite, including the following functions:

1. Compile a single SymPy expression; a number is returned as the result

2. Compile a list of SymPy expressions; an array is returned as the result (batch compilation)

3. Compile a single SymPy expression; an array is returned when an array is passed into the compiled function (vectorization)
  - This is very useful, since the overhead of the function call is reduced when you want to evaluate the compiled expression many times

Advanced features:
4. An extension of 3: provide the Cartesian product of the parameters, like the Cartesian product (cross join) in SQL
       expr = x*y
       f = jit_compile([x,y], expr)
       f([2,3], [4,5,6]) # returns [2*4, 2*5, 2*6, 3*4, 3*5, 3*6]

5. An extension of 4: vector symbols can be used instead of symbols for double numbers
  - This can be used in algorithms that involve a double loop over two lists of vectors

Even longer roadmap:
6. The compiled IR can be sent to a remote server, and the evaluation can be performed on the server.
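The Cartesian-product semantics of item 4 can be sketched in plain Python. This is a stand-in, not the real sympy-llvm or SymPy API: the expression is a string rather than a SymPy object, and itertools.product plays the role of the JIT-generated double loop.

```python
from itertools import product

def jit_compile(symbols, expr):
    # Stand-in "compiler": Python bytecode instead of LLVM machine code.
    code = compile(expr, "<expr>", "eval")

    def f(*value_lists):
        # One evaluation per element of the Cartesian product of the
        # inputs, mirroring a cross join in SQL.
        return [eval(code, {}, dict(zip(symbols, combo)))
                for combo in product(*value_lists)]

    return f

f = jit_compile(["x", "y"], "x*y")
print(f([2, 3], [4, 5, 6]))  # [8, 10, 12, 12, 15, 18]
```

The output order matches the pseudocode in item 4: x varies slowest, the last parameter fastest.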

All of the above has been implemented in another project of mine, SymJava, in the experimental branch dist-snc. Actually, SymJava was motivated by SymPy, so I believe it would be easy to implement the proposed features in SymPy.

-Yueming Liu

Aaron Meurer

Apr 21, 2016, 1:47:13 PM
to sy...@googlegroups.com, Mark Dewing
I'm CCing Mark Dewing, who wrote the SymPy LLVM PR.

Aaron Meurer

Mark Dewing

Apr 21, 2016, 5:26:38 PM
to sympy
I think these are good directions. I also think they should be driven by strong examples (the PyDy example in your repo is good, and I see the paper has some others), because that will affect the API.

For vectorized evaluation, array data can be stored in different ways, and this affects the interface and compilation.
The length/stride information can either be:
 1. not stored with the array (bare C arrays): the length then either needs to be known at compile time or passed as a parameter to the function.
 2. stored in a structure with the data, like numpy arrays. If the compiled code is called from Python, it is very likely using a numpy array for storage. It would then be useful for the compiled code to operate on numpy arrays directly (see an example here, where it can sum over a 1-D numpy array: https://github.com/markdewing/sympy/blob/llvmlite/sympy/printing/llvmjitcode.py )
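The two conventions can be illustrated with the stdlib array module standing in for numpy (an assumption: the same pointer-plus-length distinction applies, since numpy likewise stores shape/stride information alongside the data).

```python
from array import array

data = array("d", [1.0, 2.0, 3.0])

# Option 1: bare C array. The compiled function sees only a raw pointer,
# so the element count must either be fixed at compile time or passed
# explicitly alongside the pointer.
address, length = data.buffer_info()
call_args = (address, length)  # pointer + explicit length parameter

# Option 2: a container that carries its own length (as numpy arrays
# carry shape and strides). The compiled code can query the object itself
# rather than relying on an extra parameter.
assert len(data) == length == 3
```

Option 2 changes the compiled function's signature and its code: it must know the container's layout to find the data pointer and the length.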

In principle, the autowrap features should provide functionality equivalent to using an LLVM JIT, so the APIs should be made similar; I haven't gotten around to this yet.
(I've been focused on compiling callback routines, which requires precisely specifying the input argument types and their order; I'm not sure whether autowrap can do this or not.)

As for the items in your list: item 1 is currently implemented (though it may not be able to convert every SymPy element; Piecewise is missing). Item 2 should be handled by PR 10683, which was just merged. Item 3 would be subject to the array-storage issues I mentioned earlier, but would be useful. I have some code for array evaluation, but in the context of callback routines for various integrators (Cuba, Cubature); I can put up that code if you want.

For item 4, do you have an example of how the Cartesian product would be used?

Item 6 seems very specialized, but if there are problems where the bottleneck is evaluating functions on large arrays, it could be worthwhile.

Mark

yueming liu

Apr 22, 2016, 1:57:35 PM
to sympy
Hi Mark,
   I agree that they should be driven by strong examples. Actually, all of the proposed interfaces are motivated by problems that I faced in real applications. (They originally came from solving partial differential equations using finite element methods: users express the PDEs symbolically, and the solvers then solve them automatically.) As the code developed, I found that more and more applications can be solved in this symbolic-numeric way; TensorFlow and Theano are good examples. For SymPy, I think solving general math problems, and algorithms in machine learning and scientific computing, in this symbolic-numeric way could be one of the long-term goals. Item 4, as you mentioned, was originally designed to implement the K-means algorithm: the Cartesian product can be used to compute the distances between all centers and vectors in each iteration.
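As a concrete sketch of the K-means use described above (hypothetical data, and plain Python in place of a compiled expression), the distance expression is evaluated over the Cartesian product of centers and points:

```python
from itertools import product

def sq_dist(c, p):
    # Squared Euclidean distance; in sympy-llvm this would be a compiled
    # symbolic expression rather than a Python function.
    return sum((ci - pi) ** 2 for ci, pi in zip(c, p))

centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 0.0), (9.0, 10.0)]

# One distance per (center, point) pair: the Cartesian product replaces
# the explicit double loop of a K-means assignment step.
dists = [sq_dist(c, p) for c, p in product(centers, points)]
print(dists)  # [1.0, 181.0, 181.0, 1.0]
```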
   About the vectorization: my implementation is based on the first approach you mentioned (bare C arrays), with the length determined at compile time. Support for numpy arrays is a really good idea, and I think it should be an important feature.
   I am not clear about the autowrap feature you mentioned. Do you mean wrapping the API in the LLVM C interface? The LLVMPY project is different from llvmlite: LLVMPY wraps almost all of the functions in the LLVM C/C++ interface, and my implementation is based on LLVMPY. I looked at llvmlite yesterday; it provides quite a different way to JIT. Basically, it does not wrap the LLVM C/C++ interfaces; instead, it provides functions to generate the Intermediate Representation (IR) in string form, and the LLVM compiler is then called to generate the machine code. I suspect this may be slower than the LLVMPY approach, since LLVMPY has no parsing phase. However, LLVM itself is a fast-moving project whose interfaces keep changing; it is hard to maintain a corresponding Python wrapper, and that is also why llvmlite is designed the way it is.
   By the way, I'd like to see your code for array evaluation in the callback style.

-Yueming Liu

Mark Dewing

Apr 23, 2016, 1:12:58 AM
to sympy
The autowrap feature I referred to is in sympy/utilities/autowrap.py, which contains ufuncify and a few other functions.

(I should clean it up and submit it.) If you want to make it work on a numpy array (like ufuncify), that would be useful.

Mark