First, some very good news: thanks to Kirill, the latest sympy-hg got
about 10% faster again in the last 3 days, so on the benchmark below
we are now only about 4x slower than sympycore (if you remember, it
used to be a couple hundred times slower half a year ago).
I was curious how fast it would be if we rewrote SymPy directly in C,
using the current design (Add/Mul/Pow classes). I hadn't done any
coding in C for a couple of years, so I had to relearn basically
everything, but I managed to put together a very quick draft in the
last two days; you can see for yourself here:
$ hg clone http://freehg.sympy.org/u/certik/csympy/
$ cd csympy
$ make
$ ./t.py
[...]
(1 + y + z + x)^10
# of terms: 1771
time spent doing e+e2: 0.0344660282135
time doing multinomial_coefficients: 0.0283851623535
total time: 0.0746591091156
The above "total time" is the result of this benchmark. In sympy:
In [1]: e = (x+y+z+1)**10
In [2]: %time f = (e**2).expand()+e.expand()
CPU times: user 0.47 s, sys: 0.00 s, total: 0.47 s
Wall time: 0.47 s
csympy:
In [4]: %time f = (e**2).expand()+e.expand()
CPU times: user 0.13 s, sys: 0.00 s, total: 0.13 s
Wall time: 0.13 s
sympycore:
In [1]: from sympycore import var
In [2]: var("x y z")
Out[2]: (Calculus('x'), Calculus('y'), Calculus('z'))
In [3]: e = (x+y+z+1)**10
In [4]: %time f = (e**2).expand()+e.expand()
CPU times: user 0.12 s, sys: 0.00 s, total: 0.12 s
Wall time: 0.12 s
sympycore with a compiled C extension:
In [4]: %time f = (e**2).expand()+e.expand()
CPU times: user 0.09 s, sys: 0.00 s, total: 0.09 s
Wall time: 0.09 s
ginac (the timing varies from 0.4s to 0.6s):
#include <ginac/ginac.h>
using namespace std;
using namespace GiNaC;
int main()
{
    symbol x("x"), y("y"), z("z");
    ex e = pow(x+y+z+1, 10);
    ex e2 = pow(e, 2);
    ex f = e2.expand()+e.expand();
    //cout << f << endl;
    return 0;
}
$ time ./test
real 0m0.052s
user 0m0.048s
sys 0m0.004s
One problem with the above timings: the numbers from ipython are much
(about 50%) higher than a measurement in a python script using
timeit.default_timer(), but I don't have time to dig into this now.
Still, you can see that on this particular benchmark sympy is less
than 4x slower than sympycore, csympy sits between sympycore and
sympycore+C, and ginac is still a little faster, but only negligibly
so.
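For reference, a minimal sketch of how one could take the timings with timeit.default_timer() instead of ipython's %time (the workload below is a cheap stand-in, not the actual expansion):

```python
from timeit import default_timer

def bench(f):
    """Run a callable once and return (result, elapsed seconds)."""
    t0 = default_timer()
    result = f()
    return result, default_timer() - t0

# stand-in workload; in practice f would be something like
# lambda: (e**2).expand() + e.expand()
result, elapsed = bench(lambda: sum(i * i for i in range(100000)))
print("total time:", elapsed)
```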
Technical remarks:
* csympy is written in C using structs and function pointers that
implement the OO style and the sympy design. Everything is in the
basic.c file; you will find add, mul, pow, symbol and integer structs
there, with functions (methods) operating on them. Just like sympy,
only in C. Look here for a simple example of how it works:
http://wiki.sympy.org/wiki/Playing_with_SymPy_design_in_C
The important point is that you don't need a different class scheme,
as in sympycore, to achieve competitive speeds.
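To illustrate the struct-plus-function-pointer pattern without quoting the long C file: it is essentially a hand-rolled vtable. A rough Python sketch of the idea (all names here are made up for illustration, not the actual csympy identifiers):

```python
# each "struct" is a dict; its "vtable" entry holds function pointers
def symbol_hash(self):
    return hash(("Symbol", self["name"]))

def symbol_eq(self, other):
    return other["type"] == "Symbol" and self["name"] == other["name"]

symbol_vtable = {"hash": symbol_hash, "eq": symbol_eq}

def Symbol(name):
    return {"type": "Symbol", "name": name, "vtable": symbol_vtable}

def basic_eq(a, b):
    # dynamic dispatch: look the method up through a's function-pointer table
    return a["vtable"]["eq"](a, b)

x1, x2, y = Symbol("x"), Symbol("x"), Symbol("y")
print(basic_eq(x1, x2), basic_eq(x1, y))  # True False
```

In C the vtable is a struct of function pointers shared by all instances of a type, which gives method dispatch without any class machinery.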
* it is exported to Python using Cython, so you can use it just like sympy
* for the above expansion, it uses multinomial_coefficients from
sympy, which is written in pure python (the C code calls it through
Cython). Implementing this function in C as well could yield a little
more speedup
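For the curious, the function computes all the coefficients of (x1+...+xm)**n at once. A pure-Python sketch of what such a multinomial_coefficients(m, n) does (this is an illustration via stars-and-bars, not sympy's actual, faster implementation):

```python
from itertools import combinations
from math import factorial

def multinomial_coefficients(m, n):
    """Return {(k1, ..., km): n!/(k1!*...*km!)} over all k1+...+km == n."""
    coeffs = {}
    # stars and bars: each choice of m-1 bar positions among n+m-1 slots
    # determines one exponent tuple (k1, ..., km)
    for bars in combinations(range(n + m - 1), m - 1):
        ks, prev = [], -1
        for b in bars + (n + m - 1,):
            ks.append(b - prev - 1)
            prev = b
        coef = factorial(n)
        for k in ks:
            coef //= factorial(k)
        coeffs[tuple(ks)] = coef
    return coeffs

print(len(multinomial_coefficients(4, 10)))  # 286 monomials in (x+y+z+1)**10
print(len(multinomial_coefficients(4, 20)))  # 1771, matching the term count above
```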
* csympy uses a hash table from glib (GHashTable) in the Add and Mul
classes. The table seems to preserve insertion order, so no sorting is
necessary. We may look at how it is implemented internally and perhaps
exploit it even further for our needs
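What the hash table buys in Add can be sketched with a plain Python dict: map each term to its accumulated coefficient, so like terms merge in O(1) each and no sorting pass is needed (a toy model with terms as strings, not the csympy data layout):

```python
def add_terms(*terms):
    """Combine (coefficient, term) pairs; like terms merge via the dict."""
    coeffs = {}
    for coef, term in terms:
        coeffs[term] = coeffs.get(term, 0) + coef
    # drop terms whose coefficients cancelled to zero
    return {t: c for t, c in coeffs.items() if c != 0}

print(add_terms((1, "x"), (2, "x"), (3, "y"), (-3, "y")))  # {'x': 3}
```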
* I haven't done any real profiling of the code, so I suspect it could
be made even faster
* it uses reference counting, but it is currently a little buggy, so
freeing of objects is disabled to prevent segfaults. You can easily
enable it (just uncomment the free(self) line in the
basic__dealloc__ method); the "t.py" script runs with no noticeable
slowdown with it enabled. The reference counting needs to be fixed.
* currently only C ints are used, which means that for the above
benchmark, the ints wrap around to negative values. I haven't
implemented Python ints in csympy yet.
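The overflow is easy to see: the largest coefficient of (x+y+z+1)**20 (the e**2 part) is 20!/(5!)**4, which already exceeds a signed 32-bit int. A quick check, emulating two's-complement wraparound in Python (assuming a 32-bit C int):

```python
from math import factorial

def wrap_int32(n):
    """Emulate C signed 32-bit overflow (two's complement)."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

coef = factorial(20) // factorial(5)**4  # largest coefficient in (x+y+z+1)**20
print(coef)              # 11732745024, well above 2**31 - 1
print(wrap_int32(coef))  # wraps around to a negative value
```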
What to do next:
* sympy will remain pure python by default
* we'll continue speeding things up using our current approach, plus
code some parts in Cython (plus maybe C); those will be optional
modules for people who need better speed and don't mind compiling
stuff.
* I may play with csympy a little more to fix the reference counting
and then figure out how to hook sympy (python) objects into it. I
would like to get as fast as ginac with this design. Actually faster.
:) Basically all that is needed are __hash__ and __eq__ methods; then
it should work fine. One could then easily extend the basic core,
which is in C, using pure Python classes. However, this approach is
only a playground to see how fast one can get with this design,
because this kind of basic manipulation is not very useful without the
assumptions and many other simplifications that we do in SymPy.
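The protocol really is that small. A sketch of what a pure-Python class would need to supply to be usable as a key in the core's hash tables (the class name here is made up for illustration):

```python
class PySymbol:
    """A Python-level object needs only consistent __eq__ and __hash__
    to live in hash-table-based Add/Mul containers."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, PySymbol) and self.name == other.name

    def __hash__(self):
        # equal objects must hash equally, so hash the same data __eq__ compares
        return hash(("PySymbol", self.name))

print(PySymbol("x") == PySymbol("x"))       # True
print(len({PySymbol("x"), PySymbol("y")}))  # 2
```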
* using our experience with csympy, we'll try to speed up our
Add/Mul/Pow classes and also our Poly class (which should be
optionally coded in Cython).
Anyway, I have had this design in C in my head for a long time, so I
am glad I managed to put it into code, and I am surprised that I got
almost as fast as ginac in just 2 days. I won't have time to work on
it much in the near future, but if anyone is interested, let me know
what you think.
Ondrej
Just to clarify what this "10% faster" means (e = (x+y+z+1)**32):

          %time integrate(x**3*sin(x), x)   %time q = e.expand()
          cache off    cache on             cache off    cache on
old:      7.55 s       4.02 s               18.8 s       14.08 s
new:      5.74 s       3.77 s               6.07 s       5.73 s
speedup:  31%          6.6%                 3.1x         2.5x
old = a94579fb550a
new = 8a31d8eebe32
I think we'll converge on Cython, which is the way to go.
--
All the best, Kirill.
Exactly. Thanks for the clarification.
>> Anyway, I had this design in C in my head for a long time, so I am
>> glad I managed to put this into code and I am surprised that I got
>> almost as fast as ginac in just 2 days. I won't have time to work on
>> it much in the nearest time frame, but if anyone is interested, let me
>> know what you think.
>
> I think we'll converge to Cython which is the way to go.
Yes. I tried to fix the reference counting (which I need to do by hand
in C) and it was a nightmare. I gave up; this is not a maintainable
solution. So I wrote a very simple sympy core in python here:
http://freehg.sympy.org/u/certik/csympy2/
and I ported the nice dictionary from glib using Cython. The plan now
is to make it calculate the expansion benchmark above correctly (it
almost does) and then use Cython to convert all the classes to
extension classes (basically just C structs) and optimize things to be
as fast as my C solution, which I abandoned.
This is done with the aim of merging with sympy immediately once we
get a fast, maintainable solution (the Cython version will of course
be optional).
The aim is to get as fast as Maple and Mathematica, or GiNaC, and then
port it to sympy, making sympy just as fast.
Ondrej