I put a print statement at the top of Expr.expand and another at the
end where it returns, and based on the output for some test
expressions, the issue is not so much that it is slow but that it is
called a lot of times. I included a simple counter, and it is called
thousands of times. And note that the function is cached, so the print
statements are only called once per identical input.
I then added "print self" to the output, and found a very startling
thing. Almost all of the inputs were already expanded. So I then used
a keyword argument to create a global variable to count the number of
times that expand has been called, and the number of times that the
input is different from the output. I then ran the tests on several
submodules.
The results (number of times input == output / number of times
expand() was called):
solvers: 11487/13341 (86.1%)
integrals: 18952/21791 (87.0%)
physics: 6657/8394 (79.3%)
matrices: 418/500 (83.6%)
core: 1786/2109 (84.7%)
polys: 2619/2822 (92.8%)
And keep in mind that this is with the cache *on*. So this is only
counting globally unique inputs to expand() for each submodule.
So for these six modules, only about 15% of the time is expand doing
anything useful. It would seem that it's not so much expand((x + y +
z)**20) that we need to make fast, but rather just expand(x + y + z).
Some ideas:
- We should be more careful when creating Poly to use expand=False
whenever we know for sure that the input will never need to be
expanded. I got a lot of mileage out of this trick in the risch code.
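As a sketch of what I mean (expand is a documented option of the Poly
constructor), when the input is known to already be in expanded form:

```python
from sympy import symbols, Poly

x, y = symbols('x y')

# The input here is already expanded, so the implicit expand() pass
# inside the Poly constructor would be a pure no-op; skip it.
p1 = Poly(x**2 + 2*x*y + y**2, x, y, expand=False)
p2 = Poly(x**2 + 2*x*y + y**2, x, y)  # default expand=True
assert p1 == p2
```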
- expand currently works by recursively calling _eval_expand_hint on
an expression. So if you have something like 1 + 2*log(x*y), it calls
(1 + 2*log(x*y))._eval_expand_log(), then
Integer(1)._eval_expand_log(), then (2*log(x*y))._eval_expand_log(),
and finally log(x*y)._eval_expand_log(). But it is only this final
form that actually does anything. This is a lot of recursive function
calls, which can be slow in Python. We should think of ways to reduce
this, while still keeping the API open (and backwards compatible if
possible).
What it *really* should do in this case is something like
expr.xreplace({l: l._eval_expand_log() for l in expr.atoms(log)}).
Except according to the API, expr.atoms(log) should really be replaced
with something that gets all objects with _eval_expand_log defined.
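Using the public expand_log wrapper in place of the private hook, the
idea looks like this (with positive symbols so the log expansion is
valid):

```python
from sympy import symbols, log, expand_log

x, y = symbols('x y', positive=True)
expr = 1 + 2*log(x*y)

# Expand only the log atoms, then splice the results back in with a
# single (non-recursive) xreplace pass, instead of calling the
# expand-log hook on every subexpression of the tree.
expanded = expr.xreplace({l: expand_log(l) for l in expr.atoms(log)})
assert expanded == expand_log(expr)
```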
- For expand_mul and expand_multinomial, the real workhorses of
expand(), we should add a fast check for whether the expression is a
polynomial. This would break the API as it currently sits, though. I
guess we need something like the reverse of the idea from
https://code.google.com/p/sympy/issues/detail?id=3338.
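A cheap structural pre-check might look like this (hypothetical helper;
the exact conditions would need tuning before it could replace the real
thing):

```python
from sympy import symbols

def is_already_expanded(expr):
    # Hypothetical fast check: a tree with no Mul containing an Add
    # factor, and no positive integer power of an Add, should be a
    # fixed point of expand_mul/expand_multinomial.
    if expr.is_Mul and any(a.is_Add for a in expr.args):
        return False
    if expr.is_Pow and expr.base.is_Add and expr.exp.is_Integer and expr.exp > 0:
        return False
    return all(is_already_expanded(a) for a in expr.args)

x, y = symbols('x y')
assert is_already_expanded(x**2 + 2*x*y + y**2)
assert not is_already_expanded((x + y)**2)
assert not is_already_expanded(x*(x + y))
```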
- I wonder if it would be possible to do a sort of "structural"
caching. How easy is it to say, "is expression 1 the same as
expression 2, up to a permutation of the Symbols?" (and actually
give the permutation), and most importantly, to determine that
efficiently? How complicated would an expression have to be for it to
be faster to apply a permutation of Symbols against a cached result
instead of just doing things the way we do now? Even if it's slower, I
guess it would be much more efficient memory-wise for the cache.
- How can we use Poly to make expand() faster? This is a question that
I hope Mateusz has an answer to.
By the way, does anyone know of a more efficient way to perform the
above profiling? I know that cProfile can count the number of times that
a function is called, but is there any way to extend it to check input
== output for each call, and count the results? My simple method made
things run very slowly.
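For concreteness, the counting boils down to a wrapper along these
lines (a sketch, not the actual instrumentation I used):

```python
import functools

from sympy import expand, symbols

def count_noops(func):
    # Count total calls, and calls where the function returns its
    # input unchanged (i.e. where it did no useful work).
    stats = {'calls': 0, 'noops': 0}
    @functools.wraps(func)
    def wrapper(expr, *args, **kwargs):
        stats['calls'] += 1
        result = func(expr, *args, **kwargs)
        if result == expr:
            stats['noops'] += 1
        return result
    wrapper.stats = stats
    return wrapper

x, y = symbols('x y')
counted = count_noops(expand)
counted(x + y)        # already expanded: a no-op
counted((x + y)**2)   # does real work
assert counted.stats == {'calls': 2, 'noops': 1}
```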
I'm also curious what profiling methods you use, Mateusz. Anything
beyond cProfile and kernprof (and timeit)?
Aaron Meurer