Thanks for all your comments, it's very helpful to get so thorough feedback.
On 05/03/2010 06:21 PM, Ronan Lamy wrote:
> Le dimanche 02 mai 2010 à 09:46 -0700, basti a écrit :
>
>> The last few days I spent quite some time on understanding the pattern
>> matching and substitution logic in sympy and trying out ideas to
>> improve them. Now I feel able and willing to redesign most of the
>> stuff and will in the following give an overview about my plans.
>>
>>
> Interesting read, which means I have a lot of comments. I don't agree
> with everything you've written, but I hope we can integrate something
> like this in the master docs.
>
> Matching
> --------
>
> The way you handle functions seems inconsistent, which is probably
> caused by the pervasive confusion in the current code between functions
> (e.g. sin) and their results (e.g. sin(x)). Specifically, here are some
> results I'd like to see:
> match(x, f) -> {f: x} (because Symbols are callable)
>
Yes in the moment symbols are python callables but does this really make
them functions? What's the sense of having both if they describe
mathematically the same? So I disagree with this.
> match(sin(x), f) -> None (because sin(x) is not a function)
>
You are right - this makes more sense.
> match(sin(x), f(u)) -> {f: sin, u: x}
>
Of course ...
> You don't tackle the major cause of unreliability in matching which is
> finding all possible matches and ranking them. This is required not only
> to get useful results (match(1+x**2, u+x**v) -> {u: x**2, v:0} is
> correct but probably not what was expected) but also to get merely
> correct ones when there are recursive calls to match.
>
Yeah the implementation section is by far not completed. I've already
more details written on paper which I'll add as soon as I find time.
But to your concerns: I think it's not necessary to first find all
possible solutions and then afterward rank them. It is possible to try
the different combinations in such an order that as you say the
"highest" ranked are tried first and if something matches it can be
returned immediately.
> but also to get merely
> correct ones when there are recursive calls to match.
Sorry I don't understand what you mean by this. Can you maybe give an example?
> The Hints idea suffers from combinatorial explosion.
Why combinatorial explosion? The implementation complexity will grow
linearly with the amount of hints as long as the extended matching rules
don't have to be permutated (for all cases I considered until now there
seemed to be only one sane ordering). Or do you mean something else?
> You've only
> mentioned Add, Mul and Pow, what about And, Or, Xor, Equivalent? What
> about more complex objects like Integral or Sum?
>
There will be a fallback matching function that will be called if two
expressions are of the same type. They match only if all it's args are
matching. Of course And, Or, Xor are special in that sense that they are
symmetric functions. They can be handled like Sum. Maybe there should be
an attribute is_symmetric or something similar so that this can handled
more general?
> How can this be
> extended for user-defined classes?
I guess similarly to the printing case. But you are right this is maybe
the strongest point of having match and subs directly implemented as
methods all around sympy.
> Substitution
> ------------
>
> I think that structural substitution should be a separate function, not
> just a subroutine handling a special case in subs. This would be useful
> for common subexpression elimination, at least.
>
What do you understand under structural substitution? Do you mean what I
called atomic substitution?
> General remarks
> ---------------
>
> You should take smaller steps but set more ambitious goals. Your design
> isn't much different from the current one, so it'll suffer from the same
> kind of problems. Try to forget the existing implementation and imagine
> what the API should be and what results you expect.
>
But I think the api is basically fine as it is now. The reason I started
doing this is simply that I *didn't* get the results I was expecting.
Therefore I wrote down the matching rules to clearly state what my
expectations are and to find a common consent.
> Don't base any work on commits that break tests. Good commits are easy
> to understand, and therefore have a limited scope, and do not break any
> tests. To merge the implementation of two functions, identify invariants
> that link them, correct the situations where they break down, and then
> factor out the duplications. Rinse and repeat as necessary. It might
> seem more fastidious than doing the change you have in mind right away
> and play whack-a-mole with the bugs that crop up, but it's usually
> faster and easier that way.
>
The implementation of subs and match by itself will be much easier to do
without worrying about the old code and having to ensure after every
commit that the old and the new algorithm interact perfectly. The
question is if the final merging is then that much harder and I think it
won't be. I succeeded in my "subs-first-try" branch to exactly do this
and fixed most things in under a day. The problem was just that the
matching algorithm was not good enough yet and I wanted to do it
thoroughly from the beginning.
> You haven't made the case for implementing match and subs as functions
> living in a separate module. The problems I see with this are that this
> is a crucial piece of functionality that, intuitively, belongs in the
> core and that extending the system seems more difficult with functions
> than with methods.
>
It's fine to me that subs stays in Basic but I'm strongly for putting
match into some extra module (but also in the directory sympy/core!).
Here are my reasons:
(1) spreading complex code into many different places makes it much
harder to understand th implementation.
(2) it's clear where to put common functionality, e.g. all symmetric
functions will use (nearly) identical algorithms.
(3) if I use hints in the implementation - and I plan to - then
scattering them around will make it nearly impossible for the user to
find out what hints there are and what they do. There are some possible
solutions I can think of, but I don't like them too much.
Maybe some other people can comment on this?
> Many of the existing methods are in need of a good cleanup. I think you
> should start with this. It'll simplify further work on the subject and
> you'll gain a better understanding of the details of the system.
>
I think it's easier to rewrite the code than changing the current
implementation.
> Ronan
>
>
Thanks again for your time,
Sebastian