[sympy] Future plans for pattern matching

28 views
Skip to first unread message

basti

unread,
May 2, 2010, 12:46:02 PM5/2/10
to sympy
The last few days I spent quite some time on understanding the pattern
matching and substitution logic in sympy and trying out ideas to
improve them. Now I feel able and willing to redesign most of the
stuff and will in the following give an overview about my plans.

The ultimate goal is to have a matching algorithm which is mostly
described here:
http://github.com/bastikr/sympy/blob/match/doc/src/modules/matching.txt

And a substitution algorithm described here:
http://github.com/bastikr/sympy/blob/subs/doc/src/modules/substitution.txt

The best way to read these documents might be to fetch the
corresponding branches and build the docs to get nice html output.
(But since it's restructured text you can also read them directly)

Maybe a short summary of the algorithms:

* matching algorithm:
The matching rules will be controllable by hints which can be given to
the main match function. This function will check that all input
arguments are valid and pass on to a dispatching function which will
call for different input arguments specialized functions. (E.g there
are specialized functions for generic functions, Add, Mul, Pow, ...).
These specialized functions will try to match subexpressions to
corresponding subpatterns using special rules according to the given
hints. If something is matched a dictionary with the matched wilds is
returned (it's empty if there are no wilds) or None if the expression
does not match the pattern.

* substitution algorithm
Substitution distinguishes between matching patterns which consist of
a single atomic object and complex patterns. Atomic substitution does
not care if the atom is a wild symbol or not - it stupidly just
replaces any occurrence by the substitution expression. This means
it's pretty fast but also less powerful than pattern matching. Pattern
matching tries to match subexpressions to the given pattern (using the
match algorithm) and if something matches it gets atomic substitution
rules for the wilds in the pattern. These rules are applied to the
substitution expression and the result is used as substitution for the
matched part in the expression.
E.g. this allows the following (x is a symbol, u is a Wild):

>>> subs(2+x**4, c+x**u, u*x**(u-1))
4*x**3


Now what will be the steps to achieve this aim:

(1) Implement a generic "Hint"s model which can be used to store
values of different hints and also be used to find out interactively
what hints are possible. (This is already done in the hints branch and
might also be useful for other parts in sympy)

(2) Implement the atomic substitution method described in
substitution.txt (This can either be done directly in the Basic class
or alternatively in some separate module)

(3) Complete the description of the matching algorithm in matching.txt

(4) Implement the matching algorithm described in matching.txt in a
separate module. (This relies on the Hints model and the atomic
substitution method)

(5) Implement the pattern substitution method described in
substitution.txt (Again either in a separate module together with the
atomic substitution method or directly in Basic)

(6) Write the wrapping "subs" function.

(7) Plug this new "subs" function in instead of the currently used
subs function. Some tests are expected to fail: Those which test
directly the substitution function have to be reviewed and might have
to be changed simply because other substitution behavior is expected
of the new substitution method.
Other test will fail because some algorithms will have to be adapted
to use the new substitution.

(8) Remove old substitution code.

(9) Do (7) and (8) with the "match" function.

I'm not completely sure when it is a good point to merge back into
master. Maybe only after (7) and after (9).

This is a pretty ambitious undertaking and might take some month to
accomplish. But before I start spending more time on this I want hear
the opinion of other developers. Do you think it's worth doing? And if
yes, do you have any tips, wishes or comments? Also help is of course
very much welcomed, just tell me what do you want to work on so that
we do not double the work!

Sebastian

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To post to this group, send email to sy...@googlegroups.com.
To unsubscribe from this group, send email to sympy+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sympy?hl=en.

Aaron S. Meurer

unread,
May 2, 2010, 1:42:51 PM5/2/10
to sy...@googlegroups.com
You might want to look at the switch manager that Chris has been working form in issue 1725.
Yes, it is worth it! The ability to finely control subs alone will be worth it, (e.g., sin(x)**3.subs(sin(x)**2, 1 - cos(x)**2) will finally work and return sin(x)*(1 - cos(x)**2), which we can use to make trigsimp better). Also, matching is very buggy now, so if your new method can fix at least some of the issues at http://code.google.com/p/sympy/issues/list?q=label:Matching, that would be a great help too.

I will take a read of your guides after they compile to see if I have any further comments.

Aaron Meurer

Aaron S. Meurer

unread,
May 2, 2010, 1:58:52 PM5/2/10
to sy...@googlegroups.com
Regarding your guides:

I'm assuming match() is not implemented yet, so the doctests shouldn't be working (they don't). Even so, I see this:

>>> match(x+y, x+u)
{u: x+y}

Shouldn't it rather be {u: y}.

Regarding evenness, does that mean you will fix issue 1784? (+1) By the way, some of this stuff would ideally be worked with the new assumptions (like have an evenness assumption for cos, even powers, etc. and check using ask()), but this probably isn't necessary for now.

For pattern matching where the result is not well defined, what do you plan to do (e.g., (x + y).match(u + v), u and v are both Wild's)? Will you just have a disclaimer that if you do this, the result could be not what you expect?

So for pattern matching subs, can it be made smarter by having match able to return clues for subs as to where the expression is, for example, if it is nested deeply? Does this make sense?

Aaron Meurer
On May 2, 2010, at 10:46 AM, basti wrote:

Sebastian

unread,
May 2, 2010, 3:21:40 PM5/2/10
to sy...@googlegroups.com
On 05/02/2010 07:58 PM, Aaron S. Meurer wrote:
> Regarding your guides:
>
> I'm assuming match() is not implemented yet, so the doctests shouldn't be working (they don't). Even so, I see this:
>
>
>>>> match(x+y, x+u)
>>>>
> {u: x+y}
>
> Shouldn't it rather be {u: y}.
>
Sorry, I forgot pushing the last commit which fixes exactly this.
> Regarding evenness, does that mean you will fix issue 1784? (+1)
Yes I plan to. This was already implemented in the first quickly written
matching algorithm (it's in the branch subs-first-try).
> By the way, some of this stuff would ideally be worked with the new assumptions (like have an evenness assumption for cos, even powers, etc. and check using ask()), but this probably isn't necessary for now.
>
This is a good idea but I think it will be trivial to implement this later.
> For pattern matching where the result is not well defined, what do you plan to do (e.g., (x + y).match(u + v), u and v are both Wild's)? Will you just have a disclaimer that if you do this, the result could be not what you expect?
>
The algorithm should take the positioning of the terms as hint how to
match them. Of course since printing does some sorting it might be not
obvious to the user :(. Also the case (x+y+z).match(u+v) is interesting
since there are even more valid matching possibilities. I'll follow the
example of mathematica and try to let every wild match exactly one
symbol and assign the remaining terms to the last wild if possible. But
I think it's nearly impossible to find rules so that every match is
unambiguous - and because of speed reasons the algorithm will simple
return the first match it finds and never know if there had been other
possibilities. So in that sense a disclaimer might be necessary, stating
that ambiguous patterns may result in surprising matches.
> So for pattern matching subs, can it be made smarter by having match able to return clues for subs as to where the expression is, for example, if it is nested deeply? Does this make sense?
I'm not sure I understand you here, but the traversal through the
subexpressions is done inside the subs method which then calls match
this subexpression and the pattern. In that sense subs already knows
where the expression is but I don't know how the user could use this to
control the substitution. What I thought about was doing substitution
with different wilds for each match, e.g.:

>>> subs((x**3+y**5)/x**2, u**v, enumerate=True)
((u1**v1+u2**v2)/u3**v, {u1: x, v1: 3, u2: y, v2: 5, u3: x, v3: 2})

Then you could look at the returned expression, decide which wilds to
substitute and then make an atomic substitution.
>
> Aaron Meurer
> On May 2, 2010, at 10:46 AM, basti wrote:
>
>
Thanks for your comments! If you find anything in the guides hard to
understand please tell me so that I can try to improve them.

Vinzent Steinberg

unread,
May 3, 2010, 8:00:59 AM5/3/10
to sympy
On 2 mai, 19:58, "Aaron S. Meurer" <asmeu...@gmail.com> wrote:
> For pattern matching where the result is not well defined, what do you plan to do (e.g., (x + y).match(u + v), u and v are both Wild's)?  Will you just have a disclaimer that if you do this, the result could be not what you expect?

We could add aditional arguments to Wild(), about whether it is greedy
and what not, which non-atoms could be consumed (Add, Mul...
etc.), ...

This is somewhat related to regular expressions.

Vinzent

Ronan Lamy

unread,
May 3, 2010, 12:21:39 PM5/3/10
to sy...@googlegroups.com
Le dimanche 02 mai 2010 à 09:46 -0700, basti a écrit :
> The last few days I spent quite some time on understanding the pattern
> matching and substitution logic in sympy and trying out ideas to
> improve them. Now I feel able and willing to redesign most of the
> stuff and will in the following give an overview about my plans.
>
Interesting read, which means I have a lot of comments. I don't agree
with everything you've written, but I hope we can integrate something
like this in the master docs.

Matching
--------

The way you handle functions seems inconsistent, which is probably
caused by the pervasive confusion in the current code between functions
(e.g. sin) and their results (e.g. sin(x)). Specifically, here are some
results I'd like to see:
match(x, f) -> {f: x} (because Symbols are callable)
match(sin(x), f) -> None (because sin(x) is not a function)
match(sin(x), f(u)) -> {f: sin, u: x}

You don't tackle the major cause of unreliability in matching which is
finding all possible matches and ranking them. This is required not only
to get useful results (match(1+x**2, u+x**v) -> {u: x**2, v:0} is
correct but probably not what was expected) but also to get merely
correct ones when there are recursive calls to match.

The Hints idea suffers from combinatorial explosion. You've only
mentioned Add, Mul and Pow, what about And, Or, Xor, Equivalent? What
about more complex objects like Integral or Sum? How can this be
extended for user-defined classes?

Substitution
------------

I think that structural substitution should be a separate function, not
just a subroutine handling a special case in subs. This would be useful
for common subexpression elimination, at least.

General remarks
---------------

You should take smaller steps but set more ambitious goals. Your design
isn't much different from the current one, so it'll suffer from the same
kind of problems. Try to forget the existing implementation and imagine
what the API should be and what results you expect.

Don't base any work on commits that break tests. Good commits are easy
to understand, and therefore have a limited scope, and do not break any
tests. To merge the implementation of two functions, identify invariants
that link them, correct the situations where they break down, and then
factor out the duplications. Rinse and repeat as necessary. It might
seem more fastidious than doing the change you have in mind right away
and play whack-a-mole with the bugs that crop up, but it's usually
faster and easier that way.

You haven't made the case for implementing match and subs as functions
living in a separate module. The problems I see with this are that this
is a crucial piece of functionality that, intuitively, belongs in the
core and that extending the system seems more difficult with functions
than with methods.

Many of the existing methods are in need of a good cleanup. I think you
should start with this. It'll simplify further work on the subject and
you'll gain a better understanding of the details of the system.

Ronan

Sebastian

unread,
May 3, 2010, 4:11:05 PM5/3/10
to sy...@googlegroups.com
Thanks for all your comments, it's very helpful to get so thorough feedback.

On 05/03/2010 06:21 PM, Ronan Lamy wrote:
> Le dimanche 02 mai 2010 à 09:46 -0700, basti a écrit :
>
>> The last few days I spent quite some time on understanding the pattern
>> matching and substitution logic in sympy and trying out ideas to
>> improve them. Now I feel able and willing to redesign most of the
>> stuff and will in the following give an overview about my plans.
>>
>>
> Interesting read, which means I have a lot of comments. I don't agree
> with everything you've written, but I hope we can integrate something
> like this in the master docs.
>
> Matching
> --------
>
> The way you handle functions seems inconsistent, which is probably
> caused by the pervasive confusion in the current code between functions
> (e.g. sin) and their results (e.g. sin(x)). Specifically, here are some
> results I'd like to see:
> match(x, f) -> {f: x} (because Symbols are callable)
>
Yes in the moment symbols are python callables but does this really make
them functions? What's the sense of having both if they describe
mathematically the same? So I disagree with this.
> match(sin(x), f) -> None (because sin(x) is not a function)
>
You are right - this makes more sense.
> match(sin(x), f(u)) -> {f: sin, u: x}
>
Of course ...
> You don't tackle the major cause of unreliability in matching which is
> finding all possible matches and ranking them. This is required not only
> to get useful results (match(1+x**2, u+x**v) -> {u: x**2, v:0} is
> correct but probably not what was expected) but also to get merely
> correct ones when there are recursive calls to match.
>
Yeah the implementation section is by far not completed. I've already
more details written on paper which I'll add as soon as I find time.
But to your concerns: I think it's not necessary to first find all
possible solutions and then afterward rank them. It is possible to try
the different combinations in such an order that as you say the
"highest" ranked are tried first and if something matches it can be
returned immediately.

> but also to get merely
> correct ones when there are recursive calls to match.
Sorry I don't understand what you mean by this. Can you maybe give an example?


> The Hints idea suffers from combinatorial explosion.
Why combinatorial explosion? The implementation complexity will grow
linearly with the amount of hints as long as the extended matching rules
don't have to be permutated (for all cases I considered until now there
seemed to be only one sane ordering). Or do you mean something else?
> You've only
> mentioned Add, Mul and Pow, what about And, Or, Xor, Equivalent? What
> about more complex objects like Integral or Sum?
>
There will be a fallback matching function that will be called if two
expressions are of the same type. They match only if all it's args are
matching. Of course And, Or, Xor are special in that sense that they are
symmetric functions. They can be handled like Sum. Maybe there should be
an attribute is_symmetric or something similar so that this can handled
more general?
> How can this be
> extended for user-defined classes?
I guess similarly to the printing case. But you are right this is maybe
the strongest point of having match and subs directly implemented as
methods all around sympy.
> Substitution
> ------------
>
> I think that structural substitution should be a separate function, not
> just a subroutine handling a special case in subs. This would be useful
> for common subexpression elimination, at least.
>
What do you understand under structural substitution? Do you mean what I
called atomic substitution?
> General remarks
> ---------------
>
> You should take smaller steps but set more ambitious goals. Your design
> isn't much different from the current one, so it'll suffer from the same
> kind of problems. Try to forget the existing implementation and imagine
> what the API should be and what results you expect.
>
But I think the api is basically fine as it is now. The reason I started
doing this is simply that I *didn't* get the results I was expecting.
Therefore I wrote down the matching rules to clearly state what my
expectations are and to find a common consent.
> Don't base any work on commits that break tests. Good commits are easy
> to understand, and therefore have a limited scope, and do not break any
> tests. To merge the implementation of two functions, identify invariants
> that link them, correct the situations where they break down, and then
> factor out the duplications. Rinse and repeat as necessary. It might
> seem more fastidious than doing the change you have in mind right away
> and play whack-a-mole with the bugs that crop up, but it's usually
> faster and easier that way.
>
The implementation of subs and match by itself will be much easier to do
without worrying about the old code and having to ensure after every
commit that the old and the new algorithm interact perfectly. The
question is if the final merging is then that much harder and I think it
won't be. I succeeded in my "subs-first-try" branch to exactly do this
and fixed most things in under a day. The problem was just that the
matching algorithm was not good enough yet and I wanted to do it
thoroughly from the beginning.
> You haven't made the case for implementing match and subs as functions
> living in a separate module. The problems I see with this are that this
> is a crucial piece of functionality that, intuitively, belongs in the
> core and that extending the system seems more difficult with functions
> than with methods.
>
It's fine to me that subs stays in Basic but I'm strongly for putting
match into some extra module (but also in the directory sympy/core!).
Here are my reasons:

(1) spreading complex code into many different places makes it much
harder to understand th implementation.

(2) it's clear where to put common functionality, e.g. all symmetric
functions will use (nearly) identical algorithms.

(3) if I use hints in the implementation - and I plan to - then
scattering them around will make it nearly impossible for the user to
find out what hints there are and what they do. There are some possible
solutions I can think of, but I don't like them too much.

Maybe some other people can comment on this?
> Many of the existing methods are in need of a good cleanup. I think you
> should start with this. It'll simplify further work on the subject and
> you'll gain a better understanding of the details of the system.
>
I think it's easier to rewrite the code than changing the current
implementation.
> Ronan
>
>
Thanks again for your time,
Sebastian

Aaron S. Meurer

unread,
May 3, 2010, 4:08:51 PM5/3/10
to sy...@googlegroups.com

On May 3, 2010, at 10:21 AM, Ronan Lamy wrote:

> Le dimanche 02 mai 2010 à 09:46 -0700, basti a écrit :
>> The last few days I spent quite some time on understanding the pattern
>> matching and substitution logic in sympy and trying out ideas to
>> improve them. Now I feel able and willing to redesign most of the
>> stuff and will in the following give an overview about my plans.
>>
> Interesting read, which means I have a lot of comments. I don't agree
> with everything you've written, but I hope we can integrate something
> like this in the master docs.
>
> Matching
> --------
>
> The way you handle functions seems inconsistent, which is probably
> caused by the pervasive confusion in the current code between functions
> (e.g. sin) and their results (e.g. sin(x)). Specifically, here are some
> results I'd like to see:
> match(x, f) -> {f: x} (because Symbols are callable)
> match(sin(x), f) -> None (because sin(x) is not a function)
> match(sin(x), f(u)) -> {f: sin, u: x}
>
> You don't tackle the major cause of unreliability in matching which is
> finding all possible matches and ranking them. This is required not only
> to get useful results (match(1+x**2, u+x**v) -> {u: x**2, v:0} is
> correct but probably not what was expected) but also to get merely
> correct ones when there are recursive calls to match.

I think that it should generally be considered the user's fault if he uses an ambiguous pattern and gets an unambiguous result. In general for pattern matching, the pattern is known ahead of time and the expression being matched is not. It should therefore be up to the user to avoid things like u + v, and the above (do you mean {u:1 + x**2, v:0}?), but structuring match patterns in a proper manner and through the proper use of the exclude keyword argument to Wild(). So, in this case, the user might want to have u = Wild('u', exclude=[x]), which would make it work correctly.

Then again, maybe this isn't powerful enough as it is (what if u can be other things with x, just not x**v?), so it can probably be improved. Maybe the exclude argument itself should be allowed to be a wild expression.

Aaron Meurer

Aaron S. Meurer

unread,
May 3, 2010, 4:12:24 PM5/3/10
to sy...@googlegroups.com
There already is, it's called commutativity :)

In [9]: And(x, y).is_commutative
Out[9]: True

Aaron Meurer

Ronan Lamy

unread,
May 3, 2010, 4:26:58 PM5/3/10
to sy...@googlegroups.com
Le lundi 03 mai 2010 à 14:12 -0600, Aaron S. Meurer a écrit :
> On May 3, 2010, at 2:11 PM, Sebastian wrote:
> > There will be a fallback matching function that will be called if two
> > expressions are of the same type. They match only if all it's args are
> > matching. Of course And, Or, Xor are special in that sense that they are
> > symmetric functions. They can be handled like Sum. Maybe there should be
> > an attribute is_symmetric or something similar so that this can handled
> > more general?
>
> There already is, it's called commutativity :)
>
> In [9]: And(x, y).is_commutative
> Out[9]: True

No, that's not at all what this means.
It means actually that Mul(And(x, y), z) == Mul(z, And(x, y)), assuming
z.is_commutative == True. Whether is makes any sense is a different
debate...

smichr

unread,
May 3, 2010, 10:12:38 PM5/3/10
to sympy
Sebastian wrote:

(1) Implement a generic "Hint"s model which can be used to store
values of different hints and also be used to find out interactively
what hints are possible. (This is already done in the hints branch and
might also be useful for other parts in sympy)

I have a manage_hints routine in iterables.py in the 1766 commit at
smichr's 1766 branch at github. I used that with expand. You might
check it out and see if it could be used for your purposes. What I
settled on was that any routine that had hints it wanted to manage
would store those hints within itself. To see what the hints are, the
expression "hints" is sent (and checked for before anything else) in
the routine. e.g. to see the hint options for expand one would
execute

expand("hints")

Sebastian

unread,
May 4, 2010, 7:44:03 AM5/4/10
to sympy
Thank's for pointing this out! Arron already mentioned it and I had a
look on it but I think it's maybe a bit too limited for general usage.
I wrote up my ideas of a hinting model in the branch "hints" on my
github repo: http://github.com/bastikr/sympy/tree/hints
You can pull it and build the documentation or just view it online
(without nice formatation) on: http://github.com/bastikr/sympy/blob/hints/doc/src/modules/hinting.txt

What do you think? Would this be an acceptable way to do it?

Sebastian

Vinzent Steinberg

unread,
May 4, 2010, 10:26:20 AM5/4/10
to sympy
On May 3, 10:11 pm, Sebastian <basti...@gmail.com> wrote:
> The implementation of subs and match by itself will be much easier to do
> without worrying about the old code and having to ensure after every
> commit that the old and the new algorithm interact perfectly. The
> question is if the final merging is then that much harder and I think it
> won't be. I succeeded in my "subs-first-try" branch to exactly do this
> and fixed most things in under a day. The problem was just that the
> matching algorithm was not good enough yet and I wanted to do it
> thoroughly from the beginning.

You got about 50 failures in your branch, this will be much work to
debug. I don't know if Ronan's proposed iterative approach would be
more effective.

Vinzent

Ronan Lamy

unread,
May 4, 2010, 11:08:16 AM5/4/10
to sy...@googlegroups.com
Le lundi 03 mai 2010 à 22:11 +0200, Sebastian a écrit :
> Thanks for all your comments, it's very helpful to get so thorough feedback.
>
> On 05/03/2010 06:21 PM, Ronan Lamy wrote:
> > Le dimanche 02 mai 2010 à 09:46 -0700, basti a écrit :
> >
> >> The last few days I spent quite some time on understanding the pattern
> >> matching and substitution logic in sympy and trying out ideas to
> >> improve them. Now I feel able and willing to redesign most of the
> >> stuff and will in the following give an overview about my plans.
> >>
> >>
> > Interesting read, which means I have a lot of comments. I don't agree
> > with everything you've written, but I hope we can integrate something
> > like this in the master docs.
> >
> > Matching
> > --------
> >
> > The way you handle functions seems inconsistent, which is probably
> > caused by the pervasive confusion in the current code between functions
> > (e.g. sin) and their results (e.g. sin(x)). Specifically, here are some
> > results I'd like to see:
> > match(x, f) -> {f: x} (because Symbols are callable)
> >
> Yes in the moment symbols are python callables but does this really make
> them functions? What's the sense of having both if they describe
> mathematically the same? So I disagree with this.

Symbols aren't functions, they *can be* anything, including functions.
We'll probably need to develop a real notion of mathematical types at
some point to model this properly, but in any case having objects that
can be anything seems inevitable in order to parse strings into
expressions.

> > match(sin(x), f) -> None (because sin(x) is not a function)
> >
> You are right - this makes more sense.
> > match(sin(x), f(u)) -> {f: sin, u: x}
> >
> Of course ...
> > You don't tackle the major cause of unreliability in matching which is
> > finding all possible matches and ranking them. This is required not only
> > to get useful results (match(1+x**2, u+x**v) -> {u: x**2, v:0} is
> > correct but probably not what was expected) but also to get merely
> > correct ones when there are recursive calls to match.
> >
> Yeah the implementation section is by far not completed. I've already
> more details written on paper which I'll add as soon as I find time.
> But to your concerns: I think it's not necessary to first find all
> possible solutions and then afterward rank them. It is possible to try
> the different combinations in such an order that as you say the
> "highest" ranked are tried first and if something matches it can be
> returned immediately.
>
> > but also to get merely
> > correct ones when there are recursive calls to match.
> Sorry I don't understand what you mean by this. Can you maybe give an example?

I'm thinking of something like match(sin(a+b)*cos(b), sin(u+v)*cos(u)):
suppose matching goes from left to right, you get {u:a, b:v} from the
first term, plug it into the second, failure!
>
>
> > The Hints idea suffers from combinatorial explosion.
> Why combinatorial explosion? The implementation complexity will grow
> linearly with the amount of hints as long as the extended matching rules
> don't have to be permutated (for all cases I considered until now there
> seemed to be only one sane ordering). Or do you mean something else?

I was thinking of (identity, commutative, ...) * (Add, Mul, ...), which
is only O(m*n) so perhaps not really an explosion. The other problem is
what do you do if you want to apply a different set of hints to
different parts of the expression.

> > You've only
> > mentioned Add, Mul and Pow, what about And, Or, Xor, Equivalent? What
> > about more complex objects like Integral or Sum?
> >
> There will be a fallback matching function that will be called if two
> expressions are of the same type. They match only if all it's args are
> matching. Of course And, Or, Xor are special in that sense that they are
> symmetric functions. They can be handled like Sum. Maybe there should be
> an attribute is_symmetric or something similar so that this can handled
> more general?
> > How can this be
> > extended for user-defined classes?
> I guess similarly to the printing case. But you are right this is maybe
> the strongest point of having match and subs directly implemented as
> methods all around sympy.
> > Substitution
> > ------------
> >
> > I think that structural substitution should be a separate function, not
> > just a subroutine handling a special case in subs. This would be useful
> > for common subexpression elimination, at least.
> >
> What do you understand under structural substitution? Do you mean what I
> called atomic substitution?

I mean substitution based on structural matching, so it's similar to
atomic substitution, but without the restriction that it only applies to
atoms.
Define "succeeded", because under my definition you did not:
tests finished: 1988 passed, 22 failed, 1 skipped, 51 expected to
fail,
2 expected to fail but passed, 27 exceptions, in 452.17 seconds
DO *NOT* COMMIT!


> > You haven't made the case for implementing match and subs as functions
> > living in a separate module. The problems I see with this are that this
> > is a crucial piece of functionality that, intuitively, belongs in the
> > core and that extending the system seems more difficult with functions
> > than with methods.
> >
> It's fine to me that subs stays in Basic but I'm strongly for putting
> match into some extra module (but also in the directory sympy/core!).

OK. I thought you wanted to create a new subpackage. If match becomes a
function, this is reasonable.

> Here are my reasons:
>
> (1) spreading complex code into many different places makes it much
> harder to understand th implementation.

OK, but spreading the implementation of a class into many different
places also makes it much harder to understand.
>
> (2) it's clear where to put common functionality, e.g. all symmetric
> functions will use (nearly) identical algorithms.

... which could fit nicely in operations.py, next to each other.
>
> (3) if I use hints in the implementation - and I plan to - then
> scattering them around will make it nearly impossible for the user to
> find out what hints there are and what they do. There are some possible
> solutions I can think of, but I don't like them too much.

Ordinary users don't read the source, so I don't think this matters
either way.
>
> Maybe some other people can comment on this?

I guess this comes down to a matter of taste (grouping by class vs by
functionality) and implementation practicality...

> > Many of the existing methods are in need of a good cleanup. I think you
> > should start with this. It'll simplify further work on the subject and
> > you'll gain a better understanding of the details of the system.
> >
> I think it's easier to rewrite the code than changing the current
> implementation.

I guess the customary answer to this is "Famous last words..."

Sebastian

unread,
May 4, 2010, 11:54:43 AM5/4/10
to sy...@googlegroups.com
Okay that sounds reasonable but I still don't agree that match(x,f) should return {f: x}
This case is somehow similar to match(f, cos) - the more generic case is on the left side. Do you think this should return {cos: f}?



  
match(sin(x), f) -> None (because sin(x) is not a function)
  
      
You are right - this makes more sense.
    
match(sin(x), f(u)) -> {f: sin, u: x}
  
      
Of course ...
    
You don't tackle the major cause of unreliability in matching which is
finding all possible matches and ranking them. This is required not only
to get useful results (match(1+x**2, u+x**v) -> {u: x**2, v:0} is
correct but probably not what was expected) but also to get merely
correct ones when there are recursive calls to match.
  
      
Yeah the implementation section is by far not completed. I've already
more details written on paper which I'll add as soon as I find time.
But to your concerns: I think it's not necessary to first find all
possible solutions and then afterward rank them. It is possible to try
the different combinations in such an order that as you say the
"highest" ranked are tried first and if something matches it can be
returned immediately.

    
but also to get merely
correct ones when there are recursive calls to match.
      
Sorry I don't understand what you mean by this. Can you maybe give an example?
    
I'm thinking of something like match(sin(a+b)*cos(b), sin(u+v)*cos(u)):
suppose matching goes from left to right, you get {u:a, b:v} from the
first term, plug it into the second, failure!
  
This is exactly why I need atomic substitution - if something matches the patterns have to be updated.


    
The Hints idea suffers from combinatorial explosion.
      
Why combinatorial explosion? The implementation complexity will grow
linearly with the amount of hints as long as the extended matching rules
don't have to be permutated (for all cases I considered until now there
seemed to be only one sane ordering). Or do you mean something else?
    
I was thinking of (identity, commutative, ...) * (Add, Mul, ...), which
is only O(m*n) so perhaps not really an explosion. The other problem is
what do you do if you want to apply a different set of hints to
different parts of the expression.

  
If the user really wants such special behavior I think it's reasonable that he has more work to do himself. E.g. first getting the subparts he wants to handle differently and then treat them separately.
But if you can think of a good way to implement this please tell me!

  
 You've only
mentioned Add, Mul and Pow, what about And, Or, Xor, Equivalent? What
about more complex objects like Integral or Sum? 
  
      
There will be a fallback matching function that will be called if two
expressions are of the same type. They match only if all it's args are
matching. Of course And, Or, Xor are special in that sense that they are
symmetric functions. They can be handled like Sum. Maybe there should be
an attribute is_symmetric or something similar so that this can handled
more general?
    
How can this be
extended for user-defined classes?
      
I guess similarly to the printing case. But you are right this is maybe
the strongest point of having match and subs directly implemented as
methods all around sympy.
    
Substitution
------------

I think that structural substitution should be a separate function, not
just a subroutine handling a special case in subs. This would be useful
for common subexpression elimination, at least.
  
      
What do you understand under structural substitution? Do you mean what I
called atomic substitution?
    
I mean substitution based on structural matching, so it's similar to
atomic substitution, but without the restriction that it only applies to
atoms.
  
So structural matching wouldn't for example match x*y with y*x?
Okay maybe succeeded is overstated :) But to be fair you have to look a little bit closer:

* in core only one test that fails does not directly test subs. All others test for behavior that I don't agree with.
* Many others that fail are related to the gruntz algorithm where extended matching is used which I haven't implemented yet. (And before doing this I wanted to implement the matching algorithm cleanly). That means many things relying on limits fail!


  
You haven't made the case for implementing match and subs as functions
living in a separate module. The problems I see with this are that this
is a crucial piece of functionality that, intuitively, belongs in the
core and that extending the system seems more difficult with functions
than with methods.
  
      
It's fine to me that subs stays in Basic but I'm strongly for putting
match into some extra module (but also in the directory sympy/core!).
    
OK. I thought you wanted to create a new subpackage. If match becomes a
function, this is reasonable.

  
Here are my reasons:
 
(1) spreading complex code into many different places makes it much
harder to understand th implementation.
    
OK, but spreading the implementation of a class into many different
places also makes it much harder to understand.
  
If there are methods subs and match in Basic which just call these functions it will be clear what they do and simpler to find since they aren't hidden between many helper functions.


(2) it's clear where to put common functionality, e.g. all symmetric
functions will use (nearly) identical algorithms.
    
... which could fit nicely in operations.py, next to each other.
  

I'm not sure what the scope of operations.py really is (No docstring). Maybe you're right...


(3) if I use hints in the implementation - and I plan to - then
scattering them around will make it nearly impossible for the user to
find out what hints there are and what they do. There are some possible
solutions I can think of, but I don't like them too much.
    
Ordinary users don't read the source, so I don't think this matters
either way.
  
True, but I already have some ideas about the hinting model ( http://github.com/bastikr/sympy/blob/hints/doc/src/modules/hinting.txt) and it's easier to use if all hints concerning one topic are define in the same module.


Maybe some other people can comment on this?
    
I guess this comes down to a matter of taste (grouping by class vs by
functionality) and implementation practicality...
  
I guess it does...


  
Many of the existing methods are in need of a good cleanup. I think you
should start with this. It'll simplify further work on the subject and
you'll gain a better understanding of the details of the system.
  
      
I think it's easier to rewrite the code than changing the current
implementation.
    
I guess the customary answer to this is "Famous last words..."

  
:)

Aaron S. Meurer

unread,
May 4, 2010, 3:41:17 PM5/4/10
to sy...@googlegroups.com
This probably goes without saying, but speed is an issue here, at least for atomic matching and substitution.  For example, there are two match() calls in Basic._compare_pretty that are called all the time when using sympy.  Making sure that this runs fast will therefore be very important (actually, this needs to be rewritten to a Python 3 compatible key function, which will also make it faster).

That would be hard to do anyway, because sympy normalizes Mul(x, y) and Mul(y, x) into the same object (unless of course it is non-commutative, in which case of course it shouldn't match).

I think maybe there is just a semantic confusion here, between the use of the word atomic to mean basic and the Atom class in sympy, which represents only the simplest items (no recursive .args).  

I think if it makes sense in the algorithm to have something super simplistic that it only matches Atoms, then that is fine.  Otherwise, the user only wants structural (or literal) matching, and different forms of more intelligent algebraic matching.  
I'm not sure what the scope of operations.py really is (No docstring). Maybe you're right…

operations.py is the home of AssocOp, which is the base class of Mul and Add.  Incidentally, it is also the home of the current matching algorithm, in AssocOp._matches_commutative.  It also holds LatticeOp, which is the base class of And and Or.  So basically, it holds operations classes that are not used directly but are rather low-level base classes for operations that are used.  (p.s. feel free to add a docstring :)

I guess it depends on the size of the resulting code.  If there is a lot of it, it should probably go in a separate file.  Otherwise, you can put it somewhere logical like operations.py.  

Aaron Meurer

Sebastian

unread,
May 4, 2010, 4:03:37 PM5/4/10
to sy...@googlegroups.com

>>>>> Substitution
>>>>> ------------
>>>>>
>>>>> I think that structural substitution should be a separate function, not
>>>>> just a subroutine handling a special case in subs. This would be useful
>>>>> for common subexpression elimination, at least.
>>>>>
>>>>>
>>>> What do you understand under structural substitution? Do you mean what I
>>>> called atomic substitution?
>>>>
>>> I mean substitution based on structural matching, so it's similar to
>>> atomic substitution, but without the restriction that it only applies to
>>> atoms.
>>>
>> So structural matching wouldn't for example match x*y with y*x?
>
> That would be hard to do anyway, because sympy normalizes Mul(x, y)
> and Mul(y, x) into the same object (unless of course it is
> non-commutative, in which case of course it shouldn't match).
In [1]: Mul(x,y,evaluate=False).args
Out[1]: (x, y)

In [2]: Mul(y,x,evaluate=False).args
Out[2]: (y, x)

>
> I think maybe there is just a semantic confusion here, between the use
> of the word atomic to mean basic and the Atom class in sympy, which
> represents only the simplest items (no recursive .args).
Yes maybe. I understand under atomic that it has no further elements in
args. The important thing is that it has to be possible to check really
fast if two atoms are equal.

Aaron S. Meurer

unread,
May 6, 2010, 9:35:11 AM5/6/10
to sy...@googlegroups.com
I think that both have their benefits. I like how Sebastian's model makes hints objects, which means that they can have docstrings and can be callable. I also like having the hint manager be and object, and the decorator idea.

Of Chris's stuff, I like some of the shortcuts he has implemented that reduce typing for more complex hints. Sebastian's writeup doesn't mention some of the key things, like the logic for what turns other hints on and off (I am assuming this would all go in the HintManager).

So probably Sebastian's model will be more extendable. Remember that Chris's model was designed specifically with expand() in mind. I think we should take the best of both worlds.

Aaron Meurer

smichr

unread,
May 7, 2010, 8:57:25 AM5/7/10
to sympy
> Of Chris's stuff, I like some of the shortcuts he has implemented that reduce typing for more complex hints.  Sebastian's writeup doesn't mention some of the key things, like the logic for what turns other hints on and off (I am assuming this would all go in the HintManager).  

I can quickly elaborate on the hint manager's features:

keyword expansion (mult=True -> multinomial=True)
keyword validation (multy=True could generate an error; bas=True is
ambiguous and generates an error)
keyword concatenation (mul_multinomial=True)
keyword ordering when concatenated; otherwise ordering is as given in
the stored hints
customizable default keyword (e.g. all=True can set all hints to True)

After doing this, I wonder if the keyword manager for running scripts
might be enlisted to handle some of this...I never looked into it,
however.

Sebastian

unread,
May 7, 2010, 4:44:14 PM5/7/10
to sy...@googlegroups.com
On 05/07/2010 02:57 PM, smichr wrote:
>> Of Chris's stuff, I like some of the shortcuts he has implemented that reduce typing for more complex hints. Sebastian's writeup doesn't mention some of the key things, like the logic for what turns other hints on and off (I am assuming this would all go in the HintManager).
>>
> I can quickly elaborate on the hint manager's features:
>
> keyword expansion (mult=True -> multinomial=True)
>
This might be convenient for quick hacks, but code might break as soon
as new keywords are introduced. Also this really does not enhance
readability. I think it's okay to define keywords as abbreviations but I
don't like this general expanding.

> keyword validation (multy=True could generate an error; bas=True is
> ambiguous and generates an error)
>
+1 for validation.

> keyword concatenation (mul_multinomial=True)
>
I don't think this is much of an improvement. Of course code gets a
little bit shorter but you also loose the underscore for hints.

> keyword ordering when concatenated; otherwise ordering is as given in
> the stored hints
>
Why would order be important if you set all the concatenated hints to
the same value? I think order of hints really shouldn't change the
behavior of code.

> customizable default keyword (e.g. all=True can set all hints to True)
>
+1 for having hints that trigger default values for other hints. But
maybe to set all keywords at once to the same value might not for all
cases where hints are needed be a good choice.

> After doing this, I wonder if the keyword manager for running scripts
> might be enlisted to handle some of this...I never looked into it,
> however.
>
>
Sorry, what keyword manager do you mean?

Sebastian

Sebastian

unread,
May 7, 2010, 4:50:27 PM5/7/10
to sy...@googlegroups.com
On 05/06/2010 03:35 PM, Aaron S. Meurer wrote:
> I think that both have their benefits. I like how Sebastian's model makes hints objects, which means that they can have docstrings and can be callable. I also like having the hint manager be and object, and the decorator idea.
>
> Of Chris's stuff, I like some of the shortcuts he has implemented that reduce typing for more complex hints. Sebastian's writeup doesn't mention some of the key things, like the logic for what turns other hints on and off (I am assuming this would all go in the HintManager).
>
This is a good idea - to have hints which can set default values for
other hints. It would already be possible to simply subclass the
HintManager and implement such logic, but I think this really deserves
some special treatment directly inside the HintManager. I'll think about
it ;)

Sebastian

Aaron S. Meurer

unread,
May 7, 2010, 5:31:42 PM5/7/10
to sy...@googlegroups.com

On May 7, 2010, at 2:44 PM, Sebastian wrote:

> On 05/07/2010 02:57 PM, smichr wrote:
>>> Of Chris's stuff, I like some of the shortcuts he has implemented that reduce typing for more complex hints. Sebastian's writeup doesn't mention some of the key things, like the logic for what turns other hints on and off (I am assuming this would all go in the HintManager).
>>>
>> I can quickly elaborate on the hint manager's features:
>>
>> keyword expansion (mult=True -> multinomial=True)
>>
> This might be convenient for quick hacks, but code might break as soon
> as new keywords are introduced. Also this really does not enhance
> readability. I think it's okay to define keywords as abbreviations but I
> don't like this general expanding.
>
>> keyword validation (multy=True could generate an error; bas=True is
>> ambiguous and generates an error)
>>
> +1 for validation.
>
>> keyword concatenation (mul_multinomial=True)
>>
> I don't think this is much of an improvement. Of course code gets a
> little bit shorter but you also loose the underscore for hints.
>
>> keyword ordering when concatenated; otherwise ordering is as given in
>> the stored hints
>>
> Why would order be important if you set all the concatenated hints to
> the same value? I think order of hints really shouldn't change the
> behavior of code.

For expand(), it can matter. For example, sin((x+y)**2).expand(trig=True, multinomial=True) will come out different depending on if you do the multinomial expansion first, or the trig expansion first (because the trig expansion only works when the args are an Add). See also the docstring of expand().

Another illustrative example is log((x + 1)*(y + 1)). If you expand the Mul first, you can't expand the log; if you expand the log first, you can't expand the Mul.
>
>> customizable default keyword (e.g. all=True can set all hints to True)
>>
> +1 for having hints that trigger default values for other hints. But
> maybe to set all keywords at once to the same value might not for all
> cases where hints are needed be a good choice.

I agree. Right now, the hints for expand are hard to use, because you have to turn off the hints you don't want. It would be much easier and future proof to new hints to just have expand(mul=True) turn off everything else (the general gist of Chris's manager is to do this, I think).
>
>> After doing this, I wonder if the keyword manager for running scripts
>> might be enlisted to handle some of this...I never looked into it,
>> however.
>>
>>
> Sorry, what keyword manager do you mean?

I'm guessing the OptionParser.

Aaron Meurer

Sebastian

unread,
May 7, 2010, 5:47:45 PM5/7/10
to sy...@googlegroups.com
On 05/07/2010 11:31 PM, Aaron S. Meurer wrote:
> On May 7, 2010, at 2:44 PM, Sebastian wrote:
>
>
>> On 05/07/2010 02:57 PM, smichr wrote:
>>
>>>> Of Chris's stuff, I like some of the shortcuts he has implemented that reduce typing for more complex hints. Sebastian's writeup doesn't mention some of the key things, like the logic for what turns other hints on and off (I am assuming this would all go in the HintManager).
>>>>
>>>>
>>> I can quickly elaborate on the hint manager's features:
>>>
>>> keyword expansion (mult=True -> multinomial=True)
>>>
>>>
>> This might be convenient for quick hacks, but code might break as soon
>> as new keywords are introduced. Also this really does not enhance
>> readability. I think it's okay to define keywords as abbreviations but I
>> don't like this general expanding.
>>
>>
>>> keyword validation (multy=True could generate an error; bas=True is
>>> ambiguous and generates an error)
>>>
>>>
>> +1 for validation.
>>
>>
>>> keyword concatenation (mul_multinomial=True)
>>>
>>>
>> I don't think this is much of an improvement. Of course code gets a
>> little bit shorter but you also loose the underscore for hints.
>>
>>
>>> keyword ordering when concatenated; otherwise ordering is as given in
>>> the stored hints
>>>
>>>
>> Why would order be important if you set all the concatenated hints to
>> the same value? I think order of hints really shouldn't change the
>> behavior of code.
>>
> For expand(), it can matter. For example, sin((x+y)**2).expand(trig=True, multinomial=True) will come out different depending on if you do the multinomial expansion first, or the trig expansion first (because the trig expansion only works when the args are an Add). See also the docstring of expand().
>
> Another illustrative example is log((x + 1)*(y + 1)). If you expand the Mul first, you can't expand the log; if you expand the log first, you can't expand the Mul.
>
Right, thanks for the explanation. Would an order argument be an
acceptable alternative? E.g. something like:

expr.expand(trig=True, multinomial=True, order=("trig", "multinomial"))

I really think the order of the keywords themselves should not be used
for any decisions. If you had for example two keywords which are
sensitive to order and you want to give them different values then you
are out of luck.

Aaron S. Meurer

unread,
May 7, 2010, 5:51:53 PM5/7/10
to sy...@googlegroups.com
I think it is impossible to use the order of the keywords anyway, because Python puts them in a dictionary, which is unordered.

The order argument seems like it could be simpler (too much typing!). Plus, it is redundant, because you are giving the hints twice. Therefore, I think the best solution is to have something like Chris suggested, where you can give the hints all together and it takes the order from that, so just expand(trig_multinomial=True), or expand(hints=("trig", "multinomial") should be enough. And if you don't care about order, then expand(trig=True, multinomial=True) should be fine (just make it clear in the docstring that this will necessarily be unordered).

Another possibility would be to somehow take advantage of *args, which is ordered.

Aaron Meurer

Vinzent Steinberg

unread,
May 8, 2010, 8:37:13 AM5/8/10
to sympy
On 7 Mai, 23:51, "Aaron S. Meurer" <asmeu...@gmail.com> wrote:
> Another possibility would be to somehow take advantage of *args, which is ordered.

What about expand('mul', 'trig')?

Vinzent

smichr

unread,
May 8, 2010, 9:43:11 AM5/8/10
to sympy


On May 8, 5:37 pm, Vinzent Steinberg
<vinzent.steinb...@googlemail.com> wrote:
> On 7 Mai, 23:51, "Aaron S. Meurer" <asmeu...@gmail.com> wrote:
>
> > Another possibility would be to somehow take advantage of *args, which is ordered.
>
> What about expand('mul', 'trig')?

This works if you just want hints that should be on, but you would
have to do something like require a bool to be entered if you want
some on and some off:

foo(True, 'foo', 'bar', False, 'baz')

But this only works if no hint needs a string value.

======

as to use of the underscore for separation: use the double underscore
for the separator and require that no hint starts or ends with an
underscore.

======

Yes, I meant OptionParser which has built in help...but you would have
to figure out how to get it to parse something other than the
sys.argv[1:].

Aaron S. Meurer

unread,
May 8, 2010, 11:21:42 AM5/8/10
to sy...@googlegroups.com
It could work if the hints are callable objects as Sebastian suggests, though you would have to first dig up the objects in order to call them.

Aaron Meurer

William Ratcliff

unread,
May 13, 2010, 2:44:30 PM5/13/10
to sympy
Any thoughts on speed? When we have REALLY long expressions, we find
that subs (and collect) can both run rather slow. We're interested in
contributing to increasing their speed for common use cases...

Cheers,
William


On May 8, 8:37 am, Vinzent Steinberg

Sebastian

unread,
May 13, 2010, 5:13:34 PM5/13/10
to sympy
Yes, I definitely think it should be possible to make things faster:

* Don't rebuild expressions if nothing is substituted. In the moment
all expressions are rebuild every time. (I think this is somehow
mended by cashing).

* Parallel substitutions. If more than one pattern should be
substituted it currently does this sequential. This is of course
slower and also might lead to unwanted substitutions.

* It is planed to introduce keywords which control the matching
behavior of patterns. If rules that need many more combinations which
have to be tried out are turned off it will make things much faster.

I won't have much time this weekend, but if you have questions
regarding substitution and pattern matching I'll try to answer them as
good as I can...

Sebastian
Reply all
Reply to author
Forward
0 new messages