All,
In my spare time, I've been working on implementing a fast pattern matcher that accounts for Associative and Commutative symbols. It's going to be a while before I'm ready to release the code (it needs some serious cleanup), but as of now it is "partly functional". Some notation:
T = set of known patterns. A trie-like data structure is used to preprocess these patterns to aid in fast matching.
s = expression to be matched
x, y = variables
a, b, c = constants
Currently the matcher can determine which patterns in T match s, in the presence of associative and commutative symbols, as long as the pattern is "linear". By linear, I mean no repeated *variable* symbols (`x + y` is acceptable, `x + x` is not). If no associative or commutative symbols are present, it can also determine a match substitution to make `s` equivalent to the pattern.
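To give a feel for what linear matching mod commutativity means, here is a minimal brute-force sketch; this is emphatically not the trie-based implementation, just an illustration on flattened argument tuples, with all names hypothetical:

```python
from itertools import permutations

def match_linear_commutative(pattern, subject, variables):
    """Match a flat commutative expression (tuple of args) against a
    pattern with distinct (linear) variables, by trying argument
    orderings.  Returns one substitution dict, or None.  Brute force
    for illustration only; a real matcher avoids the factorial blowup."""
    if len(pattern) != len(subject):
        return None
    for perm in permutations(subject):
        subst = {}
        ok = True
        for p, s in zip(pattern, perm):
            if p in variables:
                subst[p] = s  # linear: each variable appears at most once
            elif p != s:
                ok = False
                break
        if ok:
            return subst
    return None

# `x + y + b` matching `a + c + b`, with x and y as variables:
print(match_linear_commutative(('x', 'y', 'b'), ('a', 'c', 'b'), {'x', 'y'}))
```

Because the variables are distinct, each slot binds independently, which is exactly what keeps the linear case polynomial in principle.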
Some design questions before I continue work:
- Is there any use for just determining what patterns match, without generating a match substitution? Determining if a match exists is faster than determining a specific substitution. However, in the presence of nonlinear patterns a substitution needs to be found to verify the match is valid. This decision determines how tightly coupled the two algorithms will be.
- Is there a need for nonlinear patterns? I plan to account for them, but they make the algorithm a bit more complicated. Nonlinear AC pattern matching is NP-complete; linear AC pattern matches can be found in polynomial time.
- In the general case, there can be many substitutions that match a specific pattern mod AC. For example, given `t = x*y` and `s = a*b*c`, this can match with {x: a*b, y: c}, or {x: a, y: b*c}, etc. This is easy to account for, and can be done lazily. API-wise, are there cases where generating more than one matching substitution is needed?
- The algorithm lazily determines matches. Currently preference is given to matching variables before constants, but it could go the other way around. E.g., if T = {a + b, x + b} and s = a + b, then the matcher will find `x + b` before `a + b`. I'd rather not make this configurable, as it complicates things. The question is: do we want the most general match first, or the most specific?
- At the moment, the code is completely free of any dependence on SymPy and can function on its own. As such, I'm planning on making it a separate library that SymPy can use as a dependency. I understand we were against that in the past, but this may be changing? I'd rather not maintain two separate codebases.
- There's a `commutative` assumption in sympy, but there doesn't seem to be an `associative` one. How can I check if a function/operator is associative? Does this information need to be kept elsewhere?
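On that last point, one possible (purely hypothetical) interim answer is for the matcher to keep its own table of operator properties outside SymPy's assumptions system. The names and flags below are illustrative assumptions, not existing SymPy API:

```python
# Hypothetical registry: SymPy tracks `commutative` as an assumption,
# but has no `associative` one, so a standalone matcher could keep
# operator properties in its own table.
OPERATOR_PROPS = {
    'Add':    {'associative': True,  'commutative': True},
    'Mul':    {'associative': True,  'commutative': True},
    'MatMul': {'associative': True,  'commutative': False},
}

def is_associative(op_name):
    # Unknown operators conservatively default to non-associative.
    return OPERATOR_PROPS.get(op_name, {}).get('associative', False)

print(is_associative('MatMul'))  # True
print(is_associative('Pow'))     # False
```

A registry like this also keeps the matcher decoupled from SymPy, which matters if it becomes a separate library.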
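For what it's worth, the `t = x*y`, `s = a*b*c` case can be sketched as a lazy generator over subset splits. This only illustrates the lazy enumeration of substitutions, not the actual trie-based algorithm, and all names are hypothetical:

```python
from itertools import combinations

def splits_two_vars(args):
    """Lazily yield every substitution matching a linear pattern x*y
    against a flat commutative/associative product with the given
    args, e.g. a*b*c -> {x: a, y: b*c}, {x: b, y: a*c}, ...
    Tuples stand in for rebuilt products."""
    args = tuple(args)
    n = len(args)
    for k in range(1, n):                      # size of x's group
        for idx in combinations(range(n), k):
            x = tuple(args[i] for i in idx)
            y = tuple(args[i] for i in range(n) if i not in idx)
            yield {'x': x, 'y': y}

gen = splits_two_vars(('a', 'b', 'c'))
print(next(gen))  # first substitution, produced lazily
```

Since it is a generator, a caller who only needs one match never pays for enumerating the rest, which is the point of doing this lazily.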
> Nonlinear, AC pattern matching is NP complete. Linear AC pattern matches
> can be found in polynomial time.
Interesting. Why is that?
>The question is, do we want the most general match first, or the
> most specific?
Whichever one is the correct behavior for a dispatch system. I *think*
that means specific, but I could be getting it backwards.
I'm assuming that the whole point of having a set of patterns rather
than a single pattern at a time is so you can do this.
You mean expressions like Add(x, x, evaluate=False)? I think there should be a way to check whether an expression is in a canonical order.
Maybe it is preferable that the first lazily generated substitution appear in the same order as Mathematica's pattern matching. It would make Mathematica code more easily portable to SymPy.
What kind of AST are you matching?
One last complication: have you given any thought to how this could be extended to an AST node containing a PermutationGroup instance determining how its arguments commute/anticommute? That could have some applications in the tensor module, but I guess it's hard or impossible to find a polynomial algorithm for that.
By the way, are you using Wild objects, or are you specifying wilds by an extra parameter?
In unify.rewriterule, you just put in two expressions and specify which ones are variables (i.e. wilds) in another parameter. Furthermore, there is an option to supply a filtering lambda function and assumptions, so that matches must be compatible with a test function and/or SymPy's new-style assumptions.
Awesome, thanks for doing this.
Another major use case is selecting the "best" match. In this case we might also want to control the order in which matches come out of the sequence so that matches that we think are best come first.
In my ideal world I might be able to decide the order in which we traverse the tree of possible matches. I don't know exactly how your algorithm works but, in some algorithms, we obtain a set of partial matches at each step and then have to decide which branch to pursue first. I've found that accepting a user-provided objective function is useful. It allows the user to specify a greedy search strategy.
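I don't know your internals either, but the objective-driven traversal I have in mind might look roughly like this; it is a generic sketch, with every name hypothetical:

```python
import heapq

def best_first_matches(initial, expand, objective):
    """Best-first traversal of a tree of partial matches.
    expand(state) yields (child, is_complete) pairs; objective(state)
    scores a partial match (lower is explored first).  A user-supplied
    objective thus controls which complete matches come out of the
    sequence first, i.e. a greedy search strategy."""
    counter = 0  # tie-breaker so heapq never compares states directly
    heap = [(objective(initial), counter, initial)]
    while heap:
        _, _, state = heapq.heappop(heap)
        for child, is_complete in expand(state):
            if is_complete:
                yield child
            else:
                counter += 1
                heapq.heappush(heap, (objective(child), counter, child))

# Toy usage: bind two slots from {1, 2, 3}, preferring small partial sums.
def expand(state):
    for v in (1, 2, 3):
        child = state + (v,)
        yield child, len(child) == 2

first = next(best_first_matches((), expand, sum))
print(first)  # (1, 1): the branch with the best partial score is expanded first
```

The key design point is that the objective only ranks partial states; the match generation itself stays lazy.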
Is it a basic building block for other algorithms to use? Then favor
predictable performance over features. Something that has unpredictable
performance isn't very attractive to an algorithm author.
Are you aiming for something that can solve as many situations as
possible? Then aim for more features.
I'm somewhat sceptical whether just pattern matching will be the right
approach, but I suspect you aren't after this one anyway :-)
I don't know if your AC matcher is intended to be used for (say) arithmetic expressions or something else. But if arithmetic -- your stuff also needs to deal with identities. In that case, if you don't handle identities, your matcher becomes far less interesting.
There's a long history of pattern matching "fast" including work by Richard Jenks, (Scratchpad, predecessor of Axiom).
The literature accessible to me on Associative-Commutative-Identity many-to-one pattern matching is lacking, or I'm not the best at finding it. After some thinking, the formation of matches should still work mod identities, if they are as I described above.
I'm not certain that the lack of this constitutes "far less interesting", but I can see how it could be useful.
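To sketch what I mean by the formation of matches still working mod identities: the subset-splitting view extends by letting a variable bind the operator's identity element. This is an illustration under assumed names, not the real code:

```python
from itertools import combinations

def splits_with_identity(args, identity):
    """Match a linear pattern x*y against a flat commutative product,
    additionally allowing a variable to bind the identity element
    (1 for *, 0 for +).  So x*y matches the single argument `a` as
    both {x: a, y: 1} and {x: 1, y: a}.  Illustrative sketch only."""
    args = tuple(args)
    n = len(args)
    for k in range(0, n + 1):  # empty groups collapse to the identity
        for idx in combinations(range(n), k):
            x = tuple(args[i] for i in idx) or (identity,)
            y = tuple(args[i] for i in range(n) if i not in idx) or (identity,)
            yield {'x': x, 'y': y}

subs = list(splits_with_identity(('a',), 1))
print(subs)
```

The only change from the identity-free case is allowing the empty group, so the enumeration machinery is untouched.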
> There's a long history of pattern matching "fast" including work by Richard Jenks, (Scratchpad, predecessor of Axiom).
The papers I've read have been almost exclusively from the theorem proving world, specifically the work of Jim Christian on HIPER and Leo Bachmair on REVEAL, which only cover matching mod AC. I only found one relevant paper by Jenks, and it had only 3 paragraphs on pattern matching (just describing the syntax of it in his system).
A pattern compiler
I'm sure the code in macsyma/maxima would be quite useful, but being as lisp-illiterate as I am, I can't make much use of it. Thanks for the tip though.
- Jim
Awesome.
> The papers I've read have been almost exclusively from the theorem proving
> world.
I think you should be mostly fine working off these.
Essentially it's all tree matching of some kind. Things will start to
diverge as soon as domain specifics start to matter; it would be nice to
have a not too domain-specific basic building block and add the various
strategies on top of that.
--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To post to this group, send email to sy...@googlegroups.com.
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/345a4a58-a44f-48a4-8f8e-64b920c1d65e%40googlegroups.com.
For that, firstly, you'd need to understand what exactly each design
decision in Peter's (or anybody else's) code was intended to achieve.
For that, merely learning Lisp is not enough, you need to have mastered
it - which is a few years of experience away for most of us (except for
you, I assume, and maybe the odd one or two lurkers on this list).
Second, you need to understand all about how to design the equivalents
in Python, dropping those design patterns that don't work in Python, and
replacing them with something that works as well.
(For example, every macro would have to be replaced or redone; is there anything relevant left if you take the macros out of the useful Lisp work?)
So you
need to be a Python master as well. Given your record I doubt that you
are, I certainly am not, and those people around here that I'd consider
Python masters, I doubt that any of them is also a Lisp master.
Third, the resulting code needs to be maintainable by people who do not
know anything about Lisp, and barely enough of Python to become a GSoC
student.
I have serious doubts that Lisp style is going to work for that, at all,
even if you let a Lisp And Python Master do the translation.
So right now, my point of view is that trying to get that (doubtlessly
very valuable) Lisp experience into SymPy, while it could do all kinds
of good, is probably not practically possible.
Unless somebody can delineate a clear path to overcome the various obstacles I just enumerated.
> I highly recommend it. In fact, I would go further than that, and say
> that I would expect you to re-invent much of what he writes out,
> or alternatively, have a severely deficient program.
Sure, but if his book makes you learn Lisp, the presentation is heavily
biased towards Lisp, and much of what he writes won't be transferable.
My recommendation would be to reinvent first (and discover how the
Python patterns for that particular field work), then read the Lisp
equivalents, then transfuse the ideas.
The point being that once enough experience with the Python patterns is
available, it's easier to see what to transfer and what to better leave
untransferred.
YMMV :-)
Regards,
Jo
<big snip>
Another question is: how can you teach the pattern matcher that a function maps to an identity? E.g. cos(a)*x can match x with a = 0, or x**a*y can match y with a = 0.
On 30.11.2014 at 06:53, Richard Fateman wrote:
First-class functions are easy in Python (easier than in C/C++).
What you can't do easily is macros. There's a "lambda" construct but
it's a bit too clunky to be useful for the small helpful things, so if
you can't do it in a function, it's no longer good Python design.
Partial evaluation isn't doable at all.
The salient point being "generally" here.
I had several instances of MEGO trying to understand why the CL standard
library did what it does. (Particularly the parts that enable multiple
dispatch - those facilities made me think of a toolbox full of loaded
and unlocked guns.)
Also, you need to be 150% clear about the semantics of quote. I found it
extremely hard to deal with that - the theory was easy enough, yet in
practice I found myself tripping over it at every other opportunity, and
it didn't get better with practice.
> 2. If they are not, it is probably a crappy unnecessary use of macros,
> written by a naive programmer.
:-)
> 3. Look at Norvig's book and see how often he uses macros. Rarely.
> And see how long they are. (2 lines?)
I'm unconcerned about Peter's use of macros.
What I'm concerned about is the code that they replace, and how well
that code could be abstracted out on the Python side.
What kind of
design we'd need, whether it would fit with the rest of the design. What
I fear most is a downhill creep in quality - each time you build a
workaround for a macro you can't do in Python, you lose just a small
amount of design quality, but if you have a dozen macros, the overall
design could have become so bad it's unusable.
>> Third, the resulting code needs to be maintainable by people who do not
>> know anything about Lisp, and barely enough of Python to become a GSoC
>> student.
>
> This is a real problem. If you have unskilled and unknowledgeable personnel
> "maintaining" a code base, that's a tragedy in the works. It means that
> someone
> who is skilled and knowledgeable will have to go over the code and fix it.
Fortunately, the unskilled people who do not become skilled tend to go
away in a volunteer project.
It's still something to look out for. Some of our codebase has become
unmaintainable already (I'm looking at you, runtests.py - though that
specific file does not suffer from design problems but from too many
feature additions and too little refactoring).
> I suppose I could write programs that are challenging to understand and
> to translate; I understand Python is lacking functional objects with more
> than one argument;
Actually that's not the case.
> I could write stuff relying on multiprocessing which
> presumably would differ. But that's not run-of-the-mill.
Agreed.
>> So right now, my point of view is that trying to get that (doubtlessly
>> very valuable) Lisp experience into SymPy, while it could do all kinds
>> of good, is probably not practically possible.
>
> All I can say is, try to read Norvig's book. My sincere belief is that
> even if you never write a single line of Lisp, it will improve your
> programming
> in Python, because you will think about problems (like pattern matching)
> more clearly.
Quite possible.
My suggestion would be to try one's hand, then read Peter's book, then
integrate the new ideas.
YMMV.
>> Sure, but if his book makes you learn Lisp, the presentation is heavily
>> biased towards Lisp, and much of what he writes won't be transferable.
>>
> Totally disagree.
I can imagine.
Let me say the more it makes you learn Lisp, the less it's probably
going to be useful if you don't do Lisp. Maybe I was overestimating how
much it makes you learn Lisp - a dose of Lisp is certainly useful, the
full Kool-Aid can become a liability (due to overspecialization, and
because Kool-Aid is generally problematic unless you can switch entirely).
> There is an old saying, 2 weeks in the computer lab can save 1 hour in
> the library.
True when learning tried-and-true issues.
False when it comes to trying and judging whether it's true.
We can certainly benefit from algorithms.
Less so from "doing things the Lisp way" if we're working in Python.
The problem is that separating algorithms from "the Lisp way" isn't
always easy. And an important part of writing a framework in programming
language X is "doing it the X way" (says he who has participated in such
projects for multiple values of X).
Regards,
Jo
>> Partial evaluation isn't doable at all.
>>
> I don't know what you mean by partial evaluation.
Difference between lazy and eager evaluation.
> Is this something you
> think that Lisp does? Maybe confused about quote?
Quote is Lisp's way of deferring evaluation, yes.
Well, sort of, unevaluated expressions are also lists.