Thanks for responding. I think most of what I want to say is in
response to Matthew, but let me just mention that I agree with Ronan.
Integer is very different from int. Aside from the obvious difference
with division, every attribute and method of the classes it inherits
from are avaialble on it, giving it a standard interface. This is
fundamental to this discussion. Basic subclasses are Liskov
substitutable. I can write expr.args or expr.atoms or expr.xreplace
without worrying if expr is an Add or a Symbol or an Integer.
Division is a special case of this. This simply is not true for
strings and ints.
By the way, Ronan, which option do you prefer? Or is there another
even better option that only you can see?
On Tue, May 21, 2013 at 1:06 AM, Matthew Rocklin <
mroc...@gmail.com> wrote:
> I've added my thoughts to the wiki. I'm double posting them here. These
> follow Aaron's opinion section which is definitely worth a read.
>
> ### Matthew
>
> I mostly prefer 1
>
> - Small point but while it's fresh in your mind, `MatrixSymbol('x', 1,
> 2).args` can't be () easily. This is because the expression could be
> `MatrixSymbol('x', n, k+m)`. The second two arguments are the shape and in
> general are `Expr`s. We'll need to traverse down these so they'll either
> need to be args or we'll need to do a lot of special casing within
> `MatrixSymbol`. A pull request exists to replace `'x'` with `Symbol('x')`.
> This seems wrong to me because `'x'` here is a name, not a mathematical
> scalar. The right way to do this under 3 would be to create a
> `String(Basic)`. This also feels wrong to me because we're being forced to
> reinvent a class with no new functionality.
OK, you are right about this. MatrixSymbol clearly cannot be a leaf.
Ronan is right when he says that Symbol is not a replacement for str.
The two are unrelated. You can't do Symbol('xyz')[:1] or Symbol('
').join(['a', 'b', 'c']), and Symbol('x') + Symbol('y') != 'x' + 'y'.
But I think it doesn't matter. We don't need a full-fledged string,
just like we don't really need it anywhere that we use Symbol (for
those rare cases where we do do some string processing we just use
Symbol.name). The only aspect of strings that we use in Symbols
(aside from the fact that they are names) is equality, and for that,
they agree: Symbol(a) == Symbol(b) iff a == b.
So my recommendation here goes back to using Symbol as the first
argument of MatrixSymbol. Also, the class should clearly not subclass
from Symbol in this case (I don't remember if it already does).
Perhaps a mathematician's point of view is worth considering here
(this is my point of view, when I am in my graduate student shoes).
When we have a mathematical object like MatrixSymbol that is indexed
by another expression, in this case, the rows and columns, it is often
convenient to consider that object as a function. In this case, a
matrix symbol is a function from NxN -> M, where M is the set of all
matrices over whatever (complex numbers).
This functorial relationship for us means that those are children in
the expression tree. Now, I'm no logician, but to me, it makes sense
to consider a matrix symbol also as a function on the alphabet
(strings). The name we give to a matrix symbol is inconsequential to
what it is, at least mathematically.
Anyway, that analogy may or may not be completely off base, so take it
or leave it.
>
> - In general, I think that having all identifying information within args
> will simplify the code in the long term.
This is where I think you are completely wrong. As I noted in my
first bullet point of my opinions, it seems like it will do this,
because the invariant becomes simpler. But as I hope I demonstrated in
the three fundamental expression recursion algorithms, it will make
things much more complicated. The reason is that we have to check if
things are instances of Basic everywhere, and handle them differently.
Non-Basics don't have Basic interface (in particular, .args, but
really anything that we might want to do). We can't even be sure that
we can do math operations on non-Basics, even if they are Python
builtin numbers, because of the int/int problem (not to mention the
floating point precision issues).
If we go with option 1, what I see happening all throughout SymPy is
code being nested under "if isinstance(expr, Basic)" blocks. Very
little useful code will actually make sense on non-Basics; most if it
will fail with AttributeError or some other exception (if you don't
believe me, remove "arg = sympify(arg)" from the top of any algorithm
and try passing 0 to it).
> It also opens various performance
> improvements. See the following timings from my laptop
>
> ```
> In [2]: timeit Add(x, y, z)
> 100000 loops, best of 3: 2.15 us per loop
>
> In [3]: timeit Add(x, y, z, evaluate=False) # no idea why this is slower
> 100000 loops, best of 3: 4.14 us per loop
>
> In [4]: timeit Basic.__new__(Add, x, y, z)
> 1000000 loops, best of 3: 616 ns per loop
>
> In [5]: timeit (Add, x, y, z)
> 10000000 loops, best of 3: 96.7 ns per loop
> ```
You mentioned this before, and I didn't get it then either. How does
this have anything to do with what the leaves of the expression tree
are?
>
> Having all information in args turns the SymPy Basic into something like an
> s-expression (a glorified tuple). Certainly we don't want to go this far
> (we like lots of things that the Basic object gives to us), but having that
> simplicity enables more advanced manipulations of SymPy expressions with
> greater trust in their accuracy. If someone wrote code that required really
> intense expression manipulation they could switch into tuple mode (or list
> mode for mutability), do lots of manipulations quickly, then switch back and
> re-evaluate. If all information is in args then they could perform these
> transformations with high fidelity. I'm not proposing that this is the
> reason we should switch but rather that this is an example of something that
> is easy with an invariant like "All identifying information is in `.args`"
I claim that this is just as easy (in fact easier, because of the
isinstance(expr, Basic) issue) with the option 3 invariant. You just
treat objects with empty args as leaves. These are the symbolic atoms
from your lisp metaphor.
At the end of the day, *something* has to be a symbolic atom. You
can't recurse forever, until you hit the ones and zeros of the machine
representation and wonder how you can split those down into smaller
entities. I don't see why it has to be a non-Basic type.
>
> - Personally I manipulate SymPy expressions quite a bit. I often do this
> from outside SymPy (Theano, LogPy, Strategies). Having the "all identifying
> information is in `.args`" invariant makes SymPy significantly more
> interoperable if you don't have access to the codebase (or don't want to
> fuss with the codebase).
I don't understand what you mean by "access to the codebase". How are
you doing anything with any SymPy classes if you can't access them?
> Classes like Symbol, Rational, AppliedPredicate
> all end up requiring special attention in each of these endeavors. I'm
> mostly willing to put up with it because I'm familiar with them but
> potential collaborators are often turned off. I believe that
> interoperability is more important than we realize for long term growth.
Well I'd love to hear testimonials from these collaborators. I think
it's important for us as core contributors who are very familiar with
SymPy and its idioms to see how people who are new to it view things.
That will show us what things we are doing that need to be documented
better.
I suspect these people are falling into the trap of seeing the
simplicity of the option 1 invariant over the option 3 invariant,
without noticing that it really leads to more complicated code.
Aaron Meurer