Improving subs (issue 2026)

Aaron Meurer

unread,

Jul 7, 2011, 10:54:20 PM7/7/11

to sy...@googlegroups.com

Hi everyone.

As many of you may know, the main thing blocking the merge of my work
on the Risch algorithm (see my integration3 branch) is not any
deficiency in the algorithm, though there are several parts that are
still not implemented, but the lack of a so called "atomic
substitution" framework. The relevant issue here is 2026
(http://code.google.com/p/sympy/issues/detail?id=2026).

Basically, the following breaks the preprocessing code in risch_integrate:

In [1]: exp(2*x).subs(exp(x), y)
Out[1]:
2
y

I need a way for subs to behave exactly, so the above would return
exp(2*x). Thus, I have disabled this completely in my integration3
branch, but this is only a temporary solution, as there is a lot of
code that relies on this behavior (especially in the series/limits
code), and it would be a regression anyway.

So there needs to be a way to do

>>> exp(2*x).subs(exp(x), y, atomic=True)
exp(2*x)

Now, as it turns out, it has come up in other places that people want
control over the way that subs works in other ways. In the issue, I
talk about something called integer_powers, which would work like

>>> exp(2*x).subs(exp(x), y, integer_powers=True)
y**2
>>> exp(x).subs(exp(2*x), y, integer_powers=True)
exp(x)

In other words, it does not do power manipulation in the replacement
unless the resulting power is an integer. This is needed in some
places such as the heurisch algorithm to ensure that the resulting
expression will be a polynomial (actually, a rational function) in the
substitution variable. In addition, there is also some concern about
the assumptions validity of certain algebraic substitution rules. See
issues 2081 and 2552.

So in the interest of doing this right, I think there needs to be some
kind of hints mechanism to subs. My question is, what do you think
would be the best way to implement this? Presently the expand
function has something like this, but I'm not really convinced that
the way that it's implemented is a very good one.

Here's (roughly) the way that subs works now: Basic defines two
methods, .subs and ._eval_subs. Basic.subs() is of course the user
level function that everyone calls, and pretty much no subclass of
Basic overrides it. The actual substitution happens in ._eval_subs,
which is also responsible for recursing the substitution down the
.args. Basic has a simple implementation, but most classes end up
overriding it (for example, exp has overridden it to allow the above
fancy algebraic substitution).

What's the best way to implement the various hints I want to add to
.subs()? A few things to take into consideration:

- .expand() works, as I mentioned earlier, by having
._eval_expand_hint() methods. I don't think this is the best way, so
that's why I'm asking here to see if anyone has any better ideas.

- It should remain backwards compatible with any class that defines
._eval_subs(self, old, new). Unfortunately, there wasn't much
foresight when this was originally designed, so the protocol does not
call for any *args or **kwargs. However, that doesn't necessarily
weigh those options out, as we could easily make Basic.subs() check
for an old style definition and ignore hints in that case.

- I haven't looked at it, but we might be able to implement at least
atomic substitution entirely in Basic (no class need override any
methods to get it to work). This is because it is so simple that the
default agnostic method might be able to do it entirely. The rule for
atomic substitution by the way is that expr.subs(old, new,
atomic=True) should replace old with new in expr if and only if old is
in expr.atoms(old.__class__).

So I'm open to any ideas on how to implement this, API-wise.

Also, Chris, did you start this at all in any of your branches and/or
are you willing to help with this?

Aaron Meurer

Chris Smith

unread,

Jul 8, 2011, 12:39:13 AM7/8/11

to sy...@googlegroups.com

On Thu, Jul 7, 2011 at 9:54 PM, Aaron Meurer <asme...@gmail.com> wrote:

Hi everyone.

As many of you may know, the main thing blocking the merge of my work
on the Risch algorithm (see my integration3 branch) is not any
deficiency in the algorithm, though there are several parts that are
still not implemented, but the lack of a so called "atomic
substitution" framework. The relevant issue here is 2026
(http://code.google.com/p/sympy/issues/detail?id=2026).

Basically, the following breaks the preprocessing code in risch_integrate:

In [1]: exp(2*x).subs(exp(x), y)
Out[1]:
2
y

I need a way for subs to behave exactly, so the above would return
exp(2*x). Thus, I have disabled this completely in my integration3
branch, but this is only a temporary solution, as there is a lot of
code that relies on this behavior (especially in the series/limits
code), and it would be a regression anyway.

So there needs to be a way to do

>>> exp(2*x).subs(exp(x), y, atomic=True)
exp(2*x)

I would call it `exact`, not `atomic` since Atom already has a precise meaning in sympy. (cf issue 2026.)

Now, as it turns out, it has come up in other places that people want
control over the way that subs works in other ways. In the issue, I
talk about something called integer_powers, which would work like

>>> exp(2*x).subs(exp(x), y, integer_powers=True)
y**2
>>> exp(x).subs(exp(2*x), y, integer_powers=True)
exp(x)

I would call this `extractive` and allow only changes that extract_multiplicatively or extract_additively allow as applicable. Those extractive functions are suppose to only do extractions that `retain the original form` so a fraction shouldn't be extracted from an integer. Those are changes that I implemented in t2 which never made it to sympy. This would also apply to powers since the substitution code for powers would only work if the exponent of the old pattern were multiplcatively extractable from the expression.

My t2 had exact and atomic (sympy's def of atomic) implemented and *explicit algebraic* implemented which you reviewed but didn't want to get ready for 0.7 ( http://code.google.com/p/sympy/issues/detail?id=2026#c5 ) In trying to resurrect that I did a general cleanup in pull request https://github.com/sympy/sympy/pull/234 that stalled with no further suggestions from anyone but no clear +1. [Of course I am biased, but I like the organization of that request and would have used the hint given to subs to call the correct _eval_subs_foo routine. (If you read the final comment I made maybe you will understand the organization and whether Ronan's comments are valid or not.)]

I think 3 _eval_subs_foo routines would be needed: _eval_subs_atomic, _eval_subs_exact, _eval_subs_extract ... and maybe the existing eval_subs if the extract doesn't match the expected current behavior. Also, `exact` and `atomic` might be made the same with a slight overhead by checking if expr is Atom or Function instead of just Atom.

If you like what I've already done I might be able to be of help. Check out my t2 branch and https://github.com/sympy/sympy/pull/234

Aaron Meurer

unread,

Jul 8, 2011, 12:51:58 AM7/8/11

to sy...@googlegroups.com

On Thu, Jul 7, 2011 at 10:39 PM, Chris Smith <smi...@gmail.com> wrote:
>
>
> On Thu, Jul 7, 2011 at 9:54 PM, Aaron Meurer <asme...@gmail.com> wrote:
>>
>> Hi everyone.
>>
>> As many of you may know, the main thing blocking the merge of my work
>> on the Risch algorithm (see my integration3 branch) is not any
>> deficiency in the algorithm, though there are several parts that are
>> still not implemented, but the lack of a so called "atomic
>> substitution" framework. The relevant issue here is 2026
>> (http://code.google.com/p/sympy/issues/detail?id=2026).
>>
>> Basically, the following breaks the preprocessing code in risch_integrate:
>>
>> In [1]: exp(2*x).subs(exp(x), y)
>> Out[1]:
>> 2
>> y
>>
>> I need a way for subs to behave exactly, so the above would return
>> exp(2*x). Thus, I have disabled this completely in my integration3
>> branch, but this is only a temporary solution, as there is a lot of
>> code that relies on this behavior (especially in the series/limits
>> code), and it would be a regression anyway.
>>
>> So there needs to be a way to do
>>
>> >>> exp(2*x).subs(exp(x), y, atomic=True)
>> exp(2*x)
>
> I would call it `exact`, not `atomic` since Atom already has a precise
> meaning in sympy. (cf issue 2026.)

I originally called this exact, but I though atomic was a better name
because of the invariant with .atoms(). Actually, thinking about
this, exact might still be a better name.

>>
>> Now, as it turns out, it has come up in other places that people want
>> control over the way that subs works in other ways. In the issue, I
>> talk about something called integer_powers, which would work like
>>
>> >>> exp(2*x).subs(exp(x), y, integer_powers=True)
>> y**2
>> >>> exp(x).subs(exp(2*x), y, integer_powers=True)
>> exp(x)
>
> I would call this `extractive` and allow only changes that
> extract_multiplicatively or extract_additively allow as applicable. Those
> extractive functions are suppose to only do extractions that `retain the
> original form` so a fraction shouldn't be extracted from an integer. Those
> are changes that I implemented in t2 which never made it to sympy. This
> would also apply to powers since the substitution code for powers would only
> work if the exponent of the old pattern were multiplcatively extractable
> from the expression.

Well, integer_powers isn't that great of a name, but extractive
doesn't seem to be much better imo. But anyway, the actual names are
irrelevant at this point, as they would be easy to change.

Well, you need to consider scalability. Right now, there are three
methods, but it could grow. I already think we might need a forth
hint, which would probably be called force, which ignores assumptions
(e.g., in issue 2552, (sqrt(x)*sqrt(y)).subs(x*y, z, force=True) would
produce sqrt(z) even if x and y are assumed to be complex). And I
could easily imagine the use of even more fine grained control.

expand() does this, with a lot of hints, and it's not a very good
system in my opinion. Unlike expand(), subs wouldn't be called in
multiple passes for each hint, but nonetheless, I still see the system
as one that could be improved (though how, I am still not yet sure).

Aaron Meurer

> If you like what I've already done I might be able to be of help. Check out
> my t2 branch and https://github.com/sympy/sympy/pull/234
>

> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To post to this group, send email to sy...@googlegroups.com.
> To unsubscribe from this group, send email to
> sympy+un...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sympy?hl=en.
>

Vinzent Steinberg

unread,

Jul 8, 2011, 12:09:42 PM7/8/11

to sympy

On Jul 8, 6:51 am, Aaron Meurer <asmeu...@gmail.com> wrote:
> expand() does this, with a lot of hints, and it's not a very good
> system in my opinion.

I think the problem with expand it that it does a lot more at once
than it should be doing, and that it is hard to use otherwise. I don't
think the interface is fundamentally broken.

> Unlike expand(), subs wouldn't be called in
> multiple passes for each hint, but nonetheless, I still see the system
> as one that could be improved (though how, I am still not yet sure).

subs() could have a "method='method'" keyword argument.

Vinzent

Aaron Meurer

unread,

Jul 8, 2011, 4:51:24 PM7/8/11

to sy...@googlegroups.com

On Fri, Jul 8, 2011 at 10:09 AM, Vinzent Steinberg
<vinzent....@googlemail.com> wrote:
> On Jul 8, 6:51 am, Aaron Meurer <asmeu...@gmail.com> wrote:
>> expand() does this, with a lot of hints, and it's not a very good
>> system in my opinion.
>
> I think the problem with expand it that it does a lot more at once
> than it should be doing, and that it is hard to use otherwise. I don't
> think the interface is fundamentally broken.

Well, that problem would not show up here, since subs should only do
one pass, whereas expand does a pass for each expansion hint. Indeed,
we need to be careful that it does indeed only do exactly one pass, so
we don't get stuff like

>>> sin(x).subs(sin(x), 2*sin(x), **somehints)
4*sin(x)

But the problem with the way expand works with the _eval_expand_hint
methods is that each _eval_expand_hint method is responsible for
passing all the hints through recursively, and in such a way that it's
own hint is stopped from being passed through at the base case, to
prevent infinite recursion. The deep keyword also causes a lot of
problems here if it is not handled very carefully (if we ever decide
to implement a deep hint to subs, it should be implemented just the
same as any other hint).

If we do _eval_subs_hint() methods, there would be exactly the same
problem. To me, it's a very unclean way of doing things, and since
I'm implementing these hints almost from scratch, I'd like to do it
better. I say almost from scratch because we would need to maintain
backwards compatibility with older _eval_subs(old, new) (no **kwargs)
methods, which is perhaps another reason that we cannot use the expand
method here.

There's got to be a better object oriented way to do this.

>
>> Unlike expand(), subs wouldn't be called in
>> multiple passes for each hint, but nonetheless, I still see the system
>> as one that could be improved (though how, I am still not yet sure).
>
> subs() could have a "method='method'" keyword argument.
>
> Vinzent
>

I don't think that would be general enough. It needs to be
subs(**hints). I listed four types of substitution in the OP
(including the catch-all "algebraic", which tries to be as smart as
possible), but I could easily imagine more fine grained control. For
example, suppose we implement a substitution method that first tries
to rewrite expr in terms of old before substituting. This would
require making rewrite smarter, but it would work something like:

>>> cos(x).subs(sin(x), y, rewrite=True)
(1 - y**2)**(1/2)

But rewrite=True should be mutually exclusive to some other option,
say integer_powers:

>>> sqrt(1 - sin(x)**2).subs(cos(x)**2, y, rewrite=True, integer_powers=False)
sqrt(y)
>>> sqrt(1 - sin(x)**2).subs(cos(x)**2, y, rewrite=True, integer_powers=True)
sqrt(1 - sin(x)**2)

The only hint that I think might not make sense to combine with other
hints is exact, which should turn all other hints off. Because of
this, and the fact that I think exact substitution can be implemented
entirely in Basic without any method overriding necessary in
subclasses (basically, use the current Basic._eval_subs), I think
maybe exact should be completely separate from the rest of the subs
hints.

Or maybe it would make sense to use some kind of hints manager for
subs like Chris started to implement for expand().

Aaron Meurer

Brian Granger

unread,

Jul 9, 2011, 1:29:44 PM7/9/11

to sy...@googlegroups.com

> As many of you may know, the main thing blocking the merge of my work
> on the Risch algorithm (see my integration3 branch) is not any
> deficiency in the algorithm, though there are several parts that are
> still not implemented, but the lack of a so called "atomic
> substitution" framework. The relevant issue here is 2026
> (http://code.google.com/p/sympy/issues/detail?id=2026).
>
> Basically, the following breaks the preprocessing code in risch_integrate:
>
> In [1]: exp(2*x).subs(exp(x), y)
> Out[1]:
> 2
> y
>
> I need a way for subs to behave exactly, so the above would return
> exp(2*x). Thus, I have disabled this completely in my integration3
> branch, but this is only a temporary solution, as there is a lot of
> code that relies on this behavior (especially in the series/limits
> code), and it would be a regression anyway.

I have always thought that subs should not know about any mathematical
relationships, but should behave as you are proposing (atomic or
exact=True). In my mind, subs is a foundation that can be used to
build more advanced pattern matching and rule capabilities. But those
more advanced rules (such as done by power) should be in subs itself,
but in that higher level. Thus.

I am +1 on the exact or atomic keyword to subs (I prefer exact).

I am ++1 on having that be the default behavior

Cheers,

Brian

> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To post to this group, send email to sy...@googlegroups.com.
> To unsubscribe from this group, send email to sympy+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
>
>

--
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgra...@calpoly.edu and elli...@gmail.com

Vinzent Steinberg

unread,

Jul 9, 2011, 2:04:41 PM7/9/11

to sympy

On Jul 8, 10:51 pm, Aaron Meurer <asmeu...@gmail.com> wrote:
> Or maybe it would make sense to use some kind of hints manager for
> subs like Chris started to implement for expand().

Polys has also a nice option manager featuring dependencies and such.

Vinzent

Ronan Lamy

unread,

Jul 9, 2011, 2:48:52 PM7/9/11

to sy...@googlegroups.com

Le samedi 09 juillet 2011 à 10:29 -0700, Brian Granger a écrit :
> > As many of you may know, the main thing blocking the merge of my work
> > on the Risch algorithm (see my integration3 branch) is not any
> > deficiency in the algorithm, though there are several parts that are
> > still not implemented, but the lack of a so called "atomic
> > substitution" framework. The relevant issue here is 2026
> > (http://code.google.com/p/sympy/issues/detail?id=2026).
> >
> > Basically, the following breaks the preprocessing code in risch_integrate:
> >
> > In [1]: exp(2*x).subs(exp(x), y)
> > Out[1]:
> > 2
> > y
> >
> > I need a way for subs to behave exactly, so the above would return
> > exp(2*x). Thus, I have disabled this completely in my integration3
> > branch, but this is only a temporary solution, as there is a lot of
> > code that relies on this behavior (especially in the series/limits
> > code), and it would be a regression anyway.
>
> I have always thought that subs should not know about any mathematical
> relationships, but should behave as you are proposing (atomic or
> exact=True). In my mind, subs is a foundation that can be used to
> build more advanced pattern matching and rule capabilities. But those
> more advanced rules (such as done by power) should be in subs itself,
> but in that higher level. Thus.

I agree. We need some way of performing direct replacements, based only
on the structure of the expression. We also need a way to transform
expressions while preserving their mathematical meaning. Currently,
subs() tries to do both, which causes most of the problems with it.

>
> I am +1 on the exact or atomic keyword to subs (I prefer exact).
>
> I am ++1 on having that be the default behavior

-1 on adding keyword arguments. These are different operations that
require different interfaces and implementations. Lumping together
distinct functions under the same name via keyword switches always
creates a mess.

Aaron Meurer

unread,

Jul 9, 2011, 4:00:42 PM7/9/11

to sy...@googlegroups.com

I am actually -1 on making exact the default. The implementation
would be *very* dumb. None of the following would work:

(x*y*z).subs(x*y, 42, exact=True)
(x + y + z).subs(x + y, 42, exact=True)
(x**4).subs(x**2, 42, exact=True)
(x**2*y).subs(x*y, 42, exact=True)
(2*x + y).subs(x + y, 42, exact=True)

This is because exact subs *only* works if old is in
expr.atoms(old.__class__). So for example, in the first one,
(x*y*z).atoms(Mul) is set([x*y*z]), which does not contain x*y, so it
would not do anything.

Aaron Meurer

unread,

Jul 9, 2011, 4:06:19 PM7/9/11

to sy...@googlegroups.com

Yes, exactly. I actually need both in my Risch code: a way to do very
precise exact substitution of terms I find using .atoms(), and a
method that tries to be as mathematically smart as possible to "get a
term back in the expression" regardless of its form.

The former is needed for the preprocessing step, where I only want to
substitute dummy symbols for exactly the expressions I want, and the
latter is for after integration, where I often need to do a back
substitution to make the integral look like the integrand. For
example, to integrate 2**x, it first has to be converted to
exp(x*log(2)). But I try to replace it back after integration, so the
integral has 2**x like the user expects. If the integrand had 2**x
and the integral has exp(2*x*log(x)), it won't work without fancy
power substitution that recognizes that as (2**x)**2. I will also
need ways to convert trig expressions back and forth to each other
once I have that implemented.

>>
>> I am +1 on the exact or atomic keyword to subs (I prefer exact).
>>
>> I am ++1 on having that be the default behavior
>
> -1 on adding keyword arguments. These are different operations that
> require different interfaces and implementations. Lumping together
> distinct functions under the same name via keyword switches always
> creates a mess.