Symbolic integrator using a neural network


Aaron Meurer

unread,
Sep 27, 2019, 2:48:21 PM
to sympy
There's a paper under review for ICLR 2020 on training a neural network
to do symbolic integration. They claim that it outperforms Mathematica
by a large margin. Machine learning papers can sometimes make
overzealous claims, so scepticism is in order.

https://openreview.net/pdf?id=S1eZYeHFDS

They don't seem to post any code. The paper is in double-blind review,
so maybe it will be available later. Or maybe it is available now and
I don't see it. If someone knows, please post a link here.

They do cite the SymPy paper, but it's not clear if they actually use SymPy.

I think it's an interesting concept. They claim that they generate
random functions and differentiate them to train the network. But I
wonder if one could instead take a large pattern matching integration
table like RUBI and train it on that, and produce something that works
better than RUBI. The nice thing about indefinite integration is it's
trivial to check if an answer is correct (just check if
diff(integral(f)) - f == 0), so heuristic approaches that can
sometimes give nonsense are tenable, because you can just throw out
wrong answers.
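In SymPy that check is a one-liner. A minimal sketch (the integrand here is just an illustrative choice):

```python
from sympy import symbols, sin, integrate, diff, simplify

x = symbols('x')
f = x*sin(x)                      # illustrative integrand

F = integrate(f, x)               # candidate antiderivative
print(simplify(diff(F, x) - f))   # 0 means the answer is correct
```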

I'm also curious (and sceptical) on just how well a neural network can
"learn" symbolic mathematics and specifically an integration
algorithm. Another interesting thing to do would be to try to train a
network to integrate rational functions, to see if it can effectively
recreate the algorithm (for those who don't know, there is a complete
algorithm which can integrate any rational function). My guess is that
this sort of thing is still beyond the capabilities of a neural
network.
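For reference, SymPy already implements the complete rational-function algorithm, and any answer can be verified by differentiating it. A quick sketch with an arbitrary illustrative integrand:

```python
from sympy import symbols, integrate, diff, simplify, apart

x = symbols('x')
f = (x**2 + 1)/(x**3 - x)        # illustrative rational integrand

print(apart(f, x))               # partial-fraction decomposition of f
F = integrate(f, x)              # handled by the rational algorithm
assert simplify(diff(F, x) - f) == 0   # round-trip check passes
```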

Aaron Meurer

Vishesh Mangla

unread,
Sep 27, 2019, 2:50:04 PM
to sy...@googlegroups.com

Integration by parts definitely needs ML.

 

Sent from Mail for Windows 10

--

You received this message because you are subscribed to the Google Groups "sympy" group.

To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CAKgW%3D6LfA6q0LLUNF_VM29hqtX4tPHZuJeuKEChn5AeMJWtaBg%40mail.gmail.com.

 

Ondřej Čertík

unread,
Sep 28, 2019, 1:56:23 AM
to sympy
On Fri, Sep 27, 2019, at 12:48 PM, Aaron Meurer wrote:
> There's a review paper for ICLR 2020 on training a neural network to
> do symbolic integration. They claim that it outperforms Mathematica by
> a large margin. Machine learning papers can sometimes make overzealous
> claims, so scepticism is in order.
>
> https://openreview.net/pdf?id=S1eZYeHFDS
>
> The don't seem to post any code. The paper is in double blind review,
> so maybe it will be available later. Or maybe it is available now and
> I don't see it. If someone knows, please post a link here.
>
> They do cite the SymPy paper, but it's not clear if they actually use SymPy.

They wrote:

"The validity of a solution itself is not provided by the model, but by an external symbolic framework (Meurer et al., 2017). "

So that seems to suggest they used SymPy to check the results.

>
> I think it's an interesting concept. They claim that they generate
> random functions and differentiate them to train the network. But I
> wonder if one could instead take a large pattern matching integration
> table like RUBI and train it on that, and produce something that works
> better than RUBI. The nice thing about indefinite integration is it's
> trivial to check if an answer is correct (just check if
> diff(integral(f)) - f == 0), so heuristic approaches that can
> sometimes give nonsense are tenable, because you can just throw out
> wrong answers.
>
> I'm also curious (and sceptical) on just how well a neural network can
> "learn" symbolic mathematics and specifically an integration
> algorithm. Another interesting thing to do would be to try to train a
> network to integrate rational functions, to see if it can effectively
> recreate the algorithm (for those who don't know, there is a complete
> algorithm which can integrate any rational function). My guess is that
> this sort of thing is still beyond the capabilities of a neural
> network.

I saw this paper today too. My main question is whether their approach is better than Rubi (say in Mathematica, as it doesn't yet work 100% in SymPy). They show that their approach is much better than Mathematica, but so is Rubi.

The ML approach seems like brute force. So is Rubi, so it's fair to compare ML with Rubi. On the other hand, I feel it's unfair to compare brute force with an actual algorithm, such as Risch.

Ondrej

Aaron Meurer

unread,
Sep 28, 2019, 2:30:30 AM
to sympy
It actually isn't clear to me yet that they've shown it. I want to see
what their test suite of functions looks like.

Aaron Meurer

>
> The ML approach seems like a brute force. So is Rubi. So it's fair to compare ML with Rubi. On the other hand, I feel it's unfair to compare brute force with an actual algorithm, such as Risch.
>
> Ondrej
>

Francesco Bonazzi

unread,
Sep 28, 2019, 6:47:03 AM
to sympy
Their paper appears to be an attempt at applying the transformer model for language translation to symbolic math.

There is a Jupyter notebook with an example on how to create a translator from Portuguese to English using the transformer model:

If someone has some spare time, it would be interesting to see how this model would perform with SymPy (just add a tokenizer to the output of srepr and replace the Portuguese-English dataset).

Oscar Benjamin

unread,
Sep 28, 2019, 9:27:35 AM
to sympy
Neural nets are trained for a particular statistical distribution of
inputs and in the paper they describe their method for generating a
particular ensemble of possibilities. There might be something
inherent about the examples they give that means they are all solved
using a particular approach. From their description I could imagine
writing a pattern-matching integrator that would explicitly try to
reverse the way the examples are generated.

Perhaps the examples from e.g. the SymPy test suite would in some ways
represent a more "natural" distribution since they are written by
humans and show the kinds of problems that humans wanted to solve. It
would be interesting to see how the accuracy of the net looks on that
distribution of inputs (although any comparison with SymPy on that
data would be unfair).

Oscar

David Bailey

unread,
Sep 28, 2019, 12:38:45 PM
to sy...@googlegroups.com
On 28/09/2019 14:27, Oscar Benjamin wrote:
> Neural nets are trained for a particular statistical distribution of
> inputs and in the paper they describe their method for generating a
> particular ensemble of possibilities. There might be something
> inherent about the examples they give that means they are all solved
> using a particular approach. From their description I could imagine
> writing a pattern-matching integrator that would explicitly try to
> reverse the way the examples are generated.
>
> Perhaps the examples from e.g. the SymPy test suite would in some ways
> represent a more "natural" distribution since they are written by
> humans and show the kinds of problems that humans wanted to solve. It
> would be interesting to see how the accuracy of the net looks on that
> distribution of inputs (although any comparison with SymPy on that
> data would be unfair).

I suppose a fair test might be to take a set of integrals from Abramowitz and Stegun, but of course, for indefinite integrals the accuracy should be 100% because the output can be checked by differentiation.

Sadly the paper mainly focuses on the generation of the test sets, because the rest is inevitably completely opaque!

I noticed this sentence:

"Unfortunately, some functions do not have an integral which can be expressed with usual functions (e.g. f(x) = exp(x^2) or f(x) = log(log(x))), and solutions to arbitrary differential equations cannot always be expressed with usual functions."

This suggests that their integrator will not handle anything that resolves to higher transcendental functions - e.g. elliptic integrals.

I guess this is a rather special case of a significant maths problem that is hard in one direction but easy to check, and where pattern matching can be used extensively. It would be sad if one day the whole of SymPy were replaced with an opaque program like this - but I don't think that is likely.

David


Gagandeep Singh (B17CS021)

unread,
Sep 28, 2019, 1:43:23 PM
to sympy
When I skimmed through the paper, I had the following queries:

1. Is integration really a tree-to-tree translation? The neural network predicts the resulting expression tree for the input expression, but integration is not a predictive operation. Moreover, how can we tell whether the model is over-fitting the training data? What was the size of the training data set?

2. Does the data set of equations contain noise? Is there any uncertainty in the data set? For example, when translating from one language to another, one word can map to several different words with the same meaning. Here that is not the case: there may be multiple results, but we can check whether they are correct with complete certainty.

3. Is this model able to generalise over any mathematical expression? The way they generated the data sets is algorithmic and deterministic, not random (random number generators are themselves deterministic). So how can we say that this model outperforms any CAS?

Neural networks don't learn the way human beings do; they just imitate the underlying distribution of the data. But I don't think that mathematical expressions have any such underlying distribution. Well, we can take a subset of those expressions which does have a distribution, and I think that's the case with their model.

PS - I may be wrong in many places above, as I am just a beginner in ML/DL/AI. Please correct me wherever possible. Thanks.

Francesco Bonazzi

unread,
Sep 29, 2019, 9:06:52 AM
to sympy


On Saturday, 28 September 2019 19:43:23 UTC+2, Gagandeep Singh (B17CS021) wrote:
> When I skimmed through the paper, I had the following queries:

> 1. Is integration really a tree to tree translation? Because, neural network is predicting the resulting expression tree for the input equation. However, integration is not a predictive operation. Moreover, how can we define that whether the model is over-fitting or not on the training data? What was the size of the training data set?

They claim they write the expression tree into a sequence using Polish notation. After that they train a sequence-to-sequence model developed by Google for machine translation (I previously posted a link to a notebook with an example of that model used for Portuguese-to-English translation).

They discuss the possibility that the resulting sequence in Polish-notation may not be parseable into an expression tree, but they claim that this rarely occurs.
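As a rough illustration of that serialization, here is a minimal sketch (not the paper's exact tokenizer; for instance, it keeps SymPy's n-ary Add/Mul heads, while the paper uses binary operators):

```python
from sympy import symbols, sin, exp

x = symbols('x')

def to_prefix(expr):
    """Serialize a SymPy expression tree as Polish-notation tokens."""
    if not expr.args:                # leaf: a Symbol or a number
        return [str(expr)]
    tokens = [type(expr).__name__]   # operator head, e.g. 'Add', 'Mul', 'sin'
    for arg in expr.args:
        tokens.extend(to_prefix(arg))
    return tokens

# e.g. ['Add', ...] with the arguments in SymPy's canonical order
print(to_prefix(sin(x) + exp(x)))
```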

Any result is OK, as long as it is correct. The current SymPy integrator only returns one result (even if there are multiple).


> 2. Does the data set of equations contains noise? Is there any uncertainty in the data set? For example, while translating from one language to another there are chances that one word can be mapped to different ones having the same meaning. However, here, it is not the case, there may be multiple results but we can check whether they are correct or not with 100% surety.

It would be interesting to see how many results are actually parseable (let alone correct). If a translation contains a grammar mistake you can still read the text, but it doesn't work like that with expression trees.


> 3. Is this model able to generalise over any mathematical expression. The way they generated data sets is algorithmic and deterministic. It is not random(random number generators are itself deterministic). So, how can we say that this model is the one that outperforms any CAS?

I really doubt it will outperform CAS algorithms, but let's wait for their results first.


> Neural Networks don't learn, the way human beings do. They just imitate the underlying distribution of the data. But, I don't think that mathematical expressions have any such underlying distribution. Well, we can take a subset of those expressions which can have a distribution and I think that's the case with their model.

I think it's more promising to use mixtures of ML and rules. For example, you have many possible rules to apply and use ML/neural networks to decide which rules to try first.

I'm always a bit skeptical of neural networks, in the end they work as huge black boxes.

Aaron Meurer

unread,
Sep 29, 2019, 6:46:41 PM
to sympy
On Sun, Sep 29, 2019 at 7:06 AM Francesco Bonazzi
<franz....@gmail.com> wrote:
>
>
>
> On Saturday, 28 September 2019 19:43:23 UTC+2, Gagandeep Singh (B17CS021) wrote:
>>
>> When I skimmed through the paper, I had the following queries:
>>
>> 1. Is integration really a tree to tree translation? Because, neural network is predicting the resulting expression tree for the input equation. However, integration is not a predictive operation. Moreover, how can we define that whether the model is over-fitting or not on the training data? What was the size of the training data set?
>
>
> They claim they write the expression tree into a sequence using Polish-notation. After that they train a sequence-to-sequence model developed by Google for machine translation (I have previously posted a link to a notebook containing an example of that model used for Portuguese to English translation).
>
> They discuss the possibility that the resulting sequence in Polish-notation may not be parseable into an expression tree, but they claim that this rarely occurs.
>
> Any correct result for the equation is OK, as long as it is correct. The current SymPy integrator only returns one result (even if there are multiple).
>
>>
>> 2. Does the data set of equations contains noise? Is there any uncertainty in the data set? For example, while translating from one language to another there are chances that one word can be mapped to different ones having the same meaning. However, here, it is not the case, there may be multiple results but we can check whether they are correct or not with 100% surety.
>
>
> It would be interesting to see how many results are actually parseable (let alone correct). If a translation gives you a grammar mistake, you can still read the text, but it doesn't work like that on expression trees.

I'm also curious what it returns for functions with no elementary
antiderivative. For instance, what does it think the integral of
exp(-x^2) is, and is it in any way related to the error function?
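For the record, SymPy handles this particular one:

```python
from sympy import symbols, exp, integrate

x = symbols('x')
print(integrate(exp(-x**2), x))   # sqrt(pi)*erf(x)/2
```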

>
>>
>> 3. Is this model able to generalise over any mathematical expression. The way they generated data sets is algorithmic and deterministic. It is not random(random number generators are itself deterministic). So, how can we say that this model is the one that outperforms any CAS?
>
>
> I really doubt it will outperform CAS algorithms, but let's wait for their results first.
>
>>
>> Neural Networks don't learn, the way human beings do. They just imitate the underlying distribution of the data. But, I don't think that mathematical expressions have any such underlying distribution. Well, we can take a subset of those expressions which can have a distribution and I think that's the case with their model.
>
>
> I think it's more promising to use mixtures of ML and rules. For example, you have many possible rules to apply and use ML/neural networks to decide which rules to try first.
>
> I'm always a bit skeptical of neural networks, in the end they work as huge black boxes.

(Indefinite) integration is a great fit because it's trivial to tell
if an answer is correct or not (up to some caveats, but at least you
won't ever get any false positives). So any kind of heuristic can be
useful. Even if it returns wrong answers some of the time, but right
answers a good fraction of the time, it can be useful.

Other CAS algorithms don't have this property, so I'd be much more
sceptical of using machine learning to solve them. For instance, people
sometimes want to use some kind of genetic algorithm to do expression
simplification. But unless you constrain it to only perform
mathematically correct transformations, you cannot guarantee that the
resulting "simplification" is correct, and testing expression equality
is an undecidable problem that can only be done heuristically in
general.

Aaron Meurer


oldk1331

unread,
Oct 8, 2019, 10:16:32 PM
to sy...@googlegroups.com
(This mail is copied from my response at maxima mailing list.)

My opinion on this paper:

First, their dataset (section 4.1) could be greatly improved using
existing integration theory. The Risch algorithm says that every
elementary integration problem can be reduced to 3 cases: transcendental
(only contains rational functions and exp/log/tan; the other
trigonometric functions can be transformed to tan), algebraic (only
contains rational functions and nth roots), and the mixed case.

So their method of preparing the dataset concentrates heavily on the
transcendental case and severely lacks algebraic cases. And they use
only numbers from -5 to 5; I think it scales badly to wider ranges of
numbers.

For the transcendental case, I think FriCAS has fully implemented this
branch of the Risch algorithm, so it should always give a correct
result. For algebraic cases, I highly doubt that this ML program can
solve integrate(x/sqrt(x^4 + 10*x^2 - 96*x - 71), x) =
log((x^6+15*x^4+(-80)*x^3+27*x^2+(-528)*x+781)*(x^4+10*x^2+(-96)*x+(-71))^(1/2)+(x^8+20*x^6+(-128)*x^5+54*x^4+(-1408)*x^3+3124*x^2+10001))/8

In fact, I doubt that this program can solve some rational function
integrations that require the Lazard-Rioboo-Trager algorithm to get a
simplified result.
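For what it's worth, SymPy's rational-function integrator (whose log part is based on Lazard-Rioboo-Trager) can be called directly; the integrand below is just an illustrative one, far easier than the cases in question:

```python
from sympy import symbols, diff, simplify
from sympy.integrals.rationaltools import ratint

x = symbols('x')
f = (3*x + 1)/(x**2 + 1)    # illustrative integrand with a log part

F = ratint(f, x)            # rational integration via the log-part machinery
assert simplify(diff(F, x) - f) == 0   # round-trip check
```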

So I think this ML program has many flaws, but we can't inspect it.

> I'm also curious (and sceptical) on just how well a neural network can
> "learn" symbolic mathematics and specifically an integration
> algorithm. Another interesting thing to do would be to try to train a
> network to integrate rational functions, to see if it can effectively
> recreate the algorithm (for those who don't know, there is a complete
> algorithm which can integrate any rational function). My guess is that
> this sort of thing is still beyond the capabilities of a neural
> network.

I totally agree.

- Qian

Aaron Meurer

unread,
Jan 10, 2020, 12:25:40 PM
to sympy
For those who didn't see, the final paper was posted with many updates
https://arxiv.org/abs/1912.01412. The newest version addresses some of
the things that were discussed here, and makes more use of SymPy,
including demonstrating some integrals that SymPy cannot solve, as
well as making it clearer how SymPy was used to check the results of
integration.

Aaron Meurer

S.Y. Lee

unread,
Apr 16, 2020, 5:50:01 AM
to sympy
They have released the source code and the dataset.

Aaron Meurer

unread,
Apr 16, 2020, 6:27:39 PM
to sympy
FWIW the license they chose (CC-BY-NC) isn't actually open source. But
at least the code is there if you want to run it.

Aaron Meurer

Jason Moore

unread,
Apr 16, 2020, 6:40:53 PM
to sympy
The license they chose is open source, but it just isn't readily compatible with OSI-approved licenses.


Jason

Aaron Meurer

unread,
Apr 16, 2020, 9:59:24 PM
to sympy
Non-commercial licenses aren't open source by the OSI open source
definition https://opensource.org/osd-annotated (see points 5 and 6).
I think it's important that we don't use the term "open source" for a
license unless it fits that definition, and, ideally, is OSI approved.

There are a lot of issues with non-commercial licenses. There are some
links explaining why here:
https://en.wikipedia.org/wiki/Creative_Commons_NonCommercial_license#Commentary.
Basically, the definition of what is considered "commercial" is much
broader than what you would expect.

Aaron Meurer