the "deep learning" "neural network" symbolic integrator


Qian Yun

Nov 16, 2020, 7:35:12 AM
to fricas...@googlegroups.com
Hi guys,

I assume you all know the paper "DEEP LEARNING FOR SYMBOLIC MATHEMATICS"
by Facebook AI researchers, posted almost a year ago at
https://arxiv.org/abs/1912.01412

And the code was posted 8 months ago:
https://github.com/facebookresearch/SymbolicMathematics

Have you played with it?

I finally had some time recently and played with it for a while,
and I believe I found some flaws. I will post my findings in
more detail later. It has been a really interesting experience.

If you have some spare time and want to have fun, I strongly
advise you to play with it and try to break it :-)

Tips: to run the jupyter notebook example, apply the following
patch to run it on CPU instead of CUDA:

- Best,
- Qian

=====================

diff --git a/beam_integration.ipynb b/beam_integration.ipynb
index f9ef329..00754e3 100644
--- a/beam_integration.ipynb
+++ b/beam_integration.ipynb
@@ -64,6 +64,6 @@
     "\n",
     "    # model parameters\n",
-    "    'cpu': False,\n",
+    "    'cpu': True,\n",
     "    'emb_dim': 1024,\n",
     "    'n_enc_layers': 6,\n",
     "    'n_dec_layers': 6,\n",
diff --git a/src/model/__init__.py b/src/model/__init__.py
index 2b0a044..73ec446 100644
--- a/src/model/__init__.py
+++ b/src/model/__init__.py
@@ -38,7 +38,7 @@
     # reload pretrained modules
    if params.reload_model != '':
        logger.info(f"Reloading modules from {params.reload_model} ...")
-        reloaded = torch.load(params.reload_model)
+        reloaded = torch.load(params.reload_model, map_location=torch.device('cpu'))
        for k, v in modules.items():
            assert k in reloaded
            if all([k2.startswith('module.') for k2 in reloaded[k].keys()]):
diff --git a/src/utils.py b/src/utils.py
index bd90608..ef87582 100644
--- a/src/utils.py
+++ b/src/utils.py
@@ -25,7 +25,7 @@
 FALSY_STRINGS = {'off', 'false', '0'}
 TRUTHY_STRINGS = {'on', 'true', '1'}
 
-CUDA = True
+CUDA = False
 
 
 class AttrDict(dict):

Qian Yun

Nov 21, 2020, 9:41:45 AM
to fricas...@googlegroups.com
OK, here are some of my preliminary findings about this deep
learning system (I'll refer to it as DL):

(The tests are done with a beam size of 10, which means this DL
will give the 10 answers that it deems most likely.)
(And I use the official FWD + BWD + IBP trained model.)

1. It doesn't handle large numbers very well.

For example, to integrate "x**1678", its answers are

-0.05505 NO x**169/169
-0.08730 NO x**1685/1685
-0.10008 NO x**1681/1681
-0.21394 NO x**1689/1689
-0.25264 NO x**1687/1687
-0.25288 NO x**1688/1681
-0.28164 NO x**1678/1678
-0.28320 NO x**1678/1679
-0.29745 NO x**1684/1684
-0.31267 NO x**1678/1685

This example tests DL's understanding of the pattern
"the integral of x^n is x^(n+1)/(n+1)".

The result seems to show that DL understands the pattern but fails
to compute "n+1" for some not-so-large n.
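For reference, the pattern can be confirmed with sympy (a quick sketch using sympy, not the DL system's own checker):

```python
# Sketch: confirm the power-rule pattern that DL is being tested on.
import sympy as sp

x = sp.symbols('x')

# The correct antiderivative of x**1678 is x**1679/1679.
assert sp.integrate(x**1678, x) == x**1679 / 1679

# None of DL's ten candidates match; e.g. its x**1678/1678 guess
# differentiates back to x**1677, not x**1678.
candidate = x**1678 / 1678
assert sp.diff(candidate, x) == x**1677
```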

2. DL may give correct results that contain strange constants.

For example, to integrate "x**2", its answers are

-0.25162 OK x**3*(1/cos(2) + 1)/(6*(1/(2*cos(2)) + 1/2))
-0.25220 OK x**3*(1 + 1/cos(1))/(6*(1/2 + 1/(2*cos(1))))
-0.25304 OK x**3*(1 + 1/sin(2))/(6*(1/2 + 1/(2*sin(2))))
-0.25324 OK x**3*(1 + 1/sin(1))/(6*(1/2 + 1/(2*sin(1))))
-0.25458 OK x**3*(1/tan(1) + 1)/(15*(1/(5*tan(1)) + 1/5))
-0.25508 OK x**3*(1 + log(1024))/(15*(1/5 + log(1024)/5))
-0.25525 OK x**3*(1/tan(2) + 1)/(15*(1/(5*tan(2)) + 1/5))
-0.25647 OK x**3*(1 + 1/cos(1))/(15*(1/5 + 1/(5*cos(1))))
-0.25774 OK x**3*(1 + 1/sin(1))/(15*(1/5 + 1/(5*sin(1))))
-0.28240 OK x**3*(log(2) + 1)/(15*(log(2)/5 + 1/5))
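The "strange constants" are just elaborate ways of writing 1/3; for example, the top candidate simplifies to x**3/3 (checked here with sympy as a stand-in):

```python
# Sketch: the odd-looking constant factors cancel to exactly 1/3.
import sympy as sp

x = sp.symbols('x')
c = sp.cos(2)

# DL's top answer for the integral of x**2:
answer = x**3 * (1/c + 1) / (6 * (1/(2*c) + sp.Rational(1, 2)))

assert sp.simplify(answer - x**3 / 3) == 0
```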

3. DL doesn't understand multiplication very well.

For example, to integrate "19*sin(x/17)", its answers are

-0.12595 NO -365*cos(x/17)
-0.12882 NO -373*cos(x/17)
-0.14267 NO -361*cos(x/17)
-0.14314 NO -357*cos(x/17)
-0.18328 NO -353*cos(x/17)
-0.20499 NO -377*cos(x/17)
-0.21484 NO -352*cos(x/17)
-0.25740 NO -369*cos(x/17)
-0.26029 NO -359*cos(x/17)
-0.26188 NO -333*cos(x/17)
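The correct coefficient is 19*17 = 323; DL's guesses (coefficients 333 through 377) cluster nearby but never hit it. A quick sympy check:

```python
# Sketch: the correct antiderivative of 19*sin(x/17).
import sympy as sp

x = sp.symbols('x')

# integral of 19*sin(x/17) is -19*17*cos(x/17) = -323*cos(x/17)
assert sp.integrate(19 * sp.sin(x/17), x) == -323 * sp.cos(x/17)
```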

4. DL doesn't handle long expressions very well.

For example, to integrate
'sin(x)+cos(x)+exp(x)+log(x)+tan(x)+atan(x)+acos(x)+asin(x)'
its answers are

-0.00262 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + sin(x) - cos(x)
-0.07420 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + log(cos(x)) + sin(x) - cos(x)
-0.10192 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + 2*sin(x) - cos(x)
-0.10513 NO x*log(x) + x*acos(x) + x*asin(x) + exp(x) - log(x**2 + 1)/2 - log(cos(x)) + sin(x) - cos(x)
-0.10885 NO x*log(x) + x*sin(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + sin(x) - cos(x)
-0.10947 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + sin(x) - cos(x)
-0.13657 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + log(exp(x) + 1) + sin(x) - cos(x)
-0.16144 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) + log(x + 1) - log(x**2 + 1)/2 + sin(x) - cos(x)
-0.16806 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + log(cos(x)) + sin(x) - cos(x)
-0.19019 NO x*log(x) + x*acos(x) + x*asin(x) - x + exp(x) - log(x**2 + 1)/2 + log(exp(asinh(x)) + 1) + sin(x) - cos(x)
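For reference, every one of these candidates is missing the x*atan(x) term that the atan(x) summand contributes. A correct antiderivative can be verified by differentiating it back in sympy:

```python
# Sketch: verify a correct antiderivative of the long sum by
# differentiating it back (the sqrt(1-x**2) terms contributed by
# the x*acos(x) and x*asin(x) pieces cancel each other).
import sympy as sp

x = sp.symbols('x')
f = (sp.sin(x) + sp.cos(x) + sp.exp(x) + sp.log(x) + sp.tan(x)
     + sp.atan(x) + sp.acos(x) + sp.asin(x))
F = (-sp.cos(x) + sp.sin(x) + sp.exp(x) + x*sp.log(x) - x
     - sp.log(sp.cos(x)) + x*sp.atan(x) - sp.log(x**2 + 1)/2
     + x*sp.acos(x) + x*sp.asin(x))

assert sp.simplify(sp.diff(F, x) - f) == 0
```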


5. For the FWD test set with 9986 integrals (which is generated by
creating random expressions first, then trying to solve them with
sympy and discarding failures): FriCAS can solve 9980 out of 9986 in
71 seconds. Of the remaining 6 integrals, FriCAS can solve another 2
in under 100 seconds, gives "implementation incomplete" for 2, and
the last 2 contain complex constants like "acos(acos(tan(3)))",
which FriCAS can solve using another function.

The DL system can solve 95.6%; by comparison, FriCAS is over 99.94%.

6. The DL system is slow. To solve the FWD test set, the DL system
may use around 100 hours of CPU time.

7. For the BWD test set (which is generated by creating a random
expression first, then taking its derivative as the integrand),
FriCAS can solve roughly 95%, compared with DL's claimed 99.5%.
The paper says Mathematica can solve 84.0%; I'm a little skeptical
about that.

8. DL doesn't handle rational function integration very well.

It can handle '(x+1)^2/((x+1)^6+1)' but not its expanded form.
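That example goes through with the substitution u = (x+1)^3, giving atan((x+1)^3)/3; a sympy sketch showing that the expanded form DL fails on is literally the same function:

```python
# Sketch: (x+1)**2/((x+1)**6 + 1) integrates to atan((x+1)**3)/3
# via u = (x+1)**3, so the same antiderivative covers the
# expanded form of the integrand too.
import sympy as sp

x = sp.symbols('x')
f = (x + 1)**2 / ((x + 1)**6 + 1)
F = sp.atan((x + 1)**3) / 3

# differentiate back to verify
assert sp.simplify(sp.diff(F, x) - f) == 0

# the expanded integrand is the same rational function
expanded = sp.expand((x + 1)**2) / sp.expand((x + 1)**6 + 1)
assert sp.cancel(expanded - f) == 0
```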

So DL can recognize patterns, but it really doesn't have insight.

Rational function integration can be handled well by the
Lazard-Rioboo-Trager algorithm, while DL fails at many
rational function integrals.

So some of my comments a year ago are correct:

"
In fact, I doubt that this program can solve some rational function
integration that requires Lazard-Rioboo-Trager algorithm to get
simplified result.
"

9. DL doesn't handle algebraic function integration very well.

I have a list of algebraic functions that FriCAS can solve while
other CASs can't; DL can't solve them either.

10. For the harder mixed-case integrals, I have a list of
integrals that FriCAS can't handle; DL can't solve them either.

- Best,
- Qian

Grégory Vanuxem

Nov 22, 2020, 10:55:31 PM
to fricas...@googlegroups.com
hi,

Good to see that.

I'm just looking into that for FriCAS.

Have a good day!

__
Greg
> --
> You received this message because you are subscribed to the Google Groups "FriCAS - computer algebra system" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to fricas-devel...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/fricas-devel/a6e0c843-fdcc-6b1c-3159-21499d7c05b9%40gmail.com.



--
__
G. Vanuxem

Waldek Hebisch

Nov 25, 2020, 10:04:24 AM
to fricas...@googlegroups.com
On Sat, Nov 21, 2020 at 10:41:31PM +0800, Qian Yun wrote:
> OK, here are some of my preliminary findings about this deep
> learning system (I'll refer to it as DL):
>
> (The tests are done with a beam size of 10, which means this DL
> will give the 10 answers that it deems most likely.)
> (And I use the official FWD + BWD + IBP trained model.)
>
> 1. It doesn't handle large numbers very well.

The paper said that for training they used numbers up to 5
in random expressions. Differentiation and arithmetic
simplification may produce larger numbers, but clearly
large numbers go beyond the training set. Also, in the
past there was work suggesting that arithmetic is
hard for ANNs. OTOH we do not need DL for arithmetic,
and IMO the use of DL for integration is mostly independent
of arithmetic.
> 5. For the FWD test set with 9986 integrals (which is generated by
> creating random expressions first, then trying to solve them with
> sympy and discarding failures): FriCAS can solve 9980 out of 9986
> in 71 seconds. Of the remaining 6 integrals, FriCAS can solve
> another 2 in under 100 seconds, gives "implementation incomplete"
> for 2, and the last 2 contain complex constants like
> "acos(acos(tan(3)))", which FriCAS can solve using another function.
>
> The DL system can solve 95.6%; by comparison, FriCAS is over 99.94%.
>
> 6. The DL system is slow. To solve the FWD test set, the DL system
> may use around 100 hours of CPU time.

You mean 10000 examples? That would be an average of 36 seconds
per example... IIUC you ran on CPU; they probably got a
much shorter runtime on GPU.

> 7. For the BWD test set (which is generated by creating a random
> expression first, then taking its derivative as the integrand),
> FriCAS can solve roughly 95%, compared with DL's claimed 99.5%.
> The paper says Mathematica can solve 84.0%; I'm a little skeptical
> about that.

I posted here a generator that attempted to match parameters
to the DL paper and got a 78% success rate. That discovered a
few bugs and the percentage should be higher now, but still much
lower than 95%. So apparently they used easier examples
(several details in the paper were rather unclear and
I had to use my guesses). I wonder how well DL would
do on examples from my generator? In particular, the
paper does not mention simplification of examples.
Unsimplified derivatives tend to contain visible traces
of the primitive; after simplification the problem gets harder.
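To illustrate that last point with a toy sympy example (my own construction, not from the paper):

```python
# Sketch: an unsimplified derivative visibly leaks its primitive.
import sympy as sp

x = sp.symbols('x')
F = sp.log(sp.tan(x))     # the primitive
raw = sp.diff(F, x)       # (tan(x)**2 + 1)/tan(x): tan(x) is still visible

# an equivalent but simplified integrand hides that trace
hidden = 2 / sp.sin(2*x)
assert sp.simplify(raw - hidden) == 0
```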


--
Waldek Hebisch

Qian Yun

Nov 26, 2020, 4:52:18 AM
to fricas...@googlegroups.com

>> 6. The DL system is slow. To solve the FWD test set, the DL system
>> may use around 100 hours of CPU time.
>
> You mean 10000 examples? That would be an average of 36 seconds
> per example... IIUC you ran on CPU; they probably got a
> much shorter runtime on GPU.

Solving one example takes around 2 seconds using over 10 threads.

>> 7. For the BWD test set (which is generated by creating a random
>> expression first, then taking its derivative as the integrand),
>> FriCAS can solve roughly 95%, compared with DL's claimed 99.5%.
>> The paper says Mathematica can solve 84.0%; I'm a little skeptical
>> about that.
>
> I posted here a generator that attempted to match parameters
> to the DL paper and got a 78% success rate. That discovered a
> few bugs and the percentage should be higher now, but still much
> lower than 95%. So apparently they used easier examples
> (several details in the paper were rather unclear and
> I had to use my guesses). I wonder how well DL would
> do on examples from my generator? In particular, the
> paper does not mention simplification of examples.
> Unsimplified derivatives tend to contain visible traces
> of the primitive; after simplification the problem gets harder.

The "95%" number was based on the first 1000 integrals or so.

So I did a full run and I'm attaching my test run log.

I took the first 7707 integrals from the tests (minus integrals
containing 'sign(x)' and treating 'Abs(x)' as 'x').

I use a timeout of 10 seconds for FriCAS to run this test.

The total run time of FriCAS is 40 minutes.

There are 191 timeouts, 530 integration errors, and 61
integrals that FriCAS falsely claims are unintegrable.
So the success rate is 89.8%.

(I didn't 'diff' the results back and use 'normalize' to check
equality; there is extra complexity involving constants that are
complex numbers.)
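For what it's worth, a "diff back" equality check of the kind mentioned above could be sketched like this (`is_antiderivative` is my own hypothetical helper, not the actual test harness, and sympy's simplify stands in for FriCAS's 'normalize'):

```python
# Sketch of a diff-and-compare check (hypothetical helper; a real
# FriCAS run would use 'normalize' where sympy's simplify is used).
import sympy as sp

x = sp.symbols('x')

def is_antiderivative(candidate, integrand):
    # differentiating the candidate sidesteps the arbitrary
    # constant of integration entirely
    return sp.simplify(sp.diff(candidate, x) - integrand) == 0

assert is_antiderivative(sp.sin(x) + 5, sp.cos(x))
assert not is_antiderivative(sp.sin(x), sp.sin(x))
```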

(There are regressions between 1.3.4 and 1.3.5; I'm looking into some of them.)

- Qian
fricas-bwd-test.tbz2

Qian Yun

Nov 26, 2020, 5:35:03 AM
to fricas...@googlegroups.com
Some (alarming?) information: for some integrals, if you run them
in a fresh session, there's no problem, but when run as part of
the test set, FriCAS gives a wrong answer.

Example integrals are:

x*(tan(x**2 - 4)**2 + 1)*(tan(1/(2*(tan(x**2 - 4) + 1)))**2 + 1)/(2*(tan(x**2 - 4) + 1)**2*sqrt(tan(1/(2*(tan(x**2 - 4) + 1))))) + 0

(6*x**2 + 1)*sin(2*x**3 + x + log(3)**5)/cosh(cos(2*x**3 + x + log(3)**5))**2 + 0