If they were interested in a fair comparison, they would use a test set from
(for example) Rubi or one of the CAS systems.
My guess is that they did this:
1. generate a random expression S favoring + and * in the tree.
2. differentiate S to get S'
3. "learn" the integral of S'.
Here's the trick. S' will, with very high probability, be a sum. Say s1+s2+s3.
A CAS will usually try to compute integrate(s1,x) + integrate(s2,x) + integrate(s3,x).
That's the way integral tables work too.
Unfortunately, for many "random" expressions, s1, s2, s3, ... are
NOT integrable in terms of elementary functions. Only their sum is.
So a CAS will fail.
Here's a particular example: exp(-x^3)/x^4.
Differentiate (I'm copy/pasting from Maxima) to get
-(3*%e^(-x^3))/x^2 - (4*%e^(-x^3))/x^5
Neither of these terms is separately integrable in terms of elementary functions.
So a "real" CAS will fail on even "simple" problems. If you generate
trees with 15 random operators, the probably of failing increases.
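Anyone who wants to check the claim about the individual terms can hand them
to Maxima one at a time; I would expect each to come back in terms of
gamma_incomplete (or as an unevaluated noun form), and certainly not in
elementary terms:

  integrate(-3*exp(-x^3)/x^2, x);
  integrate(-4*exp(-x^3)/x^5, x);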
For this particular example (the 2nd one I tried), asking Maxima to integrate that derivative gives
gamma_incomplete(-1/3,x^3)+(4*gamma_incomplete(-4/3,x^3))/3
Non-elementary, it seems. But we know this is supposed to be the same as exp(-x^3)/x^4.
A minute of numerical testing suggests it is, indeed, equal.
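The whole check fits in a few lines of Maxima. (The test point x = 1.5 is an
arbitrary choice, and I am assuming gamma_incomplete evaluates numerically
when handed a float argument.)

  f : exp(-x^3)/x^4;
  d : diff(f, x);                 /* the derivative pasted above */
  g : integrate(d, x);            /* the gamma_incomplete expression above */
  float(subst(x = 1.5, g - f));   /* should be (near) zero if the two agree */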
So what we have for "ML" here is a made-up test set that is not
reflective of the actual task of computing integrals as needed in
applied math, and as considered in (for example) integral
tables or integration algorithms.
We are perhaps familiar with the notion of "teaching to the test,"
in which students and teachers collude to get excellent grades
on some standardized test. Yet the students may not really
know the material.
This is perhaps worse, because the "test" is not some
important standardized suite of integration problems.
It is just randomly generated. Maybe it would
be fair to call it noise? The author could post the
test suite, I suppose.
RJF