On Fri, Dec 15, 2023 at 09:33:17AM -0800, 'Nasser M. Abbasi' via FriCAS - computer algebra system wrote:
> fyi, on related subject, I tested the MIT integration test problems with
> open AI chatGPT last summer. It did not do well. Here is the result
>
>
> https://12000.org/my_notes/CAS_integration_tests/reports/summer_2022/test_cases/11_MIT//reportchapter1.htm
>
> The result of solved % on this one file only (over 300 integration
> problems) were
>
> Mathematica 99.38%
> Fricas 96.57%
> Maple 95.33%
> Rubi 94.08%
> Maxima 92.52%
> Giac 91.59%
> Mupad 90.03%
> Sympy 82.24%
> ChatGPT 14.33%
IIUC this was version 3.5. The newer version is supposed to be
much stronger. A few weeks ago OpenAI people claimed to have
made a "breakthrough" in handling math problems. Of course, ATM
it is not clear what they have. But some things are possible
to guess. There were earlier reports that proper "didactics"
helps: start training on easy problems and gradually move
to harder ones. Supposedly learning "step by step" reasoning
is easier than the "end to end" approach which was used (and
was quite successful) on other problems. There is also an
obvious workaround: since ANNs seem to have trouble with
algorithmically trivial math, one could couple an ANN to a
"calculator", that is, a conventional program for doing
calculations. So there are various approaches. In particular,
neural nets have enough capacity to memorize something like
the Rubi rules. If "training" could teach a net to chain rules,
then an ANN could get to a level comparable to Rubi. It would
probably get some side conditions wrong, but it could compensate
by having more rules and via "beam search" (that is, producing
many candidate solutions and using an external differentiator
to remove wrong candidates). A better net could learn to
internally discard wrong candidates.
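
To make the candidate filtering idea concrete, here is a minimal
sketch in Python. It is only an illustration under stated
assumptions: the candidate list is written by hand and stands in
for whatever a net might propose, the function names
(numeric_derivative, is_antiderivative, filter_candidates) are
made up for this example, and a numerical central difference
stands in for a real symbolic differentiator. A serious system
would presumably differentiate symbolically and simplify.

```python
import math

def numeric_derivative(F, x, h=1e-5):
    """Central-difference approximation of F'(x) (stand-in for a symbolic differentiator)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def is_antiderivative(F, f, points, tol=1e-6):
    """Accept F only if its derivative matches the integrand f at every sample point."""
    for x in points:
        try:
            if not math.isclose(numeric_derivative(F, x), f(x),
                                rel_tol=tol, abs_tol=tol):
                return False
        except (ValueError, ZeroDivisionError, OverflowError):
            # Candidate is undefined at a sample point: treat it as wrong.
            return False
    return True

def filter_candidates(candidates, f, points):
    """Return the names of candidates that survive the derivative check."""
    return [name for name, F in candidates if is_antiderivative(F, f, points)]

# Integrand: x*cos(x). The candidates are hypothetical "net outputs".
integrand = lambda x: x * math.cos(x)
candidates = [
    ("x*sin(x)",          lambda x: x * math.sin(x)),                # wrong
    ("x*sin(x) + cos(x)", lambda x: x * math.sin(x) + math.cos(x)),  # correct
    ("x*sin(x) - cos(x)", lambda x: x * math.sin(x) - math.cos(x)),  # wrong sign
    ("x**2/2*sin(x)",     lambda x: x * x / 2 * math.sin(x)),        # wrong
]
points = [0.3, 1.1, 2.7]

survivors = filter_candidates(candidates, integrand, points)
print(survivors)
```

Checking at a few sample points can only reject candidates, it
does not prove that a survivor is correct, so this is weaker than
differentiating and simplifying in a CAS.
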
I have no idea what stage Google's offering is at. The public
version is probably weaker than the versions available to
internal developers and paying customers. Still, progress seems
to be rapid, and it has been only about a year since ChatGPT 3.5
became available.
--
Waldek Hebisch