Hi everyone,
My name is Sai Tadepalli and I'm a first year Computer Science student at the University of Cambridge. I've been going through the GSoC project ideas list and I'm particularly interested in the Rule-based symbolic integration project. I've been working on a related project involving the use of Vedic algorithms (sutras) for symbolic integration, and have had success performing integration on certain types of functions significantly faster than `sympy.integrate()`. You can check it out here:
https://github.com/SaiNikhilTadepalli/NeuralSutra.
As of now, the Vedic sutras (algorithms) that have been implemented are:
-
Paravartya Yojayet ("Transpose and Apply"), which performs polynomial division
-
Urdhva Tiryagbhyam ("Vertically and Crosswise"), which performs polynomial multiplication. This is essentially equivalent to the linear convolution of the coefficient arrays of the polynomials, but I felt that it deserved to be implemented as described in Vedic mathematics literature.
-
Urdhva Tiryagbhyam tabular method, which is essentially equivalent to the DI method for integration by parts. Again, I felt that this algorithm should be implemented faithfully.
The approach essentially involves applying the appropriate Vedic algorithm to individual nodes of the AST of the expression to be integrated, until the expression no longer contains `Integral` nodes (or the expression no longer changes, in other words, a fixed-point iteration).
The project trains a simple Bi-LSTM classifier on a synthetically generated dataset of SymPy expressions (the script to generate a similar dataset is also contained in the repo). The dataset consists of SymPy expressions (to integrate) each associated with one of four classes ("multiply", "divide", "integrate" or "fallback"). These classes simply correspond to the most appropriate Vedic algorithm (or none in the case of "fallback") to apply first to the given expression. The model is used to predict which Vedic algorithm to perform on each node of the expression to integrate. If none of the Vedic algorithms are appropriate to perform on a given node, i.e. the classifier predicts "fallback", then SymPy is used to perform the integration.
I believe that the system could be transitioned to a purely rule-based approach to replace the current Bi-LSTM classifier, which would potentially fit better with the Rule-based symbolic integration project idea.
The project contains a benchmarking script, which runs NeuralSutra and `sympy.integrate()` on a set of complex test expressions to integrate. It turns out that NeuralSutra is much, much faster than SymPy for integrating certain types of expressions (particularly long polynomial products and high-degree sparse polynomials), although I am not entirely sure why this is the case. I suspect it may have something to do with the recursive decomposition of the original expression to be integrated. I did find that the Vedic algorithms individually were faster than their SymPy counterparts (e.g.
Paravartya Yojayet compared to `sympy.div()`) on certain examples while playing around during development, but I haven't formalised or included any test cases to compare the performances of the individual algorithms with any amount of rigour within the project yet, so I haven't been able to make any conclusions.
You can find a more detailed explanation of how NeuralSutra works in the repo's README. I was wondering whether a project involving the implementation of these and more Vedic algorithms for symbolic integration would be appropriate for a GSoC project proposal. I would obviously need to research the additional Vedic algorithms that are useful for integration, plan out their implementations and clearly define the scope (e.g. whether we include the recursive decomposition logic as a performance improvement or just stick to improving the range of the rule-based integration methods currently available in SymPy by implementing Vedic algorithms).
Alternatively, would it be more suitable to add some of these algorithms (the ones that directly perform integration such as the
Urdhva Tiryagbhyam tabular method) to `sympy.integrals.manualintegrate` given that they resemble by-hand integration techniques? If so, would opening a PR implementing one of these methods be considered an acceptable thing to do in preparation for a GSoC project proposal?
Thank you very much for your time and I'd appreciate any feedback.
Kind regards,
Sai Tadepalli