NLP parser for sympy


Moses Paul

May 13, 2020, 1:19:54 PM
to sympy
I've been working on an NLP parser for SymPy.
This is how it works:
  • The input is first "cleaned up" and rewritten into a structure that an NMT (seq2seq) model can understand.
  • The processed input is passed to the model, which produces a specific type of output that is then post-processed.
  • The final result is a string that works when used inside
    sympify('Expression')
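
The three steps above can be sketched roughly as follows. This is only an illustration: the trained seq2seq model is replaced by a hypothetical keyword lookup (`fake_model`), and `LEMMAS`, `KEYWORDS`, `clean_up`, `post_process` and `translate` are made-up names, not part of the actual project.

```python
import re

# Hypothetical lemma table; a real pipeline would use a proper lemmatizer.
LEMMAS = {"is": "be", "are": "be"}

# Hypothetical stand-in for the trained NMT model: keyword -> SymPy function name.
KEYWORDS = {
    "maximum": "Max", "max": "Max", "biggest": "Max",
    "minimum": "Min", "min": "Min", "smallest": "Min",
}


def clean_up(query):
    """Step 1: lowercase and lemmatize the words before 'of'; keep the arguments as typed."""
    head, _, args = query.strip().rstrip("?").partition(" of ")
    words = [LEMMAS.get(word, word) for word in head.lower().split()]
    return " ".join(words) + " of " + args.strip()


def fake_model(cleaned):
    """Step 2 (stub): emit space-separated tokens, like 'Max ( D , m )'."""
    head, _, args = cleaned.partition(" of ")
    func = KEYWORDS[head.split()[-1]]
    tokens = [arg.strip() for arg in args.split(",")]
    return f"{func} ( {' , '.join(tokens)} )"


def post_process(model_output):
    """Step 3: collapse the token spacing into a string that sympify could accept."""
    return re.sub(r"\s+", "", model_output).replace(",", ", ")


def translate(query):
    return post_process(fake_model(clean_up(query)))


print(translate("What is the maximum of x,3,4,5,y"))  # Max(x, 3, 4, 5, y)
```

The returned string is what would then be handed to sympify.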
So far I've been able to train on data generated from functions like Sum, Max, and Min, i.e. functions that take a list of inputs, and also from functions such as summations and integrals.
Since I haven't gone through SymPy's entire codebase, it would be really useful to have a glossary or an equivalent structure from which I can glean information about the various functions SymPy has, such as lists of single-parameter functions, two-parameter functions, multi-parameter functions, and so on.

I haven't been able to find anything so far; help would be much appreciated.

Cheers
Moses Paul

Aaron Meurer

May 13, 2020, 2:17:22 PM
to sympy
What sorts of things is it able to parse?

I don't know if there is a well structured glossary of SymPy
functions. The default namespace (what gets imported with "from sympy
import *") is the best place to start.
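
One possible way to start from a namespace and group its callables by how many parameters they take is sketched below. It is an assumption-laden illustration, not an existing SymPy facility: `classify` and `build_glossary` are hypothetical names, and the grouping is only as good as the signatures the objects expose through `inspect.signature`, so the results would still need checking against the documentation.

```python
import inspect
from collections import defaultdict


def classify(obj):
    """Label a callable as 'single', 'two', 'multi', 'other' or 'unknown'."""
    try:
        params = list(inspect.signature(obj).parameters.values())
    except (TypeError, ValueError):
        return "unknown"  # no introspectable signature
    if any(p.kind is p.VAR_POSITIONAL for p in params):
        return "multi"  # accepts *args
    required = [
        p for p in params
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
        and p.default is p.empty
    ]
    return {1: "single", 2: "two"}.get(len(required), "other")


def build_glossary(namespace):
    """Group the public callables of a name -> object mapping by parameter count."""
    glossary = defaultdict(list)
    for name in sorted(namespace):
        obj = namespace[name]
        if name.startswith("_") or not callable(obj):
            continue
        glossary[classify(obj)].append(name)
    return dict(glossary)


# Small demo namespace; for SymPy one could try build_glossary(vars(sympy)).
def neg(a):
    return -a


def add(a, b):
    return a + b


def largest(*args):
    return max(args)


demo = {"neg": neg, "add": add, "largest": largest, "_private": neg, "pi": 3.14}
print(build_glossary(demo))
```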

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/9fef4da7-aa4f-4c47-ac44-932efacb1dcd%40googlegroups.com.

Moses Paul

May 13, 2020, 2:53:45 PM
to sympy


On Wednesday, May 13, 2020 at 11:47:22 PM UTC+5:30, Aaron Meurer wrote:
> What sorts of things is it able to parse?

As of now, it can do stuff like
  • "What is the maximum of x,3,4,5,y" which returns Max(3,4,5,x,y) (passed to sympify)
  • "Find the sum of x, x+y, x^3" -> sum(x, x+y, x**3)
  • "Calculate the integral of x^2 + 3x" -> Integral(x**2+3*x)
  • "Sum of x from 0 to 100" -> Sum(x, (x, 0, 100))
Kinda like that.
(I'm structuring the work I've done so far; I'll post a link to my repo here once I finish that.)
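
For reference, here is a minimal sketch of how strings like the ones above behave once they reach sympify. It assumes SymPy is installed. `Add` is used in place of `sum` purely as an assumption for illustration: Python's builtin `sum` expects an iterable, so the three-argument call would likely fail as written.

```python
from sympy import sympify

# Numeric arguments of Max are collapsed to the largest one; symbols are kept.
largest = sympify("Max(3, 4, 5, x, y)")

# Add is a stand-in for the "sum of" query (an assumption, see the note above).
added = sympify("Add(x, x + y, x**3)")

# Integral and Sum come back unevaluated; .doit() carries out the computation.
integral = sympify("Integral(x**2 + 3*x)")
total = sympify("Sum(x, (x, 0, 100))")

print(largest)
print(added)
print(integral.doit())
print(total.doit())  # 5050
```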

> I don't know if there is a well structured glossary of SymPy
> functions. The default namespace (what gets imported with "from sympy
> import *") is the best place to start.

Gotcha!


Moses Paul

May 13, 2020, 3:03:23 PM
to sympy
(PS: I'm aware that the examples (sum, Max) I gave above use iterables.)
Here's an excerpt from the model training dataset:
what be the maximum of D, m => Max ( D , m )
what be the max of D, m => Max ( D , m )
what be the biggest of D, m => Max ( D , m )
find the sum of D, m => sum ( D , m )
find the total of D, m => sum ( D , m )
find the minimum of D, m => Min ( D , m )
find the min of D, m => Min ( D , m )
find the smallest of D, m => Min ( D , m )
find the maximum of D, m => Max ( D , m )
find the max of D, m => Max ( D , m )

The above dataset is from a lemmatized version of natural language queries (is -> be)

And this is what the output looks like when passed to sympify:

>>> sympify('Max(1, 2, 3)')
3
>>> sympify('Max(1, 2, x)')
Max(2, x)


Aaron Meurer

May 13, 2020, 3:19:24 PM
to sympy
We should add this to SymPy Gamma once you have this working.

Aaron Meurer

David Bailey

May 13, 2020, 5:15:47 PM
to sy...@googlegroups.com
I think perhaps the greatest use of this parser would be as a natural language way to find out how to do things in SymPy - so it would be useful to return the resultant expression unevaluated - maybe in response to a "how" question.

For that purpose, it would be almost ideal.

Maybe it could also be extended to some vaguer questions, such as:

"How do I evaluate line integrals using SymPy?"

David

Moses Paul

May 14, 2020, 3:52:08 AM
to sympy
Sure! That could definitely be done down the line 😁

And Aaron, I'm currently scraping through the SymPy documentation. I'll probably end up creating a "glossary", because it could be immensely helpful in automating the generation of training data for the NMT model.

Nicolas Guarin

May 14, 2020, 9:26:54 PM
to sympy
If you have a list of terms from similar projects, maybe we can crowdsource the "glossary". If not, it could still be useful to have a document or site where community members can add these terms, although I think some kind of instructions would be needed.

Moses Paul

May 15, 2020, 4:42:20 AM
to sympy
Yeah! We could crowdsource the glossary,
with something like a website or form where members could add info.
The one I currently have looks kinda like this:

standardUse = ["What be the", "Find the"]  # followed by 'of'
dictOfFuncs = {
    'Min': {
        'params': 'Multi',
        'useCase': 'Standard',
        'alternates': ["Minimum", "Min", "Smallest"]
    },
    'Max': {
        'params': 'Multi',
        'useCase': 'Standard',
        'alternates': ["Maximum", "Max", "Greatest", "Biggest"]
    },
}
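
As a rough illustration of how a glossary shaped like this could drive training-data generation, the sketch below expands each prefix and alternate name into a "query => target" pair in the same style as the dataset excerpt posted earlier in the thread. The helper `generate_pairs` and its `symbols` parameter are hypothetical and not taken from the actual project.

```python
standardUse = ["What be the", "Find the"]  # followed by 'of'
dictOfFuncs = {
    'Min': {
        'params': 'Multi',
        'useCase': 'Standard',
        'alternates': ["Minimum", "Min", "Smallest"]
    },
    'Max': {
        'params': 'Multi',
        'useCase': 'Standard',
        'alternates': ["Maximum", "Max", "Greatest", "Biggest"]
    },
}


def generate_pairs(symbols=("D", "m")):
    """Return (query, target) pairs for every prefix/alternate combination."""
    args_text = ", ".join(symbols)      # how the arguments appear in the query
    args_target = " , ".join(symbols)   # space-separated tokens for the target side
    pairs = []
    for func, info in dictOfFuncs.items():
        if info['useCase'] != 'Standard':
            continue  # only the "<prefix> <name> of <args>" pattern is handled here
        for prefix in standardUse:
            for alternate in info['alternates']:
                query = f"{prefix.lower()} {alternate.lower()} of {args_text}"
                target = f"{func} ( {args_target} )"
                pairs.append((query, target))
    return pairs


for query, target in generate_pairs():
    print(f"{query} => {target}")
```

Only the words of the prefix and the alternate name are lowercased, so symbol names such as D keep their case.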

I haven't settled on a rigid structure for this yet; I'm planning to work on it over the weekend.

Chris Smith

May 19, 2020, 11:17:49 AM
to sympy
Novak has been working on NLP for physics for decades. It might be worth checking out https://www.cs.utexas.edu/users/novak/ and https://www.cs.utexas.edu/users/novak/cgi/physdemod.cgi .

/c

Chris Smith

May 19, 2020, 11:23:01 AM
to sympy

Moses Paul

May 19, 2020, 2:19:48 PM
to sympy
Thanks Chris!! His work is pretty impressive, Will def check it out 😁