Intro and Potential GSOC Project Ideas

188 views
Skip to first unread message

Tilo RC

unread,
Mar 21, 2023, 5:02:24 AM3/21/23
to sympy

Hello! My name is Tilo and I am a sophomore at Pomona College in Claremont, California. Currently, I am double majoring in math and cs. I am highly interested in participating in GSoC with sympy because I believe it would be a great opportunity to enhance my programming skills and give back to the open-source community.


In the past two years, I have gained extensive experience in Python through personal projects, a Machine Learning club, leetcode, and college classes. Although Python is my most comfortable language, I also have substantial experience in Java through classes I have taken.


I have a strong interest in statistics and machine learning, and have taken several classes in these subjects. I’m currently taking a class on natural language processes which has further sparked my interest in the field.


In addition to these areas, I also have experience with discrete math. I have taken semester-long courses in both Number Theory and Combinatorics, as well as learned about discrete math through computer science courses.


I learned about SymPy in my first semester of college when I was taking linear algebra. As I do all of my math homework in LaTeX, I found the ability to instantly convert matrices into LaTeX particularly useful. This semester I’ve been using SymPy to help a professor test conjectures related to Schur Polynomials. 



I am uncertain about which GSoC Ideas would suit me best, given my background. However, I have some tentative ideas:

  1. Parsing  

Currently, I am enrolled in an introductory course on languages and the theory of computation, where we have recently started exploring parsers. Additionally, I am taking a natural language processing class that involves programming intensive assignments on probabilistic context-free grammars and sentence parsing. However, I have limited experience in some of the potentially required languages such as Fortran, C, C++, Julia, Rust, LLVM, Octave, and Matlab. I think a project that focuses on existing LaTeX functionality would be a good fit.

  1. Assumptions 

This idea seems very challenging but it also interests me a lot. However, a bug I found today made me want to learn more about how sympy’s assumption system works. It’s not exactly clear to me what the prereqs for working on this idea are. However, I have experience with number theory which is listed as one of the prereqs for some reason. Also, I’ve taken a class on functional programing with coq which seems like it could possibly be relevant. 

  1. Improve the plotting module 

If the other ideas seem unrealistic or impractical, this project seems well-suited to my capabilities. I have experience with HTML, Javascript, and CSS. (Actually, JavaScript was the first language I learned. I even taught a lesson on using JS to approximate integrals in my high school calculus class. Hey! Python would’ve been better but JavaScript worked!). While working on an issue related to polygons I noticed that there was no convenient way to plot polygons and other geometric objects, so maybe some of the work on this idea could add functionality related to that. 

I would appreciate your opinion on which of these ideas to explore further, and whether there are any better-suited to my background.

Oscar Benjamin

unread,
Mar 21, 2023, 4:23:12 PM3/21/23
to sy...@googlegroups.com
On Tue, 21 Mar 2023 at 09:02, Tilo RC <tilor...@gmail.com> wrote:
>
> Hello! My name is Tilo and I am a sophomore at Pomona College in Claremont, California. Currently, I am double majoring in math and cs. I am highly interested in participating in GSoC with sympy because I believe it would be a great opportunity to enhance my programming skills and give back to the open-source community.

Hi Tilo,

> I am uncertain about which GSoC Ideas would suit me best, given my background. However, I have some tentative ideas:
>
> Parsing
>
> Currently, I am enrolled in an introductory course on languages and the theory of computation, where we have recently started exploring parsers. Additionally, I am taking a natural language processing class that involves programming intensive assignments on probabilistic context-free grammars and sentence parsing. However, I have limited experience in some of the potentially required languages such as Fortran, C, C++, Julia, Rust, LLVM, Octave, and Matlab. I think a project that focuses on existing LaTeX functionality would be a good fit.

I think that what really needs to happen with parsing is to rewrite
the parsers based on something like lark. The current parsing code
either uses ad-hoc parsers or in the case of parse_expr uses eval. I
think that making it so that parse_expr does not use eval should be a
top priority in parsing.

> Assumptions
>
> This idea seems very challenging but it also interests me a lot. However, a bug I found today made me want to learn more about how sympy’s assumption system works. It’s not exactly clear to me what the prereqs for working on this idea are. However, I have experience with number theory which is listed as one of the prereqs for some reason. Also, I’ve taken a class on functional programing with coq which seems like it could possibly be relevant.

Assumptions are a big topic. Making it so that the new assumptions
system can understand inequalities meaningfully would make a good
project. That is a very common feature request. I don't think number
theory is relevant.

> Improve the plotting module
>
> If the other ideas seem unrealistic or impractical, this project seems well-suited to my capabilities. I have experience with HTML, Javascript, and CSS. (Actually, JavaScript was the first language I learned. I even taught a lesson on using JS to approximate integrals in my high school calculus class. Hey! Python would’ve been better but JavaScript worked!). While working on an issue related to polygons I noticed that there was no convenient way to plot polygons and other geometric objects, so maybe some of the work on this idea could add functionality related to that.

Significant improvements have been made to the plotting module through
sympy_plot_backends:
https://sympy-plot-backends.readthedocs.io/en/latest/

Integrating that improved plotting module back into SymPy would make a
great project. There are some (I think small) difficulties in doing
this in a way that is backwards compatible so it needs a bit of work:
https://github.com/sympy/sympy/issues/23036

--
Oscar


>
> I would appreciate your opinion on which of these ideas to explore further, and whether there are any better-suited to my background.
>
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/c4b9de3d-40e8-40e6-8546-c8fff11db9ban%40googlegroups.com.

Tilo RC

unread,
Mar 29, 2023, 2:21:01 AM3/29/23
to sympy
Thank you for your response Oscar. I think I’m interested in taking a stab at improving the capabilities of the new assumption system to deal with inequalities (and perhaps relations between symbols in general?). I’ve spent a lot of time reading through the documentation, code, discussions, and previous proposals related to assumptions and come to some understanding. I’ve tried to explain what I understand so far below. Could you read through what I’ve written and correct any mistakes in my thinking?

For around a decade the SymPy community has wanted to transition from the “old assumptions” to the “new assumptions.” The old assumptions are seen as fundamentally limited compared to the new assumptions for reasons I’m trying to figure out. This is my understanding:

  • The old assumption system is deeply connected to the core of sympy. I think this is bad because it’s responsible for some of the other problems with it and violates OOP principles. 

  • One fundamental flaw of the old assumption system is that it’s not possible to make assumptions about relations between different variables. The new assumption system has this capability, and even though it is not very feature rich, it has a lot of potential. 

  • The old assumption system slows down SymPy. I’m not sure how true this still is. It seems like the core and the old assumption system have gotten many improvements to make them faster. For example, because I accidentally read the SymPy 1.11 instead of the SymPy 1.13  documentation I learned about the ManagedProperties metaclass and then saw it was removed to improve efficiency. (By the way, both of these sections from the documentation say that “This explanation is written as of SymPy 1.7.” even though the writing clearly changed between 1.11 and 1.13).  


As for improving how the new assumptions deal with inequalities, there are some very basic features they’re missing. For example, ask(Q.negative(x),Q.ge(x,1)) returns None under the current system even though it seems like it would be really simple to implement something where if x > 0 then x is not negative. Similarly, ask(Q.gt(x,0),Q.gt(x,1)) returns None. A more complicated query that returns None is ask(Q.eq(x,1), Q.integer(x) & Q.gt(x,0) & Q.lt(x,2)).


I also notice that even equality is somewhat limited in the new assumption system. For example, ask(Q.positive(y), Q.eq(x,y) & Q.positive(x)) returns None. I’m a bit unsure why this hasn’t been implemented. One guess I have is that implementing this functionality would be too slow because maybe you would have to use the solver?


Frankly, although I’ve spent a lot of time reading the files in the new assumption system, I only have a surface level understanding of them:


  • relation/equality.py

    • I probably understand this file the best out of all the files in the new assumptions. It contains implementations of binary relation for >, ≥, ≤, <, =, and ≠.

  • Ask.py

    • Heart of the new assumptions. Defines ask() and Q. In general, ask tries to find computationally cheap ways to evaluate propositions. If it can’t do it in a cheap way it calls the SAT solver. Also, all of the assumptions are converted into a normal form which I think is analogous to the role Chomsky normal form plays in allowing a parser to parse a grammar efficiently.


How difficult would it be to implement something that deals with inequalities better? What about the specific cases I outline above? Would this be enough for a summer project? Do you think this task to advanced for someone of my background?

Aaron Meurer

unread,
Mar 29, 2023, 3:26:47 AM3/29/23
to sy...@googlegroups.com
On Wed, Mar 29, 2023 at 12:21 AM Tilo RC <tilor...@gmail.com> wrote:
Thank you for your response Oscar. I think I’m interested in taking a stab at improving the capabilities of the new assumption system to deal with inequalities (and perhaps relations between symbols in general?). I’ve spent a lot of time reading through the documentation, code, discussions, and previous proposals related to assumptions and come to some understanding. I’ve tried to explain what I understand so far below. Could you read through what I’ve written and correct any mistakes in my thinking?

For around a decade the SymPy community has wanted to transition from the “old assumptions” to the “new assumptions.” The old assumptions are seen as fundamentally limited compared to the new assumptions for reasons I’m trying to figure out. This is my understanding:

  • The old assumption system is deeply connected to the core of sympy. I think this is bad because it’s responsible for some of the other problems with it and violates OOP principles. 


It's not bad because of some abstract reasons. The issue is that the old assumptions design limits what it is able to do. Assumptions can only be made on symbols, deductions can only be done via direct chains of inference from handlers, and it's impossible to add additional facts to the system except by adding handlers to the specific relevant classes.

Just as an example, inequality assumptions like x > 1 are simply impossible to even represent in the old assumptions. 
 
  • One fundamental flaw of the old assumption system is that it’s not possible to make assumptions about relations between different variables. The new assumption system has this capability, and even though it is not very feature rich, it has a lot of potential. 


This is correct. 
 
  • The old assumption system slows down SymPy. I’m not sure how true this still is. It seems like the core and the old assumption system have gotten many improvements to make them faster. For example, because I accidentally read the SymPy 1.11 instead of the SymPy 1.13  documentation I learned about the ManagedProperties metaclass and then saw it was removed to improve efficiency. (By the way, both of these sections from the documentation say that “This explanation is written as of SymPy 1.7.” even though the writing clearly changed between 1.11 and 1.13).  


This is still true. The issue isn't just the speed of the old assumptions, but the fact that object constructors make heavy use of them. Take the expression in https://github.com/sympy/sympy/issues/24565, which takes 30 seconds just to construct. This time is all spent in assumptions. Ideally, simply constructing an expression shouldn't call any assumptions. One of the reasons the new assumptions are separate from the core is that there was a hope that this would make it easier to remove assumptions from core constructors (but really that isn't necessary; we just need to be better about not using assumptions in constructors).
 

As for improving how the new assumptions deal with inequalities, there are some very basic features they’re missing. For example, ask(Q.negative(x),Q.ge(x,1)) returns None under the current system even though it seems like it would be really simple to implement something where if x > 0 then x is not negative. Similarly, ask(Q.gt(x,0),Q.gt(x,1)) returns None. A more complicated query that returns None is ask(Q.eq(x,1), Q.integer(x) & Q.gt(x,0) & Q.lt(x,2)).


I also notice that even equality is somewhat limited in the new assumption system. For example, ask(Q.positive(y), Q.eq(x,y) & Q.positive(x)) returns None. I’m a bit unsure why this hasn’t been implemented. One guess I have is that implementing this functionality would be too slow because maybe you would have to use the solver?


No, it just hasn't been implemented. The idea here is that we need a way to rewrite these predicates and/or generate new predicates that will allow the assumptions system SAT solver to deduce things about the symbols. It needs to be done efficiently. There is likely literature on ways to do this. For instance, if we have Eq(x, y) then we could add a predicate Equivalent(p(x), p(y)) for every predicate p(x) and p(y), but there could be a lot of such predicates. There's likely smarter ways to represent this substitution system. I'm sure there's also good heuristics too (like just substituting y for x everywhere).

In general, we may need to move from a SAT solver to a SMT solver approach to handle some of these things completely efficiently. 


Frankly, although I’ve spent a lot of time reading the files in the new assumption system, I only have a surface level understanding of them:


  • relation/equality.py

    • I probably understand this file the best out of all the files in the new assumptions. It contains implementations of binary relation for >, ≥, ≤, <, =, and ≠.

  • Ask.py

    • Heart of the new assumptions. Defines ask() and Q. In general, ask tries to find computationally cheap ways to evaluate propositions. If it can’t do it in a cheap way it calls the SAT solver. Also, all of the assumptions are converted into a normal form which I think is analogous to the role Chomsky normal form plays in allowing a parser to parse a grammar efficiently.


How difficult would it be to implement something that deals with inequalities better? What about the specific cases I outline above? Would this be enough for a summer project? Do you think this task to advanced for someone of my background?


You'll need to do more research, but if you are motivated it's not impossible to do. You should first get a better understanding of the basics of SAT and concepts like CNF and satisfiability. Understanding the inner workings of a SAT solver is also useful but not as important, at least for a first pass. For the most part, you can treat a SAT solver as a black box.

It also would be a good idea to look at the literature to see if you can find if anyone has written any papers on how to do this. This sort of thing is already doable by SMT solvers so it's not completely novel, but there may be simpler approaches when dealing with pure inequalities. 

The files you want to get an understanding of are satask and sathandlers. In particular, if you can grok the basic way that satask works in how it gets answers from the SAT solver, that is a good first step. 

Aaron Meurer

Peter Stahlecker

unread,
Mar 29, 2023, 4:03:12 AM3/29/23
to sy...@googlegroups.com
Dear Tilo,

I am just a (semi retired) recreational user of sympy, mostly sympy.physics.mechanics. I have no ambition, and certainly no skills to do anything with the code itself.
Still, I like to understand things as much as possible, therefore my question:

what is meant by assumptions?

Thanks a lot!

Peter

--
Best regards,

Peter Stahlecker

Jason Moore

unread,
Mar 29, 2023, 4:11:07 AM3/29/23
to sy...@googlegroups.com
Hi Peter,

You've probably seen this example: https://moorepants.github.io/learn-multibody-dynamics/sympy.html#differentiating, but in the "warning" box you can see how setting assumptions on the variables changes the results (positive, real, etc).

Jason

Peter Stahlecker

unread,
Mar 29, 2023, 5:10:17 AM3/29/23
to sy...@googlegroups.com
Dear Jason,

YES, of course I remember.
This means, assumptions have the literal English meaning, like 'I asssume x to be real and larger than 5'?
If I understood correctly, then what are 'old' and 'new' assumptions?

Thanks, Peter

Jason Moore

unread,
Mar 29, 2023, 5:33:21 AM3/29/23
to sy...@googlegroups.com
The old are what I just showed "positive=True" when defining variables.

The new idea is to do something like:

with assume(x > 0):
    res = Abs(x)

res will then be just x.

That's not the syntax though, I don't know it off the top of my head.


Peter Stahlecker

unread,
Mar 29, 2023, 5:46:48 AM3/29/23
to sy...@googlegroups.com
Dear Jason,

Thanks!
I think, I got the idea.

Peter

Alan Bromborsky

unread,
Mar 29, 2023, 7:57:01 AM3/29/23
to sy...@googlegroups.com

Especially with regard to mechanics how many problems could be solved with piecewise functions on a fixed grid?

Tilo RC

unread,
Apr 2, 2023, 6:20:50 AM4/2/23
to sympy
Thank you Aaron for taking the time to reply to my email. I appreciate your input.

I wrote a draft of my GSOC proposal. Could you (or anyone else) give me feedback? Where is my understanding wrong and which parts should I expand upon?

Particularly for the timeline section I'm not sure what to put. I still need to do a lot of research and I'm not sure how long things will take. 

Also, a question: why does SymPy implement its own SAT solver instead of using a SAT/SMT solver such as Z3. Although Z3 is associated with Microsoft, it is licensed under the MIT License and has a python wrapper. Wouldn't using something like Z3 which is super optimized be a lot faster than writing our own SAT solver?

Tilo RC

unread,
Apr 2, 2023, 3:31:10 PM4/2/23
to sympy
Probably better to look at my proposal in the wiki page for current applications to get the most recent version. I've already updated it once and will be updating it. 

Aaron Meurer

unread,
Apr 3, 2023, 3:01:15 AM4/3/23
to sy...@googlegroups.com
On Sun, Apr 2, 2023 at 4:20 AM Tilo RC <tilor...@gmail.com> wrote:
>
> Thank you Aaron for taking the time to reply to my email. I appreciate your input.
>
> I wrote a draft of my GSOC proposal. Could you (or anyone else) give me feedback? Where is my understanding wrong and which parts should I expand upon?
>
> Particularly for the timeline section I'm not sure what to put. I still need to do a lot of research and I'm not sure how long things will take.
>
> Also, a question: why does SymPy implement its own SAT solver instead of using a SAT/SMT solver such as Z3. Although Z3 is associated with Microsoft, it is licensed under the MIT License and has a python wrapper. Wouldn't using something like Z3 which is super optimized be a lot faster than writing our own SAT solver?

SymPy can interface with some faster sat solvers like pycosat. This
improves the assumptions speed a little bit, but quite a bit of the
time is spent just setting up the problem that is sent to the solver.

As far as SMT solvers go, SymPy hasn't yet made use of SMT at all yet,
although there is some work on that front
(https://github.com/sympy/sympy/pull/23961). I would love to see
integration with Z3. Although if Z3 were the only way to achieve
certain operations that would probably require some discussion because
it might imply that Z3 should be a hard dependency, which we might
want to avoid. Typically SymPy has pure Python implementations and can
optionally interface with faster C implementations (e.g., gmpy2).

It would be good to dig into Z3 and see how well it does on these
sorts of problems, and if possible, determine if there are any
specific algorithms its using under the hood.

Aaron Meurer
> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/18580097-c93b-4b09-b5ec-820017bbe603n%40googlegroups.com.

Md Affan

unread,
Apr 3, 2023, 4:40:48 PM4/3/23
to sympy
Hello 
Where I can Find the previous work for the project -Classical Mechanics: Efficient Equation of Motion Generation with C++
Md Affan
Reply all
Reply to author
Forward
0 new messages