Google Summer of Code 2020 - Proposal
Improving Assumptions Module in SymPy
Me, the person:
Name: Arpan Chattopadhyay
University: Birla Institute of Technology and Science, Pilani
Email: f201...@pilani.bits-pilani.ac.in
GitHub: Arpan612 (Arpan Chattopadhyay)
Gitter: Arpan612
LinkedIn: Arpan Chattopadhyay
Time-zone: IST (UTC+5:30)
Age: 19
I prefer to be contacted via Gitter as I am very active there. However, other ways of communication are also welcome.
I am Arpan Chattopadhyay, pursuing B.E.(Hons.) Electrical and Electronics (2nd Year) at Birla Institute of Technology and Science, Pilani (BITS Pilani). I am highly interested in the fields of Symbolic Mathematics, Applied Mathematics, Machine Learning, and Mathematical Modeling. I would love to continue working in open source projects and hopefully, one day, create my team of programmers for building an open-source project as big as that of SymPy. I am very well versed in English and love interacting with people. Apart from coding, I love watching films, particularly mystery ones. I play and follow football as well. I believe I have been a good fit in the community culture and I hope to contribute even after the GSoC program gets over.
Me, the programmer:
I have been introduced to programming in Class 11 where we were taught C++ programming language as a part of our course curriculum. I started Python a month after joining college. I found great comfort and interest in coding with Python. I made some projects with Django, a framework for Python in the coding projects our college used to organize. I also got experience with Javascript, My SQL, and Assembly level programming in my college. I found great inspiration in collaborating with people and working on projects. At that time, I was introduced to the world of open source and the limitless possibilities it held for me. I was fortunate enough to take part in a Study Oriented Project which involved Symbolic Mathematics and the use of SymPy in Python. I found out about the various tasks which can be done by SymPy and was very impressed. Thus, I decided to finally give wings to my dream of working and collaborating in a large open-source project and SymPy was a natural choice, given the interest, it sparked in me.
OS: Ubuntu 16.04 LTS
Hardware Configuration: Intel i7 8th generation
Python: Version 3.7.4
Editor: Atom (Version 1.45.0) and Anaconda (Version 4.7.12)
Python Projects Created:
Gateway Interface with Google Accounts
Blogging Application
Polling Application
Implementable Google O-Auth Gateway
Biometric Verification Application with Python and C++
Algorithmic Trading Application with Python and C++
Algorithms for Stock Market Trading with Python, C++, bash scripting
Automated Trading Platform with Python and C++
Retina Sensor Detection Algorithms
Internship and Research Experience with Python:
Winter Intern as a Quantitative Research Analyst in Veda Capital, Opera House, Mumbai
Responsible for making Algorithms for Trading in the Indian Equity and Commodity Market, managing and mining F&O data using MySQL, coding the algorithms and alphas in Python programming language with Pandas and NumPy for backtesting and live testing.
Laboratory and Study Project on Application of Machine and Deep Learning in Communication System and Spectrum Sensing
This is a Laboratory and Study Project on Application of Machine and Deep Learning in Communication System and Spectrum Sensing under Shishir Maheshwari, Department of Electrical and Electronics, BITS Pilani. Here we apply Machine Learning techniques in Python for Spectrum Sensing when the primary user has Multiple Transmit Powers. We also investigate and build a machine learning model for spectrum sensing in cognitive radio networks.
Coursework done in College:
Mathematics I, II and III (Vectors, Coordinate Geometry, Sequence and Series, Number Theory, Linear Algebra, Complex Numbers, PDE, ODE, etc.)
Digital Design (Uses extensive Boolean Algebra)
Logic in Computer Science
Computer Programming in C
Operating Systems
Object Oriented Programming in Java
Data Structure and Algorithms
Machine Learning in Python
I firmly believe that Python is ahead of almost all languages in 2020 and after another 15 years or so, will be way ahead than any other language. The biggest three reasons are its simplicity, vast applications, and super strong community support. Python is used extensively in Back End Web Development, Machine Learning, Computer Automation, Applied and Symbolic Mathematics, Database Access, GUI and lots more. Also, the powerful and wholesome libraries Python has with vast community support helps accomplish a lot of computational tasks in a short time. In a time where almost everyone expects Computer Science, Mathematics and Automation to be the most desirable skills, the importance of Python is absolute.
The simplicity and ease of handling complex mathematics is an awesome feature I feel SymPy has. The easy plotting of variable change and variables also impressed me. However, my favorite feature is the complete and wholesome documentation of each function of SymPy and the SymPy interactive shell, where we can easily try out the codes we have written, with the strong community support in case of any queries.
Contributions in SymPy:
Merged PRs:
#18946: Issue 18921: “Extra Strong” Lucas Pseudoprimes
#18945: Issue 18666: Move Matrix out of test_matrices.py
#18961: Issue 18959: Give ‘digits’ a ‘bits’ argument
#18928: Issue 18891: Improper docstring in codegen.array_utils recognize_matrix_expression
Open PRs:
#18973: Issue 18963: `ibin` should raise OverflowError if too few bits are requested; negative arg
Closed PRs:
#18944: Issue 18666: Move Matrix out of test_matrices.py
#18943 : Issue 18666: Move Matrix out of test_matrices.py
Reviews and Discussion:
Issue #18889: Running sympy in Brython (Python in the browser)
PR #18681: Issue 16234: Refine should simplify symmetric matrix element
Issue #18987: Vector Expressions
Issue #18934: Integral cannot be evaluated
My Project:
A core part of any CAS is the assumptions system. This is the subsystem used to record and infer properties of expressions, since virtually any algorithm will only work under some restrictions. Before 2009, we were using an assumption system which was very complex. The assumptions module, core and caching were designed in such an interdependent manner, segregating them individually became a really difficult task.Apart from this, the system had many performance issues. Another problem we faced was that only a handful of facts were implemented. Due to this fundamental nature of the assumption system being so compounded, changes and implementations which were needed on an immediate basis could not be made. This problem escalated because of its proximity with SymPy core.
As a response, building up of a new Assumptions module had begun. This was necessary since the existence of the entire SymPy library depended on it. Since this is a structural change in the library, it takes many years and GSoC projects to implement this. The biggest advantage I have now is that since so many people have worked on it, through their project reports, I have been able to understand to a great extent the structure of it and the precise challenges we face. This proposal aims to systematically identify the issues, formulate tasks and execute it. The old and the new assumption model is shown below.
Aim and Objective:
The main aim is to figure out a way to disconnect the assumptions from the SymPy core, within the time frame before the next release. As far as the SymPy core is concerned, we will use some logic and facts to implement calculation in convenient mode like .is_bounded, is_infinity, is_zero and so on.
The Old Assumptions is time-tested and fast while New Assumptions is slow but provides better logical inferences. We aim to maintain the speed as close to the old assumptions as possible while implementing all the new functionalities and logical inferences we wish to achieve.
Things to remember:
If you change the API, you'll have to modify everywhere that API is used. If you change what an assumption means (e.g., if you change oo.is_real), you'll have to go through
everywhere that assumption and make sure that its use is still correct. Even just making the assumptions more powerful (able to return True or False as opposed to None more often) can require some changes, because some code may take different paths. The final goal is to not have multiple assumptions systems, at least from a user perspective. We will probably have multiple inference layers, but these should be hidden from the user. To do this, we either need to remove one of the existing systems (already tried and failed), or merge the two, so that one calls the other and they just become different layers in the inference stack. So we try to make as much addition possible in new assumptions. The changes which are not scalable from old to new are used in the merged layer. Also the changes we wish to bring is quite huge. The only way to get such a large change is to split it up into much smaller changes that can be merged into master as soon as possible. So, we make sure that the workflow also segregates PRs into specific functionalities and then we add it to master.
Motivation:
I believe that the most important problems of this world are solved by Mathematics. In Mathematics, In Mathematics, I enjoy Number Theory and Boolean Algebra the most. I have had a good exposure to it since high school. I will thoroughly enjoy being part of a project where I am able to put the theoretical knowledge gathered over the years to practical use.
Time Available and Other Commitments:
I will have my end-semester examinations for 14 days which most probably will lie in the community period. The dates are not finalised but will be announced soon. As soon as I get to know the dates I will inform my mentors. Other than that, I will be devoting 50-55 hours a week as I have no other commitment during the GSoC period and have a great interest in the project I intend to do.
Documentation:
Good documentation is an integral part of any successful project. It also helps in receiving a wider acceptance from the developer community. After working on the project for some time, I would write documentation on the various aspects of the project.
Communication:
I will be in regular contact with mentors using email. In case I am stuck somewhere, I would reach out to them via Gitter Chat and Mailing List.
If selected, I also plan to make a blog where I will be giving regular updates about the work completed. I believe that the problems faced by me and their solutions would certainly help other fellow developers later just like I have received great help from various blogs on the internet.
Project Details:
Phase 0: (Community Bonding)
I will begin by doing a thorough analysis of the implementation of the assumption module across the codebase. I will be noting down the areas and other modules where the effect of assumption change is either huge or very less. These notes will be key points used throughout the project to make the project sustainable.
Discussion of Proof of concept for the new assumptions with the mentors. Identifying the areas here where work is complete and figuring out the areas where the work branches out to. Discussing the additional implementation we can pursue in the work already done and start working on it.
Discussion of the work done in the project GSoC 2019 Improving-Assumptions. Try to find out what key concepts did he focus on which improved the speed by this margin. Discuss further possibilities of improvement of speed of satask both by the approach used before and explore the possibility of some new approach for the task. The PR below is relevant for this.
https://github.com/sympy/sympy/pull/17144
Chalk out a plan for the scalable implementation of the SAT solvers. Figure out the conflicts we might face and try to figure out how to handle them.
For non-real expressions isn't the is_positive property always False? · Issue #16313 · sympy/sympy
S(0)**real(!=0) should be (0 or zoo) and hence non-positive. · Issue #16332 · sympy/sympy
The above issues will be explored and discussed with mentors to get a better understanding of handlers and the codegen. The work left in the PR below will be completed. If any extension of the work is possible, a separate PR will be made to work on it.
Improved AskPositiveHandler and AskNegativeHandler · Issue #16213 · sympy/sympy
Phase 1:
We begin by removing the inconsistencies in both old and new assumption modules. The issues above give a good insight of the problems already we face.
Contradictory facts about infinities in the new assumptions · Issue #5976 · sympy/sympy
https://code.google.com/archive/p/sympy/issues/2877
These examples show how the old assumption gave incorrect answers to simple algebra/logic questions. We figure out more of such inconsistencies and implement the correct functionality in the new assumptions module.
Some wrong results in the new assumptions · Issue #7286 · sympy/sympy
In [6]: ask(Q.prime(x**y), Q.integer(x)&Q.integer(y))
Out[6]: False
In [7]: ask(Q.integer(abs(x)), ~Q.integer(x))
Out[7]: False
This is a broken part of the old assumption module we will fix.
Next we move on to satask. The current approach for satask creates too many unnecessary clauses. We will be investigating the facts which are not required for the query and fix it. For example, Q.positive(x) doesn't depend on facts like Q.diagonal(x) or Q.invertible(x) yet is used.
The logic module is essentially built around the assumptions system and is an integral part of the assumptions mechanism. Currently, it lacks the capability of forming complete FOL queries. This restricts the power of SymPy. The issue below highlights this task.
Universal quantification in sympy.logic · Issue #17446 · sympy/sympy
The following PR is relevant for the above task https://github.com/sympy/sympy/pull/7608
In the GSoC project last year, this work was started with an aim of working on the name conflicts with basic SymPy objects along with implementation of some functionalities. However, some files of assumptions, logic and printing had compatibility issues. Debugging is also required of the functionalities implemented. I will solve all these issues along with working on any new functionalities as per requirement. The work done last year is below.
https://github.com/sympy/sympy/pull/17069
With a new system supporting Relationals it would reduce the user's dependency on old syntax for such queries. Currently, the queries like ask(Q.positive(x+y+z),assumptions) takes into account the assumptions on each variable and proceeds through a set of cases to arrive at the result. This system may give incorrect results with the ignorance of some cases. This is what the old assumptions is doing and the current implementation of new assumptions has followed.
Partial work on this issue is done and the link for which is below. We will complete the work in this PR and add additional functionalities.
https://github.com/sympy/sympy/pull/17392
Phase 2:
First, we complete the work done in querying Relational expressions. Now we start working on ‘ask’ handlers. The deduction system for ask was two parted. First it would use some basic logic deduction using a SAT solver and some known facts (like Implies(Q.positive, Q.real)). Second, each fact would have a handler object, which defines methods on how to determine its truth for specific classes given some assumptions, like
class AskPositiveHandler(Handler):
@staticmethod
def Pow(expr, assumptions):
This Returns True, False, or None if expr, which is a Pow, is positive, given the facts in assumptions.
We will keep the new assumption close to the old one. A two argument function.
Next, we work on ‘refine’ function. The idea behind refine() is that since assumptions are separate from objects, the simplifications would need to be done in a function,
which takes the object and the assumptions. So it would work like
refine(sqrt(x**2), Q.positive(x)) -> x. Refine also has a handler system similar to ‘ask’ to handle writing all these simplification routines. The traditional way this was done is in the core, meaning the simplification happens when the object is created, by checking the assumptions on x. It reduces speed to a good extent and makes simplification of function of functions difficult. This is important to boost the performance. Old assumptions are used heavily in the core for automatic simplification which slows things down. Refine is the new way of doing so which uses new assumptions. We can use refine to do the auto simplification and make it optional.
https://github.com/sympy/sympy/issues/17052
The above link contains issues which we will be fixing, as a starting point of our work in refine function.
Now, we work on implementing sat handlers for matrices. We will start off by migrating from handlers/matrices.py to sathandlers.py. We implement scalar matrices, non negative elements, real elements. I have seen mathematical problems involving complex terms in matrices as well. I will give an attempt in implementing something in this area as well. We will also try to implement a more generalised expression to make assumptions about all the elements of a matrix.
Phase 3:
Now, we focus on the simplification and refinement code. Currently all auto simplification and refinement code is procedural and ad-hoc. We build a kind of general simplification engine responsible for simplifications. The attempt here is to build a sustainable code here which encodes simplification rules in a more declarative fashion. We attempt to do it using LogPy. This work replaces sympy's unify module with an external dependency. The work involves adding logpy to setup.py, using logpy from tuple methods, import variables from logpy directly. Here, compatibility may be a problem. I think the latest version of logpy will be compatible. Since the basic flow of work is the same, if logpy does not work out, we will use other libraries based on discussion with mentors.
Timeline: (As per the latest schedule sent by GSoC authorities)
Community Bonding Period: (Present - May 31)
Analysis of the implementation of the assumption module across the codebase.
Discussion of Proof of concept for the new assumptions with the mentors.
Discussion of the work done in the project GSoC 2019 Improving-Assumptions.
Discussion of the issues mentioned in project details.
Discussion of the proposed workflow and changing the order of work if required.
Phase 1:
June 1 to June 7: (Week 1)
Remove the inconsistencies in both old and new assumption modules based on the workflow in project details.
PRs for the work done by June 7 latest.
June 8 to June 21: (Week 2 and 3)
Start exploring the logic model and FOL queries in satask. Study the codebase and all the implementation done by June 9.
Work on the compatibility issues and the PRs linked with the PR #17069 as mentioned.
PRs for the work done by June 15 latest.
Work on the missing implementations in PR #17069 and send a PR for additional implementation after discussion.
First set of PRs for the work done by June 19 latest.
June 22 to June 28: (Week 4)
Second set of PRs for the work done by June 22 latest.
Add support for querying Relational expressions as mentioned in project details.
Add missing functionalities in PR #17392.
PR for the work done by June 26 latest.
Add functionalities for querying Relational expressions.
June 29 to July 5: (Week 5)
Documentation of the work done in phase 1. Fixing bugs and issues on the same.
Phase 1 evaluation submission by July 1 latest
Phase 2:
July 6 to July 19: (Week 6 and 7)
PR for adding functionalities for querying Relational expressions by July 6 latest.
Start the work for ‘ask’ handlers.
Implementing basic logic deduction using a SAT solver and some known facts.
PR for the work done by July 11 latest.
Work on handler objects as mentioned in project details.
PRs for the work done by July 17 latest
July 20 to August 2: (Week 8 and 9)
Implementation and adding functionalities in refine function according to the workflow in project details.
PRs for the work done by July 23 latest
Implementing sat handlers for matrices as per project details.
Phase 2 evaluation submission by July 30 latest
PRs for the work done in sat handlers by August 2 latest
Phase 3:
August 3 to August 9: (Week 10)
Remove assumptions from the core by a module by module approach.
First set of PRs for the work done by August 6 latest.
August 10 to August 23: (Week 11 and 12)
Second set of PRs for the work done by August 10 latest.
First set of PRs for the work done by August 14 latest.
Second set of PRs for the work done by August 17 latest.
Working on Simplification and refinement code using LogPy.
First set of PRs for the work done by August 21 latest.
Second set of PRs for the work done by August 24 latest.
August 24 to August 31: (Week 13)
Finishing up documentation and blogs and work Report on SymPy wiki
Checking for any issues or conflicts unattended
Working on additional goals
Submission for Final Evaluation by August 28 latest.
Additional Goals: (Time Permitting)
In computer science and mathematical logic, the satisfiability modulo theories (SMT) problem is a decision problem for logical formulas with respect to combinations of background theories expressed in classical first-order logic with equality. Some work is done in this area in a previous GSoC project. The aim here is to study the previous work done and solve the issues (mainly compatibility and syntax ones) and work out a plan with mentors for additional implementation of the same.
Remarks:
I have studied the previous work done in the project I intend to do in depth. I have noticed that some of the work promised in the earlier proposals was not fulfilled due to the heavy nature of the work. I will try to be extra cautious so as to not delay my work and miss any PR deadline set by me. I expect to stay ahead of the deadline from the community bonding period to at least some part of phase 2. The extra time given by GSoC authorities in the community bonding time boosts my chances of doing so.
Just in case if I am unable to finish all the tasks, I will request the Mentors to allow me to work for a few days more. I have less coursework in college till around September 15 and these extra days should be sufficient to complete the remaining work.
If any aspect of the project is left out, I will make sure I discuss it with the mentors in the Community Bonding Period itself. Also the nature of work we intend to do is to some extent interdependent and has seen compatibility issues in the past. So, if there is a need to change the order of workflow i.e. for example, completing a phase 3 task in phase 1 itself, I will discuss in detail about it in the Community Bonding Period.
Benefit of my project to the community:
Wide variety of functionality and applications is the most important aspect for a developer. The implementation of new assumptions and addition of features are driven by the applications SymPy has in the real world. I hope my project will be not one, but quite some steps further in development of SymPy and in turn, the development of all fields in which SymPy is used.
References:
GSoC 2010 Report Christian Muise: Supercharging Assumptions · sympy/sympy Wiki
GSoC 2013 Application Angus Griffith: Assumptions · sympy/sympy Wiki
GSoC 2013 Application Tom Bachmann: Removing the old assumptions module · sympy/sympy Wiki
GSoC 2014 Application Shipra Banga Building the New Assumptions Module · sympy/sympy Wiki
GSoC 2015 Application Sudhanshu Mishra: Assumptions · sympy/sympy Wiki
GSoC 2016 Application Bill N. Acha: Port Sympy's new assumptions module to Cpp · sympy/sympy Wiki
GSoC 2019 Proposal Shubham Kumar Jha : Improving Assumptions · sympy/sympy Wiki
GSoC 2019 Report Shubham Kumar Jha: Improving Assumptions · sympy/sympy Wiki
Other Proposals on SymPy Wiki and other websites on the internet
Acknowledgment: (Needs Change)
I would like to thank Isuru Fernando and Shikhar Jaiswal for guiding me in this project proposal. I am grateful for the help extended by my mentors especially Aaron Meurer, S.Y. Lee, Gagandeep Singh and Christopher Smith till now and I hope all these people will continue to help me during GSoC.
Thank You So Much!
@Aaron Meurer