[Discussion] GSoC 2020 Stats module


Smit Lunagariya

Jan 20, 2020, 3:13:46 AM
to sympy
Hi,
I am Smit Lunagariya, from Mathematics and Computing, IIT-BHU.
I would like to work on the stats module this summer. Currently, the stats module supports only a few types of stochastic processes, so I would like to add more of them.
Compound distributions are also yet to be implemented, so I would work on adding them to the stats module, along with their different types, following the API of the current stats module as in crv.py, drv.py and frv.py. I am also ready to implement other new ideas that have yet to be added to the stats module.
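As a small illustration of what a compound distribution is (a distribution whose parameter is itself random), here is a plain-Python sketch, not SymPy code; the function names are hypothetical. It compounds Binomial(N, p) over N ~ Poisson(lam) by summing out N, and the result should match the known closed form Poisson(lam * p).

```python
from math import comb, exp, lgamma, log


def poisson_pmf(n, lam):
    """P(N = n) for N ~ Poisson(lam), computed in log space to avoid overflow."""
    return exp(-lam + n * log(lam) - lgamma(n + 1))


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p); zero when k > n."""
    if k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def compound_pmf(k, lam, p, n_max=100):
    """P(K = k) when K | N ~ Binomial(N, p) and N ~ Poisson(lam).

    The random parameter N is summed out; the sum is truncated at n_max,
    which is harmless when lam is small compared with n_max.
    """
    return sum(poisson_pmf(n, lam) * binomial_pmf(k, n, p)
               for n in range(k, n_max + 1))


# Compounding with lam = 3.0 and p = 0.4 should reproduce Poisson(1.2).
print(compound_pmf(2, 3.0, 0.4), poisson_pmf(2, 1.2))
```

A symbolic version in the stats module would do the same marginalisation with a symbolic sum or integral instead of a truncated numeric one.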

Smit Lunagariya

Jan 24, 2020, 2:43:57 PM
to sympy

Currently, the stats module supports only Markov chains and the Bernoulli process as stochastic processes. I would like to implement more of the stochastic processes mentioned in this list: https://en.wikipedia.org/wiki/List_of_stochastic_processes_topics
 

Gagandeep Singh (B17CS021)

Jan 24, 2020, 2:57:11 PM
to sy...@googlegroups.com
Well, a good starting point for the community bonding phase would be to test the current implementation, especially the query handler of Markov chains, to see if there is scope for improvement, and to implement random walks.
In fact, a rough plan, such as a division of the various ideas across the complete timeline, would help further discussion on this thread.
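Since random walks are suggested as a first new process, here is a minimal plain-Python sketch (hypothetical helper names, not SymPy's API) of the exact distribution of a simple random walk after n steps. It is computed step by step with exact fractions and checked against the closed form C(n, (n+k)/2) / 2**n for the symmetric case.

```python
from fractions import Fraction
from math import comb


def walk_distribution(n_steps, p=Fraction(1, 2)):
    """Exact distribution of S_n for a simple random walk started at 0.

    Each step is +1 with probability p and -1 with probability 1 - p.
    Returns a dict mapping position -> probability.
    """
    dist = {0: Fraction(1)}
    for _ in range(n_steps):
        nxt = {}
        for pos, prob in dist.items():
            nxt[pos + 1] = nxt.get(pos + 1, 0) + prob * p
            nxt[pos - 1] = nxt.get(pos - 1, 0) + prob * (1 - p)
        dist = nxt
    return dist


def symmetric_closed_form(n, k):
    """P(S_n = k) for the symmetric walk: C(n, (n+k)/2) / 2**n when n + k is even."""
    if (n + k) % 2 or abs(k) > n:
        return Fraction(0)
    return Fraction(comb(n, (n + k) // 2), 2**n)


print(walk_distribution(4))
```

Passing a different p, e.g. Fraction(2, 3), gives the biased walk; the dictionary update is just the forward equation of the underlying Markov chain on the integers.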

With Regards,
Gagandeep Singh
Github - https://www.github.com/czgdp1807
LinkedIn - https://www.linkedin.com/in/czgdp1807


--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/63b6d057-d2b1-4a0a-96ed-f68d5424f78e%40googlegroups.com.

Smit Lunagariya

Jan 28, 2020, 12:09:52 PM
to sympy
Hi,
I will prepare a rough implementation timeline within a few days and post an update. Could you please point me to the updated ideas list, so that I can also pick ideas from it and add them to the plan?

Thanks,
Smit Lunagariya.

Smit Lunagariya

Feb 22, 2020, 1:04:27 AM
to sympy


Hi,

I am Smit Lunagariya, an undergraduate student in Mathematics and Computing Engineering at the Indian Institute of Technology (BHU). I have been programming in Python for one year. I am interested in mathematics and its symbolic computation, specifically in statistics. I have experience in probabilistic machine learning and deep learning.

I have taken several relevant courses, such as Probability and Statistics, Abstract Algebra, Engineering Mathematics, NPTEL - Stochastic Process by Dr. S. Dharmaraja (IIT Delhi), Data Structures, and Information Technology Workshop (on Python). Currently, I am enrolled in institute courses such as Algorithms, Numerical Techniques, Operating Systems, and Mathematical Methods.

I have been contributing to SymPy since December 2019 and have become quite familiar with the contribution guidelines and workflow.

I would like to discuss my idea for GSoC 2020 in the stats module. I have prepared a rough timeline for this summer project.

Community Bonding Period:

Many more distributions can be added to the stats module as discrete and continuous random variables. I would like to add the following, as some of them might be useful for the later implementation of joint multivariate distributions:

1. Borel (Discrete)

2. Conway-Maxwell-Poisson (Discrete)

3. Gauss-Kuzmin (Discrete)

4. Lomax (Continuous)

5. Feller-Pareto (Continuous)

6. Bounded Pareto (Continuous)

7. Symmetric Pareto (Continuous)

8. Logit Normal (Continuous)

9. Inverse Gaussian (Continuous)

10. Inverse Chi-squared (Continuous)
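As a sanity check for the first distribution on this list, here is a plain-Python sketch of the Borel pmf, P(X = n) = exp(-mu*n) (mu*n)**(n-1) / n! for n = 1, 2, ... and 0 < mu < 1. A SymPy version would define this symbolically in drv_types.py; the function name here is hypothetical. Numerically, the probabilities should sum to 1 and the mean should be 1/(1 - mu).

```python
from math import exp, lgamma, log


def borel_pmf(n, mu):
    """P(X = n) for X ~ Borel(mu), with n = 1, 2, ... and 0 < mu < 1.

    Evaluated in log space so that large n does not overflow.
    """
    if n < 1:
        return 0.0
    return exp(-mu * n + (n - 1) * log(mu * n) - lgamma(n + 1))


mu = 0.5
total = sum(borel_pmf(n, mu) for n in range(1, 2000))
mean = sum(n * borel_pmf(n, mu) for n in range(1, 2000))
print(total, mean, 1 / (1 - mu))
```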

Also, I would add the `.doit()` method in class Probability.
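To illustrate the kind of behaviour meant by a `.doit()` method: an unevaluated object that only stores its arguments and computes its value on request. This is a toy, hypothetical class over a finite distribution written in plain Python, not SymPy's `Probability` implementation.

```python
from fractions import Fraction


class UnevaluatedProbability:
    """Toy unevaluated probability P(condition) over a finite distribution.

    Hypothetical illustration only: construction just stores the
    arguments, and doit() performs the actual computation.
    """

    def __init__(self, condition, distribution):
        self.condition = condition        # predicate on an outcome
        self.distribution = distribution  # dict: outcome -> probability

    def doit(self):
        # Add up the probabilities of the outcomes satisfying the condition.
        return sum((p for x, p in self.distribution.items() if self.condition(x)),
                   Fraction(0))


die = {face: Fraction(1, 6) for face in range(1, 7)}
expr = UnevaluatedProbability(lambda x: x > 4, die)  # nothing evaluated yet
print(expr.doit())  # 1/3
```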

While adding these distributions, I would also work on increasing code coverage by adding tests, including tests for the currently uncovered lines in `crv.py`, `drv.py`, `frv.py`, `drv_types.py`, `crv_types.py` and `frv_types.py`.

Phase 1:

Currently, the stats module supports Markov chains and the Bernoulli process as stochastic processes. I would like to add more stochastic processes, including:

1. Poisson Process

2. Birth-Death Process

3. Wiener Process

4. Random Walks

5. Gamma Process

6. Queueing Process

While adding the above processes, I would also add their tests and work on increasing the code coverage of `stochastic_process_types.py`.
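For the Poisson process at the top of this list, here is a minimal plain-Python sketch (hypothetical names, not SymPy's API) of the two views an implementation would have to agree on: the marginal law P(N(t) = k) = exp(-rate*t) (rate*t)**k / k!, and a simulation built from independent Exponential(rate) inter-arrival times.

```python
import random
from math import exp, lgamma, log


def poisson_process_pmf(k, rate, t):
    """P(N(t) = k) for a homogeneous Poisson process with the given rate (t > 0)."""
    mean = rate * t
    return exp(-mean + k * log(mean) - lgamma(k + 1))


def simulate_arrivals(rate, t_end, rng):
    """Arrival times in [0, t_end], built from i.i.d. Exponential(rate) gaps."""
    times, t = [], rng.expovariate(rate)
    while t <= t_end:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(0)
print(len(simulate_arrivals(2.0, 3.0, rng)), poisson_process_pmf(6, 2.0, 3.0))
```

Averaged over many simulated paths, the number of arrivals by time t should be close to rate * t, which is a cheap consistency test between the two views.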

Phase 2:

At the beginning of this phase, I would finish the remaining work from Phase 1 and then start implementing the following:

1. Add assumptions about the dependence of random variables.

2. Add support for compound distributions, along with more examples related to them.

3. Add densities of circular ensembles in random matrices.

While discussing and implementing the API, I would make sure to add the necessary tests and work on increasing code coverage.
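For the circular-ensemble item, here is a plain-Python sketch of the eigenangle density of Dyson's circular ensembles: the unnormalised weight prod_{j<k} |e^{i theta_j} - e^{i theta_k}|**beta (beta = 1, 2, 4 for COE, CUE, CSE) and the normalisation constant (2*pi)**n Gamma(1 + beta*n/2) / Gamma(1 + beta/2)**n. The function names are hypothetical; this only shows the formulas a SymPy implementation would have to produce.

```python
import cmath
from math import gamma, pi


def circular_ensemble_weight(thetas, beta):
    """Unnormalised joint eigenangle density of a circular ensemble.

    beta = 1, 2, 4 correspond to COE, CUE and CSE respectively.
    """
    w = 1.0
    for j in range(len(thetas)):
        for k in range(j + 1, len(thetas)):
            w *= abs(cmath.exp(1j * thetas[j]) - cmath.exp(1j * thetas[k])) ** beta
    return w


def circular_ensemble_norm(n, beta):
    """Z = (2*pi)**n * Gamma(1 + beta*n/2) / Gamma(1 + beta/2)**n."""
    return (2 * pi) ** n * gamma(1 + beta * n / 2) / gamma(1 + beta / 2) ** n


print(circular_ensemble_norm(2, 2) / pi**2)
```

For n = 2 and beta = 2 the constant is 8*pi**2, which can be checked by hand since the weight reduces to 2 - 2cos(theta_1 - theta_2).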

Phase 3:

At the beginning of this phase, I would finish the remaining work from Phase 2 and then start implementing the following:

1. Currently, joint distributions lack a well-defined framework. I would rework the distribution-specific portions and make the framework general enough for more distributions to be built on top of it.

2. Add more multivariate distributions, including: 

    1. Wishart

    2. Matrix Gamma

    3. Normal Inverse Gamma

    4. Inverse Wishart

    5. Normal Wishart

    6. Normal Inverse Wishart

    7. Inverse Matrix Gamma

3. Add sampling methods for continuous random variables using external libraries such as PyMC3, NumPy, and SciPy.

While adding these distributions, I would also work on increasing code coverage by adding tests, including tests for the currently uncovered lines in `joint_rv.py` and `joint_rv_types.py`.
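For the multivariate list and the sampling item, here is a plain-Python sketch of sampling from a Wishart distribution with an identity scale matrix, using the textbook construction W = sum_i x_i x_i^T with x_i drawn independently from N(0, I). It uses only the standard library rather than NumPy, SciPy or PyMC3, and the function name is hypothetical; a general scale matrix would need a Cholesky factor, which is omitted to keep the sketch short.

```python
import random


def sample_wishart_identity(dof, dim, rng):
    """One draw from Wishart(dof, I_dim) as a list of rows.

    W = sum_i x_i x_i^T with x_i ~ N(0, I_dim), i = 1..dof.
    """
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(dof):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in range(dim):
            for c in range(dim):
                W[r][c] += x[r] * x[c]
    return W


rng = random.Random(42)
print(sample_wishart_identity(5, 2, rng))
```

Averaging many draws should give roughly dof * I, since E[W] = dof * Sigma for a Wishart(dof, Sigma) matrix.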

Finally, I would complete the remaining work before the final evaluation.

Above is the rough timeline I would like to follow during this project. Suggestions, changes and additional ideas are welcome.

Thank you.

Gagandeep Singh (B17CS021)

Feb 22, 2020, 12:10:58 PM
to sy...@googlegroups.com
Hi Smit,

1. Community Bonding - It looks like many distributions are going to be added to the stats module. I would suggest reducing the number of new distributions to at most 5. Instead, you can include testing and improving the joint distributions, the multivariate distributions, and the current implementation of Markov chains (it will help you understand the design better), which will form the basis of your work during the coding period. The algorithm I implemented for handling queries in Markov chains was my own creation, so it needs to be tested for generality and may need to be fixed if it fails. You can think of some cases where it fails and include them in your proposal. Adding a `.doit()` method to the `Probability` class is a nice idea. You can include your approach for it in your proposal.

2. Phase 1 - This phase looks good to me. However, make sure that you add the general stochastic processes first and the special cases later. For example, the birth-death process is a special case of a continuous-time Markov process.

3. Phase 2 - In this phase, adding compound distributions should be the most important part and should be done as early as possible. As far as I know, it has been stalled since 2018. You should pick up the already open PRs for compound distributions and try to complete them.

4. Phase 3 - I think you can shift the work on joint distributions to Phase 2 and move the work on random matrices from Phase 2 to this phase. The random matrices part is quite immature in master right now and needs improvement. Adding more multivariate distributions is a good idea, as they need attention and have very little support compared to their univariate counterparts.
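One way to follow the suggestion of testing the Markov chain query handler is to compare its answers with a brute-force reference. Here is a minimal plain-Python sketch (hypothetical helper names, not SymPy's API): n-step transition probabilities computed as P**n with exact fractions, checked against the closed form for a two-state chain with switching probabilities a and b, P(X_n = 0 | X_0 = 0) = b/(a+b) + a/(a+b) (1-a-b)**n.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_step(P, n):
    """P**n by repeated squaring; entry [i][j] is P(X_n = j | X_0 = i)."""
    size = len(P)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    base = P
    while n:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


a, b = Fraction(1, 3), Fraction(1, 4)
P = [[1 - a, a], [b, 1 - b]]
print(n_step(P, 3)[0][0], b / (a + b) + a / (a + b) * (1 - a - b) ** 3)
```

Exact fractions make disagreements with a symbolic query handler easy to spot, since there is no floating-point tolerance to tune.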

Many orderings of the work are possible, and changes will keep happening during the coding period depending on the bugs encountered.
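Regarding the point that a birth-death process is a special case of a continuous-time Markov process: its generator matrix is just a tridiagonal CTMC generator, so a general CTMC implementation could build it with a thin constructor. A plain-Python sketch with a hypothetical function name, truncated to finitely many states:

```python
def birth_death_generator(birth, death):
    """Generator matrix Q of a birth-death process on states 0..n.

    birth[i] is the rate of i -> i+1 and death[i] the rate of i+1 -> i,
    both of length n. Off-diagonal entries are transition rates; each
    diagonal entry makes its row sum to zero, as for any CTMC generator.
    """
    if len(birth) != len(death):
        raise ValueError("birth and death must have the same length")
    n = len(birth)
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        Q[i][i + 1] = birth[i]
        Q[i + 1][i] = death[i]
    for i in range(n + 1):
        Q[i][i] = -sum(Q[i][j] for j in range(n + 1) if j != i)
    return Q


for row in birth_death_generator([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    print(row)
```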

Best wishes.



--
With regards,
Gagandeep Singh

Smit Lunagariya

Feb 23, 2020, 2:12:28 PM
to sympy
Hi Gagandeep,
Thanks for your response. I will follow your suggestions and change the timeline accordingly.

Smit Lunagariya

Mar 27, 2020, 3:59:46 AM
to sympy


Hello Everyone,
I am attaching the link to my draft proposal here. I have shared it with SymPy. Please review it and leave comments so that I can improve it further.
Thanking you.
Regards,
Smit Lunagariya