Lots of problems with Stats module


bonc...@udel.edu

Mar 19, 2017, 12:41:14 PM
to sympy
During the course of writing a textbook, I discovered many problems with the Stats module.  I've attached an ipynb file showing the problems.  Here they are in brief.

1. This gives the correct answer, but not in the best form. I can't figure out how to use SymPy to simplify the condition from $e^{-y} \le 1$ to $y>0$.

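from sympy import symbols, log, simplify, S
from sympy.stats import Uniform, Exponential, density, P, E, given
u, x, y = symbols('u x y', positive=True, real=True)
U = Uniform("u",0,1)
Y = -log(U)
simplify(density(Y)(y))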

... Piecewise((exp(-y), exp(-y) <= 1), (0, True))
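For reference, the change-of-variables formula shows why the condition is equivalent to $y \ge 0$ (a hand derivation, not SymPy output). With $U = e^{-Y}$ and $f_U(u) = 1$ on $[0, 1]$:

```latex
f_Y(y) = f_U\left(e^{-y}\right) \left| \frac{d}{dy} e^{-y} \right|
       = \begin{cases}
           e^{-y}, & 0 \le e^{-y} \le 1 \iff y \ge 0, \\
           0,      & \text{otherwise.}
         \end{cases}
```

Because `y` was declared positive, $e^{-y} \le 1$ always holds there, so on that domain the Piecewise reduces to $e^{-y}$.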

2. Conditional probabilities with the Uniform distribution don't work. These give NaNs when the answers are simple.

P(U<0.3, U<0.5), P(U<S(1)/3, U<S(1)/2)

... NaN, NaN
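The expected answers follow from $P(A \mid B) = P(A \cap B)/P(B)$. As a cross-check that does not go through `sympy.stats`, here is a minimal sketch that integrates the Uniform(0, 1) density directly; `cond_prob_less` is a hypothetical helper, not part of SymPy.

```python
from sympy import Rational, Symbol, Min, integrate

u = Symbol('u', real=True)

def cond_prob_less(a, b):
    """Hypothetical helper: P(U < a | U < b) for U ~ Uniform(0, 1), with 0 < a, b <= 1."""
    # {U < a} and {U < b} together mean U < min(a, b); the density is 1 on [0, 1]
    p_both = integrate(1, (u, 0, Min(a, b)))
    p_b = integrate(1, (u, 0, b))
    return p_both / p_b

print(cond_prob_less(0.3, 0.5))
print(cond_prob_less(Rational(1, 3), Rational(1, 2)))
```

By hand these are $0.3/0.5 = 0.6$ and $(1/3)/(1/2) = 2/3$.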

3. These answers are wrong. The condition should be $y \le 0.5$, not $y \le 1$, and same for $u$.

Y = given(U, U<1/2)
density(Y)(y), density(U,U<1/2)(u)

... (2.0*Piecewise((1, y <= 1), (0, True)), 2.0*Piecewise((1, u <= 1), (0, True)))
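The correct conditional density is the Uniform density restricted to the event and divided by its probability. A minimal sketch in plain SymPy, using exact rationals to avoid the `2.0` floats that `1/2` produces in Python 3:

```python
from sympy import Rational, Symbol, Piecewise, And, integrate

u = Symbol('u', real=True)
half = Rational(1, 2)

# P(U < 1/2) for U ~ Uniform(0, 1): integrate the density (1 on [0, 1]) over [0, 1/2]
p_event = integrate(1, (u, 0, half))
# Conditional density: Uniform density restricted to [0, 1/2), renormalized by P(U < 1/2)
f_cond = Piecewise((1 / p_event, And(u >= 0, u < half)), (0, True))
print(f_cond)
```

This is 2 on $[0, 1/2)$ and 0 elsewhere, so the support ends at $1/2$, not at $1$.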

4. These are also wrong. Both should be 0.25.

E(U, U<1/2), E(given(U,U<1/2))

... NaN, NaN
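For comparison, the conditional mean computed directly from the definition $E[U \mid U < 1/2] = \int_0^{1/2} u \, f_U(u)\, du \,/\, P(U < 1/2)$, as a plain-SymPy sketch:

```python
from sympy import Rational, Symbol, integrate

u = Symbol('u', real=True)
half = Rational(1, 2)

# f_U = 1 on [0, 1], so the numerator is the integral of u over [0, 1/2]
numerator = integrate(u, (u, 0, half))
# Normalizing constant P(U < 1/2)
p_event = integrate(1, (u, 0, half))
cond_mean = numerator / p_event
print(cond_mean)
```

This gives $(1/8)/(1/2) = 1/4$, matching the expected 0.25.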

5. This gives a Python error. The correct answer is easy to calculate.

density(1/U)(u)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-f8cdc2170015> in <module>()
----> 1 density(1/U)(u)

/Users/boncelet/anaconda/lib/python3.6/site-packages/sympy/stats/rv.py in density(expr, condition, evaluate, numsamples, **kwargs)
    717                 **kwargs)
    718 
--> 719     return Density(expr, condition).doit(evaluate=evaluate, **kwargs)
    720 
    721 

/Users/boncelet/anaconda/lib/python3.6/site-packages/sympy/stats/rv.py in doit(self, evaluate, **kwargs)
    667             isinstance(pspace(expr), SinglePSpace)):
    668             return expr.pspace.distribution
--> 669         result = pspace(expr).compute_density(expr, **kwargs)
    670 
    671         if evaluate and hasattr(result, 'doit'):

/Users/boncelet/anaconda/lib/python3.6/site-packages/sympy/stats/crv.py in compute_density(self, expr, **kwargs)
    400             raise ValueError("Can not solve %s for %s"%(expr, self.value))
    401         fx = self.compute_density(self.value)
--> 402         fy = sum(fx(g) * abs(g.diff(y)) for g in gs)
    403         return Lambda(y, fy)
    404 

TypeError: 'Complement' object is not iterable
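Going by the last frame, `compute_density` appears to iterate over `gs`, the inverse images used in the change-of-variables sum, and here the solver returned a `Complement` set instead of an iterable of roots. The same formula can be applied by hand; a minimal sketch, where `f_U` and `g_inv` are illustrative names rather than SymPy API:

```python
from sympy import symbols, Piecewise, And, Abs, diff, Rational

v = symbols('v', positive=True)

def f_U(t):
    # Uniform(0, 1) density
    return Piecewise((1, And(t >= 0, t <= 1)), (0, True))

# V = 1/U is monotone on (0, 1], with inverse u = 1/v
g_inv = 1 / v
# Change of variables: f_V(v) = f_U(g_inv(v)) * |d g_inv / dv|
f_V = f_U(g_inv) * Abs(diff(g_inv, v))
print(f_V)
print(f_V.subs(v, Rational(1, 2)))  # 0, since 1/2 lies outside the support v >= 1
```

The result is $1/v^2$ for $v \ge 1$ and 0 otherwise.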

6. This gives the same Python error (error message deleted):

X = Exponential("x",1)
density(1/X)(x)
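The expected answer can be cross-checked with the CDF method instead of change of variables: for $W = 1/X$ with $X \sim \mathrm{Exponential}(1)$ and $w > 0$, $F_W(w) = P(X \ge 1/w)$, which is then differentiated. A sketch in plain SymPy, without `sympy.stats`:

```python
from sympy import symbols, exp, diff, integrate, oo, simplify

w, t = symbols('w t', positive=True)

# F_W(w) = P(1/X <= w) = P(X >= 1/w): integrate the Exponential(1) density over [1/w, oo)
F_W = integrate(exp(-t), (t, 1 / w, oo))
# Differentiate the CDF to get the density of W
f_W = diff(F_W, w)
print(simplify(f_W))
```

This yields $f_W(w) = e^{-1/w}/w^2$ for $w > 0$.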



SympyStatsProblems.ipynb

Francesco Bonazzi

Mar 19, 2017, 12:53:13 PM
to sympy
Point 1 is a limitation in SymPy (it is not currently implemented). I would have set it as a milestone for version 1.0, but the community had already decided to release version 1.0.

There are some known problems with the stats module. One that I know of: sometimes the arguments of the integration set are passed incorrectly to the integral function.

Unfortunately, since SymPy is an open-source project, it relies on people volunteering to add code to it, so fixing bugs has to wait until someone has time to do it.

bonc...@udel.edu

Mar 19, 2017, 4:37:00 PM
to sympy
I'm well aware SymPy is open source. My hope is that someone more familiar than I am with the Stats code might be able to fix these.

My other hope is that someone proposes a GSoC project on fixing the current SymPy faults, rather than implementing some new feature that few people need.

Charlie

Francesco Bonazzi

Mar 20, 2017, 7:11:05 AM
to sympy


On Sunday, 19 March 2017 21:37:00 UTC+1, bonc...@udel.edu wrote:
My other hope is someone proposes to do a GSOC project on fixing the current sympy faults, rather than implementing some new feature that few people need.

Do you feel like applying for a GSoC project?

Jason Moore

Mar 25, 2017, 5:49:21 PM
to sy...@googlegroups.com
It would be great if each of these could be opened as an issue on the GitHub repo. That allows us to better organize and track them. Also, feel free to create an idea on the GSoC wiki ideas page so that applicants will notice this.

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sympy+unsubscribe@googlegroups.com.
To post to this group, send email to sy...@googlegroups.com.
Visit this group at https://groups.google.com/group/sympy.
To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/013ca15d-f62d-4333-b9dc-9d63f5c9a0b7%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Vedarth Sharma

Mar 26, 2017, 4:55:52 AM
to sympy
I am applying for GSoC 2017 and haven't decided on a project yet. I have plenty of time and no other commitments. I admit I am not too familiar with SymPy, but I love fixing bugs. Can I use this as my GSoC proposal? Is anyone ready to mentor it?

Francesco Bonazzi

Mar 26, 2017, 9:18:55 AM
to sympy


On Sunday, 26 March 2017 10:55:52 UTC+2, Vedarth Sharma wrote:
I am applying for G-SoC 17 and haven't decided on a project yet. I have plenty of time and have no commitments. Although I admit I am not too familiar with sympy but I love fixing bugs. Can I use this as my GSoC proposal? Is someone ready to mentor it?

Hi, how much do you know about probability and statistics?

I think that fixing the bugs is either not enough for a GSoC project or unrelated to the stats module. If you want, we can think about expanding the capabilities of the stats module; I have some ideas in mind that could work.

Vedarth Sharma

Mar 26, 2017, 11:37:15 AM
to sympy
I was a competitive programmer before, so I know a few concepts and algorithms and their implementation. In terms of pure mathematics, my basics of probability and statistics are pretty strong; I have studied many higher-level topics for competitive programming. Please tell me your ideas.

Francesco Bonazzi

Mar 26, 2017, 2:43:18 PM
to sympy


On Sunday, 26 March 2017 17:37:15 UTC+2, Vedarth Sharma wrote:
Please tell me your ideas.

Some of my ideas include:

  • Managing expressions containing random symbols,
  • random matrices (Matthew Rocklin already wrote some stuff about this),
  • random indexed symbols (could be useful to represent stochastic processes symbolically),
  • investigate how to deal with dependent random variables (how to define them, how to calculate the probabilities),
  • support for hyperparameters in statistical distributions.
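On the dependent-variables point: `sympy.stats` already produces dependent variables when expressions share a random symbol. A minimal sketch, assuming a recent SymPy; `X`, `Z` and `Y` are illustrative names:

```python
from sympy.stats import Normal, covariance

X = Normal('X', 0, 1)
Z = Normal('Z', 0, 1)  # independent of X
Y = X + Z              # depends on X through the shared random symbol X

# A nonzero covariance shows that X and Y are dependent
print(covariance(X, Y))
```

Since $\mathrm{Cov}(X, X + Z) = \mathrm{Var}(X) + \mathrm{Cov}(X, Z) = 1 + 0$, this should print 1. Defining dependence directly, without a shared underlying symbol, is the part that would need new design.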

Vedarth Sharma

Mar 26, 2017, 6:18:10 PM
to sympy
@Francesco you are a life saver. I am very interested in random matrices. Do you think I can propose a project on it? Where can I learn more about it? Has any work been done related to this? Please guide me on how to start.



Francesco Bonazzi

Mar 27, 2017, 4:32:47 AM
to sympy


On Monday, 27 March 2017 00:18:10 UTC+2, Vedarth Sharma wrote:
@Francesco you are a life saver. I am very interested in Random Matrices. Do you think I can propose a project on it? Please tell me where can I learn more about it? Has any work been done related to this? Please guide me how to start.



https://sympystats.wordpress.com/2011/07/19/multivariate-normal-random-variables/

This was a working project 6 years ago. It has not been merged. I think it would not be enough for a GSoC project, as most of the work has been done.

I suggest applying for multiple improvements to the stats module.

Vedarth Sharma

Mar 27, 2017, 4:51:38 AM
to sympy
What are the multiple improvements you are talking about? Fixing the bugs? I want to do a project that will make a real, noticeable and positive impact on the organization. Please guide me. On the ideas page I was interested in several topics, so I wasn't able to decide which one to choose. But now the deadline is very near, and I need to submit the proposal to you for feedback as well...
Can I submit proposals on multiple ideas?
Will it be okay?

Francesco Bonazzi

Mar 27, 2017, 7:28:00 AM
to sympy

On Monday, 27 March 2017 10:51:38 UTC+2, Vedarth Sharma wrote:
What are the multiple improvements you are talking about like fixing the bugs? I am willing to do a project that is going to make a real, noticeable and positive impact to the organization. Please guide me. In ideas page i had interest in multiple topics therefore i wasn't able to decide which one to choose. But now deadline is very near. I need to submit the proposal to you guys for feedback as well...

Starting with this problem here could be a nice project:
http://stackoverflow.com/questions/32443970/conditional-probability-with-sympy

This guy wants to use a random variable as a parameter for another random variable (normal distribution). Obviously the resulting distribution is not normal. We could add support for this kind of computation.
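As a sketch of what such support would have to compute: the distribution of $Y$ is the compound (marginal) density $f_Y(y) = \int f_{Y \mid M}(y \mid m)\, f_M(m)\, dm$. The choice $Y \mid M = m \sim \mathrm{Normal}(m, 1)$ with $M \sim \mathrm{Uniform}(0, 1)$ below is an illustrative assumption, not the setup from the Stack Overflow question, and the integral is done in plain SymPy rather than through `sympy.stats`:

```python
from sympy import symbols, exp, sqrt, pi, integrate

y, m = symbols('y m', real=True)

# Illustrative assumption: Y | M = m ~ Normal(m, 1) and M ~ Uniform(0, 1)
f_y_given_m = exp(-(y - m)**2 / 2) / sqrt(2 * pi)
f_m = 1  # Uniform(0, 1) density on [0, 1]

# Marginal (compound) density: integrate the parameter out over its support
f_y = integrate(f_y_given_m * f_m, (m, 0, 1))
print(f_y)
```

The result equals $\Phi(y) - \Phi(y - 1)$, written with `erf`, where $\Phi$ is the standard normal CDF; this is not a normal density.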

I'd also like to add support for indexed random variables, but that's harder. A naive way to proceed would be to allow IndexedBase to accept a random variable/expression as its base, and then connect it with an API to express randomness.

Can i submit proposals on multiple ideas?

I don't know the rules, but I believe you're supposed to submit once.

szymon.m...@gmail.com

Mar 27, 2017, 9:19:49 AM
to sympy

Can i submit proposals on multiple ideas?

I don't know the rules, but I believe you're supposed to submit once.

Hi Vedarth,
You can create at most five proposals, but frankly, I don't know whether you can send more than one to a single organization. I've read a lot of SymPy material about GSoC, but found no information about that.

Vedarth Sharma

Mar 27, 2017, 12:23:10 PM
to sympy
I know about the 5-proposal limit, but I don't know the conditions, i.e. whether they all have to be for different organizations.

Aaron Meurer

Mar 27, 2017, 1:58:24 PM
to sy...@googlegroups.com
You can submit more than one proposal to SymPy if you want. However, I
recommend focusing on quality over quantity. Also be aware that if we
decide to accept you, it is our discretion as to which proposal to
accept.

Aaron Meurer

On Mon, Mar 27, 2017 at 12:23 PM, Vedarth Sharma
<vedarth...@gmail.com> wrote:
> I know about the 5 proposal limit. But i don't know about the conditions i.e. if all of them are supposed to be for different organizations.

Vedarth Sharma

Mar 27, 2017, 2:17:52 PM
to sympy
@Aaron I was so overwhelmed when I saw so many ideas that I wasn't able to decide which one to choose. How about this: I'm going to write my proposals, give them to you to review, and then submit the best one based on your feedback. I think that will be the best option.

Vedarth Sharma

Mar 27, 2017, 2:20:30 PM
to sympy
@Francesco Can you please tell me how we can add support for computations where the resulting distribution is not normal? Thanks a lot for the idea :)

Francesco Bonazzi

Mar 27, 2017, 3:19:17 PM
to sympy


On Monday, 27 March 2017 20:20:30 UTC+2, Vedarth Sharma wrote:
@Francesco Can you please tell me how can we add support for computations where resulting distribution is not normal? Thanks a lot for the idea :)

 
Do you mean the example on the link?

http://stackoverflow.com/questions/32443970/conditional-probability-with-sympy

In such a case we could define the new density and create a new random variable. There are tools in sympy.stats for this.
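One such tool is `ContinuousRV`, which builds a random variable from a user-supplied density and support. A minimal sketch, assuming the density has already been derived by hand; the density $1/v^2$ on $[1, \infty)$ used here is the one for $V = 1/U$ with $U \sim \mathrm{Uniform}(0, 1)$, chosen only as an illustration:

```python
from sympy import Symbol, Interval, oo
from sympy.stats import ContinuousRV, P

v = Symbol('v', positive=True)

# Wrap a hand-derived density and its support in a new random variable
V = ContinuousRV(v, 1 / v**2, set=Interval(1, oo))

# Probabilities are then computed from the supplied density
print(P(V > 2))
```

Since $\int_2^\infty v^{-2}\,dv = 1/2$, the probability should come out as 1/2.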

Vedarth Sharma

Mar 27, 2017, 4:14:02 PM
to sympy
Actually, how should I begin my proposal, and what should it cover?

Francesco Bonazzi

Mar 27, 2017, 4:22:59 PM
to sympy


On Monday, 27 March 2017 22:14:02 UTC+2, Vedarth Sharma wrote:
Actually how should I begin my proposal and what all things to cover?

Just write a draft, we can then review it.

By the way, what is your knowledge level of statistics and probability? Can you give us some more details?

Vedarth Sharma

Mar 28, 2017, 8:44:31 AM
to sympy
Well, I know the basic stuff like calculating probability using set relations, the binomial distribution, the Poisson distribution, the law of independent events, etc. I have implemented some algorithms involving these basic topics in competitive programming. In statistics I know graphs, histograms, mean, median, mode, standard deviation, variance, etc. Again, my basics are strong, so I am good at what I know, and I learn new implementations and concepts pretty quickly. I have implemented some algorithms involving these topics as well, but they were not as complex as the probability ones.

Akash Vaish

Mar 14, 2018, 5:11:18 PM
to sympy
Hey. I have prepared a proposal for improving the probability module for GSoC 2018. The ideas I have taken up are the ones mentioned on the ideas page for 2018, and I was wondering if you would be mentoring the project this year. Also, is someone else going to be a co-mentor for the project as well?