Brief Intro and Poisson sampling

57 views
Skip to first unread message

christophere...@gmail.com

unread,
Jan 31, 2018, 8:05:53 AM1/31/18
to sympy
Hi all,

I'm Chris, British grad student at MIT. Circa 2 years experience in Python but that is mainly in small scripts written for research purposes - am looking to get some experience writing production level code. Have an advanced PhD level understanding of physics/maths so comfortable creating algorithms but a newcomer to sympy and open source contributions - therefore, the guidance I'm looking for will mainly be in moving my code into the "real world" (doctests/docstrings/workflow/improving code readability etc). Happy to work on almost any problems and big fan of diversity so any collaboration requests would be welcome. 

I've dived right into the poisson sampling problem - https://github.com/sympy/sympy/pull/13943. I have a working algorithm, similar to what was proposed by Leonid Kovalev. The algorithm finds an upper bound on the cdf then uses a bisection to refine the root. It works well for high values of lambda - the upper bound calculation is very low cost and bisection is O(log2(lambda)). 

It's only limitation comes from the maximum recursion depth for high lambda cdf calculations which I think might be an issue considering in itself. I am getting extremely variable behaviour in computing the cdf for high values of lambda.
I'm not sure if anyone else who is more knowledgable in this area can suggest why this might be? The behaviour I'm observing is that calculating the cdf for values of lambda greater than ~200 gives a maximum recursion depth error. However, 
if the cdf is first calculated for all the lower values of lambda (say 100,120,140,160 etc) during debugging then it has no problem evaluating the cdf up to 200 and even higher (which is what is done in the previous algorithm).
I'm assuming these values are temporarily stored in memory somewhere and so it requires less recursive calls for these higher values but I'm slightly at a loss as to how and why it is doing this. Any pointers appreciated. 
Possible solutions/workarounds could be using normal approximations (should be a good approximation above say lambda = 100, particularly considering the discrete nature of the sampling we're doing) or perhaps writing a new function
specifically for the poisson cdf as it doesn't have a (particularly useful) closed form. 

As a side note - another very useful algorithm for sampling of random distributions is the rejection sampling method - https://en.wikipedia.org/wiki/Rejection_sampling. It does require bounds being placed on the values
so is not suitable for every situation but it is fast and easy to implement. Perhaps, worth considering for method overloading in situations where there is a difficult cdf to invert and the user can provide bounds? Open 
to suggestions but just putting the idea out there. 

Looking forward to working with you all.
Cheers,
Chris 


Leonid Kovalev

unread,
Jan 31, 2018, 11:31:34 AM1/31/18
to sympy
The recursion issue was caused by the evaluation of incomplete gamma functions being recursive. That was changed in PR #14021 which was merged two days ago. Try to update the branch you are working with (the PR changes another module, so there should be no merge conflicts.)

Somewhat related: the current approach to Poisson CDF is to compute it by summing the PDF (symbolically), which results in a decent formula in terms of lower incomplete gamma function. However that formula has a couple of flaws: (1) it does not take the floor of x, which leads to wrong output for fractional x.
>>> cdf(Poisson('x', 5))(0).n()
0.00673794699908547
>>> cdf(Poisson('x', 5))(0.9).n()
0.0350993891118802

(2) It would be simpler and better for numerics to use the upper incomplete gamma function. 

So I suggest adding the Poisson CDF formula from Wikipedia and adding it as an internal _cdf method to the PoissonDistributionClass. This was recently done for many continuous distributions (Example), but not for discrete ones.  


Reply all
Reply to author
Forward
0 new messages