OT: Fun - Evolution of a Python programmer.py


josef...@gmail.com

Jun 12, 2015, 11:44:46 PM
to pystatsmodels
"Evolution of a Python programmer.py"
https://gist.github.com/fmeyer/289467


AFAICS nobody is using scipy.special.gamma directly (scipy.factorial does, so it also works for reals and is fast for large numbers)
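For example, factorials via the gamma function (a quick sketch):

import math
from scipy import special

print(math.factorial(10))       # 3628800, exact, integers only
print(special.gamma(10 + 1))    # 3628800.0, since gamma(n + 1) == n!
print(special.gamma(10.5 + 1))  # also works for non-integer arguments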


:)

Josef
more OT
(still going strong at 85, and I just saw that he died about a month ago.)
(Friday blues night on youtube :)

Javier Burroni

Jun 22, 2015, 9:28:07 PM
to pystatsmodels
(regarding gamma)
You also have gammaln, which is even better for large numbers.
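A minimal illustration of why (sketch):

from scipy import special

print(special.gamma(200 + 1))    # inf: 200! overflows float64
print(special.gammaln(200 + 1))  # ~863.2: log(200!) stays representable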
--
" To be is to do " ( Socrates )
" To be or not to be " ( Shakespeare )
" To do is to be " ( Sartre )
" Do be do be do " ( Sinatra )

Javier Burroni

Jun 22, 2015, 9:36:23 PM
to pystatsmodels
It's funny, but I'm giving a course on Python for Data Science, and today I implemented the sf for the hypergeometric distribution (for Fisher's exact test) using gammaln:

import numpy as np
from scipy import special

log_fact = lambda N: special.gammaln(N + 1)  # log(N!) via the log-gamma function

def log_binom(N, n):
    # log of the binomial coefficient C(N, n)
    return log_fact(N) - (log_fact(n) + log_fact(N - n))

def log_hyper_pmf(k, N, K, n):
    # log pmf of the hypergeometric distribution:
    # k successes in n draws from a population of size N containing K successes
    return log_binom(K, k) + log_binom(N - K, n - k) - log_binom(N, n)

def log_hyper_sf(k, N, K, n):
    # log pmf values for k+1, k+2, ... built from the recurrence
    # pmf(j+1)/pmf(j) = (K-j)(n-j) / ((j+1)(N-K-n+j+1))
    ks = np.arange(k, K)
    ks = ks.astype(float)
    factors = np.log(K - ks) + np.log(n - ks) - np.log((ks + 1) * (N - K - (n - ks - 1)))
    log_p = log_hyper_pmf(k, N, K, n)
    return log_p + factors.cumsum()

N = 1308172 + 2442374  # population size
n = 1308172            # number of draws
K = 1306970            # number of successes in the population
k = int(float(n) / N * K)
b = log_hyper_sf(k, N, K, n)
np.exp(b).sum()        # sf(k) = P(X > k)


This is not the best way to do it, but it was useful for showing off a lot of numpy/scipy features.
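A quick sanity check against scipy, assuming the variables above (note that scipy.stats.hypergeom uses the (M, n, N) parametrization, M population size, n successes in the population, N draws, which maps to (N, K, n) in the notation here):

from scipy import stats

# sf(k) = P(X > k); the two should agree up to floating-point noise
# whenever the sf does not underflow
print(stats.hypergeom.sf(k, N, K, n))
print(np.exp(log_hyper_sf(k, N, K, n)).sum())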

josef...@gmail.com

Jun 22, 2015, 9:48:20 PM
to pystatsmodels

Interesting. Is it correct?

It's funny: someone opened an issue today for hypergeom.logsf

(numerical issues calculating logsf in the tail for large arguments, when the sf is almost zero)
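Roughly the failure mode (a sketch with hypothetical numbers; at the time, the generic discrete logsf fell back to log(sf(k))):

import numpy as np
from scipy import stats

# Deep in the upper tail, sf(k) evaluates to 0.0 in float64, so log(sf(k))
# becomes -inf even though the true log-probability is finite.
M, n_succ, N_draws = 3750546, 1306970, 1308172  # thread's example numbers
k = 500000  # hypothetical point far above the mean of ~455900
sf = stats.hypergeom.sf(k, M, n_succ, N_draws)
print(sf, np.log(sf))  # 0.0 -inf (log emits a divide-by-zero warning)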

Josef

Javier Burroni

Jun 23, 2015, 9:14:17 AM
to pystatsmodels
That is funny!
It works, but it has the same issue as the current implementation: I compute the log probabilities for the tail using the recurrence relation, then exponentiate all the values and sum them up, so the log is lost.
The code mentioned in the comment (http://fredrikj.net/blog/2014/06/easy-hypergeometric-series-at-unity/) is the right way to do it.
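For reference, a minimal sketch of keeping the sum in log space with logsumexp instead (reusing log_hyper_pmf from the earlier message; logsumexp lived in scipy.misc in 2015-era scipy and is scipy.special.logsumexp today):

import numpy as np
from scipy.special import logsumexp  # scipy.misc.logsumexp in older scipy

def log_hyper_logsf(k, N, K, n):
    # same recurrence as before, but combine the terms with logsumexp so
    # the result stays in log space instead of going through exp() and sum()
    ks = np.arange(k, min(K, n), dtype=float)  # upper limit per the fix below
    factors = np.log(K - ks) + np.log(n - ks) - np.log((ks + 1) * (N - K - (n - ks - 1)))
    return logsumexp(log_hyper_pmf(k, N, K, n) + factors.cumsum())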

(btw, I've found a little bug computing the array: it should be ks = np.arange(k, min(K, n)) instead of ks = np.arange(k, K).)
best regards
jb

josef...@gmail.com

Jun 23, 2015, 9:41:34 AM
to pystatsmodels

OK, I thought you might be doing better because you calculate in logs; I didn't try to understand what the last sum of exps does.

Aside: I haven't found an application yet where I care about whether a probability is 1e-100 or 1e-500

Josef