OT: Fun - Evolution of a Python programmer.py


josef...@gmail.com

Jun 12, 2015, 11:44:46 PM
to pystatsmodels
"Evolution of a Python programmer.py"
https://gist.github.com/fmeyer/289467


AFAICS nobody is using scipy.special.gamma directly (scipy.factorial does, so it also works for reals and is fast for large numbers)
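For example, factorials via the gamma function (a quick sketch):

import math
from scipy import special

print(math.factorial(10))       # 3628800, exact, integers only
print(special.gamma(10 + 1))    # 3628800.0, since gamma(n + 1) == n!
print(special.gamma(10.5 + 1))  # also works for non-integer arguments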


:)

Josef
more OT
(still going strong at 85, and I just saw that he died about a month ago.)
(Friday blues night on youtube :)

Javier Burroni

Jun 22, 2015, 9:28:07 PM
to pystatsmodels
(regarding gamma)
You also have gammaln, which is even better for large numbers.
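A minimal illustration of why (sketch):

from scipy import special

print(special.gamma(200 + 1))    # inf: 200! overflows float64
print(special.gammaln(200 + 1))  # ~863.2: log(200!) stays representable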
--
" To be is to do " ( Socrates )
" To be or not to be " ( Shakespeare )
" To do is to be " ( Sartre )
" Do be do be do " ( Sinatra )

Javier Burroni

Jun 22, 2015, 9:36:23 PM
to pystatsmodels
It's funny, but I'm giving a course on Python for Data Science, and today I implemented the sf for the hypergeometric distribution (for Fisher's exact test) using gammaln:

import numpy as np
from scipy import special

log_fact = lambda N: special.gammaln(N + 1)  # log(N!) via the log-gamma function

def log_binom(N, n):
    # log of the binomial coefficient C(N, n)
    return log_fact(N) - (log_fact(n) + log_fact(N - n))

def log_hyper_pmf(k, N, K, n):
    # log pmf of the hypergeometric distribution:
    # k successes in n draws from a population of size N containing K successes
    return log_binom(K, k) + log_binom(N - K, n - k) - log_binom(N, n)

def log_hyper_sf(k, N, K, n):
    # log pmf values for k+1, k+2, ... built from the recurrence
    # pmf(j+1)/pmf(j) = (K-j)(n-j) / ((j+1)(N-K-n+j+1))
    ks = np.arange(k, K)
    ks = ks.astype(float)
    factors = np.log(K - ks) + np.log(n - ks) - np.log((ks + 1) * (N - K - (n - ks - 1)))
    log_p = log_hyper_pmf(k, N, K, n)
    return log_p + factors.cumsum()

N = 1308172 + 2442374  # population size
n = 1308172            # number of draws
K = 1306970            # number of successes in the population
k = int(float(n) / N * K)
b = log_hyper_sf(k, N, K, n)
np.exp(b).sum()        # sf(k) = P(X > k)


This is not the best way to do it, but it was useful for showing off a lot of numpy/scipy features.
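A quick sanity check against scipy, assuming the variables above (note that scipy.stats.hypergeom uses the (M, n, N) parametrization, M population size, n successes in the population, N draws, which maps to (N, K, n) in the notation here):

from scipy import stats

# sf(k) = P(X > k); the two should agree up to floating-point noise
# whenever the sf does not underflow
print(stats.hypergeom.sf(k, N, K, n))
print(np.exp(log_hyper_sf(k, N, K, n)).sum())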

josef...@gmail.com

Jun 22, 2015, 9:48:20 PM
to pystatsmodels

Interesting. Is it correct?

It's funny: someone opened an issue today for hypergeom.logsf

(numerical issues calculating logsf in the tail for large arguments, when the sf is almost zero)
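Roughly the failure mode (a sketch with hypothetical numbers; at the time, the generic discrete logsf fell back to log(sf(k))):

import numpy as np
from scipy import stats

# Deep in the upper tail, sf(k) evaluates to 0.0 in float64, so log(sf(k))
# becomes -inf even though the true log-probability is finite.
M, n_succ, N_draws = 3750546, 1306970, 1308172  # thread's example numbers
k = 500000  # hypothetical point far above the mean of ~455900
sf = stats.hypergeom.sf(k, M, n_succ, N_draws)
print(sf, np.log(sf))  # 0.0 -inf (log emits a divide-by-zero warning)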

Josef

Javier Burroni

Jun 23, 2015, 9:14:17 AM
to pystatsmodels
That is funny!
It works, but it has the same issue as the current implementation: I compute the log probabilities for the tail using the recurrence relation, then exponentiate all the values and sum them up, so the log is lost.
The code mentioned in the comment (http://fredrikj.net/blog/2014/06/easy-hypergeometric-series-at-unity/) is the right way to do it.
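For reference, a minimal sketch of keeping the sum in log space with logsumexp instead (reusing log_hyper_pmf from the earlier message; logsumexp lived in scipy.misc in 2015-era scipy and is scipy.special.logsumexp today):

import numpy as np
from scipy.special import logsumexp  # scipy.misc.logsumexp in older scipy

def log_hyper_logsf(k, N, K, n):
    # same recurrence as before, but combine the terms with logsumexp so
    # the result stays in log space instead of going through exp() and sum()
    ks = np.arange(k, min(K, n), dtype=float)  # upper limit per the fix below
    factors = np.log(K - ks) + np.log(n - ks) - np.log((ks + 1) * (N - K - (n - ks - 1)))
    return logsumexp(log_hyper_pmf(k, N, K, n) + factors.cumsum())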

(btw, I've found a little bug computing the array: it should be ks = np.arange(k, min(K, n)) instead of ks = np.arange(k, K).)
best regards
jb

josef...@gmail.com

Jun 23, 2015, 9:41:34 AM
to pystatsmodels

OK, I thought you might be doing better because you calculate in logs; I didn't try to understand what the last sum of exps does.

Aside: I haven't found an application yet where I care about whether a probability is 1e-100 or 1e-500

Josef