Funnily enough, I'm giving a course on Python for Data Science, and today I implemented the sf (survival function) of the hypergeometric distribution (for Fisher's exact test) using gammaln:
import numpy as np
from scipy.special import gammaln, logsumexp

def log_fact(n):
    # log(n!) = log Gamma(n + 1)
    return gammaln(n + 1)

def log_binom(N, n):
    # log of the binomial coefficient C(N, n)
    return log_fact(N) - (log_fact(n) + log_fact(N - n))

def log_hyper_pmf(k, N, K, n):
    # log pmf of the hypergeometric distribution
    return log_binom(K, k) + log_binom(N - K, n - k) - log_binom(N, n)

def log_hyper_sf(k, N, K, n):
    # log P(X > k), walking the tail with the ratio
    # pmf(j+1)/pmf(j) = (K-j)(n-j) / ((j+1)(N-K-n+j+1))
    # (assumes K <= n, as below; otherwise stop at min(K, n))
    ks = np.arange(k, K).astype(float)
    factors = np.log(K - ks) + np.log(n - ks) - np.log((ks + 1) * (N - K - (n - ks - 1)))
    log_p = log_hyper_pmf(k, N, K, n)
    # log_p + factors.cumsum() are the log pmfs at k+1, ..., K: sum in log space
    return logsumexp(log_p + factors.cumsum())

N = 1308172 + 2442374
n = 1308172
K = 1306970
k = int(float(n) / N * K)
b = log_hyper_sf(k, N, K, n)
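As a sanity check (mine, not part of the course code), the same pmf-ratio recurrence can be compared against scipy.stats.hypergeom.logsf on small, hand-checkable parameters; note that scipy's argument order for the hypergeometric is (k, M, n, N):

```python
import numpy as np
from scipy.stats import hypergeom
from scipy.special import logsumexp

# Small parameters chosen for this check (with K <= n, as the recurrence assumes)
N, K, n, k = 50, 8, 20, 3

# pmf(j+1)/pmf(j) = (K-j)(n-j) / ((j+1)(N-K-n+j+1)), for j = k, ..., K-1
ks = np.arange(k, K, dtype=float)
factors = np.log(K - ks) + np.log(n - ks) - np.log((ks + 1) * (N - K - (n - ks - 1)))

# log P(X > k) as a log-space sum of the pmfs at k+1, ..., K
log_sf = logsumexp(hypergeom.logpmf(k, N, K, n) + factors.cumsum())

print(np.isclose(log_sf, hypergeom.logsf(k, N, K, n)))  # True
```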
My implementation is not the best way to do it, but it was useful for showing off a lot of NumPy/SciPy features.
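The point of going through gammaln at all (my illustration, not from the course material): for counts in the millions, n! overflows a double almost immediately, while log(n!) stays small:

```python
import numpy as np
from scipy.special import gammaln, factorial

print(factorial(200))  # inf: 200! exceeds ~1.8e308, the largest double
print(gammaln(201))    # log(200!) ~= 863.23, perfectly representable
```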