On Fri, Sep 30, 2016 at 8:25 PM, Stuart Reynolds
<
stu...@stuartreynolds.net> wrote:
> Here's a good (Bayesian) way of thinking about it:
>
>
> http://stats.stackexchange.com/questions/13225/what-is-the-distribution-of-the-binomial-distribution-parameter-p-given-a-samp
>
> http://www.sumsar.net/blog/2014/01/bayesian-first-aid-binomial-test/
> (which references an R package)
>
> It (I think reasonably) requires specifying a prior distribution for
> prob(p).
> If we assume, before observing anything, that all p's are equally likely,
> then after seeing k=3, n=3, you'll get prob(p)>0 for all p, not p=k/n
> (which seems very unlikely in the world).
> From this point of view, we should see prob(p)=1 or prob(p)=0, or
> confidence intervals with width 0, only with very unreasonable priors.
>
> Would be nice to have functions to measure:
> prob(p > x | n, k, prior)
> prob(p < x | n, k, prior)
> prob(p in [lower, upper] | n, k, prior)
> ci(n, k, prior)
>
> ... for the binomial.
>
> It's a little surprising that libraries for this are so hard to find. It
> seems essential for generating confidence bounds on classification algorithm
> performance with low n.
There are many Bayesian packages in Python, so there is no real need
for statsmodels to cover this, with some possible exceptions. (We are
also introducing "priors" through penalization, but we still go for
inference based on the sampling distribution.) (*)
The beta is the natural conjugate prior for the binomial, so it is
relatively easy to work with the beta-binomial pair.
However, if your prior is uniform on [0, 1], then you should be a bit
surprised to see (3, 3) since half of your prior probability is below
0.5.
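The requested functions are easy to sketch with the beta-binomial pair. A minimal sketch, assuming a Beta(a, b) prior (Beta(1, 1) is uniform); the function names are my own, not from any package:

```python
import numpy as np
from scipy import stats

def _posterior(n, k, prior=(1, 1)):
    # Conjugate update: a Beta(a, b) prior plus k successes in n trials
    # gives a Beta(a + k, b + n - k) posterior.
    a, b = prior
    return stats.beta(a + k, b + n - k)

def prob_p_greater(x, n, k, prior=(1, 1)):
    # prob(p > x | n, k, prior)
    return _posterior(n, k, prior).sf(x)

def prob_p_less(x, n, k, prior=(1, 1)):
    # prob(p < x | n, k, prior)
    return _posterior(n, k, prior).cdf(x)

def prob_p_in(lower, upper, n, k, prior=(1, 1)):
    # prob(p in [lower, upper] | n, k, prior)
    post = _posterior(n, k, prior)
    return post.cdf(upper) - post.cdf(lower)

def ci(n, k, alpha=0.05, prior=(1, 1)):
    # Equal-tailed (1 - alpha) credible interval.
    post = _posterior(n, k, prior)
    return post.ppf(alpha / 2), post.ppf(1 - alpha / 2)
```

With the uniform prior and k=3, n=3, for example, prob_p_greater(0.5, 3, 3) is 1 - 0.5**4 = 0.9375, matching the sf numbers below.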
(IIRC, using scipy.stats:

prior

>>> import numpy as np
>>> from scipy import stats
>>> stats.beta.pdf(np.linspace(0, 1, 11), 1, 1)
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

posterior sf after seeing k=3, n=3

>>> stats.beta.sf(np.linspace(0, 1, 11), 4, 1)
array([ 1.    ,  0.9999,  0.9984,  0.9919,  0.9744,  0.9375,  0.8704,
        0.7599,  0.5904,  0.3439,  0.    ])

>>> stats.beta.pdf(np.linspace(0, 1, 11), 4, 1)
array([ 0.   ,  0.004,  0.032,  0.108,  0.256,  0.5  ,  0.864,  1.372,
        2.048,  2.916,  4.   ])

lower bound of a one-sided 90% credible (posterior) interval
(isf inverts sf, which works because the cdf is monotonic)

>>> stats.beta.isf(0.9, 4, 1)
0.56234132519034907

or something like this
)
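For an equal-tailed two-sided interval instead of the one-sided bound, ppf works the same way (again Beta(4, 1), i.e. uniform prior after seeing 3 of 3):

```python
from scipy import stats

# Posterior after k=3 successes in n=3 trials, uniform Beta(1, 1) prior.
post = stats.beta(4, 1)
# Equal-tailed 90% credible interval: cut 5% off each tail.
lower, upper = post.ppf([0.05, 0.95])
print(lower, upper)  # roughly (0.473, 0.987)
```

Because the Beta(4, 1) cdf is simply x**4, these are just 0.05**0.25 and 0.95**0.25.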
(*) I thought it would be useful to have conjugate prior and sampling
distributions somewhere, given that they don't need the heavy MCMC
machinery, but it never looked exciting enough to me to move it higher
up my own priorities. And for everything except the simple cases the
heavy Bayesian computational machinery is necessary.
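As an illustration of how little machinery the simple conjugate case needs: the posterior predictive of the binomial under a beta prior is available in closed form (scipy.stats.betabinom, added in scipy 1.4). With a uniform prior and 3 successes in 3 trials it reproduces Laplace's rule of succession, (k + 1) / (n + 2) = 0.8, with no sampling at all:

```python
from scipy import stats

# Posterior after k=3, n=3 with a uniform Beta(1, 1) prior is Beta(4, 1).
# The predictive probability that the next single trial succeeds is the
# beta-binomial pmf with n=1, which reduces to a / (a + b) = 4 / 5.
p_next = stats.betabinom(1, 4, 1).pmf(1)
print(p_next)  # ~0.8
```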
Also, there are 37 open issues for classical proportions
https://github.com/statsmodels/statsmodels/search?q=proportion&state=open&type=Issues&utf8=%E2%9C%93
Josef