power tests for proportions?


Skipper Seabold

Nov 11, 2013, 9:58:24 AM
to pystat...@googlegroups.com
Do we have this available yet in stats.power.py? It's not obvious to
me yet. I.e., what's the sample size necessary to be able to detect
going from a .1 success rate in a study to .11 (a 10% increase) given
an alpha of .05 and a desired power of .8.

Also are there any references for this? It looks like Stata got power
and sample size functions in Stata 13, but I don't have consistent
access to this.

R also has, for example, power.prop.test.

Skipper

josef...@gmail.com

Nov 11, 2013, 10:13:19 AM
to pystatsmodels
http://stackoverflow.com/questions/15204070/is-there-a-python-scipy-function-to-determine-parameters-needed-to-obtain-a-ta/18379559#18379559

http://jpktd.blogspot.ca/2013/03/statistical-power-in-statsmodels.html
section on "Tests for Proportions"

Difficult to find because it doesn't have a convenience wrapper.

Since I haven't gotten any feedback on usability, I didn't work very
hard on the user interface.

I don't see the power for binom_test; I thought it was there, but I
spent more time on TOST:
http://statsmodels.sourceforge.net/devel/stats.html#proportion

Josef





Skipper Seabold

Nov 11, 2013, 10:15:46 AM
to pystat...@googlegroups.com
On Mon, Nov 11, 2013 at 3:13 PM, <josef...@gmail.com> wrote:
> On Mon, Nov 11, 2013 at 9:58 AM, Skipper Seabold <jsse...@gmail.com> wrote:
>>
>> Do we have this available yet in stats.power.py? It's not obvious to
>> me yet. I.e., what's the sample size necessary to be able to detect
>> going from a .1 success rate in a study to .11 (a 10% increase) given
>> an alpha of .05 and a desired power of .8.
>>
>> Also are there any references for this? It looks like Stata got power
>> and sample size functions in Stata 13, but I don't have consistent
>> access to this.
>>
>> R also has, for example, power.prop.test.
>
>
> http://stackoverflow.com/questions/15204070/is-there-a-python-scipy-function-to-determine-parameters-needed-to-obtain-a-ta/18379559#18379559
>
> http://jpktd.blogspot.ca/2013/03/statistical-power-in-statsmodels.html
> section on "Tests for Proportions"
>
> Difficult to find because it doesn't have a convenience wrapper.
>
> Since I haven't gotten any feedback on usability, I didn't work very
> hard on the user interface.

I'm looking at possible improvements and more consistent docs.

>
> I don't see the power for binom_test, I thought it's there, but I
> spend more time on TOST,
> http://statsmodels.sourceforge.net/devel/stats.html#proportion
>

Thanks,

Skipper

josef...@gmail.com

Nov 11, 2013, 10:38:26 AM
to pystatsmodels
On Mon, Nov 11, 2013 at 10:15 AM, Skipper Seabold <jsse...@gmail.com> wrote:
> On Mon, Nov 11, 2013 at 3:13 PM, <josef...@gmail.com> wrote:
>> On Mon, Nov 11, 2013 at 9:58 AM, Skipper Seabold <jsse...@gmail.com> wrote:
>>>
>>> Do we have this available yet in stats.power.py? It's not obvious to
>>> me yet. I.e., what's the sample size necessary to be able to detect
>>> going from a .1 success rate in a study to .11 (a 10% increase) given
>>> an alpha of .05 and a desired power of .8.
>>>
>>> Also are there any references for this? It looks like Stata got power
>>> and sample size functions in Stata 13, but I don't have consistent
>>> access to this.
>>>
>>> R also has, for example, power.prop.test.
>>
>>
>> http://stackoverflow.com/questions/15204070/is-there-a-python-scipy-function-to-determine-parameters-needed-to-obtain-a-ta/18379559#18379559
>>
>> http://jpktd.blogspot.ca/2013/03/statistical-power-in-statsmodels.html
>> section on "Tests for Proportions"
>>
>> Difficult to find because it doesn't have a convenience wrapper.
>>
>> Since I haven't gotten any feedback on usability, I didn't work very
>> hard on the user interface.
>
> I'm looking at possible improvements and more consistent docs.

Also, a wishlist of useful wrappers for specific hypothesis tests
would be helpful.

http://statsmodels.sourceforge.net/devel/stats.html#power-and-sample-size-calculations
implements normal, t, F and chisquare power calculations, which covers
most of the hypothesis tests.

But the docs don't tell you to use NormalIndPower for the test of proportions.

Josef

Skipper Seabold

Nov 11, 2013, 11:41:03 AM
to pystat...@googlegroups.com
Something's not right vs. R. I have a function that replicates their
solution, but I haven't looked into the details of yours yet.


In [59]: es = sm.stats.proportion_effectsize(.1, .11)

In [60]: from statsmodels.stats import power

In [61]: power.NormalIndPower().solve_power(es, power=.8, alpha=.05)
Out[61]: 14744.104836925611


R

power.prop.test(p1=.1, p2=.11, power=.8)

Two-sample comparison of proportions power calculation

n = 14751
p1 = 0.1
p2 = 0.11
sig.level = 0.05
power = 0.8
alternative = two.sided

Skipper Seabold

Nov 11, 2013, 11:54:48 AM
to pystat...@googlegroups.com
Stata 11 has sampsi:

. sampsi .1 .11, power(.8)

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
and p2 is the proportion in population 2
Assumptions:

alpha = 0.0500 (two-sided)
power = 0.8000
p1 = 0.1000
p2 = 0.1100
n2/n1 = 1.00

Estimated required sample sizes:

n1 = 14951
n2 = 14951

. sampsi .1 .11, power(.8) nocont

Estimated sample size for two-sample comparison of proportions

Test Ho: p1 = p2, where p1 is the proportion in population 1
and p2 is the proportion in population 2
Assumptions:

alpha = 0.0500 (two-sided)
power = 0.8000
p1 = 0.1000
p2 = 0.1100
n2/n1 = 1.00

Estimated required sample sizes:

n1 = 14751
n2 = 14751

josef...@gmail.com

Nov 11, 2013, 12:01:25 PM
to pystatsmodels
Look at the stackoverflow link; I'm matching the `pwr` package.
power.prop.test might drop one tail, but I don't have R open right
now. Check the `strict` option in power.prop.test.

If that's not it, then I have to look.

Josef

josef...@gmail.com

Nov 11, 2013, 12:03:37 PM
to pystatsmodels
The other possibility is that the effect-size calculation uses a
slightly different transformation to normality.
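For reference, `pwr` and sm.stats.proportion_effectsize use Cohen's
arcsine (variance-stabilizing) transformation. A minimal sketch of
that calculation (the helper name `cohen_h` is mine):

```python
import numpy as np

def cohen_h(p1, p2):
    # Cohen's effect size h: difference of arcsine-transformed proportions
    return 2 * np.arcsin(np.sqrt(p1)) - 2 * np.arcsin(np.sqrt(p2))

cohen_h(0.1, 0.11)  # about -0.0326, matching the es value in the session above
```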

Josef

josef...@gmail.com

Nov 11, 2013, 12:47:37 PM
to pystatsmodels
sm.stats.proportion_effectsize is missing other options that were not
needed to match `pwr`.


>>> prop = [0.1, 0.11]
>>> prop_pooled = np.mean(prop)
>>> es2 = (prop[0] - prop[1]) / np.sqrt((1-prop_pooled) * prop_pooled)
>>> es = sm.stats.proportion_effectsize(.1, .11)
>>> es
-0.03262940076737697
>>> es2
-0.032620741805734127
>>> power.NormalIndPower().solve_power(es2, power=.8, alpha=.05)
14751.93332736434

> power.prop.test(p1=.1, p2=.11, power=.8, strict=TRUE)$n
[1] 14750.75
> power.prop.test(p1=.1, p2=.11, power=.8, strict=FALSE)$n
[1] 14750.79

I don't know where the remaining difference comes from.


other options for variance assumptions

>>> es3 = (prop[0] - prop[1]) / np.sqrt((1-prop[0]) * prop[0])
>>> power.NormalIndPower().solve_power(es3, power=.8, alpha=.05)
14127.94891680154
>>> es4 = (prop[0] - prop[1]) / np.sqrt((1-prop[1]) * prop[1])
>>> power.NormalIndPower().solve_power(es4, power=.8, alpha=.05)
15368.068877513018

see option `prop_var`
http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.proportion.proportions_ztest.html


(too many options in how to treat proportions)

Josef


josef...@gmail.com

Nov 11, 2013, 12:51:39 PM
to pystatsmodels
Just another comment about the code:

Large parts of it are based on the SAS manual and the options that
are available in SAS. I haven't looked much at R besides `pwr`, and
Stata is not so big on power, but I don't have Stata 13, which seems
to be a large improvement.

Josef



Skipper Seabold

Nov 13, 2013, 5:51:14 AM
to pystat...@googlegroups.com
Right.
Thanks, this is all helpful. Maybe I can whip up a notebook that (at
least) starts to bring this information together.

Skipper Seabold

Nov 20, 2013, 2:30:14 PM
to pystat...@googlegroups.com
FWIW, the source of the difference is that R's power.prop.test (as
opposed to `pwr`) and Stata do not use the arcsine transformation to
get an effect size. They use the direct comparison of p1 and p2 that's
developed in Fleiss, Levin, and Paik (2003) [1]. R doesn't include the
option for the continuity correction AFAIK, but FLP say to always use it.

Here it is if anyone comes looking (or we want to add it?). I'd tidy
it up / change the API if this were for public consumption.

nobs - size of sample from population 1
r*nobs - size of sample from population 2
p1, p2 - proportions
cont - continuity correction

import numpy as np
from scipy import stats, optimize

def func(nobs, p1, p2, alpha, power, r=1., cont=True):
    q1 = 1 - p1
    q2 = 1 - p2

    p_bar = (p1 + r*p2) / (r + 1.)
    q_bar = 1 - p_bar

    z_alpha = stats.norm.ppf(1 - alpha/2.)
    z_beta = stats.norm.ppf(power)

    n_prime = ((z_alpha * np.sqrt((r+1) * p_bar * q_bar) +
                z_beta * np.sqrt(r * p1 * q1 + p2 * q2))**2 /
               (r * (p2 - p1)**2))
    if cont:
        n = n_prime/4. * (1 + np.sqrt(1 +
                (2*(r+1)) / (n_prime * r * np.abs(p2 - p1))))**2
    else:
        n = n_prime
    if nobs is None:
        return n
    else:
        return nobs - n

#NOTE: if you refactor in terms of z_beta
# z_beta = (np.abs(p2-p1) - 1./(2*n)*(r+1.)/r) / np.sqrt(p_bar*q_bar*(r+1.)/(n*r))
# return stats.norm.cdf(z_beta) - power


# find sample size
func(None, .7, .85, .05, .8, cont=False)
# compare to stata: sampsi .7 .85, nocont
func(None, .7, .85, .05, .8, cont=True)
# compare to stata: sampsi .7 .85

# find power
n = func(None, .7, .85, .05, .8, cont=True)
rooter = lambda power: func(n, .7, .85, .05, power, r=1., cont=True)
# upper bound just below 1 so z_beta = ppf(power) stays finite
optimize.brentq(rooter, 1e-6, 1 - 1e-6)



I wrote it in terms of nobs but you can rearrange and solve for the
most common usage to avoid the root-finding.
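As a sketch of that rearrangement (ignoring the continuity
correction): solving the n' equation above for z_beta gives the power
directly, with no root-finding. The helper name is mine, not an
existing API:

```python
import numpy as np
from scipy import stats

def power_from_nobs(n, p1, p2, alpha=0.05, r=1.0):
    # invert the (uncorrected) Fleiss sample-size formula for z_beta,
    # then map it back to power through the normal CDF
    p_bar = (p1 + r * p2) / (r + 1.0)
    q_bar = 1 - p_bar
    z_alpha = stats.norm.ppf(1 - alpha / 2.0)
    z_beta = ((np.abs(p2 - p1) * np.sqrt(n * r)
               - z_alpha * np.sqrt((r + 1) * p_bar * q_bar))
              / np.sqrt(r * p1 * (1 - p1) + p2 * (1 - p2)))
    return stats.norm.cdf(z_beta)

power_from_nobs(121, 0.7, 0.85)  # roughly 0.80 near the uncorrected sample size
```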

[1] http://www.amazon.com/Statistical-Methods-Proportions-Joseph-Fleiss/dp/0471526290

josef...@gmail.com

Nov 20, 2013, 9:38:54 PM
to pystatsmodels
From this it looks like, under the alternative, we have two different
variances. (I've been reading too much about heteroscedasticity and
binary and count models lately.)

This is currently not handled in any of the power calculations. For
example, the t-test with different variances is not implemented, AFAIR.

The same might apply to testing that two Poisson rates are equal,
which is also still missing.
I have everything set up to use explicit formulas for the sample size
calculations, but haven't implemented them yet. I just use the root
finding for everything except power.
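That root-finding setup can be sketched like this, with a toy
two-sided one-sample z-test (function names and numbers are just for
illustration, not statsmodels API):

```python
import numpy as np
from scipy import stats, optimize

def ztest_power(nobs, effect_size, alpha=0.05):
    # two-sided one-sample z-test power under the normal approximation
    crit = stats.norm.ppf(1 - alpha / 2.0)
    shift = effect_size * np.sqrt(nobs)
    return stats.norm.sf(crit - shift) + stats.norm.cdf(-crit - shift)

# solve power(n) = 0.8 for n with a bracketing root-finder
nobs = optimize.brentq(lambda n: ztest_power(n, 0.2) - 0.8, 2, 1e6)
# nobs comes out near 196 for effect size 0.2, alpha 0.05
```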


You could add the function to statsmodels.stats.proportions, and open
an issue for writing the corresponding class.

(There is already a lonely
http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.proportion.samplesize_confint_proportion.html
which I added when I was reading and trying out some bootstrap. I
guess it's not for serious hypothesis testing in small samples.)


And just as a general warning: the power of tests for proportions
(discrete observations) is not monotonic in the sample size, even if
the normal approximation is.
http://jpktd.blogspot.ca/2013/04/binomial-proportions-equivalence-and.html
Some power packages have hints that they try to find the smallest
sample size for a given power within the sawtooth pattern, IIRC.
See the SAS link in case you are more interested in the explanation or details.
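The sawtooth is easy to reproduce with a one-sided exact binomial
test (an illustrative sketch I put together, not statsmodels code):

```python
import numpy as np
from scipy import stats

def exact_binom_power(n, p0, p1, alpha=0.05):
    # one-sided exact test of H0: p <= p0; reject when X >= c, where c
    # is the smallest count that keeps the size at or below alpha
    c = int(stats.binom.ppf(1 - alpha, n, p0)) + 1
    return stats.binom.sf(c - 1, n, p1)

powers = [exact_binom_power(n, 0.5, 0.7) for n in range(20, 60)]
np.diff(powers)  # some entries are negative: power can drop as n grows
```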

Josef


Skipper Seabold

Nov 21, 2013, 4:57:44 AM
to pystat...@googlegroups.com
On Thu, Nov 21, 2013 at 2:38 AM, <josef...@gmail.com> wrote:
> And just as a general warning: power of proportions (discrete
> observations) are not monotonic, even if the normal approximation is.
> http://jpktd.blogspot.ca/2013/04/binomial-proportions-equivalence-and.html
> There are some hints in some power packages that try to get the
> smallest sample size for given power in the sawtooth pattern, IIRC.
> See the SAS link in case you are more interested in the explanation or details.
>

Yeah, I saw this. Fun... I need to look at SAS in more detail.

Skipper

josef...@gmail.com

Nov 21, 2013, 7:50:48 AM
to pystatsmodels
I opened a generic ticket https://github.com/statsmodels/statsmodels/issues/1197
but I'm planning to work mainly on "models" for 0.6.

Josef