FWIW, the source of the difference is that R's power (as opposed to
pwr) and stata do not use the arcsin transformation to get an effect
size. They use the direct comparison of p1 and p2 that's developed in
Fleiss, Levin, and Paik (2003) [1]. R doesn't include the option for
the continuity correction AFAIK, but FLP says to always use it.
Here it is if anyone comes looking (or we want to add it?). I'd tidy
it up / change the API if this were for public consumption.
nobs - size of sample from population 1
r*nobs - size of sample from population 2
p1, p2 - proportions
cont = continuity correction
def func(nobs, p1, p2, alpha, power, r=1., cont=True):
q1 = 1 - p1
q2 = 1 - p2
p_bar = (p1 + r*p2)/(r+1.)
q_bar = 1 - p_bar
z_alpha = stats.norm.ppf(1 - alpha/2.)
z_beta = stats.norm.ppf(power)
n_prime = ((z_alpha * np.sqrt((r+1) * p_bar * q_bar) +
z_beta * np.sqrt(r * p1 * q1 + p2 * q2))**2/
(r*(p2-p1)**2))
if cont:
n = n_prime/4. * (1 + np.sqrt(1 +
(2*(r+1))/(n_prime*r*np.abs(p2-p1))))**2
else:
n = n_prime
if nobs is None:
return n
else:
return nobs - n
#NOTE: if you refactor in terms of z_beta
# z_beta = (np.abs(p2-p1)-1./(2*n)*(r+1.)/r)/np.sqrt(p_bar*q_bar*(r
+ 1.)/(n*r))
# return stats.norm.cdf(z_beta) - power
# find sample size
func(None, .7,.85,.05,.8, cont=False)
# compare to stata sampsi .7 .85, nocont
func(None, .7,.85,.05,.8, cont=True)
# compare to stata sampsi .7 .85
# find power
n = func(None, .7, .85, .05, .8, cont=True)
rooter = lambda power : func(n, .7, .85, .05, power, r=1., cont=True)
optimize.brentq(rooter, 1e-6, 1)
I wrote it in terms of nobs but you can rearrange and solve for the
most common usage to avoid the root-finding.
[1]
http://www.amazon.com/Statistical-Methods-Proportions-Joseph-Fleiss/dp/0471526290