Sample size for a Poisson distribution when an over-dispersion parameter is given


Mm76923 .

Feb 14, 2014, 10:00:13 AM2/14/14
to medstats

 

Hi all,


Does anyone know how to calculate a sample size for a Poisson distribution when an over-dispersion parameter is given (i.e. when the variance differs from the mean), please?

I have the following problem (see reference below):


“Based on an average of two falls per year with an SD of 1·5 and a 25% rate of attrition, a sample size of 352 would have 90% power to detect a 30% reduction in the rate of falls from 2·0 to 1·4 in the intervention group with a probability of p<0·05.”

 

I have used the formula below to calculate the sample size WITHOUT using the given SD of 1.5. This gave me a different result: 372 instead of the 352 in the paper.

 

n = [(u+v)^2 (u1+u0)] / [(u1-u0)^2]

Where

u = one-sided percentage point of the normal distribution corresponding to 100% minus the power, e.g. if power = 90%, (100% - power) = 10% and u = 1.28

v = percentage point of the normal distribution corresponding to the (two-sided) significance level, e.g. if significance level = 5%, v = 1.96

u0 = 1.4, the fall rate in the intervention group (assuming a 30% reduction)

u1 = 2.0, the fall rate in the control group

 

Question: does anyone know which formula to use that incorporates the given over-dispersion parameter (SD)?

 

Reference: Close J, et al. Prevention of falls in the elderly trial (PROFET): a randomised controlled trial. Lancet 1999; 353: 93–97

 

Many thanks in advance


Jeka, M

Statistician & Economist

MacLennan, Graeme

Feb 14, 2014, 11:10:42 AM2/14/14
to meds...@googlegroups.com

Hello, it is not clear why you want to do this. But digging up the Lancet paper, perhaps you want to re-create their sample size calculation? They did not power on a Poisson (over-dispersed or otherwise); they used the sample size calculation for a continuous, normally distributed outcome, as follows:

 

To detect a change from 2 to 1.4 with an assumed pooled standard deviation of 1.5 (alpha 0.05, beta 0.10) gives 132 per arm or 264 in total. Inflating for 25% attrition gives the magical 352 in the Lancet 1999 paper.  
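That arithmetic can be reproduced directly. A minimal sketch (in Python here for illustration; the function name is mine, not from the paper or the thread):

```python
import math
from statistics import NormalDist

def n_per_arm_two_means(delta, sd, alpha=0.05, power=0.90):
    """Per-arm sample size for comparing two means (normal approximation):
    n = 2 * sd^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05, two-sided
    z_beta = z.inv_cdf(power)            # 1.28 for 90% power
    return math.ceil(2 * sd**2 * (z_alpha + z_beta)**2 / delta**2)

per_arm = n_per_arm_two_means(delta=2.0 - 1.4, sd=1.5)  # 132
total = 2 * per_arm                                     # 264
entered = math.ceil(total / 0.75)                       # 352 after 25% attrition
```

This reproduces the 132 per arm, 264 in total, and the "magical" 352 after inflating for attrition.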

 

If you want to power a new study, then, to jump in ahead of Bendix: simulate. If you have access to Stata, see:

 

Joseph Hilbe. "Creation of Synthetic Discrete Response Regression Models" Stata Journal 10.1 (2010): 104-124. Available at: http://works.bepress.com/joseph_hilbe/2

 

It is quite straightforward to then estimate power for a variety of scenarios.

 

For formulae, try these papers (caveat: these are in the “my pile to read soon” stack that clutters my office):

 

Matsui S. Sample size calculations for comparative clinical trials with over-dispersed Poisson process data. Statistics in Medicine 2005; 24:1339–1356.

 

Zhu H, Lakkis H. Sample size calculation for comparing two negative binomial rates. Statistics in Medicine 2014; 33: 376–387.

 

HTH,

g

 

Graeme MacLennan

Senior Statistician

Health Services Research Unit

Health Sciences Building

University of Aberdeen

Foresterhill

Aberdeen AB25 2ZD

Tel: +44 (0)1224 438147

Fax +44 (0)1224 438165

Email: g.mac...@abdn.ac.uk

Web: www.abdn.ac.uk/hsru

--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules
 
---
You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The University of Aberdeen is a charity registered in Scotland, No SC013683.

BXC (Bendix Carstensen)

Feb 14, 2014, 12:59:06 PM2/14/14
to meds...@googlegroups.com

It seems that you have a model which is

 

Y|m ~ Poisson(m)

m ~ [some distribution on the positive axis with SD 1.5 and mean 1.4/2.0]   

I do not know exactly what the attrition of 25% refers to, but surely it will be in the model.

 

The answer to your question, as to all other questions, is to simulate from the target distribution and then analyse each of the simulated datasets, which will eventually give you the power in a range of circumstances, and at the same time give the much more relevant expected precision of your estimates.

 

Programming is quite simple, but it might take some computing time to walk through a relevant range of scenarios. But I have found that my computer happily works even when I am out on the town enjoying a good dinner, and I am sure yours will too. So the complicated thing is really to find a good restaurant.
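The simulate-then-analyse loop Bendix describes can be sketched as follows (a minimal illustration in Python; the same logic ports directly to R or Stata, and every function name here is mine, not from the thread). Over-dispersion is introduced as a gamma-mixed Poisson, i.e. a negative binomial, matched to a given marginal mean and SD:

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's method (fine for the small rates used here)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def rcount(mean, sd, rng):
    """Over-dispersed count: Poisson with a gamma-distributed rate,
    matched to the given marginal mean and SD (requires sd**2 > mean)."""
    excess = sd * sd - mean                  # extra-Poisson variance
    shape, scale = mean * mean / excess, excess / mean
    return rpois(rng.gammavariate(shape, scale), rng)

def ztest_p(x, y):
    """Two-sided p-value for a two-sample z-test on means, empirical SDs."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def power(n_per_arm, m_ctrl, m_trt, sd, nsim=200, alpha=0.05, seed=1):
    """Fraction of simulated trials in which the z-test rejects at `alpha`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(nsim):
        ctrl = [rcount(m_ctrl, sd, rng) for _ in range(n_per_arm)]
        trt = [rcount(m_trt, sd, rng) for _ in range(n_per_arm)]
        rejections += ztest_p(ctrl, trt) < alpha
    return rejections / nsim

# 132 per arm, rates 2.0 vs 1.4, SD 1.5 -- roughly the PROFET scenario
est = power(132, 2.0, 1.4, 1.5)
```

Walking `n_per_arm`, the two rates, and the SD over a grid of values then maps out power across scenarios, exactly as suggested above.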

 

Regards

Bendix Carstensen

 


John Whittington

Feb 14, 2014, 1:30:39 PM2/14/14
to meds...@googlegroups.com
At 17:59 14/02/2014 +0000, BXC (Bendix Carstensen) wrote:
>I do not know exactly what the attrition of 25% refers to, but surely it
>will be in the model.

It's simply an allowance to take into account the fact that not all
subjects who enter a trial will generate (any or full) data. Having
estimated, by whatever means, the number of subjects' data required to
achieve the required power to detect a certain magnitude of effect, one
simply adds on such an allowance (typically 20-25%) to arrive at a fairly
'safe' number of subjects to enter into the trial to achieve the desired
power ... just like ordering 25 light bulbs when you have
estimated/calculated that you actually need 20 (satisfactory ones!), but
also know that up to 20% of those ordered may be broken during delivery to
you!

It's not really something which needs to be modelled. It's just a
'real-world adjustment' applied to the estimate of required sample size
(which assumes all subjects generate data).
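In code, that adjustment is one line, shown here with both the light-bulb example and the PROFET numbers (a trivial sketch; the function name is mine):

```python
import math

def inflate_for_attrition(n_needed, attrition):
    """Recruit enough subjects so that n_needed are expected to remain
    after a given attrition (drop-out) fraction."""
    return math.ceil(n_needed / (1 - attrition))

bulbs = inflate_for_attrition(20, 0.20)     # order 25 to keep 20
profet = inflate_for_attrition(264, 0.25)   # enter 352 to analyse 264
```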

Kind Regards,


John

----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------

mm7...@gmail.com

Feb 17, 2014, 6:04:45 AM2/17/14
to meds...@googlegroups.com
Good suggestion, Bendix.


Do you, by any chance, have an R program which can simulate the target distribution and compute the test statistics, please? I presume you are suggesting to: simulate data, then do the test to accept or reject the null hypothesis (or alternative hypothesis) given a calculated critical value. If the answer to this is yes, can the Z test be used to do the test of hypothesis? If not, what do you suggest?

Many thanks

Nick   


mm7...@gmail.com

Feb 17, 2014, 6:13:51 AM2/17/14
to meds...@googlegroups.com

Hi Graeme

Many thanks for this.

In the Lancet paper, the data have an over-dispersed Poisson distribution. Is using the sample size calculation for a continuous normal distribution correct in this case?

Many thanks


BXC (Bendix Carstensen)

Feb 17, 2014, 6:23:48 AM2/17/14
to meds...@googlegroups.com
I don’t have a ready-cooked program, but you would write a function like

sim1 <-
function( N, m, sd )
{
# Generate a log-normal rate for each subject:
# here you have to work out how Mn and Sd depend on m and sd
MM <- exp( rnorm( N, Mn, Sd ) )
YY <- rpois( N, MM )
# < do the analysis, computing pval and sd.est >
return( c( p=pval, se=sd.est ) )
}

# Then set up an array for the simulation scheme

dnam <- list( iter = 1:1000,
              N    = c(100,200,500),
              mn   = c(1.3,1.5,1.7),
              sd   = c(1.3,1.5,1.7),
              res  = c("P","se") )
pwrarr <- array( NA, dimnames=dnam, dim=sapply(dnam,length) )

for( i in 1:1000 )
for( n in dimnames(pwrarr)[["N"]] )
for( m in dimnames(pwrarr)[["mn"]] )
for( s in dimnames(pwrarr)[["sd"]] )
pwrarr[i,n,m,s,] <- sim1( N=as.numeric(n), m=as.numeric(m), sd=as.numeric(s) )

---- and then you can print (using ftable) or plot the entries in pwrarr, or more likely some result(s) of:

apply( pwrarr, 2:5, <some relevant function> )

Hope this helps - it's only a rudimentary scheme; the basic idea is to have a function that simulates one dataset and analyses it, and then to set up an array to collect 1000s of analysis results across various scenarios.

Best regards
Bendix Carstensen

BXC (Bendix Carstensen)

Feb 17, 2014, 6:30:24 AM2/17/14
to meds...@googlegroups.com

I am constantly baffled by (1) authors who bother with, and (2) editors who accept, papers using algebraic approximations to derive sample sizes. The computer was invented and has been widely available for some 25 years, so you no longer need to resort to approximations to a problem that only approximates yours.

 

I gather it is because funding bodies still accept applications with inadequate sample size calculations.

 

Bendix Carstensen

 

 


mm7...@gmail.com

Feb 17, 2014, 7:15:17 AM2/17/14
to meds...@googlegroups.com

Many thanks, Bendix, for the code.