MEDSTATS: Re: Sample size planning - underestimation of SD from previous evidence

Robert Newcombe

Feb 10, 2006, 7:36:05 AM

to MedS...@googlegroups.com

I hadn't seen the Vickers reference, which makes a very telling point. But there appear to be two slightly different issues here. Suppose we look at a previous study (study 1 - which may have been designated as a 'pilot study' or may have been of definitive intent) and extract a point estimate for the SD (or equivalently variance), with lower and upper confidence limits L and U calculated in the usual way based on the chi-square distribution. Then it is certainly true that alternative sample size calculations for the study we're planning (study 2) using L and U in place of the point estimate would serve as a sensitivity analysis for power calculation. Using U corresponds to a worst-case scenario, using L corresponds to a best-case scenario. Indeed, I would expect that if study 1 was anything other than large, the required sample sizes for study 2 based on U and based on L would be wildly different. I guess the classical two-sided 95% interval, with 2.5% non-coverage in each tail, is too wide for this purpose per se. A correct assessment would involve an integrated model for studies 1 and 2.
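The sensitivity analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chi-square quantiles use the Wilson-Hilferty approximation (reasonable for roughly 10 or more degrees of freedom, not exact), the sample size uses the usual normal-approximation formula for comparing two means, and the function names and the example figures (SD 10 from a study of 20, target difference 5) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def chi2_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty); an approximation, not exact."""
    z = _Z.inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * sqrt(a)) ** 3


def sd_confidence_limits(s, n, level=0.95):
    """Two-sided chi-square-based limits (L, U) for the SD, assuming Gaussian data."""
    df = n - 1
    tail = (1.0 - level) / 2.0
    lower = s * sqrt(df / chi2_quantile(1.0 - tail, df))
    upper = s * sqrt(df / chi2_quantile(tail, df))
    return lower, upper


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-group comparison of means."""
    z_a = _Z.inv_cdf(1.0 - alpha / 2.0)
    z_b = _Z.inv_cdf(power)
    return ceil(2.0 * ((z_a + z_b) * sd / delta) ** 2)


# Hypothetical study 1: SD of 10 estimated from 20 subjects; target difference 5.
s, n1, delta = 10.0, 20, 5.0
L, U = sd_confidence_limits(s, n1)
for label, sd in (("best case (L)", L), ("point estimate", s), ("worst case (U)", U)):
    print(f"{label:15s} SD = {sd:6.2f}  n per group = {n_per_group(sd, delta)}")
```

Because the required sample size scales with the square of the SD, even a moderately wide interval for the SD translates into a very wide range of sample sizes.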

Nevertheless, saying this disregards two important issues. Firstly, the simplest analysis for study 2, an unpaired t-test, is pretty robust, and the usual sample size planning method for a continuous outcome corresponds to it; but the chi-square-based CI for the SD or variance from study 1 is very far from robust. The best way to see this is to consider that sporadic 'outliers' may equally be manifestations of unexpectedly large variation or of non-Gaussian distributional form. I would only trust such a CI if there were clear evidence, either from study 1 or from other sources, that the distribution is close to Gaussian.

Also, and perhaps even more tellingly, I don't think the above corresponds exactly to what is happening in Vickers' article - although a cursory reading of his abstract alone might suggest that considering a CI would obviate the issue. Vickers adduces two explanations. One is the skewness of the distribution of the SD (or indeed variance) about the true value, which applies particularly when study 1 is small. His other point is as follows. "A long tail of very high sample sizes is shown in Fig. 1. Such sample sizes might be considered unfeasible and trials would not be conducted. This would have the effect of inflating the proportion of reported sample size calculations that underestimate SD." True, and this is closely analogous to the phenomenon of publication bias. Vickers seems to present this observation as an artefact interfering with his data. But it appears to go deeper than this.

Often, researchers base a sample size calculation for the planned study (study 2) on an SD extracted from a previous, published study (study 1). This should of course be much more reliable, in a CI sense, than an SD based on a very small pilot study. Nevertheless it needs to be borne in mind that the fact that study 1 was published implies publication bias with regard to estimation of variation, just as much as with regard to estimation of the mean difference. It has been said that studies in which the play of chance conveniently exaggerates the size of the difference tend to be preferred for publication. Such publication bias is promoted by the behaviour of editorial teams, and consequently also by researchers who recognise that journals operate this way. It also affects which journal the study gets published in, i.e. how prestigious, and for work from non-English-speaking countries, whether in an English-language journal. It is the bane of the meta-analysis community.

But exactly the same argument applies to variation. Selection is according to the p-value, so it is equally true that studies in which the play of chance conveniently underestimates the degree of variation tend to be preferred for publication. Thus, using figures taken directly from previous work to form either or both of (a) the target difference for study 2 and (b) the anticipated variation is inadvisable on account of publication bias.
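The selection argument can be illustrated with a small simulation. This is a rough sketch, not a model of any real literature: it assumes Gaussian data, two groups of 15, a true difference of 0.5 SD, and a crude rule that only studies with |t| above 2.048 (the two-sided 5% critical value for 28 degrees of freedom) are "published". All names and settings are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_selection(n_studies=5000, n=15, true_diff=0.5, true_sd=1.0,
                       t_crit=2.048, seed=1):
    """Compare pooled SDs and differences in all studies vs 'significant' ones only."""
    rng = random.Random(seed)
    all_sd, all_diff, pub_sd, pub_diff = [], [], [], []
    for _ in range(n_studies):
        a = [rng.gauss(0.0, true_sd) for _ in range(n)]
        b = [rng.gauss(true_diff, true_sd) for _ in range(n)]
        pooled = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2.0)  # equal group sizes
        diff = mean(b) - mean(a)
        t = diff / (pooled * sqrt(2.0 / n))  # unpaired t statistic
        all_sd.append(pooled)
        all_diff.append(diff)
        if abs(t) > t_crit:  # assumed publication rule: significant results only
            pub_sd.append(pooled)
            pub_diff.append(diff)
    return {
        "all_mean_sd": mean(all_sd),
        "published_mean_sd": mean(pub_sd),
        "all_mean_diff": mean(all_diff),
        "published_mean_diff": mean(pub_diff),
        "fraction_published": len(pub_sd) / n_studies,
    }


for name, value in simulate_selection().items():
    print(f"{name:22s} {value:.3f}")
```

Under these assumptions the "published" subset shows both a larger average difference and a smaller average SD than the full set, which is the double bias described above.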

The upshot of all this is that it would make sense to shade upwards any SD estimate taken from published work (primarily on account of publication bias), and also any SD estimate taken from a pilot study (on account of distributional skewness). This is a further reason to increase a planned sample size above the number indicated by a naive power calculation - to join the list of better-known reasons such as attrition and contamination.
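The skewness point can be checked by simulation. The sketch below is one reading of that argument rather than a reproduction of Vickers' analysis: it assumes Gaussian data with true SD 1 and a hypothetical pilot study of 10 subjects, and counts how often the sample SD falls below the true value.

```python
import random
from statistics import median, stdev


def fraction_underestimating(n_pilot=10, true_sd=1.0, n_sims=5000, seed=2):
    """Return (fraction of pilot SDs below the true SD, median pilot SD)."""
    rng = random.Random(seed)
    sds = [
        stdev([rng.gauss(0.0, true_sd) for _ in range(n_pilot)])
        for _ in range(n_sims)
    ]
    below = sum(s < true_sd for s in sds) / n_sims
    return below, median(sds)


below, med = fraction_underestimating()
print(f"fraction of pilot SDs below the true SD: {below:.3f}")
print(f"median pilot SD (true SD = 1): {med:.3f}")
```

With a pilot this small, noticeably more than half of the simulated SDs fall below the true value, so a sample size based on the raw pilot SD is more often too small than too large.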

Robert G. Newcombe PhD CStat FFPH
Professor of Medical Statistics
Wales College of Medicine
Cardiff University
Heath Park
Cardiff CF14 4XN

PLEASE NOTE NEW LOCATION AND PHONE NUMBER FROM 13/2/2006
4th floor, Neuadd Meirionnydd
(North-East corner of Heath Park site)
029 2068 7247

http://www.cardiff.ac.uk/medicine/epidemiology_statistics/research/statistics/newcombe.htm

>>> t.s.b...@lboro.ac.uk 10/02/06 11:33 >>>

It is worth noting that pilot studies tend to underestimate the SD and
that it is worth correcting the sample size estimates to avoid
underpowered studies. A nice reference on this is Vickers (2003) in the
Journal of Clinical Epidemiology.

Of course you may know this, Jeremy, hence wanting to construct the CI
for the SD.

Alexandre Santos Aguiar

Feb 12, 2006, 8:25:35 PM

to MedS...@googlegroups.com

On Fri 10 Feb 2006 10:36, Robert Newcombe wrote:
> The upshot of all this is that it would make sense to shade upwards any SD
> estimate taken from published work (primarily on account of publication
> bias), and also any SD estimate taken from a pilot study (on account of
> distributional skewness).

At least one such piece of advice has been published (that I know of).

Browne, RH. On the use of a pilot sample for sample size determination.
Statistics in Medicine. 1995, 14, 1933-40.

The author recommends using at least an 80% upper one-sided confidence limit
rather than the estimate itself.
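As a rough illustration of what such a rule implies, the sketch below computes the standard chi-square-based upper one-sided limit for the SD, s * sqrt(df / chi2(1 - confidence, df)). That this is the exact form used in the paper is an assumption; the function names are hypothetical, and the chi-square quantile is found by bisection on a series expansion of the CDF so that it remains usable for small pilot studies.

```python
from math import exp, lgamma, log, sqrt


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= h / (a + k)
        total += term
    return total * exp(a * log(h) - h - lgamma(a + 1.0))


def chi2_quantile(p, df):
    """Invert the CDF by bisection (the CDF is increasing in x)."""
    lo, hi = 0.0, df + 10.0 * sqrt(2.0 * df) + 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def upper_sd_limit(s, n, confidence=0.80):
    """Upper one-sided confidence limit for the SD, assuming Gaussian data."""
    df = n - 1
    return s * sqrt(df / chi2_quantile(1.0 - confidence, df))


# Inflation factor applied to the pilot SD for a few hypothetical pilot sizes.
for n in (10, 20, 30, 50):
    print(f"pilot n = {n:3d}: use {upper_sd_limit(1.0, n):.3f} x the pilot SD")
```

The inflation factor shrinks as the pilot grows, and the sample size for the main study scales with its square.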

--

Alexandre Santos Aguiar, MD
- independent consultant for health research -
R Botucatu, 591 cj 81 - 04037-005
São Paulo - SP - Brazil
tel +55-11-9320-2046
fax +55-11-5549-8760
www.spsconsultoria.com

Thom

Feb 13, 2006, 4:56:57 AM

to MedStats

Vickers (2003) cites Browne with regard to the necessary correction for
pilot sample sizes when estimating SD for power.

Professor Newcombe makes some important points. A practical issue is
the degree of precision one needs in sample size estimation. In some
situations precision is very important (e.g., if the cost of treatment
or risk of treatment is high) whereas in others a rough estimate will
do (e.g., if the cost or risk is negligible). One of my worries is that
people obsess about the precision of one detail in the estimate and
ignore (typically much larger) variations elsewhere.

At a guess I'd suggest that the largest source of error is in
specifying the size of effect required for clinical or practical
importance. The temptation is always to overestimate this. (To be fair,
medical statistics is probably one of the better fields for this,
because researchers tend to have access to and draw on a large resource
of clinical research and clinical experience.)

Thom

John Whittington

Feb 13, 2006, 8:58:07 AM

to MedS...@googlegroups.com

At 01:56 13/02/06 -0800, Thom wrote (in part):

>At a guess I'd suggest that the largest source of error is in
>specifying the size of effect required for clinical or practical
>importance. The temptation is always to overestimate this.

This is one of my pet topics, and I do not lose any opportunity when
talking/lecturing to groups of clinicians to point out that overestimation
of the 'minimum clinically important effect size' ('delta') is a very good
way of 'shooting themselves in the foot'.

The problem is that many clinicians/investigators have discovered that
overestimating 'delta' is a very good way of reducing the sample size
estimate (and show me an investigator who doesn't want to be able to use a
smaller sample size!!). However, by overestimating delta, the obvious risk
is that a study will end up with a result which, in truth, IS
'clinically important' but (statistically) 'non-significant', because the
small study was not adequately powered to detect it as 'statistically
significant'. This comes under the section entitled "Why not to fool or
bully the Statistician into producing a smaller sample size estimate" which
I include in talks whenever I can!!
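A minimal sketch of the arithmetic behind this warning, using the normal-approximation formulas for a two-group comparison of means. The figures (SD 10, a planned delta of 8 against a true clinically important difference of 5) and the function names are hypothetical, chosen only to show the size of the effect.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-group comparison."""
    z_a = Z.inv_cdf(1.0 - alpha / 2.0)
    z_b = Z.inv_cdf(power)
    return ceil(2.0 * ((z_a + z_b) * sd / delta) ** 2)


def achieved_power(true_delta, sd, n, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    z_a = Z.inv_cdf(1.0 - alpha / 2.0)
    return Z.cdf(true_delta / (sd * sqrt(2.0 / n)) - z_a)


sd = 10.0
planned_delta = 8.0   # optimistic 'delta' used to shrink the sample size
true_delta = 5.0      # smaller difference that would still matter clinically

n = n_per_group(sd, planned_delta)
print(f"n per group planned for delta = {planned_delta}: {n}")
print(f"power if the true difference is {planned_delta}: "
      f"{achieved_power(planned_delta, sd, n):.2f}")
print(f"power if the true difference is {true_delta}: "
      f"{achieved_power(true_delta, sd, n):.2f}")
print(f"n per group needed for delta = {true_delta}: {n_per_group(sd, true_delta)}")
```

In this example the study planned around the inflated delta has well under a 50% chance of detecting the smaller, still important, difference.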

Kind Regards,

John

----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK medis...@compuserve.com
----------------------------------------------------------------
