> We are in the process of designing a trial to compare two groups of
> patients seeing either their GP or practice nurse. We are currently
> looking at recruiting around 10 practices. The primary outcome
> measure is how enabled the patient felt after his/her consultation.
> The PI wants to individually randomise patients to intervention or
> control. This is not therefore a designed cluster RCT, however, from
> the very scant literature, there have been reasonably high ICCs
> reported (up to 0.15 at a health professional level) for this outcome,
> when used in a cluster design.
>
> What is the consensus about adjusting for the 'natural' clustering at
> the design stage? Clearly there are several potential levels of
> clustering, but at the moment we are primarily interested in the
> practice level (i.e. the "centre"). Some authors advocate reducing
> the sample size according to the magnitude of the centre effect,
> which, with an ICC of 0.1 or 0.2, could be quite a substantial
> reduction. For some reason, I don't feel all that comfortable with
> this!
A while back I took a course on this topic and the instructor said
something very profound. He said that if you have a random effect and
fail to account for it, your standard errors will be incorrect. That I
already knew, but he took it a bit further. He said that the standard
errors for between-cluster comparisons would be too small and the standard
errors for within-cluster comparisons would be too big. That's very
logical if you think about it for a while, but it had never occurred to
me to think of it in that way before.
Here's another bit of intuition that might help. Randomizing within each
center will effectively make the treatment effect and the center effect
orthogonal to one another. If you account for and remove a source of
uncertainty due to the center effect, that reduces your noise. Since it
is orthogonal, you don't have to worry about any collinearity effects
mucking things up.
If you're still uncomfortable, use a very small value for the ICC in
your sample size calculations. You could also argue that the truly
conservative approach would be to set the ICC to zero. You're guaranteed
then to not have a sample size that's too small no matter what the
center effect. Being conservative, of course, means that your experiment
will cost more, but sometimes the comfort of choosing a conservative
approach outweighs the extra expense.
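
As a rough illustration of the sample size argument above, here is a minimal Python sketch. It assumes a random-intercept model with no treatment-by-center interaction, an analysis that adjusts for center, and the usual normal-approximation formula for comparing two means. The function name n_per_arm and the example numbers (effect size 0.5, total SD 1.0) are hypothetical and are not taken from the trial being discussed.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sd_total, icc, alpha=0.05, power=0.80):
    """Patients per arm for an individually randomized, center-adjusted trial.

    Assumption: random-intercept model, so adjusting for center leaves
    only the within-center variance, sd_total**2 * (1 - icc).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    within_var = sd_total ** 2 * (1 - icc)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * within_var / delta ** 2)

# Hypothetical inputs: detect a 0.5 SD difference with 80% power.
for icc in (0.0, 0.05, 0.15):
    print(icc, n_per_arm(delta=0.5, sd_total=1.0, icc=icc))
```

Under these assumptions the required number never increases as the ICC rises, so icc=0 gives the largest answer. That is the sense in which ICC = 0 is the conservative choice here.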
I hope this helps.
--
Steve Simon, Standard Disclaimer
I might be digging holes for myself to fall into, but:
1. I was thinking/writing in terms of the total sample size, and did not
address the (important) question of the ideal number of 'clusters' and
distribution of that total sample size between the 'clusters'.
2. I am finding it difficult to get my head around the generality of the
conceptual suggestion that if each additional subject within some group
adds 'less new information' that one consequently needs larger sample
sizes. I would have thought that one of the most common situations in
which 'each additional subject adds little new information' (in relation to
the estimate of an effect) arises when the effect of interest has a low
variance - and it obviously would be the antithesis of the truth to suggest
that decreasing variability of the effect requires an increasing sample size.
I suspect that I am probably missing something!
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------
Thanks, Bruce - but, unfortunately, I am unable to see pages 39-137 on
Google Books.
I would point out that I am not disputing that, in some senses (in relation
to location), higher correlation means that each additional subject
'contributes less information'; it is the suggested link between this and
sample size requirements with which I'm having a bit of a problem.
This is really an issue of non-independence that results from
grouping. The ICC is a measure of that non-independence. If the ICC is
high, then subjects within each cluster are more similar to one
another--that is they share the influence of cluster on the outcome of
interest. They are not independent. (And thus the aforementioned
impact on SEs.)
If the ICC is 1, then there is perfect dependence among
responses within clusters--even though the subjects are "different" we
can't count the responses as different--lowering the effective sample
size to the number of clusters. If the ICC is 0, then even though these
subjects are clustered, their responses are independent and we could
count each response individually. This is not unlike the issue of
pseudo replication in experiments--e.g. repeated measurements that are
not accounted for.
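
The two extremes described above can be sketched with the standard design effect, 1 + (m - 1) * ICC, for equal cluster sizes m. This formula applies to estimating a mean or to a comparison made between whole clusters. It is an assumption here that this is the situation being described, and the 20 patients per practice is a made-up figure.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for a cluster-level comparison, equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)

# Hypothetical design: 10 practices with 20 patients each.
for icc in (0.0, 0.15, 1.0):
    print(icc, round(effective_sample_size(10, 20, icc), 1))
```

With ICC = 1 the 200 patients count as only 10 observations (the number of clusters), and with ICC = 0 all 200 count.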
Brett
> I must confess that I am very puzzled by the comments from Steve &
> John. According to everything I've read, the higher the ICC, the less
> new information contributed by each additional subject within a
> cluster. Therefore, as the ICC increases, you need more subjects, not
> less. (However, the better approach when the ICC is high is to
> increase the number of clusters rather than the number of subjects
> within clusters.) So I don't see how using a very small value of the
> ICC for the sample size estimate can be viewed as conservative. I
> would have said the opposite, i.e., that using a higher value of the
> ICC is conservative. (I assume by "conservative", we all mean sure to
> provide a large enough sample size estimate.)
It is counterintuitive. But you're thinking of a cluster randomized
trial where all the patients in a cluster receive the same treatment. In
an individually randomized trial, there are both treatments within a
cluster. The contrast within a cluster removes some of the shared
variation, and the amount of variation removed is larger when the ICC is
larger.
If it helps, think about clusters of size 2, then the test statistic is
the paired t-test, which requires fewer patients as the ICC increases.
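
The paired case can be written out as a short derivation. It is a sketch under the assumption that the two patients in cluster j share a cluster effect C_j with variance \sigma_C^2, have independent errors with variance \sigma_E^2, and that the treatment effect does not vary by cluster. The symbols \sigma_C^2, \sigma_E^2, \sigma^2 and \rho (the ICC) are notation introduced here.

```latex
\begin{align*}
Y_{1j} &= \mu + B_1 + C_j + E_{1j}, \qquad
Y_{2j} = \mu + B_2 + C_j + E_{2j} \\
Y_{1j} - Y_{2j} &= (B_1 - B_2) + (E_{1j} - E_{2j}) \\
\operatorname{Var}(Y_{1j} - Y_{2j}) &= 2\sigma_E^2 = 2\sigma^2 (1 - \rho),
\qquad \sigma^2 = \sigma_C^2 + \sigma_E^2, \quad
\rho = \frac{\sigma_C^2}{\sigma_C^2 + \sigma_E^2}
\end{align*}
```

For a fixed total variance \sigma^2, the variance of the within-pair difference shrinks in proportion to (1 - \rho), so the number of pairs needed falls as the ICC rises.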
--
Steve Simon, Standard Disclaimer
Steve,
The problem is hidden in your cluster size 2 example, unfortunately.
Ostensibly, those 2 within cluster subjects would be randomized to two
different treatments. That's great. Now, take that same pair (if the
ICC is 1, that pair represents the cluster well) and measure them 5,
10, 50 times. Is your sample size really increasing within that
cluster? No. That is pseudo-replication and is analogous to the
problem with clustering when the subjects are homogeneous due to
non-independence. If the ICC is low on the other hand, then those
replications make sense as individual observations because there is no
evidence of lack of independence among subjects within a cluster.
Is that not what Steve and I (and the original poster) have been saying -
or, at least, implying?
>He then gives another example WITH cluster randomisation, and says,
>"the 'conservative' procedure is almost perfect, while the 'liberal'
>procedure leads to a huge underestimation of the required sample
>size" (Twisk 2006, p. 130).
Depending upon what is being regarded as the 'liberal' procedure, that
sounds reasonable, but it was not the situation which we (at least, not
Steve and I) were considering.
Kind Regards,
John
At 14:24 14/10/2009 -0500, Brett Magill wrote:
>Steve,
>
>The problem is hidden in your cluster size 2 example, unfortunately.
>Ostensibly, those 2 within cluster subjects would be randomized to two
>different treatments. That's great. Now, take that same pair (if the
>ICC is 1, that pair represents the cluster well) and measure them 5,
>10, 50 times. Is your sample size really increasing within that
>cluster? No. That is pseudo-replication and is analogous to the
>problem with clustering when the subjects are homogeneous due to
>non-independence. If the ICC is low on the other hand, then those
>replications make sense as individual observations because there is no
>evidence of lack of independence among subjects within a cluster.
Well, yes, that much I had assumed. However, in the present context, the
important question is the nature of the variance in question. To be
meaningful in context, it presumably would have to be the variance of the
between-treatments difference; is that what you/Twisk intend?
Kind Regards,
http://stat.gamma.rug.nl/PowerSampleSizeMultilevel.pdf
Here is a quote from that article.
"In case 2, a level-one variable without between-group variation, the
multilevel design is always more efficient. This efficiency of
within-subject designs is a well-known phenomenon. For estimating a
population mean (case 1) or the effect of a level-two
variable (case 4), on the other hand, the multilevel design always is
less efficient, and more seriously so as the cluster size and the
intraclass correlation are larger" p 5.
He also discusses the case of level-one variables with between cluster
variation and random slopes within clusters, where the issue gets more
complicated... So, I'll be the first to admit that I oversimplified in
my previous response. If there is no between group (referring to
clusters, not treatment) variation this is the typical repeated
measures design with increased efficiency. However, when there is a
group effect, then the multilevel structure and intraclass correlation
are important in determining the efficiency of the design and the
effect can be either positive or negative.
Brett
The outcome variable, Y, is a function of an overall mean (MU), a
treatment effect (B), a random effect for each center (C) and an error
term for each subject (E). The subscripts i, j, and k, represent levels
of treatment, center, and patient.
Yijk = MU + Bi + Cj + Eijk.
Also let n represent the number of centers and 2m represent the number
of patients per center.
The estimate of treatment effect is
YBAR1 - YBAR2 = B1 - B2 + SUM jk E1jk/(n*m) - SUM jk E2jk/(n*m).
Note that the variation due to center does not appear in this estimate
because the subtraction cancels out the center effect. This is true
whether there are 2 subjects per center or 20 subjects per center.
I call this a "random intercept" model since it assumes that the
intercept (MU + Cj) shifts randomly up or down from center to center.
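
A small simulation can make the cancellation concrete. This is only a sketch: the function name, the value MU = 50, and the other settings are hypothetical, and it assumes a balanced design with m patients per arm in every center. Center effects and subject errors come from separate random number generators, so the same errors can be reused while the center variation is changed.

```python
import random

def treatment_estimate(n_centers, m_per_arm, effect, center_sd, error_sd, seed=0):
    """Simulate Yijk = MU + Bi + Cj + Eijk and return YBAR1 - YBAR2."""
    center_rng = random.Random(seed)      # draws the center effects Cj
    error_rng = random.Random(seed + 1)   # draws the subject errors Eijk
    mu = 50.0                             # hypothetical overall mean
    totals = [0.0, 0.0]
    for _ in range(n_centers):
        c_j = center_rng.gauss(0.0, center_sd)
        for arm in (0, 1):
            b_i = effect if arm == 0 else 0.0
            for _ in range(m_per_arm):
                totals[arm] += mu + b_i + c_j + error_rng.gauss(0.0, error_sd)
    n_per_arm = n_centers * m_per_arm
    return totals[0] / n_per_arm - totals[1] / n_per_arm

no_center_effect = treatment_estimate(10, 20, 2.0, center_sd=0.0, error_sd=1.0)
big_center_effect = treatment_estimate(10, 20, 2.0, center_sd=10.0, error_sd=1.0)
print(no_center_effect, big_center_effect)
```

The two estimates agree up to floating-point rounding, because each Cj is added equally to both arms and drops out of the difference.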
Now that's the model I was considering when I made my earlier comments.
But there is another reasonable model
Yijk = MU + Bi + Cj + Dij + Eijk
where Dij is a "random slope" term, representing the tendency for a
treatment effect to vary from center to center (Bi + Dij).
In the random slope model the estimate of treatment effect includes a
new term
SUM j D1j/n - SUM j D2j/n
Under this model the problems with pseudo replication occur and failure
to account for the "random slope" effect can cause you to underestimate
the required sample size. Note that the variation in this term does not
go down as the number of patients per center increases.
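
To see why more patients per center do not help with this term, here is a sketch of the variance of the treatment estimate. It assumes that the D1j and D2j are independent with a common standard deviation sigma_d and that the errors have standard deviation sigma_e. Those symbols, the function name, and the numbers are assumptions for illustration only.

```python
def var_treatment_estimate(n, m, sigma_e, sigma_d):
    """Variance of YBAR1 - YBAR2 with n centers and m patients per arm per center.

    Assumes independent D1j, D2j (variance sigma_d**2) and independent
    errors Eijk (variance sigma_e**2).
    """
    error_part = 2 * sigma_e ** 2 / (n * m)   # shrinks with patients per center
    slope_part = 2 * sigma_d ** 2 / n         # shrinks only with more centers
    return error_part + slope_part

# Hypothetical values: 10 centers, error SD 1.0, random-slope SD 0.5.
for m in (5, 20, 100, 1000):
    print(m, var_treatment_estimate(n=10, m=m, sigma_e=1.0, sigma_d=0.5))
```

With these numbers the variance levels off near 2 * 0.5**2 / 10 = 0.05 however large m becomes. Only adding centers reduces it further.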
So the concerns of Bruce Weaver and others may be valid depending on
whether you have a "random intercepts" or a "random slopes" model. The
size of the variation in the random slopes compared to the variation in
the random intercepts is also critical here.
Does this help clarify things?
--
Steve Simon, Standard Disclaimer
Indeed. I think that helps to crystallise very well some of the concepts
that some of us (i.e. myself!) have been making rather a meal of! Like
you, I had been thinking in terms of what you call the 'random intercepts'
model. I think this also explains the concerns which arose from the
statements about the effect of 'ICC' on power/required sample size - since
the value of ICC, per se, tells us nothing about which of your models one
is dealing with, general statements about the effects of ICC on sample size
requirements surely cannot be made.
Cannot these concepts also be expressed in different, perhaps more
familiar, terms, by reference to interactions? If I'm not mistaken (albeit
I often am!), your 'random intercepts' model is the simplest possible, with
only main terms, whilst your 'random slopes' model includes a
treatment*centre interaction. Standard default methods of estimating
sample size requirements (for testing hypotheses relating to main effects)
assume the absence of interactions and it is hardly surprising that such
calculations under-estimate required sample size in the presence of
interactions between main effects, increasingly so as those interactions
become larger. Is that a correct/reasonable way of looking at it?
> Cannot these concepts also be expressed in different, perhaps more
> familiar, terms, by reference to interactions? If I'm not mistaken (albeit
> I often am!), your 'random intercepts' model is the simplest possible, with
> only main terms, whilst your 'random slopes' model includes a
> treatment*centre interaction. Standard default methods of estimating
> sample size requirements (for testing hypotheses relating to main effects)
> assume the absence of interactions and it is hardly surprising that such
> calculations under-estimate required sample size in the presence of
> interactions between main effects, increasingly so as those interactions
> become larger. Is that a correct/reasonable way of looking at it?
Thanks. I like the term "treatment*centre interaction" better than
"random slopes" (though I do spell "center" slightly differently on this
side of the Atlantic).