Planning an individually randomised trial with natural clusters

Siobhan Creanor

Oct 13, 2009, 9:51:35 AM

to MedStats

Dear All,

We are in the process of designing a trial to compare two groups of
patients seeing either their GP or practice nurse. We are currently
looking at recruiting around 10 practices. The primary outcome
measure is how enabled the patient felt after his/her consultation.
The PI wants to individually randomise patients to intervention or
control. This is therefore not a designed cluster RCT; however, from
the very scant literature, there have been reasonably high ICCs
reported (up to 0.15 at a health professional level) for this outcome,
when used in a cluster design.

What is the consensus about adjusting for the 'natural' clustering at
the design stage? Clearly there are several potential levels of
clustering, but at the moment we are primarily interested in the
practice level (i.e. the "centre"). Some authors advocate reducing
the sample size according to the magnitude of the centre effect,
which, with an ICC of 0.1 or 0.2, could be quite a substantial
reduction. For some reason, I don't feel all that comfortable with
this!

There seems to have been quite a bit of discussion recently about
'therapist' effects in trials but I'm struggling to come up with
something comparable to our proposal.

Any thoughts gratefully received!

Kind regards,
Siobhan

Steve Simon, P.Mean Consulting

Oct 13, 2009, 10:24:48 AM

to meds...@googlegroups.com

Siobhan Creanor wrote:

> We are in the process of designing a trial to compare two groups of
> patients seeing either their GP or practice nurse. We are currently
> looking at recruiting around 10 practices. The primary outcome
> measure is how enabled the patient felt after his/her consultation.
> The PI wants to individually randomise patients to intervention or
> control. This is not therefore a designed cluster RCT, however, from
> the very scant literature, there have been reasonably high ICCs
> reported (up to 0.15 at a health professional level) for this outcome,
> when used in a cluster design.
>
> What is the consensus about adjusting for the 'natural' clustering at
> the design stage? Clearly there are several potential levels of
> clustering, but at the moment we are primarily interested in the
> practice level (i.e. the "centre"). Some authors advocate reducing
> the sample size according to the magnitude of the centre effect,
> which, with an ICC of 0.1 or 0.2, could be quite a substantial
> reduction. For some reason, I don't feel all that comfortable with
> this!

A while back I took a course on this topic and the instructor said
something very profound. He said that if you have a random effect and
fail to account for it, your standard errors will be incorrect. That I
already knew, but he took it a bit further. He said that the standard
errors for between-cluster comparisons would be too small and the standard
errors for within-cluster comparisons would be too big. That's very
logical if you think about it for a while, but it had never occurred to
me to think of it in that way before.

Here's another bit of intuition that might help. Randomizing within each
center will effectively make the treatment effect and the center effect
orthogonal to one another. If you account for and remove a source of
uncertainty due to the center effect, that reduces your noise. Since it
is orthogonal, you don't have to worry about any collinearity effects
mucking things up.

If you're still uncomfortable, use a very small value for the ICC in
your sample size calculations. You could also argue that the truly
conservative approach would be to set the ICC to zero. You're guaranteed
then to not have a sample size that's too small no matter what the
center effect. Being conservative, of course, means that your experiment
will cost more, but sometimes the comfort of choosing a conservative
approach outweighs the extra expense.

I hope this helps.
--
Steve Simon, Standard Disclaimer
Free statistics webinar, Wed, Oct 14, 10am CDT.
"P-values, confidence intervals, and the Bayesian alternative"
Details at www.pmean.com/webinars

Bruce Weaver

Oct 14, 2009, 7:18:36 AM

to MedStats

On Oct 13, 10:24 am, "Steve Simon, P.Mean Consulting "
<n...@pmean.com> wrote:

--- snip ---

> If you're still uncomfortable, use a very small value for the ICC in
> your sample size calculations. You could also argue that the truly
> conservative approach would be to set the ICC to zero. You're guaranteed
> then to not have a sample size that's too small no matter what the
> center effect. Being conservative, of course, means that your experiment
> will cost more, but sometimes the comfort of choosing a conservative
> approach outweighs the extra expense.
>
> I hope this helps.

I must confess that I am very puzzled by the comments from Steve &
John. According to everything I've read, the higher the ICC, the less
new information contributed by each additional subject within a
cluster. Therefore, as the ICC increases, you need more subjects, not
less. (However, the better approach when the ICC is high is to
increase the number of clusters rather than the number of subjects
within clusters.) So I don't see how using a very small value of the
ICC for the sample size estimate can be viewed as conservative. I
would have said the opposite, i.e., that using a higher value of the
ICC is conservative. (I assume by "conservative", we all mean sure to
provide a large enough sample size estimate.)

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

John Whittington

Oct 14, 2009, 10:36:34 AM

to meds...@googlegroups.com

At 04:18 14/10/2009 -0700, Bruce Weaver wrote:
>I must confess that I am very puzzled by the comments from Steve &
>John. According to everything I've read, the higher the ICC, the less
>new information contributed by each additional subject within a
>cluster. Therefore, as the ICC increases, you need more subjects, not
>less. (However, the better approach when the ICC is high is to
>increase the number of clusters rather than the number of subjects
>within clusters.) So I don't see how using a very small value of the
>ICC for the sample size estimate can be viewed as conservative. I
>would have said the opposite, i.e., that using a higher value of the
>ICC is conservative. (I assume by "conservative", we all mean sure to
>provide a large enough sample size estimate.)

I might be digging holes for myself to fall into, but:

1. I was thinking/writing in terms of the total sample size, and did not
address the (important) question of the ideal number of 'clusters' and
distribution of that total sample size between the 'clusters'.

2. I am finding it difficult to get my head around the generality of the
conceptual suggestion that if each additional subject within some group
adds 'less new information' that one consequently needs larger sample
sizes. I would have thought that one of the most common situations in
which 'each additional subject adds little new information' (in relation to
the estimate of an effect) arises when the effect of interest has a low
variance - and it obviously would be the antithesis of the truth to suggest
that decreasing variability of the effect requires an increasing sample size.

I suspect that I am probably missing something!

Kind Regards,

John

----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------

Bruce Weaver

Oct 14, 2009, 11:45:30 AM

to MedStats

On Oct 14, 10:36 am, John Whittington <Joh...@mediscience.co.uk>
wrote:

John, I found one of the places where I read about each additional
subject contributing less information when the ICC > 0. It's in
"Applied Multilevel Analysis", by Jos Twisk. I was about to start
typing an excerpt, but then thought better of it and checked to see if
the relevant pages are visible in Google Books. It turns out they
are. See pages 127-128.

Cheers,
Bruce

John Whittington

Oct 14, 2009, 12:28:32 PM

to meds...@googlegroups.com

At 08:45 14/10/2009 -0700, Bruce Weaver wrote:
>John, I found one of the places where I read about each additional
>subject contributing less information when the ICC > 0. It's in
>"Applied Multilevel Analysis", by Jos Twisk. I was about to start
>typing an excerpt, but then thought better of it and checked to see if
>the relevant pages are visible in Google Books. It turns out they
>are. See pages 127-128.

Thanks, Bruce - but, unfortunately, I am unable to see pages 39-137 on
Google Books.

I would point out that I am not disputing that, in some senses (in relation
to location), higher correlation means that each additional subject
'contributes less information'; it is the suggested link between this and
sample size requirements with which I'm having a bit of a problem.

Bruce Weaver

Oct 14, 2009, 2:32:31 PM

to MedStats

On Oct 14, 12:28 pm, John Whittington <Joh...@mediscience.co.uk>
wrote:

> At 08:45 14/10/2009 -0700, Bruce Weaver wrote:
>
> >John, I found one of the places where I read about each additional
> >subject contributing less information when the ICC > 0. It's in
> >"Applied Multilevel Analysis", by Jos Twisk. I was about to start
> >typing an excerpt, but then thought better of it and checked to see if
> >the relevant pages are visible in Google Books. It turns out they
> >are. See pages 127-128.
>
> Thanks, Bruce - but, unfortunately, I am unable to see pages 39-137 on
> Google Books.
>
> I would point out that I am not disputing that, in some senses (in relation
> to location), higher correlation means that each additional subject
> 'contributes less information'; it is the suggested link between this and
> sample size requirements with which I'm having a bit of a problem.
>
> Kind Regards,
>
> John

How about that, I had trouble seeing it again now too, but did manage
to get it back after some fiddling around. I'll upload those two
pages to the Files section for the group.

Brett Magill

Oct 14, 2009, 2:46:05 PM

to meds...@googlegroups.com, Joh...@mediscience.co.uk, bwe...@lakeheadu.ca

Hi John,

This is really an issue of non-independence that results from
grouping. The ICC is a measure of that non-independence. If the ICC is
high, then subjects within each cluster are more similar to one
another--that is they share the influence of cluster on the outcome of
interest. They are not independent. (And thus the aforementioned
impact on SEs.)

If the ICC is perfect, then there is perfect dependence among
responses within clusters--even though the subjects are "different" we
can't count the responses as different--lowering the effective sample
size to the N of clusters. If the ICC is 0, then even though these
subjects are clustered, their responses are independent and we could
count each response individually. This is not unlike the issue of
pseudo replication in experiments--e.g. repeated measurements that are
not accounted for.
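
A minimal sketch of the two limiting cases just described, assuming the usual design effect 1 + (m - 1) * ICC for equal-sized clusters. The function name is illustrative only, and the formula applies to estimating a mean or making a between-cluster comparison, not to a treatment contrast made within clusters.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Effective number of independent observations for a cluster-level comparison.

    Uses the standard design effect 1 + (cluster_size - 1) * icc,
    which assumes every cluster has the same size.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n_clusters * cluster_size / design_effect

# 10 clusters of 20 subjects, across a range of ICC values
for icc in (0.0, 0.1, 0.2, 1.0):
    print(icc, effective_sample_size(10, 20, icc))
```

With an ICC of 1 the effective sample size collapses to the number of clusters, and with an ICC of 0 it equals the total number of subjects.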

Brett

Steve Simon, P.Mean Consulting

Oct 14, 2009, 2:51:03 PM

to meds...@googlegroups.com

Bruce Weaver wrote:

> I must confess that I am very puzzled by the comments from Steve &
> John. According to everything I've read, the higher the ICC, the less
> new information contributed by each additional subject within a
> cluster. Therefore, as the ICC increases, you need more subjects, not
> less. (However, the better approach when the ICC is high is to
> increase the number of clusters rather than the number of subjects
> within clusters.) So I don't see how using a very small value of the
> ICC for the sample size estimate can be viewed as conservative. I
> would have said the opposite, i.e., that using a higher value of the
> ICC is conservative. (I assume by "conservative", we all mean sure to
> provide a large enough sample size estimate.)

It is counter intuitive. But you're thinking of a cluster randomized
trial where all the patients in a cluster receive the same treatment. In
an individually randomized trial, there are both treatments within a
cluster. The contrast within a cluster removes some of the shared
variation, and the amount of variation removed is larger when the ICC is
larger.

If it helps, think about clusters of size 2, then the test statistic is
the paired t-test, which requires fewer patients as the ICC increases.
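
A rough sketch of that paired-design arithmetic, assuming a common outcome SD (sigma), a within-pair correlation rho that plays the role of the ICC for clusters of size 2, and a normal approximation in place of the t distribution. The function and parameter names are illustrative assumptions. Since Var(Y1 - Y2) = 2 * sigma^2 * (1 - rho), the number of pairs needed shrinks in proportion to (1 - rho).

```python
from math import ceil
from statistics import NormalDist

def pairs_needed(delta, sigma, rho, alpha=0.05, power=0.80):
    """Normal-approximation number of pairs for a paired comparison.

    delta: treatment difference to detect
    sigma: outcome standard deviation (assumed equal in both arms)
    rho:   within-pair correlation (the ICC when clusters have size 2)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var_diff = 2 * sigma**2 * (1 - rho)   # variance of one within-pair difference
    return ceil((z_alpha + z_beta) ** 2 * var_diff / delta**2)

for rho in (0.0, 0.1, 0.2, 0.5):
    print(rho, pairs_needed(delta=0.5, sigma=1.0, rho=rho))
```

Setting rho to zero gives the largest answer, which is the sense in which an ICC of zero is the conservative planning value for a within-cluster comparison.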

--
Steve Simon, Standard Disclaimer

Sign up for The Monthly Mean at www.pmean.com/news

Brett Magill

Oct 14, 2009, 3:24:48 PM

to meds...@googlegroups.com

On Wed, Oct 14, 2009 at 1:51 PM, Steve Simon, P.Mean Consulting
<n...@pmean.com> wrote:
> It is counter intuitive. But you're thinking of a cluster randomized
> trial where all the patients in a cluster receive the same treatment. In
> an individually randomized trial, there are both treatments within a
> cluster. The contrast within a cluster removes some of the shared
> variation, and the amount of variation removed is larger when the ICC is
> larger.
>
> If it helps, think about clusters of size 2, then the test statistic is
> the paired t-test, which requires fewer patients as the ICC increases.

Steve,

The problem is hidden in your cluster size 2 example, unfortunately.
Ostensibly, those 2 within cluster subjects would be randomized to two
different treatments. That's great. Now, take that same pair (if the
ICC is 1, that pair represents the cluster well) and measure them 5,
10, 50 times. Is your sample size really increasing within that
cluster? No. That is pseudo-replication and is analogous to the
problem with clustering when the subjects are homogeneous due to
non-independence. If the ICC is low, on the other hand, then those
replications make sense as individual observations because there is no
evidence of lack of independence among subjects within a cluster.

Bruce Weaver

Oct 14, 2009, 4:18:59 PM

to MedStats

On Oct 14, 2:51 pm, "Steve Simon, P.Mean Consulting" <n...@pmean.com>
wrote:

> Bruce Weaver wrote:
> > I must confess that I am very puzzled by the comments from Steve &
> > John. According to everything I've read, the higher the ICC, the less
> > new information contributed by each additional subject within a
> > cluster. Therefore, as the ICC increases, you need more subjects, not
> > less. (However, the better approach when the ICC is high is to
> > increase the number of clusters rather than the number of subjects
> > within clusters.) So I don't see how using a very small value of the
> > ICC for the sample size estimate can be viewed as conservative. I
> > would have said the opposite, i.e., that using a higher value of the
> > ICC is conservative. (I assume by "conservative", we all mean sure to
> > provide a large enough sample size estimate.)
>
> It is counter intuitive. But you're thinking of a cluster randomized
> trial where all the patients in a cluster receive the same treatment. In
> an individually randomized trial, there are both treatments within a
> cluster. The contrast within a cluster removes some of the shared
> variation, and the amount of variation removed is larger when the ICC is
> larger.

The passage from Twisk (2006) that I uploaded to the Files page is not
talking about cluster randomisation. On a later page that was not
visible in Google Books, Twisk says that for such studies (i.e.,
without cluster randomisation), the conservative method overestimates
the needed sample size, and the liberal procedure underestimates it.

He then gives another example WITH cluster randomisation, and says,
"the 'conservative' procedure is almost perfect, while the 'liberal'
procedure leads to a huge underestimation of the required sample
size" (Twisk 2006, p. 130).

John Whittington

Oct 14, 2009, 10:50:15 PM

to meds...@googlegroups.com

At 13:18 14/10/2009 -0700, Bruce Weaver wrote:
>The passage from Twisk (2006) that I uploaded to the Files page is not
>talking about cluster randomisation. On a later page that was not
>visible in Google Books, Twisk says that for such studies (i.e.,
>without cluster randomisation), the conservative method overestimates
>the needed sample size, and the liberal procedure underestimates it.

Is that not what Steve and I (and the original poster) have been saying -
or, at least, implying?

>He then gives another example WITH cluster randomisation, and says,
>"the 'conservative' procedure is almost perfect, while the 'liberal'
>procedure leads to a huge underestimation of the required sample
>size" (Twisk 2006, p. 130).

Depending upon what is being regarded as the 'liberal' procedure, that
sounds reasonable, but it was not the situation which we (at least, not
Steve and I) were considering.

John Whittington

Oct 15, 2009, 7:49:17 AM

to meds...@googlegroups.com

In the course of discussing this matter with a colleague last night, we
both eventually came to the conclusion that we're not really sure what
'ICC' we are talking about - correlation between what and what? On the
face of it, it seems as if we could merely be talking about a situation in
which there was a high level of 'correlation' between treatment and
response - which obviously would have very different implications from
those which Brett (and others) are discussing.

Kind Regards,
John

At 14:24 14/10/2009 -0500, Brett Magill wrote:
>Steve,
>
>The problem is hidden in your cluster size 2 example, unfortunately.
>Ostensibly, those 2 within cluster subjects would be randomized to two
>different treatments. That's great. Now, take that same pair (if the
>ICC is 1, that pair represents the cluster well) and measure them 5,
>10, 50 times. Is your sample size really increasing within that
>cluster? No. That is pseudo-replication and is analogous to the
>problem with clustering when the subjects are homogeneous due to
>non-independence. If the ICC is low, on the other hand, then those
>replications make sense as individual observations because there is no
>evidence of lack of independence among subjects within a cluster.

Bruce Weaver

Oct 15, 2009, 8:40:49 AM

to MedStats

On Oct 15, 7:49 am, John Whittington <Joh...@mediscience.co.uk> wrote:
> In the course of discussing this matter with a colleague last night, we
> both eventually came to the conclusion that we're not really sure what
> 'ICC' we are talking about - correlation between what and what? On the
> face of it, it seems as if we could merely be talking about a situation in
> which there was a high level of 'correlation' between treatment and
> response - which obviously would have very different implications from
> those which Brett (and others) are discussing.
>
> Kind Regards,
> John
>

Here's Twisk's definition:

ICC = Var(between) / [Var(between) + Var(within)]

where between and within mean between and within clusters, of course.
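
As a sketch of how that ratio might be estimated from data, the following uses the one-way ANOVA estimator for equal-sized clusters, with the variances taken to be those of the outcome itself. The function name and the tiny data set are made up for illustration and do not come from Twisk.

```python
from statistics import mean

def icc_oneway(groups):
    """ANOVA estimate of the ICC for equal-sized clusters.

    groups: list of lists, one inner list of outcome values per cluster.
    """
    k = len(groups)            # number of clusters
    m = len(groups[0])         # subjects per cluster (assumed equal)
    grand = mean(v for g in groups for v in g)
    centers = [mean(g) for g in groups]
    ms_between = m * sum((c - grand) ** 2 for c in centers) / (k - 1)
    ms_within = sum((v - c) ** 2
                    for g, c in zip(groups, centers) for v in g) / (k * (m - 1))
    # The between-cluster variance component, truncated at zero
    var_between = max((ms_between - ms_within) / m, 0.0)
    return var_between / (var_between + ms_within)

# Three hypothetical clusters whose means differ a lot relative to the
# spread inside each cluster, so the estimated ICC is high.
print(icc_oneway([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```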

John Whittington

Oct 15, 2009, 10:05:47 AM

to meds...@googlegroups.com

At 05:40 15/10/2009 -0700, Bruce Weaver wrote:
>Here's Twisk's definition:
> ICC = Var(between) / [Var(between) + Var(within)]
>where between and within mean between and within clusters, of course.

Well, yes, that much I had assumed. However, in the present context, the
important question is the nature of the variance in question. To be
meaningful in context, it presumably would have to be the variance of the
between-treatments difference; is that what you/Twisk intend?

Kind Regards,

Brett Magill

Oct 15, 2009, 10:56:48 AM

to meds...@googlegroups.com

For those interested, there is a really nice overview of power and
sample size in an article by Tom Snijders available here...

http://stat.gamma.rug.nl/PowerSampleSizeMultilevel.pdf

Here is a quote from that article.

"In case 2, a level-one variable without between-group variation, the
multilevel design is always more efficient. This efficiency of
within-subject designs is a well-known phenomenon. For estimating a
population mean (case 1) or the effect of a level-two
variable (case 4), on the other hand, the multilevel design always is
less efficient, and more seriously so as the cluster size and the
intraclass correlation are larger" p 5.

He also discusses the case of level-one variables with between cluster
variation and random slopes within clusters, where the issue gets more
complicated...So, I'll be the first to admit that I oversimplified in
my previous response. If there is no between group (referring to
clusters, not treatment) variation this is the typical repeated
measures design with increased efficiency. However, when there is a
group effect, then the multilevel structure and intraclass correlation
are important in determining the efficiency of the design and the
effect can be either positive or negative.

Brett

Steve Simon, P.Mean Consulting

Oct 15, 2009, 2:33:48 PM

to meds...@googlegroups.com

It may help to write a model for the data. To simplify things, assume
that every cluster is the same size, that this size is an even number,
and that exactly half of the people in each cluster are
randomized to each treatment.

The outcome variable, Y, is a function of an overall mean (MU), a
treatment effect (B), a random effect for each center (C) and an error
term for each subject (E). The subscripts i, j, and k, represent levels
of treatment, center, and patient.

Yijk = MU + Bi + Cj + Eijk.

Also let n represent the number of centers and 2m represent the number
of patients per center.

The estimate of treatment effect is

YBAR1 - YBAR2 = B1 - B2 + SUM jk E1jk/(n*m) - SUM jk E2jk/(n*m).

Note that neither the overall mean nor the variation due to center appears
in this estimate, because the subtraction cancels MU and the center effect
(each center contributes m patients to each arm). This is true
whether there are 2 subjects per center or 20 subjects per center.

I call this a "random intercept" model since it assumes that the
intercept (MU + Cj) shifts randomly up or down from center to center.
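
A small simulation sketch of the random intercept model, assuming normally distributed center effects and errors and m patients per arm in every center. The function name, the overall mean of 50 and the other numeric settings are arbitrary choices for illustration. Center effects and patient errors are drawn from separate generators so that the same errors can be reused with different amounts of center variation.

```python
import random

def estimate_effect(n_centers, m, effect, sd_center, sd_error, seed=1):
    """Difference in arm means under Y = MU + B_i + C_j + E_ijk.

    Each center contributes m patients per arm. Center effects and patient
    errors come from separate generators, so changing sd_center leaves the
    patient-level errors untouched.
    """
    rng_center = random.Random(seed)
    rng_error = random.Random(seed + 1)
    mu = 50.0                                  # arbitrary overall mean
    totals = [0.0, 0.0]                        # running sums for arms 1 and 2
    for _ in range(n_centers):
        c = rng_center.gauss(0.0, sd_center)   # random intercept for this center
        for arm in (0, 1):
            b = effect if arm == 0 else 0.0    # B1 = effect, B2 = 0
            for _ in range(m):
                totals[arm] += mu + b + c + rng_error.gauss(0.0, sd_error)
    return (totals[0] - totals[1]) / (n_centers * m)

no_center = estimate_effect(10, 20, effect=2.0, sd_center=0.0, sd_error=4.0)
big_center = estimate_effect(10, 20, effect=2.0, sd_center=10.0, sd_error=4.0)
print(no_center, big_center)   # the same up to floating-point rounding
```

The two estimates agree because every center adds the same intercept to both arm totals, so the center term drops out of the difference however large its variance.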

Now that's the model I was considering when I made my earlier comments.
But there is another reasonable model

Yijk = MU + Bi + Cj + Dij + Eijk

where Dij is a "random slope" term, representing the tendency for a
treatment effect to vary from center to center (Bi + Dij).

In the random slope model the estimate of treatment effect includes a
new term

SUM j D1j/n - SUM j D2j/n

Under this model the problems with pseudo replication occur and failure
to account for the "random slope" effect can cause you to underestimate
the required sample size. Note that the variation in this term does not
go down as the number of patients per center increases.
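
To put rough numbers on this, here is a sketch of the variance of the treatment estimate under the random slope model, assuming D1j and D2j are independent with a common variance (an assumption added here, not stated above). Under that assumption the variance is 2 * var_error / (n * m) + 2 * var_slope / n. The function name and the example values are illustrative only.

```python
def var_treatment_estimate(n, m, var_error, var_slope=0.0):
    """Variance of YBAR1 - YBAR2 with n centers and m patients per arm per center.

    var_error: variance of the patient-level error E
    var_slope: variance of the center-by-treatment term D (0 gives the
               random intercept model)
    """
    patient_part = 2 * var_error / (n * m)   # shrinks as m grows
    slope_part = 2 * var_slope / n           # shrinks only as n grows
    return patient_part + slope_part

# 10 centers, increasing numbers of patients per arm per center
for m in (5, 20, 100, 1000):
    print(m, var_treatment_estimate(n=10, m=m, var_error=1.0, var_slope=0.05))
```

Under these assumptions, adding patients within centers cannot push the variance below 2 * var_slope / n; only adding centers lowers that floor.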

So the concerns of Bruce Weaver and others may be valid depending on
whether you have a "random intercepts" or a "random slopes" model. The
size of the variation in the random slopes compared to the variation in
the random intercepts is also critical here.

Does this help clarify things?

--
Steve Simon, Standard Disclaimer

Bruce Weaver

Oct 15, 2009, 6:18:31 PM

to MedStats

On Oct 15, 2:33 pm, "Steve Simon, P.Mean Consulting" <n...@pmean.com>
wrote:

Yes, I think so, Steve. Thanks.

John Whittington

Oct 16, 2009, 7:47:26 AM

to meds...@googlegroups.com

At 13:33 15/10/2009 -0500, Steve Simon, P.Mean Consulting wrote (in very
small part):

>So the concerns of Bruce Weaver and others may be valid depending on
>whether you have a "random intercepts" or a "random slopes" model. The
>size of the variation in the random slopes compared to the variation in
>the random intercepts is also critical here.
>Does this help clarify things?

Indeed. I think that helps to crystallise very well some of the concepts
that some of us (i.e. myself!) have been making rather a meal of! Like
you, I had been thinking in terms of what you call the 'random intercepts'
model. I think this also explains the concerns which arose from the
statements about the effect of 'ICC' on power/required sample size - since
the value of ICC, per se, tells us nothing about which of your models one
is dealing with, general statements about the effects of ICC on sample size
requirements surely cannot be made.

Cannot these concepts also be expressed in different, perhaps more
familiar, terms, by reference to interactions? If I'm not mistaken (albeit
I often am!), your 'random intercepts' model is the simplest possible, with
only main terms, whilst your 'random slopes' model includes a
treatment*centre interaction. Standard default methods of estimating
sample size requirements (for testing hypotheses relating to main effects)
assume the absence of interactions and it is hardly surprising that such
calculations under-estimate required sample size in the presence of
interactions between main effects, increasingly so as those interactions
become larger. Is that a correct/reasonable way of looking at it?

Steve Simon, P.Mean Consulting

Oct 16, 2009, 12:33:04 PM

to meds...@googlegroups.com

John Whittington wrote:

> Cannot these concepts also be expressed in different, perhaps more
> familiar, terms, by reference to interactions? If I'm not mistaken (albeit
> I often am!), your 'random intercepts' model is the simplest possible, with
> only main terms, whilst your 'random slopes' model includes a
> treatment*centre interaction. Standard default methods of estimating
> sample size requirements (for testing hypotheses relating to main effects)
> assume the absence of interactions and it is hardly surprising that such
> calculations under-estimate required sample size in the presence of
> interactions between main effects, increasingly so as those interactions
> become larger. Is that a correct/reasonable way of looking at it?

Thanks. I like the term "treatment*centre interaction" better than
"random slopes" (though I do spell "center" slightly differently on this
side of the Atlantic).

Bruce Weaver

Oct 16, 2009, 2:54:24 PM

to MedStats

On Oct 16, 12:33 pm, "Steve Simon, P.Mean Consulting "

This side of the Atlantic, and *that* side of the 49th parallel, I'd
say. Up here in Canada, we go along with the Brits on words like
"centre" and "colour". But like Americans, we use "z" rather than "s"
in words like "randomize". However, we do pronounce it as "zed", not
"zee".

In short, whoever made the crack about "two nations divided by a
common language" was underestimating the number of nations, and the
degree of division. ;-)
