We are in the process of designing a trial to compare two groups of
patients seeing either their GP or practice nurse. We are currently
looking at recruiting around 10 practices. The primary outcome
measure is how enabled the patient felt after his/her consultation.
The PI wants to individually randomise patients to intervention or
control. This is therefore not designed as a cluster RCT; however, from
the very scant literature, there have been reasonably high ICCs
reported (up to 0.15 at a health professional level) for this outcome,
when used in a cluster design.
What is the consensus about adjusting for the 'natural' clustering at
the design stage? Clearly there are several potential levels of
clustering, but at the moment we are primarily interested in the
practice level (i.e. the "centre"). Some authors advocate reducing
the sample size according to the magnitude of the centre effect,
which, with an ICC of 0.1 or 0.2, could be quite a substantial
reduction. For some reason, I don't feel all that comfortable with
this!
There seems to have been quite a bit of discussion recently about
'therapist' effects in trials but I'm struggling to come up with
something comparable to our proposal.
Siobhan Creanor wrote: > We are in the process of designing a trial to compare two groups of > patients seeing either their GP or practice nurse. We are currently > looking at recruiting around 10 practices. The primary outcome > measure is how enabled the patient felt after his/her consultation. > The PI wants to individually randomise patients to intervention or > control. This is not therefore a designed cluster RCT, however, from > the very scant literature, there have been reasonably high ICCs > reported (up to 0.15 at a health professional level) for this outcome, > when used in a cluster design.
> What is the consensus about adjusting for the 'natural' clustering at > the design stage? Clearly there are several potential levels of > clustering, but at the moment we are primarily interest in the > practice level (i.e. the "centre"). Some authors advocate reducing > the sample size according to the magnitude of the centre effect, > which, with an ICC of 0.1 or 0.2, could be quite a substantial > reduction. For some reason, I don't feel all that comfortable with > this!
A while back I took a course on this topic and the instructor said something very profound. He said that if you have a random effect and fail to account for it, your standard errors will be incorrect. That I already knew, but he took it a bit further. He said that the standard errors for between-cluster comparisons would be too small and the standard errors for within-cluster comparisons would be too big. That's very logical if you think about it for a while, but it had never occurred to me to think of it in that way before.
Here's another bit of intuition that might help. Randomizing within each center will effectively make the treatment effect and the center effect orthogonal to one another. If you account for and remove a source of uncertainty due to the center effect, that reduces your noise. Since it is orthogonal, you don't have to worry about any collinearity effects mucking things up.
If you're still uncomfortable, use a very small value for the ICC in your sample size calculations. You could also argue that the truly conservative approach would be to set the ICC to zero. You're guaranteed then to not have a sample size that's too small no matter what the center effect. Being conservative, of course, means that your experiment will cost more, but sometimes the comfort of choosing a conservative approach outweighs the extra expense.
I hope this helps. -- Steve Simon, Standard Disclaimer Free statistics webinar, Wed, Oct 14, 10am CDT. "P-values, confidence intervals, and the Bayesian alternative" Details at www.pmean.com/webinars
I must confess that I am very puzzled by the comments from Steve &
John. According to everything I've read, the higher the ICC, the less
new information contributed by each additional subject within a
cluster. Therefore, as the ICC increases, you need more subjects, not
less. (However, the better approach when the ICC is high is to
increase the number of clusters rather than the number of subjects
within clusters.) So I don't see how using a very small value of the
ICC for the sample size estimate can be viewed as conservative. I
would have said the opposite, i.e., that using a higher value of the
ICC is conservative. (I assume by "conservative", we all mean sure to
provide a large enough sample size estimate.)
>I must confess that I am very puzzled by the comments from Steve & >John. According to everything I've read, the higher the ICC, the less >new information contributed by each additional subject within a >cluster. Therefore, as the ICC increases, you need more subjects, not >less. (However, the better approach when the ICC is high is to >increase the number of clusters rather than the number of subjects >within clusters.) So I don't see how using a very small value of the >ICC for the sample size estimate can be viewed as conservative. I >would have said the opposite, i.e., that using a higher value of the >ICC is conservative. (I assume by "conservative", we all mean sure to >provide a large enough sample size estimate.)
I might be digging holes for myself to fall into, but:
1. I was thinking/writing in terms of the total sample size, and did not address the (important) question of the ideal number of 'clusters' and distribution of that total sample size between the 'clusters'.
2. I am finding it difficult to get my head around the generality of the conceptual suggestion that if each additional subject within some group adds 'less new information' that one consequently needs larger sample sizes. I would have thought that one of the most common situations in which 'each additional subject adds little new information' (in relation to the estimate of an effect) arises when the effect of interest has a low variance - and it obviously would be the antithesis of the truth to suggest that decreasing variability of the effect requires an increasing sample size.
I suspect that I am probably missing something!
Kind Regards,
John
---------------------------------------------------------------- Dr John Whittington, Voice: +44 (0) 1296 730225 Mediscience Services Fax: +44 (0) 1296 738893 Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk Buckingham MK18 4EL, UK ----------------------------------------------------------------
John, I found one of the places where I read about each additional
subject contributing less information when the ICC > 0. It's in
"Applied Multilevel Analysis", by Jos Twisk. I was about to start
typing an excerpt, but then thought better of it and checked to see if
the relevant pages are visible in Google Books. It turns out they
are. See pages 127-128.
>John, I found one of the places where I read about each additional >subject contributing less information when the ICC > 0. It's in >"Applied Multilevel Analysis", by Jos Twisk. I was about to start >typing an excerpt, but then though better of it and checked to see if >the relevant pages are visible in Google Books. It turns out they >are. See pages 127-128.
Thanks, Bruce - but, unfortunately, I am unable to see pages 39-137 on Google Books.
I would point out that I am not disputing that, in some senses (in relation to location), higher correlation means that each additional subject 'contributes less information'; it is the suggested link between this and sample size requirements with which I'm having a bit of a problem.
Kind Regards,
John
How about that, I had trouble seeing it again now too, but did manage
to get it back after some fiddling around. I'll upload those two
pages to the Files section for the group.
This is really an issue of non-independence that results from grouping. The ICC is a measure of that non-independence. If the ICC is high, then subjects within each cluster are more similar to one another--that is, they share the influence of cluster on the outcome of interest. They are not independent. (And thus the aforementioned impact on SEs.)
If the ICC is perfect, then there is perfect dependence among responses within clusters--even though the subjects are "different" we can't count the responses as different--lowering the effective sample size to the N of clusters. If the ICC is 0, then even though these subjects are clustered, their responses are independent and we could count each response individually. This is not unlike the issue of pseudo replication in experiments--e.g. repeated measurements that are not accounted for.
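The effective-sample-size idea in the paragraph above can be put into numbers. The following is only a minimal sketch, assuming equal cluster sizes and the usual design effect 1 + (m - 1) * ICC for a comparison made between whole clusters (or for estimating an overall mean). The function name and the figures of 10 clusters of 20 patients are illustrative and do not come from the thread.

```python
def effective_sample_size(n_clusters, cluster_size, icc):
    """Approximate number of independent observations that a clustered
    sample is 'worth' when the comparison is made between clusters.

    Assumes equal cluster sizes and the standard design effect
    1 + (cluster_size - 1) * icc.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n_clusters * cluster_size / design_effect


# Illustrative figures only: 10 clusters of 20 patients each.
for icc in (0.0, 0.15, 1.0):
    print(icc, round(effective_sample_size(10, 20, icc), 1))
```

With an ICC of 0 all 200 responses count individually, and with an ICC of 1 the effective sample size falls to the 10 clusters, which matches the two extremes described above.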
Brett
On Wed, Oct 14, 2009 at 11:28 AM, John Whittington
<Joh...@mediscience.co.uk> wrote: > I would point out that I am not disputing that, in some senses (in relation > to location), higher correlation means that each additional subject > 'contributes less information'; it is the suggested link between this and > sample size requirements with which I'm having a bit of a problem.
Bruce Weaver wrote: > I must confess that I am very puzzled by the comments from Steve & > John. According to everything I've read, the higher the ICC, the less > new information contributed by each additional subject within a > cluster. Therefore, as the ICC increases, you need more subjects, not > less. (However, the better approach when the ICC is high is to > increase the number of clusters rather than the number of subjects > within clusters.) So I don't see how using a very small value of the > ICC for the sample size estimate can be viewed as conservative. I > would have said the opposite, i.e., that using a higher value of the > ICC is conservative. (I assume by "conservative", we all mean sure to > provide a large enough sample size estimate.)
It is counter intuitive. But you're thinking of a cluster randomized trial where all the patients in a cluster receive the same treatment. In an individually randomized trial, there are both treatments within a cluster. The contrast within a cluster removes some of the shared variation, and the amount of variation removed is larger when the ICC is larger.
If it helps, think about clusters of size 2, then the test statistic is the paired t-test, which requires fewer patients as the ICC increases. -- Steve Simon, Standard Disclaimer Sign up for The Monthly Mean at www.pmean.com/news
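To put rough numbers on the within-cluster argument, here is a minimal sketch of a sample size calculation for an individually randomised trial. It rests on several assumptions that are not stated in the thread: a random-intercept model, balanced randomisation within each centre, an analysis that adjusts for centre, a normal approximation, and `sigma` taken to be the total standard deviation (between plus within centres), so that the variance relevant to the treatment contrast is sigma^2 * (1 - ICC). The function name and the values of `delta` and `sigma` are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sigma, icc=0.0, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided test of a difference
    in means, when patients are randomised within centres.

    Assumption: only the within-centre variance, sigma^2 * (1 - icc),
    contributes to the treatment contrast (random-intercept model).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    within_variance = sigma ** 2 * (1 - icc)
    return ceil(2 * z ** 2 * within_variance / delta ** 2)


# Illustrative values only: detect a difference of 0.5 SD.
for icc in (0.0, 0.1, 0.2):
    print(icc, n_per_arm(delta=0.5, sigma=1.0, icc=icc))
```

Under these assumptions the required number falls as the ICC rises, so planning with an ICC of 0 gives the largest sample size. The sketch says nothing about a treatment effect that varies from centre to centre.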
On Wed, Oct 14, 2009 at 1:51 PM, Steve Simon, P.Mean Consulting
<n...@pmean.com> wrote: > It is counter intuitive. But you're thinking of a cluster randomized > trial where all the patients in a cluster receive the same treatment. In > an individually randomized trial, there are both treatments within a > cluster. The contrast within a cluster removes some of the shared > variation, and the amount of variation removed is larger when the ICC is > larger.
> If it helps, think about clusters of size 2, then the test statistic is > the paired t-test, which requires fewer patients as the ICC increases.
Steve,
The problem is hidden in your cluster size 2 example, unfortunately. Ostensibly, those 2 within-cluster subjects would be randomized to two different treatments. That's great. Now, take that same pair (if the ICC is 1, that pair represents the cluster well) and measure them 5, 10, 50 times. Is your sample size really increasing within that cluster? No. That is pseudo-replication and is analogous to the problem with clustering when the subjects are homogeneous due to non-independence. If the ICC is low, on the other hand, then those replications make sense as individual observations because there is no evidence of lack of independence among subjects within a cluster.
> It is counter intuitive. But you're thinking of a cluster randomized
> trial where all the patients in a cluster receive the same treatment. In
> an individually randomized trial, there are both treatments within a
> cluster. The contrast within a cluster removes some of the shared
> variation, and the amount of variation removed is larger when the ICC is
> larger.
The passage from Twisk (2006) that I uploaded to the Files page is not
talking about cluster randomisation. On a later page that was not
visible in Google Books, Twisk says that for such studies (i.e.,
without cluster randomisation), the conservative method overestimates
the needed sample size, and the liberal procedure underestimates it.
He then gives another example WITH cluster randomisation, and says,
"the 'conservative' procedure is almost perfect, while the 'liberal'
procedure leads to a huge underestimation of the required sample
size" (Twisk 2006, p. 130).
>The passage from Twisk (2006) that I uploaded to the Files page is not >talking about cluster randomisation. On a later page that was not >visible in Google Books, Twisk says that for such studies (i.e., >without cluster randomisation), the conservative method overestimates >the needed sample size, and the liberal procedure underestimates it.
Is that not what Steve and I (and the original poster) have been saying - or, at least, implying?
>He then gives another example WITH cluster randomisation, and says, >"the 'conservative' procedure is almost perfect, while the 'liberal' >procedure leads to a huge underestimation of the required sample >size" (Twisk 2006, p. 130).
Depending upon what is being regarded as the 'liberal' procedure, that sounds reasonable, but it was not the situation which we (at least, not Steve and I) were considering.
Kind Regards,
John
In the course of discussing this matter with a colleague last night, we both eventually came to the conclusion that we're not really sure what 'ICC' we are talking about - correlation between what and what? On the face of it, it seems as if we could merely be talking about a situation in which there was a high level of 'correlation' between treatment and response - which obviously would have very different implications from those which Brett (and others) are discussing.
Kind Regards, John
At 14:24 14/10/2009 -0500, Brett Magill wrote:
>Steve,
>The problem is hidden in your cluster size 2 example, unfortunately. >Ostensibly, those 2 within cluster subjects would be randomized to two >different treatments. That's great. Now, take that same pair (if the >ICC is 1, that pair represents the cluster well) and measure them 5, >10, 50 times. Is your sample size really increasing within that >cluster. No. That is pseudo-replication and is analogous to the >problem with clustering when the subjects are homogeneous due to >non-independence. If the ICC is low on the other hand, then those >replications make sense as individual observations because their is no >evidence of lack of independence among subjects within a cluster.
On Oct 15, 7:49 am, John Whittington <Joh...@mediscience.co.uk> wrote:
> In the course of discussing this matter with a colleague last night, we
> both eventually came to the conclusion that we're not really sure what
> 'ICC' we are talking about - correlation between what and what? On the
> face of it, it seems as if we could merely be talking about a situation in
> which there was a high level of 'correlation' between treatment and
> response - which obviously would have very different implications from
> those which Brett (and others) are discussing.
> Kind Regards,
> John
Here's Twisk's definition:
ICC = Var(between) / [Var(between) + Var(within)]
where between and within mean between and within clusters, of course.
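As a small illustration of that definition, here is a sketch that computes the ICC from two variance components. It assumes the variances are those of the outcome itself (the between-cluster and within-cluster components), which is the usual reading of the formula. The function name and the example values are hypothetical.

```python
def icc_from_components(var_between, var_within):
    """ICC = Var(between) / [Var(between) + Var(within)]."""
    total = var_between + var_within
    if var_between < 0 or var_within < 0 or total == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return var_between / total


# Illustrative values only: 15% of the outcome variance lies between clusters.
print(icc_from_components(0.15, 0.85))
```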
>Here's Twisk's definition: > ICC = Var(between) / [Var(between) + Var(within)] >where between and within mean between and within clusters, of course.
Well, yes, that much I had assumed. However, in the present context, the important question is the nature of the variance in question. To be meaningful in context, it presumably would have to be the variance of the between-treatments difference; is that what you/Twisk intend?
Kind Regards,
John
"In case 2, a level-one variable without between-group variation, the multilevel design is always more efficient. This efficiency of within-subject designs is a well-known phenomenon. For estimating a population mean (case 1) or the effect of a level-two variable (case 4), on the other hand, the multilevel design always is less efficient, and more seriously so as the cluster size and the intraclass correlation are larger" p 5.
He also discusses the case of level-one variables with between-cluster variation and random slopes within clusters, where the issue gets more complicated. So, I'll be the first to admit that I oversimplified in my previous response. If there is no between-group (referring to clusters, not treatment) variation, this is the typical repeated-measures design with increased efficiency. However, when there is a group effect, then the multilevel structure and intraclass correlation are important in determining the efficiency of the design and the effect can be either positive or negative.
> At 05:40 15/10/2009 -0700, Bruce Weaver wrote: >>Here's Twisk's definition: >> ICC = Var(between) / [Var(between) + Var(within)] >>where between and within mean between and within clusters, of course.
> Well, yes, that much I had assumed. However, in the present context, the > important question is the nature of the variance in question. To be > meaningful in context, it presumably would have to be the variance of the > between-treatments difference; is that what you/Twisk intend?
It may help to write a model for the data. To simplify things, assume that every cluster has the same size, that this size is an even number, and that exactly half of the people in each cluster are randomized to each treatment.
The outcome variable, Y, is a function of an overall mean (MU), a treatment effect (B), a random effect for each center (C) and an error term for each subject (E). The subscripts i, j, and k, represent levels of treatment, center, and patient.
Yijk = MU + Bi + Cj + Eijk.
Also let n represent the number of centers and 2m represent the number of patients per center.
The estimate of treatment effect is
YBAR1 - YBAR2 = (B1 - B2) + SUM jk E1jk/(n*m) - SUM jk E2jk/(n*m).
Note that the variation due to center does not appear in this estimate because the subtraction cancels out the center effect (each Cj contributes equally to both treatment means, as does MU). This is true whether there are 2 subjects per center or 20 subjects per center.
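The cancellation can be checked numerically. The sketch below is only an illustration under the random-intercept model with exactly m patients per arm in every center. It computes the estimate twice from the same patient-level errors, once with no center effects and once with very large ones. All names and parameter values are hypothetical.

```python
import random


def treatment_estimate(center_effects, errors, b1=1.0, b2=0.0, mu=10.0):
    """Difference in arm means, YBAR1 - YBAR2, for Yijk = MU + Bi + Cj + Eijk.

    errors[i][j] is the list of m patient errors for arm i in center j.
    """
    arm_means = []
    for i, b in enumerate((b1, b2)):
        ys = [mu + b + c + e
              for c, errs in zip(center_effects, errors[i])
              for e in errs]
        arm_means.append(sum(ys) / len(ys))
    return arm_means[0] - arm_means[1]


rng = random.Random(1)
n, m = 10, 5  # illustrative: 10 centers, 5 patients per arm per center
errors = [[[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]
          for _ in range(2)]

no_center = treatment_estimate([0.0] * n, errors)
big_center = treatment_estimate([rng.gauss(0, 5) for _ in range(n)], errors)

# The two estimates agree up to floating-point rounding.
print(abs(no_center - big_center) < 1e-9)
```

The agreement depends on the balanced allocation within each center. With unequal numbers per arm in a center, the simple difference in means no longer cancels the center effects exactly.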
I call this a "random intercept" model since it assumes that the intercept (MU + Cj) shifts randomly up or down from center to center.
Now that's the model I was considering when I made my earlier comments. But there is another reasonable model
Yijk = MU + Bi + Cj + Dij + Eijk
where Dij is a "random slope" term, representing the tendency for a treatment effect to vary from center to center (Bi + Dij).
In the random slope model the estimate of treatment effect includes a new term
SUM j D1j/n - SUM j D2j/n
Under this model the problems with pseudo replication occur and failure to account for the "random slope" effect can cause you to underestimate the required sample size. Note that the variation in this term does not go down as the number of patients per center increases.
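The point about the extra term not shrinking can be sketched with the variance of the estimate under both models. The code assumes, beyond what is written above, that the Dij are independent across treatments and centers with a common variance, and that the Eijk are independent with a common variance. Under those assumptions the variance of YBAR1 - YBAR2 is 2*Var(E)/(n*m) + 2*Var(D)/n. The function name and the numbers are illustrative.

```python
def var_treatment_estimate(n_centers, m_per_arm, var_error, var_slope=0.0):
    """Variance of YBAR1 - YBAR2 with m_per_arm patients per arm per center.

    var_slope = 0 gives the random-intercept model; var_slope > 0 adds the
    treatment-by-center term, assumed independent across arms and centers.
    """
    error_part = 2 * var_error / (n_centers * m_per_arm)
    slope_part = 2 * var_slope / n_centers
    return error_part + slope_part


# Illustrative values only: 10 centers, growing numbers of patients per arm.
for m in (5, 50, 500):
    intercept_only = var_treatment_estimate(10, m, var_error=1.0)
    with_slopes = var_treatment_estimate(10, m, var_error=1.0, var_slope=0.05)
    print(m, round(intercept_only, 4), round(with_slopes, 4))
```

In this sketch the first column of variances keeps falling as m grows, while the second levels off near 2*Var(D)/n. Under these assumptions, only adding centers reduces that floor.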
So the concerns of Bruce Weaver and others may be valid depending on whether you have a "random intercepts" or a "random slopes" model. The size of the variation in the random slopes compared to the variation in the random intercepts is also critical here.
Does this help clarify things? -- Steve Simon, Standard Disclaimer
At 13:33 15/10/2009 -0500, Steve Simon, P.Mean Consulting wrote (in very small part):
>So the concerns of Bruce Weaver and others may be valid depending on >whether you have a "random intercepts" or a "random slopes" model. The >size of the variations in the random slopes compared to variations in >the random intercepts are also critical here. >Does this help clarify things?
Indeed. I think that helps to crystallise very well some of the concepts that some of us (i.e. myself!) have been making rather a meal of! Like you, I had been thinking in terms of what you call the 'random intercepts' model. I think this also explains the concerns which arose from the statements about the effect of 'ICC' on power/required sample size - since the value of ICC, per se, tells us nothing about which of your models one is dealing with, general statements about the effects of ICC on sample size requirements surely cannot be made.
Cannot these concepts also be expressed in different, perhaps more familiar, terms, by reference to interactions? If I'm not mistaken (albeit I often am!), your 'random intercepts' model is the simplest possible, with only main terms, whilst your 'random slopes' model includes a treatment*centre interaction. Standard default methods of estimating sample size requirements (for testing hypotheses relating to main effects) assume the absence of interactions and it is hardly surprising that such calculations under-estimate required sample size in the presence of interactions between main effects, increasingly so as those interactions become larger. Is that a correct/reasonable way of looking at it?
Kind Regards,
John
John Whittington wrote: > Cannot these concepts also be expressed in different, perhaps more > familiar, terms, by reference to interactions? If I'm not mistaken (albeit > I often am!), your 'random intercepts' model is the simplest possible, with > only main terms, whilst your 'random slopes' model includes a > treatment*centre interaction. Standard default methods of estimating > sample size requirements (for testing hypotheses relating to main effects) > assume the absence of interactions and it is hardly surprising that such > calculations under-estimate required sample size in the presence of > interactions between main effects, increasingly so as those interactions > become larger. Is that a correct/reasonable way of looking at it?
Thanks. I like the term "treatment*centre interaction" better than "random slopes" (though I do spell "center" slightly differently on this side of the Atlantic). -- Steve Simon, Standard Disclaimer Sign up for The Monthly Mean at www.pmean.com/news
This side of the Atlantic, and *that* side of the 49th parallel, I'd
say. Up here in Canada, we go along with the Brits on words like
"centre" and "colour". But like Americans, we use "z" rather than "s"
in words like "randomize". However, we do pronounce it as "zed", not
"zee".
In short, whoever made the crack about "two nations divided by a
common language" was underestimating the number of nations, and the
degree of division. ;-)