Nearly every protocol I review these days has a statement (under
sample size) that goes something like, "The required sample size is n
but we will recruit N to account for drop-outs."
It seems to me that this approach can well lead to data not missing at
random and biased treatment effect estimates.
So, what do others think of this "sample size inflation policy" that
seems quite common?
Kevin
Hi Kevin. Does the bias result from recruiting N rather than n, or
from doing a complete-case analysis (aka list-wise deletion)?
Bruce
I would say the latter, since it seems this practice is
to "protect" against missing data in the primary outcome.
I'm not bothered as much by drop-outs if the primary outcome
is collected. In the ITT analysis the treatment effect
could be attenuated, so perhaps a sample size calculation
should be based on the attenuated treatment effect.
Kevin
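(To make the "attenuated effect" suggestion above concrete: a minimal
sketch in Python, using a two-group normal approximation with invented
numbers. If a fraction of patients stop treatment and get no benefit,
the ITT effect shrinks by roughly that fraction, and the required n
grows accordingly.)

from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Two-sample normal-approximation sample size per group."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sigma / delta) ** 2

delta = 5.0   # hypothesised treatment effect (invented value)
sigma = 12.0  # outcome standard deviation (invented value)
p_off = 0.15  # fraction expected to stop treatment early (invented)

# If non-adherers get no benefit, the ITT effect is diluted to about
# (1 - p_off) * delta, and the required n grows accordingly:
print(n_per_group(delta, sigma))                # ~ 90 per group
print(n_per_group((1 - p_off) * delta, sigma))  # ~ 125 per group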
I personally think it is highly desirable, and is nothing more than a
reflection of reality. If one's calculations estimate that an analysis of
data from n subjects would be required to achieve adequate power to render
the study scientifically adequate, to recruit only n patients when one 'knew
jolly well' that one was probably going to end up with usable data on, say,
only 80% of them would, to my mind, not only be stupid but (per recent
discussions here) probably also unethical - since one would be undertaking
a study which one had very good reason to believe was going to end up being
statistically inadequate. That argument is at its strongest when, as is
often the case, drop-outs are (and can be predicted to be) primarily for
non-treatment-related reasons - but I personally think it remains
essentially valid in most situations.
Provided one DOES recruit the full "N" (the inflated version of n), I can
see no scope for any bias. After all, the (a priori) choice of sample size
is in many senses 'arbitrary', even if guided by estimations of the sample
size needed to achieve a desired level of power. Particularly if it is an
'ITT' analysis being undertaken, there are obviously decisions to be made
about what to do about patients without usable data - but that's a whole
different issue.
The NON-decent way of doing this (which, fortunately, one doesn't see so
much these days) is a proposal to keep on recruiting subjects until one
actually 'just achieves' the target number of 'n' subjects with usable
data. Then there could well be scope for bias.
That's how I see it, anyway!
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------
> Nearly every protocol I review these days has a statement (under
> sample size) that goes something like, "The required sample size is n
> but we will recruit N to account for drop-outs."
>
> It seems to me that this approach can well lead to data not missing at
> random and biased treatment effect estimates.
>
> So, what do others think of this "sample size inflation policy" that
> seems quite common?
So if they didn't include such a calculation, would that prevent
patients from dropping out? It seems to me that these are independent
issues. If you recruit 100 patients and 80 complete the study, the bias
is no different than if you recruit 125 patients and 100 complete the
study.
If patients drop out for a reason associated with their general
prognosis then that has to be addressed whether you account for the
dropout rate in your sample size calculation or not.
By the way, I'm working on research tools for planning and monitoring
the accrual process in clinical trials. Too many researchers plan to
recruit 100 patients in a year, but two years after the start of the
study, they only have a dozen patients in the trial. They overpromise
and underdeliver on sample size/completion date of the study. The work
is very preliminary, but I have some relevant weblog entries on this
topic summarized at
www.childrensmercy.org/stats/category/AccrualProblems.asp
In particular, I want to develop a Bayesian model that accounts for
accrual in the presence of dropouts.
www.childrensmercy.org/stats/weblog2007/AlternateAccrual.asp
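(The real model is described on the weblog above; purely as a hedged
illustration of the general idea, here is a minimal conjugate sketch in
Python that assumes Poisson accrual with a gamma prior. Every number is
invented.)

import numpy as np

rng = np.random.default_rng(42)

# Prior belief: about 100 patients/year, given weight equivalent to half
# a year of observation (both assumptions invented):
prior_rate, prior_weight = 100 / 365, 0.5 * 365
a0, b0 = prior_rate * prior_weight, prior_weight  # gamma(shape a0, rate b0)

# Observed: only 12 patients in the first two years (730 days).
n_obs, t_obs = 12, 730
a1, b1 = a0 + n_obs, b0 + t_obs  # conjugate gamma posterior for the rate

# Posterior predictive: further days needed to reach a 100-patient target
# (the waiting time for the remaining events is gamma-distributed):
rate_draws = rng.gamma(a1, 1 / b1, size=10_000)
extra_days = rng.gamma(100 - n_obs, 1 / rate_draws)
print(np.percentile(extra_days, [50, 97.5]))  # median ~ 3.5 more years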
Steve Simon, ssi...@cmh.edu, Standard Disclaimer
My work is mainly in RCTs. Where I would worry about bias
problems is if the drop-out is treatment related and results
in missing outcome data.
Statistical inference (certainly from a frequentist perspective)
is predicated on the idea of random sampling. In most clinical
research, consecutive patients are enrolled, so the samples
are (very likely) not random. The randomisation in an RCT
creates, in effect, a random sample of patients on each
treatment. Treatment related drop-outs leading to missing
outcome data leaves you, possibly, with a non-random
sample of patients for analysis. So, how valid is statistical
inference on such a subset?
In reality, I know there will be missing outcome data in
a trial. I just think that every effort should be made
to obtain complete data, irrespective of treatment drop-out.
Maybe I'm being pessimistic, and I have no desire to offend
anyone, but the blanket "increase the sample size" approach
appears to me to be an attempt to take the easy way out.
Kevin
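(A quick simulation makes that worry concrete. Suppose, purely for
illustration, that patients doing badly on the new treatment are more
likely to leave with no outcome recorded; a complete-case analysis then
manufactures an effect where none exists.)

import numpy as np

rng = np.random.default_rng(1)
n = 5000  # per arm; large, so the systematic bias dominates the noise

# True state of affairs: no treatment effect at all.
y_ctrl = rng.normal(50, 10, n)
y_trt = rng.normal(50, 10, n)

# Treatment-related, outcome-related dropout: patients doing badly on
# the new treatment tend to leave with no outcome recorded (invented
# probabilities):
p_drop = np.where(y_trt < 45, 0.40, 0.05)
observed = rng.random(n) > p_drop

# Complete-case ('list-wise deletion') estimate of the effect:
print(y_trt[observed].mean() - y_ctrl.mean())  # ~ +1.5, truth is 0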
Totally agreed - treatment-related dropouts present a major problem in
terms of inference (or estimation), and there's no really ideal solution
to the problem, but that's a totally different issue from the one under
discussion.
Indeed, even dropouts which are UNrelated to treatment (but maybe related
to some other common factor) can also present problems in relation to
inference, at least in terms of the generalisability of the results, since
they can effectively modify/restrict the population from which the random
sample of completed patients has been drawn.
However, those issues are totally unrelated to sample size, let alone how
the sample size was decided upon.
>In reality, I know there will be missing outcome data in
>a trial. I just think that every effort should be made
>to obtain complete data, irrespective of treatment drop-out.
I'm not sure what that means. If a patient drops out of a trial before
reaching the point at which complete data has been generated, then no
amount of 'effort' will be able to make that data complete.
>Maybe I'm being pessimistic, and I have no desire to offend
>anyone, but the blanket "increase the sample size" approach
>appears to me to be an attempt to take the easy way out.
I remain confused. Like others have said, you seem to be talking about at
least two totally unrelated issues. No matter how one decides what sample
size to use (whether by guesswork, copying others, calculation,
'calculation plus 10%', or whatever) the problems related to
treatment-related dropouts (and even treatment-unrelated dropouts) will be
exactly the same. You say that you feel it is an 'easy way out', but an
easy way out of what? What do you feel is the (presumably 'more
difficult') alternative course one should be following?
>I agree that we should be concerned about the possibility that allowing
>for, say, 15% drop outs leads trialists to be more lax about trying to
>ensure the most complete follow up possible. I suppose this may happen
>sometimes, although I am not aware of such a case.
Nor me - although I'm not at all sure how one would ever know that one WAS
observing 'such a case', anyway. All one can know (if that!) is how
diligent the investigator was in attempting to ensure complete follow-up
with the trial 'as it was'; there is no way one can know how diligent
(s)he would have been had the sample size been estimated differently.
In any event, in RCTs of treatments, my experience is that 'ensuring the
most complete follow-up' is generally not a problem. Apart from those
patients who 'just disappear' and/or refuse to be followed-up, patients who
drop out of trials usually ARE followed up to the extent that is
useful. The main problem, which no-one can do anything about, is not the
lack of 'follow-up' but, rather, the fact that 'dropped out patients' are
those who have prematurely ceased taking/using their randomised
treatment. If one is comparing the effects of two drugs over a 12-month
period and a patient 'drops out' and stops taking their trial medication
after two weeks, no amount of 'follow-up' is going to generate further
useful information about the treatments.
>Incomplete outcome data usually make it impossible to perform a true
>intention to treat analysis (which fully preserves the randomisation)
>unless one imputes the missing data. Increasingly imputation is being done
>in trials as concerns over the assumptions of that analysis may be
>exceeded by concerns over bias associated with a complete case analysis.
What drug regulatory authorities increasingly seem to expect to see is
BOTH, with a discussion (maybe even a 'sensitivity analysis') of the
respective results of the two approaches. I would think that, the more
that is done, the more we are likely to learn about the relative behaviours
and merits of the two approaches - given always that neither approach is
ever going to be ideal.
Kindest Regards,
On Oct 31, 12:45 pm, John Whittington <Joh...@mediscience.co.uk>
wrote:
> At 09:17 31/10/2007 -0700, Kevin E. Thorpe wrote (in part):
>
> >In reality, I know there will be missing outcome data in
> >a trial. I just think that every effort should be made
> >to obtain complete data, irrespective of treatment drop-out.
>
> I'm not sure what that means. If a patient drops out of a trial before
> reaching the point at which complete data has been generated, then no
> amount of 'effort' will be able to make that data complete.
>
I see treatment refusal or early treatment termination as
distinct from patient follow-up. In discussions I've had
with researchers, they often equate treatment termination
with trial termination. I am advocating that you continue
to collect data (as per protocol schedule if possible)
even for patients who stop treatment.
> >Maybe I'm being pessimistic, and I have no desire to offend
> >anyone, but the blanket "increase the sample size" approach
> >appears to me to be an attempt to take the easy way out.
>
> I remain confused. Like others have said, you seem to be talking about at
> least two totally unrelated issues. No matter how one decides what sample
> size to use (whether by guesswork, copying others, calculation,
> 'calculation plus 10%', or whatever) the problems related to
> treatment-related dropouts (and even treatment-unrelated dropouts) will be
> exactly the same. You say that you feel it is an 'easy way out', but an
> easy way out of what? What do you feel is the (presumably 'more
> difficult') alternative course one should be following?
>
I see that by inflating the recruitment target so that after
drop-outs you have the "desired" sample size, there is a
risk of not questioning the validity of the inference,
just because you achieved your "sample size."
Increasing the recruitment target is "easy" compared to
the effort of obtaining as complete a data set as possible.
Kevin
>I see treatment refusal or early treatment termination as
>distinct from patient follow-up. In discussions I've had
>with researchers, they often equate treatment termination
>with trial termination.
You are using potentially confusing terminology here - 'trial termination'
is usually taken to mean abandonment of the entire trial, whereas I
think/presume that you are talking about termination of an individual
subject's participation (which is usually called 'withdrawal').
However, it is indeed true that treatment termination often does
more-or-less equate with termination of a subject's participation in a
trial ('withdrawal') - either for practical or commonsense reasons. For a
start, in the case of subjects you mention who 'refuse treatment', they
will generally also 'refuse follow-up'.
>I am advocating that you continue
>to collect data (as per protocol schedule if possible)
>even for patients who stop treatment.
Now we are onto the 'commonsense' stuff, which I agree conflicts with the
concept of 'strict ITT' analyses. For example, in my opinion, in a study
comparing treatments A & B, very little other than obsessive compliance
with a strict 'ITT' concept would benefit from continuing to assess a
subject 'per protocol' for months and months after (s)he had stopped taking
either A or B. Indeed, if the assessments are intrusive, and certainly if
they carry any appreciable risk, it could even be regarded as unethical to
continue them after a subject had stopped receiving the study treatment.
>I see that by inflating the recruitment target so that after
>drop-outs you have the "desired" sample size, there is a
>risk of not questioning the validity of the inference,
>just because you achieved your "sample size."
As I said before, these are surely two totally separate issues. It is the
'incomplete data' which leads people to question the validity of the
inference, regardless of how the sample size was determined.
It sounds as if you may regard an 'estimated sample size requirement' as
being something far more magical and concrete than it is. We are arguing
about the desirability of adding maybe 10% or 20% onto an estimate 'to
allow for dropouts'. As you will be aware, those estimations of sample
size are often based on very tenuous guesstimates of variability,
essentially arbitrary decisions as to what magnitude of effect one wants
the study to have adequate power to detect (ideally the 'minimum clinically
relevant effect') and the, again arbitrary, decision as to what level of
power (80%, 90%, 95% or whatever) to ask for. Given that situation, in
very many cases I could produce perfectly defensible estimates of required
sample size which differed by more than a factor of 2 - probably far more
than a factor of 2 in many cases. Looked at in that context, what
difference does it make to one's 'questioning of the validity of inference'
whether one does, or does not, add on 10% or 20% to the result of whatever
calculation one has undertaken?
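(That is easy to demonstrate. Under a simple normal approximation,
perfectly conventional choices of assumed SD and power already span more
than a factor of 2 in required sample size; a sketch with invented
numbers:)

from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sigma / delta) ** 2

for sigma in (10, 14):          # low vs high 'historical' SD estimate
    for power in (0.80, 0.90):  # both entirely conventional choices
        print(sigma, power, round(n_per_group(5, sigma, power=power)))
# prints n of roughly 63, 84, 123 and 165 per group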
>Increasing the recruitment target is "easy" compared to
>the effort of obtaining as complete a data set as possible.
One needs to think about the quality/value of data, as well as its
'completeness'; an incomplete set of good data might well contain more
information than a complete set of data that contained a high proportion of
rubbish/noise! Except in the eyes of 'ITT purists', I cannot see the
merit of 'obtaining a complete set of data', if much of that set of data
does not actually relate to treatments (or whatever) being compared. In
practice, it's often even worse than that, since it will usually be the
case that not only does the data not relate to any of the study treatments,
but the 'withdrawn' subject will generally be given other effective
treatments which would (very reasonably) be prohibited for patients still
in the study.
To illustrate what I regard as one of the lunacies of 'obsessive ITT'
(which I am increasingly coming to think is what you are essentially
talking about), consider an RCT designed to compare the efficacy of new
drug X with established drug A (A being one of a group of A,B,C,D...., all
of which are known to be similarly effective). Consider the situation in
which drug X is actually useless. Most of the patients randomised to drug
X drop out early, because of the inadequate efficacy. Since they suffer
from a condition that needs to be treated, after stopping trial medication
those drop-outs all start receiving treatment with A, B, C, D or
whatever. In your quest for 'as complete data as possible' you continue
assessing all of those drop-outs, maybe for months, 'per the protocol
schedule'. You then conduct an ITT analysis of that 'complete data' and,
hey presto, show that X is about as good as the established effective
treatment A (and B, C, D...), even though X is, in reality, totally
useless. I'm not sure that one could call that 'bias', but it's certainly
'just plain wrong'!
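(A sketch of that scenario, with invented response probabilities, shows
how follow-up data collected after rescue treatment can make a useless
drug look nearly as good as an effective comparator.)

import numpy as np

rng = np.random.default_rng(7)
n = 2000  # per arm

p_effective = 0.60  # established drug A, and the rescue drugs (invented)
p_useless = 0.10    # new drug X has essentially no efficacy (invented)

arm_a = rng.random(n) < p_effective

# On X, most patients stop early for lack of efficacy and are switched
# to an effective rescue drug; their end-of-study assessments then
# reflect the rescue treatment, not X:
dropped = rng.random(n) < 0.80
outcome_x = np.where(dropped,
                     rng.random(n) < p_effective,  # rescue effect
                     rng.random(n) < p_useless)    # still on X

print(arm_a.mean(), outcome_x.mean())  # ~0.60 vs ~0.50: X looks decent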
On Oct 31, 2:50 pm, John Whittington <Joh...@mediscience.co.uk> wrote:
> At 10:40 31/10/2007 -0700, Kevin E. Thorpe wrote (in part):
>
> >I see treatment refusal or early treatment termination as
> >distinct from patient follow-up. In discussions I've had
> >with researchers, they often equate treatment termination
> >with trial termination.
>
> You are using potentially confusing terminology here - 'trial termination'
> is usually taken to mean abandonment of the entire trial, whereas I
> think/presume that you are talking about termination of an individual
> subject's participation (which is usually called 'withdrawal').
Yes, poor choice of words on my part today.
> However, it is indeed true that treatment termination often does
> more-or-less equate with termination of a subject's participation in a
> trial ('withdrawal') - either for practical or commonsense reasons. For a
> start, in the case of subjects you mention who 'refuse treatment', they
> will generally also 'refuse follow-up'.
>
> >I am advocating that you continue
> >to collect data (as per protocol schedule if possible)
> >even for patients who stop treatment.
>
> Now we are onto the 'commonsense' stuff, which I agree conflicts with the
> concept of 'strict ITT' analyses. For example, in my opinion, in a study
> comparing treatments A & B, very little other than obsessive compliance
> with a strict 'ITT' concept would benefit from continuing to assess a
> subject 'per protocol' for months and months after (s)he had stopped taking
> either A or B. Indeed, if the assessments are intrusive, and certainly if
> they carry any appreciable risk, it could even be regarded as unethical to
> continue them after a subject had stopped receiving the study treatment.
Good point. I did say if possible, but in the examples
you raise, I would say they are examples of where it was
not possible or advisable.
> >I see that by inflating the recruitment target so that after
> >drop-outs you have the "desired" sample size, there is a
> >risk of not questioning the validity of the inference,
> >just because you achieved your "sample size."
>
> As I said before, these are surely two totally separate issues. It is the
> 'incomplete data' which leads people to question the validity of the
> inference, regardless of how the sample size was determined.
Yes. But my point is by inflating the target you may be
setting yourself up from the start to have incomplete data
that could call the validity into question.
> It sounds as if you may regard an 'estimated sample size requirement' as
> being something far more magical and concrete than it is. We are arguing
I certainly do not regard the estimated sample size as
"magical." I know only too well the are based on rather
arbitrary parameters and insufficient data in the first
place. I'm surprised they work at all. :-)
True. However, in your example, patients quit X due to
inadequate efficacy. If adequate efficacy were your outcome,
those who quit X have the primary outcome measured, failure
in this case. No problem.
In the end, my point is that we should try to think of
creative ways to design trials that maximise their validity
that balance the realities of treating patients.
> > Now we are onto the 'commonsense' stuff, which I agree conflicts with the
> > concept of 'strict ITT' analyses. For example, in my opinion, in a study
> > comparing treatments A & B, very little other than obsessive compliance
> > with a strict 'ITT' concept would benefit from continuing to assess a
> > subject 'per protocol' for months and months after (s)he had stopped taking
> > either A or B. Indeed, if the assessments are intrusive, and certainly if
> they carry any appreciable risk, it could even be regarded as unethical to
> > continue them after a subject had stopped receiving the study treatment.
>
>Good point. I did say if possible, but in the examples
>you raise, I would say they are examples of where it was
>not possible or advisable.
...but there was nothing 'special' about my examples. I would have thought
that what I was saying applies to the large majority of all RCTs, certainly
the majority of those I've been involved with over the decades. Maybe you
have a particular study, or type of study, in mind?
>Yes. But my point is by inflating the target you may be
>setting yourself up from the start to have incomplete data
>that could call the validity into question.
I still don't really follow your logic. As I said before, I could just as
easily 'inflate the target' (sample size) by (totally arbitrarily)
specifying a higher level of desired power. Would you say that would also
'set oneself up to have incomplete data'?
Furthermore, a cautious study designer will tend to base his/her sample
size estimates on 'worst case scenarios' - e.g., given a range of values
for variability from historical studies, will tend to design on the basis
of the greatest observed variability, rather than some sort of average,
which would be closer to the 'best bet estimate'. Would you also call that
'inflating the target'? I would call it 'sensible caution', since there
are no prizes (yet many potential penalties!) for undertaking a study which
proves to have been underpowered.
>I certainly do not regard the estimated sample size as
>"magical." I know only too well the are based on rather
>arbitrary parameters and insufficient data in the first
>place. I'm surprised they work at all. :-)
Quite. So why is it so much worse to use, say, 110% of an iffy estimate
'based on rather arbitrary parameters' than 100% of that estimate? In the
real world, a figure 'inflated' by such modest 'mark up' is probably just
as likely to be close to the 'true' sample size requirement (if one had
perfect data on which to base it) as is the 'uninflated' figure.
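(That claim can be checked by simulation: when the SD is estimated from a
small pilot, the resulting n scatters so widely around the 'true'
requirement that a 10% mark-up is the closer figure almost half the time.
A sketch with invented numbers:)

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
z = norm.ppf(0.975) + norm.ppf(0.80)
true_sigma, delta, pilot_n = 12.0, 5.0, 20  # invented values
n_true = 2 * (z * true_sigma / delta) ** 2  # ~ 90 per group

# SD estimates from small pilots scatter widely, and so do the
# sample-size estimates built on them:
s = np.array([rng.normal(0, true_sigma, pilot_n).std(ddof=1)
              for _ in range(10_000)])
n_hat = 2 * (z * s / delta) ** 2
print(np.percentile(n_hat, [10, 90]))  # roughly 55 to 130

# How often is the 10%-inflated figure the closer one? About half.
print(np.mean(abs(1.1 * n_hat - n_true) < abs(n_hat - n_true)))  # ~ 0.47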
However, getting more 'back to basics' your premise seems to be that there
is something very wrong about undertaking a study with a sample size
greater than the MINIMUM sample size that one has estimated to be
necessary. Indeed, you even refer to it as a 'target'. In reality, I
think one really should regard such estimates as estimates of the MINIMUM
sample size that would be expected to be adequate, given the specified
parameters. Within reason/limits, there are few statistical/scientific
downsides (only practical ones of time, effort, money, subject availability
etc. - and sometimes ethical considerations), yet plenty of upsides, of
undertaking studies which are appreciably larger than that 'minimum estimate'.
>True. However, in your example, patients quit X due to
>inadequate efficacy. If adequate efficacy were your outcome,
>those who quit X have the primary outcome measured, failure
>in this case. No problem.
You really can't have it all ways! I can't square that statement with your
previously stated desire to carry on following up drop-outs 'per protocol
schedules' in order to get 'complete data'. If whether or not someone
dropped out because of poor efficacy in the first two weeks was one's
primary outcome measure, then one wouldn't be conducting a study over many
months, presumably with some (probably serial) measured index of efficacy
making up the 'complete data' for those who didn't drop out. You can't
really have one definition of 'complete data' (and one definition of
'primary outcome measure') for those subjects who drop out early and
different definitions for those who don't!
>In the end, my point is that we should try to think of
>creative ways to design trials that maximise their validity
>that balance the realities of treating patients.
I wouldn't quibble with that objective at all, but I don't see that 'being
cautious' in planning the sample size has anything to do with it!
On Oct 31, 4:46 pm, John Whittington <Joh...@mediscience.co.uk> wrote:
> At 12:30 31/10/2007 -0700, Kevin E. Thorpe wrote:
>
> > > Now we are onto the 'commonsense' stuff, which I agree conflicts with the
> > > concept of 'strict ITT' analyses. For example, in my opinion, in a study
> > > comparing treatments A & B, very little other than obsessive compliance
> > > with a strict 'ITT' concept would benefit from continuing to assess a
> > > subject 'per protocol' for months and months after (s)he had stopped taking
> > > either A or B. Indeed, if the assessments are intrusive, and certainly if
> > > they carry any appreciable risk, it could even be regarded as unethical to
> > > continue them after a subject had stopped receiving the study treatment.
>
> >Good point. I did say if possible, but in the examples
> >you raise, I would say they are examples of where it was
> >not possible or advisable.
>
> ...but there was nothing 'special' about my examples. I would have thought
> that what I was saying applies to the large majority of all RCTs, certainly
> the majority of those I've been involved with over the decades. Maybe you
> have a particular study, or type of study, in mind?
Perhaps in a more pragmatic type of trial of some management
policy. But there you would not typically be doing frequent
follow-up (at least not more frequent than they get according
to their standard of care).
>
> >Yes. But my point is by inflating the target you may be
> >setting yourself up from the start to have incomplete data
> >that could call the validity into question.
>
> I still don't really follow your logic. As I said before, I could just as
> easily 'inflate the target' (sample size) by (totally arbitrarily)
> specifying a higher level of desired power. Would you say that would also
> 'set oneself up to have incomplete data'?
>
> Furthermore, a cautious study designer will tend to base his/her sample
> size estimates on 'worst case scenarios' - e.g., given a range of values
> for variability from historical studies, will tend to design on the basis
> of the greatest observed variability, rather than some sort of average,
> which would be closer to the 'best bet estimate'. Would you also call that
> 'inflating the target'? I would call it 'sensible caution', since there
> are no prizes (yet many potential penalties!) for undertaking a study which
> proves to have been underpowered.
>
> >I certainly do not regard the estimated sample size as
> >"magical." I know only too well the are based on rather
> >arbitrary parameters and insufficient data in the first
> >place. I'm surprised they work at all. :-)
>
> Quite. So why is it so much worse to use, say, 110% of an iffy estimate
> 'based on rather arbitrary parameters' than 100% of that estimate? In the
> real world, a figure 'inflated' by such modest 'mark up' is probably just
> as likely to be close to the 'true' sample size requirement (if one had
> perfect data on which to base it) as is the 'uninflated' figure.
>
The difference, as I see it, is that in one case you inflate
your sample size with an expectation of achieving it while
in the other you inflate with the full expectation of missing
outcome data.
Maybe it makes absolutely no practical difference. Personally,
I just feel this is overused without regard to potential
consequences.
> However, getting more 'back to basics' your premise seems to be that there
> is something very wrong about undertaking a study with a sample size
> greater than the MINIMUM sample size that one has estimated to be
> necessary.
Not at all.
> Indeed, you even refer to it as a 'target'. In reality, I
> think one really should regard such estimates as estimates of the MINIMUM
> sample size that would be expected to be adequate, given the specified
> parameters. Within reason/limits, there are few statistical/scientific
> downsides (only practical ones of time, effort, money, subject availability
> etc. - and sometimes ethical considerations), yet plenty of upsides, of
> undertaking studies which are appreciably larger than that 'minimum estimate'.
I agree with this. Again, it may not make any practical
difference, but inflating sample size for drop-out does
not seem to me like planning a "bigger" study, since the
aim appears to me to be to achieve some minimum sample size.
> >True. However, in your example, patients quit X due to
> >inadequate efficacy. If adequate efficacy were your outcome,
> >those who quit X have the primary outcome measured, failure
> >in this case. No problem.
>
> You really can't have it all ways! I can't square that statement with your
> previously stated desire to carry on following up drop-outs 'per protocol
> schedules' in order to get 'complete data'. If whether or not someone
> dropped out because of poor efficacy in the first two weeks was one's
> primary outcome measure, then one wouldn't be conducting a study over many
> months, presumably with some (probably serial) measured index of efficacy
> making up the 'complete data' for those who didn't drop out. You can't
> really have one definition of 'complete data' (and one definition of
> 'primary outcome measure') for those subjects who drop out early and
> different definitions for those who don't!
>
To me, it is a question of choosing a sensible, measurable
outcome at the outset. Completeness for me is obtaining that
primary outcome. If you have significantly greater early
treatment termination in one arm, that should certainly
not be ignored in the analysis.
> >In the end, my point is that we should try to think of
> >creative ways to design trials that maximise their validity
> >that balance the realities of treating patients.
>
> I wouldn't quibble with that objective at all, but I don't see that 'being
> cautious' in planning the sample size has anything to do with it!
>
Kevin
At 18:47 31/10/2007 -0700, Kevin E. Thorpe wrote (in very small part):
>On Oct 31, 4:46 pm, John Whittington <Joh...@mediscience.co.uk> wrote:
> > ...but there was nothing 'special' about my examples. I would have thought
> > that what I was saying applies to the large majority of all RCTs, certainly
> > the majority of those I've been involved with over the decades. Maybe you
> > have a particular study, or type of study, in mind?
>
>Perhaps in a more pragmatic type of trial of some management
>policy. But there you would not typically be doing frequent
>follow-up (at least not more frequent than they get according
>to their standard of care).
Fair enough, but I would then suggest that it is _you_ who have very
special, and relatively unusual, types of trial in mind. It's not even too
clear to me what would be meant by 'drop-out' in that sort of trial. Also,
in relation to what appears to be your central point, if follow-up
consisted of no more than would be undertaken in terms of standard care,
it's quite difficult to see how 'failure to follow-up' and the consequent
'incomplete data' could actually arise in such a situation - unless the
subject 'dropped out' of healthcare altogether!
I've been trying to get my head around the underlying concept that you are
working with. As far as I can make out, your basic premise seems to be
that 'drop-out' and/or the collection of 'complete data' on those who do
'drop out' is likely to be influenced by sample size - in particular,
whether any allowance has been made for non-completers. In terms of the
sort of RCTs that I deal with (which I think are probably representative of
'the majority'), I just don't think that is the
case. 'Non-completion'/'Drop-out' (withdrawal of an individual subject
from a study) occurs for specific reasons (subject request, unacceptable
side effects or lack of efficacy, intercurrent illness, major protocol
violations etc.), and there is very little scope at all for that to be
influenced by anything, let alone sample size. As for whether one carries
on attempting to 'collect complete data, per protocol schedule' from a
subject who has 'withdrawn' and is no longer receiving trial treatment,
that really depends mainly on the type of analysis that is to be
undertaken. As I've said, if one is going to undertake a 'strict ITT'
analysis, then that is indeed precisely what one would attempt to do (which
is why I think you're essentially talking about such an analysis) - but,
despite all my reservations about that type of analysis, I still don't see
how the way in which the sample size has been estimated influences
that. At the other extreme (equally iffy), in which one was only going to
undertake a 'per protocol' analysis (i.e. non-completers would be
excluded), there would clearly be no point in continuing to collect data
from subjects after withdrawal.
In other words, I think your point probably comes down essentially to the
question of what sort of analysis one is going to undertake, regardless of
the sample size or how it was decided upon.
If you are thinking that the trial investigators will, in some way, be
influenced by knowledge that an allowance has been made for
'non-completers', I just don't think that's true in the vast majority of
cases. Indeed, even though it's probably there to be seen in the dark
corner of the protocol called the 'Statistics Section', I seriously doubt
that many investigators could even tell you the basis for the sample size
determination, or whether it included any allowance for
non-completers. All they will usually know, or care about, is the number
of subjects they have been asked/told to recruit.
>The difference, as I see it, is that in one case you inflate
>your sample size with an expectation of achieving it while
>in the other you inflate with the full expectation of missing
>outcome data.
>Maybe it makes absolutely no practical difference. Personally,
>I just feel this is overused without regard to potential
>consequences.
It's the nature of those 'consequences' you perceive that remains unclear
to me. As I and others have said, there are certainly major, and
difficult, issues which arise because of 'non-completers' and 'incomplete
data', but I still don't see how these can be viewed as consequences of the
nature of the sample size determination.
Maybe this is all about language. What if, rather than talking about
'allowance for dropouts', I told you that I was going to add 10% or 20%
onto the result of my 'sample size calculation' as a 'safety factor', in
view of our knowledge that such calculations are invariably
imprecise. Would you still have the same concerns? ....
>I agree with this. Again, it may not make any practical
>difference, but inflating sample size for drop-out does
>not seem to me like planning a "bigger" study, since the
>aim appears to me to be to achieve some minimum sample size.
... which suggests that your answer to the question I've just posed is
probably 'no'. If that is the case, then I think we quite probably are
only arguing about 'language'.
>To me, it is a question of choosing a sensible, measurable
>outcome at the outset. Completeness for me is obtaining that
>primary outcome. If you have significantly greater early
>treatment termination in one arm, that should certainly
>not be ignored in the analysis.
In that case, there is clearly room for discussion about what you mean by
'complete data'!! Probably without intending it, I think you are getting
very close to agreeing with some of my serious reservations about a 'strict
ITT' approach to most trials!!
A trial can only really have one 'primary outcome measure' in relation to a
particular aspect of the data (e.g. efficacy). If a trial involves weeks
or months of treatment, with (quite probably serial) assessment of some
numerical index of outcome, then that measure (either throughout or at the
end of the treatment period) is likely to be that primary outcome
measure. If one is committed to a 'strict ITT' analysis of the data, then
one would carry on assessing subjects after they had stopped receiving
trial treatment (and quite possibly been changed onto alternative non-trial
treatment) and would include all that data in the ITT analysis, just as if
the subject had remained on the trial treatment. I know that it is
probably offensive to some ITT enthusiasts, and also that it can sometimes
be argued to be the lesser of the evils, but that word 'lunacy' keeps
coming into my mind when I think about this in relation to many types of
trial! Analysis of differential rates of early treatment termination
would, of course, usually be included as a secondary outcome/analysis - but
that won't alter the fact that the 'primary analysis' could be extremely
biased because of the ITT approach; as I illustrated before, such an
approach has the capacity to make a useless treatment look effective, which
is hardly a good thing!
As an aside .... I suspect this may turn into an 'ITT debate', particularly
if there are any ITT enthusiasts around! I accept that some people regard
it as 'the lesser of evils', and they might be right. I also accept that
there is a definite place for 'strict ITT' in very pragmatic (usually
'late') trials of a treatment. Such trials directly and appropriately
address clinically relevant questions such as "Will patients do better if I
decide to prescribe drug A than if I decide to prescribe drug B, regardless
of what happens after I have written the prescription". That's fair enough
but, to my mind, the mistake is to then claim or believe that one's results
necessarily relate to a comparison of the efficacy of A and B. Just as
with my previous example, 'prescribing drug A' could prove to result in the
greater benefit, despite A being useless, if most patients who had been
prescribed drug A very quickly 'dropped out' and were instead treated with
an effective drug 'C'.
No, that's not what I'm trying to say at all. Sorry I
have not made my point clear to you yet. Let me try it
this way.
I compute a sample size based on the agreed upon arbitrary
parameters of the day, and write a few sentences for the
grant proposal. The proposal comes back to me with the
"inflation" statement added by the PI. I think it is
good that the PI is thinking about incomplete data. However,
my worry is that what the PI is saying is, "It doesn't
really matter if we have drop-outs or incomplete outcome
assessment, just as long as in the end we get the number
Kevin computed."
It seems from our discussion that maybe I'm being
uncharitable in my interpretation of this practice and
that I should not get so fussed about it. In that case,
this is a useful discussion.
As I just wrote above, it is the PI that adds the "inflation"
in the sample size section. If you are right that knowing
there is wiggle room does not influence the diligence of
the trial team, then I would be relieved.
> >The difference, as I see it, is that in one case you inflate
> >your sample size with an expectation of achieving it while
> >in the other you inflate with the full expectation of missing
> >outcome data.
> >Maybe it makes absolutely no practical difference. Personally,
> >I just feel this is overused without regard to potential
> >consequences.
>
> It's the nature of those 'consequences' you perceive that remains unclear
> to me. As I and others have said, there are certainly major, and
> difficult, issues which arise because of 'non-completers' and 'incomplete
> data', but I still don't see how these can be viewed as consequences of the
> nature of the sample size determination.
It is not related to the nature of the sample size
calculation directly. If the investigators know there
is a "buffer", perhaps they will not work as hard to get
the outcome data of that "difficult" patient. However,
you are probably right that the investigators in the
field are not thinking about the sample size requirements
of the trial, and I'm concerned about nothing.
>
> Maybe this is all about language. What if, rather than talking about
> 'allowance for dropouts', I told you that I was going to add 10% or 20%
> onto the result of my 'sample size calculation' as a 'safety factor', in
> view of our knowledge that such calculations are invariably
> imprecise. Would you still have the same concerns? ....
>
No. I'm not against having a larger sample size. My
concern was that the gap between the computation and the
"inflation due to drop-out" is viewed as expendable data.
Again, your other comments suggest that this is unlikely.
> >I agree with this. Again, it may not make any practical
> >difference, but inflating sample size for drop-out does
> >not seem to me like planning a "bigger" study, since the
> >aim appears to me to be to achieve some minimum sample size.
>
> ... which suggests that your answer to the question I've just posed is
> probably 'no'. If that is the case, then I think we quite probably are
> only arguing about 'language'.
>
> >To me, it is a question of choosing a sensible, measurable
> >outcome at the outset. Completeness for me is obtain that
> >primary outcome. If you have significantly greater early
> >treatment termination in one arm, that should certainly
> >not be ignored in the analysis.
>
> In that case, there is clearly room for discussion about what you mean by
> 'complete data'!! Probably without intending it, I think you are getting
> very close to agreeing with some of my serious reservations about a 'strict
> ITT' approach to most trials!!
I don't consider myself to be an ITT zealot. Blind
application of ITT is as dangerous as blind application
of any technique.
I have no argument with any of this. Let me ask you,
how would you approach the analysis in those cases where
ITT is clearly the wrong thing to do?
Kevin
>However, my worry is that what the PI is saying is, "It doesn't
>really matter if we have drop-outs or incomplete outcome
>assessment, just as long as in the end we get the number
>Kevin computed."
>It seems from our discussion that maybe I'm being
>uncharitable in my interpretation of this practice and
>that I should not get so fussed about it. In that case,
>this is a useful discussion.
I think that is, indeed, the crux of it.
Underlying what you are saying must presumably be the belief that the
investigator has appreciable control over how much missing data there is
- depending upon whether, on one hand, (s)he 'tries harder' or, on the
other hand, is in some way 'lazy' or, at least, 'less than
diligent'. However, as I explained in a previous message, at least in
terms of the sort of RCTs I have been involved with for a very long time,
the great majority of 'dropouts' and any consequential 'missing data' are
due to factors over which the investigator has no control.
Investigators have minimal control over 'dropouts'. Indeed, if (in the
name of trying to avoid missing data) they tried to dissuade subjects from
dropping out, or failed to withdraw subjects when they should do, they
would be in extremely deep ethical water!! The question of whether or not
subjects will continue to be followed-up/assessed (if they are willing to
be assessed) after 'dropout' (premature discontinuation of trial treatment)
will be defined in the protocol and what it says about that will, as I've
been explaining, depend upon what sort of analysis is specified in the
protocol (i.e. whether ITT or not) - it's not something left to the whim of
the investigator. All clinical research, whether commercially sponsored or
not, is now meant to be undertaken in compliance with GCP principles, which
are now formalised in both EU and UK law - and that requires the
investigator to undertake whatever assessments or follow-up is specified in
the protocol, unless there is a very good reason why it can't be done (like
patient refusal or 'disappearance'). If that's not incentive enough, if we
were talking of commercially-sponsored trials (which constitute a
substantial proportion of the whole), the investigator would not get paid
if he failed, without good reason, to comply with the requirements of the
protocol. Furthermore, if a trial is being conducted to proper GCP
standards, a monitor should be continually harassing the investigator about
any failures to undertake the required follow-up/assessment - just in case
(s)he had 'forgotten'!
So, the bottom line is that, although I now understand your concern, I
think it is now a generally unfounded one.
On the other hand, the problem of how to deal with trials (almost all of
them!) in which there is 'missing data' (through no fault of the
investigator) is a very real one indeed, but that's a separate matter!
Well, for a start, one has to acknowledge that there are those who probably
would not agree that there are "cases where ITT is clearly the wrong thing
to do".
We all know that there is no real 'right answer', since all of the
alternatives have clear downsides. As I wrote in response to Doug Altman's
comment a couple of days ago, one increasingly common approach is to 'do it
both ways' and compare the results. Whilst 'analysing the data in two (or
more) ways' clearly raises statistical issues (not the least about Type I
error rate) this does then facilitate a discussion about interpretation
and, where possible, the application of common sense. For example, in a
less extreme version of the sort of example I cited above, a situation in
which a drug seemed much better in an ITT analysis than in a 'per protocol'
one would lead one to suspect that the apparent efficacy seen in the ITT
analysis could actually be the efficacy of something other than the
treatment under test!
Interestingly, if I take off my statistician's hat and put on my clinician's
one, I tend to see things rather differently, and tend to swing quite strongly
in favour of 'per protocol' / 'completers only' analyses. Sure, to a
statistician they are very biased, since they are only looking at how good
a treatment is 'in those patients who don't have problems with the
treatment'. However, the clinician is also aware of that (and that 'no one
treatment suits everyone'), but (s)he also knows that those patients who
have problems with that treatment will be switched to something different;
what the clinician therefore really wants to know is how well the treatment
works 'in those in whom it works' (and is tolerated) - i.e. those patients
who, in practice, would probably be kept on the treatment.
There's a lot of potential discussion to be had here, but those are just a
few opening words!
On Nov 1, 4:43 pm, John Whittington <Joh...@mediscience.co.uk> wrote
(in part):
>
> So, the bottom line is that, although I now understand your concern, I
> think it is now a generally unfounded one.
Thanks. The discussion has been useful.
>
> On the other hand, the problem of how to deal with trials (almost all of
> them!) in which there is 'missing data' (through no fault of the
> investigator) is a very real one indeed, but that's a separate matter!
>
Very true!
Kevin
I think that to some extent the question you wish to
answer is related to the analysis. The question as you
put it above, "how well the treatment works 'in those in
whom it works' (and is tolerated)" is a perfectly valid
question. I would agree that an on-treatment type of
analysis is reasonable.
On the other end of the spectrum is the question a policy
maker might ask. Something like, "if we recommend treatment
X for the masses, will the public, on average, be better
off?" In that case ITT may be more appropriate.
Consider a, rather unlikely, example. Imagine that an
on-treatment analysis showed that you could not say
treatment X was better than standard, but the ITT did.
One person may look at that and say, "No way I'm
prescribing X, it doesn't work." On the other hand,
someone else might say, "If I start my patients on X first,
on average they will do better."
Is one of these "more correct" than the other?
I don't know.
Kevin
>I think that to some extent the question you wish to
>answer is related to the analysis. The question as you
>put it above, "how well the treatment works 'in those in
>whom it works' (and is tolerated)" is a perfectly valid
>question. I would agree that an on-treatment type of
>analysis is reasonable.
Exactly. I would actually say that it's more than 'to some extent'. I
think the type of analysis is crucially dependent on the question being asked.
>On the other end of the spectrum is the question a policy
>maker might ask. Something like, "if we recommend treatment
>X for the masses, will the public, on average, be better
>off?" In that case ITT may be more appropriate.
Indeed. That's almost exactly as I said before, and it's particularly
relevant in terms of any self-administered 'treatments' (e.g.
over-the-counter drugs, dietary/lifestyle advice etc.). However, my major
caveats come when we are talking about 'prescribed' treatments - or, at
least, treatments that are in some sense overseen by healthcare
professionals (who, hopefully, will not allow an ineffective treatment to
carry on being used). The example you go on to give illustrates this quite
well. You write:
>Consider a, rather unlikely, example. Imagine that an
>on-treatment analysis showed that you could not say
>treatment X was better than standard, but the ITT did.
>One person may look at that and say, "No way I'm
>prescribing X, it doesn't work." On the other hand,
>someone else might say, "If I start my patients on X first,
>on average they will do better."
I talked about this in my last message. If an ITT analysis produces
'better' results for the test treatment than does a per-protocol analysis,
this strongly suggests that what one is seeing in the ITT analysis are the
effects of something other than the treatment one is evaluating. Most
likely, as I've said before, if a substantial proportion of patients
abandon treatment X early because it is not working, and instead are given
a known effective treatment', then what one's ITT analysis is largely
seeing are the effects of that 'known effective treatment', not of treatment X.
It's almost a joke to end up with the conclusion that "If I start my
patients on X first, on average they will do better" if the reason that is
true is that treatment X is so useless that anyone started on it will be
rapidly changed to something more effective! If the facts were as I've
suggested, I would be much more interested to know that far more patients
(than with standard treatment) had had to abandon treatment with X (and be
transferred onto some effective 'escape treatment') because it was ineffective.
Yes, in fact I would agree. I'm just trying not to be
too "declarative" in my language. :-)
>
> >Consider a, rather unlikely, example. Imagine that an
> >on-treatment analysis showed that you could not say
> >treatment X was better than standard, but the ITT did.
> >One person may look at that and say, "No way I'm
> >prescribing X, it doesn't work." On the other hand,
> >someone else might say, "If I start my patients on X first,
> >on average they will do better."
>
> It's almost a joke to end up with the conclusion that "If I start my
> patients on X first, on average they will do better" if the reason that is
> true is that treatment X is so useless that anyone started on it will be
> rapidly changed to something more effective! If the facts were as I've
> suggested, I would be much more interested to know that far more patients
> (than with standard treatment) had had to abandon treatment with X (and be
> transferred onto some effective 'escape treatment') because it was ineffective.
>
I can imagine an alternate "explanation" for such a finding.
Assuming X is a drug, suppose that administration of X,
while appearing ineffective, is actually priming the patient
for a better response when switched to "rescue" medication.
I don't know how likely this is, but I'm aware that there are
some cancer drugs that alone may not shrink tumours, but are
considered as possible combination therapies.
If that were the case, and X was otherwise "safe" and
"affordable" you _might_ consider a policy of X followed
by standard.
Obviously, one would not even entertain such scenarios if
only an ITT or only an on-treatment analysis were done.
Kevin
>I can imagine an alternate "explanation" for such a finding.
>Assuming X is a drug, suppose that administration of X,
>while appearing ineffective, is actually priming the patient
>for a better response when switched to "rescue" medication.
>I don't know how likely this is, but I'm aware that there are
>some cancer drugs that alone may not shrink tumours, but are
>considered as possible combination therapies.
>
>If that were the case, and X was otherwise "safe" and
>"affordable" you _might_ consider a policy of X followed
>by standard.
Sure, that's a theoretical possibility, but it would be an extremely rare
situation, and could only apply to a very limited number of
conditions/diseases being treated. Although, as you say, not unknown in
the sort of situations you mention, it is extremely unusual for a drug to
have a lasting effect on the efficacy of another drug given subsequently -
particularly if the first drug was so useless (or poorly tolerated) that
most patients were abandoning it very early.
... and, anyway, if one ever did think one might be dealing with that
situation, one probably would do 'the definitive study' - i.e. to compare Y
alone with 'X followed by Y'.
There is another thing one needs to bear in mind in relation to the sort of
'ludicrously anomalous' results that can theoretically arise when ITT is
used to analyse totally useless drugs. Clinical trials are often very long
in the planning and the execution, and the 'standard treatments' chosen as
comparators are often deliberately 'very well established' (a.k.a. old)
ones. That means that the 'standard treatment' being used as a comparator
in a trial can often be many years behind the most effective or most recent
treatments. That means that if patients bail out of a study 'because X is
useless', and get put onto some sort of 'rescue therapy' instead, that
'rescue therapy' might well be much more modern (and perhaps 'better') than
the standard treatment being used as a comparator in the trial. If that
happens, it obviously increases the chances that an ITT analysis will show
a totally useless treatment to be better than a 'standard treatment'
comparator.
Kindest Regards,
I have a question concerning the computation of the sample size
corrected for the dropout rate.

When I perform sample size calculation for researchers when they
develop their grant proposal, I usually take into account the
anticipated dropout rate and use the correction proposed by Lachin
(1981a, p. 97):
"the sample size required with dropouts N* will approximately
be given by N* = N/(1 - R)^2 where N is the sample size required with
no dropouts."
Derivation of the formula is explained in Lachin (1981b, p. 224).
When I read the sample size calculation part of some clinical
protocols (the majority, in fact!), the correction is generally the
following:
N* = N/(1 - R)
I don't want to "overcorrect" the sample size and thereby increase the
budget asked by the researchers.
What is the correct formula to use?
Best regards,
François Harel
Service de biostatistique
Centre de Recherche en oncologie de L'Université Laval
CHUQ - L'Hôtel-Dieu de Québec
Québec (Qc) - Canada
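(In code, the two corrections look like this; R is the anticipated dropout
rate. The simple form just restores the number of completers. If I
understand the rationale correctly, Lachin's squared form arises when
dropouts are retained in an ITT-style analysis with no treatment benefit,
so the effect itself is also diluted by a factor of (1 - R) - but that
reading should be checked against the derivation in Lachin (1981b).)

def inflate_simple(n, r):
    # Recruit enough that (1 - r) * n_star completers remain.
    return n / (1 - r)

def inflate_lachin(n, r):
    # Lachin (1981a): N* = N / (1 - R)^2.
    return n / (1 - r) ** 2

n, r = 100, 0.15
print(inflate_simple(n, r))  # ~ 117.6 to recruit
print(inflate_lachin(n, r))  # ~ 138.4 to recruit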
Lachin (1981a): Lachin JM. Introduction to sample size determination and
power analysis for clinical trials. Control Clin Trials. 1981
Jun;2(2):93-113. PMID: 7273794.

Lachin (1981b): Lachin JM, Marks JW, Schoenfield LJ, Tyor MP, Bennett PH,
Grundy SM, Hardison WG, Shaw LW, Thistle JL, Vlahcevic ZR. Design and
methodological considerations in the National Cooperative Gallstone Study:
a multicenter clinical trial. Control Clin Trials. 1981 Sep;2(3):177-229.
PMID: 7326939.
On 8 Nov, 10:51, "Francois Harel - CRHDQ Québec (Qc) Canada" wrote:
>I have a question concerning the computation of the sample size
>corrected for the dropout rate.
>
>When I perform sample size calculation for researchers when they
>develop their grant proposal, I usually take into account the
>anticipated dropout rate and use the correction proposed by Lachin
>(1981a, p. 97):
>"the sample size required with dropouts N* will approximately
>be given by N* = N/(1 - R)^2 where N is the sample size required with
>no dropouts."
>Derivation of the formula is explained in Lachin (1981b, p. 224).
>
>When I read the sample size calculation part of some clinical
>protocols (the majority, in fact!), the correction is generally the
>following:
>N* = N/(1 - R)
>
>I don't want to "overcorrect" the sample size and thereby increase the
>budget asked by the researchers.
>What is the correct formula to use?
I would certainly agree that the latter is the adjustment one normally
sees/expects. If one simply undertakes an 'analysis of completers' in
exactly the same way one would have done for a 'no dropouts' situation,
then that surely is adequate, since it ends up with the same analysed
sample size as if there were no dropouts. I'll have to try to look at
Lachin's paper, since it sounds as if he must be assuming some 'adjustment'
of his analysis when dropouts have occurred.
Kind Regards,
It will be interesting to learn what you discover, John.
In the admittedly somewhat extreme situation of a 90% dropout,
a planned sample size of 100 would end up with 10,000 subjects
recruited, if one uses the N/(1-R)^2 formula (and an expected
1,000 completers)! So there must be some other
reasoning, or conditions, in there ...
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861