sample size determination for comparing the "change from baseline" scores between two groups


shima younespour

Jan 21, 2013, 11:39:27 AM
to meds...@googlegroups.com
Dear all Medstats,
 
I'd like to know how to calculate the sample size needed for comparing the "change from baseline" scores between two groups?
 
 
Thank you in advance for your consideration
Shima

Neil Shephard

Jan 21, 2013, 11:47:44 AM
to meds...@googlegroups.com
On 21 January 2013 16:39, shima younespour <shima.yo...@gmail.com> wrote:
>
> I'd like to know how to calculate the sample size needed for comparing the
> "change from baseline" scores between two groups?
>

Not a direct answer, but you may wish to consider a different approach
to your outcome in light of the following...

@article{vickers2001,
abstract = {{BACKGROUND: Many randomized trials involve measuring a
continuous outcome - such as pain, body weight or blood pressure - at
baseline and after treatment. In this paper, I compare four
possibilities for how such trials can be analyzed: post-treatment;
change between baseline and post-treatment; percentage change between
baseline and post-treatment and analysis of covariance (ANCOVA) with
baseline score as a covariate. The statistical power of each method
was determined for a hypothetical randomized trial under a range of
correlations between baseline and post-treatment scores. RESULTS: ANCOVA
has the highest statistical power. Change from baseline has acceptable
power when correlation between baseline and post-treatment scores is
high; when correlation is low, analyzing only post-treatment scores has
reasonable power. Percentage change from baseline has the lowest
statistical power and was highly sensitive to changes in variance.
Theoretical considerations suggest that percentage change from
baseline will also fail to protect from bias in the case of baseline
imbalance and will lead to an excess of trials with non-normally
distributed outcome data. CONCLUSIONS: Percentage change from baseline
should not be used in statistical analysis. Trialists wishing to
report this statistic should use another method, such as ANCOVA, and
convert the results to a percentage change by using mean baseline
scores.}},
address = {Integrative Medicine Service, Biostatistics Service,
Memorial Sloan-Kettering Cancer Center, 1275 York Avenue New York, New
York 10021, USA. vick...@mskcc.org},
author = {Vickers, Andrew},
citeulike-article-id = {895057},
citeulike-linkout-0 = {http://dx.doi.org/10.1186/1471-2288-1-6},
citeulike-linkout-1 = {http://view.ncbi.nlm.nih.gov/pubmed/11459516},
citeulike-linkout-2 = {http://www.hubmed.org/display.cgi?uids=11459516},
doi = {10.1186/1471-2288-1-6},
issn = {1471-2288},
journal = {BMC Medical Research Methodology},
keywords = {methodology, percentagechange, statistics},
number = {1},
pages = {6+},
pmid = {11459516},
posted-at = {2011-06-08 11:32:49},
priority = {2},
title = {{The use of percentage change from baseline as an outcome
in a controlled trial is statistically inefficient: a simulation
study}},
url = {http://dx.doi.org/10.1186/1471-2288-1-6},
volume = {1},
year = {2001}
}
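
To make the ranking in that abstract concrete, here is a minimal Python sketch (mine, not taken from the paper) of the large-sample variance of the treatment-effect estimate under three of the four analyses, relative to comparing post-treatment scores alone. It assumes equal SDs at baseline and follow-up, the same baseline/follow-up correlation rho in both arms, and a randomized design; the function name variance_factor is purely illustrative. Percentage change is left out.

```python
def variance_factor(method: str, rho: float) -> float:
    """Variance of the estimated treatment effect relative to a
    post-treatment-only comparison (assumes equal SDs at both visits
    and a common baseline/follow-up correlation rho)."""
    if method == "post":
        return 1.0                  # baseline ignored
    if method == "change":
        return 2.0 * (1.0 - rho)    # Var(w1 - w0) = 2 * sd^2 * (1 - rho)
    if method == "ancova":
        return 1.0 - rho ** 2       # residual variance after adjusting for w0
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    # The required sample size is proportional to these factors.
    for rho in (0.2, 0.5, 0.8):
        row = ", ".join(
            f"{m}={variance_factor(m, rho):.2f}"
            for m in ("post", "change", "ancova")
        )
        print(f"rho={rho}: {row}")
```

Under these assumptions the change score only beats the post-treatment comparison when rho exceeds 0.5, and ANCOVA is never worse than either, because 2(1 - rho) - (1 - rho^2) = (1 - rho)^2 >= 0.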


Neil

--
"To kill an error is as good a service as, and sometimes even better
than, the establishing of a new truth or fact" - Charles Darwin

Neil Shephard
Clinical Trials Research Unit
University of Sheffield

Marc Schwartz

Jan 21, 2013, 11:52:48 AM
to meds...@googlegroups.com
It will depend upon how you are going to test the difference between the two groups. I am presuming that you have one baseline and one follow-up measure for each subject; in other words, that this is not a repeated measures scenario. If it were, simulation would be the likely approach.

If you are going to use an ANCOVA based approach (see: http://pubmedcentralcanada.ca/pmcc/articles/PMC1121605/pdf/1123.pdf), see this post on R-Help from 2010, which points to an online calculator and replicates the results in R.



Regards,

Marc Schwartz

Thompson,Paul

Jan 21, 2013, 11:55:52 AM
to meds...@googlegroups.com

I use a SAS/IML macro powerlib, written by Keith Muller and his colleagues.

 

@ARTICLE{Johnson-2004-1,
  author = {Johnson, Jacqueline L. and Muller, Keith E. and Slaughter, James C.
               and Gurka, Matthew J. and Gribbin, Matthew J. and Simpson, Sean L.},
  title = {{POWERLIB}: {SAS/IML} Software for Computing Power in Multivariate
               Linear Models},
  journal = {Journal of Statistical Software},
  year = {2009},
  volume = {30},
  pages = {1-27},
  number = {5},
  owner = {THOMPSOP},
  timestamp = {2010.10.10}
}

 

It can be used for a variety of cases such as this.

--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules
 
 
 




Frank Harrell

Jan 22, 2013, 9:16:01 AM
to meds...@googlegroups.com
The fact that anyone is still computing change from baseline is a worrisome comment about statistical education in my opinion.
Frank


BXC (Bendix Carstensen)

Jan 22, 2013, 9:39:31 AM
to meds...@googlegroups.com
The fact that anyone is still using formulae instead of a simple simulation approach for power (and precision!) calculations is a worrisome comment about computational education in my opinion.

Bendix Carstensen

Thompson,Paul

Jan 22, 2013, 9:54:59 AM
to meds...@googlegroups.com
Powerlib actually handles linear models with covariances between the dependent variables. I didn't make it clear that it is not doing change from baseline, but repeated measures (RM).

John Whittington

Jan 22, 2013, 10:01:39 AM
to meds...@googlegroups.com
At 15:39 22/01/2013 +0100, BXC (Bendix Carstensen) wrote:
>The fact that anyone is still using formulae instead of a simple
>simulation approach for power (and precision!) calculations is a worrisome
>comment about computational education in my opinion.

I totally agree with your often-voiced view that simulation is the ideal
way of determining power (or estimating required sample size) - as you
always say, it is applicable to absolutely any situation, and has the great
advantage of forcing one to develop a full understanding of one's data and
how one is going to analyse it.

However, if the data and planned method of analysis are simple/standard,
and therefore accommodated by an available 'formula' (or, at least,
'computational method'), then I cannot really see that simulation would
offer any appreciable advantage (or a different answer), would it?

Kind Regards,


John

----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------

david braunholtz

Jan 22, 2013, 10:08:18 AM
to meds...@googlegroups.com
I wondered what software Bendix uses or recommends for doing sample-size simulations?

David


BXC (Bendix Carstensen)

Jan 22, 2013, 12:40:03 PM
to meds...@googlegroups.com
Personally I use R, but I guess that Stata and SAS are equally well suited, I'm just not very familiar with simulation in those packages.

Basically you make a few loops over the scenarios you are interested in, and inside those a loop of, say, 1:500, each iteration generating a dataset and an analysis of it, and collecting the relevant p-values, estimates and standard errors of the estimates.

Once that is collected it is easy to show how power and precision depend on the scenarios covered in the outer loops.
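
For readers who do not use R, the loop structure described above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the program from the link below: the names simulate_trial and power_grid, the effect size, SD, correlations and group sizes are all invented, the outcome is assumed bivariate normal, and the p-value uses a normal approximation to the two-sample test on change scores rather than an exact t-test.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def simulate_trial(n, delta, sd, rho, rng):
    """Simulate one two-arm trial with n subjects per arm and return
    (estimate, standard error, p-value) for the difference in mean change."""
    changes = {0: [], 1: []}
    for arm in (0, 1):
        for _ in range(n):
            base = rng.gauss(0.0, sd)
            # Follow-up has the same SD as baseline and correlation rho with it;
            # the treated arm (arm == 1) is shifted by delta.
            noise = rng.gauss(0.0, sd * sqrt(1.0 - rho ** 2))
            follow = rho * base + noise + arm * delta
            changes[arm].append(follow - base)
    est = mean(changes[1]) - mean(changes[0])
    se = sqrt(stdev(changes[0]) ** 2 / n + stdev(changes[1]) ** 2 / n)
    # Normal approximation to the two-sided p-value (an assumption of this sketch).
    p = 2.0 * (1.0 - NormalDist().cdf(abs(est / se)))
    return est, se, p


def power_grid(ns, rhos, delta=5.0, sd=10.0, n_sim=500, alpha=0.05, seed=1):
    """Outer loops over scenarios, inner loop over simulated datasets.
    Returns {(n, rho): (estimated power, mean standard error)}."""
    rng = random.Random(seed)
    results = {}
    for n in ns:
        for rho in rhos:
            sims = [simulate_trial(n, delta, sd, rho, rng) for _ in range(n_sim)]
            power = sum(p < alpha for _, _, p in sims) / n_sim
            precision = mean(se for _, se, _ in sims)
            results[(n, rho)] = (power, precision)
    return results


if __name__ == "__main__":
    grid = power_grid(ns=(25, 50, 100), rhos=(0.2, 0.5, 0.8))
    for (n, rho), (power, se) in sorted(grid.items()):
        print(f"n={n:3d} rho={rho:.1f} power={power:.2f} mean SE={se:.2f}")
```

Swapping the analysis inside simulate_trial (for example to a baseline-adjusted model) is the only change needed to compare methods under the same scenarios.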

The reason I use R is that the results are most easily collected in a multidimensional array, and R has some very powerful tools to handle arrays, particularly if you want to make computations or plots of slices of them, which is what is needed here.

But a few data steps and proc tabulate in SAS would probably do the job too.

If you look in http://bendixcarstensen.com/Teaching/SampSize/
you will find a .pdf that explains how to do it in R, and an R program that does it. It should be self-contained, so you can tinker with it yourself.

b.r.
Bendix

Swank, Paul R

Jan 22, 2013, 12:43:31 PM
to meds...@googlegroups.com
Yes, I use SAS to simulate data for power analyses and it is very straightforward. Because of "by" processing, it can be handled in the data step with a single procedure without the need for macros.

Dr. Paul R. Swank, Professor
Health Promotion and Behavioral Sciences
School of Public Health
University of Texas Health Science Center Houston



Thompson,Paul

Jan 22, 2013, 12:48:00 PM
to meds...@googlegroups.com
I'm gonna come down on the other side of this. Simulation is general purpose, and applicable to many situations. It is also not available to most users. Anybody reading this list is not "most users". Many applied users could not write a simulation-based method. I hope that many applied users can do their own power analyses for simple, cut-n-dried situations.

In addition, while a simulation is general purpose, it is also more difficult to audit, evaluate, and examine. There are plenty of places in writing a simulation in which small choices have big effects, and it is not always easy to determine if the simulation is done correctly. Thus, given the choice between a defined solution to solve a problem and a simulation based approach, I would be inclined to favor the defined solution.

That's not to say that there are closed-form defined methods for all situations. In fact, with the exploding power of computation methods, there are many situations in which simulation is all that we have. But defined special tools have advantages as well.


Greg Snow

Jan 22, 2013, 3:40:18 PM
to meds...@googlegroups.com
Paul (and others),

While I agree that canned procedures/formulas work fine for simple cases like t-tests, my experience with the real world is that projects quickly grow beyond the simple procedures. What starts as a simple comparison that could be done with a 2 sample t-test often gets expanded at the planning stage when someone says: maybe we should adjust for X or stratify on Y, etc. and you have to decide between using the simple formula and ignoring planned adjustments, or going beyond the simple formulas.

You (Paul) say "There are plenty of places in writing a simulation in which small choices have big effects", and I agree with that statement, though I think it argues more for simulation than for canned procedures. A while back a coworker was trying to determine the sample size for a moderately complex design. He had found two different tools for this, but they gave him two very different answers. I did a simulation and came up with a third answer. We agreed that the differences were probably due to different assumptions, but we could not find out what assumptions the first two methods had made (closed-source software) and therefore had no idea whether those assumptions were reasonable for his project. On the other hand, we knew exactly which assumptions had gone into my simulations (he suggested a couple of tweaks to some of the parameters and we reran the simulations for the final determination). I expect that many people have used one of the two original programs and designed studies based on the results without even realizing what assumptions were being made, let alone thinking about whether those assumptions were reasonable for their situation.

I think that it is easier to determine if a simulation is done correctly (provided the simulation code is available, which is generally not the problem) than for some defined formula versions.

Though I do admit that I still use the power.t.test function in R rather than a simulation if the analysis will really just be a t-test.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg...@imail.org
801.408.8111

Steve Simon, P.Mean Consulting

Jan 23, 2013, 10:20:00 AM
to meds...@googlegroups.com, shima younespour
On 1/21/2013 10:39 AM, shima younespour wrote:

> I'd like to know how to calculate the sample size needed for
> comparing the "change from baseline" scores between two groups?

In an odd coincidence, I just talked about this in the latest issue of
my newsletter, The Monthly Mean. Take a look at
--> http://www.pmean.com/news/201211.html#2

It doesn't show you the formula, but it points out the critical need for
additional information in a study like this. If you get the intraclass
correlation or the within subject variation or the standard deviation of
the change score, and if you commit to one particular analysis approach,
I can show you the formula that you would use.
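
As a rough illustration of how those three pieces of information feed into a sample size, here is a minimal Python sketch. It is not the formula from the newsletter; it is the standard normal-approximation formula for a two-sample comparison of mean change scores, n per group = 2 (z_{1-alpha/2} + z_{power})^2 SD_change^2 / delta^2. It assumes a simple variance-components model in which SD_change^2 = 2 * SD_within^2 = 2 * SD_total^2 * (1 - ICC); the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sd_change_from_icc(sd_total: float, icc: float) -> float:
    """SD of the change score from the total SD and the intraclass correlation
    (assumes the same total SD at baseline and follow-up)."""
    return sd_total * sqrt(2.0 * (1.0 - icc))


def sd_change_from_within(sd_within: float) -> float:
    """SD of the change score from the within-subject SD."""
    return sqrt(2.0) * sd_within


def n_per_group_change(delta: float, sd_change: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group to detect a between-group difference delta in mean
    change (two-sided test, normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) * sd_change / delta) ** 2)


if __name__ == "__main__":
    # Illustrative inputs only: total SD 10, ICC 0.7, target difference 5.
    sd_c = sd_change_from_icc(sd_total=10.0, icc=0.7)
    print(n_per_group_change(delta=5.0, sd_change=sd_c))
```

Because the normal approximation ignores the degrees of freedom of the t-test, the result is slightly optimistic for small trials; a t-based calculation or a simulation would add a subject or two per group.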

In another odd coincidence, I highlighted a book that talks about some
of the controversies associated with change scores versus analysis of
covariance in the same newsletter:
--> http://www.pmean.com/news/201211.html#book

And I have to disagree with Neil Shephard and Frank Harrell who believe
that change scores are inappropriate. The fact that ANCOVA is more
efficient is only part of the story. Also important is clinical
relevance. Suppose you are looking at an exercise program intended to
help people lose weight. You weigh people at baseline (w0), randomize
half to the intervention and half to control, and then weigh them again
six months later (w1). The ANCOVA analysis would answer the question "Is
the average w1-beta*w0 value significantly different between the two
groups?". And I can guarantee that beta will be some crazy number like
0.763. A change score, w1-w0, measures something of direct interest to
both the patients and the doctors: how much weight did you lose?
Clinical relevance trumps efficiency here in my opinion.

In most studies, change scores have a simple and direct clinical
interpretation: how much less pain are you experiencing? how much did
your cholesterol levels drop? how much better is your quality of life?
ANCOVA produces a more troublesome clinical interpretation.

Now if you wanted to, you could model the change score and include the
baseline as a covariate. It is possible, for example, that the people
who were the most overweight noticed the greatest degree of weight loss.
And it will be identical to the ANCOVA analysis, except that your beta
will be shifted by exactly 1.0 units.
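
That equivalence is easy to check numerically. Below is a minimal Python sketch on made-up weight data: the sample size, the true coefficients and the noise level are invented for illustration, and the small ols helper is a hypothetical stand-in written only to avoid external libraries. It fits w1 on group and w0, then w1 - w0 on the same predictors, and compares the coefficients.

```python
import random


def ols(columns, y):
    """Least-squares coefficients from the normal equations,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(columns)
    # Augmented system [X'X | X'y].
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(k)
    ]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [x - f * p for x, p in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up data: baseline weight w0, alternating group assignment, follow-up w1.
rng = random.Random(42)
n = 200
group = [i % 2 for i in range(n)]
w0 = [rng.gauss(80.0, 12.0) for _ in range(n)]
w1 = [10.0 + 0.85 * b - 3.0 * g + rng.gauss(0.0, 4.0) for b, g in zip(w0, group)]
ones = [1.0] * n

# Coefficients are ordered (intercept, group, baseline).
b_ancova = ols([ones, group, w0], w1)
b_change = ols([ones, group, w0], [f - b for f, b in zip(w1, w0)])

print("group effect:  ", b_ancova[1], b_change[1])
print("baseline slope:", b_ancova[2], b_change[2])
```

The group effect is the same in both fits up to rounding, and the two baseline slopes differ by 1, which is what the paragraph above claims.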

So the way I think about it, is that change scores are the primary
focus, and you might also look at an adjusted change score if you
believe that the benefit varies by the severity at baseline. But to
argue for ANCOVA instead of change scores is to argue that w1-0.763*w0
actually has a clinical meaning.

Steve Simon, n...@pmean.com, Standard Disclaimer.
Sign up for the Monthly Mean, the newsletter that
dares to call itself average at www.pmean.com/news

Frank Harrell

Jan 23, 2013, 3:57:48 PM
to meds...@googlegroups.com, shima younespour
With all respect, I could not disagree more with the statement that change scores can be clinically relevant. The reason I feel that way is that in most situations a change score cannot be interpreted without reference to the baseline anyway. The weight example is a particularly good example of this. Weight doesn't change by an additive increment, nor does it change by %. Therefore the goal of an RCT should be to estimate the difference in final weight between treatments, adjusted for baseline (which may even have a nonlinear relationship). Besides dependence on baseline [e.g., it is usually statistically incorrect to analyze change from baseline without adjusting for baseline as a covariate], change scores assume the variable is perfectly transformed. The only thing worse than change from baseline is percent change from baseline.

Frank

John Sorkin

Jan 23, 2013, 4:28:41 PM
to meds...@googlegroups.com
Professor Harrell,
I often need to defend the statements that you make in your email message. Can you suggest any references that I can show to my boss when he insists I do an analysis "his way"?
Thank you,
John
 
 
 
 
John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

Frank Harrell

Jan 24, 2013, 8:12:18 AM
to meds...@googlegroups.com
John - please see the following, which also contains references: http://biostat.mc.vanderbilt.edu/ManuscriptChecklist - see the topics "Inappropriate choice of measure of change" and "Use of change scores in parallel-group designs".

Frank