Re: {MEDSTATS} ANCOVA


Peter Flom

Nov 2, 2009, 4:44:04 PM
to MedStats
Jeff <ebs...@gmail.com> wrote
>Greetings all,
>
>I have a question about ANCOVA which I've wondered about for some
>time. I've seen several sources (textbooks) which explicitly state
>that covariates used in this analysis must be of continuous scale.
>However, I've also seen a number of studies which have used gender as
>a covariate. My understanding is that ANCOVA uses linear regression
>to identify the variance associated with the covariate, and since
>regression allows the use of dichotomous independent variables, is it
>acceptable to use a dichotomous variable as a covariate in ANCOVA?
>
>Whatever input you have would be appreciated.
>

More confusion caused by the diverse origins of statistics.

ANCOVA, ANOVA, and OLS regression are all the same model in matrix terms:

Y = XB + e

The various X can be interval, or categorical, or dichotomous, or whatever.

The ANOVA/ANCOVA stuff comes from agriculture, where they compared plots of land.
The regression stuff comes from astronomy and geography, where they measured distances.

But it's all the same stuff.
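
To make the Y = XB + e point concrete for the original question, here is a minimal sketch in Python with NumPy (the language is my choice; nothing in the thread specifies one). The data are simulated and every name (`group`, `female`, the effect sizes) is hypothetical. It only illustrates that a 0/1 covariate is one more column of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120

# Hypothetical data: a 3-level treatment factor and a dichotomous covariate.
group = rng.integers(0, 3, size=n)      # levels 0, 1, 2
female = rng.integers(0, 2, size=n)     # 0/1 indicator (the dichotomous covariate)
y = 10 + np.array([0.0, 1.5, 3.0])[group] + 2.0 * female + rng.normal(0, 1, n)

# Design matrix X: intercept, two treatment dummies (level 0 is the reference),
# and the 0/1 covariate entered exactly like any other column.
X = np.column_stack([
    np.ones(n),
    (group == 1).astype(float),
    (group == 2).astype(float),
    female.astype(float),
])

# Least-squares estimate of B in Y = XB + e.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
names = ["intercept", "group1_vs_0", "group2_vs_0", "female"]
for name, b in zip(names, beta):
    print(f"{name:12s} {b: .3f}")
```

The coefficient on `female` is the adjusted difference between the two sexes, and the two group coefficients are treatment contrasts adjusted for it.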

Peter

Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing: http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter: @peterflom

Ted Harding

Nov 2, 2009, 5:28:17 PM
to meds...@googlegroups.com
It may be time to clarify what distinguishes ANCOVA *as such*
from (or rather within) general regression.

First of all, Jeff is on the right lines in observing that
ANCOVA is based on linear regression, and you can indeed use
any kind of variable (continuous, categorical) which you
can use in a linear regression.

The point about ANCOVA is the following. Say you have an
investigation whose objective is to study the relationship
between a dependent variable Y and various independent
variables X1, X2, ... (say, in the agricultural context,
you might want to know the effect of different fertiliser
mixes on crop yield; so you design an experiment using the
fertiliser mixes as treatments).

But you also have data on other variables, say Z1, Z2, ...
which are relevant to the yield but are not controlled in
the experiment -- say you have historical data providing an
index of general fertility of the soil in the different
experimental areas; you have rainfall, temperature and
sunlight data over the experimental period for the different
areas; etc.

These other variables Z are called Covariates: they influence
the outcome, but they are not part of the variables whose
influence you really want to study. So the aim of the Analysis
of Covariance is to allow for their effects, and obtain estimates
of your treatment effects uncorrupted by the influence of the
Covariates.

This can be worked out to be equivalent to regressing the
residuals *after fitting the yield Y to the covariates*
as Dependent Variable, on the Treatments (likewise adjusted
for the covariates) as Independent Variables.

You can also, with modern modelling techniques, do this in one
pass through a standard regression approach, *provided you set
up a suitable system of contrasts* so that the above "regression
of residuals on treatments" is achieved.
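
A small numerical check of the residual-regression idea, sketched in Python with NumPy on simulated data (all names and numbers are hypothetical). One caveat, as I understand it: the one-pass and two-step estimates agree exactly when the treatment indicator is also residualised on the covariates (the Frisch-Waugh-Lovell result). If only Y is residualised, the two agree only when treatments and covariates are uncorrelated in the sample, as in a balanced design.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical data: treatment indicator T (2 arms) and one covariate Z that is
# deliberately correlated with T, to show why the distinction matters.
T = rng.integers(0, 2, size=n).astype(float)
Z = 0.8 * T + rng.normal(0, 1, n)
y = 1.0 + 2.0 * T + 1.5 * Z + rng.normal(0, 1, n)

def ols(X, y):
    """Return least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
C = np.column_stack([ones, Z])           # intercept + covariate

# One pass: regress y on intercept, covariate and treatment together.
b_full = ols(np.column_stack([C, T]), y)
effect_one_pass = b_full[2]

# Two steps: residualise BOTH y and T on the covariates, then regress.
y_res = y - C @ ols(C, y)
T_res = T - C @ ols(C, T)
effect_two_step = ols(T_res[:, None], y_res)[0]

# Naive variant: residualise only y, then regress on the raw T (with intercept).
effect_naive = ols(np.column_stack([ones, T]), y_res)[1]

print(effect_one_pass, effect_two_step, effect_naive)
```

Here the naive variant is attenuated relative to the other two because Z and T are correlated by construction.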

Of course, you have to be careful about possible confounding
of treatments with covariates. In agricultural experiments
this would typically be handled by imposing the same design
on each of the several areas. In other domains of study, this
may not be so easy! And there is also the question of interaction
between treatments and covariates. It's not an entirely simple
matter ...

Hoping this helps,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Nov-09 Time: 22:28:15
------------------------------ XFMail ------------------------------

Peter Flom

Nov 2, 2009, 6:27:32 PM
to meds...@googlegroups.com
Ted.H...@manchester.ac.uk wrote

Well ... maybe *I'm* confused. Or maybe the terms are used differently in different fields. In my experience, in the social sciences and (a little less experience) in medicine, the terms are used as follows:

ANOVA - all the IVs are categorical
ANCOVA - some IVs are continuous, some categorical but there's not necessarily the distinctions Ted makes, above.
Regression - EITHER all the IVs are continuous OR there's a mix.

It's certainly true, in any field, that we may be interested in some variables only as covariates, but these variables need not be categorical. For example, in medicine, age is often a covariate (in this sense) in survival analysis. Everyone knows age is strongly related to death, so we better remove its effects. But, AFAIK, this can be done by simply including it in the regression.

Indeed, in Ted's example, it seems to me that the covariates are continuous, while the independent variable is categorical.

Very confusing!

And not made clearer by different people using terms in different ways.

BXC (Bendix Carstensen)

Nov 3, 2009, 4:01:59 AM
to meds...@googlegroups.com
In the olden days before matrix inversion was readily available on computers, certain linear models could be fitted smartly by various algorithms designed for hand calculation.
One was called ANOVA, another ANCOVA and others were around too.

These terms are now irrelevant and obsolete, as anyone with a computer can fit any linear model.

My advice is therefore NEVER to use any of these obsolete terms, but instead make clear what linear model was used, how it was parametrized and what the derived estimates mean in relation to the subject matter.

My prejudice is that anyone using terms ANOVA and ANCOVA in describing an analysis hasn't really understood what goes on in his/her analysis. This is of course just an unfounded personal prejudice....

Best regards,
Bendix Carstensen
_______________________________________________

Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
b...@steno.dk http://www.biostat.ku.dk/~bxc
www.steno.dk

Bruce Weaver

Nov 3, 2009, 9:43:33 AM
to MedStats
On Nov 2, 6:27 pm, Peter Flom <peterflomconsult...@mindspring.com>
wrote:
>
> Well ... maybe *I'm* confused.  Or maybe the terms are used differently in different fields.  In my experience, in the social sciences and (a little less experience) in medicine, the terms are used as follows:
>
> ANOVA - all the IVs are categorical
> ANCOVA - some IVs are continuous, some categorical but there's not necessarily the distinctions Ted makes, above.
> Regression - EITHER all the IVs are continuous OR there's a mix.

I think that for the classical definition of "ANCOVA", you'd have to
add that there are no interactions between/among categorical &
continuous explanatory variables.

I agree with Bendix that these names are around for historical reasons
connected with ease of hand computation, and we all ought to be
switching to more appropriate up-to-date names. Journal editors &
reviewers may not always agree, however. ;-)

Speaking of history, I remember being taught (in the 80's) that if
parallel lines did not provide a good fit to the data, one should not
use ANCOVA, and would have to revert to some kind of ANOVA model
(possibly with the covariate carved into a bunch of categories so that
it could be included as another factor). It seems not to have
occurred to some authors at that time that one could run a linear
model with a covariate x factor interaction, and compare groups at
selected values of the covariate, etc. Perhaps it was because the
software to run those models was still not widely available. We
certainly did not have access to stats packages as students, and did
all of our calculations with hand calculators (using computational
formulae for sums of squares, etc). Also, this was in psychology,
where ANOVA is king--at least for the experimentalists.
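
For illustration, the covariate x factor model mentioned above can be sketched as follows, in Python with NumPy on simulated data (the variable names, slopes and chosen covariate values are all hypothetical). It fits non-parallel lines and then estimates the group difference at selected covariate values with a linear contrast.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical data: two groups whose regression lines on covariate x are NOT parallel.
g = rng.integers(0, 2, size=n).astype(float)
x = rng.uniform(20, 60, size=n)
y = 5 + 0.10 * x + g * (-4 + 0.15 * x) + rng.normal(0, 1, n)

# Columns: intercept, group, covariate, group x covariate interaction.
X = np.column_stack([np.ones(n), g, x, g * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated covariance matrix of the coefficients: sigma^2 (X'X)^-1.
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)

def group_difference_at(x0):
    """Estimated group 1 minus group 0 difference at covariate value x0, with its SE."""
    c = np.array([0.0, 1.0, 0.0, x0])    # contrast: b_group + b_interaction * x0
    return c @ beta, np.sqrt(c @ cov_beta @ c)

for x0 in (25, 40, 55):
    diff, se = group_difference_at(x0)
    print(f"x = {x0}: difference = {diff: .2f} (SE {se:.2f})")
```

Dropping the `g * x` column gives back the parallel-lines (classical ANCOVA) model, so the two can be compared directly.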

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Helena Oakey

Nov 3, 2009, 5:51:43 PM
to meds...@googlegroups.com
Hi Bendix

I agree that one should be explicit about the model that has been fitted.
However, the terms ANOVA and ANCOVA are fairly entrenched and familiar
particularly in agriculture. Having previously worked in that area for a
number of years I doubt that a move away from them is likely to happen
however appealing!

Cheers

Helena


Dr Helena Oakey 
Senior Statistician


ARCH: Australian Research Centre for Health of Women and Babies
Division:
Discipline of Obstetrics & Gynaecology
School of Paediatrics and Reproductive Health
The University of Adelaide

Level 1, Queen Victoria Building
Women's & Children's Hospital
72 King William Road
NORTH ADELAIDE SA 5006

Phone:     +61 8 8161 7620
Fax: +61 8 8161 7652
Email: helena...@adelaide.edu.au
Web: www.adelaide.edu.au/arch


CRICOS Provider Number 00123M

ehsan sabaghian

Nov 4, 2009, 6:29:36 AM
to meds...@googlegroups.com
Hi
See Wikipedia, which explains ANCOVA as follows:

Analysis of covariance (ANCOVA) is a general linear model with one continuous outcome variable (quantitative) and one or more factor variables (qualitative). ANCOVA is a merger of ANOVA and regression for continuous variables. ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative predictors (covariates) account.

The link is http://en.wikipedia.org/wiki/Analysis_of_covariance

Ehsan Sabaghian

Frank Harrell

Nov 4, 2009, 9:40:14 AM
to MedStats
Bruce - I just want to note that the ad hoc remedies to
non-parallelism result in an analysis that is worse in many ways than
an ANCOVA that falsely assumed parallelism.

Frank

BXC (Bendix Carstensen)

Nov 4, 2009, 2:38:02 PM
to meds...@googlegroups.com
I agree that the outlook for the final burial of ANOVA and ANCOVA as terms is pretty bleak.

My main point was merely that these terms are often used instead of a clear specification of the response, and in particular of the explanatory variables.

Best regards,
Bendix

Doug Altman

Nov 4, 2009, 2:52:49 PM
to meds...@googlegroups.com
This is a problem with many labels. Think of "intention to treat analysis" or "double blind" for randomised trials, "prospective" or "retrospective" for observational studies, "normal range", etc.

ANOVA or ANCOVA would always be insufficient as a description of the statistical analysis, but so would multiple regression. The principle should be the following:

“Describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results.”       [International Committee of Medical Journal Editors: www.icmje.org]

Doug

_____________________________________________________

Doug Altman
Professor of Statistics in Medicine
Centre for Statistics in Medicine
University of Oxford
Wolfson College Annexe
Linton Road
Oxford OX2 6UD

email:  doug....@csm.ox.ac.uk
Tel:    01865 284400 (direct line 01865 284401)
Fax:    01865 284424
www:     http://www.csm-oxford.org.uk/

EQUATOR Network - resources for reporting research
www: http://www.equator-network.org/



pjiman1

Jan 11, 2020, 6:49:14 PM
to MedStats
hello all. first time poster. I came across this thread below and I would like to extend the original post. 

My comment is that in my field, psychology, there is a question about when to use ANOVA vs. ANCOVA. Ostensibly, ANOVA tests the differences in means among groups, and ANCOVA adds a covariate to this analysis for statistical control. When we gather data, we often gather demographic variables along with our predictor and outcome variables. If the hypothesis is a difference in means among three groups and the selected analysis is ANOVA, the question arises: why not use an ANCOVA and include the covariates for statistical control? That would turn every analysis intended to compare group means into an ANCOVA. In other words, assuming a sufficient sample size, why not conduct an ANCOVA instead of an ANOVA whenever covariates, especially demographics, are available? I would rather base the choice between ANOVA and ANCOVA on conceptual and/or statistical grounds, but I can't seem to find such a justification.


After reading this thread below, it seems that (a) the terms ANOVA and ANCOVA are relics from a past when hand computations were necessary, (b) ANOVA and ANCOVA are, as I have come to realize, extensions of OLS, similar to multiple regression, and (c) it is more important to specify the model than to refer to the analysis as an ANOVA or ANCOVA.


To answer my own question then, use ANCOVA rather than perseverate about the difference between ANOVA and ANCOVA, and focus attention on specifying the model. 
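
A rough sketch of what "include the covariate or not" changes numerically, in Python with NumPy on simulated data (the design, effect sizes and the name `age` are hypothetical). It assumes a randomised two-group comparison with a covariate that is strongly related to the outcome. In that setting the adjusted model typically estimates the same group effect with a smaller standard error. With non-randomised groups, adjustment can also change what is being estimated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Hypothetical randomised two-group study with a strongly prognostic covariate.
g = rng.integers(0, 2, size=n).astype(float)
age = rng.normal(0, 1, n)                       # standardised covariate
y = 1.0 * g + 2.0 * age + rng.normal(0, 1, n)

def fit(X, y):
    """OLS fit returning coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

ones = np.ones(n)
b_anova, se_anova = fit(np.column_stack([ones, g]), y)          # group only
b_ancova, se_ancova = fit(np.column_stack([ones, g, age]), y)   # group + covariate

print("group effect, no covariate  :", b_anova[1], "SE", se_anova[1])
print("group effect, with covariate:", b_ancova[1], "SE", se_ancova[1])
```

Both fits are the same linear model with different columns in X, which is the sense in which the choice is about specifying the model and not about picking a named procedure.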


Thanks for your comments, and any articles or website references are appreciated. 

Peter

Allan Reese

Jan 12, 2020, 6:20:53 AM
to meds...@googlegroups.com
Peter 1 has spotted a basic truth: many *users* of statistics are stuck in the past and have never escaped the blinkers of the first applied statistics module they endured while studying their real interest, with statistics learned as a cookbook even if that was not directly the teacher's intention. But ANOVA etc. are not "extensions" of OLS, just special cases, while OLS is itself a special case of the GLM (as shown by Nelder & Wedderburn 1972; for a general reference see "Generalized Linear Models", McCullagh & Nelder, Chapman & Hall), and GLMs in turn are a subset of statistical models. Hence, indeed, the distinction arose on the basis of naming hand, or at least paper-based, calculations.

The problem of jargon creating silos of knowledge is exemplified by Peter 1 using the term "perseverate", which is in the dictionary as a psychological term that I've never met before. I have, however, often met the concept when scientists asked me to perform a statistical test I'd never heard of (leading them to assume me a charlatan); research then found many were back-of-the-envelope eponymous tests published in some specific journal in the 1940s.

Allan

--
R Allan Reese, Dorchester Dorset

Peter Flom

Jan 12, 2020, 7:30:58 AM
to meds...@googlegroups.com

ANOVA, ANCOVA and linear regression are all the same model. They are all

Y = b_0 + b_1x_1 + b_2x_2 + ... + b_px_p + e

It took a while for people to realize that all these models were the same because they developed in different fields and the people in those fields didn't talk to each other.

In ANOVA the x variables are all categorical. In ANCOVA, some are categorical and some continuous. But the linear model can handle x variables of any sort (although ordinal ones are tricky).

Also, the traditional output (from either hand calculations or computer programs) of ANOVA and regression looks different but means the same thing.

These days, in SAS, for example, PROC GLM is used a lot more than PROC ANOVA. And, in R, the lm function handles all three types of analyses.
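
As a quick check that the ANOVA table and the regression output "mean the same thing", here is a sketch in Python with NumPy, used as a stand-in for PROC GLM or lm. The data are simulated and the group sizes and means are hypothetical. It computes the one-way ANOVA F statistic from sums of squares, then the overall F from a dummy-coded regression.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: three groups of 20 observations each.
groups = np.repeat([0, 1, 2], 20)
y = np.array([10.0, 11.0, 13.0])[groups] + rng.normal(0, 2, groups.size)
n, k = y.size, 3

# Classical one-way ANOVA via sums of squares.
grand_mean = y.mean()
group_means = np.array([y[groups == j].mean() for j in range(k)])
ss_between = sum((groups == j).sum() * (group_means[j] - grand_mean) ** 2 for j in range(k))
ss_within = sum(((y[groups == j] - group_means[j]) ** 2).sum() for j in range(k))
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# The same test as a regression on an intercept plus k-1 dummy variables.
X = np.column_stack([np.ones(n), groups == 1, groups == 2]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = ((y - X @ beta) ** 2).sum()
tss = ((y - grand_mean) ** 2).sum()
f_regression = ((tss - rss) / (k - 1)) / (rss / (n - k))

print(f_anova, f_regression)
```

The two F statistics coincide, and the regression coefficients are the reference-group mean followed by the differences of the other group means from it.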

 

Peter

--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules

---
You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/medstats/e34e3e07-214e-4e20-a3b2-a49d48eeb63d%40googlegroups.com.

Bruce Weaver

Jan 13, 2020, 10:30:52 AM
to MedStats
Another book that may be more familiar to folks in psychology is Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences.  The first edition was co-authored by Jacob & Patricia Cohen.  Stephen G. West  & Leona S. Aiken are co-authors on the most recent edition, which is the 3rd I think. 
HTH.

SR Millis

Jan 13, 2020, 11:07:25 AM
to meds...@googlegroups.com
For a unified and practical approach:

Generalized Linear Models and Extensions (4th ed.) by James Hardin & Joseph Hilbe



~~~~~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat
Wayne State University
Detroit
斯科特·米利斯




Martin Holt

Jan 14, 2020, 12:56:29 PM
to MedStats
Firstly.....

..."On Monday, November 2, 2009 at 3:44:04 PM UTC-6, plf515 wrote:"

Is this a typo....."2009" or 2019?

Secondly....in my past experience Psychologists make very good Medical Statisticians so don't be put off with this being your first posting......

Kind Regards

Martin

Freelance Medical Statistician

If you can't explain it simply, you don't understand it well enough.....Einstein


Concise Encyclopedia of Biostatistics for Medical Professionals


Martin P. Holt

https://www.crcpress.com/Concise-Encyclopedia-of-Biostatistics-for-Medical-Professionals/Indrayan-Holt/9781482243871



