ANCOVA via ez or equivalent?


David Braithwaite

Jul 15, 2012, 11:28:53 AM
to ez...@googlegroups.com
Here's my situation: I have been using ez for my model and want to add a covariate. Ideally I could do this in ez, but I don't think it's possible, right? So, I want to find some other way to specify the model that will (1) allow me to add the covariate, but (2) give me the same results I've been getting with ez when I run it *without* the covariate. In other words, I want an ez-equivalent that can do ANCOVA.
 
The problem is made more difficult because (1) I want to use type 3 SS (to be consistent with analyses from similarly-formatted data on a previous study, in which I used SPSS) and (2) I have a repeated measures factor (as well as a between subjects factor).
 
I know car Anova allows type 3 SS but it doesn't seem to handle repeated measures well, or at least requires some weird reformatting of the data that I can't figure out before it can handle repeated measures. Is there another easier way? The whole reason I used ez was to avoid having to go through that kind of ***.
 
As a last resort I could just do this using SPSS but I'm trying to switch over to R so I'd like to do it in R if possible.
 
Here's the model spec in case it matters:
 
ezANOVA( data=D, dv=.(transfer), wid=.(id), between=.(traintype), within=.(questype), type=3 )
 
The covariate I want to add is pretest score.

Mike Lawrence

Jul 15, 2012, 11:51:08 AM
to ez...@googlegroups.com
I'm confused as to why you want to use the baseline test score as a
covariate instead of simply adding a variable that codes for
baseline-vs-not as a within-Ss predictor. I fear you may be seeking to
use the baseline scores as a covariate because you have found that
your groups and/or conditions differ at baseline already, in which
case ANCOVA would be highly inappropriate as it has as one of its
assumptions that the covariate does not correlate or interact with any
of the other predictor variables.

As to why ANCOVA is not a feature of ezANOVA, the short answer is that
I didn't think of it when I first developed ezANOVA, and haven't had
the time/inclination to add it given that I've completely moved to
mixed effects modelling in my own practice and never use anova
anymore. The ezMixed function does permit specification of covariates.
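For reference, a call along these lines would fit the same design as the original ezANOVA call with pretest as a covariate. This is a hedged sketch: argument names and the shape of the returned object may differ across ez versions, so check ?ezMixed on your install.

```r
library(ez)
# Mixed-effects analogue of the ezANOVA call above; 'pretest' is the
# covariate David describes. Argument names per the ez docs of this era.
mix <- ezMixed(
    data       = D,
    dv         = .(transfer),
    random     = .(id),
    fixed      = .(traintype, questype),
    covariates = .(pretest)
)
# The returned list includes a summary of the likelihood comparisons
print(mix$summary)
```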

Mike

David Braithwaite

Jul 15, 2012, 5:01:26 PM
to ez...@googlegroups.com
Thank you for the quick reply!
 
I think I figured out my original issue - I was able to replicate my ezANOVA results using lm, and from there I could easily just add the covariate - but I'm concerned from your reply that I might be doing the wrong analysis entirely. Let me explain my situation in a bit more detail and hopefully someone can tell me whether I'm doing something wrong and if so what I should do instead. I know this isn't really an ez question any more, but hopefully it's OK because it's continuing the thread topic?
 
I did a study with pretest-training-posttest design. My DV is transfer, defined as posttest minus pretest score. My main IVs are training condition (ws) and question type (bs). There are 3 training conditions and 5 types of questions. Transfer scores for one question type can take on values in {-1,-.5,0,.5,1}.
 
The covariate I want to add is pretest score. I know pretest score has an effect on transfer because people with initially lower scores tend to show greater improvement. (It's not a logical necessity but it happens to be true in my data.) There are no differences in pretest score by training condition, but there are differences by question type.
 
That's actually not my real reason for wanting to add the covariate though. The more important reason is that in a subsequent analysis I want to add a new b-s factor called "solution quality" to the model. (There are actually two factors like this, but that's not important at the moment.) "Solution quality" is a binary-valued factor that says how well study participants were able to state a general method of solving problems like the ones they saw during training. This question was asked after training but before posttest. The actual factor value is hand-coded for each participant based on their open answers.
 
Now, the question I'd like to answer is "does one of my training conditions lead to better solution quality, which in turn leads to better transfer?" I know that the answer to the first half of that question (do the training conditions differentially promote solution quality) is yes, just from chi-squared tests. As for the second half (does solution quality in turn lead to higher transfer), this is where I'd like to add the "solution quality" factor to my model, but unfortunately it is confounded with pretest performance (people with higher solution quality also did better on pretest), so if I got an effect of solution quality I wouldn't know whether it was really an effect of solution quality per se or just an effect of differences in pretest score. So I want to "factor out" the effects of pretest score.
 
My first introduction to ANCOVA told me that it was to be used precisely for situations like this, but your reply suggests that is precisely wrong. Am I understanding correctly? If so, any advice as to what analysis I should use instead? Thanks very much in advance!

David Braithwaite

Jul 15, 2012, 5:02:17 PM
to ez...@googlegroups.com
Apologies, I meant condition is bs and question type is ws.

Daniel Zingaro

Jul 15, 2012, 5:55:49 PM
to ez...@googlegroups.com
Hi,

I think a simple analysis that can answer your question is mediation
analysis, like in this paper:
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating
mediation in treatment evaluations. Evaluation Review, 5, 602-619

From what I can tell, you want to know if solution quality is a
mediator between training condition and transfer. That is, you want to
know if the effect of training condition on transfer is mediated by
solution quality, such that training condition explains solution quality
which in turn explains transfer.
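For concreteness, the Judd & Kenny approach is usually run as a short series of regressions. A rough sketch using the thread's variable names ('solquality' is a hypothetical name for the solution-quality factor), ignoring the repeated-measures structure for simplicity:

```r
# Step 1: total effect of training condition on transfer
m1 <- lm(transfer ~ traintype, data = D)
# Step 2: effect of training condition on the candidate mediator;
# with a binary mediator a logistic regression is the more natural choice
m2 <- glm(solquality ~ traintype, family = binomial, data = D)
# Step 3: does the mediator predict transfer with condition held constant?
# A traintype coefficient that shrinks relative to m1 is consistent
# with (partial) mediation.
m3 <- lm(transfer ~ traintype + solquality, data = D)
```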

Thanks,
Dan

Mike Lawrence

Jul 15, 2012, 6:10:18 PM
to ez...@googlegroups.com
On Sun, Jul 15, 2012 at 6:01 PM, David Braithwaite <baix...@gmail.com> wrote:
> ...
>
> So I want to "factor out" the effects of pretest score.
>
> My first introduction to ANCOVA told me that it was to be used precisely for
> situations like this, but your reply suggests that is precisely wrong.

Indeed, it is a lamentably common myth about ANCOVA that it can
somehow "control for" confounds in the data. It can't. ANCOVA was
designed solely to increase statistical power in scenarios where
you can demonstrate that your covariate does *not* correlate or
interact with your other predictor variables (even then, its
application is sketchy due to the perverse logic of applying
null-hypothesis testing procedures to the validation of no
correlation/interaction).

I haven't encountered such confounded data often enough
in my own research to have bothered looking into alternative analysis
paradigms that might handle such scenarios. Daniel mentions mediation
analysis, which I believe is a subset of path analysis (which, in
turn, is a subset of structural equation modelling), but upon googling
to verify this belief I find the following warning from Andrew Gelman:
http://andrewgelman.com/2010/03/criticizing_sta/

Mike

Thom

Jul 16, 2012, 7:42:42 AM
to ez...@googlegroups.com

There is a huge literature on this. In a randomized design, using ANCOVA
with pretest as a covariate will help correct for (usually small) differences on pretest
(and produce better estimates) as well as increase power.

Of course, if you have big differences on the pretest then that might suggest
the randomization is flawed in some way or there is some other confound.

In non-randomized studies there is a huge debate about the use of ANCOVA as a
"statistical control", but the recent literature is much more open to the idea that
using pretest as a covariate is a good option - and often the best available option
for dealing with confounding. The resulting estimates are probably still biased - but
may well be less biased than other options.
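In the simplest pre/post two-group case, the two analyses contrasted here differ only in how the pretest enters the model. A hedged sketch (hypothetical names 'pre', 'post', 'group' in a data frame 'd'):

```r
# Change-score ("gain") analysis: is the mean gain different between groups?
m_change <- lm(I(post - pre) ~ group, data = d)
# ANCOVA: model posttest with pretest (grand-mean centred) as the covariate
m_ancova <- lm(post ~ scale(pre, scale = FALSE) + group, data = d)
# Lord's paradox: in non-randomized designs these two models can give
# conflicting answers about the group effect.
```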

Mike Lawrence

Jul 16, 2012, 8:08:01 AM
to ez...@googlegroups.com
Here's code for a simulation that demonstrates the danger of violating
the assumptions of ANCOVA (at least, when used in conjunction with
Null Hypothesis Significance Testing):

https://gist.github.com/3122329

and attached is a plot of the results I get after running overnight.
ancova.pdf
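In case the gist becomes unreachable, the kind of simulation being described can be sketched as follows: the true covariate differs by group and drives the DV, but the analyst only observes a noisy version of it, so ANCOVA's adjustment is incomplete and the false-positive rate for the group term inflates. This is a reconstruction of the general idea, not Mike's actual code:

```r
set.seed(1)
n_sims <- 2000
n      <- 30
p <- replicate(n_sims, {
    group    <- rep(c(0, 1), each = n)
    cov_true <- rnorm(2 * n, mean = group)    # groups differ on the covariate
    y        <- 0.5 * cov_true + rnorm(2 * n) # DV driven only by the covariate
    cov_obs  <- cov_true + rnorm(2 * n)       # covariate measured with error
    # ANCOVA-style test of the group term after "adjusting" for cov_obs
    summary(lm(y ~ cov_obs + group))$coefficients["group", "Pr(>|t|)"]
})
mean(p < .05)  # tends to sit well above the nominal .05
```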

David Braithwaite

Jul 16, 2012, 9:46:31 AM
to ez...@googlegroups.com
Thank you for all the replies!

Let's see, mediation analysis: it sounds sort of like what I want, but I'm not sure it's exactly appropriate, for two reasons. First, my manipulation of training condition doesn't actually have an effect on my main dv (transfer). My impression is that to do mediation analysis (does solution quality mediate effects of training on transfer) there must first be an effect to be mediated (an effect of training on transfer, in this case), right? My situation is more that I failed to get that effect, but am hoping for the next best thing, which would be that training affects something (solution quality) which is known to affect transfer, even though it showed no direct effect on transfer. Would mediation analysis work for this?

Secondly, I'm not sure that mediation analysis (alone) would resolve the confound issue I raised (i.e. solution quality is confounded with pretest score, which also affects transfer, so apparent effects of solution quality on transfer might actually be effects of pretest score) - would it?

I guess that maybe I'm worrying about something I don't need to worry about because, in my case, the supposed confound would actually act against the effect I'm trying to detect. That is - high pretest score correlates with low transfer; high pretest score correlates with high solution quality; but high solution quality correlates with HIGH transfer, not low transfer. So, can I just say "obviously the confound with pretest score doesn't matter because it would only weaken the observed effect", forget about including pretest score in my analyses, and leave it at that?

If the answer is yes then maybe this next question doesn't matter, but I thought I'd raise it just for completeness. I don't actually have a significant confound of pretest score with my experimental variable (training). These aren't significantly correlated. The significant correlation is with solution quality, which wasn't randomized to begin with. Does this point affect the appropriateness (or not) of adding pretest score as a covariate in order to get at the independent effect of my factor of interest, i.e. solution quality?

Mike Lawrence

Jul 16, 2012, 10:08:57 AM
to ez...@googlegroups.com
You might consider transitioning your queries over to
stats.stackoverflow.com, which I find is an excellent community for
these more statistical-theory/best-practices related issues.

Mike Lawrence

Jul 16, 2012, 10:34:57 AM
to ez...@googlegroups.com
oops, that url should be:

stats.stackexchange.com

David Braithwaite

Jul 16, 2012, 10:56:32 AM
to ez...@googlegroups.com
Thank you! I've just done so. Thanks again for the assistance!

Henrik Singmann

Jul 17, 2012, 10:07:42 AM
to ez...@googlegroups.com
Hi Thom and list,

I have read similar claims (i.e., to use pretest values as covariates) but wouldn't know what to cite for this. Can you perhaps post some key references from this "huge literature"? That would be totally awesome.

@Mike: On an unrelated note, is there a chance that a version of ez will be released to CRAN that actually allows for the computation of type 3 sums of squares? My current setup is to load ez, load reshape, and then source the files you sent over the list ages ago in which it is working. This is a little bit too tricky... (I am giving an R course next month in which I would like to present the easiest way to obtain classical ANOVA results.)

Best,
Henrik




Thom

Jul 17, 2012, 5:58:43 PM
to ez...@googlegroups.com

Here is a cross-section of refs (not comprehensive) based on things I mention in my book. There are really several issues ranging from the classic issue of how to deal with change scores (Lord's paradox) to more philosophical issues (e.g., contrast Miller and Chapman vs. Gelman & Hill) and practical issues such as including interactions for measured covariates (Yzerbyt et al).


Thom


Cook, R. J., & Sackett, D. L. (1995). The number needed to treat: a clinically useful measure of treatment effect. British Medical Journal, 310, 452-454.

Cook, T. D., Shadish, W. J., & Wong, V. C. (2008). Three conditions under which observational studies produce the same results as experiments. Journal of Policy Analysis and Management, 27, 724-750.

Cousens, S., Hargreaves, J., Bonell, C., Armstrong, B., Thomas, J., Kirkwood, B. R., & Hayes, R. (2011). Alternatives to randomisation in the evaluation of public-health interventions: statistical analysis and causal inference. Journal of Epidemiology & Community Health. doi:10.1136/jech.2008.082610

Dinh, P., & Yang, P. (2011). Handling baselines in repeated measures analyses with missing data at random. Journal of Biopharmaceutical Statistics, 21, 326-341.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.

Jamieson, J. (2004). Analysis of covariance (ANCOVA) with difference scores. International Journal of Psychophysiology, 52, 277-283.

Lord, F. M. (1967). A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304-305.

Maris, E. (1998). Covariance adjustment versus gain scores - Revisited. Psychological Methods, 3, 309-327.

Miller, G. M., & Chapman, J. P. (2001). Misunderstanding analysis of covariance. Journal of Abnormal Psychology, 110, 40-48.

Senn, S. J. (2006). Change from baseline and analysis of covariance revisited. Statistics in Medicine, 25, 4334-4344.

Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103, 1334-1343.

Van Breukelen, G. J. P. (2006). ANCOVA versus change from baseline had more power in randomized studies and more bias in nonrandomized studies. Journal of Clinical Epidemiology, 59, 920-925.

Wainer, H. (1991). Adjusting for differential base-rates: Lord’s Paradox again. Psychological Bulletin, 109, 147-151.

Wainer, H., & Brown, L. M. (2004). Two statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data. American Statistician, 58, 117-123.

Wright, D. B. (2006). Comparing groups in a before-after design: When t-test and ANCOVA produce different results. British Journal of Educational Psychology, 76, 663-675.

Wright, D. B., & London, K. (2009). Modern regression techniques: Examples for psychologists. London: Sage.

Yzerbyt, V. C., Muller, D., & Judd, C. M. (2004). Adjusting researchers' approach to adjustment: On the use of covariates when testing interactions. Journal of Experimental Social Psychology, 40, 424-431. 


Henrik Singmann

Jul 17, 2012, 7:22:24 PM
to ez...@googlegroups.com
Thanks a lot. I immediately ordered the book by Wright and London (which is actually called "Modern Regression Techniques Using R: A Practical Guide for Students and Researchers") and will now read the Miller and Chapman paper. Someone disagreeing with Andrew Gelman can be interesting.

Cheers,
Henrik


Thom

Jul 18, 2012, 7:13:35 AM
to ez...@googlegroups.com

Most of the discussion of this is on Gelman's blog. I don't think Gelman was aware of the paper before then. However, the discussion of how to deal with confounding in the Gelman-Hill book contradicts the usual take-home message of Miller and Chapman.

Thom

Thom

Jul 18, 2012, 11:38:09 AM
to ez...@googlegroups.com

You might also look at my book - which has an overview of these issues.




Henrik Singmann

Jul 19, 2012, 5:00:21 PM
to ez...@googlegroups.com
You are probably right, so I finally ordered it.
Actually, I was trying to do something similar to the OP, namely to run a repeated measures ANCOVA in R (using car::Anova; see also here: http://stackoverflow.com/q/11567446/289572), and did not succeed. But according to your blog, your book seems to cover it. I hope it can solve this issue.

Or do you have any quick ideas about which R function can handle both within-subject factors and a covariate (I think SPSS GLM can do so), ideally with Type 3 sums of squares? (And I want a standard ANOVA, so no lme4.)

Cheers,
Henrik


David Braithwaite

Jul 19, 2012, 5:06:46 PM
to ez...@googlegroups.com
The following worked for me - I can't vouch for its correctness but maybe someone else here can (or not).

First I put my data in "wide form", so that each level of my within-subjects factor (here questype) had a separate column of data. These are referred to as "D$transfer_XXX" below. Then:

library(car)  # for Anova()
# 'otherFactors' and 'covariate' stand in for the between-Ss terms;
# 'longData' is the original long-format data frame
mdata <- cbind( D$transfer_PCO, D$transfer_OSS, D$transfer_OAPlc, D$transfer_CAE, D$transfer_OAPpl )
fit1  <- lm( mdata ~ otherFactors + covariate, data = D )
fit   <- Anova( fit1, idata = data.frame( questype = levels( longData$questype ) ), idesign = ~questype, type = 3 )
print( fit )

Henrik Singmann

Jul 19, 2012, 7:25:42 PM
to ez...@googlegroups.com
The problem with this approach (at least when I tried it) is that you get the interactions of the covariate with the within-subject variables (i.e., questype:covariate in your case). This is not what I want; I would rather have the covariate merely as a main effect (i.e., as in your fit1 formula).

Or is it different in your results?

Henrik


David Braithwaite

Jul 19, 2012, 7:27:47 PM
to ez...@googlegroups.com
Ah, I see what you mean - yeah, I do get that interaction term (which I also don't want). I just had not noticed it before. I don't immediately know a way around that.

Thom

Jul 20, 2012, 9:34:35 AM
to ez...@googlegroups.com

I do a short bit on repeated measures ANCOVA. There are a few technical issues that make it a complex topic. I'd probably want to run it as a multilevel model in all but the simplest cases.

Thom

Thom

Jul 20, 2012, 12:10:04 PM
to ez...@googlegroups.com

An important point in relation to repeated measures designs and ANCOVA is that you need to center the covariate. Some software (e.g., SPSS) will compute the RM ANCOVA using difference coding (generally those with MANOVA style output) and this messes up the covariate adjustment if the covariate is uncentred. Alternatively, use a multilevel model and this allows you to have time-varying covariates and more flexibility over coding the effects than most MANOVA/RM ANOVA software.
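Using the hypothetical names from the wide-format example earlier in the thread, centring is a one-liner before fitting:

```r
# Grand-mean centre the covariate so the MANOVA-style difference coding
# adjusts at the covariate mean rather than at zero
D$pretest_c <- D$pretest - mean(D$pretest, na.rm = TRUE)
fit1 <- lm(mdata ~ traintype + pretest_c, data = D)
```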

As for interactions - often you do want these - and in some cases the interactions are the only reason to run the RM ANCOVA (because in a pure RM design the covariate is just competing to explain variance with the subjects term).

Thom