Controlling for baseline data in a regression

Ang

Mar 19, 2011, 1:01:02 PM

I need to run a regression analysis to see if income predicts change
in fruit and vegetable intake (changeFV) as a result of an
intervention. ChangeFV is calculated by post-intervention fruit &
vegetable intake (postFV) minus pre-intervention fruit & vegetable
intake (preFV), so changeFV = postFV - preFV. I know that changeFV is
my dependent variable, and income is my independent/predictor
variable, but I need to run the analysis while controlling for the
preFV baseline data. How can I do this in SPSS (I am using version 19
for Mac)? Thanks!

Bruce Weaver

Mar 19, 2011, 3:09:42 PM

On 19/03/2011 1:01 PM, Ang wrote:
> I need to run a regression analysis to see if income predicts change
> in fruit and vegetable intake (changeFV) as a result of an
> intervention. ChangeFV is calculated by post-intervention fruit &
> vegetable intake (postFV) minus pre-intervention fruit & vegetable
> intake (preFV), so changeFV = postFV - preFV. I know that changeFV is
> my dependent variable, and income is my independent/predictor
> variable, but I need to run the analysis while controlling for the
> preFV baseline data. How can I do this in SPSS (I am using version 19
> for Mac)? Thanks!

As there is an intervention, I assume you also have a control group that
did not receive the intervention. If so, the *standard* approach is a
model with:

Y = post-intervention score
X1 = indicator for intervention group (1=Int, 0=control)
X2 = covariate = baseline score

There may be other covariates too, but that's the basic model.

If instead you use Y = Change Score (post - baseline) and keep the
baseline score in the model as a covariate, you will get exactly the
same t-test on X1 (your group indicator). I wrote some syntax to
demonstrate this after a similar question came up in 2001.
You can see it here.

www.angelfire.com/wv/bwhomedir/spss/change_scores_and_ANCOVA.txt
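
For readers without SPSS at hand, the same point can be sketched in plain Python. This is only an illustration under stated assumptions: the eight data rows are invented, and `ols` and `residuals` are small hypothetical helpers written for this sketch, not part of any library or of the syntax file linked above. With the baseline kept as a covariate, the group coefficient and the residuals come out the same whether the outcome is the post score or the change score, so the standard error and t-test for the group effect are the same too.

```python
def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations: [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def residuals(coefs, columns, y):
    """Observed minus fitted values for a model returned by ols()."""
    return [yi - (coefs[0] + sum(b * col[i] for b, col in zip(coefs[1:], columns)))
            for i, yi in enumerate(y)]


# Invented illustrative data (not from the thread): 4 controls, 4 treated.
group = [0, 0, 0, 0, 1, 1, 1, 1]
pre = [2.0, 3.0, 4.0, 5.0, 2.5, 3.5, 4.5, 5.5]
post = [2.2, 3.1, 4.5, 4.9, 3.6, 4.4, 5.9, 6.3]
change = [b - a for a, b in zip(pre, post)]

# Coefficients are ordered [intercept, group, baseline].
m_post = ols([group, pre], post)      # Y = post score, baseline as covariate
m_change = ols([group, pre], change)  # Y = change score, baseline as covariate

res_post = residuals(m_post, [group, pre], post)
res_change = residuals(m_change, [group, pre], change)

print("group effect, Y = post:  ", round(m_post[1], 6))
print("group effect, Y = change:", round(m_change[1], 6))
print("baseline slope, post minus change:", round(m_post[2] - m_change[2], 6))
```

Only the baseline slope differs between the two fits, and it differs by exactly 1.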

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Art Kendall

Mar 19, 2011, 3:23:12 PM

I would suggest running two analyses. Different disciplines favor one
or the other, but they do very nearly the same thing.
One uses a regression approach. The other uses a repeated measures
approach. Both take the interactions into account.
One approach is to use REGRESSION with something like the untested
syntax below, assuming treatment is coded 0 = comparison group, 1 = treated group.
Of course, inference is greatly strengthened if 0 means a true control
group (i.e., there was random assignment to treatment vs. control).
If you do not have a control group or even a comparison group, just
leave out the treatment effect and the interactions with treatment.

* Add the grand means of pre and income to every case.
compute constant = 1.
aggregate
  /outfile = * mode = addvariables
  /break = constant
  /mean_pre = mean(pre)
  /mean_income = mean(income).
* Center pre and income, then build the interaction terms.
compute interact1 = (pre - mean_pre) * (income - mean_income).
compute interact2 = (pre - mean_pre) * treatment.
compute interact3 = (pre - mean_pre) * (income - mean_income) * treatment.
* Enter main effects, then two-way terms, then the three-way term.
REGRESSION
  /variables = post pre income treatment interact1 interact2 interact3
  /statistics = r anova change coeff outs
  /dependent = post
  /method = enter pre income treatment
  /method = enter interact1 interact2
  /method = enter interact3.
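
As a rough cross-check of what the COMPUTE steps above are meant to produce, here is a minimal Python sketch. The five rows of data are invented, and the names `pre_c` and `income_c` are assumptions made for this illustration only. It centers pre and income at their sample means and then forms the three product terms.

```python
from statistics import mean

# Invented values for five participants (not from the thread).
pre = [2.0, 3.5, 1.0, 4.0, 2.5]
income = [30.0, 55.0, 42.0, 61.0, 38.0]
treatment = [0, 1, 0, 1, 1]  # 0 = comparison group, 1 = treated group

# Grand means, the same quantities AGGREGATE adds to every case.
mean_pre = mean(pre)
mean_income = mean(income)

# Centered predictors.
pre_c = [p - mean_pre for p in pre]
income_c = [x - mean_income for x in income]

# Product terms matching interact1, interact2 and interact3.
interact1 = [p * x for p, x in zip(pre_c, income_c)]
interact2 = [p * t for p, t in zip(pre_c, treatment)]
interact3 = [p * x * t for p, x, t in zip(pre_c, income_c, treatment)]

for row in zip(pre_c, income_c, treatment, interact1, interact2, interact3):
    print([round(v, 3) for v in row])
```

Centering before multiplying usually reduces the correlation between each product term and its components, which tends to make the lower-order coefficients easier to read.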

In addition, use the GUI to paste syntax from GLM Repeated Measures and
run it with pre and post as the repeated measure and income and
treatment as independent variables.

If you have 4 variables (Pre_fruit, Post_fruit, Pre_vegs, Post_vegs), use
the GUI and paste syntax with a doubly repeated measure: (pre vs post) *
(fruit vs veg).

Art Kendall
Social Research Consultants

Andy W

Mar 19, 2011, 11:46:49 PM

Just as a note for the OP: the question of whether to use the change
scores or the post-test levels as the dependent variable is
sometimes referred to as "Lord's Paradox". Paul Allison has a good
paper on the topic (http://www.pauldallison.com/downloads/Allison.SM90.pdf)
for observational data.

Bruce Weaver

Mar 21, 2011, 8:49:20 AM

> paper on the topic (http://www.pauldallison.com/downloads/Allison.SM90.pdf)
> for observational data.

But notice that the two models that Allison is contrasting in his
Lord's Paradox section differ in that the one using the change score
as the outcome variable does not include Y1 as an explanatory
variable. I.e., his two models are:

1. Y2 = b0 + b1*Y1 + b2*X
2. (Y2-Y1) = b0 + b1*X

where X = 1 for the treatment group and 0 for the control group. He
calls Model 1 "the regressor variable approach", and Model 2 the
"change score method".

For the data in his Table 1, b2 from Model 1 was positive and had a
p-value around .03. In Model 2, b1 was close to 0 and non-significant.
But if he had run Model 3 below, he would have found that b2 from
Model 3 equals b2 from Model 1. This is what the example on my
webpage illustrates.

3. (Y2-Y1) = b0 + b1*Y1 + b2*X
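
The reason the group effect carries over can be written out in one step. In this sketch, b_0, b_1 and b_2 denote the Model 1 coefficients and e its residual. Subtracting Y_1 from both sides of Model 1 gives an equation of the Model 3 form:

```latex
\begin{align*}
Y_2       &= b_0 + b_1 Y_1 + b_2 X + e            && \text{(Model 1)} \\
Y_2 - Y_1 &= b_0 + (b_1 - 1)\,Y_1 + b_2 X + e     && \text{(Model 3)}
\end{align*}
```

The intercept, the coefficient on X and the residuals are unchanged. Only the coefficient on Y_1 moves, from b_1 to b_1 - 1.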

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca

Andy W

Mar 21, 2011, 10:10:23 AM

I don't deny your math is right, Bruce; I'm just not quite sure why that
matters. Unless I'm missing something, the fact that your models
1 and 3 give equivalent answers doesn't make them preferable to model
2. In the paper, Allison gives examples from different observational
contexts where model 2 may be preferable and where model 1 is
obviously inappropriate.

Your model 3 can be rewritten as

1. (Y2 - Y1) = b0 + b1*Y1 + b2*X + e
2. Y2 = b0 + b1*Y1 + b2*X + (e + Y1)

Doesn't this mean that b1*Y1 is correlated with the error term? (Or
does that not matter, since all I am interested in is b2?)

So am I missing something? Although these types of experiments are
different from time series analysis of the economics sort, it is
inappropriate to use differences on one side of the equation and
levels on the other. See http://www.griffith.edu.au/__data/assets/pdf_file/0017/88100/Greenberg-2001.pdf
for an example. Maybe that is not a good example, as the nature of
unemployment is different from "vegetable intake", but still, aren't
there issues in saying the levels of vegetable intake affect the change
scores?

Andy W

Bruce Weaver

Mar 21, 2011, 11:15:49 AM

Hi Andy. All I was getting at was that the reason Allison found
different group effects in his two models was not *simply* that he
changed from using Y2 to using Y2-Y1 as his outcome variable. The key
factor was dropping Y1 as a covariate when he went to the model with
Y2-Y1. Had he retained Y1 as a covariate, then he would have found
exactly the same group effect.
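
A small Python sketch of that point, under stated assumptions: the eight data rows are invented to mimic the pattern described (they are not Allison's Table 1), and `ols` is a hypothetical helper written for this illustration. The two groups differ at baseline, both show zero mean change, and the within-group slope of Y2 on Y1 is below 1.

```python
def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations: [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Invented data: X = 0 for control, 1 for treatment; groups differ at baseline.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y1 = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
y2 = [1.5, 2.0, 3.0, 3.5, 3.5, 4.0, 5.0, 5.5]
change = [b - a for a, b in zip(y1, y2)]

model1 = ols([y1, x], y2)      # Y2 = b0 + b1*Y1 + b2*X
model2 = ols([x], change)      # (Y2-Y1) = b0 + b1*X
model3 = ols([y1, x], change)  # (Y2-Y1) = b0 + b1*Y1 + b2*X

print("Model 1 group effect:", round(model1[2], 6))
print("Model 2 group effect:", round(model2[1], 6))
print("Model 3 group effect:", round(model3[2], 6))
print("Y1 coefficient, Model 1 vs Model 3:",
      round(model1[1], 6), round(model3[1], 6))
```

With these numbers the group effect is about 0.6 in Models 1 and 3 and essentially 0 in Model 2, so the discrepancy tracks the presence of Y1 as a covariate rather than the choice of outcome.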

>
> Your model 3 can be rewritten as
>
> 1. (Y2 - Y1) = b0 + b1*Y1 + b2*X + e
> 2. Y2 = b0 + b1*Y1 + b2*X + (e + Y1)
>
> Doesn't this mean that b1*Y1 is correlated with the error term (or
> does that not matter since all I am interested in is b2?)

As I recall, this is the argument against using Y2-Y1 as the outcome
when you are including Y1 as a covariate. The group effect (the thing
of main interest) is the same if you just use Y2; and the coefficient
for Y1 is much easier to interpret.

>
> So am I missing something? Although these types of experiments are
> different from time series analysis of the economics sort, it is
> inappropriate to use differences on one side of the equation and
> levels on the other. See http://www.griffith.edu.au/__data/assets/pdf_file/0017/88100/Greenber...
> for an example. Maybe that is not a good example, as the nature of
> unemployment is different from "vegetable intake", but still, aren't
> there issues in saying the levels of vegetable intake affect the change
> scores?

Thanks for the reference.

>
> Andy W

--
Bruce Weaver
bwe...@lakeheadu.ca