
Regression: Degrees of freedom for F-change


Bruce Weaver

Feb 22, 2000
Hello group,
I am a bit puzzled by something I've noticed about the output
that is generated when you include /STATISTICS CHANGE as a REGRESSION
subcommand. As I understand it, the F-change statistic is a test on the
change in R**2 from one model to the next. Degrees of freedom for the
numerator and denominator are reported too (df1, df2), as well as a
p-value.
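(As a quick illustration of that definition -- a Python sketch, not SPSS, and the R**2 values below are made up -- the F-change statistic compares two nested models like so:)

```python
# F-change for the test on the change in R**2 between nested models:
#   F = (delta_R2 / df1) / ((1 - R2_full) / df2)
# df1 = number of predictors added or dropped;
# df2 = residual df of the FULLER model, i.e. n - k_full - 1.

def f_change(r2_full, r2_reduced, df1, df2):
    """F test on the change in R**2 between two nested models."""
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)

# Hypothetical values for illustration only:
print(round(f_change(0.8, 0.6, 2, 14), 4))  # 7.0
```

The point to hold on to below is that df2 belongs to the *fuller* of the two models being compared.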
To put things in context, I did a repeated measures ANOVA (with 3
levels of the DV), and then wanted to produce the same F-test by
comparing 2 regression models. In order to do this, I used n-1 dummy codes
for subjects and k-1=2 dummy codes for treatments. I started by ENTERING
all of these dummy codes on step 1; then I removed the 2 codes for
treatments on step 2. Now here's the bit that puzzles me: The F-value,
p-value, and df1 were exactly as they were for the repeated measures ANOVA;
but df2 was too large. It was too large by 2 times the number of
variables I removed, in fact.
I then added a 3rd step in which I ENTERED the 2 codes for
treatments again, and now everything INCLUDING df2 was exactly as it was
for the ANOVA.
The same thing happens when I try a between-subjects problem, by
the way. I cannot for the life of me figure out why df2 for
F-change should be different in these two situations. Am I missing
something obvious? Or is there a problem with what SPSS is displaying?

Thanks,
Bruce

p.s. - If my description of the problem is not clear enough, see below for
syntax that produces the output I'm talking about.

--
Bruce Weaver
wea...@fhs.csu.mcmaster.ca
http://www.angelfire.com/wv/bwhomedir/


* Do repeated measures ANOVA, then regression with dummy codes .

DATA LIST LIST /id(f2.0) a1(F5.0) a2(F5.0) a3(f5.0) .
BEGIN DATA.
1 7 3 2
2 4 8 3
3 7 6 3
4 8 6 1
5 7 2 3
6 6 3 3
7 4 2 0
8 6 7 5
END DATA.


SAVE OUTFILE='C:\MyDocs\SPSS\repmeas1.sav'
/compressed.

GLM
a1 a2 a3
/WSFACTOR = a 3 Polynomial
/METHOD = SSTYPE(3)
/EMMEANS = TABLES(a)
/PRINT = DESCRIPTIVE
/CRITERIA = ALPHA(.05)
/WSDESIGN = a .

* Thanks to Raynald for the next bit of code .
******************************.
* From one case to many cases .
******************************.
VECTOR va=a1 TO a3 .
LOOP cnt=1 TO 3.
COMPUTE a=va(cnt).
XSAVE OUTFILE='c:\MyDocs\SPSS\repmeas2.sav' /KEEP id cnt a .
END LOOP.
EXECUTE.

GET
FILE='c:\MyDocs\SPSS\repmeas2.sav'.
**************************.

formats cnt a (f5.0).

* Get dummy codes for regression .
compute a1 = (cnt=1).
compute a2 = (cnt=2).
compute a3 = (cnt=3).
compute s1 = (id=1).
compute s2 = (id=2).
compute s3 = (id=3).
compute s4 = (id=4).
compute s5 = (id=5).
compute s6 = (id=6).
compute s7 = (id=7).
compute s8 = (id=8).
exe.

formats a1 to s8 (f2.0).

REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI R ANOVA CHANGE
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT a
/METHOD=ENTER s2 to s8 a1 a2
/remove a1 a2
/enter a1 a2 .

* NOTE: It appears that in order to get the correct degrees of freedom for
the F-change, you have to ENTER the variables of interest;
if you REMOVE them, df2 is too large by 2 times the number of
variables you REMOVE; BUT, despite this the p-value does not
depend on whether you ENTER or REMOVE; This is a bit puzzling ;
I will see if this happens with BETWEEN-Ss designs next .
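(For anyone without SPSS handy, the same model comparison can be sketched in Python with numpy, using the data and dummy coding from the syntax above. This is a check on the df bookkeeping, not on SPSS itself: the change test's df2 is the residual df of the fuller model, 14, which matches the ANOVA error df of (k-1)(n-1) = 2 x 7 = 14.)

```python
# Reproduce the F-change test with numpy least squares, same data
# and dummy coding as the SPSS syntax above.
import numpy as np

scores = np.array([[7, 3, 2], [4, 8, 3], [7, 6, 3], [8, 6, 1],
                   [7, 2, 3], [6, 3, 3], [4, 2, 0], [6, 7, 5]], float)
n_subj, n_trt = scores.shape          # 8 subjects, 3 treatments
y = scores.reshape(-1)                # long format: 24 observations

subj = np.repeat(np.arange(n_subj), n_trt)   # subject index per row
trt = np.tile(np.arange(n_trt), n_subj)      # treatment index per row

intercept = np.ones((len(y), 1))
s_dummies = (subj[:, None] == np.arange(1, n_subj)).astype(float)  # s2..s8
t_dummies = (trt[:, None] == np.arange(n_trt - 1)).astype(float)   # a1, a2

X_full = np.hstack([intercept, s_dummies, t_dummies])  # subjects + treatments
X_red = np.hstack([intercept, s_dummies])              # subjects only

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

df1 = X_full.shape[1] - X_red.shape[1]   # 2 treatment codes
df2 = len(y) - X_full.shape[1]           # residual df of the FULLER model
f_change = ((rss(X_red, y) - rss(X_full, y)) / df1) / (rss(X_full, y) / df2)
print(df1, df2, round(f_change, 4))      # 2 14 9.3874
```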


**** Now do Between-Ss ANOVA using variable CNT as group code .
UNIANOVA
a BY cnt
/METHOD = SSTYPE(3)
/INTERCEPT = INCLUDE
/CRITERIA = ALPHA(.05)
/DESIGN = cnt .
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI R ANOVA CHANGE
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT a
/METHOD=ENTER a1 a2
/remove a1 a2
/enter a1 a2 .


* NOTE: The same thing happens with a completely between-subjects
design: df2 for the F-change is greater than it should be
by 2 x the number of variables being removed when you REMOVE
variables; but the p-value does not appear to be affected;
This is indeed curious! .

Rich Ulrich

Feb 23, 2000
On Tue, 22 Feb 2000 20:28:35 -0500, Bruce Weaver
<wea...@fhs.csu.McMaster.CA> wrote:
< snip >

> To put things in context, I did a repeated measures ANOVA (with 3
> levels of the DV), and then wanted to produce the same F-test by
> comparing 2 regression models. In order to do this, I used n-1 dummy codes
> for subjects and k-1=2 dummy codes for treatments. I started by ENTERING
> all of these dummy codes on step 1; then I removed the 2 codes for
> treatments on step 2. Now here's the bit that puzzles me: The F-value,
> p-value, and df1 were exactly as they were for the repeated measures ANOVA;
> but df2 was too large. It was too large by 2 times the number of
> variables I removed, in fact.
> I then added a 3rd step in which I ENTERED the 2 codes for
> treatments again, and now everything INCLUDING df2 was exactly as it was
> for the ANOVA.
> The same thing happens when I try a between-subjects problem, by
> the way. I cannot for the life of me figure out why df2 for
> F-change should be different in these two situations. Am I missing
> something obvious? Or is there a problem with what SPSS is displaying?

Clearly, there is only one test for the set of variables -- the
residual DF that goes along with the fuller version of the model is
always the *correct* denominator for F-change.

I see that there is a potential for confusion -- if you "step-down",
the DF -after-the-step is not the one to use for testing. However,
it seems like SPSS uses the correct term, since (you say) the p-value
is right, even with your small DF.

There is the question of exactly what the SPSS labels say. Does
SPSS claim that the wrong denominator is used for this test? Or does
it do as well as one can expect? ( - not a rhetorical question.)


> p.s. - If my description of the problem is not clear enough, see below for
> syntax that produces the output I'm talking about.

- Bruce, my NewsReader cut off all the advertised SPSS syntax,
because you used the proper marking for indicating that a set of sig.
lines are *supposed* to be cut off. If you want us to use them, it
can be easier if you don't start any earlier line with "-- " , two
dashes and a blank, for columns 1-3. Like this one should chop off my
sig:

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

Bruce Weaver

Feb 23, 2000

On Wed, 23 Feb 2000, Rich Ulrich wrote:

---------------------- >8 -----------------------

> > p.s. - If my description of the problem is not clear enough, see below for
> > syntax that produces the output I'm talking about.
>
> - Bruce, my NewsReader cut off all the advertised SPSS syntax,
> because you used the proper marking for indicating that a set of sig.
> lines are *supposed* to be cut off. If you want us to use them, it
> can be easier if you don't start any earlier line with "-- " , two
> dashes and a blank, for columns 1-3. Like this one should chop off my
> sig:
>

---------------------- >8 -----------------------

Oops! I wasn't thinking about that when I pasted the syntax at the tail
end of my post. In case others had the same experience, here it is again:

* Do repeated measures ANOVA, then regression with dummy codes .

DATA LIST LIST /id(f2.0) a1(F5.0) a2(F5.0) a3(f5.0) .
BEGIN DATA.
1 7 3 2
2 4 8 3
3 7 6 3
4 8 6 1
5 7 2 3
6 6 3 3
7 4 2 0
8 6 7 5
END DATA.


SAVE OUTFILE='C:\MyDocs\SPSS\repmeas1.sav'
/compressed.

GLM
a1 a2 a3
/WSFACTOR = a 3 Polynomial
/METHOD = SSTYPE(3)
/EMMEANS = TABLES(a)
/PRINT = DESCRIPTIVE
/CRITERIA = ALPHA(.05)
/WSDESIGN = a .

**************************.

Rich Ulrich

Feb 25, 2000
On Wed, 23 Feb 2000 20:58:42 -0500, Bruce Weaver
<wea...@fhs.csu.McMaster.CA> wrote:
< snip, detail >

> * NOTE: It appears that in order to get the correct degrees of freedom for
> the F-change, you have to ENTER the variables of interest;
> if you REMOVE them, df2 is too large by 2 times the number of
> variables you REMOVE; BUT, despite this the p-value does not
> depend on whether you ENTER or REMOVE; This is a bit puzzling ;
> I will see if this happens with BETWEEN-Ss designs next .

I ran the task with SPSS 6.1, but I think the listing is essentially
the same. The problem lies in jumping to conclusions. Maybe it
would be possible to add a comment to the labels that would prevent
that. As Bruce describes them, the *tests* provided are already
correct.

There is one table which tries to give too much information, or which
leads to incorrect assumptions.

In one set of columns, there are [R Square Change], [F Change], and
[Signif F Change]. Those go together; they refer back to the previous
table, too, along with the [R Square] in the prior columns.

In the next set of columns, under "Analysis of Variance", there are
Regression and Residual terms for DF and [Sum of Squares] and [Mean
Squares]. This Residual gives the *result of the overall ANOVA*
after the last step that was just described, adding or taking away
variables.

- Do these describe the "change" columns? - not necessarily.
- If you step-up, then the Residual is correct for the change-test.
- If you step-down, then that Residual is *not* correct ("Please
look at the previous table"). The F-test is performed with ALL the
variables in the equation, and the Residual DF is the reduced number,
since you have to test that *block of variance* and the denominator
has to have that variance *removed* for you to have an F.
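(Rich's step-up/step-down point in miniature, using the parameter counts from the worked example earlier in the thread -- a bookkeeping sketch, not SPSS output:)

```python
# n = 24 observations; the fuller model has an intercept plus 7 subject
# codes and 2 treatment codes (10 parameters); the reduced model drops
# the 2 treatment codes.
n, params_full, dropped = 24, 10, 2

# Denominator df for the change test, whichever direction you step:
df_resid_full = n - params_full                  # 14
# Residual df of the model you are left with after stepping down:
df_resid_reduced = n - (params_full - dropped)   # 16

print(df_resid_full, df_resid_reduced)  # 14 16
```

Whether you step up or step down, the F-change denominator uses the 14, not the 16.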

There is ambiguity by having that Residual sitting there on the same
page; and the chance of error is heightened by the casual nature of
explanations that people will hear in a lifetime: yep, the Residual
is the term that was used (but not always) to compute the
test-on-Change.

Perhaps the down-step could have a footnote,
"During Step-down, the previous table has the [Df] and [SS], for
the [Residual], used in [F Change] and [Signif F Change] ."

Or something.
