Question about degrees of freedom calculation in CFA


Todd Fernandez

Nov 1, 2016, 11:09:54 AM
to lavaan

I am having a very strange experience with the degrees of freedom calculation in lavaan (v0.5-20, R v3.1). In short, the df calculated inside the cfa() and measurementInvariance() functions seem to be independent of sample size.

In more detail, I have a data set (n = 1595) that I am fitting to a CFA model (see code below). Fitting the full data set returns a df of 729. Fitting that model by group, using the cfa() function with the 'group' argument, doubles the df to 1458, as I expect. The 'measurementInvariance' command likewise fits a configural model with 1458 df, which rises to 1578 df for the more constrained strict invariance model, again as I would expect. In sum, when the model is fitted to the full data set, everything operates as I would expect.

However, my research partner and I decided to test the measurement model with equal sample sizes across the two groups (n_male = 1211 and n_female = 384). We randomly sampled 384 cases from the male population and reran the exact same model on the new data set. Strangely (to me), the df at every step (i.e., simple CFA, CFA by group, and measurementInvariance) exactly match those calculated for the full data set. The test statistic changes, as one would expect, but the df remains at 729 for the basic model no matter what sample size I throw at it.
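
For reference, the subsampling step might look something like this in R. This is a sketch, not the code from the attachment; the toy data frame stands in for the real data set, and the object and column names mirror the summary output quoted below.

```r
# Toy stand-in for the real data set (1211 males, 384 females)
DATA_gender_full <- data.frame(
  Gender = factor(rep(c("Male", "Female"), times = c(1211, 384)))
)

set.seed(2016)  # arbitrary seed, for reproducibility

males   <- subset(DATA_gender_full, Gender == "Male")
females <- subset(DATA_gender_full, Gender == "Female")

# Draw 384 males at random so both groups have the same n
males_sub <- males[sample(nrow(males), nrow(females)), , drop = FALSE]
DATA_gender_reduced <- rbind(males_sub, females)

table(DATA_gender_reduced$Gender)  # 384 in each group
```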

I have done a bunch of troubleshooting, including splitting the data set outside of R as a sanity check, with no change in the result. I feel like there must be something about the df calculation that I do not understand, but I am also confused by how changing the sample size can fail to change the degrees of freedom for the same model and number of free parameters. A simplified (but admittedly rough) version of the code is attached, along with trimmed results from running it. I have also included a very parsimonious version of the relevant results below.

What am I missing?

Todd


===

 Full Sample
> summary(DATA_gender_full$Gender)
  Male Female
  1211    384   
> summary(cfa_gender_full)
  Estimator                                         ML
  Minimum Function Test Statistic             3446.453
  Degrees of freedom                               729
> summary(cfa_gender_full_groups)
  Estimator                                         ML
  Minimum Function Test Statistic             4383.211
  Degrees of freedom                              1458
Chi-square for each group:
  Male                                        2689.297
  Female                                      1693.914

Reduced Sample
> summary(DATA_gender_reduced$Gender)
  Male Female
   384    384
> summary(cfa_gender_reduced)
  Estimator                                         ML
  Minimum Function Test Statistic             2181.862
  Degrees of freedom                               729
> summary(cfa_gender_reduced_groups)
  Estimator                                         ML
  Minimum Function Test Statistic             3126.600
  Degrees of freedom                              1458
Chi-square for each group:
  Male                                        1432.686
  Female                                      1693.914

df question post code.R

Jeremy Miles

Nov 1, 2016, 11:39:08 AM
to lavaan

You're not doing anything wrong. Well, you're calculating df wrongly. In CFA (and SEM generally) df depends on the number of variables and number of parameters, not the sample size.

Specifically, df = nMoments - nParameters

where nMoments is the number of moments going into your model, typically given by k(k+1)/2, where k is the number of variables. If I've counted correctly, you have k = 40 variables, so 820 moments (40 variances and 780 covariances). (You also have 40 means, but they cancel out, so we can ignore them.)

You then have one variance and one loading per item, so that's 80 parameters.
Four correlated errors.
Six structural loadings.
I think the variance of hof1 is also free (I don't tend to use the cfa() function, so I'm not sure what its defaults are).

That makes 91 parameters.  820 - 91 = 729
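
That count can be reproduced as plain arithmetic in R, with no model object needed (the last term assumes, as above, that the hof1 variance is left free by default):

```r
k <- 40                       # observed variables
n_moments <- k * (k + 1) / 2  # 40 variances + 780 covariances = 820

n_params <- 2 * k +  # one loading and one residual variance per item (80)
            4 +      # correlated errors
            6 +      # structural loadings
            1        # variance of hof1, assuming it is free

n_moments - n_params  # 820 - 91 = 729
```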

Hooray! Our df calculation seems to be correct.

When I use functions like measurementInvariance() (or even when I don't), I like to check that the df match what I think they should be when I count them 'by hand', to make sure the model is what I think it is. (I do the same thing when I review papers; it's amazing how often the calculation doesn't work out, and the authors come back and say something like "Ah, yes, there were some extra parameters we didn't mention.")
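
In lavaan, the reported numbers can be pulled out directly for that kind of hand check. A minimal sketch using the HolzingerSwineford1939 data set that ships with lavaan (not the model from this thread):

```r
library(lavaan)

# Classic three-factor CFA on 9 indicators
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)

# Hand count: 9*10/2 = 45 moments; 6 free loadings + 9 residual variances
# + 3 factor variances + 3 factor covariances = 21 parameters; df = 45 - 21 = 24
fitMeasures(fit, "df")   # reported df, to compare against the hand count
lavInspect(fit, "npar")  # reported number of free parameters
```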

Hope that helps,

Jeremy
