Degrees of Freedom Calculations


Kenneth Granillo-Velasquez

Jun 21, 2023, 8:45:51 PM
to lavaan
Hello, all,

I have a quick question about how the lavaan package calculates the degrees of freedom for the chi-square tests (i.e., "Model Test User Model" and "Model Test Baseline Model"). Below is a dummy model that I created to understand how lavaan calculates the degrees of freedom for these tests; it is not meant to be theoretically or practically meaningful.
--------------------------
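library(lavaan)
library(magrittr)  # provides the %>% pipe used below

model <- "
# Structural equations
Commit_Summed_T2 ~ Commit_Summed_T1
Burnout_Summed_T2 ~ Burnout_Summed_T1

# Residual covariance
Commit_Summed_T2 ~~ Burnout_Summed_T2"

# main_dataset is the poster's own data frame (not shown in this thread)
fit <- lavaan::sem(model = model,
                   data = main_dataset)
fit %>%
  summary(fit.measures = TRUE,
          standardized = TRUE)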

----------------------------

According to this model, there are 2 regression pathways, 2 residual variances (automatically estimated by the sem function), and 1 residual covariance (i.e., 5 free parameters). According to my calculations, the chi-square test should report 5 degrees of freedom.

[4 (4 + 1)]/2 = 10 (covariances and variances)
10 - 5 [freely estimated parameters; see above] = 5 DF

However, the software reports 2 DF for the chi-square test (see output below), even though it does corroborate the 5 free parameters. As for the baseline model, I am unsure what lavaan considers the baseline model to be. More information would be helpful, since I cannot find any resources explaining this section of the output, and "baseline models" can vary between software packages.

R Output:
lavaan 0.6.15 ended normally after 12 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

  Number of observations                           262

Model Test User Model:

  Test statistic                                18.600
  Degrees of freedom                                 2
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               518.189
  Degrees of freedom                                 5
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.968
  Tucker-Lewis Index (TLI)                       0.919

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -1339.918
  Loglikelihood unrestricted model (H1)      -1330.618

  Akaike (AIC)                                2689.836
  Bayesian (BIC)                              2707.678
  Sample-size adjusted Bayesian (SABIC)       2691.825

Root Mean Square Error of Approximation:

  RMSEA                                          0.178
  90 Percent confidence interval - lower         0.110
  90 Percent confidence interval - upper         0.256
  P-value H_0: RMSEA <= 0.050                    0.002
  P-value H_0: RMSEA >= 0.080                    0.989

Standardized Root Mean Square Residual:

  SRMR                                           0.076

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  Commit_Summed_T2 ~
    Commit_Smmd_T1         0.807    0.038   21.145    0.000    0.807    0.790
  Burnout_Summed_T2 ~
    Burnot_Smmd_T1         0.770    0.041   18.835    0.000    0.770    0.754

Covariances:
                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .Commit_Summed_T2 ~~
   .Burnot_Smmd_T2        -1.643    0.619   -2.656    0.008   -1.643   -0.166

Variances:
                        Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .Commit_Smmd_T2         6.359    0.556   11.446    0.000    6.359    0.375
   .Burnot_Smmd_T2        15.346    1.341   11.446    0.000   15.346    0.431

Jasper Bogaert

Jun 22, 2023, 3:25:32 AM
to lavaan
Hi,

I tried to have a look at your example (in a hurry). If you work with the fixed.x = F option, then you have 8 free parameters (which you can find in the summary output): 
- 4 variances (2 variances of the exogenous variables and 2 residual variances);
- 2 covariances;
- 2 regression pathways.
Additionally, [4 (4 + 1)]/2 = 10 (covariances and variances) and 10 - 8 [freely estimated parameters; see above] = 2 DF

If you work with the fixed.x = T option (the default), then you have 5 free parameters (which you can find in the summary output): 
- 2 residual variances;
- 1 covariance;
- 2 regression pathways.
Additionally, [4 (4 + 1)]/2 - [2 (2 + 1)]/2 = 7 (covariances and variances) and 7 - 5 [freely estimated parameters; see above] = 2 DF
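The two counts above can be checked with a few lines of base R. This is only a sketch of the arithmetic; the helper name n_moments is made up for illustration and is not a lavaan function.

```r
# Number of unique variances and covariances among p observed variables.
# (n_moments is a hypothetical helper, not part of lavaan.)
n_moments <- function(p) p * (p + 1) / 2

p_all <- 4  # all observed variables in the model
p_x   <- 2  # exogenous Time 1 predictors

# fixed.x = FALSE: all 10 moments count, 8 free parameters
df_random_x <- n_moments(p_all) - 8

# fixed.x = TRUE: the 3 moments among the x variables are dropped,
# leaving 7 moments and 5 free parameters
df_fixed_x <- (n_moments(p_all) - n_moments(p_x)) - 5

c(fixed.x_FALSE = df_random_x, fixed.x_TRUE = df_fixed_x)
```

Both settings give the same degrees of freedom because freeing the exogenous (co)variances adds exactly as many parameters as it adds moments.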

FYI: If TRUE (default option), the exogenous ‘x’ covariates are considered fixed variables and the means, variances and covariances of these variables are fixed to their sample values. If FALSE, they are considered random, and the means, variances and covariances are free parameters.
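To see the effect of the option directly, one could fit the same model both ways and compare the parameter counts. The sketch below uses simulated stand-in data (the variable names x1, x2, y1, y2 and the coefficients are made up, not the original dataset) and assumes fitMeasures() reports "npar" and "df" as it does in lavaan 0.6.

```r
library(lavaan)

# Simulated stand-in data; names and coefficients are illustrative only.
set.seed(1)
n  <- 262
x1 <- rnorm(n)
x2 <- 0.3 * x1 + rnorm(n)
dat <- data.frame(
  x1 = x1,
  x2 = x2,
  y1 = 0.8 * x1 + rnorm(n),
  y2 = 0.7 * x2 + rnorm(n)
)

model <- "
  y1 ~ x1
  y2 ~ x2
  y1 ~~ y2
"

fit_fixed  <- sem(model, data = dat)                   # fixed.x = TRUE (default)
fit_random <- sem(model, data = dat, fixed.x = FALSE)  # x (co)variances estimated

# Free parameters and model degrees of freedom for each fit
counts <- sapply(
  list(fixed.x_TRUE = fit_fixed, fixed.x_FALSE = fit_random),
  function(f) fitMeasures(f, c("npar", "df"))
)
counts
```

The number of free parameters should differ between the two fits while the degrees of freedom stay the same.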

I hope this helps!

Best wishes,
Jasper


On Thursday, June 22, 2023 at 02:45:51 UTC+2, kgranillo...@gmail.com wrote:

Kenneth Granillo-Velasquez

Jun 22, 2023, 8:13:05 AM
to lavaan
Thank you so much for the quick response!

I am working with sem's defaults, so I assume the latter situation is the one that applies. Quick follow-up: where does the [2 (2 + 1)]/2 portion of the equation come from? I know that [4 (4 + 1)]/2 counts the total number of variances and covariances in the matrix, but I am unsure where the 3 that is subtracted from the total of 10 comes from.

Also, if you or someone else can provide clarification on the baseline model section, I would be very grateful. Thank you again for the help!

Keith Markus

Jun 22, 2023, 11:47:49 AM
to lavaan
Kenneth,
You have two exogenous variables from Time 1.  Those produce 3 observed moments: 2 variances and 1 covariance between them.  These are subtracted from the observed moments because they are treated as fixed and read directly from the data.

The traditional baseline model is an independence model in which variances are free but covariances are fixed to zero.  If you include mean structures, the means are freely estimated like the variances.  (There was some initial controversy about this as some software developers instead decided to treat means like covariances and fix them to zero.  The choice is just a matter of convention but the consensus was that this practice inflates comparative fit indices and that free means offered more conservative values.  Ed Rigdon has proposed an even more conservative option that fixes the covariances to the average covariance.)
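Applied to the model in this thread, that description is consistent with the 5 baseline degrees of freedom in the output. The count below is a sketch that assumes the Time 1 predictors stay fixed (fixed.x = TRUE) in the baseline model and that only the two Time 2 variances are free.

```r
# Moments that count under fixed.x = TRUE: 10 in total minus the 3 among
# the two exogenous Time 1 variables.
moments_counted <- 4 * (4 + 1) / 2 - 2 * (2 + 1) / 2

# Independence baseline (assumed form): only the two Time 2 variances are
# free; every covariance involving an outcome is fixed to zero.
baseline_free <- 2

baseline_df <- moments_counted - baseline_free
baseline_df
```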

There is a lav_partable_independence() function that takes your lavaan fit object as its first argument and returns the parameter table of the baseline model.
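A sketch of how one might inspect the baseline model for a fitted object, assuming the exported name is lav_partable_independence() and that fitMeasures() reports "baseline.chisq" and "baseline.df" in the installed lavaan version. The data are simulated stand-ins with made-up variable names, not the original dataset.

```r
library(lavaan)

# Simulated stand-in data; names and coefficients are illustrative only.
set.seed(1)
n  <- 262
x1 <- rnorm(n)
x2 <- 0.3 * x1 + rnorm(n)
dat <- data.frame(
  x1 = x1,
  x2 = x2,
  y1 = 0.8 * x1 + rnorm(n),
  y2 = 0.7 * x2 + rnorm(n)
)

model <- "
  y1 ~ x1
  y2 ~ x2
  y1 ~~ y2
"
fit <- sem(model, data = dat)

# Parameter table of the independence (baseline) model for this fit;
# rows with free > 0 are the parameters the baseline model estimates.
pt_base <- lav_partable_independence(fit)
data.frame(lhs = pt_base$lhs, op = pt_base$op,
           rhs = pt_base$rhs, free = pt_base$free)

# Baseline test statistic and degrees of freedom, as shown by summary()
baseline <- fitMeasures(fit, c("baseline.chisq", "baseline.df"))
baseline
```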

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Kenneth Granillo-Velasquez

Jun 22, 2023, 12:29:30 PM
to lavaan
Understood! In other words, when the (co)variances are fixed and taken directly from the sample covariance matrix, we remove them from the total pool of moments (i.e., the variances and covariances in the full matrix), correct? Sorry for all these questions; I just want to make sure I completely understand.

Keith Markus

Jun 23, 2023, 8:34:22 AM
to lavaan
Kenneth,
Interestingly, this same topic seems to have come up on two different threads discussed in parallel.

Yes, your description is correct.  One way to think about it is as follows:  Order the variables in the moment matrix placing the exogenous variables (x) to the left and on top and the endogenous variables (y) to the right and below.  You can now partition the moment matrix into an xx matrix, a yy matrix, and two copies of an xy matrix, one of which is transposed.  Fixed-x removes the xx partition of the moment matrix from consideration in model fit.
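The partition can be sketched in base R. The covariance matrix below comes from simulated data with placeholder names (x = exogenous, y = endogenous); it only illustrates how the blocks and their moment counts line up.

```r
# Illustrative 4 x 4 covariance matrix from simulated data.
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
dat <- data.frame(x1 = x1, x2 = x2,
                  y1 = x1 + rnorm(100), y2 = x2 + rnorm(100))
S <- cov(dat)

x <- c("x1", "x2")
y <- c("y1", "y2")

S_xx <- S[x, x]  # upper-left block: dropped from the fit under fixed.x
S_xy <- S[x, y]  # off-diagonal block (its transpose is S[y, x])
S_yy <- S[y, y]  # lower-right block

# Unique moments per block: a symmetric p x p block has p(p + 1)/2,
# the rectangular xy block has one per cell.
n_xx <- sum(upper.tri(S_xx, diag = TRUE))
n_yy <- sum(upper.tri(S_yy, diag = TRUE))
n_xy <- length(S_xy)

c(xx = n_xx, xy = n_xy, yy = n_yy, counted_fixed_x = n_xy + n_yy)
```

With two x and two y variables, the xx block holds the 3 moments that fixed-x sets aside, and the xy and yy blocks hold the 7 that remain.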

Incidentally, I did not describe Ed Rigdon's suggested baseline model in ideally clear terms.  It is better described as one that constrains all the covariances to be equal to one another.  Here is the reference.

Kenneth Granillo-Velasquez

Jun 23, 2023, 11:43:07 AM
to lavaan
Thank you for the explanation! This makes sense!