CFI is significantly worse than the TLI?!

Yannick Diehl

May 26, 2022, 10:54:54 AM

to lavaan

Hello,

I would be very happy if someone could explain to me why my CFI is so much worse than the TLI, although that can't actually be the case. All other fit indices look fine; only the CFI is poor. Could this be related to my categorical variables? I declared the ordinal indicators of the latent factors as "ordered" and dummy-coded the categorical predictors. By default, WLSMV is used for estimation. Do I have to define the predictors differently?

model syntax:

protint.reg2 <- '
# Measurement Model
socialIn =~ im01 + im21
exteff =~ pe01 + lp05 + pe05r
intleff =~ pe04 + pe06 + pe02r
protestint =~ pp09 + pp17 + pp20 + pp22

pronorm=~ pe10r
reldepr=~ id01
migration=~px06

#Regression
protestint~ socialIn + exteff + intleff + pe08_1 + pe08_3 + pe08_4 +
pe07r_2 + pe07r_3 + pe07r_4 + ingler_1 + ingler_3 + ingler_4 + pronorm + reldepr + migration + s1_2 + s1_3 + s2_2 + s2_3 + s3_2 + s3_3
'

protint.reg2.DWLS.w <- cfa(protint.reg2, df, estimator="WLSMV", sampling.weights = "wghtpew", ordered = c("im01", "im21", "pe01", "lp05", "pe04", "pe05r",
"pe06", "pe02r", "pp09", "pp17", "pp20", "pp22"))

result:

> summary(protint.reg2.DWLS.w, fit.measures = T, standardized = T)
lavaan 0.6-10 ended normally after 74 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                        78

                                                  Used       Total
  Number of observations                          2723        3477
  Sampling weights variable                    wghtpew

Model Test User Model:
                                              Standard      Robust
  Test Statistic                              1403.816    1220.446
  Degrees of freedom                               282         282
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.232
  Shift parameter                                           81.044
       simple second-order correction

Model Test Baseline Model:

  Test statistic                             12068.645    8767.605
  Degrees of freedom                               105         105
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.381

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.906       0.892
  Tucker-Lewis Index (TLI)                       0.965       0.960

  Robust Comparative Fit Index (CFI)                            NA
  Robust Tucker-Lewis Index (TLI)                               NA

Root Mean Square Error of Approximation:

  RMSEA                                          0.038       0.035
  90 Percent confidence interval - lower         0.036       0.033
  90 Percent confidence interval - upper         0.040       0.037
  P-value RMSEA <= 0.05                          1.000       1.000

  Robust RMSEA                                                  NA
  90 Percent confidence interval - lower                        NA
  90 Percent confidence interval - upper                        NA

Standardized Root Mean Square Residual:

  SRMR                                           0.043       0.043

Parameter Estimates:

  Standard errors                           Robust.sem
  Information                                 Expected
  Information saturated (h1) model        Unstructured

Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
socialIn =~
im01 1.000 0.566 0.566
im21 1.303 0.146 8.926 0.000 0.738 0.738
exteff =~
pe01 1.000 0.693 0.693
lp05 1.234 0.038 32.268 0.000 0.856 0.856
pe05r 1.021 0.034 30.095 0.000 0.708 0.708
intleff =~
pe04 1.000 0.824 0.824
pe06 0.866 0.036 23.878 0.000 0.714 0.714
pe02r 0.692 0.029 23.713 0.000 0.571 0.571
protestint =~
pp09 1.000 0.623 0.604
pp17 1.318 0.076 17.402 0.000 0.821 0.780
pp20 1.125 0.069 16.338 0.000 0.700 0.674
pp22 1.242 0.072 17.205 0.000 0.773 0.739
pronorm =~
pe10r 1.000 0.727 1.000
reldepr =~
id01 1.000 0.690 1.000
migration =~
px06 1.000 1.340 1.000

Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
protestint ~
socialIn 0.241 0.044 5.524 0.000 0.219 0.219
exteff 0.118 0.037 3.177 0.001 0.132 0.132
intleff 0.122 0.022 5.442 0.000 0.161 0.161
pe08_1 -0.071 0.047 -1.514 0.130 -0.113 -0.040
pe08_3 0.101 0.034 2.963 0.003 0.162 0.076
pe08_4 0.017 0.077 0.221 0.825 0.027 0.005
pe07r_2 0.088 0.055 1.609 0.108 0.141 0.070
pe07r_3 0.146 0.056 2.608 0.009 0.234 0.114
pe07r_4 0.174 0.074 2.335 0.020 0.279 0.077
ingler_1 -0.203 0.057 -3.542 0.000 -0.326 -0.087
ingler_3 0.141 0.037 3.856 0.000 0.227 0.104
ingler_4 0.270 0.041 6.521 0.000 0.434 0.189
pronorm 0.151 0.021 7.373 0.000 0.177 0.177
reldepr 0.058 0.023 2.552 0.011 0.064 0.064
migration -0.061 0.014 -4.303 0.000 -0.130 -0.130
s1_2 0.415 0.147 2.818 0.005 0.667 0.091
s1_3 0.545 0.178 3.061 0.002 0.875 0.074
s2_2 0.477 0.069 6.918 0.000 0.766 0.182
s2_3 0.359 0.101 3.539 0.000 0.577 0.101
s3_2 0.412 0.176 2.340 0.019 0.662 0.066
s3_3 0.087 0.140 0.622 0.534 0.140 0.014

Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
socialIn ~~
exteff -0.123 0.017 -7.392 0.000 -0.313 -0.313
intleff -0.002 0.015 -0.157 0.875 -0.005 -0.005
pronorm 0.049 0.011 4.421 0.000 0.119 0.119
reldepr -0.094 0.013 -7.124 0.000 -0.240 -0.240
migration -0.049 0.020 -2.418 0.016 -0.065 -0.065
exteff ~~
intleff 0.139 0.015 9.230 0.000 0.244 0.244
pronorm -0.007 0.011 -0.626 0.531 -0.014 -0.014
reldepr 0.192 0.011 16.944 0.000 0.401 0.401
migration -0.440 0.027 -16.183 0.000 -0.473 -0.473
intleff ~~
pronorm 0.088 0.013 6.575 0.000 0.147 0.147
reldepr 0.088 0.013 6.977 0.000 0.154 0.154
migration -0.239 0.026 -9.085 0.000 -0.217 -0.217
pronorm ~~
reldepr -0.001 0.009 -0.092 0.927 -0.002 -0.002
migration -0.075 0.019 -3.991 0.000 -0.077 -0.077
reldepr ~~
migration -0.248 0.020 -12.289 0.000 -0.268 -0.268

Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.im01 0.000 0.000 0.000
.im21 0.000 0.000 0.000
.pe01 0.000 0.000 0.000
.lp05 0.000 0.000 0.000
.pe05r 0.000 0.000 0.000
.pe04 0.000 0.000 0.000
.pe06 0.000 0.000 0.000
.pe02r 0.000 0.000 0.000
.pp09 0.000 0.000 0.000
.pp17 0.000 0.000 0.000
.pp20 0.000 0.000 0.000
.pp22 0.000 0.000 0.000
.pe10r 3.036 0.050 60.358 0.000 3.036 4.178
.id01 2.668 0.055 48.750 0.000 2.668 3.864
.px06 3.054 0.101 30.202 0.000 3.054 2.279
socialIn 0.000 0.000 0.000
exteff 0.000 0.000 0.000
intleff 0.000 0.000 0.000
.protestint 0.000 0.000 0.000
pronorm 0.000 0.000 0.000
reldepr 0.000 0.000 0.000
migration 0.000 0.000 0.000

Thresholds:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
im01|t1 0.406 0.096 4.206 0.000 0.406 0.406
im21|t1 -1.574 0.087 -18.082 0.000 -1.574 -1.574
im21|t2 -0.520 0.078 -6.681 0.000 -0.520 -0.520
im21|t3 0.797 0.078 10.214 0.000 0.797 0.797
pe01|t1 -0.756 0.081 -9.316 0.000 -0.756 -0.756
pe01|t2 0.358 0.081 4.433 0.000 0.358 0.358
pe01|t3 1.783 0.090 19.813 0.000 1.783 1.783
lp05|t1 0.367 0.098 3.742 0.000 0.367 0.367
pe05r|t1 -1.257 0.084 -14.933 0.000 -1.257 -1.257
pe05r|t2 -0.006 0.080 -0.078 0.938 -0.006 -0.006
pe05r|t3 1.653 0.086 19.253 0.000 1.653 1.653
pe04|t1 -1.391 0.086 -16.220 0.000 -1.391 -1.391
pe04|t2 -0.356 0.081 -4.425 0.000 -0.356 -0.356
pe04|t3 0.725 0.081 8.901 0.000 0.725 0.725
pe06|t1 -1.508 0.089 -16.861 0.000 -1.508 -1.508
pe06|t2 -0.455 0.079 -5.773 0.000 -0.455 -0.455
pe06|t3 0.748 0.079 9.430 0.000 0.748 0.748
pe02r|t1 -0.350 0.076 -4.591 0.000 -0.350 -0.350
pe02r|t2 0.741 0.077 9.655 0.000 0.741 0.741
pe02r|t3 1.689 0.079 21.250 0.000 1.689 1.689
pp09|t1 0.539 0.098 5.486 0.000 0.539 0.523
pp17|t1 0.476 0.101 4.715 0.000 0.476 0.453
pp20|t1 -0.448 0.102 -4.379 0.000 -0.448 -0.431
pp22|t1 0.352 0.097 3.628 0.000 0.352 0.337

Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.im01 0.679 0.679 0.679
.im21 0.456 0.456 0.456
.pe01 0.519 0.519 0.519
.lp05 0.268 0.268 0.268
.pe05r 0.499 0.499 0.499
.pe04 0.321 0.321 0.321
.pe06 0.491 0.491 0.491
.pe02r 0.674 0.674 0.674
.pp09 0.674 0.674 0.635
.pp17 0.434 0.434 0.392
.pp20 0.588 0.588 0.545
.pp22 0.497 0.497 0.454
.pe10r 0.000 0.000 0.000
.id01 0.000 0.000 0.000
.px06 0.000 0.000 0.000
socialIn 0.321 0.042 7.600 0.000 1.000 1.000
exteff 0.481 0.023 21.337 0.000 1.000 1.000
intleff 0.679 0.031 22.226 0.000 1.000 1.000
.protestint 0.251 0.025 10.244 0.000 0.648 0.648
pronorm 0.528 0.016 32.658 0.000 1.000 1.000
reldepr 0.477 0.014 35.008 0.000 1.000 1.000
migration 1.795 0.082 21.829 0.000 1.000 1.000

Scales y*:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
im01 1.000 1.000 1.000
im21 1.000 1.000 1.000
pe01 1.000 1.000 1.000
lp05 1.000 1.000 1.000
pe05r 1.000 1.000 1.000
pe04 1.000 1.000 1.000
pe06 1.000 1.000 1.000
pe02r 1.000 1.000 1.000
pp09 1.000 1.000 1.000
pp17 1.000 1.000 1.000
pp20 1.000 1.000 1.000
pp22 1.000 1.000 1.000

Jeremy Miles

May 26, 2022, 12:05:56 PM

to lav...@googlegroups.com

On Thu, 26 May 2022 at 07:54, 'Yannick Diehl' via lavaan <lav...@googlegroups.com> wrote:

Hello,

I would be very happy if someone could explain to me why my CFI is so much worse than the TLI, although that can't actually be the case.

Why do you say that can't be the case? It is the case.

All you need for the CFI and TLI are the chi-squares and degrees of freedom for the fitted and null (baseline) models. I often find looking at the formulas to be helpful, they're in this paper: https://link.springer.com/article/10.3758/s13428-018-1055-2
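As a minimal sketch of that point (the textbook formulas, not copied from the linked paper), the four numbers in the "Standard" column of the output above are enough to reproduce the reported indices. The variable names are illustrative only.

```r
# Chi-square and df copied from the "Standard" column of the summary() output
chisq_m <- 1403.816;  df_m <- 282   # user (target) model
chisq_b <- 12068.645; df_b <- 105   # baseline (null) model

# CFI: 1 minus the ratio of the two models' excess chi-square (chi-square - df),
# with the usual floors at zero
ncp_m <- max(chisq_m - df_m, 0)
ncp_b <- max(chisq_b - df_b, chisq_m - df_m, 0)
cfi <- 1 - ncp_m / ncp_b

# TLI: the same comparison, but based on chi-square / df ratios
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

print(round(c(CFI = cfi, TLI = tli), 3))
```

These values round to the 0.906 and 0.965 shown in the output, so the indices themselves are computed as expected; the question is whether the baseline model they are compared against is the right one.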

Brett

May 26, 2022, 1:18:45 PM

to lav...@googlegroups.com

Your baseline model has 105 df and your user model 282 df - you can use the formulas (below) to figure out why since the distinction is the df adjustment and potentially the scaling parameters.
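To make the role of the df adjustment concrete, here is a small sketch using the "Standard" column values from the output. It relies on a rearrangement of the usual TLI formula, TLI = 1 - (1 - CFI) * df_baseline / df_model, which holds when neither index is truncated; the variable names are illustrative.

```r
# Values copied from the "Standard" column of the summary() output
chisq_m <- 1403.816;  df_m <- 282   # user model
chisq_b <- 12068.645; df_b <- 105   # baseline model

cfi        <- 1 - (chisq_m - df_m) / (chisq_b - df_b)
tli_direct <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

# Same quantity written in terms of CFI and the df ratio
tli_from_cfi <- 1 - (1 - cfi) * df_b / df_m

print(all.equal(tli_direct, tli_from_cfi))  # the two expressions agree
print(df_b / df_m)                          # about 0.37 for this output
```

Because the df ratio here is below 1, the misfit (1 - CFI) is shrunk before it is subtracted, which puts the TLI above the CFI. With a baseline model that has more df than the user model, the ratio exceeds 1 and the TLI cannot exceed the CFI.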

Also, for what it's worth, I strongly recommend against a kitchen-sink regression with 21 predictors that I suspect are (a) highly correlated, (b) measured contemporaneously, and/or (c) whose (co)variances are not modeled to account for IID violations (robust SEs do not fix this). If the goal is prediction, you can abandon the strong parametric assumptions of SEM. If it's causal inference, you risk all sorts of biases of unknown sign with such a model unless you know the 22-node DAG: see models 7, 10-12, 16-18 as examples.


Edward Rigdon

May 29, 2022, 11:40:12 AM

to lav...@googlegroups.com

Yannick--

I hope that I have not missed a better reply than mine, but here goes. Something strange has happened. Now, both CFI and TLI involve comparison of the fit of the target model to the fit of a baseline model. The baseline model is highly constrained, and should (almost) always have greater degrees of freedom (DF) than the target model. But observe that DF for your baseline model is only 105, while DF for your target model is 282.

You have ordinal variables, which complicate the output, but for a single-group model things work out about the same. The DF = 105 for your baseline model is consistent with a baseline model having only 15 observed variables (15 * 14 / 2 = 105). Your model has those 15 observed indicators PLUS the 15 additional observed dummy predictors in your "Regression" statement (pronorm, reldepr, and migration are single-indicator latent variables, so their indicators are already among the first 15). But it looks like the baseline model estimation EXCLUDED those 15 additional observed variables.

This may have happened because lavaan has treated those 15 variables as exogenous, which excludes them from certain calculations. lavaan's default behavior is different here because you have endogenous categorical variables.

The easiest way to change this is to add the options

conditional.x = FALSE, fixed.x = FALSE

(I'm not entirely clear on what the first option will do, in light of your categorical variables.)

to your cfa() call. That should cause lavaan to treat the 15 variables like all of your other observed variables.
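The df arithmetic can be checked with a short sketch. It assumes the baseline model estimates no covariances at all, so its df equal the number of pairs of observed variables; `baseline_df()` and `refit_joint()` are hypothetical helper names, and the wrapper is only defined here, not run, since it needs lavaan and the original data.

```r
# Baseline (independence) model df for p observed variables:
# one df for every covariance/correlation fixed to zero
baseline_df <- function(p) p * (p - 1) / 2

print(baseline_df(15))       # 105, matching the output above
print(baseline_df(15 + 15))  # 435, if the 15 dummy predictors were included too

# Hypothetical wrapper: refit with the predictors modeled jointly rather than
# conditioned on. 'model', 'data', and 'ordered_vars' must be supplied by the user.
refit_joint <- function(model, data, ordered_vars, weights = "wghtpew") {
  lavaan::cfa(model, data = data, estimator = "WLSMV",
              sampling.weights = weights, ordered = ordered_vars,
              conditional.x = FALSE, fixed.x = FALSE)
}
```

A baseline with 435 df would again exceed the 282 df of the user model, which is the usual ordering for incremental fit indices.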

--Ed Rigdon

Yannick Diehl

May 30, 2022, 5:37:59 AM

to lavaan

Thank you to everyone who replied! I looked into the suggestions, and Ed Rigdon's helped me a lot. Unfortunately, the refitted model now produces warnings which, from what I have read, often occur with such models.

Warning in muthen1984(Data = X[[g]], wt = WT[[g]], ov.names = ov.names[[g]],
lavaan WARNING: trouble constructing W matrix; used generalized inverse for A11 submatrix
Warning in lavaan::lavaan(model = protint.reg2, data = df, ordered = c("im01",
lavaan WARNING:
the optimizer (NLMINB) claimed the model converged, but not all
elements of the gradient are (near) zero; the optimizer may not
have found a local solution use check.gradient = FALSE to skip
this check.

Is there a solution for this?

Thanks in advance!

Terrence Jorgensen

May 31, 2022, 5:55:30 PM

to lavaan

Ed identified the reason your baseline.model is not nested in your target model (making it invalid for calculating incremental fit indices). I think his solution (to treat all variables as endogenous) would make the baseline.model too restrictive by fixing all covariances among predictors to zero. While the baseline.model can be restricted in any meaningful way, it sounds like you want to restrict endogenous-with-endogenous covariances to zero, but you should continue to allow exogenous-with-exogenous covariances to be free. The remaining issue is how to restrict endogenous-with-exogenous covariances such that the baseline.model is nested in the target model.

Because your predictors affect the common factors, they only have one estimated slope for all the indicators of that factor. Because factor loadings can vary (i.e., congeneric measurement model), each predictor's (indirect) effect can vary across indicators of the same factor. But a reasonably restricted baseline.model could constrain the effect of each predictor to equality across indicators of the same factor. For example, the 2 indicators of the first factor:

im01 ~ foo*pe08_1 + bar*pe08_3 + ... + baz*s3_3
im21 ~ foo*pe08_1 + bar*pe08_3 + ... + baz*s3_3

Then the same concept applies to the indicators of each other factor. You can verify that this baseline.model is nested in the target model using semTools::net().
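Writing those labeled regressions by hand is tedious, so here is a minimal sketch that generates them as lavaan syntax. `equal_slope_syntax()` and the label prefix are hypothetical; the predictor vector lists the 15 dummy variables from the original model syntax, and whether the single-indicator variables belong there too is left open. This covers only the regression part of a custom baseline model.

```r
# Build lavaan regression lines in which every indicator of one factor
# shares the same labeled slope for each predictor
equal_slope_syntax <- function(indicators, predictors, prefix) {
  labels <- paste0(prefix, "_", predictors)  # one label per predictor
  rhs <- paste(paste0(labels, "*", predictors), collapse = " + ")
  paste0(indicators, " ~ ", rhs, collapse = "\n")
}

predictors <- c("pe08_1", "pe08_3", "pe08_4", "pe07r_2", "pe07r_3", "pe07r_4",
                "ingler_1", "ingler_3", "ingler_4",
                "s1_2", "s1_3", "s2_2", "s2_3", "s3_2", "s3_3")

# Indicators of the first factor; the shared labels impose the equality constraints
cat(equal_slope_syntax(c("im01", "im21"), predictors, prefix = "b1"), "\n")
```

Calling the function once per factor with a different prefix keeps the slopes equal within a factor but free across factors.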

Warning in muthen1984(Data = X[[g]], wt = WT[[g]], ov.names = ov.names[[g]],
lavaan WARNING: trouble constructing W matrix; used generalized inverse for A11 submatrix
Warning in lavaan::lavaan(model = protint.reg2, data = df, ordered = c("im01",
lavaan WARNING:
the optimizer (NLMINB) claimed the model converged, but not all
elements of the gradient are (near) zero; the optimizer may not
have found a local solution use check.gradient = FALSE to skip
this check.

Is there a solution for this?

The whole point of conditioning on exogenous predictors is to make it easier on the optimizer by estimating fewer parameters, especially for 2-step estimators like DWLS (that's why it is the default for models with categorical outcomes).

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen

Yves Rosseel

Jun 1, 2022, 3:32:08 AM

to lavaan

A few additional thoughts:

- When you have ordered outcomes, and you switch to conditional.x = FALSE, make sure that any binary covariates (e.g., gender) are included in the list of 'ordered' variables. Failing to do so may lead to the warning about 'trouble constructing W matrix'.

- If you use conditional.x = TRUE (the default), there is a 'baseline.conditional.x.free.slopes' argument, which is set to TRUE by default. The reasoning is that in the baseline model, the covariates may freely influence all the endogenous variables. In the user model, this is not necessarily the case (but the user has full control), potentially leading to poor fit, perhaps even worse than that of the baseline model. Setting this option to FALSE will force these 'slopes' to be zero in the baseline model (inflating the CFI/TLI fit indices).

- You can always specify your own baseline model, and provide it to fitMeasures() to get the CFI/TLI fit indices.
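The three options above could be sketched roughly as follows. The helper names are hypothetical, the variable names come from the original post, and the wrappers are only defined here (not run), since they need lavaan and the original data.

```r
# Ordinal indicators and dummy-coded predictors from the original model
indicators <- c("im01", "im21", "pe01", "lp05", "pe04", "pe05r",
                "pe06", "pe02r", "pp09", "pp17", "pp20", "pp22")
dummies <- c("pe08_1", "pe08_3", "pe08_4", "pe07r_2", "pe07r_3", "pe07r_4",
             "ingler_1", "ingler_3", "ingler_4",
             "s1_2", "s1_3", "s2_2", "s2_3", "s3_2", "s3_3")

# (1) With conditional.x = FALSE, the binary dummies go into 'ordered' as well
ordered_all <- c(indicators, dummies)

# (2) Keep conditional.x = TRUE (the default with categorical outcomes) but
#     force the covariate slopes to zero in the baseline model
fit_zero_slopes <- function(model, data) {
  lavaan::cfa(model, data = data, estimator = "WLSMV",
              sampling.weights = "wghtpew", ordered = indicators,
              baseline.conditional.x.free.slopes = FALSE)
}

# (3) Pass a separately fitted, user-specified baseline model to fitMeasures()
custom_fit_indices <- function(fit, baseline_fit) {
  lavaan::fitMeasures(fit, fit.measures = c("cfi", "tli"),
                      baseline.model = baseline_fit)
}
```

For option (1), `ordered_all` would be passed as the `ordered` argument together with conditional.x = FALSE in the cfa() call.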

Yves.
