Scaled vs Unscaled chi-square values


Sofia Orellana

Jan 20, 2022, 7:08:40 AM
to lavaan
Hi everyone,

I am assessing the goodness of fit of a model through a closeness-of-fit test, i.e. I am just looking at the chi-square and its associated p-value.

I obtain the fit measures of my model using the fitMeasures() function, but it yields both scaled and unscaled chi-square values and p-values.

What does it mean that the chi-square and p-value are scaled?
Which one should I be reporting?

Thank you!

Best,
Sofia

Jasper Bogaert

Jan 21, 2022, 3:19:29 AM
to lavaan
Hi Sofia

I am not 100% sure of this answer (so anyone, please feel free to jump in or comment), but I guess trying to help is better than doing nothing. I found more information on the following Stack Exchange page: https://stats.stackexchange.com/questions/207314/scaled-vs-unscaled-chi-square-statistic-in-cfa-grm-fit-evaluation

There you can find the following: "Satorra suggested the scaled Chi-square statistic as a more robust statistic when the data are not normal. Apparently it is also more accurate in small samples." These scaled chi-square values and p-values could be the result of using the Satorra-Bentler correction. If it is used, an extra column is shown in the output. The correction is meant for item data that are not normally distributed, so depending on that you can choose which to report.

In the following link you can find some useful information on the Satorra-Bentler correction: https://www.stata.com/features/overview/sem-satorra-bentler/

Short answer: as long as your data are normally distributed, I would suggest reporting the regular (unscaled) ones.

Some R-code you can try:

------
library(lavaan)  # uses the built-in PoliticalDemocracy dataset

myModel <- '
  # latent variables
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual covariances
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

fit <- sem(model = myModel,
           data  = PoliticalDemocracy,
           test  = "satorra.bentler") # this line requests the Satorra-Bentler correction

summary(fit, fit.measures = TRUE)
------


Best wishes,

Jasper Bogaert

---------------------
PhD student and teaching assistant
Department of Data Analysis (PP01)
Faculty of Psychology and Educational Sciences
Ghent University


Terrence Jorgensen

Jan 25, 2022, 6:10:35 PM
to lavaan
> These scaled chi-squared values and p-values could be the result of using the Satorra-Bentler correction
 
Indeed, Jasper. The same applies if the OP requested estimator = "MLM", or perhaps the Yuan-Bentler correction (estimator = "MLR"). See the ?lavOptions help page for a description of the estimator= and test= options.
The scaled statistics are what appear in the "Robust" column of the summary() output.
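For example (a sketch using the lavaan tutorial's CFA model, purely as an illustration; substitute your own model and data), the standard and scaled statistics can be requested by name from fitMeasures():

```r
library(lavaan)

# illustrative model from the lavaan tutorial
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")

# unscaled (standard) vs. scaled (robust) chi-square and p-value
fitMeasures(fit, c("chisq", "pvalue", "chisq.scaled", "pvalue.scaled"))
```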

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Jasper Bogaert

Jan 26, 2022, 3:36:04 AM
to lavaan
Hello Terrence

Thank you for confirming and the extra information, it is much appreciated!

Jasper Bogaert
---------------------
PhD student and teaching assistant
Department of Data Analysis (PP01)
Faculty of Psychology and Educational Sciences
Ghent University


Shu Fai Cheung

Jan 26, 2022, 5:20:28 AM
to lavaan
I would like to share something on the output with estimator = MLM or MLR. Please correct me if I am wrong.

This is an example (with only commonly reported fit measures retained):

# Adapted from https://lavaan.ugent.be/tutorial/cfa.html
library(lavaan)
#> This is lavaan 0.6-9
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")
summary(fit, fit.measures = TRUE)
#> lavaan 0.6-9 ended normally after 35 iterations
#>
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        21
#>
#>   Number of observations                           301
#>
#> Model Test User Model:
#>                                               Standard      Robust
#>   Test Statistic                                85.306      80.872
#>   Degrees of freedom                                24          24
#>   P-value (Chi-square)                           0.000       0.000
#>   Scaling correction factor                                  1.055
#>     Satorra-Bentler correction
#>
#> Model Test Baseline Model:
#>
#>   Test statistic                               918.852     789.298
#>   Degrees of freedom                                36          36
#>   P-value                                        0.000       0.000
#>   Scaling correction factor                                  1.164
#>
#> User Model versus Baseline Model:
#>
#>   Comparative Fit Index (CFI)                    0.931       0.925
#>   Tucker-Lewis Index (TLI)                       0.896       0.887
#>
#>   Robust Comparative Fit Index (CFI)                         0.932
#>   Robust Tucker-Lewis Index (TLI)                            0.897
#>
#> Root Mean Square Error of Approximation:
#>
#>   RMSEA                                          0.092       0.089
#>   90 Percent confidence interval - lower         0.071       0.068
#>   90 Percent confidence interval - upper         0.114       0.110
#>   P-value RMSEA <= 0.05                          0.001       0.001
#>
#>   Robust RMSEA                                               0.091
#>   90 Percent confidence interval - lower                     0.070
#>   90 Percent confidence interval - upper                     0.113

Created on 2022-01-26 by the reprex package (v2.0.1)

For the chi-square and p-value, there are only two sets: one in the "Standard" column (85.306) and one in the "Robust" column (80.872). As already noted in previous posts, the scaled statistics are in the "Robust" column.

However, there are two sets of "robust" CFI, TLI, and RMSEA. One set is presented in the "Robust" column, next to the "Standard" versions of the values (.925, .887, and .089 for CFI, TLI, and RMSEA). To my understanding, these are simply the usual CFI, TLI, and RMSEA, computed using the scaled chi-square.

The other set is presented in the "Robust" column but in its own rows, prefixed by "Robust": Robust CFI, Robust TLI, and Robust RMSEA (.932, .897, and .091 for CFI, TLI, and RMSEA). According to the release history for version 0.5-21 (https://lavaan.ugent.be/history/dot5.html), they are computed using the method proposed by Brosseau-Liard, Savalei, & Li (2012) and Brosseau-Liard & Savalei (2014). This can be confirmed as below:

# Adapted from https://lavaan.ugent.be/tutorial/cfa.html
library(lavaan)
#> This is lavaan 0.6-9
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")
fitMeasures(fit, c("cfi.scaled", "tli.scaled", "rmsea.scaled"))
#>   cfi.scaled   tli.scaled rmsea.scaled
#>        0.925        0.887        0.089
fitMeasures(fit, c("cfi.robust", "tli.robust", "rmsea.robust"))
#>   cfi.robust   tli.robust rmsea.robust
#>        0.932        0.897        0.091

Created on 2022-01-26 by the reprex package (v2.0.1)

From the release history: "Robust RMSEA and CFI values are now computed correctly, following Brosseau-Liard, P. E., Savalei, V., and Li, L. (2012), and Brosseau-Liard, P. E. and Savalei, V. (2014); in the output of fitMeasures(), the ‘new’ ones are called cfi.robust and rmsea.robust, while the ‘old’ ones are called cfi.scaled and rmsea.scaled."

Therefore, if MLM or MLR is used, then the values from the robust *rows* should be used, according to the quote above.

But that was in 2016; I am not sure whether there have been any updates or changes since then.

Actually, I find it quite confusing to have two sets of "robust" values for these three measures. Maybe the naming in the text output could be modified to clarify the differences? Or maybe the "incorrect" values could be removed, given that they are "now computed correctly" and reported in the robust rows?

-- Shu Fai

Yves Rosseel

Jan 28, 2022, 7:58:31 AM
to lav...@googlegroups.com
Hello Shu Fai,

On 1/26/22 11:20, Shu Fai Cheung wrote:
> Actually, I found it quite confusing to have two sets of "robust" values
> for these three measures. Maybe the naming of them in the text output
> can be modified to clarify the differences? Or maybe the "incorrect"
> values be removed, given that they are "now computed correctly" and
> reported in the robust rows?

So this is why things are what they are now:

In the beginning, lavaan (as well as, say, Mplus) only reported the
'scaled' versions of CFI/TLI and RMSEA. In these versions, we use the
standard formula of CFI/TLI/RMSEA, but we replace the naive/standard
chi-square test statistic by the so-called 'scaled' (or Satorra-Bentler)
version.

Then, Brosseau-Liard et al. pointed out that these were not 'correct',
and in a series of two papers, the correct 'robust' CFI/TLI and RMSEA
formulas were presented and tested. (Note that for the RMSEA, the
formulas had already been derived in unpublished work of Li and Bentler, 2006.)

In version 0.5-21 (7 sept 2016), I did what I believed was 'the right
thing' to do, and simply replaced the 'old' (=scaled) versions of the
CFI/TLI/RMSEA by the new and correct 'robust' versions.

Then, all hell broke loose. I received (literally) hundreds of angry
emails from lavaan users (some of them famous experts in the field) that
some numbers changed in the summary() output of lavaan! As a result,
they could no longer replicate older (often published) results with this
new version. (The old scaled CFI/TLI/RMSEA values were still available
using fitMeasures(), but that didn't matter).

I thought long and hard about what to do here. I also consulted with Vika
Savalei about this. Eventually, we decided to 'bring back' the
old/scaled versions in the summary() output, and to add extra lines for
the 'Robust' CFI/TLI/RMSEA. This was in 0.5-22 (24 Sept 2016). That
seemed to calm things down, and I did not receive angry emails again (at
least not about that).

When I release 0.7 (please don't ask me when), there will be a number of
significant changes in the summary() output. That might be a good time
to get rid of those 'old/scaled' versions once and for all.

But note that these 'correct' robust CFI/TLI/RMSEA statistics do not yet
exist for the MLMV estimator, let alone for categorical data (despite
many efforts). And we happily keep on using the old ones in those
settings. And as far as I can tell, Mplus still reports the 'old/scaled'
versions only even when estimator = MLM.

Yves.

Shu Fai Cheung

Jan 28, 2022, 8:22:10 AM
to lavaan
A million thanks for the detailed explanation! And thanks for developing lavaan and keeping it up to date for us. I can't wait for the next major release! :)

-- Shu Fai

Edward Rigdon

Jan 28, 2022, 8:53:16 AM
to lav...@googlegroups.com
THANK YOU Thank you!!
This answer alone is worth the price of admission.
--Ed Rigdon


Tobias Krieger

Dec 21, 2022, 5:34:26 AM
to lavaan
Dear all,

In the comment from 28.01.2022 Yves wrote: "but note that these correct robust CFI/TLI/RMSEA statistics do not yet exist for the MLMV estimator". But there is an article by Victoria Savalei (2018), "On the computation of the RMSEA and CFI from the mean-and-variance corrected test statistic with nonnormal data in SEM," where she shows how to compute the robust CFI and RMSEA for MLMV from lavaan output.
Is this computation wrong? Can't we trust those robust results for CFI and RMSEA?

Thank you and best regards,
Tobias

Yves Rosseel

Dec 23, 2022, 11:46:14 AM
to lav...@googlegroups.com
Hello Tobias,

You are quite right. In fact, the robust RMSEA/CFI for MLMV was even on
my todo list around 2018, but it was then somehow forgotten, so I never
actually implemented it. I have now done so in dev version 0.6-13 (on GitHub)!

Note that this 'correction' is only valid for continuous data, and does
not apply to the categorical case.

Yves.

Tobias Krieger

Dec 28, 2022, 5:49:16 AM
to lavaan
Dear Yves,

Thank you very much for your fast answer and the quick implementation!

I have three additional questions:
1. Do you mean it is only valid for real continuous data, or also for categorical data with, for example, five categories? I know this is a long-running debate, and much research has discussed how many categories are enough to justify maximum likelihood estimators originally developed for continuous data. So would you say it is possible and correct to use MLMV with the robust correction from Savalei (2018) also in the case of, say, data with 5 categories? (Maydeu-Olivares (2017), "Maximum likelihood estimation of structural equation models for continuous data: Standard errors and goodness of fit," recommends MLMV as the best method, using categorical data with 5 categories in his example.)
Or is it better, besides WLSMV, to then use MLM or MLR for correct robust estimation of RMSEA/CFI/standard errors? What would you recommend?

2. I found an older post from you in the lavaan group about chi-square difference tests, saying, if I understood correctly, that the chi-square difference tests in lavaan automatically use the correct method. So lavaan could, for example, compute the correct chi-square difference test for data analysed with WLSMV, MLMV, MLM, or MLR. In papers published after your post, I found authors saying that there is not yet a chi-square difference test for MLMV in lavaan. So my question is: is the automatic chi-square difference test in lavaan always correct, using the right method according to the estimation method? Or are there some estimation methods (for example MLMV) for which the chi-square difference test produces wrong output?

3. Do you have an answer to my question in the post "Shift parameter of scaled and shifted estimators" about the meaning of the (negative) shift parameters from MLMV and WLSMV?

Thank you very much and best regards,
Tobias

Yves Rosseel

Dec 30, 2022, 11:10:33 AM
to lav...@googlegroups.com
On 12/28/22 11:49, Tobias Krieger wrote:
> I have got three additional questions:
> 1. Do you mean it is only valid for _real continuous data_ or also in
> the case of categorical data

The Savalei (2018) method is indeed for real continuous data only. If you
have categorical/ordinal data, and you use ordered= in combination with
DWLS ('WLSMV') or a related LS method, then we need yet another approach
(as per Savalei, 2021), which has also been included in lavaan (in the
GitHub version) just a few days ago. At least for simple models
(categorical data only, conditional.x = FALSE).

If you have, say, 5-point Likert scales, you may also consider using
estimator = "MLR". In that case, the 'Robust' RMSEA/CFI is the one you
should report.
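For example (a sketch that reuses the tutorial's CFA model as a stand-in for Likert-type data; substitute your own model):

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLR")

# with MLR, report the 'robust' (not the 'scaled') versions
fitMeasures(fit, c("rmsea.robust", "cfi.robust", "tli.robust"))
```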

> 2. I found in the lavaan group an older post from you about the
> chi-square difference tests, saying, if I understood it correctly, that
> the chi-square difference tests in lavaan are automatically using the
> correct estimation method. So lavaan could, for example, estimate the
> correct chi-square difference test in the case of data that was
> calculated by WLSMV, MLMV, MLM, MLR.

Indeed. There are 4 options: 'standard' LRT, satorra.bentler.2001,
satorra.bentler.2010 and satorra.2000.

If you use a standard (unscaled) estimator (ML, GLS, WLS), the standard
chi-square difference test is used.

If a 'scaled' test statistic is used (estimators MLM, MLR, WLSM, ULSM;
implying test = "satorra.bentler" or "yuan.bentler.mplus"), you get the
satorra.bentler.2001 method (by default). An alternative for these
estimators is the satorra.bentler.2010 method, which avoids negative values
in special cases.

For all *MV (scaled + shifted) methods (WLSMV, MLMV ...) we fall back to
the original satorra.2000 method.
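To illustrate (a sketch with two nested models fitted with MLM; lavTestLRT() selects the appropriate difference-test method automatically from the way the models were fitted):

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit1 <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")
# nested model: factor covariances constrained to zero
fit0 <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM",
            orthogonal = TRUE)

# scaled chi-square difference test (satorra.bentler.2001 by default here)
lavTestLRT(fit0, fit1)
```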

> your post I found authors saying that there is not yet a chi-square
> difference test for MLMV in lavaan. So my question is: Is the automatic
> chi-square test in lavaan always correct

To the best of my knowledge: yes.

> 3. Do you have an answer to my question in this post Shift parameter of
> scaled and shifted estimators

Will post there.

Yves.

Tobias Krieger

Jan 2, 2023, 12:13:31 PM
to lavaan
Dear Yves,

Happy New Year!

Thank you again very much for your fast and detailed answer, which is very helpful!

Coming back to two points: 

The implementation of the Savalei 2021 method for WLSMV.
Do I understand correctly that conditional.x = FALSE means that models are currently not allowed to have exogenous categorical covariates for the Savalei 2021 method for WLSMV? I tried to understand what conditional.x does in the model from the earlier "conditional.x" post.

You mentioned that for 5-point Likert scales the MLR estimator should be considered, as it produces a correct robust RMSEA/CFI.
If we look at the case where we have complete and non-normal data:
1. Did I understand you correctly that you generally recommend MLR as the estimator of choice (among the ML estimators), so as to have an estimator that can handle non-normal data on a 5-point Likert scale and that also produces a correct robust RMSEA/CFI?
Having both would not be possible with MLMV, as you mentioned before.
2. But wouldn't it also be possible with MLM to have a correction for non-normality and a correct robust RMSEA/CFI?
3. Are there additional reasons why MLR in particular should be recommended? Is it because it can also handle other situations such as complex data (according to Li, 2021, "Statistical estimation of structural equation models with a mixture of continuous and categorical observed data," p. 2192)?
4. Also in the case of complete data, would I just use the standard settings that are automatically implied by MLR, meaning: test = "yuan.bentler.mplus", se = "robust.huber.white", information = "observed", observed.information = "hessian", h1.information = "structured"? (I saw in one of your older presentations, page 32 <https://users.ugent.be/~yrosseel/lavaan/lavaan2.pdf>, that lavaan would normally use "robust.mlm" in the case of complete data, but this is probably from an older lavaan version, as I couldn't find further information about it.)

5. Is there a paper explaining how the robust RMSEA/CFI are calculated for MLR? I could only find the papers by Brosseau-Liard and Savalei on robust MLM RMSEA/CFI. In a post by Terrence Jorgensen (where he also mentions those papers) I read that in the case of MLM and MLR the same formulas for the correct robust RMSEA/CFI are used, but I couldn't find a specific paper that shows or explains this.
If the same formula is used for MLM and MLR to estimate the correct robust RMSEA/CFI, does that mean that all statements about robust RMSEA/CFI in the case of MLM also apply to MLR? So the only reason why they produce slightly different values is not due to the formulas, but due to the beforehand-corrected data that enters the formulas?

Thank you so much, I highly appreciate your support and explanations!

Best regards,
Tobias

Yves Rosseel

Jan 4, 2023, 5:16:43 AM
to lav...@googlegroups.com
On 1/2/23 18:13, Tobias Krieger wrote:
> *The implementation of the Savalei 2021 method for WLSMV. *
> Do I understand it correctly that conditional.x = FALSE means, that
> models are currently not allowed to have exogenous categorical
> covariates to estimate the Savalei 2021 method for WLSMV?

conditional.x = TRUE (which is the default if you have categorical data
AND exogenous 'x' covariates) means that we first regress out the 'x'
covariates; the main analysis is then based on the *residual*
correlation matrix of the endogenous categorical variables (see Muthén, 1984).

And indeed, conditional.x = TRUE is not yet supported for the
computation of the robust RMSEA/CFI when the data are categorical. In fact,
it only works (for now) if all data are categorical.

> If we look at the case where we have _complete and non-normal _data:
> 1. Did I understand you correctly, that you in general recommend MLR as
> the estimator of choice (in the case of a ML estimator)

No. There is nothing wrong with MLM. MLR has the advantage that it can
be combined with missing = "ml". For both MLM and MLR, there is a
correct 'robust' version of CFI and RMSEA available.
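A quick sketch of that advantage (introducing artificial missingness into the tutorial data, purely for illustration):

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

dat <- HolzingerSwineford1939
dat$x1[1:20] <- NA  # artificial missingness, for illustration only

# MLR can handle the missing values via full-information ML
fit <- cfa(HS.model, data = dat, estimator = "MLR", missing = "ml")
fitMeasures(fit, c("rmsea.robust", "cfi.robust"))
```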

> Having both would not be possible with MLMV as you mentioned before.

MLMV could be used as well (for complete data only). The only difference
is that the test statistic then uses a mean-and-variance (hence 'MV')
adjusted correction (i.e., test = "scaled.shifted").

> 2. But wouldn't it also be possible with MLM to have a correction for
> non-normality and a correct robust RMSEA/CFI?

Sure.

> 3. Are there additional reasons why especially MLR should be recommended?

It can be combined with missing = "ml", that is all.

> 4. Also in the case of complete data, I would just use the standard
> settings that are automatically implied by MLR, meaning: test =
> "yuan.bentler.mplus", se = "robust.huber.white", information =
> "observed", observed.information = "hessian", h1.information =
> "structured", correct?

Yes. See this paper for an overview:

https://www.tandfonline.com/doi/full/10.1080/10705511.2021.1877548

> (I saw in one of your older presentations that
> lavaan would normally use "robust.mlm" in the case of complete data page
> 32 <https://users.ugent.be/~yrosseel/lavaan/lavaan2.pdf>, but this is
> probably from an older lavaan version as I couldn't find further
> information about that)

Indeed. That document is 10 years old. (I only keep it there because
some papers cite this link.) The Savalei & Rosseel paper above is the
better reference. "robust.mlm" is now called "robust.sem", while
"robust.mlr" is now called "robust.huber.white". Both refer to the way
'robust' standard errors are computed.

> 5. Is there a paper explaining how the robust RMSEA/CFI are calculated
> for MLR?

It is the same as for MLM. MLM and MLR differ in the way they compute
the 'scaling factor' that is used to 'scale' the original test
statistic. But once you have this value, the formulas for computing the
robust RMSEA/CFI are the same.

The case of MLMV is different, hence a different paper.

> that means that all statements about robust RMSEA/CFI
> in the case of MLM also apply for MLR?

Yes.

> So the only reason why they
> produce slightly different values is not due to formulas but due to the
> beforehand corrected data that enters the formulas?

Indeed. If the data are complete, the main differences are as follows:

- MLM implies test = "satorra.bentler", using (by default) information =
"expected" and h1.information = "structured"

- MLR implies test = "yuan.bentler.mplus", using (by default)
information = "observed", observed.information = "hessian", and
h1.information = "structured"

The meaning of these options is explained in the Savalei & Rosseel
(2022) paper.
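In lavaan syntax, these shorthands can be made explicit (a sketch; the option combinations below are roughly what estimator = "MLM" and estimator = "MLR" expand to for complete data):

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# roughly what estimator = "MLM" implies (complete data)
fit_mlm <- cfa(HS.model, data = HolzingerSwineford1939,
               test = "satorra.bentler", se = "robust.sem",
               information = "expected", h1.information = "structured")

# roughly what estimator = "MLR" implies (complete data)
fit_mlr <- cfa(HS.model, data = HolzingerSwineford1939,
               test = "yuan.bentler.mplus", se = "robust.huber.white",
               information = "observed", observed.information = "hessian",
               h1.information = "structured")
```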

Yves.

Tobias Krieger

Jan 5, 2023, 1:04:38 PM
to lavaan
Dear Yves,

Thank you so much for taking the time to answer all my questions, this was incredibly helpful to me.

All the best,
Tobias

Tobias Krieger

Mar 5, 2023, 12:51:32 PM
to lavaan

Dear Yves,


Based on our last conversation I have a follow-up question:

You mentioned that all statements about robust RMSEA and CFI for MLM also apply to MLR.

In the paper by Brosseau-Liard & Savalei (2014), "Adjusting incremental fit indices for nonnormality," I read that for the corrected RMSEA and CFI used with MLM, the typical cutoff values from normal ML estimation also apply.

As we discussed, MLM and MLR use the same formulas to calculate RMSEA and CFI and only differ in the scaling beforehand. Does this mean that for MLR it is also true that the usual cutoff values, originally developed for ML estimation, can be used? I searched for a paper but couldn't find one.

And what about the use of those cutoff values when using MLR or MLM with categorical data (e.g. 5 categories)? Is there anything, or any difference between the methods, to be aware of?


I am sorry that this is more a general question than a specific lavaan question, and I know that cutoff values in general are criticized. But hopefully the answer here will also help future researchers using lavaan.


Thank you very much and best regards

Tobias


Grace

Mar 6, 2023, 8:20:59 AM
to lavaan
Hello,
Can anyone help me? I am a beginner in lavaan. I used the WLSMV estimator for my five-point Likert scale. In my results I reported the scaled fit indices instead of the robust ones for the model comparison. I could not use the robust ones because in the threshold model I got NA in the robust column. Is there a problem with my analysis?

Thanks
Grace

