Testing Measurement Invariance with WLSMVS


David Disabato

Jun 20, 2016, 4:03:37 PM
to lavaan
Hi lavaan listserve,

I am testing a multiple group bifactor CFA model with ordered categorical variables and 7 groups and 7600 total cases. I am confused by the robust chi-square test results when using the WLSMVS estimator and polychoric correlations. I have estimated my configural model (with all parameter estimates free to vary across groups) and my weak invariance model (with all factor loadings fixed to the same values across groups, but all other parameter estimates free to vary across groups).
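A configural and a weak (metric) invariance model of this kind can be set up in lavaan roughly as follows. This is only a sketch: the model syntax, data frame (mydata), grouping variable (country), and indicator names (y1-y6) are placeholders for illustration, not the actual setup from this post.

```r
library(lavaan)

# Hypothetical bifactor structure: one general factor plus two
# specific factors over placeholder ordinal indicators y1-y6
model <- '
  g  =~ y1 + y2 + y3 + y4 + y5 + y6
  s1 =~ y1 + y2 + y3
  s2 =~ y4 + y5 + y6
'

# Configural model: same structure in every group, all parameters free
fit.config <- cfa(model, data = mydata, group = "country",
                  ordered = paste0("y", 1:6),
                  orthogonal = TRUE,       # typical for bifactor models
                  estimator = "WLSMVS")

# Weak (metric) invariance: factor loadings constrained equal across groups
fit.weak <- cfa(model, data = mydata, group = "country",
                group.equal = "loadings",
                ordered = paste0("y", 1:6),
                orthogonal = TRUE,
                estimator = "WLSMVS")
```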

The non-robust and robust chi-square values lead to different statistical inference when testing for measurement invariance. The non-robust chi-square value in the weak invariance model is significantly greater than (by a scaled chi-square difference test) that of the configural model. Usually this would lead me to reject the weak invariance model and search for partial invariance. However, the robust chi-square value in the weak invariance model is LESS than that of the configural model. Usually this would lead me to accept the weak invariance model and test for strong invariance.

Which set of chi-square values do I compare for statistical inference:  the non-robust or robust ones?

I am not sure if this is something unique to the WLSMVS estimator in lavaan or not. I have never heard of a constrained model having a smaller chi-square value than its associated freed model.

Any advice is appreciated,
David

Terrence Jorgensen

Jun 22, 2016, 6:02:13 AM
to lavaan
Which set of chi-square values do I compare for statistical inference:  the non-robust or robust ones?

Typically you use the robust test statistic, because its Type I error rates are not inflated the way those of the non-robust statistic are.

I am not sure if this is something unique to the WLSMVS estimator in lavaan or not. I have never heard of a constrained model having a smaller chi-square value than its associated freed model.

You don't directly compare the scaled chi-squared statistics.  Instead, you calculate the difference between the unadjusted chi-squared statistics of the two models, then scale that (delta-)chi-squared.  Either the anova() method or lavTestLRT() does this for you automatically.  In some cases, you can get a negative result (which doesn't make sense), so as the ?lavTestLRT help page states, you can set method = "satorra.bentler.2010" to ensure a positive statistic.
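In lavaan that comparison looks something like the following, assuming fit.config and fit.weak are nested lavaan fits (those object names are placeholders) estimated with a scaled test statistic:

```r
# Scaled (delta-)chi-squared difference test between the nested models;
# anova() dispatches to lavTestLRT() under the hood
anova(fit.config, fit.weak)

# If the scaled difference comes out negative, request the
# Satorra-Bentler (2010) version, which is guaranteed positive
lavTestLRT(fit.config, fit.weak, method = "satorra.bentler.2010")
```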

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

David Disabato

Jun 22, 2016, 8:52:26 AM
to lavaan
Thank you, Terrence. Let me be a little more specific:

Here are the chi-square values from my configural and weak invariance models.

                       non-robust    robust    scale shift
configural (free)         1459.70   1727.73           0.85
weak invariance           2789.58    976.25           2.86
*The robust chi-square values suggest the weak invariance model fits BETTER than the configural model.

And here are the results of the scaled chi-square difference test.

                        statistic   p.value
satorra.bentler.2001       193.62      .004
satorra.bentler.2010       189.69      .006
*The scaled chi-square difference test suggests the weak invariance model fits WORSE than the configural model.

As you can see, the statistical inference is opposite. I noticed you said you "typically use the robust estimator," so then should I ignore the results of the scaled chi-square difference test and simply compare the two robust chi-square values qualitatively?

Terrence Jorgensen

Jun 22, 2016, 9:36:35 AM
to lavaan
*The robust chi-square values suggest the weak invariance model fits BETTER than the configural model.

No, they don't.  The robust/scaled chi-squared test statistic is scaled using an estimated scaling constant that is calculated for that particular model.  So unless the scaling constants happen to be identical between the two models, you cannot compare the scaled chi-squared statistics, because they are not on the same metric.  The robust correction merely transforms the statistic so that it follows a chi-squared distribution, which makes the estimated p value yield approximately nominal Type I error rates.
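For the Satorra-Bentler (2001) version of the difference test, the test gets its own scaling constant, a weighted combination of the two models' constants, which is why the per-model scaled statistics themselves are not on a common metric. A minimal sketch with made-up numbers (not the values posted in this thread; WLSMV's mean-and-variance-adjusted test uses a somewhat different correction, which lavTestLRT handles internally):

```r
# Hypothetical chi-squares, dfs, and scaling factors for two nested models
T0 <- 1450; df0 <- 700; c0 <- 0.85   # less constrained (configural)
T1 <- 1650; df1 <- 820; c1 <- 1.10   # more constrained (weak invariance)

# Satorra-Bentler (2001) scaling constant for the DIFFERENCE test:
# a df-weighted combination of the two models' scaling constants
cd <- (df1 * c1 - df0 * c0) / (df1 - df0)

# Scaled difference statistic, referred to a chi-square on df1 - df0 df
Td <- (T1 - T0) / cd
p  <- pchisq(Td, df = df1 - df0, lower.tail = FALSE)
```

Note that cd depends on both models jointly, so it generally equals neither c0 nor c1; that is also why the scaled difference is not simply the difference of the two scaled statistics.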


I noticed you said you "typically use the robust estimator," so then should I ignore the results of the scaled chi-square difference test and simply compare the two robust chi-square values qualitatively?

No, you just need to pay attention to the chi-squared difference test. That is the only valid comparison, because it corrects the one statistic that is of interest (the change in chi-squared).  And it is the only result that makes sense, because by definition fit gets worse when you fit a more constrained model.

David Disabato

Jun 22, 2016, 10:30:41 AM
to lavaan
Ahh that makes sense. Thank you for taking the time to explain that to me. I will read the article you posted - looks like a good one.

Bojana Dinic

Apr 5, 2018, 2:26:10 PM
to lavaan
Dear Terrence,
I have the same dilemma (with the WLSMV estimator): the scaled fit indices do not worsen as I move through the invariance levels (configural, metric, scalar, etc.), while the unscaled ones worsen as they should. For example, the scaled CFI is .908 in the configural model but .924 in the metric model, which is strange. Based on your comments, I assumed this is expected for the scaled/robust chi-square, but is it also OK for other scaled fit indices? In that case, should we report both scaled and unscaled fit indices?
Also, I am not sure whether scaled fit indices other than chi-square (e.g., scaled ΔCFI) can be compared directly the way unscaled ones are (e.g., ΔCFI, checking whether the difference exceeds 0.01), so I just want to check with you: which fit indices (CFI, RMSEA) would you look at in this situation, scaled or unscaled?
Thank you for your time.

Best regards,
Bojana

Terrence Jorgensen

Apr 7, 2018, 12:29:37 PM
to lavaan
I assumed this is expected for the scaled/robust chi-square, but is it also OK for other scaled fit indices?

This issue has nothing to do with robust/scaled statistics.  CFI can increase after imposing constraints whenever the increase in chi-squared (scaled or otherwise) is less than the increase in degrees of freedom.  
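A quick numeric illustration of that point, using the standard CFI formula with made-up values (the baseline-model numbers below are hypothetical):

```r
# CFI = 1 - max(chisq_M - df_M, 0) / max(chisq_B - df_B, chisq_M - df_M, 0)
Tb <- 20000; dfb <- 500                  # hypothetical baseline (null) model
cfi <- function(chisq, df) {
  1 - max(chisq - df, 0) / max(Tb - dfb, chisq - df, 0)
}

cfi(1450, 700)   # configural: chisq - df = 750
cfi(1500, 820)   # metric: chisq rose by only 50 while df rose by 120,
                 # so chisq - df = 680 and CFI goes UP despite the
                 # constrained model having the larger chi-square
```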

In this case should we report both scaled and unscaled fit indices?

I don't see why not... But I would not base decisions on fit indices anyway.  They are NOT test statistics, nor are they informative as measures of effect size (of model misfit).