Measurement invariance using WLSMV: lower chi-square in a more restrictive model (DWLS)?


Luka Komidar

Jun 6, 2019, 6:28:41 AM
to lavaan
Hi everyone,

I'm conducting a simple measurement invariance analysis across gender for a 6-factor model. The data are ordinal (a 3-point response scale), so I'm using WLSMV. I understand that the robust (WLSMV) chi-square can be lower in a more restrictive model (e.g. a lower chi^2 in the scalar model than in the metric model) due to scaling adjustments. However, in my case, the DWLS chi^2 itself is lower in the scalar model than in the metric model. This of course leads to a negative chi^2 difference and problems when comparing the nested models. Does anyone have any idea how that is possible? Also, Mplus returns very different results (e.g. lavaan suggested that metric invariance holds, while Mplus gave the opposite result) - I've pasted the two outputs from lavaan and Mplus below.

lavaan MI testing

(I'm posting the results of LRT with the default method, since I just want to present the strangely behaving (DWLS) chi^2)
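The comparisons below were produced with calls like these (the fit object names are placeholders for my fitted lavaan models):

```r
library(lavaan)

## Nested-model comparisons; for WLSMV fits, lavTestLRT()
## defaults to the "satorra.2000" scaled difference test
lavTestLRT(fit.conf, fit.metric)    # metric vs. configural
lavTestLRT(fit.metric, fit.scalar)  # scalar vs. metric
```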

METRIC vs. CONFIGURAL

Scaled Chi Square Difference Test (method = "satorra.2000")

               Df AIC BIC  Chisq Chisq diff Df diff Pr(>Chisq)
child_conf   2038         3655.8
child_metric 2079         3994.7     48.835      41     0.1872

SCALAR vs. METRIC

Scaled Chi Square Difference Test (method = "satorra.2000")

               Df AIC BIC  Chisq Chisq diff Df diff Pr(>Chisq)
child_metric 2079         3994.7
child_scalar 2120         3856.9     -139.9      41          1

Mplus

Invariance Testing

                   Number of                   Degrees of
     Model        Parameters      Chi-Square    Freedom     P-Value

     Configural          402        3905.570      2038       0.0000
     Metric              361        3956.441      2079       0.0000
     Scalar              275        3892.857      2165       0.0000

                                               Degrees of
     Models Compared              Chi-Square    Freedom     P-Value

     Metric against Configural        91.933        41       0.0000
     Scalar against Configural       172.653       127       0.0044
     Scalar against Metric           104.125        86       0.0893

Terrence Jorgensen

Jun 6, 2019, 7:37:40 AM
to lavaan
negative chi^2 difference and problems when comparing the nested models. Does anyone have any idea how that is possible?

Judging from your change in df between models, I'm guessing your models are not nested.  You didn't provide all your syntax, but I'm guessing you made the common mistake of constraining loadings, then constraining thresholds.  With 3-category indicators, your 2 thresholds are only sufficient to identify the indicator's latent intercept and (residual) variance, so your configural model should simply constrain thresholds to equality and estimate loadings and intercepts.  Then you constrain loadings, then intercepts, as you would with continuous data.  See the help page:

?measEq.syntax


And find details in the Wu & Estabrook paper in the References.
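For example, a minimal sketch of that sequence (the one-factor model, item names, data frame "dat", and grouping variable "gender" are all hypothetical placeholders):

```r
library(lavaan)
library(semTools)

## Hypothetical model with 3-category ordinal indicators
mod <- ' f1 =~ x1 + x2 + x3 + x4 '
items <- paste0("x", 1:4)

## Baseline: thresholds constrained to equality. With only 2 thresholds
## per item, this model fits identically to the configural model
## (see Wu & Estabrook, 2016), so it serves as the starting point.
syntax.thr <- measEq.syntax(configural.model = mod, data = dat,
                            ordered = items,
                            ID.cat = "Wu.Estabrook.2016",
                            group = "gender",
                            group.equal = "thresholds")
fit.thr <- cfa(as.character(syntax.thr), data = dat, ordered = items,
               group = "gender", estimator = "WLSMV")

## Next step: additionally constrain loadings (then intercepts, etc.)
syntax.load <- measEq.syntax(configural.model = mod, data = dat,
                             ordered = items,
                             ID.cat = "Wu.Estabrook.2016",
                             group = "gender",
                             group.equal = c("thresholds", "loadings"))
fit.load <- cfa(as.character(syntax.load), data = dat, ordered = items,
                group = "gender", estimator = "WLSMV")

## Compare the properly nested models
lavTestLRT(fit.thr, fit.load)
```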

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Luka Komidar

Jun 6, 2019, 7:49:28 AM
to lavaan
Thanks for your answer! I'll check the measEq.syntax help page.

Judging from your change in df between models, I'm guessing your models are not nested.

I've used the group.equal argument in lavaan's cfa() function, i.e. group.equal = c("loadings") for the metric model and group.equal = c("loadings", "thresholds") for the scalar model. If I understand correctly, one should not rely on this argument when using WLSMV on 3-category data? Btw, in Mplus I also used the shorthand syntax MODEL = CONFIGURAL METRIC SCALAR to obtain the results I pasted in the first post (again with WLSMV, with the items declared as categorical).
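Concretely, the calls looked like this ("mod", "dat", and "items" are placeholders for my model syntax, data frame, and vector of ordinal item names):

```r
library(lavaan)

## The sequence of models I compared
fit.conf   <- cfa(mod, data = dat, group = "gender", ordered = items,
                  estimator = "WLSMV")
fit.metric <- cfa(mod, data = dat, group = "gender", ordered = items,
                  estimator = "WLSMV", group.equal = "loadings")
fit.scalar <- cfa(mod, data = dat, group = "gender", ordered = items,
                  estimator = "WLSMV",
                  group.equal = c("loadings", "thresholds"))
```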

Terrence Jorgensen

Jun 6, 2019, 8:08:35 AM
to lavaan
in Mplus, I've also used the shortened syntax MODEL = CONFIGURAL METRIC SCALAR to obtain the results I've pasted in the first post (and also used WLSMV on items defined as categorical).

Yes, your results are consistent with how they recommend testing invariance, which is why that sequence of models is so popular. Even they recommend simultaneously constraining loadings and thresholds, though, and Mplus doesn't even provide the option of freeing intercepts of the latent item responses. The Wu & Estabrook article is the first to thoroughly illuminate a very confusing set of identification issues, about which different programmers (Mplus, LISREL) and other methodologists (Roger Millsap) have provided radically different, even contradictory, advice, all of which rests on major assumptions that users remain unaware of. I designed that function to simplify the issues a bit (well, at least automate the complex choices). Ultimately, it should provide a less restrictive set of tests than any other standard advice, but there is yet to be a simulation demonstrating that. I hope it helps in your case. If you do follow Wu & Estabrook's advice (using the default arguments), be prepared to cite their choices and defend them to a reviewer who thinks Muthén can do no wrong ;-)

Luka Komidar

Jun 7, 2019, 4:12:25 AM
to lavaan
Thanks for a very illuminating answer, I've already started reading the Wu & Estabrook paper!

cheers,
Luka