How problematic is it to use WLSMV for continuous

Nikola Ćirović

Jun 22, 2019, 3:02:15 PM
to lavaan
Dear All,

I am struggling to choose an appropriate estimator for both single- and multi-group CFA with items that have fewer than five response options (four).

I know that the recommended procedure is to treat these variables as ordered categorical and to use WLSMV as the estimator.

However, fit improves dramatically (all models fit unrealistically well) when I use WLSMV and also declare the variables as ordered, and there are studies showing that conventional cut-off criteria should not be applied blindly in that situation (Xia & Yang, 2018). I see that simulation studies compare MLR and WLSMV for ordered categorical data, but lavaan does not allow MLR to be applied once the variables are declared as ordered categorical.

Therefore, my question is, in short: what do you think about using WLSMV but still estimating intercepts (in measurement invariance testing), i.e., treating the variables as continuous? Is the number of response options enough of a justification for using this estimator without declaring the variables as ordinal in lavaan?

My current strategy is to use both estimators and document the differences for the single-group analysis (as recommended by Sass, Schmitt, & Marsh, 2014).

Terrence Jorgensen

Jun 23, 2019, 2:34:04 PM
to lavaan
These questions are not about lavaan, so they are more appropriate to post on the general SEM forum SEMNET:


The choice between MLR and WLSMV is not about fit, but about whether the variables approximate continuous variables closely enough for your results to be unbiased.  In the studies you mention, the variables are treated as continuous when using MLR, but as ordered when using WLSMV.  The research shows biased results when variables with < 5 categories are treated as continuous (i.e., correlations that are underestimated relative to the polychoric correlations, which in turn leads to underestimated factor loadings), so you are better off using WLSMV.  Using MLR would mean you have less power to detect violations of invariance, because you would be testing equivalence of attenuated parameters.
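
The attenuation described above can be illustrated with a small simulation. This is only a sketch in plain Python, not lavaan code: the latent correlation (0.6), the sample size, and the three thresholds that produce four response categories are all assumed values chosen for illustration, and none of them come from the studies mentioned in this thread. It compares the correlation between two latent normal responses with the Pearson correlation computed after both are cut into four ordered categories and treated as if they were continuous.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def categorize(value, thresholds):
    """Map a latent response to an ordered category 0..len(thresholds)."""
    return sum(value > t for t in thresholds)


def simulate(rho=0.6, n=20000, thresholds=(-0.5, 0.5, 1.5), seed=1):
    """All defaults are illustrative assumptions, not values from the thread."""
    rng = random.Random(seed)
    latent_x, latent_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        latent_x.append(z1)
        # Construct a second standard normal that correlates rho with the first.
        latent_y.append(rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
    # Observed 4-category items, scored 0-3 and treated as continuous.
    obs_x = [categorize(v, thresholds) for v in latent_x]
    obs_y = [categorize(v, thresholds) for v in latent_y]
    return pearson(latent_x, latent_y), pearson(obs_x, obs_y)


r_latent, r_observed = simulate()
print(f"latent correlation:            {r_latent:.3f}")
print(f"Pearson on 4-category scores:  {r_observed:.3f}")
```

Under these assumptions, the correlation computed on the categorized scores should come out smaller than the latent one. That is the kind of attenuation that carries over into factor loadings when the items are analyzed as continuous.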

Fit indices are uninformative anyway, unless you use a resampling technique to account for their sampling distribution, e.g., permutation:


FYI, you can use a less restrictive sequence of invariance tests by testing thresholds first, then loadings and intercepts.  The semTools package has a function measEq.syntax() that automates writing the syntax to keep all the identification/equality constraints straight (see the help-page examples).
 
Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Nikola Ćirović

Jun 28, 2019, 8:25:57 PM
to lavaan
Dear Mr. Jorgensen,

Thank you for your reply.
You cleared a lot of my confusion.
I posted it here because I am only interested in advice applicable in lavaan since it is the only software I can use (I am thankful for that too).

Also, I saw in web notes by the Muthéns that WLSMV was developed exclusively for categorical data, so I was confused that I can use it in lavaan on interval data while the reverse doesn't apply to MLR. (I was confused even more by my wrong assumption that MLR in the studies I mentioned was used with data specified as categorical, which you made clear is not the case.)

I guess I had the silly idea of disentangling the categorical nature of the data from the problem of the number of categories, and hoped to treat the data as interval and still use WLSMV to reduce the bias due to having only 4 response categories.

I can see that the cut-offs for WLSMV on the categorical data didn't discriminate much between alternative models (all fit excellently), whereas with WLSMV on the data treated as interval some models fit badly, some adequately, and some well. So I assumed the criteria were too optimistic in the former case and thought my conclusions would be better based on the interval treatment. But this is anecdotal and silly.
I will try to grasp your paper on this problem and see if I am able to use it. Thank you for sharing!