SEM with mix of categorical and continuous data?

Marie Jung

Jun 20, 2024, 2:20:26 PM

to lavaan

Hello all, I’m hoping to do a path analysis with several latent variables. The dataset is a mix of categorical and continuous variables, and the outcome is a binary categorical variable (yes or no on substance use initiation). My questions:

Should I be scaling all of the variables, or just the continuous ones?
In lavaan, is it sufficient to input the categorical variables into the “ordered” command, or do I need to prepare them another way?
It seems from my reading that WLSM is the best estimator/method, but when I call it for my final model fit, it raises a warning that I should not use it for continuous data.

Any insights would be helpful. Take care!

Terrence Jorgensen

Jun 20, 2024, 5:09:09 PM

to lavaan

> Should I be scaling all of the variables, or just the continuous ones?

What do you mean by "scaling"?

> In lavaan, is it sufficient to input the categorical variables into the “ordered” command, or do I need to prepare them another way?

https://lavaan.ugent.be/tutorial/cat.html

> It seems from my reading that WLSM is the best estimator/method, but when I call it for my final model fit, it raises a warning that I should not use it for continuous data.

For categorical data, the estimator is DWLS (diagonally weighted least squares). The M(V) suffix is just a shortcut for indicating the type of test statistic you want. When you specify any variables as ordered=, lavaan selects DWLS automatically, so there is no need to set it manually, unless you prefer unweighted (ULS) or pairwise maximum likelihood (PML) instead. The default test= will be "scaled.shifted", which typically performs better than alternative adjustments.
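
To make the ordered= behavior concrete, here is a minimal sketch rather than a recommended model. The data are simulated, and the variable names (x1 to x3, use) and the factor f are hypothetical. It only shows that declaring the binary outcome in ordered= is enough for lavaan to pick DWLS with an adjusted test statistic, without an estimator= argument.

```r
library(lavaan)

set.seed(1)
n <- 500
# Hypothetical simulated data: one latent factor measured by three
# continuous indicators, plus a binary outcome driven by that factor.
eta <- rnorm(n)
dat <- data.frame(
  x1  = eta + rnorm(n),
  x2  = 0.8 * eta + rnorm(n),
  x3  = 0.7 * eta + rnorm(n),
  use = as.integer(0.6 * eta + rnorm(n) > 0)  # 0 = no, 1 = yes
)

model <- '
  f   =~ x1 + x2 + x3   # latent variable with continuous indicators
  use ~ f               # binary outcome regressed on the latent variable
'

# Only the categorical variable is listed in ordered=; the indicators
# stay continuous. No estimator= is given, so lavaan chooses it.
fit <- sem(model, data = dat, ordered = "use")

opts <- lavInspect(fit, "options")
opts$estimator   # estimator lavaan actually used
opts$test        # test statistic adjustment(s) in effect

summary(fit, fit.measures = TRUE)
```

Inspecting the fitted options is a quick way to confirm what lavaan selected before interpreting the fit measures.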

Terrence D. Jorgensen (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen

Marie Jung

Jun 20, 2024, 6:18:45 PM

to lavaan

Thank you Dr. Jorgensen,

I appreciate the feedback. By scaling, I meant z-scoring everything, regardless of type of variable, using the scale() function. Should I perform this on my entire dataset except for (or maybe including) the outcome variable?
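
For concreteness, z-scoring only the continuous columns with scale() might look like the sketch below. The data frame and column names are hypothetical, and leaving the binary outcome untouched is just one of the options being asked about, not a settled answer.

```r
# Hypothetical data frame; column names are illustrative only.
dat <- data.frame(
  age   = c(14, 15, 13, 16, 15),
  score = c(2.5, 3.1, 4.0, 1.8, 3.6),
  use   = c(0L, 1L, 0L, 1L, 1L)   # binary outcome, left as is here
)

continuous <- c("age", "score")

dat_z <- dat
# scale() centers each column at 0 and divides by its standard
# deviation; it returns a matrix, so convert back to a data frame.
dat_z[continuous] <- as.data.frame(scale(dat[continuous]))

colMeans(dat_z[continuous])     # column means after z-scoring
sapply(dat_z[continuous], sd)   # column standard deviations after z-scoring
```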

So to recap:

Dummy code all categorical dependent variables, leave the continuous data as is, and use DWLS as the estimator?
