WLSMV and ordered categorical variable

Shu Fai Cheung (張樹輝)

Oct 26, 2023, 10:36:05 PM
to lavaan
Hi All,

I rarely use the value "WLSMV" directly to handle ordered categorical variables. I just use the argument "ordered" and let lavaan set the other options, including the estimator.

In lavaan, setting estimator to "WLSMV" actually sets the estimator to "DWLS" (the same method used in Mplus) and sets some other options, such as robust standard errors and a scaled-and-shifted test statistic.
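
One way to see what the "WLSMV" shortcut resolves to is to inspect the options stored in the fitted object. This is only a sketch: it assumes the built-in HolzingerSwineford1939 data and the three-factor model used elsewhere in this thread, and the exact values printed for `se` and `test` may differ across lavaan versions.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "WLSMV")

# Pull the resolved options out of the fitted object and show the three
# that the "WLSMV" shortcut is expected to control.
opts <- lavInspect(fit, "options")
opts[c("estimator", "se", "test")]
```

The `estimator` element should match the "DWLS" label shown in the printed summary of the fit.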


As far as I know, in Mplus, WLSMV is used along with the option CATEGORICAL (at least in the examples I found in the manual). I did a quick test and confirmed that, if I fit a model with WLSMV and do not specify any variable as ordered categorical, a warning is raised:

*** WARNING in ANALYSIS command
  Estimator WLSMV is not available for analysis with all continuous variables.
  Default estimator will be used.
   1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

This confirmed that, in Mplus, WLSMV is supposed to be used when some variables are declared ordered categorical, and is not supposed to be used when all variables are treated as continuous.

However, this is not the case for lavaan. I can set estimator to "WLSMV" even if the model has no categorical variables:

``` r
library(lavaan)
#> This is lavaan 0.6-17.1897
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "WLSMV")
fit
#> lavaan 0.6.17.1897 ended normally after 44 iterations
#>
#>   Estimator                                       DWLS
#>   Optimization method                           NLMINB
#>   Number of model parameters                        21
#>
#>   Number of observations                           301
#>
#> Model Test User Model:
#>                                               Standard      Scaled
#>   Test Statistic                                43.902      79.342
#>   Degrees of freedom                                24          24
#>   P-value (Chi-square)                           0.008       0.000
#>   Scaling correction factor                                  0.598
#>   Shift parameter                                            5.867
#>     simple second-order correction
lavInspect(fit, "ordered")
#> character(0)
```

<sup>Created on 2023-10-27 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

I raise this because I have noticed that some users set estimator to "WLSMV" to handle ordered categorical variables. They are doing the right thing, but only for the estimator. They do not use "ordered" to specify which variables are ordered categorical, so lavaan faithfully does what it is told and uses DWLS (as in Mplus). They may not know that, without "ordered", ordered categorical variables are not treated as such in lavaan.
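
A small comparison may make the difference concrete. The sketch below is an illustration only: it cuts the continuous HolzingerSwineford1939 scores into three categories to mimic ordinal items (an assumption made for this example, not something from real data) and then fits the same model with and without "ordered".

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# Illustration only: recode each continuous score into integer codes 1-3
# (tertiles) so that the items look like 3-point ordinal responses.
dat <- HolzingerSwineford1939
items <- paste0("x", 1:9)
dat[items] <- lapply(dat[items], function(x) {
  cut(x, breaks = quantile(x, c(0, 1/3, 2/3, 1)),
      include.lowest = TRUE, labels = FALSE)
})

# Estimator requested, but variables not declared: items stay numeric.
fit_num <- cfa(HS.model, data = dat, estimator = "WLSMV")

# Variables declared as ordered: lavaan picks the estimator itself.
fit_ord <- cfa(HS.model, data = dat, ordered = items)

lavInspect(fit_num, "ordered")  # which variables were treated as ordered?
lavInspect(fit_ord, "ordered")
```

Checking `lavInspect(fit, "ordered")` after fitting is a quick way to confirm whether the items were actually treated as ordered categorical.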

Is this an intended behavior of lavaan?

Users of Mplus may not notice this difference, and believe incorrectly that setting the estimator to "WLSMV" is enough to handle ordered categorical variables.

-- Shu Fai

Terrence Jorgensen

Nov 1, 2023, 11:02:22 AM
to lavaan
In principle, DWLS can be applied to models with any combination of continuous and ordinal outcomes, including exclusively one or the other type. With all continuous variables, it is not an efficient estimator relative to (robust) ML, but it is not technically incorrect. However, I agree that it might be prudent to add a warning when WLSMV is requested yet no ordered variables are detected (either via the ordered= argument or because they already have class c("ordered", "factor") in the data= argument).

https://github.com/yrosseel/lavaan/issues/303
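
Until such a warning exists in lavaan itself, a user-side check along these lines could serve a similar purpose. This is only a sketch: `warn_if_no_ordered()` is a hypothetical helper written for this example, not a lavaan function, and it does not reflect how the issue above might eventually be resolved.

```r
library(lavaan)

# Hypothetical helper (not part of lavaan): warn when a DWLS-based fit
# contains no variables that lavaan treated as ordered.
warn_if_no_ordered <- function(fit) {
  estimator <- lavInspect(fit, "options")$estimator
  ordered_vars <- lavInspect(fit, "ordered")
  if (estimator == "DWLS" && length(ordered_vars) == 0L) {
    warning("DWLS was used, but no variables were treated as ordered.")
  }
  invisible(ordered_vars)
}

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# All indicators are continuous here, so the helper should warn.
fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "WLSMV")
warn_if_no_ordered(fit)
```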

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Isabel B.

Nov 1, 2023, 1:53:27 PM
to lavaan
Hello, 

This is interesting to consider! If I am working with a CFA model that has a mixture of categorical and continuous variables, would this mean that my syntax should be the following? 

Model.fit <- lavaan::sem(Model, data = name, ordered = TRUE, estimator="WLSMV", missing="pairwise")

Or, should I only be specifying which variable is ordered in the model (i.e., the 1 categorical variable) with the following syntax? 

Model.fit <- lavaan::sem(Model, data = name, ordered = c("categorical variable"), estimator="WLSMV", missing="pairwise")

Best, 
Isabel

Terrence Jorgensen

Nov 2, 2023, 3:04:32 AM
to lavaan
Indicating at least one ordered= endogenous variable will always trigger lavaan to set the DWLS estimator with a scaled/shifted (“MV”) test. No need to explicitly set the estimator, unless you prefer ULS or PML instead.
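
For the mixed case, a minimal sketch might look like the following. It assumes the HolzingerSwineford1939 data with x1 artificially cut into three categories (purely for illustration) while x2 to x9 stay continuous; only x1 is named in ordered=, and no estimator is specified.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# Illustration only: turn x1 into a 3-category item (integer codes 1-3);
# the remaining indicators are left continuous.
dat <- HolzingerSwineford1939
dat$x1 <- cut(dat$x1, breaks = quantile(dat$x1, c(0, 1/3, 2/3, 1)),
              include.lowest = TRUE, labels = FALSE)

# Declare only the categorical indicator; let lavaan choose the estimator.
fit_mixed <- cfa(HS.model, data = dat, ordered = "x1")

lavInspect(fit_mixed, "ordered")            # variables treated as ordered
lavInspect(fit_mixed, "options")$estimator  # estimator lavaan selected
```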