Hi,
I have a dataset containing ordinal variables, which was shared with me when I attended on Mplus course recently. I know what the answers should be in Mplus, but I cannot get the same result using lavaan.
The ordinal variables are: TEAMWORK1, TEAMWORK2, SOCSUP1, SOCSUP2, JOBSAT1 AND JOBSAT2.
I specify the model as follows:
HW.model <- "
# create endogenous variables
tmwork =~ 1*TEAMWORK1 + TEAMWORK2
socsup =~ 1*SOCSUP1 + SOCSUP2
jobsat =~ 1*JOBSAT1 + JOBSAT2
# residual variances observed variables
TEAMWORK1 ~~ TEAMWORK1
TEAMWORK2 ~~ TEAMWORK2
SOCSUP1 ~~ SOCSUP1
SOCSUP2 ~~ SOCSUP2
JOBSAT1 ~~ JOBSAT1
JOBSAT2 ~~ JOBSAT2
# factor covariances
tmwork ~~ tmwork
tmwork ~~ socsup
tmwork ~~ jobsat
socsup ~~ socsup
socsup ~~ jobsat
jobsat ~~ jobsat
"
If I make the following call, then I can get the same results in Mplus (in both cases, a underlying continuous distribution would be assumed):
fit <- lavaan(HW.model, data = healthworkers)
summary(fit, fit.measures = TRUE)
However, if I change the fit specification to (with the aim of correctly specifying the variables as being categorical):
healthworkers[,c("TEAMWORK1", "TEAMWORK2", "SOCSUP1", "SOCSUP2", "JOBSAT1", "JOBSAT2")] <-
lapply(healthworkers[,c("TEAMWORK1", "TEAMWORK2", "SOCSUP1", "SOCSUP2", "JOBSAT1", "JOBSAT2")], ordered)
fit <- lavaan(HW.model, data = healthworkers,
parameterization = "theta",
ordered = c("TEAMWORK1", "TEAMWORK2",
"SOCSUP1", "SOCSUP2",
"JOBSAT1", "JOBSAT2"))
I get:
Estimator DWLS Robust
Model Fit Test Statistic 7.311 17.038
Degrees of freedom 28 28
P-value (Chi-square) 1.000 0.948
Scaling correction factor 1.415
Shift parameter 11.870
This contrast with the Mplus output which is:
Chi-Square Test of Model Fit
Value 19.211*
Degrees of Freedom 6
P-Value 0.0038
* The chi-square value for MLM, MLMV, MLR, ULSMV, WLSM and WLSMV cannot be used
for chi-square difference testing in the regular way. MLM, MLR and WLSM
chi-square difference testing is described on the Mplus website. MLMV, WLSMV,
and ULSMV difference testing is done using the DIFFTEST option.
I do not understand what specifications I need for the fit in lavaan to make R do the same thing as Mplus. I inferred from a presentation by Yves Rosseel, which I found online, that setting the estimator argument in lavaan to:
estimator="WLSMV"
would prompt lavaan to analyse the data in the same way as Mplus, but it does not make a difference: I still get diverging results in Mplus and lavaan.
Any ideas or insights on this issue would be most welcome. I have also attached the original SPSS file for the data.
Many thanks,
CATEGORICAL =
SOCSUP1 SOCSUP2 TMWORK1 TMWORK2 JOBSAT1 JOBSAT2;
VARIABLE:
Keith,
Thank you for your suggestions. I am confident that the problem arises because the variables are not being treated as categorical rather than model specification. When I pretend the variables are continuous both Mplus and lavaan outputs agree.
Having a sample R script for some example categorical data that is known to work would be ideal. I have searched around for a good example, but I have not found good one. Did you have a particular example and script in mind?
A side note (but related): I do not understand why categorical exogenous and endogenous variables need to be treated differently (see: http://lavaan.ugent.be/tutorial/cat.html).
My variables TEAMWORK1, TEAMWORK2 etc are all exogenous variables, but if all data is assumed to be continuous these would need to be converted to categorical variables regardless of whether they are exogenous or endogenous. What am I missing? I have read the page for categorical variables on the lavaan site multiple times but I still do not get it.
Charlie
If I make the following call, then I can get the same results in Mplus (in both cases, a underlying continuous distribution would be assumed):
fit <- lavaan(HW.model, data = healthworkers)
class(healthworkers$TEAMWORK1)
I am confident that the problem arises because the variables are not being treated as categorical rather than model specification. When I pretend the variables are continuous both Mplus and lavaan outputs agree.
Having a sample R script for some example categorical data that is known to work would be ideal.
myData <- read.table("http://www.statmodel.com/usersguide/chap5/ex5.16.dat")
names(myData) <- c("u1","u2","u3","u4","u5","u6","x1","x2","x3","g")
mod5.16 <- ' # measurement invariance, except for u3
f1 =~ u1 + u2 + c(l3, l3b)*u3
f2 =~ u4 + u5 + u6
# mimic
f1 + f2 ~ x1 + x2 + x3
# equal thresholds, but free u3|1 in second group
u3 | c(u3, u3b)*t1
# fix scale of u3* to 1 in second group
u3 ~*~ c(1, 1)*u3
'
fit5.16 <- cfa(mod5.16, data = myData, ordered = paste0("u", 1:6),
# mimic = "Mplus", # optional, still not exactly identical
group = "g", group.equal = c("loadings","thresholds"))
summary(fit, fit.measures = TRUE, rsquare = TRUE)
A side note (but related): I do not understand why categorical exogenous and endogenous variables need to be treated differently (see: http://lavaan.ugent.be/tutorial/cat.html).
My variables TEAMWORK1, TEAMWORK2 etc are all exogenous variables
targetColumns <- c("TEAMWORK1","TEAMWORK2","SOCSUP1","SOCSUP2","JOBSAT1","JOBSAT2")
HW.modelB <- "
# measure endogenous variables
tmwork =~ 1*TEAMWORK1 + TEAMWORK2
socsup =~ 1*SOCSUP1 + SOCSUP2
jobsat =~ 1*JOBSAT1 + JOBSAT2
"
fit <- cfa(HW.modelB,
data = healthworkers,
ordered = targetColumns,
parameterization = "theta")
summary(fit, fit.measures = TRUE)
Estimator DWLS Robust
Model Fit Test Statistic 7.311 19.178
Degrees of freedom 6 6
HW.modelA <- "
# measure endogenous variables
tmwork =~ 1*TEAMWORK1 + TEAMWORK2
socsup =~ 1*SOCSUP1 + SOCSUP2
jobsat =~ 1*JOBSAT1 + JOBSAT2
# residual (co)variances observed variables
TEAMWORK1 ~~ TEAMWORK1
TEAMWORK2 ~~ TEAMWORK2
SOCSUP1 ~~ SOCSUP1
SOCSUP2 ~~ SOCSUP2
JOBSAT1 ~~ JOBSAT1
JOBSAT2 ~~ JOBSAT2
# factor (co)variances
tmwork ~~ tmwork + socsup + jobsat
socsup ~~ socsup + jobsat
jobsat ~~ jobsat
"
healthworkers_copy <- healthworkers
fit <- lavaan(HW.modelA,
data = healthworkers_copy,
ordered = targetColumns,
parameterization = "theta")
summary(fit, fit.measures = TRUE)
Estimator DWLS Robust
Model Fit Test Statistic 7.311 17.038
Degrees of freedom 28 28
lavInspect(fit, what='free')
# Thresholds
TEAMWORK1 | t1
TEAMWORK2 | t1
SOCSUP1 | t1
SOCSUP2 | t1
JOBSAT1 | t1
JOBSAT2 | t1
I want to use the lavaan() function which requires explicit model definition
fit <- cfa(...)
as.call(lavInspect(fit, "call"))
Thanks for the suggestion, Keith. I read the chapter attentively and although it was generally relevant and interesting, I could not find anything to elaborate on what these thresholds are
or how I might specify them in lavaan
How can this approach be applied if the data is non-normal?
The real data I am interested in analysing is actually very non-normal. In this case, is it appropriate to apply the constraints mean = 0 and var = 1
Well, the categorical data could be suggesting that the underlying latent item-response is non-normal. Either that or the LIR is normal but the participants give skewed responses. I am not sure which way to look at it.