SEM is not identified with categorical variables but fits well with continuous variables

tran viet Yen

May 18, 2018, 7:38:15 PM
to lavaan

Dear Lavaan users,

I'm having trouble with ordered variables in SEM with lavaan. I don't know whether the problem lies with the program or with my model. Let me describe my model.

I have attached the data file for this model, Lavaan.csv, which contains 200 observations. This is my code:


library(lavaan)
Table <- read.csv("Lavaan.csv", header = TRUE, sep = ",", dec = ".")
M <- as.matrix(Table)


#Run model with continuous variables
model <- '
y1 =~ x1
y2 =~ x2
y3 =~ x3 + x4 + x5
y4 =~ x6 + x7 + x8
y5 =~ x9
y6 =~ x10 + x11
# regressions
y1 ~ y3 + y5
y3 ~ y2 + y4 + y5 + y6
y2 ~ y6
y4 ~ y6
#Correlation
y2 ~~ y4
y2 ~~ y5
y4 ~~ y5
'

fit <- sem(model, sample.nobs = 200, estimator = "DWLS", data = M, std.lv = TRUE)
fitMeasures(fit, c("CFI","TLI","RMSEA","GFI","NFI","SRMR"))

All of my observed variables (indicators) come from a 5-point Likert scale. In my first attempt, I treated all of them as continuous variables and ran the model with the DWLS estimator; the model is identified and fits very well.

> fit <- sem(model,sample.nobs = 200,estimator = "DWLS",data=M,std.lv=TRUE)
> fitMeasures(fit,c("CFI","TLI","RMSEA","GFI","NFI","SRMR"))
  cfi   tli rmsea   gfi   nfi  srmr
1.000 1.011 0.000 0.990 0.978 0.056

However, when I treat these variables as categorical by declaring them as "ordered", the model is not identified:

#Run model with categorical variables


#Define ordered variables
M <- data.frame(M)
M$x1 <- ordered(M$x1, levels = c("1","2","3","4","5"))
M$x2 <- ordered(M$x2, levels = c("1","2","3","4","5"))
M$x3 <- ordered(M$x3, levels = c("1","2","3","4","5"))
M$x4 <- ordered(M$x4, levels = c("1","2","3","4","5"))
M$x5 <- ordered(M$x5, levels = c("1","2","3","4","5"))
M$x6 <- ordered(M$x6, levels = c("1","2","3","4","5"))
M$x7 <- ordered(M$x7, levels = c("1","2","3","4","5"))
M$x8 <- ordered(M$x8, levels = c("1","2","3","4","5"))
M$x9 <- ordered(M$x9, levels = c("1","2","3","4","5"))
M$x10 <- ordered(M$x10, levels = c("1","2","3","4","5"))
M$x11 <- ordered(M$x11, levels = c("1","2","3","4","5"))


fit <- sem(model, sample.nobs = 200, estimator = "DWLS", data = M, std.lv = TRUE)
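
The eleven ordered() calls above could also be written more compactly. The following is a minimal sketch on simulated responses; the data frame `dat` and the vector `items` are hypothetical stand-ins for M and its column names, not the attached Lavaan.csv:

```r
# Simulated stand-in for the 200 x 11 table of Likert responses
# (assumption: columns named x1..x11 holding integers 1..5, as in the post).
set.seed(123)
dat <- as.data.frame(matrix(sample(1:5, 200 * 11, replace = TRUE), ncol = 11))
names(dat) <- paste0("x", 1:11)

# Convert every item to an ordered factor with the same five levels.
items <- paste0("x", 1:11)
dat[items] <- lapply(dat[items], ordered, levels = 1:5)

# Inspect one converted column.
str(dat$x1)
```

lavaan's sem() also accepts an `ordered` argument (a character vector of variable names), so something like sem(model, data = M, ordered = items) should declare the same variables as ordinal without converting the columns by hand.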


Here is the result of the model with categorical variables:

> fit <- sem(model,sample.nobs = 200,estimator = "DWLS",data=M,std.lv=TRUE)
Warning messages:
1: In lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing,  :
  lavaan WARNING: 54 bivariate tables have empty cells; to see them, use:
                  lavInspect(fit, "zero.cell.tables")
2: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not compute standard errors!
  lavaan NOTE: this may be a symptom that the model is not identified.
3: In lav_object_post_check(object) :
  lavaan WARNING: covariance matrix of latent variables
                is not positive definite;
                use inspect(fit,"cov.lv") to investigate.

So this is what I cannot understand: why, with the same model, does changing from continuous to categorical variables turn the model from identified to unidentified? Is there anything wrong with my syntax?

As I understand it, the empty-cell warning for the bivariate tables does not affect identification.
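
Which pairs of items have empty cells can also be checked with plain cross-tabulations in base R, independently of lavaan. This is only a sketch on simulated data; `dat` and `item_pairs` are hypothetical names, and the skewed response probabilities are an assumption chosen so that some cells are likely to be sparse:

```r
# Simulated stand-in for the ordered data (assumption: 200 rows, items x1..x11,
# five response categories with a skewed distribution).
set.seed(42)
dat <- as.data.frame(matrix(
  sample(1:5, 200 * 11, replace = TRUE, prob = c(0.02, 0.08, 0.2, 0.4, 0.3)),
  ncol = 11
))
names(dat) <- paste0("x", 1:11)
dat[] <- lapply(dat, ordered, levels = 1:5)

# All unordered pairs of items.
item_pairs <- combn(names(dat), 2, simplify = FALSE)

# For each pair, check whether the 5 x 5 cross-table contains a zero cell.
# table() keeps unused factor levels, so missing categories show up as zeros.
has_empty <- vapply(
  item_pairs,
  function(p) any(table(dat[[p[1]]], dat[[p[2]]]) == 0),
  logical(1)
)

sum(has_empty)            # number of bivariate tables with at least one empty cell
table(dat$x1, dat$x2)     # one cross-table, for inspection
```

With 11 items there are 55 pairs in total, so a count close to 55 means that nearly every bivariate table is sparse at this sample size.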

Another question: if the variables are treated as continuous, is it still possible to use the DWLS estimator instead of the WLS estimator? My data are not multivariate normal, so I cannot use the ML estimator. When I switch to WLS, the model fit decreases substantially:

> fit <- sem(model,sample.nobs = 200,estimator = "WLS",data=M,std.lv=TRUE)
> fitMeasures(fit,c("CFI","TLI","RMSEA","GFI","NFI","SRMR"))
  cfi   tli rmsea   gfi   nfi  srmr
0.670 0.481 0.107 0.904 0.613 0.215

Thank you very much for taking the time to look at my case. I have been stressed by this problem. If the variables cannot be treated as categorical, I have to use the WLS estimator. However, with the WLS estimator the model fit is very poor (e.g. a CFI of 0.67), which suggests a bad model.

Best regards,

Tran Viet Yen.


Lavaan.csv
Lavaan.R

Yves Rosseel

Jul 30, 2018, 2:23:28 PM
to lav...@googlegroups.com
> model <-'
> y1 =~ x1

This is the problem: a single-indicator latent variable is not
identified if the indicator is categorical!

It works in the continuous case, because lavaan uses the auto.fix.single
argument.
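
To illustrate what auto.fix.single does in the continuous case: according to the lavaan documentation, it fixes the residual variance of an observed indicator to zero when that indicator is the only one for a latent variable. The sketch below uses simulated continuous data and a reduced two-factor model; `dat`, `m_auto`, and `m_explicit` are hypothetical names, not part of the original analysis.

```r
library(lavaan)

# Simulated continuous data (assumption: not the poster's Lavaan.csv).
set.seed(1)
n <- 200
f <- rnorm(n)
dat <- data.frame(
  x1 = 0.6 * f + rnorm(n),
  x3 = 0.8 * f + rnorm(n),
  x4 = 0.7 * f + rnorm(n),
  x5 = 0.9 * f + rnorm(n)
)

# Implicit version: y1 has a single indicator, so lavaan fixes the
# residual variance of x1 to zero automatically.
m_auto <- '
  y1 =~ x1
  y3 =~ x3 + x4 + x5
  y1 ~ y3
'

# Explicit version: the same constraint written out in the model syntax.
m_explicit <- '
  y1 =~ x1
  y3 =~ x3 + x4 + x5
  y1 ~ y3
  x1 ~~ 0*x1
'

fit_auto     <- sem(m_auto, data = dat)
fit_explicit <- sem(m_explicit, data = dat)

# The residual variance of x1 appears as a fixed parameter in both fits.
pe <- parameterEstimates(fit_auto)
pe[pe$lhs == "x1" & pe$op == "~~" & pe$rhs == "x1", ]
```

Both fits should give the same chi-square, since the explicit constraint only spells out what lavaan applies by default.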

But you can use the indicators directly in the regression formulas:

model <- '
y3 =~ x3 + x4 + x5
y4 =~ x6 + x7 + x8
y6 =~ x10 + x11
# regressions
x1 ~ y3 + x9
y3 ~ x2 + y4 + x9 + y6
x2 ~ y6
y4 ~ y6
'

This does not make sense to me:

#Correlation
x2 ~~ y4
x2 ~~ x9
y4 ~~ x9


Yves.