SEM and categorical endogenous variables in lavaan

87 views

WLSMVcategoricallavaannon-convergencesemwarning

Skip to first unread message

samfi...@gmail.com

unread,

Jan 18, 2019, 2:39:58 PM1/18/19

to lavaan

List,

As I understand the implementation of the WLSMV estimator in lavaan, one can think of binary and ordinal manifest variables as arising from an underlying normally distributed variable with variance = 1. When this unobserved variable exceeds a defined threshold(s), the observed variable switches categories from a lower to higher category. I am interested in fitting the following model.

model<-

'theta =~ t*c0 + t*c3 + t*c5 + t*c9

c0 ~ 1

c3 ~ 1 + r*c0

c5 ~ 1 + r*c3

c9 ~ 1 + r*c5'

where c0 - c9 are binary variables measured over time in children aged 0,3,5,and 9. From this model, you can see that the dependence between observations over time arise from an time-stable unobserved component (theta) and temporally lagged effects of variables measured from previous measurement occasions.

Starting from the first time period, I generate a normal random variate comprised of two components - an idiosyncratic error and theta.

theta <- rnorm(n,0,1)

c0_star <- rnorm(n,0,sqrt(.75)) + .5*theta

c0 <- ifelse(c0_star > -.35,1,0)

Note that I choose the path coefficient for theta and the variance for the error component so that they sum to one ( .75 + .5^2) and then I draw a binary random variable based on a threshold of -.35.

Going forward in time gets a bit more complicated because c3 is formed from three components - idiosyncratic error, theta, and the impact from the previous time period. Choosing the variance for the error component so that the total variance is 1 requires considering an additional source of variation attributable to c0 and the covariance between c0 and theta. I am embarrassed to reveal how I avoided having to do algebra in order to accomplish this, but in the interest of full disclosure...

x <- seq(.1:1,by=.01)

test <- rep(999,length(x))

for(i in 1:length(x)){

test[i] <- x[i] + .25 + .3^2 + 2*cov(.5*theta,c0_star*.3)- 1}

plot(x,test)

cbind(x,test)

c3_star <- rnorm(n,0,sqrt(.515)) + .3*c0_star + .5*theta

c3 <- ifelse(c3_star > -.35,1,0)

var(c3_star)

Naturally, the magnitude of the idiosyncratic error goes down over time. I repeat this for the other two time periods.

c5_star <- rnorm(n,0,sqrt(.46)) + .3*c3_star + .5*theta

c5 <- ifelse(c5_star > -.35,1,0)

var(c5_star)

c9_star <- rnorm(n,0,sqrt(.45)) + .3*c5_star + .5*theta

c9 <- ifelse(c9_star > -.35,1,0)

var(c9_star)

In this artificial situation, where I have the underlying normally distributed random variables, I can fit a model directly to these. Thus,

dt2 <- as.data.frame(cbind(c0_star,c3_star,c5_star,c9_star))

names(dt2) <- c("c0","c3","c5","c9")

fit2 <- sem(model,data=dt2,std.lv = TRUE)

The parameter estimates obtained are close to the true values and repeated sampling reveals that the sampling distributions are centered on their true values. I also don't get any warnings about convergence.

parameterEstimates(fit2)

However, when I switch the binary measures,

dt <- as.data.frame(cbind(c0,c3,c5,c9))

names(dt) <- c("c0","c3","c5","c9")

dt$c0 <- ordered(dt$c0, levels = c("0","1"))

dt$c3 <- ordered(dt$c3, levels = c("0","1"))

dt$c5 <- ordered(dt$c5, levels = c("0","1"))

dt$c9 <- ordered(dt$c9, levels = c("0","1"))

The parameter estimates are clearly off (not dramatically) and I get warnings about convergence.

Is my logic correct? Should I be able to recover the true values for the parameter estimates (in expectation) using both approaches? With slightly simpler models I have found this to be the case...

If you read this far, THANK YOU!

Sam

Reply all

Reply to author

Forward

0 new messages