List,
As I understand the implementation of the WLSMV estimator in lavaan, one can think of binary and ordinal manifest variables as arising from an underlying normally distributed variable with variance = 1. When this unobserved variable exceeds a defined threshold(s), the observed variable switches categories from a lower to higher category. I am interested in fitting the following model.
model<-
'theta =~ t*c0 + t*c3 + t*c5 + t*c9
c0 ~ 1
c3 ~ 1 + r*c0
c5 ~ 1 + r*c3
c9 ~ 1 + r*c5'
where c0 - c9 are binary variables measured over time in children aged 0,3,5,and 9. From this model, you can see that the dependence between observations over time arise from an time-stable unobserved component (theta) and temporally lagged effects of variables measured from previous measurement occasions.
Starting from the first time period, I generate a normal random variate comprised of two components - an idiosyncratic error and theta.
theta <- rnorm(n,0,1)
c0_star <- rnorm(n,0,sqrt(.75)) + .5*theta
c0 <- ifelse(c0_star > -.35,1,0)
Note that I choose the path coefficient for theta and the variance for the error component so that they sum to one ( .75 + .5^2) and then I draw a binary random variable based on a threshold of -.35.
Going forward in time gets a bit more complicated because c3 is formed from three components - idiosyncratic error, theta, and the impact from the previous time period. Choosing the variance for the error component so that the total variance is 1 requires considering an additional source of variation attributable to c0 and the covariance between c0 and theta. I am embarrassed to reveal how I avoided having to do algebra in order to accomplish this, but in the interest of full disclosure...
x <- seq(.1:1,by=.01)
test <- rep(999,length(x))
for(i in 1:length(x)){
test[i] <- x[i] + .25 + .3^2 + 2*cov(.5*theta,c0_star*.3)- 1}
plot(x,test)
cbind(x,test)
c3_star <- rnorm(n,0,sqrt(.515)) + .3*c0_star + .5*theta
c3 <- ifelse(c3_star > -.35,1,0)
var(c3_star)
Naturally, the magnitude of the idiosyncratic error goes down over time. I repeat this for the other two time periods.
c5_star <- rnorm(n,0,sqrt(.46)) + .3*c3_star + .5*theta
c5 <- ifelse(c5_star > -.35,1,0)
var(c5_star)
c9_star <- rnorm(n,0,sqrt(.45)) + .3*c5_star + .5*theta
c9 <- ifelse(c9_star > -.35,1,0)
var(c9_star)
In this artificial situation, where I have the underlying normally distributed random variables, I can fit a model directly to these. Thus,
dt2 <- as.data.frame(cbind(c0_star,c3_star,c5_star,c9_star))
names(dt2) <- c("c0","c3","c5","c9")
fit2 <- sem(model,data=dt2,
std.lv = TRUE)
The parameter estimates obtained are close to the true values and repeated sampling reveals that the sampling distributions are centered on their true values. I also don't get any warnings about convergence.
parameterEstimates(fit2)
However, when I switch the binary measures,
dt <- as.data.frame(cbind(c0,c3,c5,c9))
names(dt) <- c("c0","c3","c5","c9")
dt$c0 <- ordered(dt$c0, levels = c("0","1"))
dt$c3 <- ordered(dt$c3, levels = c("0","1"))
dt$c5 <- ordered(dt$c5, levels = c("0","1"))
dt$c9 <- ordered(dt$c9, levels = c("0","1"))
The parameter estimates are clearly off (not dramatically) and I get warnings about convergence.
Is my logic correct? Should I be able to recover the true values for the parameter estimates (in expectation) using both approaches? With slightly simpler models I have found this to be the case...
If you read this far, THANK YOU!
Sam