Hi All,
I'm using lavaan.survey to correct for clustering in a dataset, and I'm getting unexpected results. Specifically, when I use lavaan.survey, the chi-square is inflated dramatically. I wonder whether this is because of missing data.
Reproducible example below.
Summary:
I generate data with 6 items and 1000 cases to fit a single factor model.
Fitting with ML gives me chi-square of 8.5 (df = 9).
Then I randomly set ~20% of each variable to missing, and fit the model again.
Fitting with ML gives me chi-square of 10.4 (df = 9).
Then I assign each case to 1 of 10 clusters (each of size 100; the assignment is unrelated to the data), set up a survey design, and run lavaan.survey. Chi-square is now 24.5 (df = 9).
My hunch is that lavaan.survey is not taking the missing data into account, or that I've made a mistake in my survey setup. If anyone can shed light on this, I'd be grateful.
Thanks,
Jeremy
Here is the code for the example:
library(lavaan)
library(survey)
library(lavaan.survey)
set.seed(1234)
#Generate data to fit a one factor model
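#Common factor scores (named eta so that R's built-in F, shorthand for FALSE, is not masked)
eta <- rnorm(1000)
x1 <- rnorm(1000) + eta
x2 <- rnorm(1000) + eta
x3 <- rnorm(1000) + eta
x4 <- rnorm(1000) + eta
x5 <- rnorm(1000) + eta
x6 <- rnorm(1000) + eta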
d <- data.frame(x1, x2, x3, x4, x5, x6)
#Cluster IDs 1..10 recycled over the rows: 10 clusters of 100 cases each
d$cluster <- rep(1:10, 100)
cfaModel <- "f =~ x1 + x2 + x3 + x4 + x5 + x6"
cfaFit <- cfa(cfaModel, data=d, estimator="ML", missing="ml",
              std.lv=FALSE)
summary(cfaFit)
#Randomly set ~20% of the values in each variable to missing
dm <- d
dm$x1 <- ifelse(runif(1000) < 0.2, NA, d$x1)
dm$x2 <- ifelse(runif(1000) < 0.2, NA, d$x2)
dm$x3 <- ifelse(runif(1000) < 0.2, NA, d$x3)
dm$x4 <- ifelse(runif(1000) < 0.2, NA, d$x4)
dm$x5 <- ifelse(runif(1000) < 0.2, NA, d$x5)
dm$x6 <- ifelse(runif(1000) < 0.2, NA, d$x6)
#Fit missing data model
#missing="ml" is the same option as the alias "direct" (full-information ML)
cfaFitm <- cfa(cfaModel, data=dm, estimator="ML", missing="ml",
               std.lv=FALSE)
summary(cfaFitm)
#Set up survey design (no weights given, so equal probabilities are assumed)
survey.design <- svydesign(ids=~cluster, probs=NULL, data=dm)
#Correct for clustering
cfaFitms <- lavaan.survey(cfaFitm, survey.design, estimator="ML")
summary(cfaFitms)
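
One more check related to my hunch, as a rough sketch only. I have not verified how lavaan.survey treats incomplete rows; the assumption here is that it might fall back to complete cases. If it did, with six variables each missing independently about 20% of the time, only about 0.8^6 (roughly 26%) of the rows would be used. The snippet uses stand-in data (x, n, p are illustrative names), not the CFA data above:

```r
# Sketch: how many rows survive listwise deletion when each of 6
# variables is independently missing with probability 0.2?
set.seed(1234)
n <- 1000
p <- 6
x <- matrix(rnorm(n * p), nrow = n)             # stand-in data, not the CFA data
x[matrix(runif(n * p) < 0.2, nrow = n)] <- NA   # ~20% missing per variable
n_complete <- sum(complete.cases(x))            # rows with no missing values
expected_share <- 0.8^p                         # share expected to be complete
c(n_complete = n_complete, expected = n * expected_share)
```

The same count for the example above would be sum(complete.cases(dm[, paste0("x", 1:6)])).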