lavaan.survey giving unexpected results with missing data


Jeremy Miles

Feb 19, 2014, 2:09:59 PM
to lav...@googlegroups.com

Hi All,

I'm using lavaan.survey to correct for clustering in a dataset, and getting unexpected results. Specifically, when I use lavaan.survey, the chi-square is inflated dramatically. I wondered if this was because of missing data.

Reproducible example below.

Summary:
I generate data with 6 items and 1000 cases to fit a single factor model. 
Fitting with ML gives me chi-square of 8.5 (df = 9).

Then I randomly set ~20% of each variable to missing, and fit the model again.
Fitting with ML gives me chi-square of 10.4 (df = 9).

Then I assign each case to 1 of 10 clusters (each of size 100), set up a survey design, and run lavaan.survey. Chi-square now = 24.5 (df = 9).

My hunch is that lavaan.survey is not taking into account that there is missing data, or that I've made a mistake in my survey setup. If anyone can shed light on this, I'd be grateful.

Thanks,

Jeremy



Here is the code for the example:


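library(lavaan)
library(survey)
library(lavaan.survey)

set.seed(1234)


# Generate data to fit a one-factor model
# Factor scores (named eta to avoid masking F, R's alias for FALSE)
eta <- rnorm(1000)
x1 <- rnorm(1000) + eta
x2 <- rnorm(1000) + eta
x3 <- rnorm(1000) + eta
x4 <- rnorm(1000) + eta
x5 <- rnorm(1000) + eta
x6 <- rnorm(1000) + eta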

d <- as.data.frame(cbind(x1, x2, x3, x4, x5, x6))

d$cluster <- rep(c(1:10), 100)

cfaModel <- "f =~  x1 + x2 + x3 + x4 + x5 + x6"

cfaFit <- cfa(cfaModel, data=d, estimator="ML", missing="ml", std.lv=FALSE)
summary(cfaFit)


# Randomly set ~20% of the values of each variable to missing
dm <- d

dm$x1 <- ifelse(runif(1000) < 0.2, NA, d$x1)
dm$x2 <- ifelse(runif(1000) < 0.2, NA, d$x2)
dm$x3 <- ifelse(runif(1000) < 0.2, NA, d$x3)
dm$x4 <- ifelse(runif(1000) < 0.2, NA, d$x4)
dm$x5 <- ifelse(runif(1000) < 0.2, NA, d$x5)
dm$x6 <- ifelse(runif(1000) < 0.2, NA, d$x6)

#Fit missing data model
cfaFitm <- cfa(cfaModel, data=dm, estimator="ML", missing="direct", std.lv=FALSE)
summary(cfaFitm)

#Set up survey design
survey.design <- svydesign(ids=~cluster, probs=NULL, data=dm)

#Correct for clustering
cfaFitms <- lavaan.survey(cfaFitm, survey.design, estimator="ML")
summary(cfaFitms)
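
Since one of the worries above is a possible mistake in the survey setup, a quick sanity check is to tabulate the cluster IDs before building the design. The sketch below uses base R only; `cluster_a` and `cluster_b` are illustrative names, not objects from the example. It shows that the two common ways of calling `rep()` give very different cluster structures for 1000 cases.

```r
# Two ways of building a cluster ID for 1000 cases (illustrative names)
cluster_a <- rep(1:10, 100)          # IDs 1..10 recycled 100 times
cluster_b <- rep(1:100, each = 10)   # each ID repeated 10 times in a row

# Number of distinct clusters and the size of each cluster
n_clusters_a <- length(unique(cluster_a))
sizes_a <- table(cluster_a)
n_clusters_b <- length(unique(cluster_b))
sizes_b <- table(cluster_b)

cat("a:", n_clusters_a, "clusters, sizes from", min(sizes_a), "to", max(sizes_a), "\n")
cat("b:", n_clusters_b, "clusters, sizes from", min(sizes_b), "to", max(sizes_b), "\n")
```

The first form gives 10 clusters of 100 cases, the second gives 100 clusters of 10 cases. Running `table(dm$cluster)` on the actual data frame gives the same kind of check for the design that is passed to `svydesign()`.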

Daniel Oberski

Feb 21, 2014, 6:16:25 AM
to lav...@googlegroups.com
Hi Jeremy


There is a combination of issues here. One is that 

    lavaan.survey(cfaFitm, survey.design, estimator="ML")

should throw an error. That it returns output instead of an error is a bug, which I will correct in the upcoming version. Thanks for bringing it to my attention. (The chi-square it gives is indeed incorrect, but this has nothing to do with the clustering. The FIML-estimated variance-covariance matrix is treated as if it were based on 1000 complete cases, when in fact it is considerably more uncertain than that because of the missing data.)

Instead it should be 

        lavaan.survey(cfaFitm, survey.design)

However, when you use this with cfaFitm, an error is also thrown (correctly). The reason is that there is currently no implementation of the combination of FIML and survey weights/clustering/stratification. This would require an adjusted ("MLR-complex") estimator that is not implemented (yet?) in lavaan.
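
To get a feel for how much information the missing values remove (which is why a chi-square computed as if all 1000 cases were fully observed comes out too large), one can look at the proportion of observed values, the pairwise coverage, and the number of complete cases. The following is a minimal base-R sketch that regenerates data in the same spirit as the example; the seed and the object names (`dat`, `dat_mis`, `coverage`) are assumptions made for this illustration, so the numbers will not match the original `dm` exactly.

```r
set.seed(1)  # arbitrary seed, for this illustration only

n <- 1000
# Six indicators of one common factor, as in the example in this thread
eta <- rnorm(n)
dat <- as.data.frame(replicate(6, rnorm(n) + eta))
names(dat) <- paste0("x", 1:6)

# Set ~20% of the values of each variable to missing, independently
dat_mis <- as.data.frame(lapply(dat, function(x) ifelse(runif(n) < 0.2, NA, x)))

# Proportion of observed values per variable
prop_observed <- colMeans(!is.na(dat_mis))

# Pairwise coverage: proportion of cases with both variables observed
obs <- 1 - is.na(as.matrix(dat_mis))   # 1 = observed, 0 = missing
coverage <- crossprod(obs) / n

# Number of fully observed cases
n_complete <- sum(complete.cases(dat_mis))

print(round(prop_observed, 2))
print(round(coverage, 2))
print(n_complete)
```

With independent 20% missingness on six variables, each pair of variables is jointly observed for roughly 0.8 × 0.8 = 64% of the cases, and only about 0.8^6 ≈ 26% of the cases are complete.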

However, it is possible to deal with missing data by using multiple imputation. This is described in the lavaan.survey paper (accepted to the Journal of Statistical Software; preprint here: http://daob.nl/publications/).

For your example, you could use

# Fit using listwise deletion to get a starting model
cfaFitmm <- cfa(cfaModel, data=dm, estimator="ML", std.lv=FALSE)

# Perform multiple imputation, for example using mice (but any package/software can be used as long as it produces a list of dataframes)
library(mice)
dm.imp <- mice(dm)
# After using mice, this would be a way of getting a list of dataframes:
dm.implist <- lapply(seq(dm.imp$m), function(im) complete(dm.imp, im))

# The generic mi library mitools can be used with any multiple imputation program
library(mitools)
dm.implist <- imputationList(dm.implist)

# Create a survey design object with the imputation list as data
des.dm.imp <- svydesign(ids=~cluster, probs=~1, data=dm.implist)

# Now just fit the lavaan.survey analysis as usual
cfaFitms <- lavaan.survey(cfaFitmm, des.dm.imp)

# lavaan.survey takes care of the clustering etc but also of the multiple imputations in calculating standard errors and chi-square.
summary(cfaFitms)
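
As background on what "taking care of the multiple imputations" involves in general, here is a minimal base-R sketch of Rubin's rules for pooling a single parameter across imputed datasets. It illustrates the general principle only and is not lavaan.survey's internal code; the estimates and squared standard errors are made-up numbers, and all object names are hypothetical.

```r
# Hypothetical inputs: one estimate and its squared standard error
# from each of m = 5 imputed datasets (made-up numbers for illustration)
est <- c(0.52, 0.48, 0.55, 0.50, 0.47)
se2 <- c(0.010, 0.011, 0.009, 0.010, 0.012)

m <- length(est)
pooled_est  <- mean(est)    # pooled point estimate
within_var  <- mean(se2)    # average within-imputation variance
between_var <- var(est)     # between-imputation variance

# Rubin's rules: total variance adds the between-imputation part,
# inflated by (1 + 1/m) because m is finite
total_var <- within_var + (1 + 1 / m) * between_var
pooled_se <- sqrt(total_var)

cat("pooled estimate:", pooled_est, " pooled SE:", round(pooled_se, 4), "\n")
```

The pooled standard error is always at least as large as the average within-imputation standard error, which reflects the extra uncertainty due to the missing data.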


Robert Bodily

Jul 2, 2015, 5:41:45 PM
to lav...@googlegroups.com
Hi Daniel,

You said "The reason for that is that there is currently no implementation of the combination of FIML and survey weights/clustering/stratification. This would require an adjusted ("MLR-complex") estimator that is not implemented (yet?) in lavaan."

Has this been implemented in lavaan or lavaan.survey yet?

Thanks!

Bob