Different model fit with different ways of using multiply imputed data

toe...@ohlsen-web.de

Jan 11, 2017, 7:42:54 AM

to lavaan

Hi everyone,

Out of interest, I tried three different ways of using multiple imputation in lavaan:
1.) Multiple imputation and model estimation in one step with runMI()
2.) Multiple imputation with mice first, then passing the imputed data to runMI()
3.) Multiple imputation with mice first, then passing the imputed data to a svydesign object and analysing it with lavaan.survey, BUT without specifying any sampling design or weights.

I used the HolzingerSwineford1939 data. Something must have gone wrong, because I get three different chi-square values.

#--------------------------
# Setting up example data and model
#--------------------------

# Create data with missings
set.seed(20170110)
HSMiss <- HolzingerSwineford1939[,paste("x", 1:9, sep="")]
randomMiss <- rbinom(prod(dim(HSMiss)), 1, 0.1)
randomMiss <- matrix(as.logical(randomMiss), nrow=nrow(HSMiss))
HSMiss[randomMiss] <- NA

# lavaan model
HS.model <- ' visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9 '


#------------------------------------------------------------------
# Variant 1: Imputation and model estimation with runMI
#-------------------------------------------------------------------

# run lavaan and imputation in one step
out1 <- runMI(HS.model, 
              data=HSMiss,
              m = 5, 
              miPackage="mice",
              fun="cfa",
              meanstructure = TRUE)
summary(out1)
fitMeasures(out1, "chisq")

#------------------------------------------------------------------
# Variant 2: Imputation in step 1 and model estimation in step 2 with runMI
#-------------------------------------------------------------------

# impute data first
HSMiss_imp<-mice(HSMiss, m = 5)
mice.imp <- NULL
for(i in 1:5) mice.imp[[i]] <- complete(HSMiss_imp,"long",inc=FALSE)  

# run lavaan with previously imputed data using runMI
out2 <- runMI(HS.model, 
              data=mice.imp,
              fun="cfa",
              meanstructure = TRUE)
summary(out2)
fitMeasures(out2, "chisq") #extremely high chi-square

#------------------------------------------------------------------
# Variant 3: Imputation in step 1 and model estimation in step 2 with lavaan.survey (but without weights)
#-------------------------------------------------------------------

# take previously imputed data from variant 2 and convert it to svydesign-object
mice.imp2<-lapply(seq(HSMiss_imp$m),function(im) complete(HSMiss_imp,im))
mice.imp2<-mitools::imputationList(mice.imp2) 
svy.df_imp<-survey::svydesign(id=~1,weights=~1,data=mice.imp2)                    


# fit model with lavaan.survey
lavaan_fit_HS.model<-cfa(HS.model, meanstructure = TRUE)
out3<-lavaan.survey(lavaan_fit_HS.model, svy.df_imp)
summary(out3)
fitMeasures(out3, "chisq")

It's more a matter of personal interest for me, but it would be great to get some help figuring out why I get three different chi-square values.


> fitMeasures(out1, "chisq")
 chisq 
73.841 
> fitMeasures(out2, "chisq")
  chisq 
483.637 
> fitMeasures(out3, "chisq")
 chisq 
96.748

Daniel Oberski

Jan 11, 2017, 3:42:55 PM

to lavaan

Hi Niels

I don't know why out1 and out2 differ but I can offer an explanation as to why lavaan.survey's output is not identical to either.

lavaan.survey deals with multiply imputed data in a slightly different way than the other methods.

Usually it is:

1. Obtain a covariance matrix S_m in each of the M imputed datasets;
2. Estimate the model on each covariance matrix S_m -> obtain M parameter estimates and M asymptotic variance-covariance matrices of the estimates given that imputation ("within-imputation variance");
3. Pool estimates and variances according to "Rubin's rules" (combine within- and between-imputation variance).
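
For reference, the pooling step above is usually written as follows. These are the standard Rubin's rules; the symbols (\hat{\theta}_m for the estimates and U_m for their within-imputation variance-covariance matrix in imputed dataset m) are notation assumed here, not taken from the thread:

```latex
\begin{align}
\bar{\theta} &= \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m
  && \text{(pooled estimate)} \\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M} U_m
  && \text{(within-imputation variance)} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}
  (\hat{\theta}_m-\bar{\theta})(\hat{\theta}_m-\bar{\theta})^{\top}
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \left(1+\frac{1}{M}\right)B
  && \text{(total variance)}
\end{align}
```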

lavaan.survey propagates the estimates and errors entirely through a single pooled covariance matrix and its asymptotic var-cov matrix, so the procedure is:

1. Obtain a covariance matrix S_m in each of the M imputed datasets, as well as its asymptotic variance-covariance matrix under complex sampling (within-imputation variance);
2. Pool the M covariance matrices to obtain the pooled covariance matrix S and its asymptotic variance Gamma according to "Rubin's rules" (combining within- and between-imputation variance);
3. Estimate the model using (for instance, and by default) MLM, with S as the observed covariance matrix and Gamma as its asymptotic covariance matrix (ACOV).
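
A rough sketch of the first two steps in base R, using a toy list of completed datasets as a stand-in for real imputations. It only shows the averaging of the S_m and the between-imputation part of Gamma; the within-imputation part needs the design-based variance of each S_m and is left out. The names `imputed`, `vech`, `S_pooled` and `B` are illustrative, and this is not lavaan.survey's actual code:

```r
# Toy stand-in for M completed datasets (hypothetical data, not the thread's).
set.seed(1)
M <- 5
imputed <- lapply(seq_len(M), function(m) {
  as.data.frame(matrix(rnorm(300), ncol = 3))
})

# One covariance matrix S_m per completed dataset.
S_list <- lapply(imputed, cov)

# Pooled covariance matrix S: the average of the S_m.
S_pooled <- Reduce(`+`, S_list) / M

# Between-imputation variance of the non-redundant elements vech(S_m).
# cov() over the M rows uses the divisor M - 1.
vech <- function(S) S[lower.tri(S, diag = TRUE)]
B <- cov(do.call(rbind, lapply(S_list, vech)))

# Gamma would then be mean(within-imputation ACOVs) + (1 + 1/M) * B;
# the within-imputation ACOVs are omitted in this sketch.
```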

More details and other options are given in the JSS lavaan.survey paper.

Both procedures are consistent, meaning they will "tend to" the same estimates and standard errors (omitting complex sampling issues) as sample size increases. However, they are not necessarily identical in finite samples.

Best, Daniel

Mauricio Garnier-Villarreal

Jan 11, 2017, 4:50:40 PM

to lavaan

Hi Niels

The difference is due to an error in how you are putting together the imputed data sets from mice.

Just change these lines to:
mice.imp <- NULL
for(i in 1:5) mice.imp[[i]] <- complete(HSMiss_imp, action=i, inc=FALSE)

When you used action "long", you stacked all 5 imputed data sets into 1 data set, and did so 5 times, so every element of the list held the same stacked data. With action=i, each imputed data set goes into a separate element of the list.
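
A quick way to see this, assuming the mice and lavaan packages are installed (the object names `imp`, `stacked` and `single` are just for illustration):

```r
library(mice)
library(lavaan)  # only needed here for the HolzingerSwineford1939 data

# Same missing-data setup as in the original post.
set.seed(20170110)
HSMiss <- HolzingerSwineford1939[, paste0("x", 1:9)]
randomMiss <- matrix(as.logical(rbinom(prod(dim(HSMiss)), 1, 0.1)),
                     nrow = nrow(HSMiss))
HSMiss[randomMiss] <- NA

imp <- mice(HSMiss, m = 5, printFlag = FALSE)

# "long" stacks all m completed data sets into one data frame.
stacked <- complete(imp, action = "long", include = FALSE)

# A numeric action returns one completed data set.
single <- complete(imp, action = 1)

dim(stacked)
dim(single)
```

`stacked` has five times as many rows as `single`, plus the `.imp` and `.id` columns that the long format adds. Since the ML chi-square grows with the sample size, fitting the model to the stacked data would be expected to inflate it, which fits the very large value reported for out2.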

With this fix, the remaining difference between out1 and out2 is just the random variation between imputation runs:

> fitMeasures(out1, "chisq")
chisq
73.841
> fitMeasures(out2, "chisq")
chisq
71.974
> fitMeasures(out3, "chisq")
chisq
95.303

bye

toe...@ohlsen-web.de

Jan 12, 2017, 3:04:48 AM

to lavaan

Thank you so much for your answers, Daniel and Mauricio! That cleared everything up for me =)
