# Different model fit with different ways of using multiply imputed data

### toe...@ohlsen-web.de

Jan 11, 2017, 7:42:54 AM
to lavaan
Hi everyone,

out of interest, I tried three different ways of using multiple imputation in lavaan:
1.) Multiple imputation and model estimation in one step with runMI()
2.) Multiple imputation with mice first, then passing the previously imputed data to runMI()
3.) Multiple imputation with mice first, then passing the imputed data to a svydesign object and analysing it with lavaan.survey, BUT without specifying any sampling design or weights.

I used the HolzingerSwineford1939 data. Something must have gone wrong, because I get three different chi-square values.

```r
library(lavaan)
library(semTools)       # provides runMI()
library(mice)
library(lavaan.survey)  # used together with survey and mitools

#--------------------------
# Setting up example data and model
#--------------------------
# Create data with missings
set.seed(20170110)
HSMiss <- HolzingerSwineford1939[, paste("x", 1:9, sep = "")]
randomMiss <- rbinom(prod(dim(HSMiss)), 1, 0.1)
randomMiss <- matrix(as.logical(randomMiss), nrow = nrow(HSMiss))
HSMiss[randomMiss] <- NA

# lavaan model
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

#------------------------------------------------------------------
# Variant 1: Imputation and model estimation with runMI
#------------------------------------------------------------------
# run lavaan and imputation in one step
out1 <- runMI(HS.model,
              data = HSMiss,
              m = 5,
              miPackage = "mice",
              fun = "cfa",
              meanstructure = TRUE)
summary(out1)
fitMeasures(out1, "chisq")

#------------------------------------------------------------------
# Variant 2: Imputation in step 1 and model estimation in step 2 with runMI
#------------------------------------------------------------------
# impute data first
HSMiss_imp <- mice(HSMiss, m = 5)
mice.imp <- NULL
for(i in 1:5) mice.imp[[i]] <- complete(HSMiss_imp, "long", inc = FALSE)

# run lavaan with previously imputed data using runMI
out2 <- runMI(HS.model,
              data = mice.imp,
              fun = "cfa",
              meanstructure = TRUE)
summary(out2)
fitMeasures(out2, "chisq") # extremely high chi-square

#------------------------------------------------------------------
# Variant 3: Imputation in step 1 and model estimation in step 2
# with lavaan.survey (but without weights)
#------------------------------------------------------------------
# take previously imputed data from variant 2 and convert it to svydesign-object
mice.imp2 <- lapply(seq(HSMiss_imp$m), function(im) complete(HSMiss_imp, im))
mice.imp2 <- mitools::imputationList(mice.imp2)
svy.df_imp <- survey::svydesign(id = ~1, weights = ~1, data = mice.imp2)

# fit model with lavaan.survey
lavaan_fit_HS.model <- cfa(HS.model, meanstructure = TRUE)
out3 <- lavaan.survey(lavaan_fit_HS.model, svy.df_imp)
summary(out3)
fitMeasures(out3, "chisq")
```

It's more a matter of personal interest for me, but it would be great to have some help figuring out why I get three different chi-square values.

```
> fitMeasures(out1, "chisq")
 chisq
73.841
> fitMeasures(out2, "chisq")
  chisq
483.637
> fitMeasures(out3, "chisq")
 chisq
96.748
```

### Daniel Oberski

Jan 11, 2017, 3:42:55 PM
to lavaan
Hi Niels

I don't know why out1 and out2 differ, but I can offer an explanation as to why lavaan.survey's output is not identical to either.

lavaan.survey deals with multiply imputed data in a slightly different way than the other methods.

The usual procedure is:
1. Obtain a covariance matrix S_m in each of the M imputed datasets;
2. Estimate the model on each covariance matrix S_m -> obtain M parameter estimates and M asymptotic variance-covariance matrices of the estimates given that imputation ("within-imputation variance");
3. Pool estimates and variances according to "Rubin's rules" (combine within- and between-imputation variance).
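For a single parameter, the pooling in step 3 can be sketched in base R as follows. This is only an illustration of Rubin's rules, not the code that runMI() runs internally; the helper name `pool_rubin` and the input numbers are made up for the example.

```r
# Hypothetical helper (not part of lavaan or semTools):
# pool one parameter across M imputations using Rubin's rules.
pool_rubin <- function(est, se) {
  M <- length(est)
  q_bar <- mean(est)            # pooled point estimate
  W <- mean(se^2)               # within-imputation variance
  B <- var(est)                 # between-imputation variance
  T_var <- W + (1 + 1 / M) * B  # total variance
  c(estimate = q_bar, se = sqrt(T_var), within = W, between = B)
}

# Made-up estimates and standard errors from M = 5 imputed data sets
est <- c(0.52, 0.49, 0.55, 0.50, 0.54)
se  <- c(0.10, 0.11, 0.10, 0.12, 0.11)

pooled <- pool_rubin(est, se)
print(round(pooled, 4))
```

The pooled standard error is always at least as large as the average within-imputation standard error, because the between-imputation spread is added on top.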

lavaan.survey propagates the estimates and errors entirely through a single pooled covariance matrix and its asymptotic variance-covariance matrix, so the procedure is:
1. Obtain a covariance matrix S_m in each of the M imputed datasets, as well as its asymptotic variance-covariance matrix under complex sampling (within-imputation variance);
2. Pool the M covariance matrices to obtain the pooled covariance matrix S and its asymptotic variance Gamma according to "Rubin's rules" (combining within- and between-imputation variance);
3. Estimate the model using (for instance, and by default) MLM with S as the observed covariance matrix and Gamma as the ACOV.

More details and other options are given in the JSS lavaan.survey paper.

Both procedures are consistent, meaning they will "tend to" the same estimates and standard errors (omitting complex sampling issues) as sample size increases. However, they are not necessarily identical in finite samples.

Best, Daniel

### Mauricio Garnier-Villarreal

Jan 11, 2017, 4:50:40 PM
to lavaan
Hi Niels

The difference between out1 and out2 is due to an error in how you are putting together the imputed data sets from mice.

Just change these lines:

```r
mice.imp <- NULL
for(i in 1:5) mice.imp[[i]] <- complete(HSMiss_imp, action=i, inc=FALSE)
```

With the action "long", you were stacking all 5 imputed data sets into 1 data set, and doing that 5 times. With action=i, you are putting each imputed data set into a separate element of the list.
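The two behaviours can be compared on a small example. This is a sketch that uses the `nhanes` data shipped with mice instead of the Holzinger-Swineford data; the object names `imp`, `stacked` and `imp_list` are just for illustration.

```r
library(mice)

# nhanes is a small example data set included in mice
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

# action = "long" stacks all m completed data sets into ONE data frame
stacked <- complete(imp, action = "long")

# action = i returns only the i-th completed data set
imp_list <- lapply(seq_len(imp$m), function(i) complete(imp, action = i))

nrow(nhanes)            # original number of rows
nrow(stacked)           # m times as many rows
sapply(imp_list, nrow)  # original number of rows in every list element
```

Passing the stacked data as if it were one imputed data set makes the sample look five times larger than it is, which would be consistent with the very large chi-square reported for out2.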

With this change, the remaining difference between out1 and out2 can be seen as the variation between different sets of imputations:

```
> fitMeasures(out1, "chisq")
 chisq
73.841
> fitMeasures(out2, "chisq")
 chisq
71.974
> fitMeasures(out3, "chisq")
 chisq
95.303
```

bye