Jan 11, 2017, 7:42:54 AM

to lavaan

Hi everyone,

Out of interest, I tried three different ways of using multiple imputation in lavaan:

1.) Multiple imputation and model estimation in one step with runMI()

2.) Multiple imputation with mice first, then passing the previously imputed data to runMI()

3.) Multiple imputation with mice first, then passing the imputed data to a svydesign object and analysing it with lavaan.survey, BUT without specifying any sampling design or weights.

I used the HolzingerSwineford1939 data. Something must have gone wrong, because I get three different chi-square values.

#--------------------------
# Setting up example data and model
#--------------------------

library(lavaan)         # cfa(), fitMeasures(), HolzingerSwineford1939
library(semTools)       # runMI()
library(mice)           # mice(), complete()
library(lavaan.survey)  # lavaan.survey()

# Create data with missings
set.seed(20170110)
HSMiss <- HolzingerSwineford1939[,paste("x", 1:9, sep="")]
randomMiss <- rbinom(prod(dim(HSMiss)), 1, 0.1)
randomMiss <- matrix(as.logical(randomMiss), nrow=nrow(HSMiss))
HSMiss[randomMiss] <- NA

# lavaan model
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '

#------------------------------------------------------------------
# Variant 1: Imputation and model estimation with runMI
#-------------------------------------------------------------------

# run lavaan and imputation in one step
out1 <- runMI(HS.model,
              data=HSMiss,
              m = 5,
              miPackage="mice",
              fun="cfa",
              meanstructure = TRUE)
summary(out1)
fitMeasures(out1, "chisq")

#------------------------------------------------------------------
# Variant 2: Imputation in step 1 and model estimation in step 2 with runMI
#-------------------------------------------------------------------

# impute data first
HSMiss_imp<-mice(HSMiss, m = 5)
mice.imp <- NULL
for(i in 1:5) mice.imp[[i]] <- complete(HSMiss_imp,"long",inc=FALSE)

# run lavaan with previously imputed data using runMI
out2 <- runMI(HS.model,
              data=mice.imp,
              fun="cfa",
              meanstructure = TRUE)
summary(out2)
fitMeasures(out2, "chisq") #extremely high chi-square

#------------------------------------------------------------------
# Variant 3: Imputation in step 1 and model estimation in step 2 with lavaan.survey (but without weights)
#-------------------------------------------------------------------

# take previously imputed data from variant 2 and convert it to svydesign-object
mice.imp2<-lapply(seq(HSMiss_imp$m),function(im) complete(HSMiss_imp,im))
mice.imp2<-mitools::imputationList(mice.imp2)
svy.df_imp<-survey::svydesign(id=~1,weights=~1,data=mice.imp2)

# fit model with lavaan.survey
lavaan_fit_HS.model<-cfa(HS.model, meanstructure = TRUE)
out3<-lavaan.survey(lavaan_fit_HS.model, svy.df_imp)
summary(out3)
fitMeasures(out3, "chisq")

It's more a matter of personal interest for me, but it would be great to have some help figuring out why I get three different chi-square values.

> fitMeasures(out1, "chisq")
 chisq
73.841

> fitMeasures(out2, "chisq")
  chisq
483.637

> fitMeasures(out3, "chisq")
 chisq
96.748

Jan 11, 2017, 3:42:55 PM

to lavaan

Hi Niels

I don't know why out1 and out2 differ, but I can offer an explanation as to why lavaan.survey's output is not identical to either.

lavaan.survey deals with multiply imputed data in a slightly different way than the other methods.

**Usually** it is:

- Obtain a covariance matrix *S_m* in each of the *M* imputed datasets;
- Estimate the model on each covariance matrix *S_m* -> obtain *M* parameter estimates and *M* asymptotic variance-covariance matrices of the estimates given that imputation ("within-imputation variance");
- Pool estimates and variances according to "Rubin's rules" (combine within- and between-imputation variance).

**lavaan.survey** propagates the estimates and errors entirely through a single pooled covariance matrix and its asymptotic var-cov matrix, so the procedure is:

- Obtain a covariance matrix *S_m* in each of the *M* imputed datasets as well as its asymptotic var-cov matrix under complex sampling (within-imputation);
- Pool the *M* covariance matrices to obtain the pooled covariance matrix *S* and its asymptotic variance *Gamma* according to "Rubin's rules" (combining within- and between-imputation variance);
- Estimate the model using (for instance & by default) MLM with *S* as observed covariance matrix and *Gamma* as the ACOV.
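For reference, the pooling step ("Rubin's rules") that both procedures rely on can be written out as below. This is the standard textbook form, not something taken from the lavaan.survey source; the symbols $\hat{\theta}_m$, $V_m$, $W$, $B$ and $T$ are notation introduced here for illustration only.

```latex
\begin{aligned}
\bar{\theta} &= \frac{1}{M}\sum_{m=1}^{M}\hat{\theta}_m
  && \text{(pooled estimate)}\\
W &= \frac{1}{M}\sum_{m=1}^{M} V_m
  && \text{(within-imputation variance)}\\
B &= \frac{1}{M-1}\sum_{m=1}^{M}
  \bigl(\hat{\theta}_m-\bar{\theta}\bigr)\bigl(\hat{\theta}_m-\bar{\theta}\bigr)^{\top}
  && \text{(between-imputation variance)}\\
T &= W + \Bigl(1+\frac{1}{M}\Bigr)B
  && \text{(total variance)}
\end{aligned}
```

In the usual procedure, $\hat{\theta}_m$ is the vector of model parameter estimates from imputation $m$ and $V_m$ its estimated var-cov matrix. In the lavaan.survey procedure as described above, the same rules would instead be applied to the (vectorised) covariance matrices *S_m* and their asymptotic var-cov matrices, so that $\bar{\theta}$ corresponds to *S* and $T$ to *Gamma*, and the model is estimated only once afterwards.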

More details and other options are given in the JSS lavaan.survey paper.

Both procedures are consistent, meaning they will "tend to" the same estimates and standard errors (omitting complex sampling issues) as sample size increases. However, they are not necessarily identical in finite samples.

Best, Daniel

Jan 11, 2017, 4:50:40 PM

to lavaan

Hi Niels

The difference is due to an error in how you are putting together the imputed data sets from mice.

Just change this line to:

mice.imp <- NULL

for(i in 1:5) mice.imp[[i]] <- complete(HSMiss_imp, action=i, inc=FALSE)

When you were using the action "long", you were stacking the 5 imputed data sets in 1 data set, 5 times. With action=i, you are putting each imputed data set in a separate element of the list.
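A small sketch of the difference between the two actions, using the `nhanes` example data that ships with mice rather than the data from this thread. The object names `imp`, `stacked` and `imp_list` are made up for illustration.

```r
library(mice)

# nhanes: small example data set from mice with missing values
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 1)

# action = "long": ONE data frame with all completed data sets stacked
stacked <- mice::complete(imp, action = "long", include = FALSE)
nrow(stacked)        # m * nrow(nhanes) rows
table(stacked$.imp)  # .imp marks which imputation each row came from

# action = i: one completed data set per list element,
# i.e. a list of m data frames with nrow(nhanes) rows each
imp_list <- lapply(seq_len(imp$m),
                   function(i) mice::complete(imp, action = i))
sapply(imp_list, nrow)
```

Passing a list like `imp_list` as `data` means every element is analysed as one imputed data set. Passing the stacked data frame five times means every "imputation" has five times the real sample size, which would explain the inflated chi-square.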

With that change, the remaining difference between out1 and out2 is just the difference between imputations:

> fitMeasures(out1, "chisq")
 chisq
73.841

> fitMeasures(out2, "chisq")
 chisq
71.974

> fitMeasures(out3, "chisq")
 chisq
95.303

bye

Jan 12, 2017, 3:04:48 AM

to lavaan

Thank you so much for your answers, Daniel and Mauricio! That cleared everything up for me =)
