Multiple imputation followed by wlsmv

David Kaplan

unread,

Mar 27, 2014, 5:40:54 PM3/27/14

to lav...@googlegroups.com

Greetings,

I'm running the following code to multiply impute data using predictive mean matching

---------

datami <- mice(scimodel, m=5,seed=12345,method=c("pmm","pmm","pmm","pmm","pmm","pmm","pmm"))

mice.imp <- NULL

m=5

for(i in 1:m) {

mice.imp[[i]] <- complete(datami,action=i,include=FALSE)

}

out <- runMI(pathmodel,data=mice.imp,chi="mplus",fun="sem")

summary(out)

------------------------------

The imputation work just fine. The problem is that I can't seem to also deal with the categorical nature of the data using, say "wlsmv". My fear is that this means that I can't directly deal with non-normal categorical data AND missing data via multiple imputation. Is there a work around? One would want to be able to handle these two violations of the standard assumptions at the same time.

Thanks in advance,

David

Alex Schoemann

unread,

Mar 27, 2014, 8:35:38 PM3/27/14

to lav...@googlegroups.com

Hi David,

You certainly should be able to deal with both missing data and categorical indicators! In your script, you're missing the bit that tells lavaan which variables are categorical. At the end of the runMI command you can add any arguments you'd like passed to lavaan. In the case of categorical data you want to include the ordered argument with the names of the categorical variables. I've included an example of this below.

One other note, right now with categorical data the chi="Mplus" option will kick back an error (the only option for chi square combination is the lmrr method).

Hope this helps,

Alex

Example (adapted from the runMI documentation and simsem examples):

library(semTools)

library(lavaan)

library(mice)

popModel <- "

f1 =~ 0.6*y1 + 0.6*y2 + 0.6*y3 + 0.6*y4

y1 ~*~ 1*y1

y2 ~*~ 1*y2

y3 ~*~ 1*y3

y4 ~*~ 1*y4

f1 ~~ 1*f1

y1 | 0.5*t1

y2 | 0.25*t1

y3 | 0*t1

y4 | -0.5*t1

"

analyzeModel <- "

f1 =~ y1 + y2 + y3 + y4

y1 ~*~ 1*y1

y2 ~*~ 1*y2

y3 ~*~ 1*y3

y4 ~*~ 1*y4

"

dat <- simulateData(popModel, sample.nobs = 200L)

miss.pat <- matrix(as.logical(rbinom(prod(dim(dat)), 1, 0.2)), nrow(dat), ncol(dat))

dat[miss.pat] <- NA

datami <- mice(dat, m=5,seed=12345,method=c("pmm","pmm","pmm","pmm"))

mice.imp <- NULL

m=5

for(i in 1:m) {

mice.imp[[i]] <- complete(datami,action=i,include=FALSE)

}

out1 <- runMI(analyzeModel,data=mice.imp, chi = "all", ordered=paste0("y", 1:4), fun="sem")

summary(out1)

dka...@education.wisc.edu

unread,

Mar 27, 2014, 11:41:40 PM3/27/14

to lav...@googlegroups.com

Alex,

Thanks. I must be missing something subtle here. The model runs fine. This my MI script

datami <- mice(scimodel, m=5,seed=12345,method=c("pmm","pmm","pmm","pmm","pmm","pmm","pmm"))

mice.imp <- NULL

m=5

for(i in 1:m) {

mice.imp[[i]] <- complete(datami,action=i,include=FALSE)

}

out <- runMI(pathmodel,data=mice.imp,chi="lmrr",ordered=paste("y", 1:7),fun="sem")

summary(out)

and this is my output

Length Class Mode

d0 1 -none- numeric

d1 1 -none- numeric

d0.ci 2 -none- numeric

d1.ci 2 -none- numeric

d0.p 1 -none- numeric

d1.p 1 -none- numeric

d0.sims 50 -none- numeric

d1.sims 50 -none- numeric

z0 1 -none- numeric

z1 1 -none- numeric

z0.ci 2 -none- numeric

z1.ci 2 -none- numeric

z0.p 1 -none- numeric

z1.p 1 -none- numeric

z0.sims 50 -none- numeric

z1.sims 50 -none- numeric

n0 1 -none- numeric

n1 1 -none- numeric

n0.ci 2 -none- numeric

n1.ci 2 -none- numeric

n0.p 1 -none- numeric

n1.p 1 -none- numeric

n0.sims 50 -none- numeric

n1.sims 50 -none- numeric

tau.coef 1 -none- numeric

tau.ci 2 -none- numeric

tau.p 1 -none- numeric

tau.sims 50 -none- numeric

d.avg 1 -none- numeric

d.avg.p 1 -none- numeric

d.avg.ci 2 -none- numeric

d.avg.sims 50 -none- numeric

z.avg 1 -none- numeric

z.avg.p 1 -none- numeric

z.avg.ci 2 -none- numeric

z.avg.sims 50 -none- numeric

n.avg 1 -none- numeric

n.avg.p 1 -none- numeric

n.avg.ci 2 -none- numeric

n.avg.sims 50 -none- numeric

boot 1 -none- logical

treat 1 -none- character

mediator 1 -none- character

covariates 0 -none- NULL

INT 1 -none- logical

conf.level 1 -none- numeric

model.y 13 lm list

model.m 30 glm list

control.value 1 -none- numeric

treat.value 1 -none- numeric

nobs 1 -none- numeric

sims 1 -none- numeric

David Kaplan

unread,

Mar 28, 2014, 3:34:55 PM3/28/14

to lav...@googlegroups.com

Hi again Alex. I did modify slightly the code you sent. This is what I have.

fitmi <- sem.mi(pathmodel,data=scimodel,ordered=paste0("y",1:7),m=5,seed=12345,miPackage="mice",chi="lmrr",
miArgs=list(method=c("pmm","pmm","pmm","pmm","pmm","pmm","pmm"),ords=c("y1","y2","y3","y4","y5","y6","y7")))
summary(fitmi,fit.measures=TRUE)
inspect(fitmi, "fit")

It seems to be thinking, but this is the error message I get.

Error in lavsamplestats@cov[[g]][cbind(row.idx, col.idx)] :
subscript out of bounds

Not sure what this means.

On Thursday, March 27, 2014 4:40:54 PM UTC-5, David Kaplan wrote:

Alex Schoemann

unread,

Mar 29, 2014, 11:16:30 PM3/29/14

to lav...@googlegroups.com

Hmmm. That's odd. It looks like things are going wonky during the lavaan estimation portion of the code. Can you post your model syntax? It would help diagnose possible problems.

David Kaplan

unread,

Apr 1, 2014, 12:13:09 PM4/1/14

to lav...@googlegroups.com

Alex... here is the code I"m running.

David

install.packages("lavaan")

install.packages("semTools")

install.packages("mice")

scimodel <- read.csv(file.choose(),header=T)

## Science Achievement Example

## Kaplan (2009, pg. 15)

require(semTools)

require(lavaan)

require(mice)

## Generate some missing values ####

scimodel$ses[scimodel$ses >=.9] <- NA

scimodel$sciach[scimodel$sciach >=30.00] <- NA

scimodel$scigrade6[scimodel$scigrade6 == 1] <- NA

## Estimation under ML using FIML to handle missing data under MAR and MCAR ###

pathmodel <- '

# regressions

sciach ~ scigrade10

scigrade10 ~ scigrade6+ses+certsci+understand+challenge

challenge~understand

understand ~ certsci

'

### FIML Estimation for Missing Data ###

fitFIML <- sem(pathmodel,data=scimodel,estimator="mlr", missing="FIML")

summary(fitFIML, fit.measures=TRUE)

### Multiple Imputation Predictive Mean Matching ###

fitmi <- sem.mi(pathmodel,data=scimodel,ordered=paste0("y",1:7),m=5,seed=12345,miPackage="mice",chi="lmrr",

miArgs=list(method=c("pmm","pmm","pmm","pmm","pmm","pmm","pmm"),ords=c("y1","y2","y3","y4","y5","y6","y7")))

summary(fitmi,fit.measures=TRUE)

inspect(fitmi, "fit")

On Thursday, March 27, 2014 4:40:54 PM UTC-5, David Kaplan wrote:

Alex Schoemann

unread,

Apr 1, 2014, 2:33:45 PM4/1/14

to lav...@googlegroups.com

I may have spotted the issue. The variable names you use in the ordered option in lavaan need to match those in your data. So using your code it would be:

i <- mice(scimodel, m=5,seed=12345,method=c("pmm","pmm","pmm","pmm","pmm","pmm","pmm"))

mice.imp <- NULL

m=5

for(i in 1:m) {

mice.imp[[i]] <- complete(datami,action=i,include=FALSE)

}

#Only specify endogenous variables as ordered with fixed X specification

out <- runMI(pathmodel,data=mice.imp,chi="lmrr",ordered=c("sciach", "scigrade10", "challenge", "understand"),fun="sem")

One other note. The ords argument in miargs is used with Amelia but not with mice (there is no argument to mice called ords).

Hope this helps!

Alex

David Kaplan

unread,

Apr 1, 2014, 5:14:54 PM4/1/14

to lav...@googlegroups.com

It's cranking away, but abysmally slow. I tried it with just m=1 and also abysmally slow. As you see, I'm using predictive mean matching which is typically faster than other choices for imputation. So, although this probably would work, it's way too slow for even a small problem.

THanks anyway for all of your help.

David

On Thursday, March 27, 2014 4:40:54 PM UTC-5, David Kaplan wrote:

Alex Schoemann

unread,

Apr 1, 2014, 9:19:30 PM4/1/14

to lav...@googlegroups.com

If speed is of the utmost important (I'm assuming that this would be for a demonstration). You might want to try using Amelia for imputations instead of mice. I did a quick run and for a model (with N=5000 and 5 imputations) using mice with predictive mean matching took about 47 seconds and Amelia only took about 9 seconds.

For your example the code would be:

itmi <- sem.mi(pathmodel,data=scimodel,,m=5,seed=12345,miPackage="Amelia",chi="lmrr", miArgs=list(ords=c("sciach", "scigrade10", "challenge", "understand"", "scigrade6", "ses", "certsci")), ordered=c("sciach", "scigrade10", "challenge", "understand"))

Valentina Montalto

unread,

Jul 13, 2015, 5:50:44 PM7/13/15

to lav...@googlegroups.com

Hi

I am trying to do something similar but I get several warnings - this is what I wrote:

i <- mice(newdata, m=5,seed=NA,method=c("pmm","pmm","pmm","pmm","pmm","pmm","pmm",

"pmm","pmm","pmm","pmm","pmm","pmm","pmm", "pmm","pmm","pmm","pmm","pmm","pmm"))

mice.imp <- NULL

m=5

for(i in 1:m) {

mice.imp[[i]] <- complete(datami,action=i,include=FALSE)

modelwithimp <- cfa.mi(cfa.model, data=mice.imp, m = 5,

chi="lmrr", ordered=c("q_heritage", "q_attractions", "q_restaurants","q_facilities", "q_shops", "q_transport",

"e_know", "e_aesth", "e_inter", "e_relax", "e_surprise", "e_memory","s_expect","s_invest", "s_pleased",

"revisit", "reccom"))

I get this message:

Error in colSums(!is.na(x)) :

'x' must be an array of at least two dimensions

In addition: Warning message:

In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

I understand that there is a problem with the imputed data as not in a dataframe. What would be the right code?

Thank you

Valentina

Andy Bush

unread,

Jul 15, 2015, 12:16:57 PM7/15/15

to lav...@googlegroups.com

Hello everyone. Following what I've read in this message chain, I have been able to fit a cfa with binary indicators
both with and without imputation using lavaan, mice and semTools. However, I have not been successful extending
the approach one step to a simple MIMIC model.

I thought it might be helpful to others to see what appears to work for me with the cfa before showing where
my calls to runMI are failing. Any help with correcting my MIMIC model fitting under multiple imputation will
be much appreciated.

I used a contrived dataset with 200 subjects and five variables.
Four of the variables (item1, item2, item3, iten4) are binary indicator variables coded as ordered factors.
The fifth variable is an exogenous binary variable indicating marital state.

Here is the data structure:
'data.frame': 200 obs. of 5 variables:
$ item1 : Ord.factor w/ 2 levels "0"<"1": NA 1 2 2 2 2 2 2 2 2 ...
$ item2 : Ord.factor w/ 2 levels "0"<"1": 2 1 2 2 NA 2 2 2 2 2 ...
$ item3 : Ord.factor w/ 2 levels "0"<"1": 2 2 2 2 2 2 2 2 2 2 ...
$ item4 : Ord.factor w/ 2 levels "0"<"1": 2 2 2 2 2 2 2 2 2 2 ...
$ married: int 1 0 1 1 0 1 1 1 0 0 ...

The following versions of lavaan, semTools and mice are loaded:
library(lavaan) # Version: 0.5-18
library(semTools) # Version: 0.4-9
library(mice) # Version: 2.22

############################
# The initial cfa modeling #
# This all went well #
############################

cfamodel <- 'lv =~ NA*item1 + item2 + item3 + item4
# variance constraint
lv ~~ 1*lv'

# first under listwise deletion:
myfit <- cfa(cfamodel,data=mimicdata,estimator="WLSMV",missing="listwise")
summary(myfit,standardized=TRUE,rsquare=TRUE)

# secondly under multiple imputation using mice:
imputations=5
datami <- mice(data=mimicdata,m=imputations,seed=12345,method=c("pmm","pmm","pmm","pmm","pmm"))
mice.imp <- NULL
for(i in 1:imputations) {
mice.imp[[i]] <- complete(datami,action=i,include=FALSE)
}

# the following calls to runMI returned exactly the same results and both look good:
out1 <- runMI(cfamodel,data=mice.imp,fun="cfa")
out2 <- runMI(cfamodel,data=mice.imp,chi="lmrr",ordered=c("item1", "item2", "item3", "item4"),fun="cfa")
summary(out1)
summary(out2)

#############################
# Subsequent MIMIC modeling #
#############################

mimicmodel <- '# measurement
lv =~ NA*item1 + item2 + item3 + item4
# variance constraint
lv ~~ 1*lv
# structural
lv ~ married'

# first using listwise deletion:
myfit <- cfa(mimicmodel,data=mimicdata,estimator="WLSMV",missing="listwise")
summary(myfit,standardized=TRUE,rsquare=TRUE)
# lavaan handled the listwise deletion perfectly

# but then under multiple imputation I ran into trouble with runMI:
I tried the following four calls:
out1 <- runMI(mimicmodel,data=mice.imp,fun="sem")
out2 <- runMI(mimicmodel,data=mice.imp,fun="cfa")
out3 <- runMI(mimicmodel,data=mice.imp,chi="lmrr",ordered=c("item1", "item2", "item3", "item4"),fun="sem")
out4 <- runMI(mimicmodel,data=mice.imp,chi="lmrr",ordered=c("item1", "item2", "item3", "item4"),fun="cfa")

All four returned exactly the same error message:
"Error in GLIST[names(GLIST) == "beta"][[group]] : subscript out of bounds"
I'd certainly like to know how to make the call to runMI correctly if possible.

Terrence Jorgensen

unread,

Jul 16, 2015, 6:20:44 AM7/16/15

to lav...@googlegroups.com

for(i in 1:m) {

mice.imp[[i]] <- complete(datami,action=i,include=FALSE)

It doesn't look like you close your for-loop here, which is necessary. You need to store all the imputed data sets in a list first, then run the cfa.mi() function only once on the entire completed list "mice.imp"

modelwithimp <- cfa.mi(cfa.model, data=mice.imp, m = 5,
chi="lmrr", ordered=c("q_heritage", "q_attractions", "q_restaurants","q_facilities", "q_shops", "q_transport",
"e_know", "e_aesth", "e_inter", "e_relax", "e_surprise", "e_memory","s_expect","s_invest", "s_pleased",
"revisit", "reccom"))

Terry

Terrence Jorgensen

unread,

Jul 16, 2015, 6:30:39 AM7/16/15

to lav...@googlegroups.com

# the following calls to runMI returned exactly the same results and both look good:
out1 <- runMI(cfamodel,data=mice.imp,fun="cfa")
out2 <- runMI(cfamodel,data=mice.imp,chi="lmrr",ordered=c("item1", "item2", "item3", "item4"),fun="cfa")

That is because your variables are already stored as ordered factors in the data.frame(s) being analyzed, so the use of lavaan's "ordered" argument is redundant in this case.

#############################
# Subsequent MIMIC modeling #
#############################
mimicmodel <- '# measurement
lv =~ NA*item1 + item2 + item3 + item4
# variance constraint
lv ~~ 1*lv
# structural
lv ~ married'

out1 <- runMI(mimicmodel,data=mice.imp,fun="sem")

out2 <- runMI(mimicmodel,data=mice.imp,fun="cfa")
out3 <- runMI(mimicmodel,data=mice.imp,chi="lmrr",ordered=c("item1", "item2", "item3", "item4"),fun="sem")
out4 <- runMI(mimicmodel,data=mice.imp,chi="lmrr",ordered=c("item1", "item2", "item3", "item4"),fun="cfa")

All four returned exactly the same error message:
"Error in GLIST[names(GLIST) == "beta"][[group]] : subscript out of bounds"
I'd certainly like to know how to make the call to runMI correctly if possible.

I think you should use the sem() function, not the cfa() function. I think their defaults are identical, so it might not matter for complete-data MIMIC models using fixed exogenous predictors of the latent variable. But you have multiple imputations, so results need to be pooled, and I'm not sure whether treating the exogenous "married" variable as fixed or random will make a difference when using this function. I don't see why it would, but I am curious whether adding the argument "fixed.x = FALSE" to runMI() would work.

Terry

Andy Bush

unread,

Jul 16, 2015, 11:03:18 PM7/16/15

to lav...@googlegroups.com

Terry,

Adding the argument "fixed.x = FALSE" to runMI() led to:

"Error in lav_options_set(opt) :
lavaan ERROR: fixed.x=FALSE is not supported for estimator DWLS"

Yves Rosseel

unread,

Jul 20, 2015, 8:23:09 AM7/20/15

to lav...@googlegroups.com

> I think you should use the sem() function, not the cfa() function.

Although historically different, since 0.4 both functions are identical.

Yves.

Yves Rosseel

unread,

Jul 20, 2015, 8:24:42 AM7/20/15

to lav...@googlegroups.com

True indeed. If you must treat the exogenous covariate as stochastic,
you can upgrade it to a latent variable.

Yves.

Andy Bush

unread,

Jul 21, 2015, 8:18:51 PM7/21/15

to lavaan

Hi Terry & Yves. Thank you for your help. It doesn't appear that runMI will currently support my Mimic model.
I'll watch to see if that changes in a later version.

Andy Bush

unread,

Jul 23, 2015, 10:57:56 AM7/23/15

to lavaan

I am happy to be able to revise my last note. I've found that it is possible to get MI results for MIMIC models.

The following two models return equivalent results on fit measures, factor loadings and regression estimates using lavaan without imputation.

Even better, semTools' runMI can pool imputation results given the second model formulation (mimic2).

mimic1 <- '# measurement

lv =~ NA*item1 + item2 + item3 + item4
# variance constraint
lv ~~ 1*lv
# structural
lv ~ married'

mimic2 <- '# measurement
fendo =~ NA*item1 + item2 + item3 + item4
fexog =~ married
# variance constraint
fendo ~~ 1*fendo
# structural
fendo ~ fexog'

Andy Bush

unread,

Jul 23, 2015, 1:18:53 PM7/23/15

to lavaan

My previous note should have said "return nearly equivalent results".

Andy Bush

unread,

Jul 28, 2015, 6:50:21 PM7/28/15

to lavaan

Yves and Terry,

Resolution for simple lavaan MIMIC models requiring MI:

Both if the following models return identical solutions with listwise deletion in effect:

mimicmodel1 <- '# measurement

fendo =~ NA*item1 + item2 + item3 + item4

# variance constraint
fendo ~~ 1*fendo
# structural

fendo ~ married'

mimicmodel2 <- '# measurement

fendo =~ NA*item1 + item2 + item3 + item4

# variance constraint
fendo ~~ 1*fendo
# structural

fendo ~ married
item1 ~ 0*married'

Invoking runMI with the following call on mimicmodel1 fails:
runMI(mimicmodel1,data=mimicdata,m=sets,miArgs=list(noms=c(5),ords=c(1:4),
ordered=c("item1", "item2", "item3", "item4")),chi="lmrr", miPackage="Amelia",seed=12345, fun="sem")

Invoking runMI with the following call on mimicmodel2 succeeds and appears to return a correct solution:
runMI(mimicmodel2,data=mimicdata,m=sets,miArgs=list(noms=c(5),ords=c(1:4),
ordered=c("item1", "item2", "item3", "item4")),chi="lmrr", miPackage="Amelia",seed=12345, fun="sem")

Reply all

Reply to author

Forward