runMI, sem with multiply imputed data


Diana Meter

Jun 28, 2014, 1:46:19 PM
to lav...@googlegroups.com

Hello,

I am new to R and am having trouble using multiply imputed data with runMI. My data is currently in long format and has been imputed 10 times. In order to have the program actually run the syntax within a reasonable time frame to check my work, I made another dataset with 2 imputations just so I could figure out what I’m doing.

It seems as though the first 3 sections of the syntax below (from the semTools documentation) involve creating a simulated dataset. I thought perhaps that was just to create data to use in the example. However, since I’m being told my data has no missing values (which it shouldn't), do I need to be simulating a model like mine as well? I did try this to see what would happen and the program ran for a very, very long time. I’m not sure how exactly I’m supposed to use multiply imputed data in my analysis, or whether my data is in the wrong format. I’ve included the syntax from the semTools documentation and under that, my model and syntax.

My last question is, how does one request pooled parameters from the multiply imputed datasets? When I do get a line to run (although it hasn't been exactly what I want yet), the parameters are provided for each dataset, not pooled.

I’d appreciate any advice. Thank you very much in advance,

Diana

 

modsim <- '
f1 =~ 0.7*y1+0.7*y2+0.7*y3
f2 =~ 0.7*y4+0.7*y5+0.7*y6
f3 =~ 0.7*y7+0.7*y8+0.7*y9'

mod <- '
f1 =~ y1+y2+y3
f2 =~ y4+y5+y6
f3 =~ y7+y8+y9'

datsim <- simulateData(modsim,model.type="cfa", meanstructure=TRUE,
                std.lv=TRUE, sample.nobs=c(200,200))
randomMiss2 <- rbinom(prod(dim(datsim)), 1, 0.1)
randomMiss2 <- matrix(as.logical(randomMiss2), nrow=nrow(datsim))
datsim[randomMiss2] <- NA
datsimMI <- amelia(datsim,m=3, noms="group")

out3 <- runMI(mod, data=datsimMI$imputations, chi="LMRR", group="group", fun="cfa")
summary(out3)
inspect(out3, "fit")
inspect(out3, "impute")

mod2 <- '
grade =~ NA*grade1
gender=~ NA*gender1
cvict =~ cvict1 + cvict2 + cvict3
cbul =~ v2*cbul1 + v2*cbul2
clim =~ clim1 + clim2 + clim3 + clim4
teachdo=~NA*teachdo1
peerdo=~NA*peerdo1
agg =~ v3*agg1 + v3*agg2
assert =~ v6*assert1 + v6*assert2
talk=~NA*talk1
telladult =~ v4*telladult1 + v4*telladult2
donothing =~ v5*donothing1 + v5*donothing2

grade~~1*grade
gender~~1*gender
cvict~~1*cvict
clim~~1*clim
teachdo~~1*teachdo
peerdo~~1*peerdo
agg~~1*agg
assert~~1*assert
talk~~1*talk
telladult~~1*telladult
cbul~~1*cbul
donothing~~1*donothing

agg ~ grade + gender + cvict + cbul + clim + teachdo + peerdo
assert~ grade + gender + cvict + cbul + clim + teachdo + peerdo
talk ~ grade + gender + cvict + cbul + clim + teachdo + peerdo
telladult ~ grade + gender + cvict + cbul + clim + teachdo + peerdo
donothing ~ grade + gender + cvict + cbul + clim + teachdo + peerdo
'

imputed2<-read.csv("imputed2.csv",header=FALSE)
names(imputed2) <-c("imp", "grade1","gender1","cvict1","cvict2","cvict3","cbul1","cbul2","clim1","clim2","clim3","clim4","teachdo1","peerdo1","agg1", "agg2", "assert1", "assert2","talk1","telladult1", "telladult2","donothing1", "donothing2")
out3 <- runMI(mod2, data=imputed2, chi="all", group="imp", fun="sem", m = 2, std.lv=TRUE)

summary(out3)

I get this error:

Amelia Error Code:  39 
 Your data has no missing values.  Make sure the code for 
 missing data is set to the code for R, which is NA. 
Error in apply(seAll, 2, function(x) all(x >= 0)) : 
  dim(X) must have a positive length

 

mice3 is a dataset I set up the way I believe it would look had I imputed using mice. I tried this too: it contains my unimputed dataset, followed by imputation 1 and imputation 2, in long format.

 

mice3<-read.csv("mice3.csv")
test <- as.mids(mice3, .id = NULL)
is.mids(test)
test.dat <- complete(test, action = "long", include = TRUE)

 

names(mice3) <-c(".imp", "grade1","gender1","cvict1","cvict2","cvict3","cbul1","cbul2","clim1","clim2","clim3","clim4","teachdo1","peerdo1","agg1", "agg2", "assert1", "assert2","talk1","telladult1", "telladult2","donothing1", "donothing2")

out3 <- runMI(mod2, data=test.dat, chi="all",  miPackage="mice",group=".imp", fun="sem", m = 2)

This gives me output for 3 datasets, including the one I have coded as 0 because it is unimputed, when I only want the program to pay attention to the already imputed datasets.

Terrence Jorgensen

Jun 28, 2014, 5:36:56 PM
to lav...@googlegroups.com

imputed2<-read.csv("imputed2.csv",header=FALSE)

names(imputed2) <-c("imp", "grade1","gender1","cvict1","cvict2","cvict3","cbul1","cbul2","clim1","clim2","clim3","clim4","teachdo1","peerdo1","agg1", "agg2", "assert1", "assert2","talk1","telladult1", "telladult2","donothing1", "donothing2")

out3 <- runMI(mod2, data=imputed2, chi="all", group="imp", fun="sem", m = 2, std.lv=TRUE)


Here's your problem.  You are passing a single data.frame to runMI(), in which all the imputed copies of your data set are stacked on top of each other, with an indicator "imp" for which imputation it is.  When you call runMI(..., data = <a single data.frame>), the expectation is that you want runMI() to do the imputation step for you (thus, the error message saying it doesn't see any missing values to impute).  If you have already imputed the data, you need to call runMI(..., data = <a list of data.frame objects>), like you did in your simulation.  To see what I mean by a list of data.frames, look at the imputed data sets from Amelia you create in your simulation:

summary(datsimMI$imputations)
lapply(datsimMI$imputations, head)

To take your stacked imputations and put them in a list of individual data.frames, run this code:

my.MI.list <- list()
for (i in as.numeric(names(table(imputed2$imp)))) {
    my.MI.list[[i]] <- imputed2[ imputed2$imp == i , ]
}
lapply(my.MI.list, head)

Then you can pass that list to runMI(... data = my.MI.list) like you do in your simulation.
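
For readers who prefer to avoid the explicit loop, the same conversion can be sketched with base R's split(). This is only an illustration on a small made-up data.frame (the values and the name `stacked` are hypothetical; only the "imp" column name comes from the thread). Unlike indexing by `i`, it does not assume the imputation labels run 1, 2, 3, ...

```r
# Toy stand-in for a stacked file like imputed2: 2 imputations of 3 rows each.
# (Hypothetical values; only the "imp" column name comes from the thread.)
stacked <- data.frame(
  imp    = rep(1:2, each = 3),
  cvict1 = c(1, 2, 3, 1, 2, 4),
  cvict2 = c(2, 2, 1, 2, 3, 1)
)

# split() returns one data.frame per distinct value of the indicator,
# named by that value.
my.MI.list <- split(stacked, stacked$imp)

# Drop the indicator column so each element holds only the model variables.
my.MI.list <- lapply(my.MI.list, function(d) d[, names(d) != "imp", drop = FALSE])

length(my.MI.list)        # number of imputed data sets
sapply(my.MI.list, nrow)  # rows in each imputed data set
```

The resulting list has the same shape as datsimMI$imputations, so it should be usable wherever that list was passed as the data argument.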

 
out3 <- runMI(mod2, data=test.dat, chi="all",  miPackage="mice",group=".imp", fun="sem", m = 2)

This gives me output for 3 datasets, including the one I have coded as 0 because it is unimputed, when I only want the program to pay attention to the already imputed datasets.


I'm not sure what your "mice3" data looks like, but if you only want runMI() to pay attention to the imputed data sets, then don't include the original (unimputed) data set in the list of data.frames that you pass to runMI().
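
A minimal sketch of that filtering step, assuming the long-format file marks the original (unimputed) rows with .imp equal to 0, as described in the question. The data below are made up, and `long`, `imputed.only`, and `imp.list` are hypothetical names.

```r
# Toy long-format data: .imp == 0 is the original copy (with NAs),
# .imp == 1 and .imp == 2 are the imputed copies.
long <- data.frame(
  .imp = rep(0:2, each = 3),
  agg1 = c(NA, 2, 3, 1, 2, 3, 2, 2, 3),
  agg2 = c(1, NA, 2, 1, 3, 2, 1, 2, 2)
)

# Keep only the imputed copies.
imputed.only <- long[long$.imp != 0, ]

# One data.frame per imputation, without the .imp indicator column.
imp.list <- split(
  imputed.only[, names(imputed.only) != ".imp", drop = FALSE],
  imputed.only$.imp
)

names(imp.list)                                      # labels of the imputations kept
any(sapply(imp.list, function(d) any(is.na(d))))     # should be FALSE after imputation
```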

Terry

Diana Meter

Jun 28, 2014, 7:24:22 PM
to lav...@googlegroups.com
Hi Terry,
Thank you so much for your help and your quick response! 
Diana

Diana Meter

May 28, 2015, 4:37:15 PM
to lav...@googlegroups.com
Hello, 
I multiply imputed using mice and have 10 stacked datasets. 

imputed<-mice(datasub, m=10, seed=12345)

I also have been able to put the data into a single dataframe. 

imputed10 <- complete(imputed, action = "long")

However, I have been unsuccessful in using runMI to run the CFA using each of the separate imputed datasets. 
I have tried different code, including an adjusted version of that listed in this thread, to try to convert the data into a list that runMI can use. Could someone please let me know what code to use to create a list of dataframes that can be listed in the code below?


out <- runMI(outcomes,data=imputed10,chi="all",fun="cfa")

Thank you very much in advance,
Diana

Terrence Jorgensen

Jun 5, 2015, 6:49:54 AM
to lav...@googlegroups.com

I also have been able to put the data into a single dataframe. 

imputed10 <- complete(imputed, action = "long")


This is the problem.  runMI() does not accept a set of imputed data sets as a single "long-format" data.frame, with an indicator for which imputation it is.  Rather, runMI() accepts a list of data.frames, each one of which is an imputed data set.  You can create a list with a for-loop:

nImputations <- 10
impList <- list() 
for (i in 1:nImputations) {
   impList[[i]] <- complete(imputed, action = i)
} 
out <- runMI(outcomes, data = impList, ...) 
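
If only the long-format data.frame is still at hand (and not the mids object), a list can also be rebuilt from it. The sketch below assumes the long format carries ".imp" and ".id" bookkeeping columns, which is how mice's complete(action = "long") output usually looks; the toy data and the variable names y1 and y2 are hypothetical.

```r
# Toy stand-in for imputed10: 10 imputations of 4 cases each.
imputed10 <- data.frame(
  .imp = rep(1:10, each = 4),
  .id  = rep(1:4, times = 10),
  y1   = seq_len(40) / 10,
  y2   = rev(seq_len(40)) / 10
)

# Columns to keep: everything except the bookkeeping columns.
keep <- setdiff(names(imputed10), c(".imp", ".id"))

# One data.frame per imputation, holding only the analysis variables.
impList <- lapply(
  split(imputed10, imputed10$.imp),
  function(d) d[, keep, drop = FALSE]
)
impList <- unname(impList)  # plain unnamed list, like the for-loop version

length(impList)   # number of imputations
dim(impList[[1]]) # cases x variables in the first imputed data set
```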

Terry

Alysha Ramirez

Jan 9, 2016, 3:04:05 PM
to lavaan
Will this syntax also work to get lavaan to recognize 100 imputed datasets such that it will not read 1000 girls, 3500 boys, etc in my multi group analyses?

Terrence Jorgensen

Jan 10, 2016, 7:00:37 AM
to lavaan
Will this syntax also work to get lavaan to recognize 100 imputed datasets such that it will not read 1000 girls, 3500 boys, etc in my multi group analyses?

As long as the "data" argument is a list with 100 elements, each of which is an imputed dataset, then it will analyze and pool results as expected.
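
A quick way to see why the list matters for group sizes is to compare counts in a stacked data.frame against counts in one list element. This is a toy sketch with made-up numbers (10 girls and 35 boys per data set, 100 identical copies standing in for imputations); `one`, `impList`, and `stacked` are hypothetical names.

```r
nImp <- 100

# One "imputed" data set: 10 girls and 35 boys (made-up sizes).
one <- data.frame(
  sex = rep(c("girl", "boy"), times = c(10, 35)),
  y   = seq_len(45)
)

# Stand-in for a list of 100 imputed data sets.
impList <- replicate(nImp, one, simplify = FALSE)

# Stacking them into a single data.frame inflates every group by nImp.
stacked <- do.call(rbind, impList)

table(stacked$sex)       # group counts multiplied by the number of imputations
table(impList[[1]]$sex)  # the real group counts within one imputed data set

# Sanity checks worth running before fitting: the expected number of
# elements, and the same number of rows in each.
length(impList)
unique(sapply(impList, nrow))
```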

Terry
