FIML and Multiple Imputation show different results in interaction model and in MIMIC model


Benedikt Heuckmann

Oct 6, 2018, 3:10:37 AM
to lavaan

Dear colleagues,

lavaan (0.6-3) yielded different results when I used the FIML estimator compared to applying multiple imputation with the mice package to handle the missing data. The results differ strongly: predictors that are highly significant with FIML (p < .001) become largely irrelevant with MI (p = .220). I assumed I would get roughly similar results when using the same set of variables and auxiliary variables in both estimations, so I am wondering what might cause the differences.

 

Here is what I did: The aim was to analyze the relationship between a set of 12 belief variables and attitudes in two ways. First, I specified a belief LV and regressed the attitude factors on it to determine the influence of the total set of beliefs on attitudes. Second, I intended to identify significant individual predictors of attitudes by using a multiple indicators multiple causes (MIMIC) model, as described by Kline (2016).

For measuring the belief items, I used an expectancy-value approach. I tried two ways to handle the expectancy-value model: (1) calculating expectancy-value products (evp) prior to the data analysis and including them as manifest indicators; (2) specifying interactions between expectancy items (bb) and value items (oe) in the model syntax.

The design is a planned missingness design (three-form design), and missing data (approx. 33%) occurs in the belief items only. N = 355; the MLR estimator is used to account for the nonnormal distribution of scores (I am aware that MI requires multivariate normality!).

 

 

# model syntax for beliefs as LV, with evp calculated prior to analysis
model <- '
AB.B =~ 1*ab3 + ab1 + ab5 + ab6    # first attitudinal factor
AB.N =~ 1*ab7 + ab4 + ab2          # second attitudinal factor
BB.scale =~ evp1 + evp2 + evp3 + evp4 + evp5 + evp6 + evp7 + evp8 + evp9 + evp10 + evp11 + evp12
AB.B + AB.N ~ BB.scale
'

 

# model syntax for beliefs as LV, with evp specified as interactions of bb and oe
model <- '
AB.B =~ 1*ab3 + ab1 + ab5 + ab6
AB.N =~ 1*ab7 + ab4 + ab2
BB  =~ bb1 + bb2 + bb3 + bb4 + bb5 + bb6 + bb7 + bb8 + bb9 + bb10 + bb11 + bb12
OE  =~ oe1 + oe2 + oe3 + oe4 + oe5 + oe6 + oe7 + oe8 + oe9 + oe10 + oe11 + oe12
EVP =~ bb1:oe1 + bb2:oe2 + bb3:oe3 + bb4:oe4 + bb5:oe5 + bb6:oe6 + bb7:oe7 + bb8:oe8 + bb9:oe9 + bb10:oe10 + bb11:oe11 + bb12:oe12
AB.B + AB.N ~ EVP
'

 

# model syntax for MIMIC model
model <- '
AB.B =~ 1*ab3 + ab1 + ab5 + ab6
AB.N =~ 1*ab7 + ab4 + ab2
AB.B + AB.N ~ evp1 + evp2 + evp3 + evp4 + evp5 + evp6 + evp7 + evp8 + evp9 + evp10 + evp11 + evp12
'

 

# fit in lavaan using FIML
fit <- lavaan::sem(model, data = data, estimator = "MLR", missing = "fiml.x")
summary(fit, fit.measures = TRUE, ci = TRUE, standardized = TRUE, rsquare = TRUE, modindices = TRUE)

 

# fit using multiple imputation via the mice package (semTools::sem.mi)
dataframe <- dplyr::select(data, ab1, ab2, ab3, ab4, ab5, ab6, ab7, ab8,
                           bb1, bb2, bb3, bb4, bb5, bb6, bb7, bb8, bb9, bb10, bb11, bb12,
                           oe1, oe2, oe3, oe4, oe5, oe6, oe7, oe8, oe9, oe10, oe11, oe12,
                           evpbb1, evpbb2, evpbb3, evpbb4, evpbb5, evpbb6, evpbb7, evpbb8, evpbb9, evpbb10, evpbb11, evpbb12)

out <- semTools::sem.mi(model,
                        data = dataframe,
                        m = 30,
                        seed = 21309,
                        miPackage = "mice",
                        std.lv = TRUE)
summary(out, ci = TRUE, asymptotic = TRUE, add.attributes = TRUE, standardized = TRUE, rsquare = TRUE)

 

 

I would greatly appreciate your thoughts and comments on how to move forward. Do you think that FIML is the better way to account for missing data in my case?

Thanks,

Benedikt

 

PS: Using the runMI() function for MI, I get the error message that the initial model-implied matrix (Sigma) is not positive definite. I also found that missing = "fiml.x" does not work properly in the MIMIC model, as missing = "pairwise" leads to the same results. Is there any alternative way to account for missing data in a MIMIC model?

 

Terrence Jorgensen

Oct 11, 2018, 5:43:26 AM
to lavaan
I'm not sure how FIML(R) behaves in the case of using product indicators to model latent interactions.  MLM & MLR work well with complete data, so I would recommend multiple imputation.  But you need to double-mean-center your product indicators rather than using raw indicators.  This will require you to impute your data first (which is smarter anyway, to give you more control and an opportunity to check imputation diagnostics), then create product indicators and double-mean-center them in each imputed data set.

The semTools function indProd() can help with creating double-mean-centered product indicators.  I'd suggest reading the references on the ?indProd help page.

The mitml package has a nice utility for creating functions of variables (i.e., using the indProd() function) within each imputed data set.  See the ?within.mitml.list help page.
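A minimal sketch of that pipeline (not code from this thread), assuming imputation with mice, the original poster's item names bb1-bb12 and oe1-oe12, and that dataframe holds the incomplete items selected earlier:

library(mice)
library(mitml)
library(semTools)

imp <- mice(dataframe, m = 30, seed = 21309)   # impute the raw (incomplete) items
implist <- mids2mitml.list(imp)                # convert the mids object to a list of data.frames

## indProd() returns each data.frame with double-mean-centered product
## indicators (bb1.oe1, ..., bb12.oe12) appended, so looping with lapply() is enough
implist <- lapply(implist, indProd,
                  var1 = paste0("bb", 1:12),   # expectancy items
                  var2 = paste0("oe", 1:12),   # value items
                  match = TRUE, doubleMC = TRUE)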

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Benedikt Heuckmann

Feb 11, 2019, 5:37:55 AM
to lavaan
Many thanks for your advice, Terrence, and sincere apologies for not getting back to you earlier. It took some time to understand how the different packages interact and to combine the syntax successfully, but the analysis finally worked.
We found a solution by imputing the data first (using the mice package), then calculating double-mean-centered product indicators for the exogenous belief variables with the indProd() function, and finally passing the multiply imputed data sets to the runMI() function to perform the analysis. We also ran a similar analysis by transforming the imputed data sets to the 'long' format first and then calculating the double-mean-centered product indicators with indProd() (which allowed us to do the analysis without the mitml package).
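In case it helps others, here is a rough sketch of that final step (not our exact code), assuming a list of imputed data sets called implist that already contains the double-mean-centered product indicators bb1.oe1, ..., bb12.oe12 created by indProd():

model.evp <- '
AB.B =~ 1*ab3 + ab1 + ab5 + ab6
AB.N =~ 1*ab7 + ab4 + ab2
EVP  =~ bb1.oe1 + bb2.oe2 + bb3.oe3 + bb4.oe4 + bb5.oe5 + bb6.oe6 + bb7.oe7 + bb8.oe8 + bb9.oe9 + bb10.oe10 + bb11.oe11 + bb12.oe12
AB.B + AB.N ~ EVP
'

## runMI()/sem.mi() accept a list of imputed data.frames directly
fit.mi <- semTools::sem.mi(model.evp, data = implist, estimator = "MLR")
summary(fit.mi, ci = TRUE, standardized = TRUE, rsquare = TRUE)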

Thank you so much for your help!!

Kate Bromley

Feb 27, 2020, 2:29:01 PM
to lavaan
I am running into similar issues using indProd() with a mids dataset from mice and have tried to use the mitml package without success. Were you using the mids dataset that results from imputation with the mice package, or some other data structure? Would you be willing to share the code you used, or advise on how you passed your imputed data through the indProd() function?
Thank you!
Kate

Terrence Jorgensen

Feb 27, 2020, 4:51:19 PM
to lavaan
have tried to use the mitml package without success.

You should be able to convert your mids object to a mitml.list object, so that you can capitalize on the with() or within() methods to add product indicators to each imputed data set.  See examples here:

?mids2mitml.list

?within.mitml.list
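For example (a generic sketch using the nhanes data that ships with mice, not your variables):

library(mice)
library(mitml)

imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)  # small built-in example data
implist <- mids2mitml.list(imp)                          # mids -> list of data.frames

## within() evaluates an expression inside each imputed data set and
## returns the modified list, e.g., adding a mean-centered variable:
implist <- within(implist, { bmi.c <- bmi - mean(bmi) })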

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics

Kate Bromley

Feb 27, 2020, 7:02:16 PM
to lavaan
Hi Terrence,
Thank you for your quick reply. Converting the mids object is exactly what I was looking for, but I seem to have overlooked it in the package. What I am running into now is the error "Error in data[, var1] : incorrect number of dimensions", which I just can't seem to resolve. The code I am using is:

impmitml <- mids2mitml.list(impute.sem)
mice.imp.interact <- within(impmitml, {interaction <- indProd(impmitml, var1=6:8, var2=5, match = FALSE, meanC = TRUE, residualC = FALSE, doubleMC = TRUE, namesProd = NULL)})

Any suggestions would be much appreciated.
Thanks!
Kate

Kate Bromley

Mar 26, 2020, 8:00:21 PM
to lav...@googlegroups.com
Hi Terrence,
Thank you so much for the revised code. I am still running into the same error as I was before: "Error in data[, var1] : incorrect number of dimensions". Any advice on other fixes to resolve this?
Thank you!
Kate


On Sat, Mar 21, 2020 at 6:55 AM Terrence Jorgensen <tjorge...@gmail.com> wrote:
What I am now running into is getting an error: "Error in data[, var1] : incorrect number of dimensions" that I just can't seem to resolve. The code I am using is:

impmitml <- mids2mitml.list(impute.sem)
mice.imp.interact <- within(impmitml, {interaction <- indProd(impmitml, var1=6:8, var2=5, match = FALSE, meanC = TRUE, residualC = FALSE, doubleMC = TRUE, namesProd = NULL)})

indProd() returns a data.frame already, so don't use within(), use with().  And don't assign it to anything.

mice.imp.interact <- with(impmitml, {indProd(impmitml, var1=6:8, var2=5, match = FALSE, meanC = TRUE, residualC = FALSE, doubleMC = TRUE, namesProd = NULL)})

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam


Terrence Jorgensen

Apr 9, 2020, 6:31:36 PM
to lavaan
I am still running into the same error as I was before: "Error in data[, var1] : incorrect number of dimensions". 

My fault: I overlooked the fact that your call passes the whole list to indProd(), whereas its first argument should simply be each individual data.frame within the list.

I have added part of the following example to the ?indProd help page for future reference:

library(semTools)  # provides indProd() and cfa.mi()

HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr", "agemo", "school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

library(Amelia)
set.seed(12345)
HS.amelia <- amelia(HSMiss, m = 3, noms = "school", p2s = FALSE)
imps <- HS.amelia$imputations

## add double-mean-centered product indicators to each imputed data set
imps <- lapply(imps, indProd,
               var1 = c("x1", "x2", "x3"), var2 = c("x4", "x5", "x6"))

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  VT      =~ x1.x4 + x2.x5 + x3.x6
  speed   =~ x7 + x8 + x9
  speed ~ visual + textual + VT
'

out <- cfa.mi(HS.model, data = imps, std.lv = TRUE)
summary(out)


For your case, you can use:

impmitml <- lapply(impmitml, indProd,
                   var1 = 6:8, var2 = 5, match = FALSE)

Sorry for the confusion (and delayed reply).  Also, in case you were planning to use the probing functions in semTools, they now work with lavaan.mi objects in the latest development version (0.5-2.921)

devtools::install_github("simsem/semTools/semTools")
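For example, continuing the ?indProd sketch above (hypothetical probe values; argument names follow the ?probe2WayMC help page):

## simple slopes of visual predicting speed at chosen values of the moderator textual
probe2WayMC(out, nameX = c("visual", "textual", "VT"), nameY = "speed",
            modVar = "textual", valProbe = c(-1, 0, 1))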

Enes Bayrakoglu

Feb 16, 2024, 5:02:49 AM
to lavaan
Hello Dr. Jorgensen,
Sorry to jump in here and refer the question directly to you. My question wasn't answered on other platforms, and I thought this might be a good chance to get an answer. My problem seems to be of a similar kind; if not, please accept my apologies.
I am trying to test measurement invariance in my one-latent-factor model. I have 24 questions coded correct/incorrect and 3 different groups in my data (sample sizes are approximately 200 per group). Since there is some missing data and, as far as I know, I can't run the CFA models with the WLSMV estimator in that case, I first imputed the data with the Amelia package and then tried to run the cfa.mi() function in the semTools package, as described in the runMI() documentation.
My problem is that every time I run the configural or metric model, the fit indices (CFI, TLI, RMSEA) and the model test statistics yield very different results (I keep the imputed data sets the same).

Here's the code I am running:
###############################################################
data_mck <- MCK_DATA_
data_mck <- data_mck[,1:32] 
set.seed(12345)
Amelia.imputation <- amelia(data_mck[,-c(14,15,16,17,18)],  m = 20, idvars = "ID",  ords = c("Q1","Q2A","Q3A","Q3B","Q3C","Q5","Q6","Q8A","Q8B","Q8C","Q10","Q12B","Q13A","Q13B","Q13C","Q17A","Q18B","Q19A","Q19B","Q19C","Q19D","Q20A","Q22","Q23"), p2s = FALSE, incheck = T)
imps_mck <- Amelia.imputation$imputations  
## here I had 20 imputed datasets with 24 binary questions ###
for (i in 1: 20) { imps_mck[[i]]$COUNTRY <- factor(imps_mck[[i]]$COUNTRY) }  
### There are 3 different groups (countries) in the dataset ###
model_mck <- 'MCK =~ Q1+Q2A+Q3A+Q3B+Q3C+Q5+Q6+Q8A+Q8B+Q8C+Q10+Q12B+Q13A+Q13B+Q13C+Q17A+Q18B+Q19A+Q19B+Q19C+Q19D+Q20A+Q22+Q23'
cfa.configural <- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE)
summary(cfa.configural, fit.measures = TRUE, standardized = TRUE)

cfa.metric <- cfa.mi(model_mck, data =  imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE,
group.equal = "loadings")
summary(cfa.metric, fit.measures = TRUE, standardized = TRUE)

cfa.scalar<- cfa.mi(model_mck, data = imps_mck, estimator = "WLSMV", group = "COUNTRY", ordered = TRUE,
group.equal = c("loadings","intercepts"))
summary(cfa.scalar, fit.measures = TRUE, standardized = TRUE)

### All models give very different and inconsistent fit indices and test statistics when I delete them from the R environment and re-run them ###

lavTestLRT.mi(cfa.metric, h1=cfa.configural)
###############################################################

It would be much appreciated if you could take a look, because I am stuck here and can't go further with my Ph.D. thesis.
Many thanks in advance.
Have a beautiful day. 
On Friday, April 10, 2020 at 00:31:36 UTC+2, Terrence Jorgensen wrote: