Pooling goodness of fit indices using semTools::runMI


Mike

Sep 21, 2023, 3:29:32 PM
to lavaan
Hi everyone, 

I am conducting a confirmatory factor analysis on some ordinal survey data (items scored 1-4). Given the presence of missing data, I am using semTools::runMI to generate multiple imputed datasets and pool the results using Rubin's rules. However, as you can see below, the pooled goodness-of-fit indices do not make much sense (CFI = 1, TLI > 1, RMSEA = 0, SRMR = 0.09), especially when compared to the indices estimated from each individual imputed dataset. Does anybody know how to fix this issue? I thought I could average the goodness-of-fit indices across the imputed datasets, but there might be a better solution.

Other than this, I wonder if there is an easy way to add information about auxiliary variables (e.g., gender, linguistic background) to the runMI function to improve the imputation process.

R syntax:

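library(dplyr)    # for select()
library(semTools) # for runMI()

# Names of the ordinal items, passed to the ordered argument
ordered2 <- scaled_db_lib |> select(i1:i6, i8:i22, i25:i28) |> colnames()

# Impute m = 5 datasets with mice, fit the CFA to each, and pool the results
out1 <- semTools::runMI(model_cfa,
                        data = scaled_db_lib_mice,
                        m = 5,
                        miPackage = "mice",
                        fun = "cfa",
                        rotation = "oblimin",
                        estimator = "WLSMV", ordered = ordered2)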

#Pooled goodness of fit indices
> out1 |> fitMeasures(c("cfi", "tli", "rmsea", "srmr"))
"D3" only available using maximum likelihood estimation. Changed test to "D2".
Robust corrections are made by pooling the naive chi-squared statistic across 5 imputations for which the model converged, then applying the average (across imputations) scaling factor and shift parameter to that pooled value.
To instead pool the robust test statistics, set test = "D2" and pool.robust = TRUE.

                  cfi            cfi.scaled                   tli
                1.000                 0.717                 1.031
           tli.scaled                 rmsea        rmsea.ci.lower
                0.686                 0.000                 0.000
       rmsea.ci.upper          rmsea.pvalue          rmsea.scaled
                0.022                 1.000                 0.028
rmsea.ci.lower.scaled rmsea.ci.upper.scaled   rmsea.pvalue.scaled
                0.012                 0.039                 1.000
                 srmr          srmr_bentler   srmr_bentler_nomean
                0.091                 0.091                 0.091
           srmr_mplus     srmr_mplus_nomean
                0.091                 0.091 

#Imputed dataset 1
> fitMeasures(lavaan::sem(model = model_cfa, data = out1@DataList[[1]], missing = "pairwise", rotation = "oblimin", estimator = "WLSMV", ordered = ordered2), c("cfi", "tli", "rmsea", "srmr"))
  cfi   tli rmsea  srmr
0.924 0.915 0.069 0.090

#Imputed dataset 2  
> fitMeasures(lavaan::sem(model = model_cfa, data = out1@DataList[[2]], missing = "pairwise", rotation = "oblimin", estimator = "WLSMV", ordered = ordered2), c("cfi", "tli", "rmsea", "srmr"))
  cfi   tli rmsea  srmr
0.913 0.903 0.076 0.094

#Imputed dataset 3  
> fitMeasures(lavaan::sem(model = model_cfa, data = out1@DataList[[3]], missing = "pairwise", rotation = "oblimin", estimator = "WLSMV", ordered = ordered2), c("cfi", "tli", "rmsea", "srmr"))
  cfi   tli rmsea  srmr
0.925 0.917 0.067 0.089

#Imputed dataset 4
> fitMeasures(lavaan::sem(model = model_cfa, data = out1@DataList[[4]], missing = "pairwise", rotation = "oblimin", estimator = "WLSMV", ordered = ordered2), c("cfi", "tli", "rmsea", "srmr"))
  cfi   tli rmsea  srmr
0.909 0.899 0.072 0.092

#Imputed dataset 5  
> fitMeasures(lavaan::sem(model = model_cfa, data = out1@DataList[[5]], missing = "pairwise", rotation = "oblimin", estimator = "WLSMV", ordered = ordered2), c("cfi", "tli", "rmsea", "srmr"))
  cfi   tli rmsea  srmr
0.920 0.911 0.070 0.091 
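For reference, the simple averaging mentioned above can be computed directly from the five sets of printed values. This is a minimal base-R sketch (the numbers are copied from the output above); an average of fit indices is a descriptive summary rather than a formal Rubin's-rules pooling method.

```r
# Fit indices printed above for the five imputed datasets (copied from the output)
per_imp <- rbind(
  imp1 = c(cfi = 0.924, tli = 0.915, rmsea = 0.069, srmr = 0.090),
  imp2 = c(cfi = 0.913, tli = 0.903, rmsea = 0.076, srmr = 0.094),
  imp3 = c(cfi = 0.925, tli = 0.917, rmsea = 0.067, srmr = 0.089),
  imp4 = c(cfi = 0.909, tli = 0.899, rmsea = 0.072, srmr = 0.092),
  imp5 = c(cfi = 0.920, tli = 0.911, rmsea = 0.070, srmr = 0.091)
)

# Mean of each index across imputations, plus the min/max to show the spread
avg <- colMeans(per_imp)
print(round(avg, 4))
print(apply(per_imp, 2, range))
```

The range is worth reporting alongside the mean, since it shows how much the indices vary between imputations.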

Thank you for your help!
Michael

Mauricio Garnier-Villarreal

Sep 26, 2023, 7:13:15 AM
to lavaan
Michael

Why would you say that the results make no sense? Are they not properly estimated? I don't see any clear problem in the model. The pooled CFI is close to the CFIs from the individual imputations, around .9.

First, if you want to compare the models, you need to look at the scaled indices; these are the recommended ones when using WLS estimation.
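With WLSMV, lavaan's fitMeasures() reports both naive and scaled versions of the indices, so the scaled ones have to be requested by name to compare like with like. A minimal sketch on stand-in data: the HolzingerSwineford1939 dataset shipped with lavaan, with its x1-x9 scores cut into four ordered categories to mimic 1-4 items (this data and model are assumptions for illustration, not the poster's).

```r
library(lavaan)  # cfa(), fitMeasures(), HolzingerSwineford1939

# Stand-in ordinal data: cut each score's ranks into 4 equal-sized bins (values 1-4)
hs <- HolzingerSwineford1939[, paste0("x", 1:9)]
hs[] <- lapply(hs, function(x) cut(rank(x, ties.method = "first"), breaks = 4, labels = FALSE))

model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9 '

fit <- cfa(model, data = hs, estimator = "WLSMV", ordered = names(hs))

# Naive and scaled versions side by side; compare *.scaled values across fits
fm <- fitMeasures(fit, c("cfi", "cfi.scaled", "tli", "tli.scaled",
                         "rmsea", "rmsea.scaled", "srmr"))
print(round(fm, 3))
```

With the objects from this thread, the same fitMeasures() call with the ".scaled" names could be applied to each element of out1@DataList and compared with the ".scaled" entries of the pooled output.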

Second, I would recommend running the imputations separately from runMI. That is, first use mice to create the imputed datasets, then pass them to the model. That way, you can include as many auxiliary variables as you want in the imputation step, as you suggest.
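The two-step workflow can be sketched as follows, again on stand-in data: the lavaan HolzingerSwineford1939 items cut into 1-4 categories, with artificial missingness, and its sex and school columns playing the role of auxiliary variables such as gender or linguistic background (all assumptions for illustration). Because the CFA model only names the items, extra columns in the imputed data are simply ignored by the model. Later semTools releases move runMI() to the separate lavaan.mi package, so the last step may need adapting to the installed version.

```r
library(lavaan)    # fitMeasures(), HolzingerSwineford1939
library(mice)      # multiple imputation
library(semTools)  # runMI()

items <- paste0("x", 1:9)
dat <- HolzingerSwineford1939[, c(items, "sex", "school")]
# Mimic 1-4 ordinal items by cutting ranks into four equal-sized bins
dat[items] <- lapply(dat[items], function(x) cut(rank(x, ties.method = "first"), breaks = 4, labels = FALSE))

# Artificial missingness (about 10% per item); sex and school stay complete
set.seed(1)
for (v in items) dat[[v]][sample(nrow(dat), 30)] <- NA

# Step 1: impute with mice; by default every other column in `dat`,
# including the auxiliary sex and school, is used as a predictor
imp <- mice(dat, m = 5, seed = 123, printFlag = FALSE)
imp_list <- lapply(seq_len(imp$m), function(i) mice::complete(imp, i))

# Step 2: fit the CFA to the already-imputed datasets and pool
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9 '
fit_mi <- runMI(model, data = imp_list, fun = "cfa",
                estimator = "WLSMV", ordered = items)
fitMeasures(fit_mi, c("cfi.scaled", "tli.scaled", "rmsea.scaled", "srmr"))
```

Keeping imputation separate also makes it easy to inspect the mids object (e.g. convergence plots) before fitting any model.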