Exploding Standard Errors with runMI() / sem.mi()


Nathan Brack

Apr 16, 2021, 6:14:09 AM
to lavaan

Hi all,

I'm writing my Master's thesis using SEMs and I'm having some trouble with the sem.mi() function. I pass the function a list of imputed data sets as follows (some endogenous variables are categorical with two levels, so I'm using the ordered argument):

fit <- sem.mi(model, data = impList, estimator = "WLSMV", ordered = c("var1", "var2"))
summary(fit, fit.measures = TRUE, test = "D2", pool.robust = TRUE)

When I inspect the summary output, the standard errors explode to huge values, even though the point estimates look correct.


When I inspect the standard errors for a single imputed data set, they all seem fine (< 1), so I'm not sure why they are not pooled correctly here. This did not happen on another occasion, when instead of passing a list of imputed data sets I let sem.mi() do the imputations internally, like this:

fit <- sem.mi(model, data = data, estimator = "DWLS", miPackage = "mice", m = 25)

Unfortunately, however, this approach is not feasible for my specific task. My session information is the following:

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
...
other attached packages:
[1] mice_3.13.0 semTools_0.5-4 lavaan_0.6-8
...

I tried different versions of semTools because I saw in another post that this might fix the problem. But version 0.4-11 of semTools, for example, did not work at all and gave me an error when running the same command as above:

> sem.mi(model, data = impList, estimator = "WLSMV", ordered = c("var1", "var2"))
Error in lav_fit_measures(object = object, fit.measures = fit.measures, :
lavaan ERROR: please refit the model with test="standard"

I have no idea how else to proceed. I would highly appreciate any help! Thanks in advance for taking the time.

Best, Nathan

Nathan Brack

Apr 16, 2021, 6:25:42 AM
to lavaan
For some reason, the picture that was supposed to appear where the blank lines are now was not attached. So I'll try again:

exploding_standard_errors.PNG

Nathan Brack

Apr 21, 2021, 4:36:40 AM
to lavaan
I was able to resolve my issue, sorry for the bother. By adding the following argument,

summary(..., scale.W = FALSE)

the standard errors come out correctly. With scale.W = TRUE, which is the default, the pooling is done by an alternative method based on the ARIV (average relative increase in variance), which apparently did not work well in my case. The standard pooling, however, works just fine!
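For completeness, a sketch of the full call from the first post with the extra argument added (assuming fit is the lavaan.mi object returned by sem.mi() above):

```r
## Same summary() call as before, but with Rubin's-rules pooling of the
## standard errors instead of the ARIV-based scaling of W
summary(fit, fit.measures = TRUE, test = "D2",
        pool.robust = TRUE, scale.W = FALSE)
```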

Best, Nathan

Terrence Jorgensen

Apr 28, 2021, 11:27:12 PM
to lavaan
By adding the following argument,

summary(..., scale.W = FALSE)

the standard errors come out correctly. With scale.W = TRUE, which is the default, the pooling is done by an alternative method based on the ARIV (average relative increase in variance), which apparently did not work well in my case. The standard pooling, however, works just fine!

Indeed, glad you figured this out already. The single ARIV, calculated from the entire parameter vector, is theoretically more stable (see Enders, 2010), but I have noticed there are situations in which it is problematic. I think it happens when the standard RIVs (one per parameter) vary wildly across parameters, but it might be something else, like near linear dependency of the within-imputation sampling covariance matrix. Setting scale.W = FALSE sounds like the best thing to do.
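For reference, the quantities involved come from Rubin's (1987) pooling rules. A generic sketch for a single parameter (this is standard MI theory, not semTools' internal code): with m imputations, the within-imputation variance W is the average squared SE, the between-imputation variance B is the variance of the point estimates, and the RIV measures how much B inflates the total variance.

```r
## Generic sketch of Rubin's (1987) pooling rules for ONE parameter.
## `est` = the m per-imputation point estimates,
## `se`  = the m per-imputation standard errors.
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)               # pooled point estimate
  W    <- mean(se^2)              # within-imputation variance
  B    <- var(est)                # between-imputation variance
  Tvar <- W + (1 + 1/m) * B       # total (pooled) variance
  riv  <- (1 + 1/m) * B / W       # relative increase in variance
  c(est = qbar, se = sqrt(Tvar), riv = riv)
}
```

When the per-parameter RIVs computed this way differ wildly, the single ARIV averaged over the whole parameter vector can misrepresent individual parameters, which is consistent with the symptom in this thread.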

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Nathan Brack

May 25, 2021, 5:07:11 AM
to lavaan
Dear Terrence,

Thank you very much for your answer! I have another brief question concerning the pooling, but this time the pooling of goodness-of-fit indices.

By accident, I found that the CFI for each individual imputed data set is around 0.95 to 0.96 (acceptable). However, the pooled estimate of the CFI is quite low, only 0.91. Obviously, the pooling of indices in semTools/lavaan is not simply taking the average over all imputed data sets.

In Enders' Applied Missing Data Analysis (2010) it says that "there is no established method for pooling these indices" (p. 251). So my questions are: first, how is the pooling of fit indices done in semTools/lavaan? And second, is there a way to modify this method?

Best, Nathan

Stas Kolenikov

May 25, 2021, 10:45:35 AM
to lav...@googlegroups.com
I suspect that your imputations are just bad, or maybe you have so much missing data that your between-imputation variability dominates the within-imputation variability. (It rarely if ever makes sense to look at the individual imputation results, IMHO. It does make sense, though, to look at the imputed data and check whether the imputations make sense; see e.g. Bondarenko and Raghunathan 2016, https://doi.org/10.1002/sim.6926.)

Also, adapting scalar statistics, like R^2 in regression or fit indices in SEM, to multiple imputation is not trivial. MI is primarily intended for point and variance estimation, and justifications for the fit-type statistics are done in hindsight, at best.

-- Stas Kolenikov, PhD, PStat (ASA, SSC)  @StatStas
-- Principal Scientist, Abt Associates @AbtDataScience
-- Opinions stated in this email are mine only, and do not reflect the position of my employer
-- http://stas.kolenikov.name
 



Terrence Jorgensen

May 26, 2021, 1:10:32 AM
to lavaan
By accident, I found that for example the CFI for each single imputed set is around 0.95 to 0.96 (acceptable). However, the pooled estimate for the CFI is quite low, only 0.91.

I'm also reminded of this title: [image attachment missing]
These values are hardly different. Rules of thumb for fit indices are silly, but Bentler and Bonett (1980) recommended > .90 as acceptable.


Hu & Bentler's (1998, 1999) simulations have been repeatedly criticized and undermined with empirical evidence of their lack of external validity.

Obviously, the pooling of indices in semTools/lavaan is not simply taking the average over all imputed data sets.

No. I use the same formulas for each index that lavaan does, but with the pooled chi-squared values. As Stas said, this is just an ad hoc proposal.

are there possibilities to modify this method?

You are more than welcome to use the FUN= argument to save fitMeasures() and take averages, if you like.
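A sketch of that suggestion (the @funList slot name and the exact behavior of FUN= are assumptions based on the semTools documentation for runMI(); check ?runMI for your installed version):

```r
## Hedged sketch: pass FUN= to sem.mi() so a function is applied to the
## lavaan fit of EACH imputed data set; results land in the @funList slot.
fit <- sem.mi(model, data = impList, estimator = "WLSMV",
              ordered = c("var1", "var2"),
              FUN = function(obj) lavaan::fitMeasures(obj, "cfi"))

## Collect the per-imputation CFI values and take a naive average
cfi.by.imp <- sapply(fit@funList, function(x) x["cfi"])
mean(cfi.by.imp)
```

Note that a simple average of per-imputation indices is exactly the kind of ad hoc procedure discussed above; it is not a formally justified pooling method.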