# semTools: multiple imputation pooled standard errors

42 views

### Mark Seeto

Feb 20, 2019, 2:50:54 PM2/20/19
to lavaan
Dear lavaan group,

My question is about the semTools calculation of pooled standard errors when multiple imputation has been used. Unless I'm misunderstanding something, the calculation is not consistent with Rubin's rule.

The following example is based on the example in the documentation for runMI. I'm using R 3.5.2, lavaan 0.6-3, semTools 0.5-1.911.

library(lavaan)
library(semTools)

HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
"ageyr","agemo","school")]
set.seed(12345)
HSMiss\$x5 <- ifelse(HSMiss\$x5 <= quantile(HSMiss\$x5, .3), NA, HSMiss\$x5)
age <- HSMiss\$ageyr + HSMiss\$agemo/12
HSMiss\$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss\$x9)

HS.model <- '
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
'

n.imp <- 5  # number of imputed data sets

library(Amelia)
set.seed(12345)
HS.amelia <- amelia(HSMiss, m = n.imp, noms = "school", p2s = FALSE)
imps <- HS.amelia\$imputations
out2 <- cfa.mi(HS.model, data = imps)

summary(out2)

# Fit the model to each imputed data set manually
cfa.imp <- vector("list", n.imp)

for (i in 1:n.imp) {
cfa.imp[[i]] <- cfa(HS.model, data = imps[[i]])
}

# Standard errors according to Rubin's rule
sqrt(diag(Reduce("+", lapply(cfa.imp, vcov))/n.imp +
(1 + 1/n.imp)*cov(t(sapply(cfa.imp, coef)))))

summary(out2)\$se  # similar but not the same (ignore the zeros)

The code for runMI-methods.R contains the following lines:

W <- rowMeans(sapply(object@ParTableList[useImps], "[[", i = "se")^2)
B <- apply(sapply(object@ParTableList[useImps], "[[", i = "est"), 1, var)
Bm <- B + B/m
Tot <- W + Bm

I'm wondering why the standard errors aren't calculated as sqrt(Tot).

Thanks,
Mark

### Terrence Jorgensen

Feb 21, 2019, 3:51:35 PM2/21/19
to lavaan

I'm wondering why the standard errors aren't calculated as sqrt(Tot).

They are, in the code immediately above that where the standard errors are assigned using the `lav_model_vcov_se()` function, providing `VCOV` from the `vcov()` output (which is identical to `Tot` as calculated there when `scale.W=FALSE`). Not sure why I kept `Tot` in there, it’s probably just a legacy from before I wrote the `vcov()` method. But `W` and `Bm` are there for calculating the df. The reason for the differences is that texts often repeat only the univariate version of Rubin’s rules, rather than the multivariate extension that pools the entire sampling covariance matrix.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen

### Mark Seeto

Feb 21, 2019, 5:06:55 PM2/21/19
to lavaan
On Friday, February 22, 2019 at 7:51:35 AM UTC+11, Terrence Jorgensen wrote:

The reason for the differences is that texts often repeat only the univariate version of Rubin’s rules, rather than the multivariate extension that pools the entire sampling covariance matrix.

Thanks for your reply Terrence. But I don't understand what you mean by univariate vs multivariate versions of Rubin's rules.

In the example I posted, isn't

sqrt(diag(Reduce("+", lapply(cfa.imp, vcov))/n.imp +
(1 + 1/n.imp)*cov(t(sapply(cfa.imp, coef)))))

multivariate because it involves the covariance matrix of all the coefficients?