semTools: multiple imputation pooled standard errors


Mark Seeto

Feb 20, 2019, 2:50:54 PM
to lavaan
Dear lavaan group,

My question is about the semTools calculation of pooled standard errors when multiple imputation has been used. Unless I'm misunderstanding something, the calculation is not consistent with Rubin's rule.
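
For reference, Rubin's rule gives the pooled total variance of a single parameter as T = W + (1 + 1/m)*B, where W is the average within-imputation variance, B is the between-imputation variance of the estimates, and m is the number of imputations; the pooled standard error is sqrt(T).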

The following example is based on the example in the documentation for runMI. I'm using R 3.5.2, lavaan 0.6-3, semTools 0.5-1.911.

library(lavaan)
library(semTools)

# Impose missingness on x5 and x9 (as in the runMI documentation example)
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr", "agemo", "school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

n.imp <- 5  # number of imputed data sets

library(Amelia)
set.seed(12345)
HS.amelia <- amelia(HSMiss, m = n.imp, noms = "school", p2s = FALSE)
imps <- HS.amelia$imputations  # list of m completed data sets
out2 <- cfa.mi(HS.model, data = imps)  # pooled fit via semTools

summary(out2)

# Fit the model to each imputed data set manually
cfa.imp <- vector("list", n.imp)

for (i in 1:n.imp) {
  cfa.imp[[i]] <- cfa(HS.model, data = imps[[i]])
}

# Standard errors according to Rubin's rule: the total sampling covariance
# is the mean within-imputation vcov plus (1 + 1/m) times the
# between-imputation covariance of the estimates
sqrt(diag(Reduce("+", lapply(cfa.imp, vcov))/n.imp +
          (1 + 1/n.imp)*cov(t(sapply(cfa.imp, coef)))))

summary(out2)$se  # similar but not the same (ignore the zeros)


The code in runMI-methods.R contains the following lines:

W <- rowMeans(sapply(object@ParTableList[useImps], "[[", i = "se")^2)
B <- apply(sapply(object@ParTableList[useImps], "[[", i = "est"), 1, var)
Bm <- B + B/m
Tot <- W + Bm
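
Reproducing those quantities from the manually fitted models above (a sketch of the same calculation, not the package's exact code path):

se.mat  <- sapply(cfa.imp, function(fit) sqrt(diag(vcov(fit))))
est.mat <- sapply(cfa.imp, coef)
W   <- rowMeans(se.mat^2)       # average within-imputation variance
B   <- apply(est.mat, 1, var)   # between-imputation variance
Tot <- W + B + B/n.imp          # univariate Rubin's-rule total variance
sqrt(Tot)  # equals the sqrt(diag(...)) expression in my earlier code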

I'm wondering why the standard errors aren't calculated as sqrt(Tot).

Thanks,
Mark


Terrence Jorgensen

Feb 21, 2019, 3:51:35 PM
to lavaan

I'm wondering why the standard errors aren't calculated as sqrt(Tot).

They are, in the code immediately above that, where the standard errors are assigned using the lav_model_vcov_se() function, supplying VCOV from the vcov() output (which is identical to Tot as calculated there when scale.W = FALSE). I'm not sure why I kept Tot in there; it's probably just a legacy from before I wrote the vcov() method. But W and Bm are there for calculating the df. The reason for the differences is that texts often repeat only the univariate version of Rubin's rules, rather than the multivariate extension that pools the entire sampling covariance matrix.
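
For example, assuming the vcov() method for lavaan.mi objects exposes that scale.W argument (check the method's help page for the exact signature), turning the scaling off should reproduce the element-wise pooling:

# Assumed usage: scale.W = FALSE disables the scaling of W, so this
# should match sqrt(Tot) from the univariate pooling above
sqrt(diag(vcov(out2, scale.W = FALSE)))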

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen

Mark Seeto

Feb 21, 2019, 5:06:55 PM
to lavaan
On Friday, February 22, 2019 at 7:51:35 AM UTC+11, Terrence Jorgensen wrote:

The reason for the differences is that texts often repeat only the univariate version of Rubin’s rules, rather than the multivariate extension that pools the entire sampling covariance matrix.

Thanks for your reply, Terrence, but I don't understand what you mean by univariate vs. multivariate versions of Rubin's rules.

In the example I posted, isn't

sqrt(diag(Reduce("+", lapply(cfa.imp, vcov))/n.imp +
          (1 + 1/n.imp)*cov(t(sapply(cfa.imp, coef)))))

multivariate because it involves the covariance matrix of all the coefficients?

Terrence Jorgensen

Feb 28, 2019, 10:00:03 PM
to lavaan
Thanks for your reply Terrence. But I don't understand what you mean by univariate vs multivariate versions of Rubin's rules.

It should really be referred to as uni- vs. multiparameter tests. I think the distinction is self-explanatory, but see Enders's (2010) book for a more thorough description and example applications.
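
As a rough illustration of a multiparameter test, you can pool the full covariance matrix and form a Wald-type statistic from it (a sketch built on the manually fitted models from earlier in this thread; the indexed parameter names are lavaan's defaults, and the full D1 procedure additionally adjusts the reference distribution):

est.pool <- rowMeans(sapply(cfa.imp, coef))   # pooled point estimates
V.pool   <- Reduce("+", lapply(cfa.imp, vcov))/n.imp +
            (1 + 1/n.imp)*cov(t(sapply(cfa.imp, coef)))
idx  <- c("visual=~x2", "visual=~x3")  # jointly test two free loadings
wald <- drop(t(est.pool[idx]) %*% solve(V.pool[idx, idx]) %*% est.pool[idx])
wald  # compare to a chi-square(2) reference (D1 uses an F with pooled df)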