confidence intervals for standardized regression coefficients after multiple imputation (lavaan, semTools)


Guido Biele

unread,
Jan 18, 2020, 11:40:07 AM1/18/20
to lavaan
Hello,

first, thanks to the maintainers of the lavaan and semTools packages!

I am trying to get confidence intervals for standardized regression coefficients from the analysis of a multiply imputed data set.

miFIT = runMI(model, ...)
summary(miFIT, standardized = TRUE)

produces standardized coefficients, but not their standard errors.

fit = sem(model, ...)
standardizedSolution(fit, se = TRUE)

produces standardized regression coefficients and their standard errors (and CIs) for an ordinary data set (no multiple imputations).

standardizedSolution(miFIT, se = TRUE)

produces standardized regression coefficients but returns NA for the standard error and CIs.

Can standardizedSolution also calculate se/CIs for multiply imputed data sets, or is my best bet to calculate them for each individual data set and then pool them myself?


Best, Guido

Terrence Jorgensen

unread,
Jan 18, 2020, 5:40:28 PM1/18/20
to lavaan
I am trying to get confidence intervals for standardized regression coefficients from the analysis of a multiply imputed data set.

standardizedSolution() is a function for lavaan objects, not lavaan.mi objects.  I did not implement a standardizedSolution.mi() function in semTools, nor is it on my to-do list.  Every SEM textbook I have read (which is certainly not all of them) gives the same advice: draw inferences about unstandardized parameters, and use standardized parameters as descriptive measures of effect size.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Guido Biele

unread,
Jan 19, 2020, 3:38:34 AM1/19/20
to lav...@googlegroups.com
Thanks Terrence,
I asked because I wanted to make a plot with standardized effects and CIs.

I searched for Rubin's rule and found that there is a version for SEs, which I might use to estimate the pooled SEs myself.
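For reference, with m imputations Rubin's rules take the pooled point estimate as the mean across imputations, and the pooled squared SE as the average within-imputation variance plus a correction for between-imputation variance. A quick sketch with made-up numbers (not output from any real model):

```r
## Rubin's rules for pooling across m imputations (illustrative numbers)
est <- c(0.52, 0.49, 0.55)     # parameter estimate from each imputation
se  <- c(0.071, 0.068, 0.074)  # its SE from each imputation
m   <- length(est)

pooled_est <- mean(est)
within  <- mean(se^2)  # average within-imputation variance
between <- var(est)    # between-imputation variance
pooled_se <- sqrt(within + (1 + 1/m) * between)

pooled_est
pooled_se
```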

On the other hand, I have not read any book on SEMs. Do you remember a justification for the recommendation you describe above?

Best, Guido



Terrence Jorgensen

unread,
Jan 20, 2020, 11:34:38 AM1/20/20
to lavaan
I asked because I wanted to make a plot with standardized effects and CIs.

Oh ... well, that's actually a really good idea.  Preacher and Kelley recently advocated reporting CIs around effect-size estimates, and I agree with their reasoning (similar motivation, I'm sure, to your wanting error bars in your graph).  Even when the defined effect sizes might be of questionable value, I agree it is important to quantify uncertainty about the estimate. 

For the record, I am not against providing a standardizedSolution.mi().  I just haven't been motivated to prioritize it above adding/maintaining other functionality in semTools.  My lack of motivation came not only from finding their use for hypothesis-testing dubious, but also from not finding a simple way to access lavaan's internal procedure for obtaining SEs of standardized estimates.  Perhaps I can ask Yves to make something available via the vcov(...) method.  I will make a note to post here if I find an easy enough solution to add this to semTools this summer (I doubt I will have time to program before that).


I searched for Rubin's rule and found that there is a version for SEs, which I might use to estimate the pooled SEs myself.

See the final ?runMI help-page example to see how you can use the FUN= argument to store standardizedSolution() output for each imputation, which you could then use to pool SEs yourself.  Note that a pooled standardized estimate (i.e., the mean standardized parameter across imputations) will not be the same as the standardized pooled estimate (i.e., standardize the mean of unstandardized estimates across imputations, using model-implied SDs that are based on pooled parameter estimates).  But they will probably be close enough for graphing purposes.

data(datCat)
set.seed(123)
for (i in 1:8) datCat[sample(1:nrow(datCat), size = .1*nrow(datCat)), i] <- NA

catout <- cfa.mi(' f =~ 1*u1 + 1*u2 + 1*u3 + 1*u4 ', data = datCat,
                 m = 3, seed = 456,
                 miArgs = list(ords = paste0("u", 1:8), noms = "g"),
                 FUN = function(fit) list(STD = lavaan::standardizedSolution(fit)))

## custom function to extract estimate and SE from each imputation
getSTDs <- function(STD, lhs, op, rhs, group = NULL, level = NULL, COL = "est.std") {
  ROW <- STD$lhs == lhs & STD$op == op & STD$rhs == rhs
  if (!is.null(group)) ROW <- ROW & STD$group == group
  if (!is.null(level)) ROW <- ROW & STD$level == level
  STD[ROW, COL]
}

est <- mapply(getSTDs, STD = lapply(catout@funList, "[[", i = "STD"),
              lhs = "f", op = "=~", rhs = "u2")
SE <- mapply(getSTDs, STD = lapply(catout@funList, "[[", i = "STD"),
             lhs = "f", op = "=~", rhs = "u2", COL = "se")

mean(est)                                          # pooled point estimate
sqrt(mean(SE^2) + (1 + 1/length(est))*var(est))    # pooled SE

You could wrap up that procedure in another function to vectorize over the estimates you want.
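For example, one hypothetical wrapper (poolSTD and its internals are illustrative, not part of semTools) could pool every row of the standardized solution at once, assuming each imputation's standardizedSolution() output has the same rows in the same order:

```r
## Pool all rows of a list of standardizedSolution() data frames
## (one per imputation), e.g. lapply(catout@funList, "[[", i = "STD").
poolSTD <- function(STDlist) {
  m   <- length(STDlist)
  est <- sapply(STDlist, "[[", "est.std")  # parameters x imputations
  se  <- sapply(STDlist, "[[", "se")
  pooled.est <- rowMeans(est)
  pooled.se  <- sqrt(rowMeans(se^2) + (1 + 1/m) * apply(est, 1, var))
  data.frame(STDlist[[1]][c("lhs", "op", "rhs")],
             est.std  = pooled.est, se = pooled.se,
             ci.lower = pooled.est - 1.96 * pooled.se,
             ci.upper = pooled.est + 1.96 * pooled.se)
}
```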



On the other hand, I have not read any book on SEMs. Do you remember a justification for the recommendation you describe above?

Probably because historically, it took a while for software to implement correct SEs for standardized estimates.  But maybe also because that practice would be consistent with any other NHST context (e.g., Cohen's d is an effect-size supplement to a t statistic for NHST; you would not report separate t tests for raw data and z scores, although they would be identical in that context).  There is also the issue of whether the standardized parameter you estimate generalizes beyond the specifics of your design (e.g., if you sampled from a different (sub)population, the SDs of your variables could differ, making standardized parameters incomparable across contexts even in the population), which makes me uncomfortable with the knee-jerk automatic reporting of canned estimates of "standardized" effect size.  I recognize that SEMs often involve data with arbitrary scales (not just for latent variables), but I don't think dividing by sample SDs resolves that issue.  

Guido Biele

unread,
Jan 21, 2020, 5:01:32 AM1/21/20
to lavaan
Thanks for your thoughtful response Terrence!

It seems you might have thought I wanted the CIs for inference. That was not my motivation. I like to report effect sizes that are intuitively accessible, which led me to standardized effects, and I like to report uncertainty, which led me to CIs.

I hadn't thought about the difference between pooled standardized estimate and the standardized pooled estimates. What I am using are pooled standardized estimates, which seems intuitively right to me, because I want the CIs to reflect the uncertainty of the parameter estimate and the uncertainty introduced by missing data/imputation. 

The `FUN=` option is very useful, I hadn't seen that. 

It is a good point that generalization of standardized coefficients beyond the sample is unclear if a new sample has different data (variances). Currently I still prefer to report/plot standardized effects, because I hope this helps move the reader's attention away from NHST.*

cheers, guido

* If I had the time, I'd implement a Bayesian model.

Mauricio Garnier-Villarreal

unread,
Jan 21, 2020, 11:28:00 AM1/21/20
to lavaan
If you are interested in CIs in a Bayesian framework, you can get those from blavaan, like this:


library(blavaan)
library(coda)

# The Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- bcfa(HS.model, data = HolzingerSwineford1939)
summary(fit)

std_by <- standardizedposterior(fit)

apply(std_by, 2, mean)  ### standardized posterior mean
apply(std_by, 2, function(x) HPDinterval(as.mcmc(x), prob = .95))  ### standardized 95% CI


Guido Biele

unread,
Jan 21, 2020, 3:07:13 PM1/21/20
to lavaan
Unfortunately, all my measured variables are ordinal. Otherwise I would have used blavaan.