bootstrapping with missing data


Tracy Wong

Feb 24, 2021, 8:58:59 PM
to lavaan

Hi everyone, 

I am trying to run the "Yuan" (YHY) bootstrap on a path model that has missing data. However, I got the following error message: "lavaan ERROR: bollen.stine/yuan bootstrap not available for missing data".

Could you please advise whether there is an alternative way to obtain Yuan-bootstrapped CFI/RMSEA for a path model with missing data?

My syntax:

fit=sem(Model1, data=FILE,missing = "FIML", estimator="ML", std.lv=TRUE, se= "boot")

CFI.boot <- bootstrapLavaan(fit, R=1000, type="yuan", FUN=fitMeasures, fit.measures="cfi", parallel="multicore", ncpus=8)

Any help is much appreciated. Thank you very much!

-- Tracy

Terrence Jorgensen

Feb 24, 2021, 10:35:47 PM
to lavaan

> please advise if there are alternative ways where I could obtain yuan-bootstrapped CFI/RMSEA for path model with missing data.


I don't know of a software implementation of the YHY bootstrap, but semTools::bsBootMiss() implements the model-based ("Bollen-Stine") bootstrap, generalized by Savalei & Yuan (2009) for missing data.  By default, it will return the bootstrapped chi-squared test with p value, but you can request bootSamplesOnly=TRUE to obtain the bootstrap samples from the transformed data (i.e., consistent with H0), and with those use lavaanList() to fit the model to each and save the fit indices.  Not sure that is what you want, though.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Tracy Wong

Feb 25, 2021, 1:58:57 PM
to lavaan
Hi Prof. Jorgensen, 

Thank you so much for your prompt advice! 

I have one additional question: when we request se="boot", bootstrap=1000 in sem(), do we automatically get bootstrapped coefficients in the output, or do we need to use FUN="coef" in bootstrapLavaan() to get them? If I have to use FUN="coef", I believe I have to remove se="boot" from sem() and fit another model, because I received an error message when I tried to use the same model: lavaan ERROR: se == "bootstrap"; please refit model with another option for "se"
Please advise whether this is correct.

My syntax: 
fit <- sem(model1, data=dataset, missing = "FIML", estimator="ML", se="boot", bootstrap=1000, std.lv=TRUE)  ## get bootstrapped standard errors
confit <- parameterEstimates(fit, level=0.95, boot.ci.type="bca.simple")  ## get bootstrapped confidence intervals
bootest <- bootstrapLavaan(model1, R = 1000L, type = "ordinary", verbose = FALSE, FUN = "coef", parallel = c("no", "multicore"), ncpus = 1L)  ## get bootstrapped coefficients?

I am in the process of writing my thesis, so your advice is greatly appreciated!

-- Tracy 

Terrence Jorgensen

Mar 19, 2021, 4:45:19 PM
to lavaan
> when we ask for se="boot", bootstrap=1000 with sem(), do we automatically get bootstrapped coefficients in the output or do we need to use the FUN="coef" in bootstrapLavaan () to get them?

That is something you can answer yourself by trying it.  But I do not think the point of bootstrapping is to get better point estimates, but rather better SE and interval estimates.
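
One way to try it, as a minimal sketch: the model and the built-in HolzingerSwineford1939 data below are stand-ins chosen for illustration (not the poster's model or data), the number of draws is kept small only for speed, and percentile intervals are used instead of "bca.simple" to keep the example simple.

```r
library(lavaan)

set.seed(123)  # bootstrap draws use R's random number generator

# Illustrative path model on lavaan's built-in example data (an assumption)
model <- 'x1 ~ x2 + x3'

fit_std  <- sem(model, data = HolzingerSwineford1939)
fit_boot <- sem(model, data = HolzingerSwineford1939,
                se = "bootstrap", bootstrap = 50)  # small number of draws, for speed only

# Largest difference between the two sets of point estimates
est_diff <- max(abs(as.numeric(coef(fit_std)) - as.numeric(coef(fit_boot))))
est_diff

# Bootstrap SEs and percentile intervals are reported by parameterEstimates()
pe <- parameterEstimates(fit_boot, level = 0.95, boot.ci.type = "perc")
pe

# The bootstrap draws themselves are stored in the fitted object
boot_draws <- lavInspect(fit_boot, "boot")
dim(boot_draws)  # rows = successful draws, columns = free parameters
```

If the difference is zero, requesting se="boot" has changed only the standard errors and intervals, and the stored draws can be inspected without a separate bootstrapLavaan() run.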

paul.tho...@gmail.com

Apr 27, 2022, 9:43:12 AM
to lavaan
I have successfully fitted the model, drawn a set of bootstrap samples, and fitted the model to those samples using lavaanList(), but I cannot work out how to extract the fit indices from a lavaanList object. Any advice would be most appreciated.

Christian Arnold

Apr 27, 2022, 9:52:19 AM
to lav...@googlegroups.com
Even if it is not exactly an answer to your question, for such problems I provided x.boot some time ago.

Best

Christian

paul.tho...@gmail.com

Apr 27, 2022, 10:20:56 AM
to lavaan
Here is a reproducible example of my code. The only part that I can't figure out is extracting 'fitMeasures()' from the lavaanList object. Thank you for any help.



library(lavaan)
library(semTools)  # provides bsBootMiss()

x2 <- rnorm(200)
x1 <- x2 + rnorm(200)
y2 <- rnorm(200) + x2
y1 <- y2 + rnorm(200)
z2 <- rnorm(200) + y2
z1 <- z2 + rnorm(200)
d <- data.frame(x1, x2, y1, y2, z1, z2)

#=====================================#

perc <- c(0.01, 0.02, 0.05,0.02, 0.04, 0.05)

d[] <- Map(function(x, y) {
  x[sample(seq_along(x), length(x) * y)] <- NA
  x
}, d, perc)

#=====================================#

## specify null and full models

full_mod_syntax <- '
  x2 ~ x1 + y1 + z1
  y2 ~ x1 + y1 + z1
  z2 ~ x1 + y1 + z1
'

## estimate models

m1 <- lavaan::sem(full_mod_syntax, data = d, missing="fiml")



## view full model summary
lavaan::summary(m1)

## Previous attempt that fails with missing data##
##Conduct YHY bootstrap for CFI fit index.
## Note type = “yuan” means the YHY bootstrap method

#YHY.boot <- bootstrapLavaan(m1, R = 10, type = "yuan", FUN = fitMeasures, fit.measures = c("rmsea","cfi"))

## Calculate 95% CI for CFI under the YHY bootstrap

#quantile(YHY.boot[,'rmsea'],c(0.025,0.975))
#quantile(YHY.boot[,'cfi'],c(0.025,0.975))


#======================================================
#Following Jorgensen message:

temp <- bsBootMiss(m1, transformation = 2, nBoot = 10,bootSamplesOnly = TRUE)

output <- semList(full_mod_syntax , dataList = temp, ndat = 10,
                store.slots = "all")

fitMeasures(output)

Christian Arnold

Apr 28, 2022, 3:17:23 AM
to lav...@googlegroups.com
Hmm, maybe fitMeasures can't handle semList? If so, here's a quick and dirty workaround that may (or may not) solve your problem:

...
n <- 10
temp <- bsBootMiss(m1, transformation = 2, nBoot = n, bootSamplesOnly = TRUE)

res <- data.frame("rmsea" = rep(NA, n), "cfi" = rep(NA, n))
for(i in 1 : n) {
  fit <- sem(full_mod_syntax, temp[[i]])
  if(lavInspect(fit, "converged")) {
     res[i,] <- fitMeasures(fit, c("rmsea", "cfi"))
  }
}
na.omit(res)


Terrence Jorgensen

May 4, 2022, 10:34:59 AM
to lavaan
> The only part that I can't figure out is extracting 'fitMeasures()' from the lavaanList object.

You need to tell lavaanList() to run fitMeasures() on each fitted model.

output <- semList(full_mod_syntax , dataList = temp, ndat = 10,
                  store.slots = "all", FUN = fitMeasures)

The output of FUN= is stored in the @funList slot.  You can extract a fit index/indices of interest (from each data set) using sapply().

sapply(output@funList, function(x) x[c("cfi","srmr")] )

This will be a matrix.  Might be easier to extract one index at a time and save it to a vector, depending on what you want to do with the distribution.
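
A minimal sketch of the one-index-at-a-time approach. To stay self-contained it uses a hypothetical list of 20 simulated complete data sets in place of the bsBootMiss() samples, and a small illustrative model; both are assumptions, not the data or model from this thread.

```r
library(lavaan)

set.seed(1)

# Hypothetical stand-in for the list of bootstrap samples
make_data <- function(n = 200) {
  x <- rnorm(n)
  y <- 0.5 * x + rnorm(n)
  z <- 0.5 * y + rnorm(n)
  data.frame(x, y, z)
}
data_list <- lapply(1:20, function(i) make_data())

# Illustrative path model with 1 df (the z ~ x path is omitted)
model <- '
  y ~ x
  z ~ y
'

# Fit the model to every data set and run fitMeasures() on each fit
out <- semList(model, dataList = data_list, FUN = fitMeasures)

# One index at a time, each as a plain numeric vector
cfi   <- unname(unlist(lapply(out@funList, function(x) x["cfi"])))
rmsea <- unname(unlist(lapply(out@funList, function(x) x["rmsea"])))

# Summarize each distribution, e.g. with percentile limits
quantile(cfi,   c(0.025, 0.975))
quantile(rmsea, c(0.025, 0.975))
```

With real bootstrap samples that contain missing values, the fitting options (for example missing = "fiml") would presumably need to be passed through semList() as well.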

paul.tho...@gmail.com

May 9, 2022, 10:38:29 AM
to lavaan
Many thanks Terrence. That's very helpful!