Bootstrap estimates, standardized solution, R squares, HTMT, group differences, consistent p-values, and other coefficients with a single bootstrap run


Christian Arnold

unread,
Jan 24, 2022, 5:44:20 PM
to lavaan
Dear lavaan community,

I occasionally read questions about various problems and challenges with the bootstrap procedure. I have taken the liberty of developing a small application that may help to eliminate a problem or two. It is important for me to clarify that this is only (only!) a prototype. There is no guarantee that the application produces correct results. Nevertheless, I am happy to receive any feedback. Criticism. Praise. Ideas. Everything is very welcome. I have attached the application. If you want to test it, save it to your working directory and include it in your R script. The procedure is as follows:
(1) Bootstrap with x.boot
x.boot accepts all arguments that bootstrapLavaan processes (except parallel - sorry, I'm a Windows user and could not test this). Unlike bootstrapLavaan, all bootstrapped lavaan objects are saved. This probably has some advantages (see below), but it is memory intensive. x.boot additionally accepts the argument x.autofill = TRUE (the default is FALSE to mimic the lavaan behavior). With x.autofill = TRUE, bootstrapping continues until all non-converged bootstrap draws have been replaced by converged draws. x.boot also accepts the argument x.disc, where you can specify a path on the local file system. Please do not use the working directory here, and use this feature only if you request a lot of bootstrap draws and the RAM of your machine is not sufficient. For example, x.disc = "c:/temp/x.boot" stores all bootstrap draws on disk in the specified folder. Note: this feature is completely experimental; please report bugs if you use it. x.boot can also add more bootstrap draws to an existing x.boot object (bootstrap [... do something], add more bootstrap draws [... do something]).
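For illustration, a minimal, untested sketch of these two options, assuming the interface described above (fit is a fitted lavaan object; the path is just an example):

boot <- x.boot(fit, R = 1000, x.disc = "c:/temp/x.boot")   # store all draws on disk instead of in RAM
boot <- x.boot(boot, R = 500)                              # pass the x.boot object back in to add 500 more draws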
(2) Request coefficients with x.boot.out
x.boot.out accepts the following arguments:
ci = The confidence level.
ci.type = Type of the interval: "norm", "basic", "perc" (aka "percentile"), or "bc" (aka "bca.simple").
what = For which coefficients should the confidence interval be calculated? Accepted are "est" (estimates), "std.lv", "std.all", "std.nox" (see lavaan's parameterEstimates), and all coefficients that lavInspect provides and that make sense to bootstrap (examples: "rsquared", "cov.lv", "cor.ov", "resid").
myFUN = A custom function.
myArgs = Arguments passed to the custom function, as a list.
myFUN.post = A function to manipulate the results and customize the output after the CIs have been calculated.
p.adjust = Request p-values that are consistent with the confidence intervals. This feature is entirely experimental. I cannot even name a reference for it. Please report any inconsistency!

Here are some examples. (Note: Every line of code is at your disposal. You can customize everything and use or change any custom function - see below. I stress again: everything is experimental and I do not guarantee the correctness of the output!)


source("x.boot.r")            #Include application

  

HS.model <- "

visual  =~ x1 + x2 + x3

textual =~ x4 + x5 + x6

speed   =~ x7 + x8 + x9

x3 ~~ x5

x4 ~~ x7

x7 ~~ x8

"

 

# Create x.boot.object

set.seed(1)

fit <- sem(HS.model, HolzingerSwineford1939)

HS.boot <- x.boot(fit, R = 400, verbose = TRUE)

summary(HS.boot)

 

 

# Inspect some bootstrap draws
inspect(HS.boot, boot.num = c(1, 9), FUN = lavInspect, args = list("what" = "resid"))

# 95% BC CI (estimates)
summary(
  x.boot.out(HS.boot, what = "est", ci.type = "bca.simple")
)

# 95% BC CI (standardized solution)
summary(
  x.boot.out(HS.boot, what = "std.all", ci.type = "bca.simple")
)
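The what argument is not limited to the parameter estimates; as described above, it also accepts lavInspect keywords. A hedged, untested sketch requesting percentile CIs for the R-square values:

# 95% percentile CI for the R-square values (sketch, assuming the "rsquared" keyword is passed through as described above)
summary(
  x.boot.out(HS.boot, what = "rsquared", ci.type = "perc")
)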

 

 

# Bootstrap multi-group model and autofill non-converged bootstrap draws
set.seed(1)
fit.groups <- sem(HS.model, HolzingerSwineford1939, group = "school")
HS.boot.groups <- x.boot(fit.groups, R = 300, verbose = TRUE, x.autofill = TRUE)
HS.boot.groups <- x.boot(HS.boot.groups, R = 100, verbose = TRUE, x.autofill = TRUE)   # add 100 more draws
summary(HS.boot.groups)

# 95% BC CI (estimates) with CI-consistent p-value
summary(
  x.boot.out(HS.boot.groups, what = "est", ci.type = "bca.simple", p.adjust = TRUE)
)

 

 

# 95% percentile CI for reliability coefficients (semTools)
summary(
  x.boot.out(HS.boot.groups, myFUN = reliability, ci.type = "perc")
)

# R-squared differences for 2 group models
myRSquareDiff <- function(object) {
  if(lavInspect(object, "ngroups") != 2) stop("Only for multi-group models with 2 groups")
  res <- lavInspect(object, "rsquare")
  c(res[[2]] - res[[1]])
}

summary(
  x.boot.out(HS.boot.groups, myFUN = myRSquareDiff, ci.type = "bc", p.adjust = FALSE)
)

 

 

# HTMT for one or more groups
myHTMT <- function(object) {
  res <- list()
  groups <- lavInspect(object, "ngroups")
  for(g in 1:groups) {
    cov <- unclass(lavInspect(object, "sampstat"))[[g]]
    if(typeof(cov) == "list") cov <- cov$cov
    pt <- parameterTable(object)
    model <- paste(apply(pt, 1, function(x) {
      if(x["op"] == "=~" && as.numeric(x["group"]) == g) {
        paste(x["lhs"], x["op"], if(as.numeric(x["free"]) == 0) paste(x["ustart"], "*", sep = ""), x["rhs"], sep = "")
      } else ""
    }), collapse = "\n")
    res[[if(groups > 1) lavInspect(object, "group.label")[g] else g]] <- htmt(model, sample.cov = cov)
  }
  return(res)
}

summary(
  x.boot.out(HS.boot, myFUN = myHTMT)
)

 

 

# The differences of all estimated parameters (could be further developed to mimic the Cheung/Lau invariance test without messy parametrization: https://journals.sagepub.com/doi/abs/10.1177/1094428111421987)
myBootDifEstimatedCoef <- function(object) {
  if(lavInspect(object, "ngroups") != 2) stop("Only for multi-group models with 2 groups")
  res  <- list()
  pe   <- lavInspect(object, "x")
  free <- lavInspect(object, "free")
  for(m in names(pe[[1]])) {
    free.g1  <- replace(replace(free[[1]][[m]], which(free[[1]][[m]] > 0), 1), which(free[[1]][[m]] == 0), NA)
    free.g2  <- replace(replace(free[[2]][[m]], which(free[[2]][[m]] > 0), 1), which(free[[2]][[m]] == 0), NA)
    g1       <- pe[[1]][[m]] * free.g1
    g2       <- pe[[2]][[m]] * free.g2
    res[[m]] <- g2 - g1
  }
  return(res)
}

summary(
  x.boot.out(HS.boot.groups, myFUN = myBootDifEstimatedCoef, remove.na = TRUE)
)

 

 

# Bollen-Stine p-value and an alternative calculation of the p-value using the adjusted p-value approach
set.seed(1)
fit.bs <- sem(HS.model, HolzingerSwineford1939)
HS.boot.bs <- x.boot(fit.bs, R = 400, verbose = TRUE, type = "bollen.stine")

myChisqDif <- function(object, chisq) {
  fitMeasures(object, "chisq") - chisq
}

myBollenStine <- function(object) {
  if(object$args$type != "bollen.stine") stop("Bootstrap type is not suitable.")
  hist(object$c[object$t.names,])
  get.ci <- function(t, alpha) {
    res <- c(
      as.numeric(quantile(t, alpha[1], type = 6, na.rm = TRUE)),
      as.numeric(quantile(t, alpha[2], type = 6, na.rm = TRUE))
    )
  }
  t <- object$c[object$t.names,]
  # Bollen-Stine p-value: share of bootstrap draws whose chi-square exceeds the original chi-square
  p.bs <- sum(sapply(t, function(x) if(x > 0) 1 else 0)) / length(t)
  p.adj <- if(p.bs < 1) p.bs else 2
  if(!p.bs %in% c(0, 1)) {
    p.adj <- 0.0001
    repeat {
      # increase alpha until the upper CI bound of the chi-square difference drops below zero
      p.adj <- p.adj + 0.0001
      alpha <- 0.5 * c(p.adj, 2 - p.adj)
      ci.adj <- get.ci(t, alpha)
      if(sign(ci.adj[2]) == -1) break
    }
  }
  out <- paste(
    paste("bias            ", round(object$c["bias", 1], 3), sep = ""),
    paste("se              ", round(object$c["se", 1], 3), sep = ""),
    paste("ci              ", "[", round(object$c["ci.lower", 1], 3), ", ", round(object$c["ci.upper", 1], 3), "]", sep = ""),
    paste("p(adj)          ", format(round(p.adj / 2, 3), nsmall = 3), sep = ""),
    sep = "\n"
  )
  object$print <- paste(out, "\np(Bollen-Stine)", format(round(p.bs, 3), nsmall = 3), sep = " ")
  return(object)
}

res <- x.boot.out(
  HS.boot.bs,
  myFUN = myChisqDif,
  myArgs = list(chisq = fitMeasures(fit.bs, "chisq")),
  myFUN.post = myBollenStine
)

summary(res)

 

 


x.boot.r

Shu Fai Cheung

unread,
Jan 25, 2022, 12:33:24 AM
to lavaan
Thanks for sharing this set of functions!

I also drafted a function for a project, but it does one thing only: generating the percentile bootstrap CI for the standardized solution by extracting the bootstrap estimates already stored in the fit object if `se = "boot"` is used:


It is short, but it has some weaknesses. For example, it relies on accessing some internal components and also uses methods that are not officially documented or supported. Your functions are more reliable and less likely to break if the internal structure of lavaan changes in the future.
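A rough sketch of the general extraction idea only (not the drafted function itself): it reads the internal, undocumented @boot slot that lavaan fills when se = "bootstrap" and forms percentile CIs for the unstandardized free parameters; re-standardizing every draw is exactly the part that needs the unofficial methods mentioned above.

library(lavaan)

HS.model <- "
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
"
fit <- sem(HS.model, HolzingerSwineford1939, se = "bootstrap", bootstrap = 1000)

boot.est <- fit@boot$coef                                 # internal slot: one row per bootstrap draw, one column per free parameter
ci <- t(apply(boot.est, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE))
cbind(est = coef(fit), ci.lower = ci[, 1], ci.upper = ci[, 2])   # percentile CIs for the unstandardized free parameters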

I do not yet have time to continue working on that function. Maybe I will use yours when the need arises. Thanks.

Regards,
Shu Fai

HG Wells

unread,
Jan 25, 2022, 5:24:01 AM
to lavaan
Hello, thank you so much!!!

I really needed this, this is so helpful, thank you so much for your hard work and help.

Emanuele

Christian Arnold

unread,
Jan 29, 2022, 8:58:09 AM
to lavaan

lavaan community,

here is an additional example. It is about obtaining confidence intervals for indirect and total effects without having to create user-defined parameters (ab := a*b). This example is completely experimental. Please do not trust the results. I already know of some bugs (see below). The source code (get.effects.r) is bad style and urgently needs to be reworked. I still want to share the example and am happy to get feedback. In particular, I am interested in whether this solution could add value for you.

To test it, you need to save the attached file (get.effects.r) in your working directory. Here are a few examples:

source("x.boot.r")                  # Include x.boot

source("get.effects.r")             # Include get.effects


# Simulate a crazy model with two groups

pop.model.g1 <- "

f2 ~ 0.3 * f1

f3 ~ 0.5 * f1 + 0.2 * f2

f4 ~ 0.4 * f1 + 0.6 * f2 + 0.1 * f3

f5 ~ 0.1 * f2 + -0.2 * f4

f6 ~ 0.3 * f3 + 0.3 * f4 + -0.3 * f5

"

pop.model.g2 <- "

f2 ~ 0.1 * f1

f3 ~ 0.5 * f1 + 0.5 * f2

f4 ~ 0.3 * f1 + 0.2 * f2 + 0.4 * f3

f5 ~ 0.1 * f2 + -0.2 * f4

f6 ~ 0.3 * f3 + 0.3 * f4 + -0.7 * f5

"

 

s <- 1000

g <- c(rep("Group1", s), rep("Group2", s))

data <- rbind(

  simulateData(pop.model.g1, sample.nobs = 1000),

  simulateData(pop.model.g1, sample.nobs = 1000)

)

data <- cbind(data, g = c(rep("Group1", s), rep("Group2", s)))

 

 

 

# x.boot custom function
myEffects <- function(object, effects) {
  sapply(effects, function(x, object) {
    unname(eval(parse(text = x)))
  }, object)
}

 

# A simple mediation example (single group)
model <- "
f2 ~ f1
f3 ~ f1 + f2
"
set.seed(1)
fit <- sem(model, data[data$g == "Group1",])
boot <- x.boot(fit, R = 400, verbose = TRUE)

# x.boot results
x.boot.out(boot, myFUN = myEffects, myArgs = list(effects = get.effects(fit)), ci.type = "bca.simple")

# 2 groups mediation model
set.seed(1)
fit.groups <- sem(model, data, group = "g")
boot.groups <- x.boot(fit.groups, R = 400, verbose = TRUE)

# x.boot results
x.boot.out(boot.groups, myFUN = myEffects, myArgs = list(effects = get.effects(fit.groups)), ci.type = "bca.simple")

 

 

# Check against "native" lavaan solution
model.lavaan <- "
f2 ~ c(a1, a2) * f1
f3 ~ c(c1, c2) * f1 + c(b1, b2) * f2
G1.IND := a1 * b1
G1.TOT := a1 * b1 + c1
G2.IND := a2 * b2
G2.TOT := a2 * b2 + c2
"

fit.groups.native <- sem(model.lavaan, data, group = "g")
pe <- parameterEstimates(fit.groups.native)
pe[pe$op == ":=",]

 

 

# The crazy model (this could make the application interesting)
model.crazy <- "
f2 ~ f1
f3 ~ f1 + f2
f4 ~ f1 + f2 + f3
f5 ~ f2 + f4
f6 ~ f3 + f4 + f5
"

set.seed(1)
fit.groups.crazy <- sem(model.crazy, data[data$g == "Group1",])
boot.groups.crazy <- x.boot(fit.groups.crazy, R = 400, verbose = TRUE)

# x.boot results
x.boot.out(boot.groups.crazy, myFUN = myEffects, myArgs = list(effects = get.effects(fit.groups.crazy)), ci.type = "bca.simple")

 

The last example contains a lot of indirect effects. I did not check whether the application correctly identifies all of them. With a simple mediation model it is easy to define new parameters; with the model above I find it more difficult, which is why I created this application.

Note: The application does not work if cross-group constraints are used. Strictly speaking, the algorithm already fails if any parameter labels are defined. This restriction is not fundamental and would be solvable. Group differences are not calculated either; this could also be easily implemented (see the sketch below). The meanstructure is not taken into account.
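Until group differences are implemented, a hedged sketch of a workaround using lavaan's own defined parameters (extending the "native" check above; this is not part of get.effects):

model.diff <- "
f2 ~ c(a1, a2) * f1
f3 ~ c(c1, c2) * f1 + c(b1, b2) * f2
DIFF.IND := a1 * b1 - a2 * b2                 # group difference of the indirect effect
DIFF.TOT := (a1 * b1 + c1) - (a2 * b2 + c2)   # group difference of the total effect
"
fit.diff <- sem(model.diff, data, group = "g", se = "bootstrap", bootstrap = 1000)
pe <- parameterEstimates(fit.diff, boot.ci.type = "bca.simple")
pe[pe$op == ":=",]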

Before I completely rework the application (if I have the time), I would appreciate your feedback.

Regards

Christian

get.effects.r

Ibrahim Nasser

unread,
Feb 5, 2022, 4:41:59 PM
to lavaan

This should be converted into a package

Christian Arnold

unread,
May 6, 2023, 5:43:41 AM
to lavaan
x.boot.r