se = "bootstrap" and fixed.x


Shu Fai Cheung

Apr 18, 2022, 12:58:14 PM
to lavaan
May I ask a quick question about lavaan's bootstrapping se/ci?

- When fixed.x = TRUE, how are the bootstrap samples drawn?

In multiple regression, if the predictors are treated as fixed, residual bootstrapping is typically used. Is this how lavaan does bootstrapping when fixed.x = TRUE (the default in lavaan), i.e., fixing the values of the x-variables across bootstrap samples?
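For concreteness, here is a minimal base-R sketch of residual bootstrapping in ordinary multiple regression (an illustration of the general technique, not of lavaan's implementation): the x-values stay fixed across replications and only the residuals are resampled.

```r
# Residual bootstrap for a simple regression: x is held fixed,
# only the model residuals are resampled.
set.seed(1234)
n <- 50
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)

fit  <- lm(y ~ x)
yhat <- fitted(fit)
res  <- resid(fit)

R <- 1000
boot_b <- replicate(R, {
  y_star <- yhat + sample(res, n, replace = TRUE)  # x stays fixed
  coef(lm(y_star ~ x))["x"]
})
sd(boot_b)  # bootstrap SE of the slope, conditional on x
```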

I checked the "se" section of the help page for lavOptions, but it does not mention whether fixed.x is taken into account.

-- Shu Fai

Shu Fai Cheung (張樹輝)

Feb 2, 2023, 1:19:35 AM
to lavaan
This is a follow-up to a question I posted a while ago.

I did some experiments, and it seems that when fixed.x = TRUE (the default for a path model) and se = "bootstrap" are used, the variances and covariances of the x-variables are fixed to their sample values across bootstrap samples, as expected when they are treated as fixed.

However, how are the y-variables sampled? In multiple regression, if the predictors are treated as fixed, bootstrapping can be done on the residuals. I am not sure whether this is what lavaan does when se = "bootstrap" and fixed.x are used together.

In this setting, for a path model with only observed variables, is the bootstrapping residual bootstrapping, using the residuals of the y-variables as in multiple regression?

If not, does it mean that the y-variables themselves, rather than their residuals, are used in bootstrapping? That sounds strange, because it would be analogous to bootstrapping only the outcome variable in multiple regression.

-- Shu Fai

Yves Rosseel

Feb 5, 2023, 10:44:02 AM
to lav...@googlegroups.com
On 2/2/23 07:19, Shu Fai Cheung (張樹輝) wrote:
> However, how are the y-variables sampled? In multiple regression, if the
> predictors are treated as fixed, bootstrapping can be done on the
> residuals. I am not sure whether this is what lavaan does when se =
> "bootstrap" and fixed.x are used together.

Nothing special. It just samples the rows of the original sample as
usual, and for each fit, we use fixed.x = TRUE.
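In other words, each replication is an ordinary case (pairs) bootstrap; schematically (Data and model here are placeholders, not lavaan internals):

```r
library(lavaan)

# One naive-bootstrap replication: resample whole rows (x and y together),
# then refit the same model with fixed.x = TRUE on the resampled data.
one_rep <- function(Data, model) {
  idx   <- sample(nrow(Data), replace = TRUE)
  fit_b <- sem(model, data = Data[idx, ], fixed.x = TRUE)
  coef(fit_b)
}
```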

In that sense, using fixed.x = TRUE or fixed.x = FALSE has little impact
on the bootstrap standard errors of the regression coefficients. That
also seems true for the analytic standard errors. Consider these three
examples:

library(lavaan)
model <- 'x9 ~ x1 + x2 + x3'  # the model implied by the output below

fit1 <- sem(model, data = HolzingerSwineford1939, fixed.x = FALSE)
parameterEstimates(fit1) |> subset(op == "~")

fit2 <- sem(model, data = HolzingerSwineford1939, fixed.x = TRUE)
parameterEstimates(fit2) |> subset(op == "~")

fit3 <- sem(model, data = HolzingerSwineford1939, conditional.x = TRUE)
parameterEstimates(fit3) |> subset(op == "~")

which yields the same estimates all three times:

lhs op rhs est se z pvalue ci.lower ci.upper
1 x9 ~ x1 0.254 0.051 4.998 0.000 0.155 0.354
2 x9 ~ x2 0.049 0.048 1.025 0.305 -0.045 0.144
3 x9 ~ x3 0.160 0.053 3.004 0.003 0.056 0.265


Having said this, at the beginning of the file lav_bootstrap.R, I wrote
(circa 2012):

"Question: if fixed.x=TRUE, should we not keep X fixed, and bootstrap Y
only, conditional on X??"

So I wondered about this too...

Yves.

Shu Fai Cheung (張樹輝)

Feb 5, 2023, 11:54:16 AM
to lavaan
Thanks a lot for your explanation! So it seems that, for se = "bootstrap", whether fixed.x is TRUE or FALSE does not matter.

However, could it be an issue when users call bootstrapLavaan() directly and compute something that involves statistics related to the x-variables? Here is an illustration:

library(lavaan)
#> This is lavaan 0.6-13
#> lavaan is FREE software! Please report any bugs.

set.seed(89415)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
m <- .7 * x1 + sqrt(1 - .7^2) * rnorm(n)
y <- .4 * m + .3 * x1 + sqrt(1 - .4^2 - .3^2) * rnorm(n)
dat <- data.frame(x1, x2, m, y)
dat <- data.frame(scale(dat, scale = FALSE))
head(dat)
#>           x1          x2           m           y
#> 1  0.1700550  1.62868780  1.01709713  0.72425196
#> 2  0.3696030 -0.05364172  0.59204034  3.25913778
#> 3  1.1762182  0.39409649  2.09291261  0.83745128
#> 4 -0.6597227 -1.48842864  0.07530407 -0.08926538
#> 5  0.6730548 -0.21396188  1.19325476  2.05601816
#> 6 -2.0949939 -0.84634186 -1.65592095 -1.71790067

mod <-
"
m ~ x1 + x2
y ~ m + x1 + x2
"

fit_fixed.x <- sem(model = mod, data = dat, fixed.x = TRUE)

get_x <- function(fit) {
  est_p <- c(1:3, 8:10)
  parameterEstimates(fit)[est_p, "est"]
}
est_p <- c(1:3, 8:10)
est_names <- apply(parameterEstimates(fit_fixed.x)[, 1:3], 1,
                   paste0, collapse = "")[est_p]
boot_fixed.x <- bootstrapLavaan(fit_fixed.x, R = 100,
                                FUN = get_x,
                                iseed = 598327)
colnames(boot_fixed.x) <- est_names
# First bootstrap replications: the variances and covariance of x1 and x2
# are fixed at their sample values
head(boot_fixed.x, 3)
#>           m~x1        m~x2       y~m    x1~~x1     x1~~x2   x2~~x2
#> [1,] 0.6751240 -0.10585125 0.4970804 0.9122536 0.08525006 1.056212
#> [2,] 0.5492521  0.01722119 0.3113083 0.9122536 0.08525006 1.056212
#> [3,] 0.7429272 -0.19092327 0.1215204 0.9122536 0.08525006 1.056212
# The same six estimates in the original fit
parameterEstimates(fit_fixed.x)[est_p, "est"]
#> [1]  0.62553604 -0.04867653  0.28130799  0.91225364  0.08525006  1.05621202

fit_random.x <- sem(model = mod, data = dat, fixed.x = FALSE)
boot_random.x <- bootstrapLavaan(fit_random.x, R = 100,
                                 FUN = get_x,
                                 iseed = 598327)
colnames(boot_random.x) <- est_names
# First bootstrap replications: the variances and covariance of x1 and x2
# now vary across replications
head(boot_random.x, 3)
#>           m~x1        m~x2       y~m    x1~~x1     x1~~x2   x2~~x2
#> [1,] 0.6751240 -0.10585124 0.4970804 0.8123582 0.17905381 1.005422
#> [2,] 0.5492521  0.01722119 0.3113083 0.8331755 0.08982125 1.132758
#> [3,] 0.7429272 -0.19092328 0.1215204 0.7599992 0.02083548 1.075500


As you explained, the estimates of the free parameters (the three regression coefficients extracted above) are the same in each corresponding bootstrap sample, whether fixed.x is TRUE or FALSE.

However, the variances and covariances of the x-variables are indeed (correctly) fixed in each bootstrap sample when fixed.x = TRUE.

On the one hand, this is what we expect when we set fixed.x to TRUE.

On the other hand, with the same seed, comparing corresponding bootstrap samples under fixed.x = TRUE (in boot_fixed.x) and fixed.x = FALSE (in boot_random.x), it seems strange to get identical estimates for the free parameters when the variances and covariances of the x-variables differ.

Nevertheless, I do not yet know whether and when things could go wrong. Maybe it is not an issue in common scenarios.

-- Shu Fai

Terrence Jorgensen

Feb 12, 2023, 2:36:33 AM
to lavaan
> it sounds strange to have the same parameter estimates for the free parameters given that the variances and covariances of the x-variables are different.

I agree, a residual bootstrap would make more sense, but wouldn't that also make it consistent with conditional.x = TRUE? I'm not sure what a bootstrap would look like with fixed.x = TRUE but conditional.x = FALSE. Maybe when the fixed.x variables are categorical (e.g., in an experimental design, where X is truly fixed), we could bootstrap within each group separately.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Yves Rosseel

Feb 12, 2023, 8:41:27 AM
to lav...@googlegroups.com
We should also make a distinction between using the bootstrap to assess
the variability of point estimates (i.e., standard errors) on the one
hand, and using the bootstrap to compute a p-value for testing a
hypothesis on the other.

I believe that for the former case (standard errors), there should be no
difference between fixed.x = TRUE or fixed.x = FALSE, as the x-structure
is always saturated, and the point estimates for the regression
coefficients will always be the same. (Note that in linear regression,
fixed.x = TRUE is the default, even if the sampling model clearly
dictates that the x-values should change across samples. We use fixed.x
= TRUE anyway because it doesn't matter and it is much more convenient.)

In the latter case (p-values), we need to transform the data (e.g.,
using the Bollen-Stine approach) so that it aligns with the null
hypothesis. In the regression world, this is what is done when
bootstrapping the (model-based) residuals.
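In plain regression terms, such a null-transformed residual bootstrap might look like the following base-R sketch (testing H0: slope = 0; an illustration of the idea, not lavaan's Bollen-Stine code):

```r
# Residual bootstrap p-value for H0: slope = 0.
# Fit the null model, resample its residuals, refit the full model.
set.seed(5678)
n <- 50
x <- rnorm(n)
y <- 2 + 0.3 * x + rnorm(n)

t_obs <- summary(lm(y ~ x))$coefficients["x", "t value"]

fit0 <- lm(y ~ 1)   # model under the null: no effect of x
res0 <- resid(fit0)

R <- 2000
t_star <- replicate(R, {
  y_star <- fitted(fit0) + sample(res0, n, replace = TRUE)
  summary(lm(y_star ~ x))$coefficients["x", "t value"]
})
mean(abs(t_star) >= abs(t_obs))  # two-sided bootstrap p-value
```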

Yves.

Yves Rosseel

Feb 12, 2023, 8:54:29 AM
to lav...@googlegroups.com
On 2/5/23 17:54, Shu Fai Cheung (張樹輝) wrote:
> However, would it be an issue when users use bootstrapLavaan() directly,
> and do something that involves statistics related to the x-variables?

Potentially, but then again, one could argue that extracting x-related
parameters is strange if you assume fixed.x = TRUE, as they are not even
parameters in that case.

We can only hope that users will use bootstrapLavaan() wisely in that
respect.

Yves.

Shu Fai Cheung (張樹輝)

Feb 17, 2023, 8:53:15 PM
to lavaan
Thanks, Terrence and Yves, for your comments and advice. This and related issues have come up for me repeatedly because I teach both AMOS and lavaan, sometimes in the same class, and AMOS has no "fixed.x = TRUE" option (users have to impose it manually if they want it), which leads to differences in some results. I don't know why, but textbooks on SEM rarely seem to discuss this issue (or maybe some do and I missed them).

-- Shu Fai