Bootstrapping standardized coefficients

Itzik Fradkin

Oct 31, 2017, 9:33:02 AM
to lavaan
Dear Lavaan users/experts.

I'm trying to build a pretty simple model, but my main interest is in the correlation between two latent variables (rather than the covariance). I'm interested in both the correlation and its confidence interval. The output does not include a confidence interval for the correlation, so I'm trying to use bootstrapping. Can I bootstrap the standardized coefficients (e.g. the correlation), and if so, how?

Thanks a lot!
Isaac.

Mauricio Garnier-Villarreal

Nov 1, 2017, 1:09:58 AM
to lavaan
Isaac

Here is some example code for something like that. Note that if you have a CFA and use std.lv=TRUE, the covariance equals the correlation.


####
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)
summary(fit, fit.measures = TRUE)

myFun <- function(x) {
  ## get all parameter estimates, including standardized
  pts <- parameterEstimates(x, standardized = TRUE)
  ## rows 22:24 hold the three latent correlations in this model;
  ## return their standardized (std.all) estimates
  return(pts[22:24, "std.all"])
}

l.boot <- bootstrapLavaan(fit, R = 100, type = "bollen.stine",
                          FUN = myFun)
l.boot ## a 3-column matrix: each row is a bootstrap draw, each column a different correlation

## you can estimate the mean, sd, and quantiles with something like this
apply(l.boot, 2, FUN = function(x) c(mean = mean(x), sd = sd(x),
                                     quantile(x, probs = c(.025, .5, .975))))
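
Selecting rows by position (22:24) only works for this exact model. A minimal sketch of selecting the latent correlations by name instead, assuming a model with no residual covariances among the latent variables; latentCors is a hypothetical helper, not part of lavaan or of the code above:

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE)

## Hypothetical helper: return the standardized covariances (correlations)
## between distinct latent variables, labelled by the pair they belong to.
latentCors <- function(x) {
  pts  <- parameterEstimates(x, standardized = TRUE)
  lv   <- lavNames(x, type = "lv")            # names of the latent variables
  keep <- pts$op == "~~" & pts$lhs != pts$rhs &
          pts$lhs %in% lv & pts$rhs %in% lv
  out <- pts$std.all[keep]
  names(out) <- paste(pts$lhs[keep], pts$rhs[keep], sep = "~~")
  out
}

latentCors(fit)
```

A function like this could be passed as FUN to bootstrapLavaan in place of myFun.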


bye

Itzik Fradkin

Nov 5, 2017, 9:56:05 AM
to lavaan
Thanks a lot! That seems to work.

I hope it's ok if I ask two follow up questions:

1) When bootstrapping the correlation coefficient, many of the bootstrapped values seem to be unreasonable (i.e. < -1 or > 1). How would you suggest dealing with that (if my end goal is to obtain a CI for the correlation)?

2) How many bootstrap samples would you say are necessary? Does this depend on which bootstrapping technique I am using?


Thanks a lot
Isaac.

Felix Fischer

Nov 6, 2017, 7:38:08 AM
to lavaan
Hi,

1) Given the general logic of bootstrapping, there should be no out-of-bounds correlations. I am not familiar with Bollen-Stine, but my best guess would be that you have a misspecified model or you're extracting the wrong parameters.

2) 500 or 1000 are common.

Best, Felix

Terrence Jorgensen

Nov 6, 2017, 7:45:08 AM
to lavaan
1) When bootstrapping the correlation coefficient, many of the bootstrapped values seem to be unreasonable (i.e. < -1 or > 1). How would you suggest dealing with that (if my end goal is to obtain a CI for the correlation)?

Sampling error can push estimates that lie near the boundary across it. Is the correlation out of bounds in the original sample? The bootstrap CI should reflect the true bootstrap sampling distribution, so include all samples. As long as the CI includes plausible values (correlations within +/- 1), you cannot reject the null hypothesis that the true population value is a correlation with a reasonable value.
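
A minimal sketch of that check in base R. The draws below are simulated, hypothetical stand-ins for real bootstrap output (a correlation whose estimate sits close to 1); the point is only that the percentile CI is computed from all draws, including those beyond +/- 1:

```r
set.seed(42)

## Hypothetical bootstrap draws of a correlation estimated near the upper bound
boot_r <- rnorm(5000, mean = 0.93, sd = 0.05)

## Share of draws outside the admissible range (kept, not discarded)
prop_out <- mean(abs(boot_r) > 1)

## Percentile 95% CI from the full bootstrap distribution
ci <- quantile(boot_r, probs = c(0.025, 0.975))

## Does the interval contain any admissible correlation?
ci_overlaps_admissible <- ci[[1]] <= 1 && ci[[2]] >= -1

prop_out
ci
ci_overlaps_admissible
```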


2) How many bootstrap samples would you say are necessary? Does this depend on which bootstrapping technique I am using?

Since you are using the bootstrap distribution to estimate tail quantities (which are unlikely by definition), you need a lot.  If you are calculating a 95% CI, then you are estimating it from the bottom and top 2.5% of the data.  If you have 1000 bootstrap samples, then your estimate of each confidence limit is based on 25 bootstrap samples.  Would you be comfortable estimating a population mean with only N = 25 people, especially when it costs you nothing but computation time to "gather" a larger sample?  I have seen some (e.g., Preacher and Hayes) advise 5000 bootstrap samples, which bases the confidence limits on 125 in each tail.  With that many, I would be more comfortable that sampling error in the confidence limits is sufficiently small, regardless of whether the percentile or bias-corrected bootstrap CI is calculated.
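
The arithmetic above can be reproduced in a few lines of base R; tailDraws is a hypothetical helper name used only for illustration:

```r
## Expected number of bootstrap draws beyond each limit of a two-sided CI
tailDraws <- function(R, level = 0.95) R * (1 - level) / 2

R_values <- c(500, 1000, 5000)
data.frame(R = R_values, draws_per_tail = tailDraws(R_values))
```

For R = 1000 this gives about 25 draws per tail, and for R = 5000 about 125, matching the counts quoted in the reply.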

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Itzik Fradkin

Nov 9, 2017, 5:55:53 AM
to lavaan
Thanks for the detailed answers. I'll explain my model in more detail:

It's a simple structural model where, instead of a full measurement model, I have a good estimate of the reliability, which I use to specify the residual variances. So let's say I have observed variables O1 and O2 for N participants, and I want to estimate the true correlation between T1 and T2, as well as the predicted values of T1 and T2 for each participant. I know the reliabilities of both variables, so I can specify:
In R:

E1=(1-rel1)*var(O1)
E2=(1-rel2)*var(O2)

and in lavaan (by using paste to insert E1 and E2):

T1=~1*O1
T2=~1*O2
O1~~E1*O1
O2~~E2*O2
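
A minimal sketch of how this specification might look end to end. The data are simulated and the reliabilities (0.8) are assumed values chosen for illustration; only the model structure comes from the description above:

```r
library(lavaan)

set.seed(1)
n <- 200
## Hypothetical data: two correlated true scores plus measurement error
t1 <- rnorm(n)
t2 <- 0.5 * t1 + sqrt(0.75) * rnorm(n)
dat <- data.frame(O1 = t1 + rnorm(n, sd = 0.5),
                  O2 = t2 + rnorm(n, sd = 0.5))
rel1 <- 0.8  # assumed known reliabilities
rel2 <- 0.8

## Error variances implied by the reliabilities
E1 <- (1 - rel1) * var(dat$O1)
E2 <- (1 - rel2) * var(dat$O2)

## Paste the numeric values into the syntax so the residual variances are fixed
model <- paste0("T1 =~ 1*O1\n",
                "T2 =~ 1*O2\n",
                "O1 ~~ ", E1, "*O1\n",
                "O2 ~~ ", E2, "*O2\n")

fit <- sem(model, data = dat)

## Model-implied correlation between the latent variables
r_true <- lavInspect(fit, "cor.lv")["T1", "T2"]
r_obs  <- cor(dat$O1, dat$O2)
c(observed = r_obs, latent = r_true)
```

Because the error variance is removed from each observed variance, the latent correlation comes out larger in magnitude than the observed one.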


So my question is what would be the best way to get CIs for the correlation rT1T2, and for the individuals' estimates of T1i and T2i?
I'm not sure exactly how the bootstrap function works in lavaan, but I understand now that it might use incorrect estimates of E1 and E2 unless these values are also bootstrapped in some way. I was able to work out a simple bootstrap myself: sampling with replacement from the dataset, calculating E1 and E2, and running the sem model, repeated R times. This may work for the correlation, but surely it won't work well for getting predicted estimates for individual participants (lavPredict), because in a simple bootstrap I get different participants each time...
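
A minimal sketch of the kind of manual bootstrap described above, for the correlation only. This is not the poster's actual code: the data are simulated, the reliabilities are assumed, and fitCor is a hypothetical helper. E1 and E2 are recomputed inside every resample:

```r
library(lavaan)

set.seed(2017)
n <- 200
## Hypothetical data standing in for the real O1 and O2
t1 <- rnorm(n)
t2 <- 0.5 * t1 + sqrt(0.75) * rnorm(n)
dat <- data.frame(O1 = t1 + rnorm(n, sd = 0.5),
                  O2 = t2 + rnorm(n, sd = 0.5))
rel1 <- 0.8  # assumed known reliabilities
rel2 <- 0.8

## Hypothetical helper: fix the error variances from the data set it is given,
## fit the model, and return the latent correlation (NA if the fit fails).
fitCor <- function(d, rel1, rel2) {
  E1 <- (1 - rel1) * var(d$O1)
  E2 <- (1 - rel2) * var(d$O2)
  model <- paste0("T1 =~ 1*O1\n",
                  "T2 =~ 1*O2\n",
                  "O1 ~~ ", E1, "*O1\n",
                  "O2 ~~ ", E2, "*O2\n")
  fit <- tryCatch(sem(model, data = d, se = "none"),
                  error = function(e) NULL)
  if (is.null(fit) || !lavInspect(fit, "converged")) return(NA_real_)
  lavInspect(fit, "cor.lv")["T1", "T2"]
}

R <- 100  # kept small so the sketch runs quickly; the thread advises far more
boot_r <- replicate(R, {
  idx <- sample(nrow(dat), replace = TRUE)   # resample participants
  fitCor(dat[idx, ], rel1, rel2)
})

## Percentile 95% CI for the latent correlation
quantile(boot_r, probs = c(0.025, 0.975), na.rm = TRUE)
```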

If anyone has any idea what would be the best way to deal with this issue - I'll be very thankful.

Isaac.