default sample size used when bootstrapping?


olivier pahud

Jun 29, 2016, 5:28:23 AM
to lavaan
What sample size does lavaan draw by default from my initial sample when I bootstrap with se = "boot"?

Can I define the sample size for each bootstrap iteration?
(how can I define it in the following two examples?)

fit.modelX <- sem(modelX,
                  data      = mydata,
                  estimator = "ML",
                  mimic     = "Mplus",
                  se        = "boot",
                  bootstrap = 5000,
                  verbose   = TRUE)

bootstrapLavaan(boot.object, R = 5000L, verbose = TRUE, FUN = "coef")

Terrence Jorgensen

Jun 30, 2016, 5:15:00 AM
to lavaan
What sample size does lavaan draw by default from my initial sample when I bootstrap with se = "boot"?

By definition, it is only bootstrapping if the same sample size is drawn (with replacement).  Drawing only a subset of the sample (albeit without replacement) would be a leave-k-out technique, like the jackknife (leave-1-out). 
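
To make the distinction concrete, here is a minimal base-R sketch. The toy vector `x` and the seed are assumptions for illustration only; this is not lavaan's internal code. A bootstrap sample draws n indices with replacement, whereas the jackknife drops one observation at a time with no resampling.

```r
set.seed(1)            # arbitrary seed, only for reproducibility
x <- rnorm(20)         # hypothetical sample of size n = 20
n <- length(x)

# Nonparametric bootstrap: n indices drawn WITH replacement
boot_idx  <- sample.int(n, size = n, replace = TRUE)
boot_mean <- mean(x[boot_idx])

# Jackknife (leave-1-out): n subsamples of size n - 1, no resampling
jack_means <- vapply(seq_len(n), function(i) mean(x[-i]), numeric(1))

length(boot_idx)       # same size as the original sample
length(jack_means)     # one estimate per omitted observation
```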

Can I define the sample size for each bootstrap iteration?

No.  Although it is possible to imagine a method in which subsets are resampled with replacement, I am not aware of any such method having been developed:


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Stas Kolenikov

Jun 30, 2016, 10:20:16 AM
to lav...@googlegroups.com
Terrence,

hm?? https://www.amazon.com/Subsampling-Springer-Statistics-Dimitris-Politis/dp/0387988548
-- that stuff has been around for a couple of decades.

Subsampling (drawing samples, with replacement, of a size smaller than
the original sample) is a common technique that is used when the
"mainstream" bootstrap fails. This happens in many irregular problems
(see e.g. http://onlinelibrary.wiley.com/doi/10.1002/cjs.5550340103/abstract),
so basically the situation where the bootstrap definitely works
properly is limited to the confidence intervals for the mean of the
i.i.d. data (well... if the underlying distribution *does* have a mean
-- Cauchy doesn't, for instance). Nearly any other situation requires
an actual proof that the bootstrap works. Subsampling works by adding
another layer of asymptotics and potentially a special rate of
subsampling (say subsample size k=O(n^{1/3}) or something like that)
when the rate of convergence for the statistic in question makes the
"mainstream" bootstrap fail.
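
As an illustration that is not taken from this thread: a textbook irregular case is the sample maximum, where a full-size bootstrap sample reproduces the observed maximum far too often. The base-R sketch below compares that with smaller resamples drawn with replacement (often called the m-out-of-n bootstrap). The uniform data, the number of replicates, and the choice m = floor(sqrt(n)) are all arbitrary assumptions, not a recommended rate.

```r
set.seed(2)                 # arbitrary seed
n <- 1000
x <- runif(n)               # hypothetical data; true upper bound is 1
m <- floor(sqrt(n))         # assumed subsample size, chosen only for illustration
R <- 2000                   # number of resamples

# Share of resamples (of a given size) whose maximum equals the sample maximum
share_at_max <- function(size) {
  mean(replicate(R, max(sample(x, size = size, replace = TRUE)) == max(x)))
}

p_full <- share_at_max(n)   # theory: 1 - (1 - 1/n)^n, roughly 0.63
p_sub  <- share_at_max(m)   # theory: 1 - (1 - 1/n)^m, roughly m / n
c(full = p_full, sub = p_sub)
```

The large point mass at the observed maximum is what breaks the full-size bootstrap here; shrinking the resample size makes that mass vanish as n grows.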

Awkward bootstrap situations in SEM include testing the overall
fit (for which the Bollen-Stine modification, which should have been
called the Beran-Srivastava modification, exists) and testing zero
variances (that just fails miserably at any rate; see
http://dx.doi.org/10.1037/1082-989X.13.2.150, with references to
Donald Andrews' work on the bootstrap for boundary situations).
What I continue to be personally uncomfortable with in SEM methodology
are situations where the covariance degrees of freedom exceed the
sample sizes (say a model with 30 variables and {30 \choose 2} = 435
covariances and sample size n=300). I suspect (but can't really
pinpoint) small-sample problems in these situations. Either the
mainstream bootstrap (which on average includes 63.2% of the distinct
original sample points; http://stats.stackexchange.com/q/96739/5739) or
subsampling (which intentionally reduces the sample size) would
potentially only make matters worse. There's a trove of dissertation
research in figuring that stuff out, though it would end up too technical
for the SEM journal yet not of general enough interest for JASA.
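
The two numbers in the paragraph above can be checked with a few lines of base R. This is only an arithmetic sketch; the Monte Carlo settings (seed, 2000 replicates) are arbitrary assumptions.

```r
p <- 30
n_cov <- choose(p, 2)          # distinct covariances among 30 variables: 435
n_mom <- p * (p + 1) / 2       # variances plus covariances: 465

n <- 300
# Expected share of distinct original rows in one bootstrap sample of size n
expected_unique <- 1 - (1 - 1 / n)^n

# Monte Carlo check of the same quantity
set.seed(3)
sim_unique <- mean(replicate(
  2000,
  length(unique(sample.int(n, n, replace = TRUE))) / n
))

c(expected = expected_unique, simulated = sim_unique)
```

Both values come out close to 1 - exp(-1), the 63.2% figure quoted above.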

-- Stas Kolenikov, PhD, PStat (ASA, SSC)
-- Principal Survey Scientist, Abt SRBI
-- Education Officer, Survey Research Methods Section of the American
Statistical Association
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer
-- http://stas.kolenikov.name

Terrence Jorgensen

Jul 1, 2016, 11:29:44 AM
to lavaan
Happy to be corrected, thanks Stas!  That Rodgers (1999) article I linked to is almost 20 years old, but there was obviously work going on already that he wasn't aware of.

In that case, Olivier, you should use the boot package instead of bootstrapLavaan() so that you can define a function that takes a subsample of a generic (i.e., same N) bootstrap sample.  That is, the function you provide to the "statistic" argument of boot() should have arguments for the data and the indices (rows) to resample from.  By default boot() is just taking the same N, so if you save a bootstrap sample in your user-defined function, you can take a subset of that and perform your analysis.  For example,

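# Statistic for boot(): 'indices' holds N row numbers drawn with replacement
bFunc <- function(data, indices, subN = nrow(data) - 1) {
  bootData <- data[indices, ]           # ordinary bootstrap sample (same N)
  subData <- bootData[seq_len(subN), ]  # keep only the first subN resampled rows
  fit <- lavaan(..., data = subData, ...)
  ...
}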
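
A runnable variant of the idea above might look like the following sketch. Everything specific in it is an assumption for illustration: it uses the HolzingerSwineford1939 example data and three-factor CFA that ship with lavaan rather than the model from this thread, the names `sub_coef`, `subN`, and `n_par` are made up, and `R = 20` and `subN = 200` are chosen only to keep it fast.

```r
library(boot)
library(lavaan)

# Example CFA from the lavaan documentation (not the model in this thread)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
dat   <- HolzingerSwineford1939[, paste0("x", 1:9)]
n_par <- length(coef(cfa(model, data = dat)))   # number of free parameters

# Statistic for boot(): 'indices' is a full-size resample with replacement;
# keeping its first subN entries gives a with-replacement subsample of size subN
sub_coef <- function(data, indices, subN) {
  subData <- data[indices[seq_len(subN)], , drop = FALSE]
  tryCatch(
    coef(cfa(model, data = subData)),
    error = function(e) rep(NA_real_, n_par)    # keep boot() running if a fit fails
  )
}

set.seed(4)
out <- boot(dat, statistic = sub_coef, R = 20, subN = 200)
dim(out$t)   # R rows, one column per free parameter
```

Note that boot() computes `out$t0` by calling the statistic with indices 1:N, so in this sketch `t0` is the fit to the first `subN` rows of the original data, not a full-sample estimate. Standard errors computed from `out$t` also reflect the smaller subsample size and would need rescaling before being compared with full-sample results.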

olivier pahud

Jul 4, 2016, 7:52:18 AM
to lavaan
Thanks for your answers and literature suggestions!