I have a survey design with, let's say, 10 strata, and I want to derive variance estimates and confidence limits for each stratum separately using a bootstrap. Based on responses on the Google mailing list, the Windows help files and the literature, my understanding is that such a bootstrap would follow these resampling steps:
1. I would leave the strata fixed (i.e. not resample at the stratum level).
2. Within each stratum I would resample transects, with replacement, drawing the same number of transects as the original stratum contains.
3. Repeat steps 1 and 2 a large number of times and summarize the outputs of each iteration at the stratum level.
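The steps above can be sketched as follows. This is a plain Python illustration, not the Distance package or `bootdht`: the per-transect values are hypothetical encounter counts, and the `estimator` (here a simple mean) is a stand-in for what a real analysis would do on each replicate, namely refit the detection function and recompute the stratum abundance. The helper names `stratified_transect_bootstrap` and `percentile_ci` are hypothetical.

```python
import random
import statistics

def stratified_transect_bootstrap(transects_by_stratum, estimator, nboot=1000, seed=1):
    """Resample transects with replacement within each fixed stratum.

    transects_by_stratum: dict of stratum label -> list of per-transect values.
    estimator: function mapping a list of transect values to one estimate.
    Returns dict of stratum label -> list of nboot replicate estimates.
    """
    rng = random.Random(seed)
    replicates = {stratum: [] for stratum in transects_by_stratum}
    for _ in range(nboot):
        for stratum, values in transects_by_stratum.items():
            # Strata stay fixed; draw as many transects as the stratum really has
            resample = rng.choices(values, k=len(values))
            replicates[stratum].append(estimator(resample))
    return replicates

def percentile_ci(replicates, level=0.95):
    """Simple percentile interval from a list of bootstrap replicates."""
    ordered = sorted(replicates)
    n = len(ordered)
    lower = ordered[int((1 - level) / 2 * n)]
    upper = ordered[max(int((1 + level) / 2 * n) - 1, 0)]
    return lower, upper

# Hypothetical per-transect encounter counts for three strata
counts = {"A": [3, 5, 0, 2, 4], "B": [10, 7, 9], "C": [2, 2, 2]}
reps = stratified_transect_bootstrap(counts, statistics.mean, nboot=500)
for stratum, values in reps.items():
    print(stratum, round(statistics.stdev(values), 3), percentile_ci(values))
```

Because every replicate keeps each stratum's original number of transects, the survey effort in each stratum is the same in every resample, which is what makes the per-stratum spread interpretable as that stratum's sampling variance.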
Where I do not need stratum-specific estimates, but only the overall survey variance/CI, I would leave out step 1.
However, the source code of the bootdht function uses a different resampling strategy when its arguments are set to resample_strata = TRUE, resample_transects = TRUE:
1. Strata are NOT fixed but are resampled with replacement in each iteration. The bootstrap sample is balanced in the sense that the number of strata in the resample equals the number of strata in the survey (though they are a random selection of them). This resampled stratum-level set is assigned to the object "bootdat".
2. The second stage selects samples with replacement. Resampling is done from the actual survey set (provided as a flatfile, dat), not from the resampled set of step 1 (bootdat), with a resample size equal to the original number of samples. However, the subsequent steps in the code remove any transects not found in the strata selected in step 1. As a result, each resample is unbalanced by stratum: 1) the number of transects in each stratum is not equal to that in the survey, and 2) the total number of transects varies from one resample to the next.
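To see the consequence of the two-stage scheme as described above, here is a plain Python simulation of that description. It follows the reading of the code given in this message, not a copy of `bootdht`, and the survey (10 hypothetical strata of 5 transects each) and the name `resample_as_described` are illustrative assumptions. Repeated stratum labels from step 1 are collapsed into a set, because the description only says that transects outside the selected strata are removed.

```python
import random
from collections import Counter

def resample_as_described(transects, nboot=1000, seed=1):
    """Two-stage resample following the description of bootdht above.

    transects: list of (stratum, transect_id) tuples for the whole survey.
    Returns one Counter per replicate: stratum -> number of transects kept.
    """
    rng = random.Random(seed)
    strata = sorted({stratum for stratum, _ in transects})
    replicates = []
    for _ in range(nboot):
        # Step 1: draw stratum labels with replacement, same number as the survey;
        # duplicates collapse because only membership is used below
        chosen = set(rng.choices(strata, k=len(strata)))
        # Step 2: draw transects with replacement from the FULL survey list,
        # not from the strata chosen in step 1
        drawn = rng.choices(transects, k=len(transects))
        # Discard drawn transects whose stratum was not chosen in step 1
        kept = [t for t in drawn if t[0] in chosen]
        replicates.append(Counter(stratum for stratum, _ in kept))
    return replicates

# Hypothetical survey: 10 strata with 5 transects each
survey = [(f"S{s}", f"S{s}-T{t}") for s in range(1, 11) for t in range(1, 6)]
reps = resample_as_described(survey, nboot=200)
totals = [sum(c.values()) for c in reps]
print(min(totals), max(totals))
```

Under this reading, both the per-stratum transect counts and the total number of transects change between replicates, which is the imbalance described in point 2.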
The approach of bootdht seems rather odd (resampling strata with replacement), as it doesn't really mirror the data collection process when surveying across different strata. Further, when applying a hierarchical resampling strategy (strata, then samples), isn't it counterintuitive to resample from the original set at both levels and end up with varying levels of effort in each resample?
I probably do not fully understand the applied resampling strategy, and I am somewhat unsure whether the option provided in the Distance software/literature is equivalent to the bootdht function in R. Is there any reference for the type of resampling implemented in the R package? Finally, in practice, is setting resample_strata = TRUE, resample_transects = TRUE the correct approach when the objective is to get stratum-level CIs?
As always, many thanks for the incredible support and time you provide through this mailing list! Much appreciated.
I agree with your concern over how to treat strata when resampling. As I read the source code for the `bootdht` function (version 1.0.2), the default for the `resample_strata` argument is FALSE, rather than TRUE as you suggest.
function (model, flatfile, resample_strata = FALSE, resample_obs = FALSE,
resample_transects = TRUE, nboot = 100, summary_fun = bootdht_Nhat_summarize,
convert.units = 1, select_adjustments = FALSE, sample_fraction = 1,
progress_bar = "base")
To obtain variance estimates for
stratum-specific abundance estimates with `bootdht`, I suggest
you use the default arguments of the function, namely
`resample_strata=FALSE` and `resample_transects=TRUE`.
--
Eric Rexstad
Centre for Ecological and Environmental Modelling
University of St Andrews
St Andrews is a charity registered in Scotland SC013532