Hi Bruce,
Thanks for your helpful feedback. I have looked into this a little bit further and have the following follow-up questions/comments, marked with ">" below:
"In my case, the estimated logB from my model (8.2) is outside the range of the percentiles (14.09-24.96) and the median (19.47). How can I interpret this?"
Usually this would mean that the conditions for fitting the bootstrap model differ from those used for the original model. You need to consider not just the settings on the Model Options tab, but also the "minimum neighborhood size for an estimate" on the Output Options tab.
> I did indeed decrease the "minimum neighborhood size for an estimate" from 5 to 4 for the resampling, as otherwise the procedure was failing. The rest of the settings remained as in the original model. More on this below...
"Either HyperNiche crashed, or I got an error message about zero variance in one of the variables and advising me to increase sample size or decrease minimum neighborhood size or both."
Some data sets just don't work well with bootstrapping. To create an extreme artificial example: say that many rows of the data are identical and the response variable has 5 presences and 20 absences. There would be a high likelihood that a bootstrap sample would have the case of all the presences having identical values for the predictors.
> My data set has 525 SUs and the model was set up manually with a binary response (local mean), data:predictor ratio = 5, improvement criterion = 5%, minimum average neighborhood size (N*) = 21 (0.04 x SUs), and minimum neighborhood size required for an estimate = 5. Under these conditions the model returned logB = 8.2, N* = 25.6, and AUC = 0.65. It indicated that 35 SUs were in empty or too-small neighborhoods and that 490 SUs were in populated neighborhoods. Of these 490 SUs, 240 had presence values and 250 had absence values. Does this help in assessing whether this data set is not well suited for bootstrapping? For what it's worth, the results from the randomization test appeared to be solid:
8.200 = Fit to REAL DATA
Fit to RANDOMIZED DATA
20 = total number of runs
0 = runs equal to or better than observed fit
20 = runs with less than observed fit
0 = runs resulting in missing value indicator for fit
3.4723 = best fit from randomized data
-0.44432 = worst fit from randomized data
0.51761 = mean fit from randomized data
0.04761905 = p = proportion of randomized runs with fit > or = observed fit, i.e., p = (1 + no. runs >= observed)/(1 + no. runs)
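(For reference, the p-value on the last line follows directly from the run counts above; a minimal check of the arithmetic:)

```python
# Reproducing HyperNiche's randomization-test p-value from the counts
# reported above.
total_runs = 20         # total number of randomized runs
runs_ge_observed = 0    # runs equal to or better than the observed fit (8.200)

# The observed (real-data) run counts as one more possible outcome,
# hence the "+ 1" in both numerator and denominator.
p = (1 + runs_ge_observed) / (1 + total_runs)
print(round(p, 8))  # -> 0.04761905
```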
"And if this is the case, does this mean that I need to re-fit my original model with a minimum neighborhood size of 4 instead of 5 in order for the value of logB to be comparable with the results from the bootstrap resampling?"
Well, you could go that route, but I would personally rather use what I consider the best model for a given data set than change the conditions of the model to get the bootstrap to work.
> My rationale for arriving at the model setup described above was to first run free searches under conservative, medium, and aggressive automatic scenarios, and then assess the best models meeting those conditions that maximized logB. I selected a 4-predictor model and, after tuning and visually assessing the response curves/surfaces, decided to settle on a model with manual controls somewhere between medium and aggressive, per the setup given above. Beyond that, I don't have a required value for minimum neighborhood size (or for any of the other overfitting controls), and I wouldn't think that a difference of 1 in minimum neighborhood size would have these repercussions.
Indeed, I have created a few new models from scratch with minimum neighborhood size = 4, and they all return logB estimates in the range 7.9-8.28 (even after tuning), so I don't understand why the resampled logB values were consistently much higher than the estimated logB (the resampled minimum was 12.37, so the original model's logB was not even within the min-max range). Is it possible that I'm misinterpreting the output of the bootstrap resampling?
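So that I can double-check my reading of the resampling summary, this is the comparison I'm making (the bootstrap logB values below are invented placeholders, not my actual HyperNiche output):

```python
# Placeholder bootstrap logB estimates, invented for illustration only;
# the real numbers come from HyperNiche's resampling report.
boot_logB = [12.37, 14.09, 15.3, 17.0, 18.6, 19.47, 20.1, 21.9, 23.4, 24.96]
observed_logB = 8.2  # logB fitted to the full original data set

boot_sorted = sorted(boot_logB)
n = len(boot_sorted)
# Median of the bootstrap estimates (mean of the middle pair if n is even).
median = (boot_sorted[n // 2 - 1] + boot_sorted[n // 2]) / 2 if n % 2 == 0 else boot_sorted[n // 2]

inside_range = boot_sorted[0] <= observed_logB <= boot_sorted[-1]
print(f"bootstrap min/median/max: {boot_sorted[0]} / {median} / {boot_sorted[-1]}")
print(f"observed logB inside bootstrap min-max range: {inside_range}")
```

My understanding is that the original logB should normally fall within, or at least near, the bootstrap range; in my case it falls below even the minimum.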
Thanks very much!
Daniel
On Wednesday, October 5, 2016 at 8:06:16 AM UTC-7, Bruce McCune wrote: