Interpretation of bootstrap resampling for logB and related issues


Daniel Palacios

Oct 5, 2016, 12:38:58 AM
to HyperNiche and NPMR
After a few days my machine just returned the results of bootstrap resampling for my "final model", for the purpose of describing the stability of the logB statistic. I am following the HyperNiche2 built-in help page on the topic to interpret my own results. Below I'm pasting the relevant content from the example in the help (for xR²) and then my results (for logB):

"Example Bootstrap Results
In the following example, a data set of 868 sample units was sampled 100 times, with replacement, each sample consisting of 868 picks. The variation in fits of the model to the bootstrap samples is described by the mean, variance, extremes, and percentiles of xR². The fit is quite stable, 95% of the time falling within the range of 0.845 to 0.876 (the 5th and 95th percentiles). The median fit was 0.8595."
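
For readers who want to see the procedure spelled out, here is a minimal sketch of the kind of resampling the help text describes: draw n rows with replacement, recompute a fit statistic, and summarize the resulting distribution. Everything in it is an assumption made for illustration: the function names (`bootstrap_fit`, `r_squared`, `summarize`), the synthetic data, and above all the fit statistic, which is an ordinary straight-line R² and not HyperNiche's cross-validated xR² or logB.

```python
import numpy as np

def bootstrap_fit(x, y, fit_stat, n_boot=100, seed=0):
    """Resample rows with replacement and recompute a fit statistic each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fits = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n picks, with replacement
        fits[b] = fit_stat(x[idx], y[idx])
    return fits

def r_squared(x, y):
    """Stand-in fit statistic: R^2 of a straight-line fit (not HyperNiche's xR^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()

def summarize(fits):
    """Mean, variance, extremes, and percentiles of the bootstrap fits."""
    p5, p50, p95 = np.percentile(fits, [5, 50, 95])
    return {
        "mean": fits.mean(),
        "variance": fits.var(ddof=1),
        "min": fits.min(),
        "max": fits.max(),
        "p5": p5,
        "median": p50,
        "p95": p95,
    }

# Synthetic data with 868 sample units, matching only the size of the help example.
data_rng = np.random.default_rng(1)
x = data_rng.uniform(0, 10, 868)
y = 2.0 * x + data_rng.normal(0, 2.5, 868)

fits = bootstrap_fit(x, y, r_squared, n_boot=100)
summary = summarize(fits)
original = r_squared(x, y)  # fit of the model to the unresampled data

print(summary)
print("original fit:", original)
print("inside 5th-95th percentile range:", summary["p5"] <= original <= summary["p95"])
```

The last line makes the comparison explicit: the fit to the original data is checked against the bootstrap percentiles rather than left out of the summary.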

In my case, my data set has 525 SUs and the estimated logB for my model was 8.2. The bootstrap samples indicated that logB fell within the range of 14.09 to 24.96 (the 5th and 95th percentiles) 90% of the time. The median fit was 19.47, the mean was 19.45, and the variance was 10.16.

Going back to the example, I can see that the median fit from the bootstrap samples falls in the range of the 5th and 95th percentiles. But shouldn't the estimated xR² from the model being evaluated (not listed in the help example) also be included in this comparison? And what is the basis for determining that a fit is "quite stable"? Is it that the range of the percentiles is narrow around the median value of the fits? I'm just trying to find the appropriate words to use in my case for determining the stability of my solution.

In my case, the estimated logB from my model (8.2) is outside the range of the percentiles (14.09-24.96) and the median (19.47). How can I interpret this?
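
To put the size of this discrepancy in numbers, the short calculation below uses only the values reported in this post. It expresses the gap between the original logB and the bootstrap mean in units of the bootstrap standard deviation (the square root of the reported variance). The variable names are illustrative, and treating the bootstrap distribution as roughly symmetric is an assumption.

```python
import math

# Values reported in the post
original_logb = 8.2
boot_mean = 19.45
boot_variance = 10.16
p5, p95 = 14.09, 24.96

boot_sd = math.sqrt(boot_variance)
z = (original_logb - boot_mean) / boot_sd  # distance in bootstrap SDs
inside_range = p5 <= original_logb <= p95

print(f"bootstrap SD = {boot_sd:.2f}")
print(f"original estimate is {z:.2f} SDs from the bootstrap mean")
print(f"inside 5th-95th percentile range: {inside_range}")
```

The original estimate sits about 3.5 bootstrap standard deviations below the bootstrap mean, which is far more than sampling noise around a shared central value would usually produce.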

On a related note: my machine failed to conduct the bootstrap resampling when using the overfitting controls used in the original model formulation. Either HyperNiche crashed, or I got an error message about zero variance in one of the variables and advising me to increase sample size or decrease minimum neighborhood size or both. So I decreased the minimum neighborhood size from 5 for the original model to 4 for the bootstrap resampling, and that worked. But my question now is: did this change in minimum neighborhood size affect the values of the median and percentile range? And if this is the case, does this mean that I need to re-fit my original model with a minimum neighborhood size of 4 instead of 5 in order for the value of logB to be comparable with the results from the bootstrap resampling?

Thanks for any advice,

Daniel

Bruce McCune

Oct 5, 2016, 11:06:16 AM
to hyper...@googlegroups.com
Daniel, I pasted your Q's and my A's below:


"But shouldn't the estimated xR² from the model being evaluated (not listed in the help example) also be included in this comparison?"
Yes.

"And what is the basis for determining that a fit is "quite stable"? Is it that the range of the percentiles is narrow around the median value of the fits?"
Yes and yes. Ultimately this is subjective, like "Is R² = 0.xx big?"
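
One possible way to describe "narrow around the median" with a number is the width of the 5th to 95th percentile range divided by the median. This is only a sketch of one convention, not something HyperNiche reports, and the function name `relative_width` is made up for illustration. Because xR² and logB are on different scales, the two values are not strictly comparable; they are shown side by side only to illustrate the calculation.

```python
def relative_width(p5, p95, median):
    """Width of the 5th-95th percentile range as a fraction of the median."""
    return (p95 - p5) / abs(median)

# Help-page example for xR^2
help_example = relative_width(0.845, 0.876, 0.8595)

# Bootstrap results for logB reported earlier in this thread
thread_logb = relative_width(14.09, 24.96, 19.47)

print(f"help example (xR^2): {help_example:.3f}")
print(f"this thread (logB):  {thread_logb:.3f}")
```

The help example's range spans about 3.6% of its median, while the logB range in this thread spans about 56% of its median. Where to draw the line between stable and unstable remains a judgment call.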


"In my case, the estimated logB from my model (8.2) is outside the range of the percentiles (14.09-24.96) and the median (19.47). How can I interpret this?"
Usually this would mean that the conditions for fitting the bootstrap model differ from those used for the original model. Need to consider not just the settings on the Model Options tab, but also the "minimum neighborhood size for an estimate" on the Output Options tab.

"Either HyperNiche crashed, or I got an error message about zero variance in one of the variables and advising me to increase sample size or decrease minimum neighborhood size or both."
Some data sets just don't work well with bootstrapping. To create an extreme artificial example: say that many rows of the data are identical and the response variable has 5 presences and 20 absences. There would be a high likelihood that a bootstrap sample would have the case of all the presences having identical values for the predictors.
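
A small simulation can illustrate this artificial example. The data below are hypothetical: 25 rows with 5 presences and 20 absences, one predictor, and every row identical except for a single presence. The function `degenerate_fraction` (an illustrative name) counts how often a bootstrap sample contains presences whose predictor values are all the same, or no presences at all, which is the zero-variance situation described above.

```python
import numpy as np

# Hypothetical data: 5 presences followed by 20 absences.
y = np.array([1] * 5 + [0] * 20)
x = np.ones(25)   # many identical rows
x[4] = 2.0        # only one presence row has a different predictor value

def degenerate_fraction(x, y, n_rep=10000, seed=0):
    """Fraction of bootstrap samples whose presences show no predictor variation."""
    rng = np.random.default_rng(seed)
    n = len(y)
    count = 0
    for _ in range(n_rep):
        idx = rng.integers(0, n, size=n)   # n picks, with replacement
        xs = x[idx][y[idx] == 1]           # predictor values among sampled presences
        if xs.size == 0 or xs.max() == xs.min():
            count += 1
    return count / n_rep

frac = degenerate_fraction(x, y)
print(f"fraction of degenerate bootstrap samples: {frac:.3f}")
```

With these made-up numbers, a bit over a third of the bootstrap samples are degenerate. Real data sets will behave differently, but the mechanism is the same: resampling with replacement can easily drop the few rows that carry all the variation.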

"Did this change in minimum neighborhood size affect the values of the median and percentile range?"
Yes.

"And if this is the case, does this mean that I need to re-fit my original model with a minimum neighborhood size of 4 instead of 5 in order for the value of logB to be comparable with the results from the bootstrap resampling?"
Well, you could go that route, but I would personally rather use what I consider the best model for a given data set, than change the conditions of the model to get the bootstrap to work.

Bruce


--
You received this message because you are subscribed to the Google Groups "HyperNiche and NPMR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hyperniche+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Palacios

Oct 5, 2016, 2:08:16 PM
to HyperNiche and NPMR
Bruce,

Thank you for your kind and very informative response!

I would also rather use what I consider the best model. This model used a minimum neighborhood size for estimate of 5, while I had to decrease it to 4 in order to get the bootstrap resampling to work. In trying to decide what to report, because of the different conditions between model and resampling, should I just not report the results of the bootstrap resampling for purposes of assessing the stability of logB? Otherwise, because my estimate of logB is outside the range of the bootstrap estimates I'm not sure how to report this result or if it is useful.

Also, I just started running the bootstrapped variability bands, and likewise requested a minimum neighborhood size of 4 instead of 5 (because otherwise the procedure fails). So I'm wondering if these variability bands will not be relevant because the conditions of the model and of the resampling are different?

Will I not be able to use bootstrapping to support my results at all, then?

Thanks!

Daniel

Bruce McCune

Oct 6, 2016, 10:53:28 AM
to hyper...@googlegroups.com
Daniel, I don't understand why this happens for your particular data set, but it does sound like the bootstrapping is of limited use. I would hesitate to include bootstrap results for a different model than the one you actually decided to use.
Bruce


Daniel Palacios

Oct 6, 2016, 1:28:49 PM
to hyper...@googlegroups.com
Thank you for this helpful clarification, Bruce.

Daniel




--
Daniel M. Palacios, Ph.D.
Assistant Professor (Sr. Res.)
Marine Mammal Institute, Oregon State University
Hatfield Marine Science Center
2030 SE Marine Science Drive
Newport, OR 97365, USA

Phone: 541-990-2750

Daniel Palacios

Oct 9, 2016, 11:37:51 AM
to HyperNiche and NPMR
Hi Bruce,

Thanks for your helpful feedback. I have looked into this a little bit further and have the following follow-up questions/comments, marked with ">" below:



"In my case, the estimated logB from my model (8.2) is outside the range of the percentiles (14.09-24.96) and the median (19.47). How can I interpret this?"
Usually this would mean that the conditions for fitting the bootstrap model differ from those used for the original model. Need to consider not just the settings on the Model Options tab, but also the "minimum neighborhood size for an estimate" on the Output Options tab.
> I did indeed decrease the "minimum neighborhood size for an estimate" from 5 to 4 for the resampling, as otherwise the procedure was failing. The rest of the settings remained as in the original model. More on this below...



"Either HyperNiche crashed, or I got an error message about zero variance in one of the variables and advising me to increase sample size or decrease minimum neighborhood size or both."
Some data sets just don't work well with bootstrapping. To create an extreme artificial example: say that many rows of the data are identical and the response variable has 5 presences and 20 absences. There would be a high likelihood that a bootstrap sample would have the case of all the presences having identical values for the predictors.
> My data set has 525 SUs and the model was set up manually with a binary response (local mean), data:predictor ratio = 5, improvement criterion = 5%, minimum average neighborhood size (N*) = 21 (0.04 x SUs), minimum neighborhood size required for estimate = 5. Under these conditions the model returned logB = 8.2, N° = 25.6, and AUC = 0.65. It indicated that 35 SUs were in empty or too small neighborhoods and that 490 SUs were in populated neighborhoods. Of these 490 SUs, 240 SUs had presence values and 250 had absence values. Does this help in assessing whether this data set is not well suited for bootstrapping? For what it's worth, the results from the randomization test appeared to be solid:

8.200 = Fit to REAL DATA
Fit to RANDOMIZED DATA
20 = total number of runs
0 = runs equal to or better than observed fit
20 = runs with less than observed fit
0 = runs resulting in missing value indicator for fit
3.4723     = best fit from randomized data
-0.44432     = worst fit from randomized data
0.51761     = mean fit from randomized data
0.04761905 = p = proportion of randomized runs with fit > or = observed fit, i.e., p  = (1 + no. runs >= observed)/(1 + no. runs)
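
The p-value in the last line of this output follows directly from the formula printed beside it. A minimal check in Python, using the counts from the output (the function name `randomization_p` is illustrative, not part of HyperNiche):

```python
def randomization_p(n_as_good, n_runs):
    """p = (1 + no. runs >= observed) / (1 + no. runs)."""
    return (1 + n_as_good) / (1 + n_runs)

# Counts taken from the randomization test output
p = randomization_p(n_as_good=0, n_runs=20)
print(f"p = {p:.8f}")
```

With 20 runs and none matching the observed fit, p = 1/21, which is also the smallest value this formula can return for 20 runs. More randomization runs would be needed to report a smaller p.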



"And if this is the case, does this mean that I need to re-fit my original model with a minimum neighborhood size of 4 instead of 5 in order for the value of logB to be comparable with the results from the bootstrap resampling?"
Well, you could go that route, but I would personally rather use what I consider the best model for a given data set, than change the conditions of the model to get the bootstrap to work.
> My rationale for arriving at the model setup described above was to first run free searches under conservative, medium, and aggressive automatic scenarios, and assess the best models meeting these conditions that maximized logB. I selected a 4-predictor model and, after tuning and visually assessing the response curves/surfaces, I decided to settle on a model with manual controls somewhere between medium and aggressive, per the setup given above. Beyond that, I don't have a required value for minimum neighborhood size (or for any of the other overfitting controls), and I wouldn't think that a difference of 1 in minimum neighborhood size would have these repercussions.

Indeed, I have created a few new models from scratch with minimum neighborhood size = 4, and they all return logB estimates in the range 7.9-8.28 (even after tuning), so I don't understand why the resampled logB consistently gave much higher values than the estimated logB (the resampled minimum was 12.37, so the original model's logB was not even in the min-max range). Is it possible that I'm misinterpreting the output of the bootstrap resampling?


Thanks very much!


Daniel

