Distance sampling: possible issues with small sample size, any recourse?


John-Lee Walker

Mar 30, 2024, 6:59:30 PM
to unmarked
Hello,

First, I really appreciate the collective support in this group. I have browsed through the topics and am looking for some general advice. I hope I didn't miss a post where this topic was covered. 

We have collected distance sampling data on a rare plant. We surveyed 46 plots totaling 43 km and detected 66 plants. Around half of the plots had no detections, and a couple plots had 5 or 6. We also collected environmental measures at each plot to evaluate factors that may influence detection or density. 
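A quick back-of-the-envelope check using only the numbers in the post above (a sketch, not part of the original message): if detected plants per plot followed a Poisson distribution with the observed mean, what share of plots would we expect to be empty? It assumes equal survey effort per plot, which is only approximately true here (46 plots covering 43 km). Imperfect detection does not change the argument, because a Poisson count thinned by a constant detection probability is still Poisson, and we use the observed mean of detections.

```python
import math

# Numbers taken from the post.
n_plots = 46
n_detections = 66
total_km = 43.0
observed_zero_fraction = 0.5  # "around half of the plots had no detections"

mean_per_plot = n_detections / n_plots     # detections per plot
encounter_rate = n_detections / total_km   # detections per km

# ASSUMPTION: equal effort per plot, so every plot shares the same Poisson mean.
expected_zero_fraction = math.exp(-mean_per_plot)  # P(N = 0) under Poisson

print(f"mean per plot:        {mean_per_plot:.2f}")
print(f"encounter rate (/km): {encounter_rate:.2f}")
print(f"Poisson P(zero plot): {expected_zero_fraction:.2f}")
print(f"observed zero plots:  {observed_zero_fraction:.2f}")
```

Under the Poisson assumption only about a quarter of plots would be empty, against roughly half observed, which points to clustering or excess zeros among plots rather than a problem with the detection function.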

I have been through the model selection process and ended up with a model that has a few covariates for density. When I evaluate model fit using bootstrapping (something I do not fully understand, though I get the general concept), the fit statistics suggest a poor fit. I have also looked at an intercept-only model in the hope of getting a decent estimate of density. The density estimate I get is around 0.62 plants / unit area, but the SE is relatively high (0.21, a coefficient of variation of about 34%). Checking model fit with bootstrapping on the intercept-only model also suggests a poor fit.
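Since the bootstrap itself is the unclear part, here is a toy sketch of the logic of a parametric bootstrap goodness-of-fit test for the simplest possible case: an intercept-only Poisson model for plot counts, with no detection function at all. It is NOT unmarked's parboot() and not a distance-sampling fit; it only shows the three steps: (1) compute a discrepancy for the real data, (2) simulate many data sets from the fitted model, refit each and recompute the discrepancy, (3) see where the observed value falls. The counts are hypothetical, invented to mimic the pattern described (46 plots, 66 plants, about half zeros).

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def chisq(counts, lam):
    """Pearson chi-square discrepancy between counts and a common fitted mean."""
    return sum((y - lam) ** 2 / lam for y in counts)

def parametric_bootstrap_gof(counts, nsim=1000, seed=1):
    rng = random.Random(seed)
    n = len(counts)
    lam_hat = sum(counts) / n          # MLE of the intercept-only Poisson model
    t_obs = chisq(counts, lam_hat)     # discrepancy for the real data
    t_sim = []
    for _ in range(nsim):
        sim = [rpois(lam_hat, rng) for _ in range(n)]  # simulate from the fitted model
        lam_sim = sum(sim) / n                          # refit to the simulated data
        if lam_sim > 0:                                 # skip degenerate all-zero sets
            t_sim.append(chisq(sim, lam_sim))
    p_value = sum(t >= t_obs for t in t_sim) / len(t_sim)
    return t_obs, t_sim, p_value

# HYPOTHETICAL counts mimicking the description (46 plots, 66 plants, ~half zeros).
counts = [0] * 23 + [1] * 6 + [2] * 5 + [3] * 4 + [4] * 4 + [5, 5, 6, 6]
t_obs, t_sim, p_value = parametric_bootstrap_gof(counts)
c_hat = t_obs / (sum(t_sim) / len(t_sim))
print(f"observed={t_obs:.1f}  p={p_value:.3f}  c-hat={c_hat:.2f}")
```

The p-value is the share of simulated discrepancies at least as large as the observed one. The ratio of the observed discrepancy to the average simulated one is the same kind of ratio used as an overdispersion factor (c-hat).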

That leaves me wondering whether there is any approach I can take to improve the modeling, or whether any reliable inferences can be made at all. I have tried truncating the largest ~5% of detection distances, and also not truncating at all (the pattern of detection distances looks pretty clean for fitting the detection function).
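For reference, right-truncation in conventional distance sampling usually means discarding roughly the largest 5-10% of distances. A minimal sketch of that step, assuming a plain list of perpendicular distances; the rounding rule and the example distances are illustrative assumptions, not John's data.

```python
def truncate_largest(distances, frac=0.05):
    """Drop the largest `frac` share of detection distances.

    Returns the retained distances and the truncation distance w
    (the largest distance kept). Rounding to the nearest whole
    observation is an assumption made for this sketch.
    """
    ordered = sorted(distances)
    n_drop = round(frac * len(ordered))
    kept = ordered[: len(ordered) - n_drop]
    return kept, kept[-1]

# HYPOTHETICAL distances (m) for 66 detections, just to show the arithmetic.
distances = list(range(1, 67))
kept, w = truncate_largest(distances, frac=0.05)
print(len(kept), w)
```

With 66 detections, 5% truncation removes only about three observations, so it is unlikely to explain a lack of fit that stems from how counts are spread across plots.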

Any advice or suggestions would be greatly appreciated. 

Cheers,
John

[Attachments: histogram.jpeg, image001 (1).png]

Jeffrey Royle

Mar 30, 2024, 7:15:24 PM
to unma...@googlegroups.com
hi John,
 I agree that your detection histogram looks pretty good and the detection part of the model probably fits well, but, as is often the case, over-dispersion among plots can be a problem. I would make sure you try the NB and ZIP models in gdistsamp in case that helps; otherwise you probably have to try some "variance inflation" to adjust the SE for the poorly fitting model, although that obviously just increases your uncertainty. But overall, a rare population combined with over-dispersion is probably a tough situation in general.
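To see what the NB and ZIP options change relative to a Poisson model, here is a small sketch of the mean, variance and probability of a zero count under each. The NB uses the (mu, size) parameterisation of R's dnbinom(), where Var = mu + mu^2/size; for the ZIP, psi is the probability of a structural zero. These parameter names are this sketch's choices and are not claimed to match unmarked's internal parameterisation, and the numbers are purely illustrative.

```python
import math

def poisson_moments(mu):
    """Mean, variance and P(N = 0) of a Poisson(mu) count."""
    return mu, mu, math.exp(-mu)

def nb_moments(mu, size):
    """Negative binomial, (mu, size) form: Var = mu + mu^2 / size.
    Smaller size means more overdispersion."""
    var = mu + mu ** 2 / size
    p0 = (size / (size + mu)) ** size
    return mu, var, p0

def zip_moments(lam, psi):
    """Zero-inflated Poisson: structural zero with probability psi,
    otherwise a Poisson(lam) count."""
    mean = (1 - psi) * lam
    var = (1 - psi) * lam * (1 + psi * lam)
    p0 = psi + (1 - psi) * math.exp(-lam)
    return mean, var, p0

# Three illustrative models with the same mean count per plot (1.5).
for name, (m, v, p0) in [
    ("Poisson", poisson_moments(1.5)),
    ("NB", nb_moments(1.5, size=1.0)),
    ("ZIP", zip_moments(3.0, psi=0.5)),
]:
    print(f"{name:8s} mean={m:.2f} var={v:.2f} var/mean={v / m:.2f} P(0)={p0:.2f}")
```

In this example the NB and ZIP have the same mean and variance but predict different shares of empty plots, which is one reason it can be worth fitting both.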
 As always, it's worth doing a simulation study to understand the performance of the model under relevant operating conditions.
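A minimal simulation sketch in that spirit, asking one narrow question: if plot counts are overdispersed but the CI is computed as if they were Poisson, how often does a nominal 95% interval cover the true mean? It ignores detection entirely and uses an arbitrary, assumed NB size of 0.5; a real study would simulate full distance-sampling data under plausible densities and fit the actual models.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rnbinom(mu, size, rng):
    """Negative binomial draw as a gamma-Poisson mixture (Var = mu + mu^2/size)."""
    return rpois(rng.gammavariate(size, mu / size), rng)

def poisson_ci_coverage(draw, mu, n_plots=46, nsim=2000, seed=42):
    """Share of simulated surveys whose Poisson-based 95% CI covers mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(nsim):
        counts = [draw(rng) for _ in range(n_plots)]
        est = sum(counts) / n_plots
        se = math.sqrt(est / n_plots)  # SE that assumes Poisson variation
        if est - 1.96 * se <= mu <= est + 1.96 * se:
            hits += 1
    return hits / nsim

mu = 66 / 46  # true mean per plot, set to the observed value for realism
cov_pois = poisson_ci_coverage(lambda rng: rpois(mu, rng), mu)
cov_nb = poisson_ci_coverage(lambda rng: rnbinom(mu, 0.5, rng), mu)
print(f"coverage, Poisson data: {cov_pois:.3f}")
print(f"coverage, NB data:      {cov_nb:.3f}")
```

When the data really are Poisson, coverage should sit near 95%; with strongly overdispersed counts it drops well below that, which is the gap that variance inflation or an NB/ZIP mixture tries to close.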
regards
andy


--
*** Three hierarchical modeling email lists ***
(1) unmarked (this list): for questions specific to the R package unmarked
(2) SCR: for design and Bayesian or non-bayesian analysis of spatial capture-recapture
(3) HMecology: for everything else, especially material covered in the books by Royle & Dorazio (2008), Kéry & Schaub (2012), Kéry & Royle (2016, 2021) and Schaub & Kéry (2022)
---
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unmarked/55ac7e52-3752-4c11-94d6-7eea9512e2d4n%40googlegroups.com.

Marc Kery

Mar 31, 2024, 1:34:49 AM
to unma...@googlegroups.com
Dear John,

two questions:
  • How bad is bad? Can you give us the summary results from the bootstrap?
  • And a plant? That is cool... what and where, if I may ask?

Happy Easter  --- Marc




John-Lee Walker

Apr 3, 2024, 2:42:18 PM
to unmarked
Hi Andy and Marc,

Thank you kindly for replying. 

Andy, I will explore the NB and ZIP models in gdistsamp, as well as a simulation study, and may follow up here. I appreciate your suggestions.

Marc, I have included summary results from the bootstrapping below. The first is a model with covariates, the second an intercept-only model. The model with covariates included one variable for detection (size of rocks) and five for density. We had multiple correlated measures of vegetation, so I reduced those to a couple of principal components, and I did the same with several correlated measures of the soil substrate.
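For readers unfamiliar with the covariate-reduction step, a minimal PCA sketch: standardize each variable, take the SVD, and keep the first few component scores as new plot-level covariates. The vegetation data below are hypothetical (three noisy versions of one gradient), and this is not necessarily how John computed his components.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA on standardized columns via SVD.

    Returns component scores (plots x components) and the share of
    total variance explained by each component.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # center and scale each variable
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T                  # project plots onto leading axes
    return scores, explained

# HYPOTHETICAL vegetation measures for 46 plots, strongly correlated by construction.
rng = np.random.default_rng(1)
gradient = rng.normal(size=46)
veg = np.column_stack([gradient + 0.3 * rng.normal(size=46) for _ in range(3)])
scores, explained = pca_scores(veg)
print(scores.shape, np.round(explained, 2))
```

When the original measures are this correlated, the first component carries most of the variance, so a couple of scores can stand in for several collinear covariates in the density model.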

The plant is Pima pineapple cactus, a really neat species that is federally protected. We had been involved with a pilot study to test the effectiveness of distance sampling for estimating density. In that study, we surveyed an area with a known, dense population that had been censused (Flesch et al. 2019, Application of distance sampling for assessing abundance and habitat relationships of a rare Sonoran Desert cactus). In our present study, our sample frame included areas that had not been previously surveyed. 

I hope you had a great Easter holiday, and thank you for your feedback. 
-John

Model with covariates

bootstrapping results, model with covariates.jpg

Null model, hazard rate detection function

bootstrapping results, null model.jpg


Jeffrey Royle

Apr 3, 2024, 9:13:00 PM
to unma...@googlegroups.com
hi John,
 This actually looks OK, if you ask me. The model with covariates is obviously improving the fit. The fit is not great, mainly because of the clustering you mentioned (a few plots with 5 or more plants, if I remember correctly). NB might improve this, and ZIP might as well, but overall I think the result is decent.
regards
andy



Marc Kery

Apr 4, 2024, 12:54:06 AM
to unma...@googlegroups.com
Dear John,

thanks for the information. Googled the thing and it looks neat.

About the lack of fit: the ratio of the observed value of a test statistic to its mean over the bootstrap replicates is a measure of the amount of extra variability in the data, also called the "overdispersion factor", or c-hat. From your output for the covariate model, and taking the median instead of the mean for simplicity, you get these values for the three discrepancy measures:

> c(85, 279, 62) / c(61, 262, 52)
[1] 1.393443 1.064885 1.192308

In the presence of lack of fit, in capture-recapture modeling but also elsewhere (e.g., in the attached DS example), people often make the (strong) assumption that all lack of fit is simply due to unstructured additional noise in the data. That is, the mean of the model gets it about right, but there is more noise around it than the distributional assumptions of the model allow for. To account for this, they then "stretch" the uncertainty measures (SE, CI) by a function of that c-hat (e.g., multiply the SE by sqrt(c-hat)). Folklore in capture-recapture has it that one can do this for milder amounts of overdispersion, such as c-hat values smaller than about 2 or 3 (Johnson et al., however, seem to have a value of about 5).
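A sketch of that variance-inflation arithmetic, followed by the log-normal confidence interval commonly used for density estimates in distance sampling. Two assumptions are worth flagging: taking the largest of the three c-hat values (the most conservative choice) and flooring c-hat at 1 are conventions chosen here, not something stated in the thread. Also, the c-hat values come from the covariate model while the 0.62 / 0.21 estimate comes from the intercept-only model, so the pairing below only illustrates the arithmetic.

```python
import math

def inflate_se(se, c_hat):
    """Quasi-likelihood style adjustment: multiply the SE by sqrt(c-hat).
    c-hat values below 1 are treated as 1, so the SE is never shrunk."""
    return se * math.sqrt(max(c_hat, 1.0))

def lognormal_ci(estimate, se, z=1.96):
    """Log-normal CI (estimate / C, estimate * C), with
    C = exp(z * sqrt(log(1 + cv^2))) and cv = se / estimate."""
    cv = se / estimate
    c = math.exp(z * math.sqrt(math.log(1 + cv ** 2)))
    return estimate / c, estimate * c

# c-hat ratios from the calculation above; using the largest is an assumption.
c_hat = max(85 / 61, 279 / 262, 62 / 52)
# Intercept-only estimate and SE from John's post, paired here only for illustration.
d_hat, se = 0.62, 0.21
se_adj = inflate_se(se, c_hat)

print("unadjusted CI:", [round(x, 3) for x in lognormal_ci(d_hat, se)])
print("inflated SE:  ", round(se_adj, 3))
print("inflated CI:  ", [round(x, 3) for x in lognormal_ci(d_hat, se_adj)])
```

The log-normal form keeps the lower limit positive, which matters when the CV is as large as it is here.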

I am not saying that this is always good (although I have done it myself): part of me thinks it's a cheap trick, and often at least part of the lack of fit may be due to structural failures in the model rather than just innocuous (in terms of the mean) extra noise. But I just want to point out that this is quite standard practice in several fields. Plus, your amount of lack of fit does not seem to be brutal, so perhaps you should not make your own life too difficult or throw away the data set.

And finally, although I do it myself far too rarely, I always think that when one gets a significant GoF test result, it would be very worthwhile to investigate the reasons for it by inspecting the residuals of the model and seeing whether there is some structure in them. If you do see such structure, it may suggest ways in which you can improve the model (or 'expand it', as Gelman and others call it; see e.g., https://arxiv.org/abs/2011.01808).
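A minimal sketch of such a residual check: Pearson residuals under a Poisson variance, their sum of squares divided by residual degrees of freedom (another common c-hat estimate), and a comparison of mean residuals between two groups of plots. All numbers, including the fitted expected counts, the binary habitat flag and the parameter count, are hypothetical placeholders; in practice the expected counts would come from the fitted model.

```python
import math
from statistics import mean

def pearson_residuals(observed, expected):
    """(y - E[y]) / sqrt(Var[y]), using a Poisson variance Var[y] = E[y]."""
    return [(y - e) / math.sqrt(e) for y, e in zip(observed, expected)]

# HYPOTHETICAL plot counts, fitted expected counts and a binary habitat flag.
observed = [0, 0, 1, 0, 5, 2, 0, 6, 1, 0]
expected = [1.2, 0.8, 1.5, 1.0, 1.9, 1.4, 0.9, 2.2, 1.1, 1.3]
habitat_flag = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]

res = pearson_residuals(observed, expected)
chi2 = sum(r * r for r in res)
n_params = 1  # assumed number of estimated parameters, for illustration only
c_hat_pearson = chi2 / (len(observed) - n_params)

# Structure check: are residuals systematically higher in one habitat type?
mean_flagged = mean(r for r, h in zip(res, habitat_flag) if h)
mean_other = mean(r for r, h in zip(res, habitat_flag) if not h)
print(round(c_hat_pearson, 2), round(mean_flagged, 2), round(mean_other, 2))
```

A clear difference like this would suggest the habitat variable (or something correlated with it) belongs in the density model, rather than treating the misfit as unstructured noise.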

Best regards --- Marc




Johnson_etal_Biometrics_2010_distance.sampling - Copy.pdf