Rare Species Modeling


Lianne Koczur

Aug 30, 2021, 5:05:40 PM
to distance-sampling
I have point count distance sampling data collected from 32 points. Each point was surveyed 1-3 times (most points were surveyed 3 times). Only 3 of the 35 species recorded have more than 60 observations; the rest have sample sizes ranging from 5 to 56.

I would like to estimate abundance of all species, and saw that I can do this by combining all data and including species as a covariate. My questions are:

Is there a minimum number of observations needed for inclusion in the analysis? For example, is 5 too few observations? Could I include those species and examine the estimates/confidence intervals to determine if they should be removed?

Is there a limit to the number of 'rare' species that should be included? In my case, is 32 of the 35 total species too many?

Thanks for your help! 

Eric Rexstad

Aug 31, 2021, 3:32:25 AM
to Lianne Koczur, distance-sampling

Good morning Lianne

I'm afraid there aren't simple answers to your "cutoff" questions.  Quite a few factors feed into the determination of adequate modelling of the detection function.

Starting from the beginning: there is nothing sacred regarding the "60 observation" rule of thumb mentioned in Buckland et al. (2001).  If you read Section 7.2.2 closely, you will see this comment regarding point transect data:

Sample size in point transects can be misleading.  One might detect 60 objects from surveying $k$ points and believe this sample contains a great deal of information about density.  However, the area sampled increases with the square of distance, so that many of the observations are actually in the tail of $g(r)$ where detection probability is low.  Detections at some distance from the point may be numerous partially because the area sampled is relatively large.  Thus, sample size must be somewhat larger for point transect surveys than line transect surveys.  As a rough guideline, the sample size for point transects should be approximately 25% larger than for line transect surveys to attain the same level of precision.  This suggests a minimum sample size of around 75-100 for estimating a detection function, or average density within a study area.
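To make the quoted argument concrete, here is a minimal sketch comparing how many detections fall in the tail for line and point transects. It assumes a half-normal detection function with no truncation, which the quote does not specify; the function names are illustrative only. Under that assumption, observed line-transect distances follow a half-normal distribution, while observed point-transect distances follow a Rayleigh distribution because the area surveyed grows with distance.

```python
import math

def line_tail_fraction(r_over_sigma):
    """Share of line-transect detections beyond r (half-normal g, no truncation)."""
    return math.erfc(r_over_sigma / math.sqrt(2.0))

def point_tail_fraction(r_over_sigma):
    """Share of point-transect detections beyond r (distances are Rayleigh distributed)."""
    return math.exp(-0.5 * r_over_sigma ** 2)

# Distance, in units of sigma, at which detection probability g(r) drops to 0.5
r_half = math.sqrt(2.0 * math.log(2.0))

line_tail = line_tail_fraction(r_half)
point_tail = point_tail_fraction(r_half)

print(f"g(r) = 0.5 at r = {r_half:.3f} sigma")
print(f"line transect detections beyond that distance:  {line_tail:.1%}")
print(f"point transect detections beyond that distance: {point_tail:.1%}")
```

Under these assumptions half of all point-transect detections come from distances where detection probability is below 0.5, compared with roughly a quarter for line transects.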

The point in the study to consider adequacy of sample size is during survey design (when formulae can be used to guide the effort needed to achieve a desired precision) rather than at the time of analysis.  If rare species are the focus of your investigation, design the surveys such that the inference for those species is sound.  If resources do not permit the level of effort for those inferences to be sound, redefine the objectives and abandon hope of making inference for the rare species.
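As a rough illustration of such a design calculation, the sketch below uses the commonly cited form k = (b / cv^2) * (k0 / n0), where k0 points in a pilot survey yielded n0 detections, cv is the target coefficient of variation and b is a dispersion parameter often set to about 3. That form is an assumption here and should be checked against Buckland et al. (2001) before use. The pilot numbers are hypothetical, and treating each pilot point as one unit of effort ignores repeat visits.

```python
import math

def points_needed(pilot_points, pilot_detections, target_cv, dispersion_b=3.0):
    """Approximate number of points needed to reach target_cv for density.

    Assumed form: k = (b / cv^2) * (k0 / n0); b = 3 is a conventional default.
    """
    required_detections = dispersion_b / target_cv ** 2
    points = required_detections * pilot_points / pilot_detections
    # Round away floating-point noise before rounding up to whole points
    return math.ceil(round(points, 9))

# Hypothetical pilot: 32 points, target CV of 20%
for detections in (5, 56):
    k = points_needed(pilot_points=32, pilot_detections=detections, target_cv=0.2)
    print(f"{detections} pilot detections -> about {k} points")
```

With these hypothetical inputs the function returns 480 points for a species with 5 pilot detections and 43 for a species with 56, which shows how quickly the required effort grows for rare species.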

Using species as a covariate to produce species-specific detection functions is useful, but again not a panacea.  The premise of this covariate approach is that the species analysed in this fashion share a common detection function shape (hazard rate or half normal), and that the covariate alters only the scale of that basic shape.  Hence it would be inappropriate to combine a species with a hazard rate detection function and a species with a half normal detection function via the covariate.  This is the point at which a challenge arises: if a species has only 5 detections, it is improbable we can determine whether those detections follow a hazard rate or half normal shape.  Therefore, as a practical matter, "enough" detections are needed to make an educated guess whether the underlying shape is hazard rate or half normal.
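The shared-shape premise can be sketched as follows. The code assumes the usual parameterisations, half normal g(r) = exp(-r^2 / (2 sigma^2)) and hazard rate g(r) = 1 - exp(-(r / sigma)^(-b)), with the species covariate acting on sigma through a log link. The coefficient values and species labels are hypothetical.

```python
import math

def half_normal(r, sigma):
    """Half-normal detection function."""
    return math.exp(-r ** 2 / (2.0 * sigma ** 2))

def hazard_rate(r, sigma, b):
    """Hazard-rate detection function; detection is certain at r = 0."""
    if r == 0:
        return 1.0
    return 1.0 - math.exp(-((r / sigma) ** (-b)))

def species_sigma(intercept, species_effect):
    """Scale parameter under a log link: sigma = exp(intercept + effect)."""
    return math.exp(intercept + species_effect)

# Hypothetical coefficients: the covariate changes the scale, never the shape
intercept = math.log(40.0)
effects = {"common_sp": 0.0, "rare_sp": -0.5}
sigmas = {sp: species_sigma(intercept, eff) for sp, eff in effects.items()}

for sp, sigma in sigmas.items():
    curve = [round(half_normal(f * sigma, sigma), 3) for f in (0.5, 1.0, 2.0)]
    print(f"{sp}: sigma = {sigma:.1f}, g at 0.5, 1, 2 sigma = {curve}")

# A hazard-rate species keeps a flat shoulder that no value of sigma can mimic
print(round(hazard_rate(20.0, 40.0, 3.0), 3), round(half_normal(20.0, 40.0), 3))
```

Both species trace the same half-normal curve once distance is expressed in units of sigma, so a species whose detections really follow a hazard-rate shape cannot be accommodated by changing sigma alone.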

Your final question regarding number of rare species should also be viewed through the lens of which species share a common key function.  If the three species that exceed the 60 observation threshold all have half normal key functions, then you will struggle with species with few detections that might have hazard rate key functions.

The discussion of sufficient number of detections has a subjective element.  If I were reviewing a manuscript attempting to make inference for 35 species where the sample was dominated by a small number of species (~10) with most of the detections, I would suggest the inference for the other 25 species is likely weak.  As a researcher, you decide whether to dilute the strength of your findings with weak inference for under-represented species.

Eric Rexstad
Centre for Ecological and Environmental Modelling
University of St Andrews

Tiago Marques

Aug 31, 2021, 6:07:30 AM
to Eric Rexstad, Lianne Koczur, distance-sampling
Hi Lianne,

As I was reading Eric's message I was thinking about a further point. It is hard to provide black and white answers in a world painted in many shades of grey. Inferences based on a small number of observations will be problematic not only because of detection function estimation, which Eric's answer covered (and just because you can fit a model does not mean you should ;) ), but also because of the encounter rate component and, further, group size if that's involved. Nothing in your text tells me there is a group size component, so I'll ignore it, but the same thinking below would apply.

Finding 2 rather than 6 individuals of a rare species means that its density estimate changes by a factor of about 3, given a constant detection function. But I think you will agree with me that, for low density species, finding say 2, 3, 4, 5 or 6 animals in a given survey tells you very little about real density. It is much more about the randomness in small sample sizes and the specific day you surveyed each line, "contaminated" with a randomness we really can't control. I mean, 2000 observations versus 6000 observations tells me that density is probably very different, but 2 vs 6, while being the same proportionally, tells me little. So do keep that in mind even before thinking about the detection function component. You could get a really high precision for a low sample size species, especially if you get a low encounter rate variance across lines just by chance and then use a pooled (and potentially unrepresentative) detection function, and that might just be a fluke. It's a leap of faith to know if that is real or not. No stats will change that; it's about critical thinking on the process that leads from observations to density estimates. Sometimes when you want to know everything about everything you end up with nothing about most of it. As Eric hinted, what is best: to report sound results about some species, or to report estimates for all species that are unreliable for most? That is a question for you in the end ;)
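A small calculation illustrates the point about 2 vs 6 and 2000 vs 6000. It is a sketch that assumes counts are Poisson distributed, which is a simplification because real encounter rates are usually more variable; the expected count of 4 is a hypothetical value.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k detections for a Poisson count."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def poisson_cdf(k, mean):
    """Probability of k or fewer detections."""
    return sum(poisson_pmf(i, mean) for i in range(k + 1))

def poisson_cv(expected_count):
    """Coefficient of variation of a Poisson count: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_count)

# Hypothetical species with a true expected count of 4 per survey
mean_count = 4.0
p_low = poisson_cdf(2, mean_count)          # 2 or fewer detections
p_high = 1.0 - poisson_cdf(5, mean_count)   # 6 or more detections
print(f"P(count <= 2) = {p_low:.3f}, P(count >= 6) = {p_high:.3f}")

for n in (2, 6, 2000, 6000):
    print(f"expected count {n}: CV of the count = {poisson_cv(n):.1%}")
```

Under this assumption a single survey of the same population returns 2 or fewer, or 6 or more, detections nearly half the time, whereas counts in the thousands vary by only a few percent.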



Mark Wilson

Aug 31, 2021, 6:31:35 AM
to Tiago Marques, Eric Rexstad, Lianne Koczur, distance-sampling
Just one further thought about this. Everything Tiago wrote makes excellent sense, and I like the specific example of 2 vs 6 versus 2000 vs 6000. I just wanted to add that density estimates based on such tiny samples might still be worthwhile if you are going to be working with many of them. So, while a comparison between two counts of 2 vs 6 is demonstrably laughable in most sampling situations, a comparison between 20 count (or density) estimates with a mean of around 2 and 20 with a mean of around 6 will often (depending on variation between samples in each group) be much more robust. In this kind of scenario, you can obviously take advantage of pooled detections across different samples when generating detection functions. And it may be that the modelling frameworks available in Distance (or other distance analysis software) allow the comparisons or analyses you want to carry out to be wrapped up in the Distance analysis itself. If not, however, then particularly when controlling for the influence of variables like vegetation structure or species identity on the detection function, I think there may be a place for turning small counts into densities before using them in another modelling environment.
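The gain from replication can be sketched with a simple standardised difference. The code assumes Poisson counts, with an optional overdispersion multiplier to reflect the caveat about variation between samples; the function name and the multiplier are illustrative assumptions, not part of any Distance software.

```python
import math

def z_for_difference(mean_a, mean_b, n_per_group, dispersion=1.0):
    """Standardised difference between two group means of counts.

    Assumes variance = dispersion * mean for each count (Poisson when
    dispersion is 1) and n_per_group independent samples in each group.
    """
    se = math.sqrt(dispersion * (mean_a + mean_b) / n_per_group)
    return abs(mean_b - mean_a) / se

single = z_for_difference(2.0, 6.0, n_per_group=1)
replicated = z_for_difference(2.0, 6.0, n_per_group=20)
overdispersed = z_for_difference(2.0, 6.0, n_per_group=20, dispersion=3.0)

print(f"one count of 2 vs one count of 6:      z = {single:.2f}")
print(f"20 samples per group:                  z = {replicated:.2f}")
print(f"20 samples, variance 3x Poisson:       z = {overdispersed:.2f}")
```

A single pair of counts gives a standardised difference of about 1.4, while 20 samples per group give about 6.3 under the Poisson assumption, and the advantage shrinks as between-sample variation increases.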



Dr Mark Wilson
Senior Research Ecologist
BTO Scotland



Jan 25, 2023, 3:48:21 PM
to distance-sampling
Hi all,

I'd like to add to this thread with an additional question.  Recognizing that there is nothing sacred about the 60 or 75 minimum observation threshold, is there any further analysis or metric we should report about our models if we choose a lower minimum observation threshold?  I'm asking because I am currently attempting distance sampling density estimation of forest bird point count data, where there are many instances of 50-70 observations.  So choosing a 50 observation threshold rather than 75 makes a big difference.  The models with 50-70 observations do fit the data and produce estimates that make sense, with reasonable precision.

Thanks for any thoughts/comments.



Jan 25, 2023, 8:36:11 PM
to distance-sampling
Also, I should note that the "global" detection function was determined using pooled data that had well over 75 observations. My analysis is set up so that the global detection function is applied to observations in each individual year in order to get annual density estimates. Is there any justification in using a slightly lower minimum observation threshold (50) for each individual year, since well over 75 observations were used to estimate the global detection function?


Eric Rexstad

Jan 26, 2023, 4:50:19 AM
to leob...@gmail.com, distance-sampling

Your questions about the number of detections needed to fit reliable detection function models to point transect data in a multispecies setting have several facets. It is tricky to give universal answers.  Let's explore a few scenarios regarding making robust inference about abundance of multiple species (or multiple years):
  • fit separate detection function models to each unit (species or year) for which estimates are required, where each unit has a sufficient number of detections (~75). This should result in sound inference.
    • fit separate detection function models to each unit for which estimates are required when units have insufficient (<50) detections. This is likely to result in weak inference, unless you are lucky.
  • fit a detection function model to all detections using unit (species or year) as a covariate. Inference is likely to be sound, but as Lianne's question in 2021 suggested, this can be taken to the extreme (some units with <5 detections) which will result in weak inference.
    • this approach carries the tacit assumption that all units share a common key function and differ only in the sigma parameter. This assumption, as I noted in 2021, cannot reliably be tested, particularly for units (species or year) with small numbers of detections.
  • fit a pooled detection function model merging data from all units. Couple the pooled detection model with the unit-specific encounter rates to produce unit-specific abundance or density estimates.
    • this approach will produce biased unit-specific estimates in all situations except the situation wherein units share the same detection function.
    • the magnitude of the bias in unit-specific estimates is proportional to the divergence in unit-specific detection functions.
    • we recently published a paper exploring this third approach (and comparing it to the other two approaches) in Ecology and Evolution:
      • Rexstad, E., Buckland, S., Marshall, L., & Borchers, D. (2023). Pooling robustness in distance sampling: Avoiding bias when there is unmodelled heterogeneity. Ecology and Evolution, 13, e9684. https://doi.org/10.1002/ece3.9684 
Sorry there is not a formulaic answer to the question of "how many detections are sufficient", but the ideas here may give you some issues to consider.
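The bias from the pooled approach can be illustrated with a back-of-envelope calculation. This sketch assumes half-normal detection functions with no truncation, for which the effective area per point is 2 * pi * sigma^2 and the maximum likelihood estimate of sigma^2 is the sum of squared distances divided by 2n. It works with expected values only, so it is a large-sample approximation and not the analysis in the paper cited above. The sigma values and detection counts are hypothetical.

```python
def expected_pooled_sigma2(sigmas, expected_counts):
    """Large-sample pooled estimate of sigma^2 when units are merged.

    For half-normal point transect data E[r^2] = 2 * sigma^2, so the pooled
    estimate tends to the detection-weighted mean of the unit sigma^2 values.
    """
    total = sum(expected_counts)
    return sum(n * s ** 2 for n, s in zip(expected_counts, sigmas)) / total

def relative_bias(sigma_unit, sigma2_pooled):
    """Approximate relative bias of a unit's density estimate.

    Density is count / (points * 2 * pi * sigma^2), so using the pooled
    sigma^2 rescales the estimate by sigma_unit^2 / sigma2_pooled.
    """
    return sigma_unit ** 2 / sigma2_pooled - 1.0

# Hypothetical units: a common species (sigma 50 m, 300 detections)
# and a rare, less detectable species (sigma 30 m, 20 detections)
sigmas = [50.0, 30.0]
counts = [300, 20]
pooled = expected_pooled_sigma2(sigmas, counts)

for name, sigma in zip(("common", "rare"), sigmas):
    print(f"{name}: relative bias = {relative_bias(sigma, pooled):+.1%}")
```

In this hypothetical case the pooled detection function is dominated by the common species, so the common species is barely affected while the rare species' density is underestimated by more than half. When the two sigma values are equal the bias is zero.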
