Rare Species Modeling


Lianne Koczur

Aug 30, 2021, 5:05:40 PM
to distance-sampling
I have point count distance sampling data collected from 32 points. Each point was surveyed 1-3 times (most points were surveyed 3 times). Only 3 of the 35 species recorded have more than 60 observations, the rest have sample sizes ranging from 5 to 56.

I would like to estimate abundance of all species, and saw that I can do this by combining all data and including species as a covariate. My questions are:

Is there a minimum number of observations needed for inclusion in the analysis? For example, is 5 too few observations? Could I include those species and examine the estimates/confidence intervals to determine if they should be removed?

Is there a limit to the number of 'rare' species that should be included? In my case, is 32 of the 35 total species too many?

Thanks for your help! 

Eric Rexstad

Aug 31, 2021, 3:32:25 AM
to Lianne Koczur, distance-sampling

Good morning Lianne

I'm afraid there aren't simple answers to your "cutoff" questions.  Quite a few factors feed into the determination of adequate modelling of the detection function.

Starting from the beginning: there is nothing sacred about the "60 observation" rule of thumb mentioned in Buckland et al. (2001).  If you read Section 7.2.2 closely, you will see this comment regarding point transect data:

Sample size in point transects can be misleading.  One might detect 60 objects from surveying $k$ points and believe this sample contains a great deal of information about density.  However, the area sampled increases with the square of distance, so that many of the observations are actually in the tail of $g(r)$ where detection probability is low.  Detections at some distance from the point may be numerous partially because the area sampled is relatively large.  Thus, sample size must be somewhat larger for point transect surveys than line transect surveys.  As a rough guideline, the sample size for point transects should be approximately 25% larger than for line transect surveys to attain the same level of precision.  This suggests a minimum sample size of around 75-100 for estimating a detection function, or average density within a study area.
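
The quoted point about the tail can be made concrete with a minimal sketch. It assumes a half-normal detection function with scale sigma and no truncation; both are assumptions for illustration and are not part of the quoted text. Under those assumptions the observed distances are half-normal on lines and Rayleigh on points, so the share of detections beyond a multiple of sigma has a closed form. The function names are hypothetical.

```python
import math

def frac_beyond_line(c: float) -> float:
    """Share of line transect detections farther than c * sigma.

    Assumes half-normal g(x) and no truncation, so observed perpendicular
    distances are half-normal: P(X > c * sigma) = erfc(c / sqrt(2)).
    """
    return math.erfc(c / math.sqrt(2))

def frac_beyond_point(c: float) -> float:
    """Share of point transect detections farther than c * sigma.

    Assumes half-normal g(r) and no truncation. Observed radial distances
    have density proportional to r * g(r), a Rayleigh distribution:
    P(R > c * sigma) = exp(-c^2 / 2).
    """
    return math.exp(-c * c / 2)

for c in (1.0, 2.0):
    print(c, round(frac_beyond_line(c), 3), round(frac_beyond_point(c), 3))
```

Under these assumptions roughly 61% of point transect detections lie beyond sigma, compared with roughly 32% on a line transect, which is one way to see why the same number of detections carries less information about the detection function near the point.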

The time to consider adequacy of sample size is during survey design (when formulae can be used to guide the effort needed to achieve a desired precision) rather than at the time of analysis.  If rare species are the focus of your investigation, design the surveys such that the inference for those species is sound.  If resources do not permit the level of effort for those inferences to be sound, redefine the objectives and abandon hope of making inference for the rare species.
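
As a rough illustration of the kind of design formula meant here, the sketch below uses the pilot-survey approximation k = (b / cv^2) * (k0 / n0), where k0 points in a pilot survey produced n0 detections. The dispersion value b = 3 is a commonly suggested default and is an assumption here, as are the function name and the pilot numbers. It also assumes each future point is visited under the same protocol as the pilot points.

```python
import math

def points_needed(target_cv, pilot_points, pilot_detections, b=3.0):
    """Approximate number of points needed for a target CV of density.

    Uses k = (b / cv^2) * (k0 / n0). The default b = 3 is an assumed
    rule-of-thumb dispersion value, not something estimated from data.
    """
    if target_cv <= 0 or pilot_points <= 0 or pilot_detections <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(b / target_cv ** 2 * pilot_points / pilot_detections)

# Hypothetical pilot: 5 detections of one species from 32 points,
# aiming for a 20% coefficient of variation.
print(points_needed(0.20, 32, 5))
```

With these hypothetical inputs the approximation calls for roughly 480 points, which shows how quickly the required effort grows for a species detected only a handful of times.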

Using species as covariates to produce species-specific detection functions is useful, but again not a panacea.  The premise of this covariate approach is that multiple species analysed in this fashion share a common detection function shape (hazard rate or half normal) and that the scale of that basic shape is altered as a function of the covariate.  Hence it would be inappropriate to combine a species with a hazard rate shape and a species with a half normal shape via use of the covariate.  This is the point at which a challenge arises: if a species has only 5 detections, it is improbable we can determine whether those detections follow a hazard rate or half normal shape.  Therefore, as a practical matter, "enough" detections are needed to make an educated guess whether the underlying shape is hazard rate or half normal.
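
A minimal sketch of that premise, assuming the usual log-link formulation in which the covariate changes only the scale parameter, sigma = exp(intercept + species effect). The coefficient values and function names are hypothetical and are not fitted to any data.

```python
import math

def half_normal(r, sigma):
    """Half-normal key function: g(r) = exp(-r^2 / (2 sigma^2))."""
    return math.exp(-r * r / (2 * sigma * sigma))

def hazard_rate(r, sigma, shape):
    """Hazard-rate key function: g(r) = 1 - exp(-(r / sigma)^(-shape))."""
    if r == 0:
        return 1.0  # detection is certain at the point itself
    return 1.0 - math.exp(-((r / sigma) ** (-shape)))

def species_sigma(intercept, species_effect):
    """Scale for one species under a log link (hypothetical coefficients)."""
    return math.exp(intercept + species_effect)

# Two hypothetical species sharing a half-normal key, differing only in scale.
sigma_a = species_sigma(3.0, 0.0)   # reference species
sigma_b = species_sigma(3.0, 0.5)   # detectable at greater distances
print(half_normal(30.0, sigma_a), half_normal(30.0, sigma_b))

# At half the scale distance the hazard-rate key still sits on its shoulder,
# while the half-normal has already dropped; rescaling sigma cannot remove
# that difference in shape.
print(hazard_rate(0.5, 1.0, 3.0), half_normal(0.5, 1.0))
```

The covariate stretches or shrinks one curve along the distance axis. It cannot turn a half-normal into a hazard rate, which is why species pooled this way need to share a key function.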

Your final question regarding number of rare species should also be viewed through the lens of which species share a common key function.  If the three species that exceed the 60 observation threshold all have half normal key functions, then you will struggle with species with few detections that might have hazard rate key functions.

The discussion of sufficient number of detections has a subjective element.  If I were reviewing a manuscript attempting to make inference for 35 species where the sample was dominated by a small number of species (~10) with most of the detections, I would suggest the inference for the other 25 species is likely weak.  As a researcher, you decide whether to dilute the strength of your findings with weak inference for under-represented species.

Eric Rexstad
Centre for Ecological and Environmental Modelling
University of St Andrews
St Andrews is a charity registered in Scotland SC013532

Tiago Marques

Aug 31, 2021, 6:07:30 AM
to Eric Rexstad, Lianne Koczur, distance-sampling
Hi Lianne,

As I was reading Eric's message I was thinking about a further point. It is hard to provide black-and-white answers in a world painted in many shades of grey. Eric's answer covered the detection function side of this, and just because you can fit a model does not mean you should ;) But inferences based on a small number of observations will also be problematic because of the encounter rate component and, if it is involved, group size. Nothing in your text tells me there is a group size component, so I'll ignore it, but the same thinking below would apply.

Finding 2 rather than 6 individuals of a rare species changes its density estimate by a factor of about 3, given a constant detection function. But I think you will agree with me that, for low density species, finding say 2, 3, 4, 5 or 6 animals in a given survey tells you very little about real density; it is much more about the randomness in small sample sizes and the specific day you surveyed each line, "contaminated" with a randomness we really can't control. I mean, 2000 observations versus 6000 observations tells me that density is probably very different, but 2 vs 6, while being the same proportionally, tells me little. So do keep that in mind even before thinking about the detection function component. You could get a really high precision for a low sample size species, especially if you get a low encounter rate variance across lines just by chance and then use a pooled (and potentially unrepresentative) detection function, and that might just be a fluke. It's a leap of faith to decide whether that is real or not. No stats will change that; it's about critical thinking on the process that leads from observations to density estimates. Sometimes when you want to know everything about everything you end up with nothing about most of it. As Eric hinted, what is best: to report sound results about some species, or to report estimates about all species that are unreliable for most? That is a question for you in the end ;)
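
The 2 vs 6 against 2000 vs 6000 contrast can be sketched numerically under the simplifying assumption that counts are Poisson. Real encounter rates are often more variable than Poisson, so this is, if anything, optimistic. The function names and the expected count of 4 are hypothetical.

```python
import math

def poisson_cv(expected_count):
    """Coefficient of variation of a Poisson count: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_count)

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return total

for n in (2, 6, 2000, 6000):
    print(n, round(poisson_cv(n), 3))

# If the true expected count were 4 in both surveys, how often would chance
# alone give a count as low as 2, or as high as 6?
print(round(poisson_cdf(2, 4.0), 3), round(1 - poisson_cdf(5, 4.0), 3))
```

Under this assumption a count of 2 has a CV of about 71% and a count of 2000 about 2%. With a true expected count of 4, counts of 2 or fewer and counts of 6 or more each turn up in more than a fifth of surveys, so a 2 vs 6 contrast is unremarkable on its own.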



Mark Wilson

Aug 31, 2021, 6:31:35 AM
to Tiago Marques, Eric Rexstad, Lianne Koczur, distance-sampling
Just one further thought about this. Everything Tiago wrote makes excellent sense, and I like the specific example of 2 vs 6 versus 2000 vs 6000. I just wanted to add that density estimates based on such tiny samples might still be worthwhile if you are going to be working with many of them. So, while a comparison between two counts of 2 vs 6 is demonstrably laughable in most sampling situations, a comparison between 20 count (or density) estimates with a mean of around 2 and 20 with a mean of around 6 will often (depending on variation between samples in each group) be much more robust. In this kind of scenario, you can obviously take advantage of pooled detections across different samples when generating detection functions. And it may be that the modelling frameworks available in Distance (or other distance analysis software) allow the comparisons or analyses you want to carry out to be wrapped up in the Distance analysis itself. If not, however, then particularly when controlling for the influence of variables like vegetation structure or species identity on the detection function, I think there may be a place for turning small counts into densities before using them in another modelling environment.
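
A back-of-envelope sketch of why replication helps, assuming independent counts whose variance is a multiple of their mean (Poisson when that multiple is 1). The `dispersion` parameter and the function name are hypothetical, and the normal approximation is crude for counts this small.

```python
import math

def z_for_difference(mean_a, mean_b, n_per_group, dispersion=1.0):
    """Approximate z statistic for the difference between two group means.

    Assumes n_per_group independent counts per group, each with
    variance = dispersion * mean (dispersion = 1 gives Poisson).
    """
    se = math.sqrt(dispersion * (mean_a + mean_b) / n_per_group)
    return abs(mean_a - mean_b) / se

print(round(z_for_difference(2, 6, 1), 2))        # a single count in each group
print(round(z_for_difference(2, 6, 20), 2))       # 20 counts in each group
print(round(z_for_difference(2, 6, 20, 3.0), 2))  # same, with overdispersed counts
```

A single 2 vs 6 comparison gives a z of about 1.4, whereas 20 samples per group give about 6.3 under the Poisson assumption. Extra variation between samples shrinks that, which is the dependence on between-sample variation noted above.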



Dr Mark Wilson
Senior Research Ecologist
BTO Scotland

Unit 15
Beta Centre
Innovation Park
University of Stirling

Tel: 01786 458024

Registered Charity No SC039193 (Scotland) and No 216652 (England & Wales)
Company Limited by Guarantee No 357284 (England & Wales)