Regarding model selection in CDS

Deepti Gupta

Feb 17, 2025, 5:14:05 AM

to distance-sampling

Hello,

Greetings!

I have run a CDS analysis of line transect data collected with a single-observer configuration. The species occurs in clusters in undulating terrain, so detectability is good close to the line and drops sharply beyond a certain distance. There were 47 observations, with a maximum sighting distance of 80 meters; 45 of the 47 sightings are within 30 meters. Cluster size ranges from 1 to 20 individuals. I have run a few model combinations (screenshot attached).

The AIC values of most of the models are the same, but the density estimates differ. I have attached the data distribution for further understanding of the data. Could you please suggest a strategy for selecting a model?

Thanks & Regards

Deepti Gupta

help with model selection.jpg

cluster size.jpeg

distance.tiff

Eric Rexstad

Feb 17, 2025, 6:12:38 AM

to Deepti Gupta, distance-sampling

Deepti

AIC scores are not comparable between analyses that use different truncation distances (which I see you have varied across your analyses), because each change of truncation distance changes the data being fitted.

My suggestion is to first decide upon a truncation distance; I would suggest 30m, ignoring the two most distant detections.

Once truncation distance is decided, then examine different model fits to the truncated data.
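The workflow above (fix the truncation distance first, then compare models fitted to the same truncated data) can be sketched as follows. This is a minimal illustration in plain Python, not the Distance software itself: the distances are made up to mimic the 45-within-30 m pattern described in the question, only two keys without adjustment terms are compared (uniform and half-normal), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def golden_min(f, lo, hi, tol=1e-8):
    """Minimise a one-dimensional function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def half_normal_nll(sigma, x, w):
    """Negative log-likelihood of perpendicular distances x under a
    half-normal detection function truncated at w (line transects)."""
    area = sigma * math.sqrt(math.pi / 2) * math.erf(w / (sigma * math.sqrt(2)))
    return sum(xi * xi for xi in x) / (2 * sigma ** 2) + len(x) * math.log(area)


def aic_table(distances, w):
    """AIC of two simple models fitted to the SAME data truncated at w."""
    x = [d for d in distances if d <= w]
    # Uniform key, no adjustments: f(x) = 1/w, zero estimated parameters.
    aic_unif = 2 * len(x) * math.log(w)
    # Half-normal key: one parameter (sigma), optimised on the log scale.
    log_sigma = golden_min(
        lambda s: half_normal_nll(math.exp(s), x, w),
        math.log(w / 100),
        math.log(w * 100),
    )
    aic_hn = 2 * 1 + 2 * half_normal_nll(math.exp(log_sigma), x, w)
    return {"n": len(x), "unif": aic_unif, "hn": aic_hn}


# Hypothetical data: 45 detections within 30 m plus two distant ones.
nd = NormalDist()
close = [12.0 * nd.inv_cdf((1 + 0.98 * (i + 0.5) / 45) / 2) for i in range(45)]
distances = close + [62.0, 80.0]

table_30 = aic_table(distances, 30.0)  # the comparison to use for selection
table_80 = aic_table(distances, 80.0)  # different data, so a different AIC scale
print(table_30)
print(table_80)
```

Within `table_30` the two AIC values can be compared with each other, because both models were fitted to the same 45 distances. Comparing an AIC from `table_30` with one from `table_80` is not meaningful, because the second is computed from 47 distances over a wider strip.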

Given the range of observed cluster sizes, you will want to counteract the possible effect of size bias in your estimates of abundance of individuals. This can be done either by using regression to estimate the mean size of groups in the population or by including group size as a covariate in the detection function modelling. These two strategies are discussed and demonstrated here:

https://distancesampling.org/online-course/09-clusters/clusterslanding.html
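As a rough illustration of the regression strategy, the sketch below regresses the log of observed cluster size on the estimated detection probability at each cluster's distance, then predicts cluster size where detection is certain (on the line). It is plain Python with made-up distances and sizes, and it assumes a half-normal detection function with an already fitted scale `sigma`; the function names are hypothetical, and the back-transform is the naive one (a fuller treatment would typically add a variance correction).

```python
import math


def half_normal(x, sigma):
    """Half-normal detection function g(x), with g(0) = 1."""
    return math.exp(-x * x / (2 * sigma * sigma))


def size_bias_regression(distances, sizes, sigma):
    """Least-squares regression of ln(cluster size) on g(x).

    Returns the slope and the predicted cluster size at g = 1, i.e. the
    size expected where no cluster is missed because it is small.
    """
    g = [half_normal(x, sigma) for x in distances]
    y = [math.log(s) for s in sizes]
    g_bar = sum(g) / len(g)
    y_bar = sum(y) / len(y)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    return slope, math.exp(intercept + slope * 1.0)


# Hypothetical detections: only large clusters are seen far from the line.
example_distances = [2, 5, 8, 12, 15, 20, 25, 28]
example_sizes = [1, 2, 2, 3, 4, 6, 10, 15]
assumed_sigma = 12.0  # assumed fitted scale parameter, for illustration only

slope, expected_size = size_bias_regression(
    example_distances, example_sizes, assumed_sigma
)
raw_mean_size = sum(example_sizes) / len(example_sizes)
print(slope, expected_size, raw_mean_size)
```

With these made-up data the slope is negative (larger clusters sit where detection probability is low), so the regression-based mean is smaller than the raw sample mean, which is the direction of correction one expects when size bias is present.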

Rose Delamare

Oct 21, 2025, 1:53:42 PM

to distance-sampling

Hi everyone,

I seem to have the same problem as mentioned here.

I am studying how a hedgehog population has changed over 20 years. I have data from 2006-2007 and used the same protocol in 2024-2025; my aim is to compare the two density estimates.

For 2024-2025 I fitted 3 models, one for each of the 3 key functions, each with its best adjustment terms. They all use the same truncation distance and have no other parameters.

Two of them give very similar densities (9.9 ± 3.5 and 10.4 ± 3.7 hedgehogs / km2), but the third, which has the best AIC, gives a very different estimate: 16.9 ± 8.2 hedgehogs / km2.

The plot of this last model looks odd to me (see graph), but maybe that is just because I am not very familiar with this type of graph.

The reason I am not comfortable choosing this model is that the density estimate for 2006-2007 was 19 hedgehogs / km2, based on 103 observations, whereas I have only 56 observations in 2024-2025.

If the current density estimate is 16/km2, the explanation would be that there are not fewer hedgehogs, but that detection was worse in 2024-2025?

That seems odd to me, as I can't find any field explanation for it ...

Could you help me choose the best model?

Thank you for your time.

Regards,

Rose Delamare

PhD student

Université de Reims Champagne-Ardenne

Eric Rexstad

Oct 22, 2025, 3:09:48 AM

to Rose Delamare, distance-sampling

Rose

Welcome to the list.

Although you do not name the model with the smallest AIC (for which you show the fitted detection function model), I have a strong suspicion the chosen model is the hazard rate.

The other suspicion I have is that there is a surplus of detections at very small distances. This would be easier to see if the histogram bars were narrower (you can use the nc= argument to plot() to better show the distribution of detection distances below 10m). It is difficult to accurately count the number of circles in the plot you provide, but I count 6-8 detections made at distances <5m (that is more than 10% of your total detections). "Spikes" in the distribution of detection distances near zero create problems for fitting detection function models, leading to awkward model choices.
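To make the "narrower bars" idea concrete, here is a small plain-Python sketch that counts detections in 1 m bins below 10 m and computes the share of detections below a threshold. The distances are made up for illustration, the helper names are hypothetical, and this is not the plot() function of the Distance software; only the counts 6-8 and the total of 56 come from the thread.

```python
def bin_counts(distances, width, upper):
    """Count detections in equal-width bins [0, width), [width, 2*width), ...
    up to (but excluding) upper."""
    n_bins = int(round(upper / width))
    counts = [0] * n_bins
    for d in distances:
        if 0 <= d < upper:
            counts[min(int(d // width), n_bins - 1)] += 1
    return counts


def share_below(distances, threshold):
    """Fraction of all detections made at distances below threshold."""
    return sum(1 for d in distances if d < threshold) / len(distances)


# Hypothetical perpendicular distances in metres.
hypothetical_distances = [0.5, 1, 1.5, 2, 3, 4, 4.5, 6, 8, 9,
                          12, 15, 18, 22, 27, 31, 36, 40]

fine_bins = bin_counts(hypothetical_distances, width=1.0, upper=10.0)
print(fine_bins)
print(share_below(hypothetical_distances, 5.0))

# The arithmetic behind "more than 10%": 6 to 8 detections out of 56.
print(6 / 56, 8 / 56)
```

Narrow bins near zero show whether the excess sits in the first metre or two, which a single wide bar would hide.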

You are correct to be suspicious of this fitted hazard rate model. Observe this model (from the plot) predicts detectability is perfect at distance 0 (an assumption of conventional distance sampling), yet predicted detection probability drops to ~0.5 by 15m. You should ask whether it is biologically plausible for detection probability to fall so rapidly.

By employing the hazard rate detection function model that falls so rapidly with distance (perhaps inappropriately), the chosen model estimates average detection probability out to the truncation distance to be quite small. An underestimate in average detection probability produces an overestimate of population size.
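The link between average detection probability and the density estimate can be illustrated with the standard line-transect estimator, density = n / (2 w L P_a), where P_a is the average of g(x) between 0 and the truncation distance w. The sketch below is plain Python under stated assumptions: n = 56 comes from the thread, but the truncation distance, total line length and all detection function parameters are hypothetical (the hazard-rate parameters are simply chosen so that g(15 m) is about 0.5, as read from the plot).

```python
import math


def half_normal(x, sigma):
    """Half-normal detection function."""
    return math.exp(-x * x / (2 * sigma * sigma))


def hazard_rate(x, sigma, b):
    """Hazard-rate detection function, g(x) = 1 - exp(-(x / sigma)^(-b))."""
    return 1 - math.exp(-((x / sigma) ** (-b)))


def average_detection_probability(g, w, steps=10000):
    """P_a = (1 / w) * integral of g(x) from 0 to w, by the midpoint rule."""
    h = w / steps
    return sum(g((i + 0.5) * h) for i in range(steps)) * h / w


def line_transect_density(n, w_km, line_length_km, p_a):
    """Objects per km^2: n / (2 * w * L * P_a)."""
    return n / (2 * w_km * line_length_km * p_a)


n_detections = 56        # from the thread
w_m = 60.0               # hypothetical truncation distance (metres)
line_length_km = 20.0    # hypothetical total transect length

p_hn = average_detection_probability(lambda x: half_normal(x, 25.0), w_m)
p_hr = average_detection_probability(lambda x: hazard_rate(x, 12.5, 2.0), w_m)

d_hn = line_transect_density(n_detections, w_m / 1000, line_length_km, p_hn)
d_hr = line_transect_density(n_detections, w_m / 1000, line_length_km, p_hr)
print(p_hn, p_hr)
print(d_hn, d_hr)
```

Because n, w and L are the same for both models, the ratio of the two density estimates is exactly the inverse ratio of the two P_a values: a detection function that falls away quickly gives a smaller P_a and therefore a larger density.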

In summary, your data possess a surplus of detections at small distances (you will have to understand why that is: making detections along trails? inexact distance measurements?...). As a result, AIC favours a flexible model that fits that spike but produces an implausible shape, leading to an overestimate of population size. Recognise that AIC is a "guide" for model selection, not a "rule". I suggest you employ one of the other key function models, perhaps the half normal, as the basis for your inference. You should, however, assess whether the half normal model fits your data; there is a danger that the half normal will not fit the "spike". You will then need to discuss why you make inference from a model that fails to fit the observed data.

Learn more about this from our online training materials

https://distancesampling.org/online-course/03-criticism/criticismlanding.html

Rose Delamare

Oct 23, 2025, 10:18:35 AM

to distance-sampling

Thank you Eric for your insights.

Indeed, my "weird" model is the hazard rate, but isn't it supposed to be the best at handling a spike at short distances?

I can't really understand why the 2024-25 observations are spread so unevenly when the observations from 2006-07 have a much wider shoulder.

I wonder if it is simply because I have too few observations, which distorts the distribution.

I watched the course you suggested, but I can't find a satisfactory way to fit my data. I tried binning the data, but that only increased the estimate from the hazard-rate function.

As you said, the uniform and half-normal models can't really handle the spike, and suggest a detection probability of 1.5 between 0 and 10m ...

Is that enough to eliminate those models?

And all of them seem robust enough: they all pass the goodness-of-fit tests (hn, hr and unif, with or without binned data).

Would you have any other suggestions for me?

Thank you again.

Regards,

Rose D.

Eric Rexstad

Oct 24, 2025, 5:10:56 AM

to Rose Delamare, distance-sampling

Thanks for resending your message Rose. Your first attempt did eventually arrive on the list, but it was slow in arriving.

Nevertheless, back to your question.

The hazard rate model does indeed have the flexibility to fit a variety of shapes. That is not the issue; the issue is the distribution of the detection distances themselves. I am not in a position to know the reason for the narrowness of the shoulder in the 2024-25 data.

If the half normal and uniform-cosine models adequately fit the data, I would use one of them as the basis for your inference. You can explain that the hazard rate model, although preferred by model selection, produces an implausibly (I presume) rapid decline in detection probability with distance.

Below are a few examples of data with spikes at small distances that do not reflect the detection process (they result instead from bow riding by cetaceans).

Note the authors of this study made inference from models that did not attempt to fit the spike at small distances:

selected the half-normal detection function with no adjustments. Note that this was not the model with the lowest AIC (which was the hazard rate model), but the half-normal model was used to avoid fitting the spike at zero distance, which was believed to be an artefact of responsive movement towards the boat

Williams, R., & Thomas, L. (2007). Distribution and abundance of marine mammals in the coastal waters of British Columbia, Canada. Journal of Cetacean Research and Management, 9, 15–28. https://doi.org/10.47536/jcrm.v9i1.688
