Very inflated density estimates when using covariates but not in the null model


Ben Apsley

Oct 12, 2025, 6:01:24 PM
to distance-sampling
I am trying to obtain density estimates for a large number of point-survey sites for certain bird species. My hope is to obtain one density estimate per site in order to compare the expected number of birds across sites. Each site received 1 or 2 visits, during which point counts were performed and distance, group size, and other covariates were recorded.

I am using the ds() function to fit detection functions via CDS and MCDS model loops. I also created a null model with no adjustments or covariates. Region.Label is the identifier for the individual survey points. I also included survey points with 0 observations of the species of interest, and an Effort column that reflects the number of visits each point received (1 or 2). The Area column is set to 0, since I'm looking at density rather than abundance for now. A hypothetical sketch of this layout is below.
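
For context, a sketch of the flatfile layout this describes (values invented for illustration; Sample.Label is also required by the Distance flatfile format):

d <- data.frame(
  Region.Label = c("P01", "P01", "P02"),  # identifier for each survey point
  Sample.Label = c("P01", "P01", "P02"),
  Area         = 0,                       # Area = 0 -> density reported, not abundance
  Effort       = c(2, 2, 1),              # number of visits to each point
  distance     = c(23.5, 41.0, NA),       # NA row = point with no detections
  size         = c(1, 3, NA)              # group size covariate
)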

The best model by far, based on AIC, is a hazard-rate model with group size as the sole covariate (AIC = 1771.35). However, when looking at the density estimates I realized that they are extremely high compared to literature values (by at least an order of magnitude).

Furthermore, I realized that the null model with no adjustments or covariates produced much more accurate density estimates, even though it is a worse model based on AIC and goodness of fit (AIC = 2004.737).

I'm wondering if someone could give me some guidance as to why the estimates are so inflated, and whether there is anything I can do to fix this. The fact that they're so different makes me think I'm doing something wrong when introducing the covariates, but I'm not seeing where.

Here are the two ds() models I am using:

null:
m.null.hn <- ds(data = d,
                formula = ~1,
                transect = "point",
                key = "hn",
                adjustment = NULL,  # no adjustment terms
                er_var = "P3",
                truncation = "15%",
                convert_units = conversion.factor)


covariate model:
mbest <- ds(data = d,
            formula = ~as.factor(size.factor),
            transect = "point",
            key = "hr",
            adjustment = NULL,  # adjustment terms are not combined with covariates
            er_var = "P3",
            truncation = "15%",
            convert_units = conversion.factor)



conversion.factor = 0.01 (distances are measured in metres and I want density per hectare)
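
For reference, a sketch of how the same value can be produced with the Distance package's convert_units() helper (for point transects the effort argument is NULL):

conversion.factor <- convert_units(distance_units = "metre",
                                   effort_units   = NULL,  # points have no line length
                                   area_units     = "hectare")
conversion.factor  # 0.01, matching the value above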



Eric Rexstad

Oct 13, 2025, 3:43:52 AM
to Ben Apsley, distance-sampling
Ben

I think I've interpreted your situation, but want to double check: when you say you want an estimate "per site", I take that to mean you want an estimate for each station at which you gathered data; i.e. site=point, correct?

It is a bit difficult to diagnose what is happening without plots or output, but I have two potential suspects for your problem, namely positive bias from your hazard-rate model with the group size covariate:

  • the hazard-rate model might be fitting a spike in your data (too many detections at distance = 0), causing the detection probability estimate to be too small and leading to large estimates of density
  • convergence problems resulting from the introduction of the group size covariate. You'd need to look closely at the estimated beta coefficients associated with the covariate parameters and their measures of precision to diagnose this.
You don't provide information regarding goodness of fit of the models; do they fit? What is the range of observed group sizes? What does a plot of observed group size (y-axis) versus detection distance (x-axis) look like? This is a visual assessment of the effect of group size on detectability. What is the result of a hazard-rate model without the size covariate? That might help determine whether the difficulties lie with the covariate or with the inherent nature of the hazard-rate key function. A sketch of these checks is below.
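
A sketch of those checks, reusing the objects from the original post (the column name "size" is an assumption):

# Observed group size (y-axis) versus detection distance (x-axis):
plot(d$distance, d$size,
     xlab = "Detection distance (m)", ylab = "Observed group size")

# Hazard-rate model without the size covariate, for comparison:
m.null.hr <- ds(data = d,
                formula = ~1,
                transect = "point",
                key = "hr",
                adjustment = NULL,
                er_var = "P3",
                truncation = "15%",
                convert_units = conversion.factor)
gof_ds(m.null.hr)  # goodness of fit for the covariate-free hazard-rate model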


Ben Apsley

Oct 14, 2025, 6:22:17 PM
to distance-sampling
Hi Eric,

Thanks for the quick response. You are correct: I am hoping to obtain abundance estimates for each station where data were collected.

I think you are right about the hazard-rate model, since the fit seems to spike fairly harshly near 0, with detection probability going above 1. I've changed my model selection approach, using QAIC and gof_ds() to select the best-fitting model. It seems the best fit still uses size as a covariate, but with a half-normal key function instead. I also added a slight left truncation, since there is an absence of very short distances, which I suspected may have been causing issues. I've attached the plot and goodness of fit for this model.

Goodness of fit results for ddf object

Distance sampling Cramer-von Mises test (unweighted)
Test statistic = 0.259254 p-value = 0.176929
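
For reference, a sketch of the selection calls described above (the model objects are hypothetical stand-ins for the candidate models; note that QAIC comparisons are only valid among models sharing a key function):

QAIC(m.hn.null, m.hn.size)  # overdispersion-adjusted AIC, half-normal candidates
gof_ds(m.hn.size)           # Cramer-von Mises goodness-of-fit test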

That being said, I'm still having issues with inflated density estimates. I suspect it is an issue with dht2(), since the estimates produced by ds() do seem to be accurate and in line with the literature; it's only when I use dht2() to obtain density estimates that they are extremely inflated. I thought it might be a problem with units, but everything seems to be input correctly. I've attached the dht2() output file; for reference, we expect the density to be around 10 individuals per ha.

Here is the ds() model and subsequent dht2() code:

mbest <- ds(data = d,
            truncation = list(left = "1%", right = "15%"),
            formula = ~as.factor(size),
            transect = "point",
            key = "hn",
            monotonicity = "none",  # monotonicity cannot be enforced with covariates
            convert_units = conversion.factor)

mbest_abunds <- dht2(mbest,
                     flatfile = d,
                     stratification = "geographical",
                     strat_formula = ~Region.Label,  # one stratum per survey point
                     convert_units = conversion.factor)
Attachments: mbest_gof.png, mbest_plot.png, dht2_abundances_inflated.csv

Eric Rexstad

Oct 15, 2025, 3:16:03 AM
to Ben Apsley, distance-sampling
Ben

Thanks for the follow-up. There's a lot going on here. Is this point transect data? Is it camera trap data? 

  • I don't understand how your data can have both "an absence of very short distances" and "a harsh spike near 0". Before resorting to left truncation, you should understand what is causing this in your data. More extensive plotting of histograms with small bin widths may be helpful (see the sketch after this list).
  • From first principles, you should be using the plot of the probability density function to visually assess fit for point transects.
  • Detections with a spike at 0 are very difficult to model, as you have seen, but you are still producing an acceptable CvM goodness of fit with your half-normal model.
  • I don't know why you are invoking dht2(). If you have labelled your different points with distinct Region.Label identifiers, ds() will produce point-specific estimates of density without using dht2().
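
A minimal sketch of those checks, assuming the objects from the earlier posts (d, mbest):

# Histogram of detection distances with small bin widths, to see whether
# there is a gap or a spike near zero:
hist(d$distance, breaks = 50,
     xlab = "Distance (m)", main = "Detection distances")

# For point transects, assess fit on the probability density function:
plot(mbest, pdf = TRUE)

# Point-specific density estimates straight from ds(), without dht2();
# the $dht$individuals$D table has one row per Region.Label:
mbest$dht$individuals$D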
