Understanding differences between detection functions

Chris

Jun 21, 2016, 4:46:28 AM

to distance-sampling

Dear list,

I have stumbled at a simple hurdle while attempting to estimate and select a suitable detection function for my line transect data (CDS, via R).

The figure below shows detection functions (solid black line) and observed distances (histogram) for our aerial survey (binned data, left truncated).

The half-normal detection function (left) is much preferred by AIC relative to the hazard rate detection function (right).

I wish to understand two aspects better:

1. To me, the fitted line of the hazard rate model seems to ‘track’ the histogram much better than the fitted line of the half-normal model. Yet this model receives no AIC support at all, relative to the half-normal model. Why might this be?

2. The scale of the histograms differs between the two figures. In particular, should I be concerned that the histogram bars in the half-normal plot do not reach a detection probability of 1 in the first bin (g(0))?

Thanks in advance for any feedback, and I hope my ignorance doesn't offend anyone!

Chris

Tiago Marques

Jun 21, 2016, 7:02:08 AM

to distance...@googlegroups.com

Hi Chris, list

I'll start with your second question, which is easier to address. You should not be worried that the "scale" of the histograms is different. The y-axis on the plot should be used only to read the detection probability given by the fitted line, so by definition the line intercepts the y-axis at one (since we assume g(0)=1 in conventional distance sampling). The histogram bars are simply rescaled so that the area of the bars above the line is the same as the area below the line that the bars leave uncovered; in other words, the total area of the bars matches the area under the line. Something else that often raises questions (e.g. I have been explicitly asked by a reviewer about this, and it might look odd to folks not familiar with distance sampling data) is the fact that in your right plot one of the bars is above 1, yet a probability is bounded to the [0,1] interval. Again, this is not something to worry about, because the bar heights can't be interpreted on that y-axis scale.
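The rescaling described above can be sketched in base R. This is only an illustration of the idea, not the plotting code of any package: the bin edges, counts, and half-normal scale `sigma` are all made-up values, and `g` is a plain half-normal detection function with g(0)=1.

```r
# Hypothetical bin edges (metres), counts per bin, and half-normal scale.
# None of these values come from the thread's data.
cutpoints <- c(35, 100, 150, 200, 250, 300, 345)
counts    <- c(120, 110, 90, 60, 40, 20)
sigma     <- 150

# Half-normal detection function; equals 1 at distance 0
g <- function(x) exp(-x^2 / (2 * sigma^2))

# Area under the fitted line between the left and right truncation points
mu <- integrate(g, lower = min(cutpoints), upper = max(cutpoints))$value

widths  <- diff(cutpoints)
density <- counts / (sum(counts) * widths)  # bars as a density: total area 1
heights <- density * mu                     # rescaled: total area equals mu

# Total bar area after rescaling, to compare with the area under the line
bar_area <- sum(heights * widths)
```

Because the bar heights are a density multiplied by `mu`, they are not probabilities. A bar can sit above or below 1 depending on the counts, while the fitted line itself still starts at 1.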

Regarding your first question, sometimes a question just raises more questions ;) It is a bit difficult to know what might be going on without looking at the data, especially since you are using left truncation, which can raise additional difficulties: to some extent one is asking the software to extrapolate a function to where there is no data. You do not really say much about sample size or the AIC values you are looking at. I know this is not your question, but why are you using left truncation? Is that due to the blind spot under the helicopter/airplane?

I note that the HR model seems to fit worse in the tail, but in general the tail is not where you should be worried. I would say that the HN model seems to overshoot in an unexpected way, so it might lead to an overestimation of density, and since that is based on extrapolating the fit into an area where you have no data, it might be questionable. As a side note, with unbinned data that are not left truncated, often the opposite happens, with the HN model tending to average across spikes at small distances and the HR model leading to overestimation.

How different are your density estimates from the two models? What are your sample sizes? Have you considered adjustment terms? Have you considered simply subtracting 25 meters (or whatever distance you are left-truncating the data at) from all the distances and fitting a detection function to those data? I wonder whether in that case you would still get the HN model as the best fit; if you do not, that might hint at problems with the current left truncation.
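A minimal base-R sketch of the shift suggested above, covering only the data preparation step. The truncation distances 35.2 and 345.73 are the ones Chris reports later in the thread; the `distances` and `cutpoints` vectors are hypothetical stand-ins for the real data.

```r
left  <- 35.2    # left truncation distance (metres), as reported in the thread
right <- 345.73  # right truncation distance (metres), as reported in the thread

# Hypothetical perpendicular distances and bin edges, for illustration only
distances <- c(40.1, 77.5, 120.3, 210.9, 30.0, 340.2, 360.0)
cutpoints <- c(35.2, 100, 150, 200, 250, 300, 345.73)

# Drop observations outside the truncation window, then move the origin
keep              <- distances >= left & distances <= right
shifted_distances <- distances[keep] - left

# The right truncation point and the bin edges move by the same amount,
# so the first bin edge becomes 0
shifted_right     <- right - left
shifted_cutpoints <- cutpoints - left
```

The shifted distances could then be fitted with right truncation only (here `shifted_right`) and the shifted bin edges, and the HN and HR fits compared again.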

hope this helps, maybe others have additional comments re your first point

cheers

Tiago

--
You received this message because you are subscribed to the Google Groups "distance-sampling" group.
To unsubscribe from this group and stop receiving emails from it, send an email to distance-sampl...@googlegroups.com.
To post to this group, send email to distance...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/distance-sampling/7148b90b-b6a3-40b0-857a-12385772b7b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Chris

Jun 21, 2016, 9:16:47 AM

to distance-sampling

Hi Tiago,

Thanks for the helpful comments on both questions.

You are right that I used left truncation because observers could not see directly below the aircraft. You were also right in your prediction that the density estimates from the HN model would be higher than those from the HR model.

What I failed to mention in my post was that the models I compared were actually also taking observer into account (left and right hand side of the helicopter).

For example,

# ds() is provided by the Distance package
library(Distance)
models$hr.null.obs <- ds(distdata, key="hr", adjustment=NULL,
                         truncation=list(left=35.2, right=345.73),
                         cutpoints=dist.bins, convert.units=0.001,
                         formula=~as.factor(Observer))

Having a closer look at the parameter estimates, my naïve guess is that the HR model struggles to converge (although no warning is given), as the standard error of the ‘observer’ parameter equals 203. If one compares this model to the HN model with observer as a covariate (SE = 0.108), the difference in AIC is 25 units.

The problem with ‘observer’ ‘disappears’ if I add ‘size’ (cluster size) to the HR model as an additional covariate (observer SE = 0.08, size SE = 0.07). This model is then also better ranked than the corresponding HN model with additive observer and size effects (by ~5 AIC units).

The sample size is 662 observations. I have not looked at what happens to the HR model with observer if the truncation distance is subtracted from all the distances, but this new detection function appears to work fine.

Thanks again!

Chris

Eric Rexstad

Jun 21, 2016, 10:56:50 AM

to Chris, distance-sampling

Chris

Thanks for the additional detail.

Your keen observation that you have non-convergence with the model with an SE=203 is undoubtedly correct. If the model has not converged, then the likelihood reported for the model is wrong and consequently the AIC value is wrong and not to be believed.
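To illustrate why the AIC from a non-converged fit cannot be trusted, here is a small base-R sketch. It uses an untruncated half-normal density with a closed-form maximum, which is a deliberate simplification and not the likelihood that ds() optimises; the distances and the helper names `loglik_hn` and `aic` are hypothetical.

```r
# Hypothetical perpendicular distances in metres (not the thread's data)
x <- c(12, 48, 5, 93, 61, 27, 150, 34, 80, 19)

# Log-likelihood of an untruncated half-normal density with scale sigma
loglik_hn <- function(sigma, x) {
  sum(log(sqrt(2 / pi) / sigma) - x^2 / (2 * sigma^2))
}

# AIC from a log-likelihood value and the number of parameters k
aic <- function(loglik, k) 2 * k - 2 * loglik

sigma_mle   <- sqrt(mean(x^2))  # closed-form maximum for this simple model
sigma_stuck <- 10 * sigma_mle   # stand-in for an optimiser that stopped early

aic_converged <- aic(loglik_hn(sigma_mle, x), k = 1)
aic_stuck     <- aic(loglik_hn(sigma_stuck, x), k = 1)

# Positive difference: the AIC reported away from the maximum is inflated
aic_penalty <- aic_stuck - aic_converged
```

Any likelihood value reported away from the true maximum is too low, so the resulting AIC is too high and a comparison against a properly converged model is not meaningful.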

I cannot make a general statement that applies to your situation, but the optimisation ability of the 'ds()' function in R does sometimes struggle to converge, particularly when incorporating covariates and most specifically when those covariates are factors rather than continuous.

-- 
Eric Rexstad
Research Unit for Wildlife Population Assessment
Centre for Research into Ecological and Environmental Modelling
University of St. Andrews
St. Andrews Scotland KY16 9LZ
+44 (0)1334 461833
The University of St Andrews is a charity registered in Scotland : No SC013532
