R packages for fitting distance sampling functions


Maik Henrich

Feb 19, 2021, 1:06:30 PM
to distance-sampling
Dear everyone,

I have been using two R packages to fit distance sampling detection functions to distributions of measured distances. I used "Rdistance" because its effectiveDistance() function conveniently returns the effective detection radii of camera traps, which are needed to compute population density estimates with the Random Encounter Model (REM). I used non-parametric bootstrapping to obtain confidence intervals for the detection radii. On the other hand, the package does not report standard errors for the probability of detection, and I also did not find a way to extract the detection probability in the same manner as the effective distance.
For this reason I started using the "Distance" R package, since it reports the standard errors of the detection probabilities. It is also convenient that model selection with regard to expansion terms is done automatically.
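For reference, this is roughly how I fit the functions with each package. This is a minimal sketch only; the data frame dists, its dist/distance columns and the 10 m truncation are placeholders rather than my real data:

library(Rdistance)
library(Distance)

# Rdistance: half-normal detection function for point (camera trap) data,
# then the effective detection radius for the REM
dfunc <- dfuncEstim(formula = dist ~ 1, detectionData = dists,
                    likelihood = "halfnorm", pointSurvey = TRUE, w.hi = 10)
edr <- effectiveDistance(dfunc)

# Distance: ds() expects the distances in a column named "distance";
# it adds cosine adjustment terms automatically, selecting by AIC, and
# its summary reports the detection probability with a standard error
dists$distance <- dists$dist
fit <- ds(dists, key = "hn", adjustment = "cos",
          transect = "point", truncation = 10)
summary(fit)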

Now I have a problem: when I look at the variation of detection radii and detection probabilities over the course of the year, in five of twelve months model selection with the two packages (half-normal, hazard-rate, possible cosine expansions) delivers different results.

[Attachment: Bild1.png — table comparing monthly model selection results between the two packages]

I would very much appreciate hearing your thoughts on this, and I am thankful for any suggestions of possible solutions.

All the best,
Maik





Eric Rexstad

Feb 20, 2021, 5:48:21 AM
to Maik Henrich, distance-sampling

Maik

There are lots of topics packed into this message. There seem to be two main issues:

  • you want a particular parameter estimate (the effective detection radius, EDR) to use with Rowcliffe's REM abundance estimator for camera traps, and
  • the selected monthly detection function models differ between the software packages used to fit them.

I'll address the second question first. Modelling detection functions involves many decisions, some objective and some subjective. There are many reasons that different detection functions might be chosen for a given set of data. Those reasons include choices the analyst (you) makes, such as whether the same truncation distance was used in both analyses. Differences in the algorithms the packages use to maximise the likelihood can also lead to different selected models.

There are several items you do not report in your comparison table, such as the magnitude of the delta AIC scores in the comparisons. Recognise that the choice of a preferred detection function for a data set is not an automated process: AIC is a tool to aid model selection, but you need to bring your biological insight to the process as well.

If the data are "well behaved", there is unlikely to be a demonstrable difference in EDR estimates between models that are close in AIC score. This is particularly true if you take into account the uncertainty in the estimated EDR. Consequently, I suspect the impact of the ambiguity in model choice upon your estimates of population abundance is small relative to the uncertainty in the abundance estimate.

Regarding performing analyses twice with different R packages: that seems to be extra work that has resulted in the situation in which you find yourself. Recently I wrote a function that computes the EDR and its confidence interval for detection function objects produced by the Distance package. Perhaps this will address the first of your issues:

https://github.com/DistanceDevelopment/mrds/issues/36#issuecomment-753462999
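For point transects the EDR is tied directly to the average detection probability: EDR = w * sqrt(P_a), where w is the truncation distance. A back-of-envelope sketch of that relationship (not the linked function; the element names below are assumptions worth checking in your version):

# given a fitted Distance model `fit` truncated at w
p_a <- summary(fit)$ds$average.p   # average detection probability
w   <- fit$ddf$meta.data$width     # truncation distance used in the fit
edr <- w * sqrt(p_a)               # effective detection radius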

Let me know if this function does what you want it to do.

-- 
Eric Rexstad
Centre for Ecological and Environmental Modelling
University of St Andrews
St Andrews is a charity registered in Scotland SC013532

Maik Henrich

Feb 22, 2021, 10:06:36 AM
to distance-sampling
Dear Eric,

thank you very much, this function is great (and exactly what I wanted)!

For the 95% confidence interval of the detection probability itself, would it be computed in the same way as for the EDR?
# log-normal CI for the detection probability p_a, given its
# coefficient of variation cv.p_a and critical value t.crit
se.log.p_a <- sqrt(log(1 + (cv.p_a)^2))
c.mult <- exp(t.crit * se.log.p_a)
ci.p_a <- c(p_a / c.mult, p_a * c.mult)

The next question for me is how to integrate the detection probability and detection radius estimates into the bootstrapping of the population density estimates. Sampling a value for each bootstrap iteration with rnorm(1, estimate, 1.96 * SE) is probably not the best approach. Is there an option to do something similar based just on the standard error, or would it be best to do non-parametric bootstrapping of the measured distances and to repeat fitting the detection function in each bootstrap iteration?

All the best,
Maik

Eric Rexstad

Feb 22, 2021, 11:38:21 AM
to Maik Henrich, distance-sampling

Maik

Because P_a is bounded below by zero, it is plausible that the confidence interval bounds could be computed in the manner you describe. However, as I noted in my reply to you on 15 Feb 21, I don't think you should use the confidence interval bounds as the range from which to perform a parametric bootstrap.

I do not know the mechanics of the REM computations, so I don't know how many components of uncertainty are involved in deriving the uncertainty of your abundance estimates. Consider the suggestions Prof Buckland made in his email of 15 Feb 21 regarding non-parametric bootstrapping approaches.

Eric Howe

Feb 24, 2021, 11:46:20 AM
to distance-sampling
Good day Maik,

Following up on some of Eric's comments regarding model selection, I recommend against allowing automatic selection of expansion terms with camera trapping data. If we record the distance to the same animal more than once during an independent encounter with a CT, or if we record distances to multiple animals within a group, we violate the independence assumption, in which case AIC is likely to select an overly complex model. It is better to fit models with different numbers of expansion terms (including none) separately.
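As a sketch of what I mean (using current Distance package arguments; the data frame and truncation are placeholders), fit each candidate separately instead of letting ds() expand automatically:

# no adjustment terms: half-normal key only
m0 <- ds(dists, key = "hn", adjustment = NULL,
         transect = "point", truncation = 10)
# one cosine adjustment term (order 2 is the first cosine order for "hn")
m1 <- ds(dists, key = "hn", adjustment = "cos", order = 2,
         transect = "point", truncation = 10)
# two cosine adjustment terms
m2 <- ds(dists, key = "hn", adjustment = "cos", order = c(2, 3),
         transect = "point", truncation = 10)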

We proposed an alternative model selection criterion and procedure for CTDS data (link below), but it is still important to view the fitted detection functions and probability density functions, i.e., not to rely exclusively on model selection criteria.


Eric H

Maik Henrich

Feb 26, 2021, 9:25:05 AM
to distance-sampling
Dear Eric Howe, dear Eric Rexstad,

Thank you both for your comments!
In our dataset, distances to an animal were measured only once for each independent observation, so the independence assumption should not be violated.
I am still wondering about the different results that I get with "Rdistance" and "Distance", although I am using the same data, the same truncation distance and the same function. For the November dataset, for example, the delta AIC between "Rdistance" and "Distance" for the half-normal function with two cosine adjustment terms is 2.13.
The "Rdistance" results look better to me, but of course that is a very subjective judgement.

Are there other objective criteria that could be used together with AIC, and how much weight should they be given relative to each other?

All the best,
Maik

Stephen Buckland

Feb 26, 2021, 9:34:25 AM
to Maik Henrich, distance-sampling

If you’re using camera traps and taking just one distance for each detected animal, you will get biased estimation whatever software you are using. The reason for this bias is given in:

Howe, E.J., Buckland, S.T., Després-Einspenner, M.-L. and Kühl, H.S. 2017. Distance sampling with camera traps. Methods in Ecology and Evolution 8, 1558-1565.

If you fit the same model with the same constraints (if any) on parameters and the same truncation distance in two different software packages, then you need to check that both packages use exactly the same form of AIC, with any constant terms in the likelihood treated identically. If that is the case, then the most likely reason for a difference is that one analysis has not converged, in which case the analysis with the smaller AIC should be the better one.
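One way to check this (a sketch only; the element names below are assumptions about the fitted objects, so verify them with str() in your package versions) is to compare the AICs and the parameter estimates directly:

AIC(fit_Distance)            # Distance supplies an AIC method for ds fits
AIC(dfunc_Rdistance)         # Rdistance likewise for dfunc objects
fit_Distance$ddf$par         # maximum likelihood estimates from Distance/mrds
dfunc_Rdistance$parameters   # maximum likelihood estimates from Rdistance
# matching parameter estimates with differing AICs point to different
# likelihood constants; differing estimates suggest a convergence problem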

Steve Buckland

--

You received this message because you are subscribed to the Google Groups "distance-sampling" group.
To unsubscribe from this group and stop receiving emails from it, send an email to distance-sampl...@googlegroups.com.

Len Thomas

Feb 26, 2021, 9:53:04 AM
to distance...@googlegroups.com
In addition to what Steve said, two subtle issues with adjustment terms
are (a) monotonicity and (b) scaling.

For monotonicity, it seems reasonable to constrain the fit so that the detection probability does not go up with increasing distance. This is set with the monotonicity argument of the ds() function in Distance, and it is on by default when you don't have covariates. I have not used Rdistance, but at first glance at the help file it appears not to have the ability to enforce this constraint. So, if your data have any "bumps" in detection frequency with increasing distance, you'll get a different fit from the two packages.
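To check whether this explains the discrepancy, you could refit in Distance with the constraint switched off and see if the fit then matches Rdistance's (a sketch; the data frame and truncation are placeholders):

# default for models without covariates: strictly monotone
m_con <- ds(dists, key = "hn", adjustment = "cos",
            transect = "point", truncation = 10, monotonicity = "strict")
# constraint switched off, which appears to be what Rdistance fits
m_unc <- ds(dists, key = "hn", adjustment = "cos",
            transect = "point", truncation = 10, monotonicity = "none")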

For scaling, the way the adjustment term functions are affected by distance depends on how the distances are scaled. This only matters when you have covariates, so it is likely not an issue here (judging from what you wrote). One explanation and illustration of this is in the following paper.

Marques, T.A., L. Thomas, S.G. Fancy and S.T. Buckland. 2007. Improving estimates of bird density using multiple covariate distance sampling. The Auk 124: 1229-1243. https://doi.org/10.1093/auk/124.4.1229

Cheers, Len


--
Len Thomas len.t...@st-andrews.ac.uk lenthomas.org @len_thom
Centre for Research into Ecological and Environmental Modelling
and School of Mathematics and Statistics
The Observatory, University of St Andrews, Scotland KY16 9LZ
Office: UK+1334-461801 Admin: UK+1334-461842

While I may be sending this email outside of my normal office hours,
I have no expectation to receive a reply outside of yours.

The University of St Andrews is a charity
registered in Scotland, No SC013532.

Maik Henrich

Mar 4, 2021, 6:40:02 AM
to distance-sampling
Dear Len, dear Steve, dear Eric and everyone,

I have thought about the issue of potential bias due to having only one distance measurement for each independent observation. I had already read the paper, and I also talked with Hjalmar Kühl, but we still could not figure out how exactly the bias arises. In the beginning, when I was mainly aiming at the effective detection radius for the REM, it seemed logical to me to use the first photo of each observation to determine the outer bound of the detection zone. In addition, we do not have the resources to take a large number of additional distance measurements.
This is how the distribution of the corrected measured red deer distances looks, as given by Distance:

[Attachment: Rdistance red deer all.jpeg — distance histogram with fitted detection function for red deer]
Is there any possibility to check whether there is a problem with bias, and if so, is there another possible solution?

Secondly, I also wondered whether it would make sense to repeat the model selection procedure for each iteration of a non-parametric bootstrap, with the aim of obtaining a vector of possible values for the detection probability/radius. At the moment I only do the model selection for the initial dataset, but I would be very interested to hear your opinion on this.

Regarding the different R packages, I have now found that I even get different results when fitting exactly the same function with the same truncation distance and expansion terms to the same dataset. For the red deer measurements in October and a hazard-rate function with two cosine expansion terms, for example, Distance gives me a detection probability of 0.166 and Rdistance gives me a detection probability of 0.298.

Detfunc_red_deer_october_Rdistance <- dfuncEstim(formula = dist ~ 1,
    detectionData = Red_deer_October, pointSurvey = TRUE,
    likelihood = "hazrate", expansions = 2, series = "cosine", w.hi = 14)

Detfunc2_red_deer_October_Distance <- ds(Red_deer_October, truncation = 14,
    transect = "point", formula = ~1, key = "hr", adjustment = "cos", order = 2)
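For completeness, this is how I extract the two probabilities for comparison (for Rdistance I convert the EDR, since P_a = (EDR / w)^2 for point transects; the summary element name is what I found in my version of Distance):

# Distance: average detection probability straight from the summary
p_Distance <- summary(Detfunc2_red_deer_October_Distance)$ds$average.p

# Rdistance: convert the effective detection radius with w = 14
edr_Rdistance <- effectiveDistance(Detfunc_red_deer_october_Rdistance)
p_Rdistance <- (edr_Rdistance / 14)^2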

Although I agreed with my supervisor to just use Distance and to ignore the differing results from Rdistance, it would still be good to find out why this happens and to confirm that everything is correct.

All the best and thanks in advance,

Maik

Stephen Buckland

Mar 4, 2021, 8:00:14 AM
to Maik Henrich, distance-sampling

Maik, this does indeed show bias. You have a large spike of detections at very short distances. (The spike isn't as bad as it appears: the pdf plot is the one to look at for possible poor model fit, and the histogram bars at short distances get scaled up when plotting the detection function fit.) Animals that approach the camera from behind will first be detected at very short distances as they pass the camera. This bias is probably a result of you taking just one measurement per animal.

With a spike at zero, you have to be careful with the hazard-rate model. It can fit very large spikes even when those spikes are a result of biased recording (as is likely here) or of attraction to the camera. It may well be that Distance and Rdistance set different lower bounds for the hazard-rate shape parameter, and this will result in very different estimates when that lower bound is reached, as happens when data are spiked like yours.

You can go through model selection for each bootstrap resample; it means that your precision estimates incorporate model uncertainty. However, given the artificial spike in your data, I would not include the hazard-rate model in your set of candidate models. It is too good at fitting spikes, giving you a detection function that falls improbably fast with distance from the camera.
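A minimal sketch of that scheme (resampling distances independently for illustration only; for your design you would resample camera locations, and the $criterion element I use for the AIC is an assumption to verify in your version of mrds):

library(Distance)
set.seed(1)
edr_boot <- replicate(199, {
  resample <- dists[sample(nrow(dists), replace = TRUE), ]
  # candidate set deliberately excludes the hazard-rate model
  fits <- list(
    ds(resample, key = "hn", adjustment = NULL,
       transect = "point", truncation = 14),
    ds(resample, key = "hn", adjustment = "cos", order = c(2, 3),
       transect = "point", truncation = 14))
  best <- fits[[which.min(sapply(fits, function(m) m$ddf$criterion))]]
  14 * sqrt(summary(best)$ds$average.p)   # EDR of the AIC-best model
})
quantile(edr_boot, c(0.025, 0.975))       # percentile CI for the EDR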

Steve


Maik Henrich

Mar 4, 2021, 8:43:36 AM
to distance-sampling
Dear Steve,

thanks a lot for this very helpful answer!
Would it be a valid approach, then, to use the current set of measurements that we have and to restrict model selection to half-normal models (without expansions and with two cosine expansions)?
It would be great to have such an easy solution; I just want to make sure that reviewers of such a study would not say that using only one measurement per detection renders the fitted detection functions totally unusable.
I am really grateful for the support from all of you.

All the best,
Maik

Stephen Buckland

Mar 4, 2021, 9:37:07 AM
to Maik Henrich, distance-sampling

I think that would reduce the bias. It is difficult to say how much bias might remain.

Maik Henrich

Mar 8, 2021, 12:42:49 PM
to distance-sampling
Dear Steve, dear everyone,

for my whole red deer dataset, everything works very well now. It was also important to bin the distances between 0 and 2 m, because distances closer than 1 m could not be estimated properly.

[Attachment: Red deer Distances fit all.jpeg — fitted detection function for the full red deer dataset]
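In case it is useful to others, the binning can be done with the cutpoints argument of ds(); a minimal sketch (the data frame name and the endpoints beyond the first bin are placeholders, not my exact values):

# first bin pools everything from 0 to 2 m; the last cutpoint
# doubles as the truncation distance
fit_binned <- ds(Red_deer_all, transect = "point", key = "hn",
                 adjustment = "cos",
                 cutpoints = c(0, 2, 4, 6, 8, 10, 12, 14))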

For the non-parametric bootstrapping, I thought it would make sense to include only iterations with models that fit the data well, as indicated by a non-significant Chi² test.

However, with that precondition I cannot even fit a model to my whole roe deer dataset:

[Attachment: Roe deer distances fit all.jpeg — fitted detection function for the full roe deer dataset]

Here the p-value of the GOF test is 0.001, and nothing that I tried helped to obtain a non-significant value.

Within the red deer dataset, fitting detection functions for spring (Mar-May), summer (Jun-Aug) and winter (Dec-Feb) works; only autumn (Sep-Nov) causes problems, despite a large sample size of over 800 measurements:
[Attachment: Red deer Distances fit autumn.jpeg — fitted detection function for the autumn red deer data]

Are there any more options that I could try to make the detection functions fit the data well enough? How can I deal with these cases?
Any input is again highly appreciated!

Thank you and all the best,

Maik

Stephen Buckland

Mar 8, 2021, 1:06:30 PM
to Maik Henrich, distance-sampling

Maik, you should not reject bootstrap replicates with a poor fit. You have a large sample size, so small departures from the true model will translate into significant goodness-of-fit tests. Here, the apparent lack of fit is probably a result of small rounding errors in your distances and is not anything to worry about. Part of the problem may be that you have selected interval endpoints that equate to whole numbers of metres. When setting intervals for goodness-of-fit tests, you need to avoid values that correspond to favoured rounding distances.
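For example, a sketch using mrds::ddf.gof on the model object inside a Distance fit (the offset endpoints here are illustrative only):

library(mrds)
# interval endpoints offset from whole metres, so favoured rounding
# values (1 m, 2 m, ...) fall inside intervals rather than on boundaries
ddf.gof(fit$ddf, breaks = c(0, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14))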
