Deriving estimates manually based on clustered observations

Don Carlos

Apr 23, 2021, 4:47:30 AM

to distance-sampling

Dear Eric et al.,

I am trying to understand some of the math in deriving the estimates based on surveys with cluster size variable. I have tried to dig into the source code of Distance/mrds, but cannot produce equivalent outputs. Any pointer and help hugely appreciated. I have to apologise in advance for any basic math mistakes I must have made...

Using the minke whale ClusterExercise data from Distance as a reproducible example.

library(Distance)
library(tidyverse)
options(digits = 7)
data(ClusterExercise)
mod.cs <- ds(ClusterExercise)

Question 1: How is the Expected.S derived?

In the link below your response indicates that this is derived from the average observed group size in the sample.
https://groups.google.com/g/distance-sampling/c/2W7DJtaB82Q/m/eg3f1BzNBAAJ

However, although that seems to be the case for the stratum-specific estimates, it does not seem to be the case for the total mean, as given in the summary() output of the ds model.

CALCULATED.Expected.cluster.size <- ClusterExercise %>%
  group_by(Region.Label) %>%
  summarise(mean = mean(size, na.rm = TRUE)) %>%
  as.data.frame() %>%
  pull(mean)

DS.Expected.cluster.size <- summary(mod.cs)[["dht"]][["Expected.S"]][["Expected.S"]][-3]
all.equal(CALCULATED.Expected.cluster.size, DS.Expected.cluster.size) # TRUE

Calculating the overall survey wide mean cluster size

CALCULATED.TOTAL.Expected.cluster.size.total <- ClusterExercise %>%
  summarise(mean = mean(size, na.rm = TRUE)) %>%
  as.data.frame() %>%
  pull(mean)

DS.TOTAL.Expected.cluster.size <- summary(mod.cs)[["dht"]][["Expected.S"]][["Expected.S"]][3]
all.equal(CALCULATED.TOTAL.Expected.cluster.size.total, DS.TOTAL.Expected.cluster.size) # FALSE

Not equivalent, and there is a difference of 0.05 in mean. Guess this is a relatively small number, but as it is also happening in my actual analysis I would like to know if there is some weighting of means going on?

Question 2: Getting the SE and CV for the Expected.S

Following the above attempt to derive Expected.S I have tried to estimate its SE and CV, which do not add up to what Distance provides me, based on the following steps, and referencing the following link:
https://groups.google.com/g/distance-sampling/c/QYQ4Oysf9JY/m/aEkV6_JpAwAJ

# using outputs from ds

size.vct <- mod.cs[["ddf"]][["data"]][["size"]]
n.clst <- length(size.vct)

var.s <- var(size.vct, na.rm=T)
se.s <- sqrt(var.s/n.clst)
cv.s <- se.s/mean(size.vct) # mean not equivalent to ds Expected.S as per Q1
cv.s

I get cv.s = 0.1013096 and ds outputs cv.s = 0.2072411

Question 3: Deriving final estimates

# pulling the results from the mod.cs, I get:
average.p <- summary(mod.cs)[["ds"]][["average.p"]]
Expected.S <- summary(mod.cs)[["dht"]][["Expected.S"]][["Expected.S"]][3] # mean group size
effort <- summary(mod.cs)[["dht"]][["clusters"]][["summary"]][["Effort"]][[3]] #total effort
truncation <- summary(mod.cs)[["ddf"]][["meta.data"]][["int.range"]][2]

# and estimating D.hat, based on clustered group size, I get outputs which are not equivalent to the ds outputs:

# Using D = n * f(0) * E(s) / 2L

D.hat <- n.clst *(1/average.p)*Expected.S/((2*truncation)*effort)
summary(mod.cs)

I get D.hat = 0.06556279 and ds gets D.hat = 0.05723343, similar, but not equivalent.

cv.er <- 0.2416576
cv.pa <- 0.07679084
cv.s <- 0.2072411
ds.D.hat <- 0.05723343

se.D <- ds.D.hat * sqrt(cv.er^2 + cv.pa^2 + cv.s^2)
cv.D <- se.D / ds.D.hat

I get cv.D = 0.3274815 and ds gets cv.D = 0.3682045

Eric Rexstad

Apr 25, 2021, 10:24:54 AM

to Don Carlos, distance-sampling

Don Carlos

Detailed question and supporting calculations regarding computations involving animals occurring in groups. The fundamental challenge with your work is the choice of data set. Indeed this data set does include animals (minke whales) that occur in groups, but with the added complication that the survey employed stratification. Hence the computations include not only details involved in group size, but also details involved in stratified estimates. I think it is the latter that is the cause of the discrepancies you describe.

Sprinkled below are some modifications to your calculations along with some narrative and supporting documentation

library(Distance)
data(ClusterExercise)
mod.cs <- ds(ClusterExercise)
thesummary <- summary(mod.cs)

Expected.S <- thesummary[["dht"]][["Expected.S"]][["Expected.S"]][3] # mean group size
effort <- thesummary[["dht"]][["clusters"]][["summary"]][["Effort"]] #effort
abundance.indiv <- thesummary[["dht"]][["individuals"]][["N"]][["Estimate"]] #abund indiv
abundance.groups <- thesummary[["dht"]][["clusters"]][["N"]][["Estimate"]] #abund group

Question 1 (expected group size)

For the study area, expected group size is the ratio of estimated individual abundance to estimated group abundance:

expected.group.size.studyarea <- abundance.indiv / abundance.groups
print(expected.group.size.studyarea)
print(thesummary$dht$Expected.S)
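
One way to see why the study-area value differs from the raw mean of the observed sizes: a ratio of summed abundances weights each stratum's mean group size by that stratum's estimated number of groups, not by its number of detections. A minimal sketch with made-up stratum values (all numbers and variable names below are hypothetical, not taken from the minke analysis):

```r
# Hypothetical stratum-level values (illustrative only)
N.groups <- c(North = 400, South = 100)   # estimated group abundance per stratum
mean.s   <- c(North = 1.5, South = 3.0)   # mean observed group size per stratum
n.detect <- c(North = 30,  South = 60)    # number of detections per stratum

N.indiv <- N.groups * mean.s              # estimated individuals per stratum

# Study-area expected group size as a ratio of totals
ratio.Es <- sum(N.indiv) / sum(N.groups)

# Raw mean over all detections, ignoring strata
raw.mean <- sum(n.detect * mean.s) / sum(n.detect)

print(c(ratio = ratio.Es, raw = raw.mean))
```

The two values coincide only when detections are spread across strata in proportion to estimated group abundance, which is generally not the case when effort or stratum size differs.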

Question 2 (precision of estimated expected group size)

From comments in the function dht.se in the mrds package:

This computes the se(E(s)). It essentially uses 3.37 from Buckland et al. (2004) but in place of using 3.25, 3.34 and 3.38, it uses 3.27, 3.35 and an equivalent cov replacement term for 3.38. This uses line to line variability whereas the other formula measure the variance of E(s) within the lines and it goes to zero as p approaches 1.

Here is Eqn. 3.37

Question 3 (density in study area)

Because the study area was stratified, the overall density is a weighted mean of stratum-specific densities, with the weighting factor being the sizes of the respective strata. See Buckland et al. (2001) Sect 3.7.1, Eqn 3.122 or equivalently (below) Buckland et al. (1993), Sect 3.8.1

area <- thesummary[["dht"]][["clusters"]][["summary"]][["Area"]] #area
density <- thesummary[["dht"]][["individuals"]][["D"]][["Estimate"]] #density

density.studyarea <- (area[1]/area[3] * density[1]) + (area[2]/area[3] * density[2])
print(density.studyarea)
print(density[3])

Question 3b (precision of overall density estimate)

Variance (or other measures of precision) of the density estimate from a stratified survey is described in Section 3.7.1 of Buckland et al. (2001) detailed in Eqns. 3.123 through 3.126; equivalently Buckland et al. (1993), Sect. 3.8.1 (below)
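
As a rough sketch of what an area-weighted stratified estimate looks like in symbols (the notation is mine: A_h is the area of stratum h and A is the sum of the A_h), assuming the simplest case in which the stratum estimates are statistically independent, for example when each stratum has its own detection function:

```latex
\hat{D} = \sum_{h} \frac{A_h}{A}\,\hat{D}_h ,
\qquad
\widehat{\operatorname{var}}(\hat{D}) \approx \sum_{h} \left(\frac{A_h}{A}\right)^{2} \widehat{\operatorname{var}}(\hat{D}_h)
```

When a detection function is pooled across strata, the stratum estimates share parameters and are no longer independent, so the fuller expressions in the cited sections apply instead.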

Perhaps others can provide more detailed answers to your questions.

--
You received this message because you are subscribed to the Google Groups "distance-sampling" group.
To unsubscribe from this group and stop receiving emails from it, send an email to distance-sampl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/distance-sampling/CAFTQVoD%3D22CNpA5FB3s2tbywkbdp9zzw0MKrGgVK3ykz-_ZS9g%40mail.gmail.com.

-- 
Eric Rexstad
Centre for Ecological and Environmental Modelling
University of St Andrews
St Andrews is a charity registered in Scotland SC013532

Jamie McKaughan

May 20, 2021, 6:03:40 AM

to distance-sampling

Hi team,

I was hoping you might be able to help me on a similar scenario. I have used three camera grids in one survey area to increase the number of survey locations without needing more cameras. This has given me three strata that I have fitted different detection functions for, and whose combined AIC is better than a common detection function. I have used the above directions to estimate an overall density using the weighted mean of stratum-specific densities (my final subset of code below).

Eff1 <- bh1.90.hn0$dht$individuals$summary$Effort
Eff2 <- bh2.90.hr0$dht$individuals$summary$Effort
Eff3 <- bh3.90.hr0$dht$individuals$summary$Effort

TotalArea <- sum(Eff1, Eff2, Eff3)

Dens1 <- bh1.90.hn0$dht$individuals$D$Estimate
Dens2 <- bh2.90.hr0$dht$individuals$D$Estimate
Dens3 <- bh3.90.hr0$dht$individuals$D$Estimate

density.studyarea <- (Eff1/TotalArea * Dens1) + (Eff2/TotalArea * Dens2) + (Eff3/TotalArea * Dens3)

print(density.studyarea)

To calculate the other elements that are present in the normal summary can I just substitute the density for the respective criteria I want to estimate (e.g. se or %cv) into the above code?

e.g.

se1 <- bh1.90.hn0$dht$individuals$D$se
se2 <- bh2.90.hr0$dht$individuals$D$se
se3 <- bh3.90.hr0$dht$individuals$D$se

se.studyarea <- (Eff1/TotalArea * se1) + (Eff2/TotalArea * se2) + (Eff3/TotalArea * se3)

I also would like to bootstrap my estimates - is this simply a case of bootstrapping all three of my strata models and then using the combination method as above to establish the weighted bootstrap LCI and UCI for example?

I hope that makes sense.

Many thanks in advance,

Jamie

Eric Rexstad

May 21, 2021, 3:48:27 AM

to Jamie McKaughan, distance-sampling

Jamie

I would approach your problem in a slightly different coding way. It is as if you've conducted multiple surveys in the same area. I would combine the three grids into a single data set and use "grid" as a covariate in the detection function model and also as your stratification criterion.

The function `dht2` in the Distance package can perform the effort-weighted computations you are performing by hand. It will also properly handle the computation of uncertainty. The scenario is depicted thus: [figure not preserved]

For your second question about the bootstrap, the function `bootdht` in the Distance package can perform bootstrap computations, but it is not yet capable of bootstrapping complex situations such as yours.

Jamie McKaughan

May 21, 2021, 4:42:04 AM

to distance-sampling

Hi Eric

Thanks for this - I will give it a go instead!

Many thanks

Jamie

Jamie McKaughan

May 25, 2021, 11:33:59 AM

to distance-sampling

Hi Eric

Does using a covariate mean that every level of the covariate has to use the same detection function? Is there a way to make R choose which is best for each? i.e. Grid1 might use HR, but Grid3 use HN, and provide effort-weighted computations accordingly? Or is that not good practice regardless?

Thanks

Jamie

Eric Rexstad

May 25, 2021, 11:49:48 AM

to Jamie McKaughan, distance-sampling

Jamie

Use of "grid" as a covariate in the detection function will make the scale parameter (sigma) of a key function differ for each grid. Catch being that you must assume that all grids share the same key function (no mixing of hazard rate and half normals between grids).

You can contrast the grid-specific detection function model against a model without the grid covariate--that model then assumes the detection function is the same across all grids.
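
In symbols, a sketch of what a grid covariate does for a half-normal key, assuming the usual log link on the scale parameter (the index k and the coefficient names are mine):

```latex
g(x \mid \text{grid} = k) = \exp\!\left(-\frac{x^{2}}{2\sigma_k^{2}}\right),
\qquad
\sigma_k = \exp\!\left(\beta_0 + \beta_k\right)
```

with \beta_k = 0 for the reference grid. The model without the covariate is the special case in which every \beta_k is zero, so the two models are nested and differ only in the number of scale coefficients.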

Jamie McKaughan

May 25, 2021, 11:55:10 AM

to distance-sampling

Okay - cool.

Thanks

Jamie

Jamie McKaughan

May 26, 2021, 9:53:35 AM

to distance-sampling

Hi Eric

I have run a couple of analyses this way, but in all cases the 'total' value does not provide se, cv or CI's of any use - I was expecting a weighted version from what you wrote before - is this normal or does this suggest an error in my code/data files?

Summary statistics:

Region.Label  Area  CoveredArea   Effort    n   k  ER  se.ER  cv.ER
Grid1            1     3583.315  3520385  234  19   0      0  0.228
Grid2            1     2768.840  2720213  276  19   0      0  0.460
Grid3            1     3458.497  3397759  452  21   0      0  0.450
Total            1     9810.652  9638357  962  59   0      0  0.256

Density estimates:

Region.Label  Estimate     se     cv     LCI     UCI      df
Grid1           0.2035  0.047  0.232  0.1262  0.3282  19.239
Grid2           0.3106  0.144  0.462  0.1233  0.7824  18.299
Grid3           0.4073  0.184  0.452  0.1657  1.0009  20.347
Total           0.3056  0.000  0.000  0.3056  0.3056  38.195

Component percentages of variance:

Region.Label  Detection     ER
Grid1              3.27  96.73
Grid2              0.82  99.18
Grid3              0.86  99.14
Total              2.62  97.38

Many thanks

Jamie

Eric Rexstad

May 26, 2021, 10:42:51 AM

to Jamie McKaughan, distance-sampling

Jamie

Not sure what's happening here, not surprised you are concerned. Note the zeros for ER and se.ER in the first table of results (Summary statistics). I'm guessing the encounter rates are minute (because they are detections per snapshot). The cv.ER values in that first table seem plausible, so those zeros are likely just formatting problems.

But the line you have highlighted does not seem right--the upper and lower confidence interval bounds are equal, implying SE=0, which clearly isn't right. What code did you use to get this result?
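
For reference, the "Total" point estimate in the density table can be checked by hand, along with a rough idea of what a non-zero SE might look like. The sketch below copies the stratum values from the posted tables. The SE line is only an approximation that treats the three grid estimates as independent, which ignores the shared detection function (a small share of the variance according to the component table); it is not a claim about what `dht2` computes internally.

```r
# Values copied from the posted tables
effort  <- c(Grid1 = 3520385, Grid2 = 2720213, Grid3 = 3397759)
density <- c(Grid1 = 0.2035,  Grid2 = 0.3106,  Grid3 = 0.4073)
se      <- c(Grid1 = 0.047,   Grid2 = 0.144,   Grid3 = 0.184)

w <- effort / sum(effort)            # effort weights, summing to 1

d.total <- sum(w * density)          # effort-weighted mean density

# Approximation: assumes independent stratum estimates (see text above)
se.total <- sqrt(sum(w^2 * se^2))

print(round(d.total, 4))
print(se.total)
```

The weighted mean agrees with the reported Total estimate of 0.3056 to four decimal places, which suggests the point estimate is fine and only the uncertainty for the Total row has gone wrong.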

Jamie McKaughan

May 27, 2021, 11:13:37 AM

to distance-sampling

Hi Eric

Yes I think you are right regarding encounter rate. Table below shows the value a little better! Like you say the remaining elements are not plausible though.

I used the below to apply my model:

bh.90.hr0.Grid <- ds(bh.90, transect = "point", key = "hr", adjustment = NULL,
                     cutpoints = mybreaks, truncation = trunc.list,
                     formula = ~Region.Label)

And then applied the stratification as you showed above:

bh.90.hr0.Grid.dens <- dht2(bh.90.hr0.Grid, flatfile = bh.90,
                            strat_formula = ~Region.Label, er_est = "P2",
                            convert_units = conversion,
                            stratification = "replicate")

Then produced the density report:

print(bh.90.hr0.Grid.dens, report="density")

Thanks,

Jamie

Jamie McKaughan

Jun 1, 2021, 6:31:13 AM

to distance-sampling

Hi Eric,

I quit and restarted R and reran all my code, and unfortunately the same result occurs. Does my code look correct?

Many thanks

Jamie

Eric Rexstad

Jun 1, 2021, 7:29:29 AM

to Jamie McKaughan, distance-sampling

Jamie

I haven't spotted problems with your code. If you want to send more details to me offline, I might be able to have a look tomorrow.

leob...@gmail.com

Sep 1, 2021, 12:59:39 PM

to distance-sampling

Hi all,

I have a follow up question to Eric's response to Question 3b regarding the calculation of precision of the overall density estimate.

In a line transect, unstratified design, where objects are recorded as clusters, and data are grouped into distance class bins for analysis,

Would Buckland et al. 2001 pg. 52, eqn. 3.4 be the best documentation for what is going on under the hood to produce the precision estimates for density?

So the citation would essentially be -- "variance was calculated using the delta method (Buckland et al. 2001 pg. 52, Seber 1982 pgs. 7-9)"?

Thanks,

Eric Rexstad

Sep 1, 2021, 1:34:29 PM

to leob...@gmail.com, distance-sampling

Leo

For your scenario (no strata, animals in clusters, regardless of whether distances are binned), Eqn 3.4 of Buckland et al. (2001) holds. Eqn 6.19 of Buckland et al. (2015) is equivalent. Both equations show uncertainty propagating from three sources: variability in encounter rate, uncertainty in the parameters of the detection function and variability in mean group size. The three sources are combined via the delta method.
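
Written out, the delta-method combination takes the following general form. This is a sketch in my own notation rather than a transcription of either equation: n/L is the encounter rate, \hat{P}_a the estimated detection probability and \hat{E}(s) the estimated mean group size, and the three components are assumed independent.

```latex
\left[\operatorname{cv}(\hat{D})\right]^{2} \approx
\left[\operatorname{cv}\!\left(\tfrac{n}{L}\right)\right]^{2}
+ \left[\operatorname{cv}(\hat{P}_a)\right]^{2}
+ \left[\operatorname{cv}\!\left(\hat{E}(s)\right)\right]^{2}
```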

An alternative to an analytical approach to measuring precision is to employ bootstrap methods, described in Section 6.3.1.2 of Buckland et al. (2015).

Brian Leo

Sep 1, 2021, 3:34:02 PM

to Eric Rexstad, distance-sampling

Thanks Eric, just to confirm -- those equations also hold when group size is added as a covariate?

I also had some discrepancies when I attempt to use the standard error generated from the package to manually calculate 95% confidence intervals i.e. Nhat+(SE*1.96), Nhat-(SE*1.96). The differences are small but I'm wondering what the source of the difference is?

Thanks again.

Rachel Fewster

Sep 1, 2021, 4:58:54 PM

to Brian Leo, Eric Rexstad, distance-sampling

Hi Leo,

Is the discrepancy fixed if you shift to lognormal CIs for N ?

Normal CIs: est +/- 1.96 * SE

Lognormal CIs: est/C to est*C,

where C = exp(1.96 * sqrt(log(1 + SE^2 / est^2)))

Here, "est" is Nhat.

Eric will correct me if wrong, but I believe the calculations typically use lognormal intervals for N, and normal intervals for all other parameters.

There is good theoretical backing for this choice: e.g. Fewster & Jupp (Biometrika 2009) show that it is log(Nhat), rather than Nhat, that follows the usual asymptotic scale in these sorts of models. The other parameters follow the usual scale, so their CIs typically follow the usual calculation of +/- 1.96*SE.
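
The two interval types can be compared directly with a few lines of base R. This is a minimal sketch of the formulas given above; the function names and the example values of Nhat and SE are hypothetical.

```r
# Log-normal confidence interval, following the formula given above
lognormal_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)               # about 1.96 for a 95% interval
  C <- exp(z * sqrt(log(1 + (se / est)^2)))
  c(lower = est / C, upper = est * C)
}

# Symmetric normal interval for comparison
normal_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = est - z * se, upper = est + z * se)
}

# Hypothetical values, purely for illustration
Nhat <- 500
SE   <- 150
print(lognormal_ci(Nhat, SE))
print(normal_ci(Nhat, SE))
```

The log-normal interval is asymmetric around the estimate and its lower bound can never fall below zero. Any small differences that remain against package output may be because, as far as I know, the software can use a t-based multiplier in place of 1.96.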

Best wishes,
Rachel

--
Rachel Fewster (r.fe...@auckland.ac.nz)
Department of Statistics, University of Auckland,
Private Bag 92019, Auckland, New Zealand.
ph: 64 9 923 3946
https://www.stat.auckland.ac.nz/~fewster/


Eric Rexstad

Sep 2, 2021, 3:17:18 AM

to Brian Leo, distance-sampling

Brian

Prof Fewster is correct that log-based confidence intervals are reported by the distance sampling software for both abundance (N) and density (D) estimates.

The estimation of variance in abundance when group size is a covariate in the detection function is a bit messy. In this situation, abundance is estimated using the Horvitz-Thompson-like estimators. Details can be found in Sect 3.3.3.2 of Chapter 3 in the Advanced Distance Sampling book edited by Buckland et al. (2004) and also in Sect 6.4.3.3 of Buckland et al. (2015).

Abundance of individuals is estimated directly via (from Marques and Buckland (2004))

The variance of abundance is estimated as

where there is a component of encounter rate variance coupled with uncertainty in the parameters of the detection function that now includes a parameter for cluster size influence upon detectability (that bit is the partial derivatives and Hessian matrix at the end).
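
For orientation, a Horvitz-Thompson-like estimator of individual abundance generally has the following shape. This is a sketch in my own notation, not a transcription of the cited equation: A is the size of the study area, a = 2wL the covered area for line transects with truncation distance w and total line length L, s_i the size of the i-th detected cluster, and \hat{P}_a(z_i) its estimated detection probability given its covariates z_i.

```latex
\hat{N} = \frac{A}{a} \sum_{i=1}^{n} \frac{s_i}{\hat{P}_a(z_i)},
\qquad a = 2wL
```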

For more about the Horvitz-Thompson-like estimation with group size as a covariate, I recommend watching the lecture by Prof Thomas on this subject in our online distance sampling materials:

https://workshops.distancesampling.org/online-course/syllabus/Chapter5/

leob...@gmail.com

Sep 2, 2021, 2:17:58 PM

to distance-sampling

Hi Eric,

Thank you, your explanation and also the online lectures by Prof. Thomas were very helpful. In his lecture, Prof. Thomas says that when cluster size is a covariate, the program Distance calculates abundance using a slightly different equation, and does not estimate variance using analytic methods; rather the bootstrap must be used. I am not using the Windows-based Distance software; I am using Distance and related packages in R. So I'd like to confirm that when I run the following code, the ds function recognizes that my data frame is in flatfile format (it is), therefore recognizes "size" as cluster size, and calculates abundance accordingly, with variance computed using the formula you provided above together with the bootstrap method?

model <- ds(df, truncation = 450, formula = ~size)

Thanks

Eric Rexstad

Sep 3, 2021, 3:27:55 AM

to leob...@gmail.com, distance-sampling

Brian

The syntax you provide will recognise the reserved word "size" as group size of the detection and incorporate it into the detection function (you can check this with the following code):

library(Distance)
data("ClusterExercise")
with.size <- ds(ClusterExercise, key="hr",
                truncation=1.5, formula=~size)
summary(with.size$ddf)
Summary for ds object
Number of observations : 88
Distance range         : 0 - 1.5
AIC                    : 45.94455

Detection function:
Hazard-rate key function

Detection function parameters
Scale coefficient(s):
               estimate        se
(Intercept) -0.47284868 0.2488602
size         0.09369241 0.1052601

Shape coefficient(s):
            estimate        se
(Intercept) 1.122796 0.3236553

                       Estimate          SE         CV
Average p             0.6152724 0.06074802 0.09873354
N in covered region 143.0260764 17.06587062 0.11931999

I highlight the estimate of the coefficient in the detection function for the `size` covariate.
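
To see what that coefficient implies, the fitted scale can be evaluated at different group sizes. This sketch plugs in the estimates printed above and assumes the parameterisation I understand mrds to use for the hazard-rate key (a log link on both scale and shape). The function `hr_detect` is illustrative, not the package's own code.

```r
# Coefficients copied from the summary output above
b0.scale <- -0.47284868   # scale intercept (log scale)
b1.size  <-  0.09369241   # size coefficient (log scale)
b0.shape <-  1.122796     # shape intercept (log scale)

# Hazard-rate detection function with size-dependent scale (assumed form)
hr_detect <- function(x, size) {
  sigma <- exp(b0.scale + b1.size * size)   # scale grows with group size
  b     <- exp(b0.shape)                    # shape shared by all groups
  1 - exp(-(x / sigma)^(-b))
}

x <- c(0.25, 0.5, 1.0, 1.5)                 # perpendicular distances
print(round(hr_detect(x, size = 1), 3))
print(round(hr_detect(x, size = 5), 3))
```

Because the size coefficient is positive, larger groups get a larger scale and hence a higher detection probability at a given distance. Note, though, that its standard error (0.105) exceeds the estimate itself (0.094), so the effect is only weakly supported in this fit.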

It is correct that the Distance R package computes variance of N-hat without the need for bootstrapping. Recognise, you could employ the bootstrap from within the R package by use of the `bootdht` function if you wish.

Given your persistence, I believe you are dubious about your results. If you want to discuss reservations in context of your data, we could do so off-list if you wish.

MARIAM HISHAM

Sep 3, 2021, 9:57:07 AM

to distance-sampling

hi, how are u? can you help me I want some detail from the article for density estimation please tell me if you can

Eric Rexstad

Sep 3, 2021, 10:02:38 AM

to MARIAM HISHAM, distance-sampling

Miriam

What question do you wish to ask?

MARIAM HISHAM

Sep 11, 2021, 2:36:06 PM

to distance-sampling

hi all

can any body help me to write code on R programming for density estimation of local minimum distance

MARIAM HISHAM

Sep 11, 2021, 4:03:55 PM

to distance-sampling

hi all,

can anybody help me to write code in R for density estimation of minimum local distance
