
Apr 23, 2021, 4:47:30 AM

to distance-sampling

Dear Eric et al.,

I am trying to understand some of the maths behind the estimates from surveys where group size is a variable. I have tried to dig into the source code of Distance/mrds, but cannot reproduce equivalent outputs. Any pointers would be hugely appreciated, and apologies in advance for any basic maths mistakes I may have made.

Using the mink ClusterExercise data from Distance as a reproducible example:

library(Distance)
library(tidyverse)
options(digits = 7)

data(ClusterExercise)
mod.cs <- ds(ClusterExercise)

**Question 1: How is Expected.S derived?**

In the link below, your response indicates that it is derived from the average observed group size in the sample:

https://groups.google.com/g/distance-sampling/c/2W7DJtaB82Q/m/eg3f1BzNBAAJ

That seems to be the case for the stratum-specific estimates, but it does not seem to be the case for the total mean, as given in the output of summary(mod.cs).

CALCULATED.Expected.cluster.size <- ClusterExercise %>%
  group_by(Region.Label) %>%
  summarise(mean = mean(size, na.rm = TRUE)) %>%
  as.data.frame() %>%
  pull(mean)

DS.Expected.cluster.size <- summary(mod.cs)[["dht"]][["Expected.S"]][["Expected.S"]][-3]

all.equal(CALCULATED.Expected.cluster.size, DS.Expected.cluster.size) # TRUE

Calculating the overall, survey-wide mean cluster size:


CALCULATED.TOTAL.Expected.cluster.size.total <- ClusterExercise %>%
  summarise(mean = mean(size, na.rm = TRUE)) %>%
  as.data.frame() %>%
  pull(mean)

DS.TOTAL.Expected.cluster.size <- summary(mod.cs)[["dht"]][["Expected.S"]][["Expected.S"]][3]

all.equal(CALCULATED.TOTAL.Expected.cluster.size.total, DS.TOTAL.Expected.cluster.size) # FALSE

They are not equivalent; the means differ by about 0.05. That is a relatively small number, but since the same thing happens in my actual analysis, I would like to know whether some weighting of the means is going on.

**Question 2: Getting the SE and CV for Expected.S**


Following the above attempt to derive Expected.S, I have tried to estimate its SE and CV. These do not add up to what Distance gives me, based on the following steps and referencing this link:

https://groups.google.com/g/distance-sampling/c/QYQ4Oysf9JY/m/aEkV6_JpAwAJ


# using outputs from ds
size.vct <- mod.cs[["ddf"]][["data"]][["size"]]
n.clst <- length(size.vct)
var.s <- var(size.vct, na.rm = TRUE)
se.s <- sqrt(var.s / n.clst)
cv.s <- se.s / mean(size.vct) # mean not equivalent to the ds Expected.S, as per Q1
cv.s

I get cv.s = 0.1013096 and ds outputs cv.s = 0.2072411

**Question 3: Deriving final estimates**


# pulling the results from mod.cs, I get:
average.p <- summary(mod.cs)[["ds"]][["average.p"]]
Expected.S <- summary(mod.cs)[["dht"]][["Expected.S"]][["Expected.S"]][3] # mean group size
effort <- summary(mod.cs)[["dht"]][["clusters"]][["summary"]][["Effort"]][[3]] # total effort
truncation <- summary(mod.cs)[["ddf"]][["meta.data"]][["int.range"]][2]

# estimating D.hat based on clustered group size, I get outputs which are not
# equivalent to the ds outputs:
# Using D = n * f(0) * E(s) / 2L
D.hat <- n.clst * (1 / average.p) * Expected.S / ((2 * truncation) * effort)

summary(mod.cs)

I get D.hat = 0.06556279, while ds gets D.hat = 0.05723343: similar, but not equivalent.

cv.er <- 0.2416576
cv.pa <- 0.07679084
cv.s <- 0.2072411
ds.D.hat <- 0.05723343

se.D <- ds.D.hat * sqrt(cv.er^2 + cv.pa^2 + cv.s^2)
cv.D <- se.D / ds.D.hat

I get cv.D = 0.3274815 and ds gets cv.D = 0.3682045


Apr 25, 2021, 10:24:54 AM

to Don Carlos, distance-sampling

Don Carlos

A detailed question, with supporting calculations, regarding computations involving animals occurring in groups. The fundamental challenge with your work is the choice of data set. This data set does include animals (minke whales) that occur in groups, but with the added complication that the survey employed stratification. Hence the computations involve not only the details of group size, but also the details of stratified estimation. I think it is the latter that is the cause of the discrepancies you describe.

Sprinkled below are some modifications to your calculations, along with some narrative and supporting documentation.

library(Distance)

data(ClusterExercise)
mod.cs <- ds(ClusterExercise)
thesummary <- summary(mod.cs)

Expected.S <- thesummary[["dht"]][["Expected.S"]][["Expected.S"]][3] # mean group size
effort <- thesummary[["dht"]][["clusters"]][["summary"]][["Effort"]] # effort
abundance.indiv <- thesummary[["dht"]][["individuals"]][["N"]][["Estimate"]] # abundance of individuals
abundance.groups <- thesummary[["dht"]][["clusters"]][["N"]][["Estimate"]] # abundance of groups

**Question 1 (expected group size)**

For the study area, expected group size is the ratio of estimated individual abundance to estimated group abundance:

expected.group.size.studyarea <- abundance.indiv / abundance.groups
print(expected.group.size.studyarea)

print(thesummary$dht$Expected.S)

**Question 2 (precision of estimated expected group size)**

From comments in the function dht.se in the mrds package:

This computes the se(E(s)). It essentially uses 3.37 from Buckland et al. (2004) but in place of using 3.25, 3.34 and 3.38, it uses 3.27, 3.35 and an equivalent cov replacement term for 3.38. This uses line to line variability whereas the other formula measure the variance of E(s) within the lines and it goes to zero as p approaches 1.

Here is Eqn. 3.37 (attached to the original message as an image, not reproduced here).
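In rough terms, a hedged paraphrase of the underlying idea (the standard delta-method result for a ratio estimator, rather than the exact notation of Eqn. 3.37): because the expected group size estimate is the ratio of the two abundance estimates, its squared CV is approximately

```latex
\hat{E}(s) = \frac{\hat{N}_{\mathrm{ind}}}{\hat{N}_{\mathrm{grp}}}, \qquad
\left[\mathrm{cv}\!\left(\hat{E}(s)\right)\right]^{2} \approx
\left[\mathrm{cv}\!\left(\hat{N}_{\mathrm{ind}}\right)\right]^{2}
+ \left[\mathrm{cv}\!\left(\hat{N}_{\mathrm{grp}}\right)\right]^{2}
- \frac{2\,\mathrm{cov}\!\left(\hat{N}_{\mathrm{ind}},\,\hat{N}_{\mathrm{grp}}\right)}
       {\hat{N}_{\mathrm{ind}}\,\hat{N}_{\mathrm{grp}}}
```

The covariance term is why the simple SE of the observed sizes (as in the original Question 2 attempt) does not reproduce the value dht reports.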

**Question 3 (density in study area)**

Because the study area was stratified, the overall density is a weighted mean of stratum-specific densities, with the weighting factor being the sizes of the respective strata. See Buckland et al. (2001), Sect. 3.7.1, Eqn. 3.122, or equivalently Buckland et al. (1993), Sect. 3.8.1 (attached to the original message as an image).

area <- thesummary[["dht"]][["clusters"]][["summary"]][["Area"]] # area
density <- thesummary[["dht"]][["individuals"]][["D"]][["Estimate"]] # density

density.studyarea <- (area[1] / area[3] * density[1]) + (area[2] / area[3] * density[2])

print(density.studyarea)
print(density[3])

**Question 3b (precision of overall density estimate)**

The variance (or other measures of precision) of the density estimate from a stratified survey is described in Section 3.7.1 of Buckland et al. (2001), detailed in Eqns. 3.123 through 3.126; equivalently Buckland et al. (1993), Sect. 3.8.1 (attached to the original message as an image).

Perhaps others can provide more detailed answers to your questions.

--


Eric Rexstad
Centre for Ecological and Environmental Modelling
University of St Andrews
St Andrews is a charity registered in Scotland: SC013532

May 20, 2021, 6:03:40 AM

to distance-sampling

Hi team,

I was hoping you might be able to help me with a similar scenario. I have used three camera grids in one survey area to increase the number of survey locations without needing more cameras. This has given me three strata, for which I have fitted different detection functions, and whose combined AIC is better than that of a common detection function. I have used the above directions to estimate an overall density using the weighted mean of the stratum-specific densities (my final subset of code is below).

Eff1 <- bh1.90.hn0$dht$individuals$summary$Effort
Eff2 <- bh2.90.hr0$dht$individuals$summary$Effort
Eff3 <- bh3.90.hr0$dht$individuals$summary$Effort
TotalArea <- sum(Eff1, Eff2, Eff3)

Dens1 <- bh1.90.hn0$dht$individuals$D$Estimate
Dens2 <- bh2.90.hr0$dht$individuals$D$Estimate
Dens3 <- bh3.90.hr0$dht$individuals$D$Estimate

density.studyarea <- (Eff1/TotalArea * Dens1) + (Eff2/TotalArea * Dens2) + (Eff3/TotalArea * Dens3)
print(density.studyarea)

To calculate the other elements present in the normal summary, can I just substitute the respective quantity I want to estimate (e.g. the SE or %CV) for the density in the above code?

e.g.

se1 <- bh1.90.hn0$dht$individuals$D$se
se2 <- bh2.90.hr0$dht$individuals$D$se
se3 <- bh3.90.hr0$dht$individuals$D$se

se.studyarea <- (Eff1/TotalArea * se1) + (Eff2/TotalArea * se2) + (Eff3/TotalArea * se3)

I would also like to bootstrap my estimates. Is this simply a case of bootstrapping all three of my strata models and then using the same combination method as above to establish, for example, the weighted bootstrap LCI and UCI?

I hope that makes sense.

Many thanks in advance,

Jamie

May 21, 2021, 3:48:27 AM

to Jamie McKaughan, distance-sampling

Jamie

I would approach your problem in a slightly different coding way. It is as if you've conducted multiple surveys in the same area. I would combine the three grids into a single data set and use "grid" as a covariate in the detection function model, and also as your stratification criterion.

The function `dht2` in the Distance package can perform the effort-weighted computations you are doing by hand. It will also properly handle the computation of uncertainty. The scenario is depicted thus (diagram attached to the original message).
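A minimal sketch of that workflow, assuming the three grids have been stacked into a single data frame `alldata` with a `gridID` column (both names, and the other arguments, are illustrative rather than taken from the thread):

```r
library(Distance)

# one detection function fitted to all grids, with grid as a covariate
mod <- ds(alldata, transect = "point", key = "hr", formula = ~gridID)

# effort-weighted combination across grids, with uncertainty handled by dht2
dht2(mod, flatfile = alldata, strat_formula = ~gridID,
     stratification = "replicate")
```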

For your second question about the bootstrap, the function `bootdht` in the Distance package can perform bootstrap computations, but it is not yet capable of bootstrapping complex situations such as yours.


May 21, 2021, 4:42:04 AM

to distance-sampling

Hi Eric

Thanks for this - I will give it a go instead!

Many thanks

Jamie

May 25, 2021, 11:33:59 AM

to distance-sampling

Hi Eric

Does using a covariate restrict every covariate level to the same detection function? Is there a way to make R choose which is best for each, i.e. Grid1 might use HR but Grid3 HN, and provide the effort-weighted computations accordingly? Or is that not good practice regardless?

Thanks

Jamie

May 25, 2021, 11:49:48 AM

to Jamie McKaughan, distance-sampling

Jamie

Use of "grid" as a covariate in the detection function will make the scale parameter (sigma) of the key function differ for each grid. The catch is that you must assume all grids share the same key function (no mixing of hazard-rates and half-normals between grids).

You can contrast the grid-specific detection function model against a model without the grid covariate; that model then assumes the detection function is the same across all grids.


May 25, 2021, 11:55:10 AM

to distance-sampling

Okay - cool.

Thanks

Jamie

May 26, 2021, 9:53:35 AM

to distance-sampling

Hi Eric

I have run a couple of analyses this way, but in all cases the 'Total' row does not provide an SE, CV or CIs of any use. I was expecting a weighted version from what you wrote before. Is this normal, or does it suggest an error in my code/data files?

Summary statistics:

Region.Label  Area  CoveredArea   Effort    n   k  ER  se.ER  cv.ER
Grid1            1     3583.315  3520385  234  19   0      0  0.228
Grid2            1     2768.840  2720213  276  19   0      0  0.460
Grid3            1     3458.497  3397759  452  21   0      0  0.450
Total            1     9810.652  9638357  962  59   0      0  0.256

Density estimates:

Region.Label  Estimate     se     cv     LCI     UCI      df
Grid1           0.2035  0.047  0.232  0.1262  0.3282  19.239
Grid2           0.3106  0.144  0.462  0.1233  0.7824  18.299
Grid3           0.4073  0.184  0.452  0.1657  1.0009  20.347
Total           0.3056  0.000  0.000  0.3056  0.3056  38.195

Component percentages of variance:

Region.Label  Detection     ER
Grid1              3.27  96.73
Grid2              0.82  99.18
Grid3              0.86  99.14
Total              2.62  97.38

Many thanks

Jamie

May 26, 2021, 10:42:51 AM

to Jamie McKaughan, distance-sampling

Jamie

Not sure what's happening here; I'm not surprised you are concerned. Note the first table of results (Summary statistics): I'm guessing the encounter rates are minute (because they are detections per snapshot). The cv.ER values in that first table seem plausible, so those zeros are likely just formatting problems.

But the line you have highlighted does not seem right: the upper and lower confidence interval bounds are equal, implying SE = 0, which clearly isn't right. What code did you use to get this result?

May 27, 2021, 11:13:37 AM

to distance-sampling

Hi Eric

Yes, I think you are right regarding the encounter rate; the table below shows the values a little better! Like you say, though, the remaining elements are not plausible.

I used the below to apply my model:

bh.90.hr0.Grid <- ds(bh.90, transect = "point", key = "hr", adjustment = NULL,
                     cutpoints = mybreaks, truncation = trunc.list, formula = ~Region.Label)

And then applied the stratification as you showed above:

bh.90.hr0.Grid.dens <- dht2(bh.90.hr0.Grid, flatfile = bh.90, strat_formula = ~Region.Label,
                            er_est = "P2", convert_units = conversion, stratification = "replicate")

Then produced the density report:

print(bh.90.hr0.Grid.dens, report = "density")

Thanks,

Jamie

Jun 1, 2021, 6:31:13 AM

to distance-sampling

Hi Eric,

I quit and restarted R and reran all my code, and unfortunately the same result occurs. Does my code look correct?

Many thanks

Jamie

Jun 1, 2021, 7:29:29 AM

to Jamie McKaughan, distance-sampling

Jamie

I haven't spotted problems with your code. If you want to send more details to me offline, I might be able to have a look tomorrow.

