# Using relative cluster size bias to correct for expected cluster sizes and calculate density and its variance in line transects

### ducros....@gmail.com

Feb 16, 2022, 3:14:15 AM
to distance-sampling

Dear Distance users,

I have two questions.

1. I wish to estimate cluster size bias in a population that was surveyed using line transects, and use this size bias to correct the cluster sizes observed in a neighboring population.

I would use the following procedure.

I first extract an estimate of expected cluster size (CSexp) from the summary of the retained DS model.

I also have the observed mean cluster size (CSobs) for each of my species.

Then, to estimate the relative bias associated with cluster size estimation, I use:

Relative Bias = (CSobs – CSexp) / CSexp

Next, I apply this relative bias to each cluster size observed at the neighboring site. That site was also surveyed with line transects, and we tried to estimate distances there, but the distance data are not satisfactory, so in effect we have no distance data for that site. I then use equation 3.2 of Buckland et al. (2001, page 51) to estimate density:

D = E(n)*E(s)/(a*Pa), where E(n) is the expected number of clusters, E(s) the expected cluster size for the population, a the area covered by the survey, and Pa the probability of detection for an object within this area.

Would this approach be correct?

Also, the ‘cluster size’ covariate is not always retained in the most supported DS model. In that case, does that mean there is no size bias, and thus that we would not need to correct the cluster sizes obtained for the neighboring site before estimating densities?

2. Question regarding Eqns 3.2 and 3.3 of Buckland et al. (2001), page 51

To estimate densities for species at the neighboring site, I use Equation 3.2, presented on page 51 of Buckland et al. (2001), which uses E(n), the expected number of clusters in the surveyed area.

To me, the estimator of E(n) would be n, the total number of clusters sampled in the surveyed area. This number is fixed in my dataset, as I have a single, exact count of the clusters that were observed. However, page 52 of the book gives equation (3.4), which depends notably on the variance of n. I do not understand how there can be a variance associated with n when this number is fixed in our dataset.

There is probably something that I am missing here, but I cannot understand what, and I would greatly appreciate your help to better understand those equations and how to implement them for my data.

Cheers,

Delphine

### Stephen Buckland

Feb 16, 2022, 4:01:45 AMFeb 16
to ducros....@gmail.com, distance-sampling
1. This seems OK, though you seem to be making it more complicated than necessary.  Assuming CSexp was calculated adjusting for size bias, then you can just take E(s) from the neighbouring site and multiply it by CSexp/CSobs.  A couple of provisos here.  First, you must assume that the data from the neighbouring site are similar to those from the first site, so that you can assume the same correction factor applies.  Second, you should estimate the contribution to variance of this correction factor, for example using a bootstrap.
2. Your n is the observed sample size.  If you were to repeat the same survey, you would observe a different n.  Thus it is not fixed.  (Contrast this with L, which if you repeat the first survey using the same design, would remain unchanged.)
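
The adjustment described in point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every number below is hypothetical, the function names are made up for the example, the covered area is taken as a = 2wL for line transects, and Pa for the neighbouring site is assumed to have been obtained elsewhere, since that site has no usable distance data. Note that the multiplier CSexp/CSobs is the same quantity as 1/(1 + relative bias).

```python
def relative_bias(cs_obs, cs_exp):
    """Relative bias of the observed mean cluster size: (CSobs - CSexp) / CSexp."""
    return (cs_obs - cs_exp) / cs_exp


def correction_factor(cs_obs, cs_exp):
    """Multiplier CSexp / CSobs, algebraically equal to 1 / (1 + relative bias)."""
    return cs_exp / cs_obs


def density(n, expected_s, covered_area, p_a):
    """Eq. 3.2 with E(n) estimated by n: D = n * E(s) / (a * Pa)."""
    return n * expected_s / (covered_area * p_a)


# Hypothetical values from the site with usable distance data.
cs_obs_ref = 5.0   # observed mean cluster size
cs_exp_ref = 4.0   # size-bias-adjusted expected cluster size from the DS model
rb = relative_bias(cs_obs_ref, cs_exp_ref)
cf = correction_factor(cs_obs_ref, cs_exp_ref)

# Hypothetical values for the neighbouring site.
n_clusters = 120          # clusters detected
mean_s_obs = 6.0          # observed mean cluster size
adjusted_s = mean_s_obs * cf
half_width_km = 0.1       # truncation distance w
total_length_km = 50.0    # total line length L
covered_area = 2 * half_width_km * total_length_km  # a = 2wL
p_a = 0.6                 # assumed detection probability, not estimated here

d_hat = density(n_clusters, adjusted_s, covered_area, p_a)
print(f"relative bias = {rb:.3f}, correction factor = {cf:.3f}")
print(f"adjusted E(s) = {adjusted_s:.3f}, density = {d_hat:.2f} per km^2")
```

The sketch gives a point estimate only; it does not include the variance contribution of the correction factor mentioned in the second proviso.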

Steve Buckland

### ducros....@gmail.com

Feb 16, 2022, 7:47:03 AM
to distance-sampling

Dear Professor Buckland,

1. Thank you for your explanation. Indeed, I may have complicated things more than necessary, and I will use the method you suggest. Regarding the variance, can we not propagate the variances of the different estimators (E(s), CSexp and CSobs) using the delta method? If not, could you explain why a bootstrap is preferable?

2. For n, I understand what you mean, but in my case the survey was performed only once. How, then, can I calculate a variance for n if I have only one "replicate" of this survey?

Many thanks again for your help,

Cheers,

Delphine

### Stephen Buckland

Feb 16, 2022, 9:24:23 AM
to ducros....@gmail.com, distance-sampling
• 1. Thank you for your explanation. Indeed, I may have complicated things more than necessary, and I will use the method you suggest. Regarding the variance, can we not propagate the variances of the different estimators (E(s), CSexp and CSobs) using the delta method? If not, could you explain why a bootstrap is preferable?

The delta method could be used, but the problem is that Csexp and Csobs cannot be assumed to be independent – you would overestimate variance if you assume they are.
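
The point about non-independence can be made explicit with the standard delta-method approximation for a ratio. This is a sketch, writing the correction factor as R = CSexp/CSobs, with var and cov denoting the sampling variances and covariance of the two estimators:

```latex
\frac{\operatorname{var}(\hat{R})}{\hat{R}^{2}}
\;\approx\;
\frac{\operatorname{var}(\widehat{CS}_{\mathrm{exp}})}{\widehat{CS}_{\mathrm{exp}}^{2}}
+ \frac{\operatorname{var}(\widehat{CS}_{\mathrm{obs}})}{\widehat{CS}_{\mathrm{obs}}^{2}}
- \frac{2\,\operatorname{cov}(\widehat{CS}_{\mathrm{exp}},\, \widehat{CS}_{\mathrm{obs}})}
       {\widehat{CS}_{\mathrm{exp}}\,\widehat{CS}_{\mathrm{obs}}}
```

Both estimates come from the same detections, so the covariance is plausibly positive. Setting it to zero, as an independence assumption does, drops a negative term and so inflates the variance. A bootstrap that recomputes both quantities from each resample accounts for this dependence without needing an explicit covariance estimate.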

• 2. For n, I understand what you mean, but in my case the survey was performed only once. How, then, can I calculate a variance for n if I have only one "replicate" of this survey?

If you want to draw inference on a wider area, you need to have a design that has replicate lines through the study area.  We use variation in encounter rate among lines to estimate variance in n.  If you have only a single line, inference is restricted to the single strip you surveyed.  In that case, you might assume that n is distributed as Poisson, with var(n)=E(n), estimated by n.  This generally underestimates true variance, as most natural populations are overdispersed relative to Poisson.
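
As a rough illustration of the two options, the Python sketch below computes an encounter-rate variance from replicate lines and compares it with the Poisson fallback var(n) = n. It assumes the length-weighted form var(n) = L Σ lᵢ (nᵢ/lᵢ − n/L)² / (k − 1), which should be checked against the distance sampling books before use, and it is not necessarily the estimator that the Distance software applies by default. The function name and the per-line data are hypothetical.

```python
def encounter_rate_variance(counts, lengths):
    """Return (var(n), var(n/L)) estimated from k replicate lines.

    Assumed form (verify against the distance sampling books):
        var(n) = L * sum_i l_i * (n_i / l_i - n / L)**2 / (k - 1)
    """
    k = len(counts)
    if k != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    if k < 2:
        raise ValueError("at least two replicate lines are needed")
    n = sum(counts)
    total_length = sum(lengths)
    rate = n / total_length
    # Length-weighted squared deviations of per-line encounter rates.
    weighted_ss = sum(l * (c / l - rate) ** 2 for c, l in zip(counts, lengths))
    var_n = total_length * weighted_ss / (k - 1)
    return var_n, var_n / total_length ** 2


# Hypothetical per-line data: clusters detected and line length (km).
counts = [3, 0, 5, 2, 4]
lengths = [2.0, 1.0, 2.0, 1.5, 2.5]

var_n, var_rate = encounter_rate_variance(counts, lengths)
poisson_var_n = sum(counts)  # Poisson assumption: var(n) = E(n), estimated by n

print(f"var(n) from replicate lines: {var_n:.3f}")
print(f"var(n/L) from replicate lines: {var_rate:.5f}")
print(f"var(n) under the Poisson assumption: {poisson_var_n}")
```

With equal line lengths this form reduces to k times the sample variance of the per-line counts, which gives a quick sanity check on real data.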

Steve

### ducros....@gmail.com

Mar 1, 2022, 9:25:07 AM
to distance-sampling

Dear Professor Buckland,

Thank you very much for your clarifications. I still have a few questions.

1.    You said: "The delta method could be used, but the problem is that Csexp and Csobs cannot be assumed to be independent – you would overestimate variance if you assume they are."

=> I understand, thank you for explaining that to me. I think I have to use a bootstrap for both CSobs and CSexp, is that right? Or is it better to compute var(CSobs) from the observed data?

2.    You said: "If you want to draw inference on a wider area, you need to have a design that has replicate lines through the study area.  We use variation in encounter rate among lines to estimate variance in n.  If you have only a single line, inference is restricted to the single strip you surveyed.  In that case, you might assume that n is distributed as Poisson, with var(n)=E(n), estimated by n.  This generally underestimates true variance, as most natural populations are overdispersed relative to Poisson."

=> I apologise, I believe I did not explain my study design well enough. At the neighboring site, where we want to estimate densities, I do not have a single transect but 43 line transects. Those 43 line transects were surveyed in only one year.

In your previous message, you said we can use variation in encounter rate among lines to estimate variance in n. I understand this to mean that I simply have to compute var(n) over my sample of 43 transects. Or did you instead mean to estimate Var[n/L] = Var[n] = Var[n1] + Var[n2] + ... + Var[nk]?

I apologise again if my previous messages were not clear, and thank you very much for your help.

Best regards,

Delphine

### Stephen Buckland

Mar 1, 2022, 10:26:40 AM
to ducros....@gmail.com, distance-sampling

A bootstrap that resamples lines handles the overdispersion correctly.  As for your second question: no, the sum you wrote is not how Distance calculates the encounter rate variance.  If you need to repeat the calculation outside of Distance, you’ll need to use the formula in the distance sampling books.  If you don’t have any of these, the original 1993 book is available online:

### ducros....@gmail.com

Mar 3, 2022, 9:29:36 AM
to distance-sampling

Dear Professor Buckland,

Many thanks for your answer and your help. I think I now have all the information I need to pursue my analyses.

Many thanks again!

Best regards,

Delphine