Expected cluster size


Terry Koen

Mar 7, 2016, 8:44:00 PM
to distance...@googlegroups.com
Dear Distance Samplers

When working with clustered data, DISTANCE software can be used to estimate expected cluster size via a regression approach. Page 75 of Intro to Distance Sampling (Reprint 2011) describes an approach to investigate size bias: the regression of cluster size (or log cluster size) on distance is expected to have a POSITIVE slope, while a regression of cluster size (or log cluster size) on probability of detection g(i) is expected to have a NEGATIVE slope.

Page 121 of the same text shows Fig 4.3 which is a graphic of the former combination above. Does anyone know of a reference which has graphics of the latter combination - log cluster size versus detection probability please?

I am reading a paper at the moment that interprets a NEGATIVE slope of log(s(i)) on g(x(i)) as being somewhat detrimental, "... lead to a regression line with a negative slope, and thus the computed expected group size was lower than the observed group size. Using these results is likely to lead to increased inaccuracy". Does this seem sound judgement?

Also, when characterizing a detection function, we need some data. I guess the more bins on the Perpendicular Distance axis the better - 3 or fewer too few, 5 or more desirable? Then, as a guide, how many clusters (or individuals) are needed to accurately define a detection function? Maybe 200, 300 ...? Some managers desire an estimate of animal population within quite a small specific area, where only 20 or 30 clusters have been spotted. I worry that this is too few to use to estimate a detection function.

Any thoughts or experiences would be appreciated.

Regards, Terry

Eric Rexstad

Mar 8, 2016, 6:07:45 AM
to Terry Koen, distance...@googlegroups.com
Terry

You've done a lot of detective work to try to sort out the matter of size bias regression.  I had a look through some reprints in search of a graph of ln(cluster size) vs g-hat(x) and I failed to find a figure.  However, here is a figure we use as part of our distance sampling training materials, showing 4 combinations of independent and response variable transformed and untransformed, with the combination about which you are asking highlighted:

In each instance the horizontal arrow is pointing at the size-bias adjusted estimate of expected group size.  As you suggest, the slope of the relationship of ln(cluster size) vs g-hat(x) is negative.  The adjusted estimate of group size should be smaller than the mean size of observed clusters, because the detection process tends to miss the small clusters at large perpendicular distances.  g-hat(x) is closer to 1 at small distances from the transect, which is why the slope is reversed when g-hat(x) is used as the independent variable as opposed to x as the independent variable.  Hence you are justified in questioning what you have read.
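A minimal base-R sketch of the pattern described above, using simulated clusters rather than survey data. Everything here is an assumption made for illustration: the cluster size distribution, the way detectability grows with cluster size, the pooled half-normal scale (estimated while ignoring truncation), and all object names. The back-transform is a plain exp() with no bias correction.

```r
set.seed(1)
n <- 5000
size <- 1 + rpois(n, lambda = 3)        # true cluster sizes (population mean 4)
x <- runif(n, 0, 200)                   # perpendicular distances
sigma_s <- 20 * sqrt(size)              # assumed: larger clusters are seen further away
detected <- runif(n) < exp(-x^2 / (2 * sigma_s^2))
obs <- data.frame(distance = x[detected], size = size[detected])

# pooled half-normal scale, a simplification that ignores truncation
sigma_hat <- sqrt(mean(obs$distance^2))
obs$ghat <- exp(-obs$distance^2 / (2 * sigma_hat^2))

# regress ln(cluster size) on g-hat(x), then predict where g-hat(x) = 1
size_fit <- lm(log(size) ~ ghat, data = obs)
slope <- unname(coef(size_fit)["ghat"])
adjusted <- unname(exp(predict(size_fit, newdata = data.frame(ghat = 1))))
mean_observed <- mean(obs$size)

c(slope = slope, adjusted = adjusted, mean_observed = mean_observed)
```

In this simulation the slope is negative and the adjusted value sits below the mean of the observed cluster sizes, which is the direction described above.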

Think of bins as "points of support" for detection function modelling.  Those points of support serve two purposes: a) characterising the shape of the detection function and b) assessing the goodness of fit of detection function to data (when using chi-square GOF testing, which is all that is available when data are binned at time of collection).  When fitting a 2-parameter hazard rate model to 3 bin data, there are no degrees of freedom remaining after fitting to assess goodness of fit.  That can serve as a guide about number of bins during analysis; however, data collection in the field may place additional constraints upon reliable placement of detections into the proper bin.
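The bin arithmetic above can be written out as a one-line helper. This is only a sketch of the usual chi-square degrees-of-freedom rule (bins minus estimated parameters minus 1); `gof_df` is a hypothetical name, not a function from any package.

```r
# degrees of freedom left for a chi-square goodness-of-fit test
gof_df <- function(n_bins, n_params) n_bins - n_params - 1

gof_df(3, 2)  # 2-parameter hazard rate fitted to 3 bins: 0
gof_df(5, 2)  # the same model fitted to 5 bins: 2
```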

The question "how many detections are needed to characterise a detection function" comes up often.  For line transects, we recommend 60-80 detections (see Buckland et al. 2015:23, Distance sampling: methods and applications); however, if the data are "well behaved" then reasonable results can arise from fewer detections.  If there are multiple strata (either geographic or treating species as strata), some gains can arise from using strata as a covariate in the detection function modelling.
--
You received this message because you are subscribed to the Google Groups "distance-sampling" group.
To unsubscribe from this group and stop receiving emails from it, send an email to distance-sampl...@googlegroups.com.
To post to this group, send email to distance...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/distance-sampling/CANwG95K6Z2j%3D82t0oL15HZxXBXWnddFR_Tf5p8unCVXhm6sroA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

-- 
Eric Rexstad
Research Unit for Wildlife Population Assessment
Centre for Research into Ecological and Environmental Modelling
University of St. Andrews
St. Andrews Scotland KY16 9LZ
+44 (0)1334 461833
The University of St Andrews is a charity registered in Scotland : No SC013532

Tiago Marques

Mar 8, 2016, 9:55:53 AM
to distance...@googlegroups.com
Hi Terry,

Just to add to what Eric said, regarding your first question about the authors of a paper you were reading and cluster size bias regression. Indeed one would expect in most circumstances that the observed mean cluster size might be higher than the true mean cluster size, since larger groups are easier to detect (but note a possible opposite effect could be observed if say you tended to underestimate group sizes away from the line only!).

Since you provided a verbatim sentence, I used Google to track it down. There's only 1 hit, so this must be the paper you were reading:

http://crowhops.com/category/uncategorized/page/198/

When one looks in there, the authors seem to believe that group sizes are underestimated from the air, based on a previous survey. This is the reason they then state that the size bias regression is leading to problems: they already believe that the group sizes they have estimated are underestimating the true group sizes, so they do not want an analysis that makes the mean group size even smaller. Not sure this is a sound strategy. Without seeing the data it is hard to make additional comments, but I note that there are at least two problems involved (1. size bias; 2. underestimation of group size) and the way these interact will dictate how one should move forward. But in my opinion, it is not sound to state that you should ignore the size bias; you are trying to trade biases with different directions but unknown magnitudes.

hope this helps

Tiago

Terry Koen

Mar 8, 2016, 8:35:34 PM
to distance...@googlegroups.com
Thanks Eric and Tiago for your prompt replies.

Eric, thanks very much for the graphic of expected group size. After I sent my questions yesterday, I was working my way through your very informative videos at http://distancesampling.org/videos.html and at about 16:35 minutes into the Distance Sampling Assumptions II video, heard you make mention of a Clustered Populations lecture. I don't think this is available yet to folk outside your Training Courses.
 
Also, I found your Assumptions II video to offer the suggestion to aim for >60 data points upon which to form a frequency histogram and thence the detection function. The Detection Function video also reminded me that the model complexity - the number of parameters to estimate in the detection function and any added series expansion - will be limited should the number of distance bins be restrictively small.

Tiago, thanks for reminding me that situations could arise where the slope of this regression could be significantly positive, as outlined also on page 75 of the Intro to Distance Sampling text. Such is the dilemma when analysing clustered data: which estimate of group size do you use - the mean observed group size from the air survey (and assume accuracy), a size-bias corrected estimate of group size from the same survey, an estimate from an earlier aerial survey of the same area, an estimate from a ground survey conducted some years earlier, or something else?

Thanks for sharing your knowledge, Terry

  

Eric Rexstad

Mar 9, 2016, 4:55:39 AM
to Terry Koen, distance...@googlegroups.com
Glad you found the videos Terry.  You are correct that I've not made the clusters lecture material available on line.

One comment on your final thought regarding what estimate of group size to use.  Presented with the choices you pose, I would use data collected at the same time from the same platform as the data that will be used to estimate group density.  That way I need not make assumptions about constancy of group size over time or across sighting conditions.  Similar discussions take place when needing to estimate g(0) because not all animals are seen on the transect; what estimate of g(0) do you use?  An estimate coming from the data collected on the same survey, or an estimate from another place or time?


JOHN CHAN

Dec 20, 2017, 2:38:47 AM
to distance-sampling
What is the default way of estimating expected cluster size in distance R?

Is there any way to specify the estimation method?

Eric Rexstad

Dec 20, 2017, 5:36:32 AM
to JOHN CHAN, distance-sampling
Morning John

I believe use of ds() to produce abundance estimates for detections made
in clusters uses average observed group size in the sample to scale up
density of clusters to density of individuals.

If you expect size-bias in your sample (i.e. small clusters are
undetected at large distances), the customary remedy is to include
cluster size as a covariate in the detection function modelling.
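A hedged sketch of what that can look like with ds(), fitted to simulated detections rather than real survey data. The data frame `sim`, its simulation settings and the object name `fit.size` are assumptions made for illustration; because no survey area or effort columns are supplied, only the detection function is fitted here.

```r
library(Distance)

set.seed(42)
n <- 2000
size <- 1 + rpois(n, lambda = 3)        # simulated cluster sizes
x <- runif(n, 0, 150)                   # perpendicular distances
sigma_s <- 20 * sqrt(size)              # assumed: larger clusters are seen further away
detected <- runif(n) < exp(-x^2 / (2 * sigma_s^2))
sim <- data.frame(distance = x[detected], size = size[detected])

# half-normal key with cluster size as a continuous covariate on the scale parameter
fit.size <- ds(sim, key = "hn", adjustment = NULL, formula = ~size)
summary(fit.size)
```

Here size enters as a continuous covariate (`~size`); wrapping it in as.factor() would instead estimate a separate effect for every distinct cluster size.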



JOHN CHAN

Dec 20, 2017, 5:59:25 AM
to distance-sampling
Morning Eric,

something like this?

 formula = ~as.factor(data$size)



Eric Rexstad

Dec 20, 2017, 6:05:00 AM
to JOHN CHAN, distance-sampling

That's essentially correct John.  Here's an example from our vignette on minke whales

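library(Distance)
# 'whales' and 'whale.trunc' are defined earlier in the minke whale vignette
# recode the two regions as a short stratum label
whales$stratum <- ifelse(whales$Region.Label=="North", "N", "S")
# hazard-rate detection function with stratum as a factor covariate
whale.strat.covariate <- ds(whales, truncation=whale.trunc, quiet=TRUE,
                  formula = ~as.factor(stratum),
                  key="hr",  adjustment=NULL)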

JOHN CHAN

Dec 20, 2017, 9:52:18 PM
to distance-sampling
Thanks Eric. If the methodology specifically said "size-bias corrected estimate of group size was calculated by regressing log_e of group size against distance", is there any way to model that in Distance in R / mrds in R?

Eric Rexstad

Dec 21, 2017, 10:03:39 AM
to JOHN CHAN, distance-sampling

John

Not exactly trivial.  Here's some poorly documented R code working with the online crabeater seal data.

It performs the log(group size) vs Pr(detection) regression.  Then it performs back-calculation to derive a point estimate of expected group size when detection probability=1.

However, the hard part is producing a measure of precision for that point estimate.  You need a measure of precision (a variance) so that you can propagate the uncertainty of the detection function, encounter rate AND expected group size to produce overall uncertainty in your estimate of abundance.

I've taken an easy case here, with a half-normal detection function model without adjustment terms.  If you had a more complex model, you would need more sophistication in the computation of sigma and phats.

library(Distance)
seals <- read.csv(file = "http://distancesampling.org/R/vignettes/crabbieMCDS.csv")
# half-normal detection function, no adjustment terms
halfnorm <- ds(seals, adjustment = NULL)
plot(halfnorm)
table(seals$size)  # not a good example because there is little variation in school size
# the half-normal scale parameter is stored on the log scale
sigma <- exp(halfnorm$ddf$par[1])
# estimated detection probability at each observed distance
phats <- exp(-seals$distance^2/(2 * sigma^2))
log.schools <- log(seals$size)
# size-bias regression: log(group size) on Pr(detection)
size.regr <- lm(log.schools ~ phats)
# predict log(group size) where detection probability = 1
my.value <- data.frame(phats=c(1))
log.expected.school.size <- predict(size.regr, se.fit = TRUE, newdata = my.value)
# back-transform the point estimate to the original scale
expected.school.size <- exp(log.expected.school.size$fit)
# delta-method variance for the back-transformed estimate
term1.for.var <- log.expected.school.size$se.fit^2
var.expected.school.size <- expected.school.size^2 * term1.for.var
se.expected.school.size <- sqrt(var.expected.school.size)
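The variance step in the code above is a delta-method approximation: if the predicted log group size has standard error se, the standard error of its exponential is roughly exp(estimate) * se. A small base-R check of how close that is, assuming the log-scale estimate is normally distributed (so its exponential is lognormal) and using made-up values for `mu` and `se`:

```r
mu <- log(2)   # hypothetical predicted log(group size)
se <- 0.05     # hypothetical standard error on the log scale

# delta-method standard error on the original scale
delta_se <- exp(mu) * se

# exact standard deviation of a lognormal variable with the same mu and se
exact_se <- sqrt((exp(se^2) - 1) * exp(2 * mu + se^2))

c(delta = delta_se, exact = exact_se, ratio = delta_se / exact_se)
```

With a small log-scale standard error the two agree closely; the approximation degrades as se grows.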

