Effect of habitat covariate upon estimated density

Two stage analysis propagating detection function uncertainty into GLM predictions.

Eric Rexstad http://distancesampling.org (CREEM, Univ of St Andrews)https://creem.st-andrews.ac.uk
2021-03-09

Investigators often wish to make inferences from their surveys beyond simply reporting “What is the animal density?” Ecological curiosity desires answers to questions such as

What effect does change in habitat characteristic X have upon animal density?

Answers to such questions are sought through use of generalised linear modelling (of which analysis of variance and regression are subsets). However, simple modelling of counts fails to take into account imperfect detectability. When detectability is imperfect, the response variable possesses uncertainty derived from uncertainty in the probability of detection.

A number of authors have recognised the need to incorporate this uncertainty into the analysis of habitat effects upon estimated density. One approach (employed here) conducts the analyses in two stages: first detectability is estimated and \(P_a\) or \(\mu\) or \(\rho\) is estimated. A second stage fits a model to count data adjusting using the estimated detectability parameters. This approach is exemplified by Buckland et al. (2009). Alternatively, single state approaches have been described by Oedekoven et al. (2013) and Oedekoven et al. (2014).

Propagation of uncertainty from the first step into the second step is accomplished via a bootstrap, with the effective area sampled serving as an offset in the generalised linear modelling (GLM) analysis. The method described herein only applies when the link function is the log link, for the following reason.

We want to model bird density in our GLM, derived from the counts we observed on each transect. Density is defined as

\[D = \frac{n}{\text{Eff Area}} \] where \(n\) is the count on each transect.

Therefore

\[\frac{n}{\text{Eff Area}} = e^{\text{linear predictor}}\] when using a log link function.

Our count model works with n as the response variable \[\begin{align} n &= \text{Eff Area} \cdot e^{\text{linear predictor}} \\ &= e^{\text{linear predictor} + log(\text{Eff Area})} \end{align} \]

making \(log(\text{Eff Area})\) the offset. If other link functions are used, the offset will be different.

Ecological question and data

We present two examples of such analyses, using data from the package RDistance (McDonald et al., 2019). In both cases, we fit detection detection functions to data using the Distance package Miller et al. (2019). We analyse a line transect survey consisting of 72 transects with interest in effect of habitat features upon density of Brewer’s sparrows (Spizella breweri) (J. Carlisle & Chalfoun, 2020). The second example is of 120 point transects examining habitat influences upon density of sage thrashers (Oreoscoptes montanus) (J. D. Carlisle et al., 2018). The data organisation is similar for each data set; one data frame containing information on detections and another data frame containing information for each transect.

Programming matters

Before walking through the code, we provide some guidance regarding adaptation of this approach for your data. The code is not sufficiently modularised so that you can merely slot your data into a function. There are matters to consider regarding data organisation, naming of data frame columns, choice of detection functions to fit and specification of count model used for inference. These points are summarised in the box below.

Programming details to adapt for your use

Analysis of line transect survey

library(Rdistance)
data("sparrowDetectionData")
data("sparrowSiteData")
sparrowSiteData$myCount <- tapply(sparrowDetectionData$groupsize,
                                  sparrowDetectionData$siteID, sum)
sparrowSiteData$myCount[is.na(sparrowSiteData$myCount)] <- 0
library(Distance)

Data organisation

In Carlisle’s data set, sightings information is kept separate from information about each site. For our purposes, we will merge those together. In addition, some field names are changed for consistency with functions in the Distance package.

newsparrow <- merge(sparrowDetectionData, sparrowSiteData, by="siteID", all=TRUE)
names(newsparrow) <- sub("observer", "obs", names(newsparrow))
names(newsparrow) <- sub("dist", "distance", names(newsparrow))
names(newsparrow) <- sub("length", "Effort", names(newsparrow))

Analysis parameters specification

Although not formally written as a set of functions, we bring to the front of the code arguments the user will need to change to alter to suite their needs. Candidate transect-level predictors for the detection function are observer, bare, herb, shrub, height, shrubclass. The same predictors, with the exception of observer could be used to model the sparrow counts.

pointtransect <- FALSE        # survey conducted using lines or points
dettrunc <- 100  # truncation for detection function
myglmmodel <- as.formula(myCount ~ shrub + offset(log(effArea)))  # habitat model to fit to animal density
nboot <- 150       # number of bootstrap replicates
set.seed(1900)   # random number seed

Fit detection function

We fit a series of detection function models using the transect-level covariates available in the Brewer’s sparrow data set.

surveytype <- ifelse(pointtransect, "point", "line")
woo.o <- ds(data=newsparrow, truncation=dettrunc, formula=~obs, 
            transect=surveytype, quiet=TRUE)
# woo.ob <- ds(data=newsparrow, truncation=dettrunc, formula=~obs+bare, transect=surveytype, quiet=TRUE)
woo.oh <- ds(data=newsparrow, truncation=dettrunc, formula=~obs+height, 
             transect=surveytype, quiet=TRUE)
woo.os <- ds(data=newsparrow, truncation=dettrunc, formula=~obs+shrubclass, 
             transect=surveytype, quiet=TRUE)
woo.h <- ds(data=newsparrow, truncation=dettrunc, formula=~height, 
            transect=surveytype, quiet=TRUE)
woo.s <- ds(data=newsparrow, truncation=dettrunc, formula=~shrubclass, 
            transect=surveytype, quiet=TRUE)
woo <- ds(data=newsparrow, truncation=dettrunc, formula=~1, 
          transect=surveytype, quiet=TRUE)
knitr::kable(summarize_ds_models(woo.o,  woo.oh, woo.os, woo.h,  woo.s, woo), digits=3, 
             caption="Halfnormal detection function model selection of covariates.", row.names = FALSE)
Table 1: Halfnormal detection function model selection of covariates.
Model Key function Formula C-vM p-value \(\hat{P_a}\) se(\(\hat{P_a}\)) \(\Delta\)AIC
Half-normal ~shrubclass 0.511 0.559 0.025 0.000
Half-normal ~height 0.492 0.558 0.025 0.325
Half-normal ~1 0.469 0.563 0.025 3.154
Half-normal ~obs + shrubclass 0.652 0.556 0.025 5.533
Half-normal ~obs + height 0.659 0.556 0.025 5.672
Half-normal ~obs 0.587 0.559 0.025 7.148

All of the candidate models fit the Brewer’s sparrow line transect data. Also note that the estimate detection probability of all six models is the same to the third decimal. There is a small difference in AIC between the factor covariate shrubclass and the continuous covariate height. Simply for the purposes of demonstration, we will base our inference on the detection function that includes observer as a covariate. There is likely little effect of this model selection choice upon the ecological question of interest.

Plot of detection function

plot(woo.o, showpoints=FALSE, main="Brewer's sparrow\nDetection with observer covariate",
     pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs1"), col="red", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs2"), col="green", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs3"), col="dark green", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs4"), col="blue", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs5"), col="purple", lwd=2, pdf=pointtransect)
legend("topright", title="Observer", legend=1:5,
       lwd=2, lty=2, col=c("red", "green", "dark green", "blue", "purple"))

Prepare for GLM computation

Considerable processing is necessary to correctly calculate the offset value for the GLM. The offset is the effective area sampled by each transect. Effective area \((EA)\) is computed differently, depending upon whether sampling was done with line or point transects.

There is also a distinction between line and point transect computations regarding units in which data are recorded. For line transect example of Sage Sparrows, Effort was measured in meters, but we wish to produce our estimates of density in numbers per hectare. With Thresher Sparrows, radial detection distance was measured in meters, but we wish to measure bird density in numbers per hectare. In both cases, division by 10000 to convert square meters to hectares.

Because only need transect level information, the observation-level information can be removed, to make computation easier. However, transects on which animals were not detected (counts=0) must also be carried forward to the GLM analysis; because 0’s constitute legitimate observations. If the offset values remained missing values (NA), the relationship between animal density and the predictor covariate(s) would not be properly estimated.

effAreafn <- function(detfnobj, newdata, areaconv, truncation, pointflag) {
#
# Input: detfnobj - model object created by `ds()`
#        newdata - data frame of detections and site-level covariates
#        areaconv - conversion of transect length units to area units (m2 to ha for example)
#        truncation - truncation distance for detection function fit
#        pointflag - TRUE if point transect survey, FALSE for line transects
#
# Output: data frame of transects with effective area appended for each transect (to be used as offset)
#
  newdata$Pa <- NA
  k <- 0
  for (i in 1:dim(newdata)[1]) {
    if(newdata[i, "distance"] <= truncation & !is.na(newdata[i,"distance"])) {
      k <- k+1
      newdata$Pa[i] <- detfnobj$ddf$fitted[k]
    }
  }
  if(!pointflag) {
    newdata$effArea <- ifelse(is.na(newdata$Pa), NA, 2 * newdata$Pa * truncation * newdata$Effort / areaconv)
  }else{
    newdata$effArea <- ifelse(is.na(newdata$Pa), NA, newdata$Pa * truncation^2 * pi / areaconv)
  }
  result <- newdata[!duplicated(newdata$siteID),]
#    newdata must contain all covariates used in the detection function!  
  if(!pointflag) {
#     predict for line transects when esw=TRUE returns the ESW that needs to be converted to effective area    
    result$effArea <- ifelse(result$myCount==0,
                              2 * predict(detfnobj, 
                                          newdata=data.frame(obs=result$obs, bare=result$bare,
                                                             herb=result$herb, shrub=result$shrub,
                                                             height=result$height),
                                          esw = TRUE)$fitted * result$Effort / areaconv,
                              result$effArea)
  }else{
#     predict for point transects when esw=TRUE returns the effective area    
    result$effArea <- ifelse(result$myCount==0,
                              predict(detfnobj, 
                                          newdata=data.frame(obs=result$obs, bare=result$bare,
                                                             herb=result$herb, shrub=result$shrub,
                                                             height=result$height),
                                          esw = TRUE)$fitted / areaconv,
                              result$effArea)
    
  }
  return(result)
}

sitesadj <- effAreafn(woo.o, newsparrow, 10000, dettrunc, pointflag = FALSE)

Estimate relationship of density and covariate

Fit a GLM to the observed counts, using as an offset the estimated effective area. By default, specifying family as poisson assumes a log link function. Consequently, the offset must also use the log transform.

To generalise the code, and recognising the same call to glm() will need to be made elsewhere in this analysis, we specify the GLM model we wish to fit as an object of type formula. This was specified in the Analysis parameters specification section above.

univarpredictor <- all.vars(myglmmodel)[2]
glmmodel <- glm(formula=myglmmodel, family="poisson", data=sitesadj)
modelsum <- summary(glmmodel)
tablecaption <- paste("GLM coefficients from counts as function of", 
                      univarpredictor, "with log(effective area) offset.")
kable(modelsum$coef, digits=4, caption=tablecaption)
Table 2: GLM coefficients from counts as function of shrub with log(effective area) offset.
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0566 0.1397 -7.5650 0
shrub 0.0789 0.0096 8.1836 0

Visualise

Even though we have computed the effective area for each transect based upon the fitted detection function, we have not used that effective area to adjust the observed counts. The simple formula

\[\hat{D}_i = \frac{n_i}{EA_i}, i = 1, \ldots , n_{transects}\] where \(EA_i\) is the effective area for the \(i^{th}\) transect, describes this adjustment.

We plot the estimated density against the continuous univariate predictor.

sitesadj$density <- sitesadj$myCount / sitesadj$effArea 
plot(sitesadj[, univarpredictor], sitesadj$density, pch=20,
     xlab=univarpredictor, ylab="Density of sparrows per ha")

Next we superimpose the shape of the relationship derived from the estimated coefficients of the GLM. We are effectively creating our own version of predict on the scale of the response by exponentiating the predicted value of density over the observed range of the continuous univariate predictor.

predData <- data.frame(predictor=seq(min(sparrowSiteData[ , univarpredictor]),
                                 max(sparrowSiteData[, univarpredictor]), length.out=50)) 
lines(predData$predictor, exp(coef(glmmodel)[1] + (coef(glmmodel)[2]*predData$predictor)), lwd=2)
text(predData$predictor[1], max(sitesadj$density,na.rm=TRUE)*0.98, pos=4,
     "GLM coefficients (exponentiated)") 
text(predData$predictor[1], max(sitesadj$density,na.rm=TRUE)*0.88, pos=4,
     bquote(hat(beta[0]) == .(round(exp(coef(glmmodel)[1]), 3)))) 
text(predData$predictor[1], max(sitesadj$density,na.rm=TRUE)*0.78, pos=4, 
     bquote(hat(beta[1]) == .(round(exp(coef(glmmodel)[2]), 3))))

Incorporate uncertainty

We resample our transects with replacement to assess the uncertainty in our point estimates of the relationship between the habitat covariate and the response variable. Specify the number of bootstrap resamples required and allocate storage for the replicate estimates of the GLM parameters: intercept and slope.

intercept.est <- vector("numeric", length=nboot)
slope.est <- vector("numeric", length=nboot)

The code below does the hard work of performing a bootstrap of the data, refitting a detection function to each replicate, producing new transect-specific estimates of covered area and refitting a GLM with the values of covered area as offsets.

Resample original transect data frame. Rebuild the detection data frame corresponding to the transects selected in the bootstrap replicate, manufacturing pseudo-transect IDs and sorting upon those IDs.

for (theboot in 1:nboot) {
  newdetects <- data.frame() 
  bob <- sample(sparrowSiteData$siteID, replace=TRUE, size=length(unique(sparrowSiteData$siteID)))
  for (bootsite in 1:length(bob)) { 
    thissite <- bob[bootsite] 
    glob <- newsparrow[newsparrow$siteID==thissite, ]
    glob$siteID <- sprintf("rep%02d", bootsite)
    newdetects <- rbind(newdetects, glob)  
    }
  newdetects <- newdetects[order(newdetects$siteID), ]
# Refit the detection function model, using observer as a covariate, to the bootstrap replicate
  detfnmod <- ds(data=newdetects, truncation=dettrunc, formula=~shrub, quiet=TRUE)
#  Compute effective area offset for each transect
  bootsitesadj <- effAreafn(detfnmod, newdetects, 10000, dettrunc, pointflag = FALSE)
#   refit the GLM for this bootstrap replicate
  glmresult <- glm(formula= myglmmodel, family="poisson", data=bootsitesadj)
  intercept.est[theboot] <- coef(glmresult)[1]
  slope.est[theboot] <- coef(glmresult)[2] 
}

Having refitted the original GLM model for each of our bootstrap replicates with recalculated detection functions and effective areas, we can assess the uncertainty associated with the presumed GLM relationship of the original model.

The code will plot one trace of the predicted relationship for each bootstrap replicate. Then over the observed range of the covariate, predicted densities are computed at a large number of covariate values. A \(1-\alpha\) level confidence interval is computed for each of the covariate values. The point-by-point confidence interval values are then plotted to indicate the uncertainty about the predicted form of the modelled relationship. There is a discernible effect of shrub cover upon density of Brewer’s sparrows.

for (i in 1:nboot) lines(predData$predictor, exp(intercept.est[i] + (slope.est[i]*predData$predictor)),
                         col=rgb(1,0,0, alpha=0.1), lwd=0.5)
bootLow <- vector("numeric", length=dim(predData)[1]) 
bootUpp <- vector("numeric", length=dim(predData)[1])
for (i in 1:dim(predData)[1]) { 
  y <- vector("numeric", length=nboot) 
  for (j in 1:nboot) { 
    y[j] <- exp(intercept.est[j] + (slope.est[j]*predData$predictor[i])) 
    } 
  bootLow[i] <- quantile(y, 0.025, na.rm=TRUE) 
  bootUpp[i] <- quantile(y, 0.975, na.rm=TRUE)
}
lines(predData$predictor, bootLow, lwd=2, lty=2) 
lines(predData$predictor, bootUpp, lwd=2, lty=2)
Relationship between univariate predictor and Brewer's sparrow density as modelled by GLM.  Offset is estimated covered area. Confidence intervals incorporate uncertainty from imperfect detectability.

Figure 1: Relationship between univariate predictor and Brewer’s sparrow density as modelled by GLM. Offset is estimated covered area. Confidence intervals incorporate uncertainty from imperfect detectability.

Inference regarding the habitat covariate

To assess the effect of the transect-level environmental covariate upon bird density, we focus our attention on \(\hat{\beta_1}\). The greater in magnitude the slope of the covariate, the greater the effect of the covariate upon animal density. If the estimated slope (\(\hat{\beta_1}\)) is indistinguishable from zero, we infer the habitat covariate has no influence upon animal density. For the habitat characteristic shrub, there is a positive response of Brewer’s sparrow to increasing shrub cover.

xlabel <- paste("Estimated slope of", univarpredictor, "and count relationship.")
hist.slope <- hist(slope.est, main="Sampling distribution of slope parameter",
                   xlab=xlabel) 
cibounds <- quantile(slope.est, probs = c(.025,.975), na.rm=TRUE)
abline(v=cibounds, lty=3) 
text(cibounds, max(hist.slope$counts), round(cibounds,3))
Sampling distribution of the parameter of interest for Brewer's sparrow.

Figure 2: Sampling distribution of the parameter of interest for Brewer’s sparrow.

Analysis of point count survey

This second analysis closely mimics the previous analysis of the line transect data. Therefore we present the code blocks with a much reduced amount of narrative.

library(Rdistance)
data("thrasherDetectionData")
data("thrasherSiteData")
thrasherSiteData$myCount <- tapply(thrasherDetectionData$groupsize,
                                  thrasherDetectionData$siteID, sum)
thrasherSiteData$myCount[is.na(thrasherSiteData$myCount)] <- 0
library(Distance)

Data organisation

Note there is no Effort field for these point count data.

newthrasher <- merge(thrasherDetectionData, thrasherSiteData, by="siteID", all=TRUE)
names(newthrasher) <- sub("observer", "obs", names(newthrasher))
names(newthrasher) <- sub("dist", "distance", names(newthrasher))

Analysis parameters specification

Specification of run-specific parameters to analyse the Sage thrasher data set. Particularly note the logical value assigned to pointtransect. Candidate transect-level predictors for the detection function are observer, bare, herb, shrub, height. The same predictors, with the exception of observer could be used to model the thrasher counts.

pointtransect <- TRUE        # survey conducted using lines or points
dettrunc <- 170  # truncation for detection function
myglmmodel <- as.formula(myCount ~ shrub + offset(log(effArea)))  # habitat model to fit to animal density
nboot <- 150       # number of bootstrap replicates
set.seed(1935)   # random number seed

Fit detection function

We fit a series of detection function models using the transect-level covariates available in the Brewer’s sparrow data set.

surveytype <- ifelse(pointtransect, "point", "line")
woo.o <- ds(data=newthrasher, truncation=dettrunc, formula=~obs, 
            transect=surveytype, quiet=TRUE)
woo.h <- ds(data=newthrasher, truncation=dettrunc, formula=~height, 
            transect=surveytype, quiet=TRUE)
woo <- ds(data=newthrasher, truncation=dettrunc, formula=~1, 
          transect=surveytype, quiet=TRUE)
knitr::kable(summarize_ds_models(woo.o, woo.h, woo), digits=3, 
             caption="Halfnormal detection function model selection of covariates.", row.names = FALSE)
Table 3: Halfnormal detection function model selection of covariates.
Model Key function Formula C-vM p-value \(\hat{P_a}\) se(\(\hat{P_a}\)) \(\Delta\)AIC
Half-normal ~obs 0.009 0.292 0.039 0.000
Half-normal with cosine adjustment term of order 2 ~1 0.096 0.394 0.123 12.887
Half-normal ~height 0.037 0.333 0.034 19.168

None of the covariates contribute to fit of the detection function models for thrashers. Inference should be based upon the no covariate model (that passes the goodness of fit test), but for testing purposes, we will use the model with the observer covariate. Use of the observer covariate model will retain between-point variability in effective area; useful for testing purposes.

Plot of detection function

plot(woo.o, showpoints=FALSE, main="Sage thrasher\nDetection with observer covariate",
     pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs1"), col="red", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs2"), col="green", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs3"), col="dark green", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs4"), col="blue", lwd=2, pdf=pointtransect) 
add_df_covar_line(woo.o, data.frame(obs="obs5"), col="purple", lwd=2, pdf=pointtransect)
add_df_covar_line(woo.o, data.frame(obs="obs6"), col="coral", lwd=2, pdf=pointtransect)
legend("topright", title="Observer", legend=1:6,
       lwd=2, lty=2, col=c("red", "green", "dark green", "blue", "purple", "coral"))

Prepare for GLM computation

The same function as above effAreafn is used to compute effective area for point transects. There is no need to duplicate the function definition here. The function has an argument pointflag used to indicate whether point transect sampling was used. If so, the correct effective area calculations are performed.

sitesadj <- effAreafn(woo.o, newthrasher, 10000, dettrunc, pointflag = pointtransect)

Estimate relationship of density and covariate

As for the line transect example, fit a GLM to the observed counts, using as an offset the estimated effective area. By default, specifying family as poisson assumes a log link function.

univarpredictor <- all.vars(myglmmodel)[2]
glmmodel <- glm(formula=myglmmodel, family="poisson", data=sitesadj)
modelsum <- summary(glmmodel)
tablecaption <- paste("GLM coefficients from counts as function of", 
                      univarpredictor, "with log(effective area) offset.")
kable(modelsum$coef, digits=4, caption=tablecaption)
Table 4: GLM coefficients from counts as function of shrub with log(effective area) offset.
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3977 0.8794 -2.7266 0.0064
shrub 0.0818 0.0419 1.9538 0.0507

Visualise

Even though we have computed the effective area for each transect based upon the fitted detection function, we have not used that effective area to adjust the observed counts. The simple formula

\[\hat{D}_i = \frac{n_i}{EA_i}, i = 1, \ldots , n_{transects}\] where \(EA_i\) is the effective area for the \(i^{th}\) transect, describes this adjustment.

We plot the estimated density against the continuous univariate predictor.

sitesadj$density <- sitesadj$myCount / sitesadj$effArea 
plot(sitesadj[, univarpredictor], sitesadj$density, pch=20,
     xlab=univarpredictor, ylab="Density of threshers per ha")

The basic background plot onto which our bootstrap replicates will be superimposed.

predData <- data.frame(predictor=seq(min(thrasherSiteData[ , univarpredictor]),
                                 max(thrasherSiteData[, univarpredictor]), length.out=50)) 
lines(predData$predictor, exp(coef(glmmodel)[1] + (coef(glmmodel)[2]*predData$predictor)), lwd=2)
text(predData$predictor[1], max(sitesadj$density,na.rm=TRUE)*0.98, pos=4,
     "GLM coefficients (exponentiated)") 
text(predData$predictor[1], max(sitesadj$density,na.rm=TRUE)*0.88, pos=4,
     bquote(hat(beta[0]) == .(round(exp(coef(glmmodel)[1]), 3)))) 
text(predData$predictor[1], max(sitesadj$density,na.rm=TRUE)*0.78, pos=4, 
     bquote(hat(beta[1]) == .(round(exp(coef(glmmodel)[2]), 3))))

Incorporate uncertainty

We resample our transects with replacement to assess the uncertainty in our point estimates of the relationship between the habitat covariate and the response variable. Specify the number of bootstrap resamples required and allocate storage for the replicate estimates of the GLM parameters: intercept and slope.

intercept.est <- vector("numeric", length=nboot)
slope.est <- vector("numeric", length=nboot)

The only change in the code chunk below from the line transect analysis is the change of original site data frame thrasherSiteData.

for (theboot in 1:nboot) {
  newdetects <- data.frame() 
  bob <- sample(thrasherSiteData$siteID, replace=TRUE, size=length(unique(thrasherSiteData$siteID)))
  for (bootsite in 1:length(bob)) { 
    thissite <- bob[bootsite] 
    glob <- newthrasher[newthrasher$siteID==thissite, ]
    glob$siteID <- sprintf("rep%02d", bootsite)
    newdetects <- rbind(newdetects, glob)  
    }
  newdetects <- newdetects[order(newdetects$siteID), ]
# Refit the detection function model, using observer as a covariate, to the bootstrap replicate
  detfnmod <- ds(data=newdetects, truncation=dettrunc, formula=~obs, transect=surveytype, quiet=TRUE)
#  Compute effective area offset for each transect
  bootsitesadj <- effAreafn(detfnmod, newdetects, 10000, dettrunc, pointflag = pointtransect)
#   refit the GLM for this bootstrap replicate
  glmresult <- glm(formula= myglmmodel, family="poisson", data=bootsitesadj)
  intercept.est[theboot] <- coef(glmresult)[1]
  slope.est[theboot] <- coef(glmresult)[2] 
}

The only change to the chunk below is to change the figure caption in the chunk title to indicate the name of the bird species being examined.

for (i in 1:nboot) lines(predData$predictor, exp(intercept.est[i] + (slope.est[i]*predData$predictor)),
                         col=rgb(1,0,0, alpha=0.1), lwd=0.5)
bootLow <- vector("numeric", length=dim(predData)[1]) 
bootUpp <- vector("numeric", length=dim(predData)[1])
for (i in 1:dim(predData)[1]) { 
  y <- vector("numeric", length=nboot) 
  for (j in 1:nboot) { 
    y[j] <- exp(intercept.est[j] + (slope.est[j]*predData$predictor[i])) 
    } 
  bootLow[i] <- quantile(y, 0.025, na.rm=TRUE) 
  bootUpp[i] <- quantile(y, 0.975, na.rm=TRUE)
}
lines(predData$predictor, bootLow, lwd=2, lty=2) 
lines(predData$predictor, bootUpp, lwd=2, lty=2)
Relationship between univariate predictor and Sage Thrasher density as modelled by GLM.  Offset is estimated covered area. Confidence intervals incorporate uncertainty from imperfect detectability.

Figure 3: Relationship between univariate predictor and Sage Thrasher density as modelled by GLM. Offset is estimated covered area. Confidence intervals incorporate uncertainty from imperfect detectability.

Inference regarding the habitat covariate

To assess the effect of the transect-level environmental covariate upon bird density, we focus our attention on \(\hat{\beta_1}\). The greater in magnitude the slope of the covariate, the greater the effect of the covariate upon animal density. If the estimated slope (\(\hat{\beta_1}\)) is indistinguishable from zero, we infer the habitat covariate has no influence upon animal density. In the case of the Sage Thrasher, it appears to exhibit little response to the presence of shrub cover.

xlabel <- paste("Estimated slope of", univarpredictor, "and count relationship.")
hist.slope <- hist(slope.est, main="Sampling distribution of slope parameter",
                   xlab=xlabel) 
cibounds <- quantile(slope.est, probs = c(.025,.975), na.rm=TRUE)
abline(v=cibounds, lty=3) 
text(cibounds, max(hist.slope$counts), round(cibounds,3))
Sampling distribution of the parameter of interest for Sage Thrasher.

Figure 4: Sampling distribution of the parameter of interest for Sage Thrasher.

Summary

Inference regarding the relationship between animal density and habitat covariates is invariably made more difficult when detectability of animals is imperfect. Adjusting number of animals detected for this estimated probability of detection introduces uncertainty into investigation of the relationship. If this added uncertainty is disregarded, faulty inference (believing a relationship exists where there is none) can result.

In reality, it is more challenging to propagate the uncertainty via the bootstrap, than to calculate the point estimate of the relationship. Nevertheless, sound inference is achieved by adequately incorporating uncertainty in all accountable forms in our analysis.

This document contains code to perform this count model analysis for either line or point transects. What remains is to generalise this framework further, e.g. multiple predictor variables or more complex survey designs.

Acknowledgements

The heart of the analysis presented was based upon code and ideas presented by Jason Carlisle and Trent McDonald in their RDistance wiki.

Buckland, S. T., Russell, R. E., Dickson, B. G., Saab, V. A., Gorman, D. N., & Block, W. M. (2009). Analysing designed experiments in distance sampling. Journal of Agricultural, Biological, and Environmental Statistics, 14, 432–442. https://doi.org/10.1198/jabes.2009.08030
Carlisle, J., & Chalfoun, A. (2020). The abundance of Greater Sage-Grouse as a proxy for the abundance of sagebrush-associated songbirds in Wyoming, USA. Avian Conservation and Ecology, 15(2). https://doi.org/10.5751/ACE-01702-150216
Carlisle, J. D., Chalfoun, A. D., Smith, K. T., & Beck, J. L. (2018). Nontarget effects on songbirds from habitat manipulation for Greater Sage-Grouse: Implications for the umbrella species concept. The Condor, 120(2), 439–455. https://doi.org/10.1650/CONDOR-17-200.1
McDonald, T., Carlisle, J., & McDonald, A. (2019). Rdistance: Distance sampling analysis for density and abundance estimation.
Miller, D. L., Rexstad, E., Thomas, L., Marshall, L., & Laake, J. L. (2019). Distance sampling in R. Journal of Statistical Software, 89(1), 1–28. https://doi.org/10.18637/jss.v089.i01
Oedekoven, C. S., Buckland, S. T., Mackenzie, M. L., Evans, K. O., & Burger, L. W. (2013). Improving distance sampling: Accounting for covariates and non-independency between sampled sites. Journal of Applied Ecology, 50, 786–793. https://doi.org/10.1111/1365-2664.12065
Oedekoven, C. S., Buckland, S. T., Mackenzie, M. L., King, R., Evans, K. O., & Burger, L. W. (2014). Bayesian methods for hierarchical distance sampling models. Journal of Agricultural, Biological, and Environmental Statistics, 19, 219–239. https://doi.org/10.1007/s13253-014-0167-0

References