Leave-one-out validation/AUC for occu() model


Danielle Rappaport

Dec 3, 2018, 9:22:39 PM
to unmarked
Hello,

How have folks performed leave-one-out cross validation to assess the predictive power of their occupancy models? 

I have an independent observation that was not used to calibrate my occu() model, and I would like to compare it to my predicted occupancy to assess how well the model generalizes. I've been sorting through the listserv and the literature, and gleaned from a previous post that I might not even be able to use traditional AUC for occupancy models, since they acknowledge false negatives and the naive absences are subject to bias. Is there any way around that? Are there any other techniques/measures for estimating predictive power by generalizing to an outside observation? 

Many thanks in advance, 
Danielle 



Giancarlo Sadoti

Dec 4, 2018, 4:35:11 AM
to unmarked
Hi Danielle,

You could calculate the leave-one-out cross-validated AUC of a model the same way folks have calculated residuals: at either the site level or the observation level. The two give different AUC values, but in the same neighborhood.

I'll first paste a modified version of the occu help file to show how this works:

library(unmarked)
library(AUC)  # roc() and auc(), used for both AUC calculations below
data(frogs)
pferUMF <- unmarkedFrameOccu(pfer.bin)
plot(pferUMF, panels=4)
set.seed(10)

# add some fake covariates for illustration
siteCovs(pferUMF) <- data.frame(sitevar1 = rnorm(numSites(pferUMF)))

# observation covariates are in site-major, observation-minor order
obsCovs(pferUMF) <- data.frame(obsvar1 = rnorm(numSites(pferUMF) * obsNum(pferUMF)))

# fitted model
fm <- occu(~ obsvar1 ~ sitevar1, pferUMF)

First, calculate your leave-one-site-out predictions. This is a general approach using the sites/observations within this data set, but you could do something similar with independent data. It would just require the same occupancy and detection covariates (formatted correctly) for the independent sample, so that you can generate predictions for psi (for each site) and p (for each survey).

# generate leave-one-out predictions (a vector for psi and a matrix for p)
# pardon the looping

nSites = nrow(pferUMF@y)
nSurvs = ncol(pferUMF@y)
predPsi = rep(NA, nSites)
predDet = matrix(NA, nrow = nSites, ncol = nSurvs)

for(i in 1:nSites){
  cat('fitting site', i, '\n')
  trainData = pferUMF[-i, ]   # every site except site i
  testData = pferUMF[i, ]     # the held-out site
  fmTrain <- update(fm, data = trainData)
  # predict() returns a data.frame; keep only its Predicted column
  predPsi[i] <- predict(fmTrain, type = 'state', newdata = testData)$Predicted
  predDet[i, ] <- predict(fmTrain, type = 'det', newdata = testData)$Predicted
}
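The observation-level step multiplies the length-nSites vector predPsi by the nSites x nSurvs matrix predDet and then flattens both that product and the detection matrix with as.vector(). A small base-R check with made-up numbers (psi, p, y and obsLevelToy here are illustrative only, not from the frogs data) shows why the pieces line up: R recycles the vector down each column, so row i is scaled by psi[i], and as.vector() reads both matrices in the same column-major order.

```r
# Made-up values for illustration only: 2 sites (rows) x 3 surveys (columns)
psi <- c(0.8, 0.5)                        # one occupancy probability per site
p <- matrix(c(0.2, 0.4,                   # survey 1 at sites 1 and 2
              0.3, 0.5,                   # survey 2
              0.1, 0.6), nrow = 2)        # survey 3
y <- matrix(c(1, 0,
              0, 1,
              NA, 1), nrow = 2)           # detection history, NA = not surveyed

# psi is recycled down each column, so row i is multiplied by psi[i];
# this matches explicit row-wise scaling with sweep()
psiP <- psi * p
stopifnot(isTRUE(all.equal(psiP, sweep(p, 1, psi, "*"))))

# as.vector() flattens both matrices column by column, so element k of each
# vector refers to the same site/survey combination; na.omit() then drops
# the survey that was not conducted
obsLevelToy <- na.omit(data.frame(dethistory = as.vector(y),
                                  psiP = as.vector(psiP)))
obsLevelToy
```

The same ordering argument is why the real code can pair as.vector(fm@data@y) with as.vector(predPsi*predDet) without reshaping either one.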


At the observation level, AUC is calculated by comparing your detection vector of 0s and 1s to the predicted psi*p values (if you are not doing LOO-CV, these can be extracted from a fitted model with the handy fitted() function).

# detection history and predictions in a data.frame
obsLevel = data.frame(dethistory = as.vector(fm@data@y),
                      psiP = as.vector(predPsi*predDet))

# remove missing surveys
obsLevel = na.omit(obsLevel)

# crummy observation-level AUC, as expected from random data
with(obsLevel,auc(roc(psiP,factor(dethistory))))
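For intuition about what the AUC number means here: it is the probability that a randomly chosen detection (1) receives a higher predicted psi*p than a randomly chosen non-detection (0). A minimal base-R sketch of that rank (Mann-Whitney) form, assuming 0/1 labels with no NAs; aucRank is a hypothetical helper, not a function in unmarked or AUC, and its value should agree with auc(roc(...)) up to how tied predictions are handled.

```r
# Hypothetical helper: AUC as the normalized Mann-Whitney rank statistic.
# scores: predictions (e.g. psi*p); labels: 0/1 vector (e.g. detection history)
aucRank <- function(scores, labels) {
  stopifnot(length(scores) == length(labels), all(labels %in% c(0, 1)))
  r <- rank(scores)          # average ranks, so a tied pair counts as 1/2
  n1 <- sum(labels == 1)     # number of detections
  n0 <- sum(labels == 0)     # number of non-detections
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example with made-up numbers (not from the frogs data):
# 3 of the 4 detection/non-detection pairs are ordered correctly
aucRank(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))   # 0.75
```

Written this way it is easy to see why naive 0s matter: every non-detection at an occupied site still enters the comparison as a "negative".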


Calculating AUC at the site level gets a little more complicated; it requires the (modeled) probability of detecting the species at least once at a given site: psi * (1 - (1-p[1])*(1-p[2])*...*(1-p[J])), where J is the number of surveys at that site.

# observed (naive) occupancy
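naiveOcc = apply(fm@data@y, 1, function(x) max(x, na.rm = TRUE))

# probability of at least one detection.

# This is called "D-hat" in Moore and Swihart 2005
# (I use the same terminology in Sadoti et al. 2013 and 2017).
# Surveys that were not conducted (NA in y) are left out of the product.
predDetSurveyed = predDet
predDetSurveyed[is.na(fm@data@y)] = NA
predictedOcc = predPsi*(1-apply(1-predDetSurveyed, 1, FUN = prod, na.rm = TRUE))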

require(AUC)

# crummy site-level AUC, as expected from random data
auc(roc(predictedOcc,factor(naiveOcc)))
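The D-hat calculation can also be wrapped in a small helper. This is a sketch with a hypothetical name (dHat is not an unmarked function), and it assumes that surveys with a missing detection (NA in y) should not contribute to the product. It reproduces a hand-worked example with made-up numbers: psi = 0.6, p = (0.3, 0.5) on the two surveys actually conducted gives 0.6 * (1 - 0.7 * 0.5) = 0.39.

```r
# Hypothetical helper: site-level probability of at least one detection,
# D-hat = psi * (1 - product over surveyed occasions of (1 - p)).
# psi: length-nSites vector; p: nSites x nSurvs matrix of detection probs;
# y: detection matrix of the same shape, used only to find unsurveyed (NA) cells
dHat <- function(psi, p, y) {
  stopifnot(length(psi) == nrow(p), all(dim(p) == dim(y)))
  p[is.na(y)] <- NA                                # drop unsurveyed occasions
  missAll <- apply(1 - p, 1, prod, na.rm = TRUE)   # P(never detected | occupied)
  psi * (1 - missAll)
}

# Worked example with made-up numbers; the third survey was not conducted
dHat(0.6, matrix(c(0.3, 0.5, 0.9), nrow = 1), matrix(c(1, 0, NA), nrow = 1))
# 0.6 * (1 - 0.7 * 0.5) = 0.39
```

With the objects from the code above, dHat(predPsi, predDet, fm@data@y) would give the site-level predictions to pass to roc() alongside naiveOcc.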


HTH,

Giancarlo

Danielle Rappaport

Dec 4, 2018, 6:24:16 PM
to unma...@googlegroups.com
Giancarlo, 
Terrific. Many thanks for your thorough explanation. I quickly reviewed your code and everything seems straightforward but I may loop back with some additional questions over the next day or two once I apply it to my unwieldy dataset. 
Thanks again!
Danielle

--
You received this message because you are subscribed to a topic in the Google Groups "unmarked" group.