Weird Pattern in Residuals


Katherine Berthon

Oct 12, 2020, 6:33:32 AM
to unmarked
Hi all,

I am new to occupancy models and have been running single-season occupancy models for 10 functional groups (so not species, but a limited set of species within each group, e.g. large bees (mostly Bombus sp.), medium bees (Apis mellifera and Andrenidae), small bees (Hylaeus and halictids), Syrphidae, other flies, etc.) at 102 sites, across 3 sampling periods. I am modelling each group as a separate single-season model; my original global model has 8 detection covariates and 8 occupancy (site-level) covariates, and I have used collinearity checks, AIC and biological judgement to generate a set of best models for each group.

All of my best models pass mb.gof.test from AICcmodavg, but there is a persistent pattern when I plot residuals vs. fitted values. Below is an example plot for medium bees.

mod <- occu(formula = ~Cloudcover + logFlAbund + Placement + Round ~ GreenBuffer, data = HB_umf)
mb.gof.test(mod,nsim=1000,parallel=FALSE) 

# Chi-square statistic = 8.3601 
# Number of bootstrap samples = 1000
# P-value = 0.154
# Quantiles of bootstrapped statistics:
#     0%    25%    50%    75%   100% 
#   0.37   2.69   4.38   6.73 154.28 
# Estimate of c-hat = 1.33 
# passes 

plot(residuals(mod)~fitted(mod))


I have looked for spatial and temporal autocorrelation, and there is none. I have also looked at models with individual variables to see whether the pattern was being caused by one particular variable, and there are multiple possible culprits. That is, I fitted occu(~v ~ 1) or occu(~1 ~ v) for each variable v and found that the pattern seems to pop up for any influential variable with P(>|z|) < 0.05 (I can post plots of those if of interest).
 
I have also been exploring Dunn-Smyth residuals as in Warton et al. 2017 (https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12761), but since the code was originally written for RPresence, I have assumed (perhaps incorrectly?) that the detection residuals would be the observed detection value (0 or 1) minus the predicted detection probability, and the occupancy residuals would be the observed occupancy value (0 or 1) minus the predicted occupancy probability. I have then plotted, as they suggest, the QQ plot of the residuals, residuals vs. fitted values (the predictions of p and psi), and residuals vs. covariates to help assess missing covariates. Below is an example following on from the above model, showing several occupancy covariates, with GreenBuffer (the worst-looking plot) being the only variable in the model.
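For clarity, here is roughly what I mean by those raw residuals (just a sketch of my assumption, not the Warton et al. code, using the model mod above; I am guessing at how predict() orders the site-visit rows):

y <- getY(HB_umf)                                    # sites x occasions detection matrix

# Naive occupancy residuals: observed "ever detected" (0/1) minus predicted psi
z_obs     <- apply(y, 1, max, na.rm = TRUE)
psi_hat   <- predict(mod, type = "state")$Predicted  # one value per site
occ_resid <- z_obs - psi_hat

# Naive detection residuals: observed 0/1 minus predicted p for each site-visit
# (assuming the rows of predict(..., type = "det") run visit-within-site)
p_hat     <- matrix(predict(mod, type = "det")$Predicted,
                    nrow = nrow(y), byrow = TRUE)
det_resid <- y - p_hat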



Sorry that this is long-winded, but I didn't know where to begin explaining. In short, I have four questions:
1. Have you ever seen such a pattern in the residuals, and can you hint at a likely cause?
2. Does this pattern completely invalidate my models despite passing the GOF test (i.e. is my data unanalysable)?
3. Am I calculating the Dunn-Smyth residuals properly? Am I interpreting correctly that the pattern might indicate a missing covariate OR a polynomial term of some kind for GreenBuffer?
4. The residuals are worse with fewer detections; is there a minimum number of detections you should have before you can build an accurate model? E.g. are 25 detections across 306 events (102 sites) too few to model?

Thanks in advance for your help,
Katherine

P.S. The plots show centred and scaled values for the covariates.

Katherine Berthon

Oct 12, 2020, 9:07:38 PM
to unmarked

Hi all,

Seems my pictures didn't work, and that was kind of the most important point! Here is the key plot of residuals vs. fitted values, which shows two parallel lines of points in a negative linear relationship (that is, residuals are more positive for lower fitted values and more negative for higher fitted values):


Thanks,
Katherine

ken.k...@gmail.com

Oct 12, 2020, 9:33:44 PM
to unmarked
Hi Katherine,

My quick answer to (1) and (2), related to the residuals ~ fitted plot:

Yes, this is pretty much the expected plot. Unmarked calculates the residual for observation [i] as

y[i] - fitted[i]

Since the only two possible values of y are 1 and 0, you are basically plotting two lines: fitted on the x-axis vs. (1 - fitted) on the y-axis, and fitted on the x-axis vs. (0 - fitted) on the y-axis. That's why you get those two perfectly parallel lines. If you run the first example model on the ?occu help page and generate the same plot, you'll see the same pattern. My personal opinion is that this type of plot is not very useful and should not be a reason to discard a model, particularly if the GOF test looks OK.
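If it helps, here is a minimal simulated example (my own quick sketch, not your data) that reproduces the two bands:

library(unmarked)

set.seed(1)
M <- 100; J <- 3                                   # sites and visits
z <- rbinom(M, 1, 0.5)                             # latent occupancy state, psi = 0.5
y <- matrix(rbinom(M * J, 1, z * 0.4), nrow = M)   # detection histories, p = 0.4

umf <- unmarkedFrameOccu(y = y)
fm  <- occu(~ 1 ~ 1, data = umf)

# y is 0/1, so y - fitted can only be (1 - fitted) or (0 - fitted):
# the points fall on two parallel lines with slope -1
plot(as.vector(fitted(fm)), as.vector(residuals(fm)),
     xlab = "fitted", ylab = "residual")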

I don't have enough experience to offer any thoughts on Dunn-Smyth residuals but I can say that I have applied the approach of Wright et al. 2019 (https://doi.org/10.1002/ecy.2703) , which also separates detection and occupancy residuals, to simulated and real occupancy datasets and been pleased with the results so far. So that might be another place to look.

Ken

John C

Oct 12, 2020, 9:54:51 PM
to unmarked
Hi Katherine and Ken, 

Just wanted to note that D-S residuals use a CDF transformation, so they will look a little different from unmarked::residuals (i.e., not two discrete bands). I *think* I'd modified the script produced by Warton et al. here (https://groups.google.com/g/unmarked/c/hBBpnRlMp0A) to work with unmarked (the change is almost entirely related to using unmarked::predict vs. rpresence::fitted or whatever), although it might be worth a closer look. Totally agree with Ken that the raw binary residual plot is not particularly useful and that Wright's approach is useful.
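To illustrate the CDF idea with a toy example (mine, not the Warton et al. script): for a single Bernoulli observation you jitter uniformly between the CDF just below the observed value and the CDF at the observed value, then push that through qnorm().

# Randomized quantile (Dunn-Smyth) residual for Bernoulli data -- toy sketch
ds_resid_bern <- function(y, p) {
  F_lower <- ifelse(y == 0, 0, 1 - p)   # CDF just below the observed value
  F_upper <- ifelse(y == 0, 1 - p, 1)   # CDF at the observed value
  u <- runif(length(y))                 # jitter within the interval
  qnorm(F_lower + u * (F_upper - F_lower))
}
ds_resid_bern(c(0, 1, 1), c(0.2, 0.7, 0.4))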

John

Katie B

Oct 13, 2020, 6:09:11 PM
to unma...@googlegroups.com
Hi Ken and John,

Thanks for your quick reply! I have been stuck on this 'pattern' for so long, it is encouraging to hear that the plot is as expected. You have also given me some good leads to go on, thank you!

I had a bit of a play with the Dunn-Smyth residuals in Warton et al. 2017 based on expecting the detection residuals to be: yi - predict("det"), and for the occupancy residuals to be Y(site) - predict("state") - does that sound right? 

I obtained the same pattern in the residuals, but I will look closer at your code, John, because it is likely that I am making some poor assumptions. Thanks Ken for suggesting the Wright paper, too.

Best wishes,
Katherine


Pablo García Díaz

Oct 14, 2020, 4:13:36 AM
to unma...@googlegroups.com

Hi Katherine,

 I would suggest having a look at R package DHARMa (https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html) – I found it quite useful, although I am not sure whether it will work with unmarked objects.

 Cheers

 Pablo



John C

Oct 14, 2020, 1:20:54 PM
to unmarked
Hi Katherine,
 
I had a bit of a play with the Dunn-Smyth residuals in Warton et al. 2017 based on expecting the detection residuals to be: yi - predict("det"), and for the occupancy residuals to be Y(site) - predict("state") - does that sound right? 

I think this is not quite correct. There's a bunch going on to construct the residuals in the function they provide (below), involving jittering within the CDF interval and then applying the standard normal quantile function. It looks like Google Groups thinks something like object *at* data is an email address, so not quite sure this will look as intended.

Cheers,

John

residuals.occModum<-function(object,is.detect.constant=FALSE)
{
  det.data<-object@data@y;     # Binary matrix of detections, sites in rows, visits in columns.
  nSites<-dim(det.data)[1];           # Number of sites.
  psi<-predict(object, 'state')[, 1];           # Occupancy probabilities, assuming a vector.
  
  ## Get number of visits (ni) and probability of non-detection (prob0).
  
  ni<-apply(is.na(det.data)==FALSE,1,sum);    # Number of site visits, a vector across sites.
  xi<-apply(det.data,1,sum,na.rm=TRUE);       # Number of detections.
  xiOcc<-pmin(xi,1);                          # Make it binary to study occupancy "only".
  
  pi<-matrix(predict(object, 'det')[, 1],nrow=nSites);  # Detection probabilities.
  
  ##### need to include some code to compute is.detect.constant from pi #####
  
  ## Below assumes equal probabilities of detection for each sampling time!!!
  
  if(is.detect.constant==TRUE)
  {
    pi<-predict(object, 'det')[1:nSites, 1];   # Detection probabilities, assuming equal across sites.
    prob0<-pbinom(0,ni,pi);            # Probability of no detections when present, a site vector.
    
    ## Get cdf's for detection residuals - as a function of sum of detection events
    
    xi[xi==0]<-NA;                               # Ignore sites with no detections here.
    pdet<-(pbinom(xi,ni,pi)-prob0)/(1-prob0);   # CDF for number of detections xi, positive binomial.
    pdetMinus<-(pbinom(xi-1,ni,pi)-prob0)/(1-prob0); # Previous value of the cdf of xi.
  }
  
  if(is.detect.constant==FALSE)
  {
    prob0<-apply(1-pi,1,prod);    # Probability of no detections when present, a site vector.
    
    ## Define a function to get the pdf under unequal detections.
    
    hetpdf<-function(xiSite,niSite,piSite)
    {
      ind<-combn(niSite,xiSite);
      piMat<-matrix(piSite[ind],nrow=xiSite);
      
      return(sum(apply(piMat/(1-piMat),2,prod))*prod(1-piSite));
    }
    
    hetcdf<-function(xiSite,niSite,piSite)
    {
      if(xiSite==0){cdf<-0;}
      else
      {
        detiSite<-rep(NA,xiSite);
        
        for(iX in 1:xiSite)
        {
          detiSite[iX]<-hetpdf(iX,niSite,piSite);
        }
        
        cdf<-sum(detiSite);       
      }
      
      return(cdf);
    }
    
    ## Get cdf's for detection residuals - as a function of sum of detection events.
    
    isDetected<-xi>0;
    xi[isDetected==FALSE]<-NA;  # Ignores sites with no detections.
    
    pdet=pdetMinus=rep(NA,nSites);
    
    for(iSite in which(isDetected))
    {
      xiSite<-xi[iSite];
      niSite<-ni[iSite];
      piSite<-pi[iSite,];
      pdetMinus[iSite]<-hetcdf(xiSite-1,niSite,piSite);
      pdet[iSite]<-pdetMinus[iSite]+hetpdf(xiSite,niSite,piSite);
    }
    
    pdet<-pdet/(1-prob0);   # 'CDF' for number of detections xi in heterogeneous case.
    pdetMinus<-pdetMinus/(1-prob0); # Previous value of the cdf of xi.
  }
  
  ## Get cdf's for occupancy residuals - as a function of binary detected/not.
  
  probOcc<-psi*(1-prob0);                     # Probability of occupancy.
  pOcc<-1-probOcc+xiOcc*probOcc;              # CDF for occupancy, Bernoulli variable with param probOcc.
  pOccMinus<-xiOcc*(1-probOcc);               # Previous value of the cdf of occupancy.
  
  ## Jitter and get occupancy residuals.
  
  uOcc<-runif(nSites);                             # Standard uniform value to "jitter" the cdf.
  residOcc<-qnorm(pOcc*uOcc+pOccMinus*(1-uOcc));   # Dunn-Smyth residual, standard normal if cdf correct.
  
  ## Jitter and get detection residuals.
  
  u<-runif(nSites);                             # Standard uniform value to "jitter" the cdf.
  residDet<-qnorm(pdet*u+pdetMinus*(1-u));      # Dunn-Smyth residual, standard normal if cdf correct.
  
  residuals<-list(occ=residOcc,det=residDet);  
  
  return(residuals) # Return output (i.e., a list with occupancy residuals (occ) and detection residuals (det)).
}
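As a quick sanity check on what the detection part is computing (toy numbers of my own, not part of the script): for constant p, the CDF it builds is just the Binomial CDF conditioned on there being at least one detection, e.g.

ni <- 3; p_det <- 0.4; x_obs <- 2
prob0 <- pbinom(0, ni, p_det)                       # P(no detections | occupied)
(pbinom(x_obs, ni, p_det) - prob0) / (1 - prob0)    # P(X <= x_obs | X >= 1)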

Katie B

Oct 14, 2020, 10:53:57 PM
to unma...@googlegroups.com
Hi John,

Many thanks for that code and for your time in answering my questions! The code is working well for calculating the occupancy residuals, but I am having some trouble working out the detection residuals. I have annotated my queries below, but here is my general issue: the function calculates detection residuals only for sites that have at least one detection, and the rest are NAs, which means the residuals and predicted values have unequal lengths when plotting. I am not sure why we exclude sites with no detections from this part of the function (annotated below). Can I still calculate detection residuals for sites with no detections?

Also, what is a pdf as opposed to a cdf? Or are they just the same kind of distribution function, but named after what they describe?
  
Thanks,
Katherine

On Thu, Oct 15, 2020 at 4:22 AM 'John C' via unmarked <unma...@googlegroups.com> wrote:
Hi Katherine,
 
I had a bit of a play with the Dunn-Smyth residuals in Warton et al. 2017 based on expecting the detection residuals to be: yi - predict("det"), and for the occupancy residuals to be Y(site) - predict("state") - does that sound right? 

I think this is not quite correct. There's a bunch going on to construct the residuals in the function they provide (below), involving jittering within the CDF interval and then applying the standard normal quantile function. It looks like Google Groups thinks something like object *at* data is an email address, so not quite sure this will look as intended.

Cheers,

John

residuals.occModum<-function(object,is.detect.constant=FALSE)
{
  det.data<-object@data@y;     # Binary matrix of detections, sites in rows, visits in columns.
  nSites<-dim(det.data)[1];           # Number of sites.
  psi<-predict(object, 'state')[, 1];           # Occupancy probabilities, assuming a vector.
  
  ## Get number of visits (ni) and probability of non-detection (prob0).
  
  ni<-apply(is.na(det.data)==FALSE,1,sum);    # Number of site visits, a vector across sites.
  xi<-apply(det.data,1,sum,na.rm=TRUE);       # Number of detections.
  xiOcc<-pmin(xi,1);                          # Make it binary to study occupancy "only".
  
  pi<-matrix(predict(object, 'det')[, 1],nrow=nSites,byrow=T);  # Detection probabilities.
  #Without the byrow argument, the matrix of pi gives me the residuals in the incorrect order. I have only one NA so I know its position and can deduce that the original code gives the structure S1R1,S1R2,S1R3 as the first three values of the first column. 
 
      pdetMinus[iSite]<-hetcdf(xiSite-1,niSite,piSite); #why is it minus 1? The cdf function has a clause for no detections so why do we exclude them here?
      pdet[iSite]<-pdetMinus[iSite]+hetpdf(xiSite,niSite,piSite);
    }
    
    pdet<-pdet/(1-prob0);   # 'CDF' for number of detections xi in heterogeneous case. # calculating pdet in this way results in NAs for any sites with no detections making lengths unequal when trying to plot vs fitted values unless I can retrospectively return NAs to some other value. 
    pdetMinus<-pdetMinus/(1-prob0); # Previous value of the cdf of xi.
  }
  
  ## Get cdf's for occupancy residuals - as a function of binary detected/not.
  
  probOcc<-psi*(1-prob0);                     # Probability of occupancy.
  pOcc<-1-probOcc+xiOcc*probOcc;              # CDF for occupancy, Bernoulli variable with param probOcc.
  pOccMinus<-xiOcc*(1-probOcc);               # Previous value of the cdf of occupancy.
  
  ## Jitter and get occupancy residuals.
  
  uOcc<-runif(nSites);                             # Standard uniform value to "jitter" the cdf.
  residOcc<-qnorm(pOcc*uOcc+pOccMinus*(1-uOcc));   # Dunn-Smyth residual, standard normal if cdf correct.
  
  ## Jitter and get detection residuals.
  
  u<-runif(nSites);                             # Standard uniform value to "jitter" the cdf.
  residDet<-qnorm(pdet*u+pdetMinus*(1-u));      # Dunn-Smyth residual, standard normal if cdf correct.
  
  residuals<-list(occ=residOcc,det=residDet);  
  
  return(residuals) # Return output (i.e., a list with occupancy residuals (occ) and detection residuals (det)).
}
 


John C

Oct 15, 2020, 12:41:50 PM
to unmarked
Hi Katherine, re. the NA issue, that's a pretty straightforward fix and easier to do after the fact when plotting (just remove the NAs from the residuals and the matching indices from the predicted values). Warton's plotting code is below. A PDF (PMF) is a probability density (mass) function, and a CDF is a cumulative distribution function: CDF_x(A) describes the probability that a variable x is less than or equal to A. The paper only touches on this briefly, but because detection is conditional on occupancy, it is easiest to construct the detection residuals using those sites known to be occupied, where y_ij ~ Bern(p), rather than mixing y_ij ~ Bern(p) and y_ij ~ Bern(0).

John

### assumes the residuals.occModum() function above is in the workspace;
### fm1 is the fitted occu() model
res1<-residuals.occModum(fm1, FALSE)

m1_res_occ<-res1$occ # Stores residuals for occupancy.
m1_res_det<-res1$det[-which(is.na(res1$det))] # Stores residuals for detection (and removes NA values).

## Then, use the residuals in diagnostic plots, e.g. obtain a normal quantile plot to check distributional assumptions:
qqnorm(m1_res_occ,main="",col="blue",pch=16,cex.axis=1.4,ylab="",xlab="") # A QQplot.
abline(0,1,col="black")
mtext(text="sample quantiles",side=2,line=2.2,cex=1.2)




library(mgcv)  # needed for gam() used in the smoother below

DS.resid.plot<-function(x,y,ylim=c(-1,1)*max(abs(y)),alpha=0.05,k=5)
{
  plot(x,y,pch=16,cex=1.2,col="blue",cex.axis=1.4,ylim=ylim,cex.main=0.9,ylab="",xlab="");
  
  lsmod<-gam(y~s(x,k=k));
  lsmod.p<-predict(lsmod,se.fit=TRUE);
  z.crit<-qnorm(1-alpha/2)
  upr<-lsmod.p$fit+(z.crit*lsmod.p$se.fit);
  lwr<-lsmod.p$fit-(z.crit*lsmod.p$se.fit);
  polygon(c(rev(sort(x)),sort(x)),c(rev(upr[order(x)]),(lwr[order(x)])),col='grey80',border=NA);
  points(x,y,pch=16,cex=1.2,col="blue");
  lines(sort(x),fitted(lsmod)[order(x)],lwd=2);
  abline(h=0,col="black",lwd=1);
}


m1_x_occ<-predict(fm1, 'state')[,1] # Stores the fitted values for occupancy probabilities.
DS.resid.plot(m1_x_occ,m1_res_occ)
title(main="Occupancy Dunn???Smyth residuals",cex.main=2)
mtext(text="fitted occupancy values",side=1,las=1,line=2.6,cex=1.2)

### have to play around with this a bit if p is constant
nt<-ncol(getY(fm1@data))  # number of sampling occasions (visits) per site
# Note: check that the ordering of predict(fm1, 'det') rows matches the column-wise
# matrix fill here (cf. the byrow discussion above) before trusting this sum.
m1_x_det<-apply(matrix(predict(fm1, 'det')[,1],ncol=nt),1,sum,na.rm=T)[is.na(res1$det)==FALSE] # Stores
#the sum of fitted values for detection probabilities.
DS.resid.plot(m1_x_det,m1_res_det)
title(main="Detection Dunn Smyth residuals",cex.main=2)
mtext(text=Sigma[j]~"fitted detection values",side=1,las=1,line=2.7,cex=1.2)




Katie B

Oct 15, 2020, 10:40:07 PM
to unma...@googlegroups.com
Hi John,

Thanks for your kindness in answering my often naïve questions. I am happy to go on with only the sites for which there is at least one detection, and this has already helped me a lot, thank you!

I have one last question: within those sites, is it possible to expand the residuals to cover multiple sampling events, particularly where covariates are categorical and cannot easily be averaged across visits?

Best wishes,
Katherine

John C

Oct 16, 2020, 2:49:58 PM
to unmarked
Hi Katherine,

Perhaps grouping by site and category x could be okay if there is sufficient replication. If the idea is, say, to plot the residuals vs. some site- and time-varying detection covariate (observer or something), the authors note that replicating the residuals some number of times might work OK, or that constructing the D-S residuals on the binary data might work (although for the reasons mentioned at the top, this seems less clear). I have no idea whether either of these approaches has been assessed, and I don't have any real advice regarding how to go about this. My sense is that if the residuals you have been able to assess don't suggest clear problems, then making an adjustment based on the MB estimate of (slight) overdispersion and carrying on is probably a reasonable way to proceed.
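If it helps, here is one rough, untested sketch of the replication idea (my own code, assuming the residuals.occModum() function from earlier in the thread, your fitted model mod, and an obsCov called Cloudcover in HB_umf):

res <- residuals.occModum(mod)                 # site-level D-S residuals
nt  <- ncol(getY(HB_umf))                      # visits per site
res_det_rep <- rep(res$det, each = nt)         # repeat each site's detection residual once per visit
cloud <- obsCovs(HB_umf)$Cloudcover            # obs covariate, rows run visit-within-site
keep  <- !is.na(res_det_rep)
plot(cloud[keep], res_det_rep[keep],
     xlab = "Cloudcover", ylab = "Detection D-S residual (replicated)")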

John

Katie B

Oct 25, 2020, 8:50:20 PM
to unma...@googlegroups.com
Hi John,

Many thanks for your advice. I decided to replicate the residuals constructed at the site level and looked across levels of my categorical variables. I did not see any pattern, but I'm not sure I actually have the power to tell, as there are likely multiple sites for which the categorical variable changes across sampling sessions, e.g. Cloudcover. Do you think this is still reasonable?

Also, I am very thankful for the modified code you shared for constructing D-S residuals. How would I go about acknowledging you in my final paper? I am thinking of a sentence like this in my methods: Residual checks were performed using code modified from Warton et al. (2017), courtesy of John C.

Thanks,
Katherine

John C

Oct 26, 2020, 3:17:33 PM
to unmarked
Hi Katherine,

Limited power to detect lack of fit is a common issue, and I don't know of anything more that can be done. Maybe somebody else here has better ideas? Otherwise, "Residual checks were performed using code modified from Warton et al. (2017)." is fine--no attribution needed.

John