Multi-species single-season occupancy model: NAs and warnings

323 views

Skip to first unread message

Michelle Pretorius

unread,

Jun 5, 2020, 8:50:33 AM6/5/20

to nimble-users

Good afternoon all,

I am very new to nimble (previously used JAGS), but I am excited to explore this especially with all the code available from the Ponisio (2020) paper in Ecology & Evolution! I would like to run multi-species single-season occupancy models to explore occupancy and species richness. To sink my teeth in, I have been trying to run the "multiSpp-singleSea" model from Ponisio (2020) and adding in parameters to estimate species richness (Zipkin et al. (2010)). However, I am receiving messages such as:

NAs were detected in model variables: cato.occ.mean, logProb_cato.occ.mean, sigma.ucato, logProb_sigma.ucato ...

and

Infinite values were detected in model variable: logProb_Z.

I am not sure how to address these issues... I have posted my full code below, using the same data files as Ponisio (2020).

Any guidance/assistance would be greatly appreciated!

Many thanks

Michelle Pretorius

########
#DATA PREP
########

survey.data <- read.csv("occupancy_data.csv")
survey.dates <- read.csv("survey_dates.csv")
species.groups <- read.csv("species_groups.csv")
habitat <- read.csv("habitat.csv")


#The detection/non-detection data is reshaped into a three dimensional array X where the first dimension, j, is the point; the second dimension, k, is the rep; and the last dimension, i, is the species. 
survey.data$occupancy <- rep(1, dim(survey.data)[1])

junk.melt=melt(survey.data,id.var=c("species", "point", "repetition"), measure.var="occupancy")
X=acast(junk.melt, point ~ repetition ~ species)
X

## Change abundance to presence/absence
## Set counts 1 and greater to indicators
X[which(X > 0)] <- 1

## Create all zero encounter histories to add to the detection array X as part of the data augmentation to account for additional species (beyond the n observed species).
X.zero = matrix(0, nrow=dim(X)[1], ncol=dim(X)[2])

for(point in 1:length(unique(survey.data$point))){
  point.index <- which(survey.dates$point == row.names(X)[point])
  missing <- is.na(survey.dates[point.index,][,-(1:1)])
  X[point, missing,] <- NA
  X.zero[point, missing] <- NA
}

n.zeroes = 50
X.dim <- dim(X)
X.dim[3] <- X.dim[3] + n.zeroes
X.aug <- array( NA, dim = X.dim)

## fill in the array with the occurrence data
X.aug[,,1:dim(X)[3]] <-  X

## fill the zero histories
X.aug[,,-(1:dim(X)[3])] <- rep(X.zero, n.zeroes)

siteLevelStandardized <- function(parameter){
  m <- mean(parameter, na.rm = TRUE)
  sd <- sd(parameter, na.rm = TRUE)
  linear <- (parameter - m)/sd
  quadratic <- linear * linear
  return(list(linear=linear,quadratic=quadratic))
}

surveyLevelStandardized <- function(parameter){
  parameter <- as.matrix(parameter)
  m <- mean(parameter, na.rm = TRUE)
  sd <- sd(parameter, na.rm = TRUE)
  linear <- (parameter - m) /  sd
  quadratic <- linear * linear
  linear <- as.matrix(linear)
  quadratic <- as.matrix(quadratic)
  return(list(linear=linear,quadratic=quadratic))
}

num.species <- length(unique(survey.data$species))
num.points <- length(unique(survey.data$point))
num.reps <- apply(survey.dates[,-(1:1)],
                  1,
                  function(x) length(which(!is.na(x))))

## Create an indicator vector for each assemblage (ground, mid-story)
ground <- mid <- rep(0, dim(species.groups)[1])
ground[which(species.groups$group == 1)] <- 1
mid[which(species.groups$group == 2)] <- 1

## Create a vector to indicate which habitat type each point is in (CATO = 1; FCW  = 0)
habitat.ind <- rep(0, num.points)
habitat.ind[grep("CAT", row.names(X))] <- 1

## Standardize variables
ufc <-  siteLevelStandardized(habitat$ufc)
ufc.linear <- ufc$linear
ufc.quadratic <- ufc$quadratic

ba <- siteLevelStandardized(habitat$ba)
ba.linear <- ba$linear
ba.quadratic <- ba$quadratic

date <- surveyLevelStandardized(survey.dates[,-(1:1)])
date.linear <- date$linear
date.quadratic <- date$quadratic

## Z data for whether or not a species was ever observed
## zs with 1s as 1s and 0s as NAs
zs <- apply(X.aug, c(1, 3), max, na.rm=TRUE)
zs[zs == 0] <- NA
zs[!is.finite(zs)] <- NA

model.data <- list(Z = zs,
                   X = X.aug,
                   ground = ground,
                   mid = mid,
                   habitat.ind = habitat.ind,
                   ufc.linear = ufc.linear,
                   ufc.quadratic = ufc.quadratic,
                   ba.linear = ba.linear,
                   ba.quadratic = ba.quadratic,
                   date.linear = date.linear,
                   date.quadratic = date.quadratic)

psi.mean.draw <- runif(1, 0.25, 1)

## initial values
omega.draw <- runif(1, num.species/(num.species + n.zeroes), 1)

## inital conditions. 1 should be NA, NA should be a 0 or 1
zinits <- zs
zinits[zinits == 1] <- 2
## zinits[is.na(zinits)] <- 1
zinits[is.na(zinits)] <- sample(0:1, sum(is.na(zinits)),
                                replace=TRUE)
zinits[zinits == 2] <- NA

inits <-list(Z=zinits,
             omega = omega.draw,
             w = c(rep(1, num.species),
                   rbinom(n.zeroes, size = 1, prob = omega.draw)),
             u.cato = rnorm(num.species + n.zeroes),
             v.cato = rnorm(num.species + n.zeroes),
             u.fcw = rnorm(num.species + n.zeroes) ,
             v.fcw = rnorm(num.species + n.zeroes),
             a1 = rnorm(num.species + n.zeroes),
             a2 = rnorm(num.species + n.zeroes),
             a3 = rnorm(num.species + n.zeroes),
             a4 = rnorm(num.species + n.zeroes),
             b1 = rnorm(num.species + n.zeroes),
             b2 = rnorm(num.species + n.zeroes))

constants <- list(num.species = num.species,
                  num.points = num.points,
                  num.reps = num.reps,
                  n.zeroes = n.zeroes)

########
#BUGS CODE WITH NIMBLE
########

ms.ss.occ <- nimbleCode({
  ## Define prior distributions for community-level model parameters
  
  omega ~ dunif(0,1)
  
  cato.occ.mean ~ dunif(0,1)
  mu.ucato <- log(cato.occ.mean) - log(1-cato.occ.mean)
  sigma.ucato ~ dunif(0, 100)
  tau.ucato <-  1/(sigma.ucato*sigma.ucato)
  
  fcw.occ.mean ~ dunif(0,1)
  mu.ufcw <- log(fcw.occ.mean) - log(1-fcw.occ.mean)
  sigma.ufcw ~ dunif(0, 100)
  tau.ufcw <-  1/(sigma.ufcw*sigma.ufcw)
  
  cato.det.mean ~ dunif(0,1)
  mu.vcato <- log(cato.det.mean) - log(1-cato.det.mean)
  sigma.vcato ~ dunif(0, 100)
  tau.vcato <-  1/(sigma.vcato*sigma.vcato)
  
  fcw.det.mean ~ dunif(0,1)
  mu.vfcw <- log(fcw.det.mean) - log(1-fcw.det.mean)
  sigma.vfcw ~ dunif(0, 100)
  tau.vfcw <-  1/(sigma.vfcw*sigma.vfcw)
  
  ## random effects
  mu.a1 ~ dnorm(0, 0.001)
  sigma.a1 ~ dunif(0, 100)
  tau.a1 <-  1/(sigma.a1*sigma.a1)
  
  mu.a2 ~ dnorm(0, 0.001)
  sigma.a2 ~ dunif(0, 100)
  tau.a2 <-  1/(sigma.a2*sigma.a2)
  
  mu.a3 ~ dnorm(0, 0.001)
  sigma.a3 ~ dunif(0, 100)
  tau.a3 <-  1/(sigma.a3*sigma.a3)
  
  mu.a4 ~ dnorm(0, 0.001)
  sigma.a4 ~ dunif(0, 100)
  tau.a4 <-  1/(sigma.a4*sigma.a4)
  
  mu.b1 ~ dnorm(0, 0.001)
  sigma.b1 ~ dunif(0, 100)
  tau.b1 <-  1/(sigma.b1*sigma.b1)
  
  mu.b2 ~ dnorm(0, 0.001)
  sigma.b2 ~ dunif(0, 100)
  tau.b2 <-  1/(sigma.b2*sigma.b2)
  
  for (i in 1:(num.species + n.zeroes)) {
    ## Create priors for species i from the community level prior
    ## distributions
    w[i] ~ dbern(omega)
    
    u.cato[i] ~ dnorm(mu.ucato, tau.ucato)
    u.fcw[i] ~ dnorm(mu.ufcw, tau.ufcw)
    a1[i] ~ dnorm(mu.a1, tau.a1)
    a2[i] ~ dnorm(mu.a2, tau.a2)
    a3[i] ~ dnorm(mu.a3, tau.a3)
    a4[i] ~ dnorm(mu.a4, tau.a4)
    
    v.cato[i] ~ dnorm(mu.vcato, tau.vcato)
    v.fcw[i] ~ dnorm(mu.vfcw, tau.vfcw)
    b1[i] ~ dnorm(mu.b1, tau.b1)
    b2[i] ~ dnorm(mu.b2, tau.b2)
    
    ## Create a loop to estimate the Z matrix (true occurrence for species i at point j).
    for (j in 1:num.points) {
      logit(psi[j,i]) <- u.cato[i]*(1-habitat.ind[j]) +
        u.fcw[i]*habitat.ind[j] +
        a1[i]*ufc.linear[j] +
        a2[i]*ufc.quadratic[j] +
        a3[i]*ba.linear[j] +
        a4[i]*ba.quadratic[j]
      
      mu.psi[j,i] <- psi[j,i]*w[i]
      Z[j,i] ~ dbern(mu.psi[j,i])
      
      ## Create a loop to estimate detection for species i at point k during sampling period k.
      for (k in 1:num.reps[j]) {
        logit(p[j,k,i]) <-  v.cato[i]*(1-habitat.ind[j]) +
          v.fcw[i]*habitat.ind[j] +
          b1[i]*date.linear[j,k] +
          b2[i]*date.quadratic[j,k]
        
        mu.p[j,k,i] <- p[j,k,i]*Z[j,i]
        X[j,k,i] ~ dbern(mu.p[j,k,i])
      }
    }
  }


  #Sum all species observed (n) and unobserved species (n0) to find the total estimated richness
  n0 <- sum(w[(num.species+1):(num.species+n.zeroes)])
  N <- num.species + n0
  
  #Create a loop to determine point level richness estimates for the whole community and for subsets or assemblages of interest.
  for(j in 1:num.points){
    Nsite[j]<- inprod(Z[j,1:(num.species+n.zeroes)],w[1:(num.species+n.zeroes)])
    Nmid[j]<- inprod(Z[j,1:num.species],mid[1:num.species])
    Nground[j]<- inprod(Z[j,1:num.species],ground[1:num.species])
  }
})

########
#NIMBLE MODEL
########

Rmodel <- nimbleModel(code=ms.ss.occ, 
                      data=model.data, 
                      inits = inits, 
                      constants=constants)

MCMCconfiguration <- configureMCMC(Rmodel)

nodesToMonitor <- Rmodel$getNodeNames(stochOnly = TRUE, includeData = FALSE)
monitor1 = c('u.cato', 'u.fcw', 'v.cato', 'v.fcw', 'omega', 'a1', 
               'a2', 'a3', 'a4', 'b1', 'b2', 'Nsite', 'N', 'Nground', 'Nmid') 

monitor = c(nodesToMonitor, monitor1)

mcmc.out <- nimbleMCMC(code=ms.ss.occ, 
                       data=model.data, 
                       inits = inits, 
                       constants=constants,
                       monitors = monitor,
                       niter = 30000, nchains = 3, nburnin = 20000, thin=10,
                       summary = TRUE, WAIC = TRUE)

OUTPUT:

defining model...
building model...
setting data and initial values...
running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 
checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
checking model calculations...
NAs were detected in model variables: cato.occ.mean, logProb_cato.occ.mean, sigma.ucato, logProb_sigma.ucato, fcw.occ.mean, logProb_fcw.occ.mean, sigma.ufcw, logProb_sigma.ufcw, cato.det.mean, logProb_cato.det.mean, sigma.vcato, logProb_sigma.vcato, fcw.det.mean, logProb_fcw.det.mean, sigma.vfcw, logProb_sigma.vfcw, mu.a1, logProb_mu.a1, sigma.a1, logProb_sigma.a1, mu.a2, logProb_mu.a2, sigma.a2, logProb_sigma.a2, mu.a3, logProb_mu.a3, sigma.a3, logProb_sigma.a3, mu.a4, logProb_mu.a4, sigma.a4, logProb_sigma.a4, mu.b1, logProb_mu.b1, sigma.b1, logProb_sigma.b1, mu.b2, logProb_mu.b2, sigma.b2, logProb_sigma.b2, mu.ucato, tau.ucato, mu.ufcw, tau.ufcw, mu.vcato, tau.vcato, mu.vfcw, tau.vfcw, tau.a1, tau.a2, tau.a3, tau.a4, tau.b1, tau.b2, lifted_d1_over_sqrt_oPtau_dot_ucato_cP, lifted_d1_over_sqrt_oPtau_dot_ufcw_cP, lifted_d1_over_sqrt_oPtau_dot_vcato_cP, lifted_d1_over_sqrt_oPtau_dot_vfcw_cP, lifted_d1_over_sqrt_oPtau_dot_a1_cP, lifted_d1_over_sqrt_oPtau_dot_a2_cP, lifted_d1_over_sqrt_oPtau_dot_a3_cP, lifted_d1_over_sqrt_oPtau_dot_a4_cP, lifted_d1_over_sqrt_oPtau_dot_b1_cP, lifted_d1_over_sqrt_oPtau_dot_b2_cP, logProb_u.cato, logProb_u.fcw, logProb_v.cato, logProb_v.fcw, logProb_a1, logProb_a2, logProb_a3, logProb_a4, logProb_b1, logProb_b2.
Infinite values were detected in model variable: logProb_Z.
model building finished.
Monitored nodes are valid for WAIC
compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
compilation finished.
runMCMC's handling of nburnin changed in nimble version 0.6-11. Previously, nburnin samples were discarded *post-thinning*.  Now nburnin samples are discarded *pre-thinning*.  The number of samples returned will be floor((niter-nburnin)/thin).
running chain 1...
warning: problem initializing stochastic node Z[8, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[9, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[11, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[12, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[13, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[14, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[18, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[19, 59]: logProb is -Inf.
warning: problem initializing stochastic node Z[20, 59]: logProb is -Inf.

.... etc ....

|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|

NULL

Perry de Valpine

unread,

Jun 5, 2020, 11:29:40 AM6/5/20

to Michelle Pretorius, nimble-users

Hi Michelle,

Glad to see this.

Those are only warnings, and it looks like the MCMC ran. If you got sensible results, then the initialization routine was able to fill in values where needed and your results may be fine. However, typically the issue you're seeing arises from incomplete initialization. The easiest solution is to make sure your inits are thorough and all values are valid. In a model like this, the kinds of cases that could cause problems would be if an initial occupancy probability or detection probability is numerically exactly 0 or 1 (which could happen if a logit(psi[...]) or a logit(p[...]) is very large in magnitude (+ive or -ive) and that makes a probability exactly 0, which will appear as a log probability of -Inf.

A good way to debug issues is to build the model using

model <- nimbleModel(code, data, constants, inits)

with the same code, data, constants and inits you are providing to nimbleMCMC. nimbleModel will run calculate() on the whole model and issue the warnings you saw. You can then start to inspect the model:

model$Z[8, 59]

model$getLogProb("Z[8, 59]")

model$mu.psi[8, 59] # backtracking on Z[8, 59]

model$psi[8, 59] # etc.

etc to see where some calculation has led to some invalid value or some initial value is missing.

If you're still stuck after trying this kind of digging around, we'd have to have your data to investigate further.

-Perry

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/c4969770-4616-4163-9f32-718cbce801bfo%40googlegroups.com.

Michelle Pretorius

unread,

Jun 8, 2020, 5:17:42 AM6/8/20

to nimble-users

Hi Perry,

Thank you so much for getting back to me so quickly!

Working backwards, I was able to see where the issue was.

When estimating the Z matrix (true occurrence for species i at point j), I included the parameter w[i] when calculating mu.psi[j,i] (see below) to coincide with Zipkin (2010).

for (j in 1:num.points) {
      logit(psi[j,i]) <- u.cato[i]*(1-habitat.ind[j]) +
        u.fcw[i]*habitat.ind[j] +
        a1[i]*ufc.linear[j] +
        a2[i]*ufc.quadratic[j] +
        a3[i]*ba.linear[j] +
        a4[i]*ba.quadratic[j]


      
      mu.psi[j,i] <- psi[j,i]*w[i] #this is where the problem arises 
      Z[j,i] ~ dbern(mu.psi[j,i])

However, the initial values of w include 0's, resulting in an occupancy probability of exactly 0 (as you predicted; see below). Removing w[i] from the mu.psi[j,i] calculation (as done in the Ponisio (2020) multiSpp-singleSea code) produces valid results.

omega.draw <- runif(1, num.species/(num.species + n.zeroes), 1)



 w = c(rep(1, num.species),


                   rbinom(n.zeroes, size = 1, prob = omega.draw))

> w
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [45] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
 [89] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1

Thank you for your help!

- Michelle

Perry de Valpine

unread,

Jun 8, 2020, 10:48:25 AM6/8/20

to Michelle Pretorius, nimble-users

Great. Thanks for the follow-up. These kinds of examples can be helpful for others.

-Perry

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/23cc00d9-1419-4aab-bcc1-161fe96fd09do%40googlegroups.com.

Reply all

Reply to author

Forward

0 new messages