Speeding up data augmentation in BUGS


Richard Chandler

May 15, 2018, 4:09:32 PM
to hmec...@googlegroups.com, spatialcapture.
Hi everyone,

Here is a trick for speeding up data augmentation in the context of (S)CR models. Instead of augmenting the capture histories with zeros, you can break the problem up into two components: one for the detected guys (standard observation model) and one for the undetected (augmented) guys. For the latter, you can compute the probability of being detected at least once and then do:

zeros[i] ~ dbern(PrAtLeastOneDet[i])  # for i in (n0+1):M

Or for a Poisson observation model, you can do 

zeros[i] ~ dpois(ExpectedDets[i])


In either case, 'zeros' is provided as data and you are just providing one zero for each augmented guy. For the attached SCR example, this trick speeds things up by a factor of about 2.5. The speed improvement increases with M.
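To see why the single zero is sufficient, here is a quick numerical check (in Python, with made-up detection parameters) that one dbern/dpois zero carries exactly the same likelihood contribution as the full set of J*K zeros it replaces:

```python
import math

# Hypothetical per-trap, per-occasion parameters for one augmented
# individual (values made up for illustration; 2 traps x 3 occasions).
p = [[0.02, 0.05, 0.01], [0.03, 0.04, 0.02]]    # Bernoulli detection probs
lam = [[0.02, 0.05, 0.01], [0.03, 0.04, 0.02]]  # Poisson encounter rates

# Bernoulli case: likelihood of J*K observed zeros ...
lik_full_bern = math.prod(1 - pjk for row in p for pjk in row)
# ... equals the likelihood of a single zero under dbern(PrAtLeastOneDet):
pr_at_least_one = 1 - math.prod(1 - pjk for row in p for pjk in row)
lik_one_zero_bern = 1 - pr_at_least_one

# Poisson case: likelihood of J*K zero counts ...
lik_full_pois = math.prod(math.exp(-l) for row in lam for l in row)
# ... equals dpois(0 | ExpectedDets) with ExpectedDets the summed rate:
expected_dets = sum(l for row in lam for l in row)
lik_one_zero_pois = math.exp(-expected_dets)

print(abs(lik_full_bern - lik_one_zero_bern) < 1e-12)  # True
print(abs(lik_full_pois - lik_one_zero_pois) < 1e-12)  # True
```

This is the identity the dbern/dpois lines above encode: far fewer density evaluations, same likelihood.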

The script also demonstrates how to run multiple chains in parallel using the 'rjags' package. This can be done in batches to allow you to inspect convergence as you go.

​Richard


--
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia
fasterSCR.R
scr0.jag
scr0-faster.jag

Peter Solymos

May 15, 2018, 5:42:06 PM
to Richard Chandler, hmecology: Hierarchical Modeling in Ecology, spatialcapture.
Hi Richard,

Nice trick and speedup. I noticed that in the 'standard' case you used the default n.chains=1 in jags.model(), whereas the parallel example implicitly runs 3 chains (one per worker).

Another thing is that the default RNGs (up to 4 chains) are set automagically when you call jags.model (see JAGS user manual, p. 17):

jm.faster <- jags.model("scr0-faster.jag", jd.faster, ji,
                            n.adapt=0, n.chains=3)
st <- jm.faster$state(internal=TRUE)
lapply(st, "[[", ".RNG.name")
## [[1]]
## [1] "base::Marsaglia-Multicarry"
##
## [[2]]
## [1] "base::Super-Duper"
##
## [[3]]
## [1] "base::Mersenne-Twister"

But the RNGs are not set to different kinds/states when running in parallel as in your example, which leads to identical chains -- something that must be avoided. (Of course, I set the ji function to be deterministic to demonstrate the point; with the stochastic version it works better, but there are still no guarantees):

library(parallel)
nChains <- 3
cl1 <- makeCluster(nChains)
ji <- function() list(z=c(rep(NA, n0), rep(1, M-n0)),
                      lam0=0.01, sigma=0.03)
clusterExport(cl1, c("jd.faster", "ji", "M", "n0"))
out1 <- clusterEvalQ(cl1, {
    library(rjags)
    jm.faster <- jags.model("scr0-faster.jag", jd.faster, ji,
                            n.adapt=0, n.chains=1)
    jm.faster$state(internal=TRUE)[[1]]
})
lapply(out1, "[[", ".RNG.name")
## [[1]]
## [1] "base::Wichmann-Hill" #            these are all the same!
##
## [[2]]
## [1] "base::Wichmann-Hill"
##
## [[3]]
## [1] "base::Wichmann-Hill"
lapply(out1, "[[", ".RNG.state")
## [[1]]
##  [1]   896518066  1481771107 -1240342298 -1779828825  1775180164 -1241426758
## 
##  [[2]]
##  [1]   896518066  1481771107 -1240342298 -1779828825  1775180164 -1241426758
## 
##  [[3]]
##  [1]   896518066  1481771107 -1240342298 -1779828825  1775180164 -1241426758
out1 <- clusterEvalQ(cl1, {
    library(rjags)
    jm.faster <- jags.model("scr0-faster.jag", jd.faster, ji,
                            n.adapt=0)
    jc.faster <- coda.samples(jm.faster,
                              c("lam0","sigma","N","deviance"),
                              n.iter=5)
    return(as.mcmc(jc.faster))
})
##  [[1]]
##  Markov Chain Monte Carlo (MCMC) output:
##  Start = 1 
##  End = 5 
##  Thinning interval = 1 
##         N      lam0      sigma
##  [1,] 271 0.6325076 0.05749986
##  [2,] 179 0.6060224 0.07138174
##  [3,] 147 0.5292536 0.07798082
##  [4,] 125 0.4369718 0.08907124
##  [5,]  93 0.3788972 0.09063776
## 
##  [[2]]
##  Markov Chain Monte Carlo (MCMC) output:
##  Start = 1 
##  End = 5 
##  Thinning interval = 1 
##         N      lam0      sigma
##  [1,] 271 0.6325076 0.05749986
##  [2,] 179 0.6060224 0.07138174
##  [3,] 147 0.5292536 0.07798082
##  [4,] 125 0.4369718 0.08907124
##  [5,]  93 0.3788972 0.09063776
## 
##  [[3]]
##  Markov Chain Monte Carlo (MCMC) output:
##  Start = 1 
##  End = 5 
##  Thinning interval = 1 
##         N      lam0      sigma
##  [1,] 271 0.6325076 0.05749986
##  [2,] 179 0.6060224 0.07138174
##  [3,] 147 0.5292536 0.07798082
##  [4,] 125 0.4369718 0.08907124
##  [5,]  93 0.3788972 0.09063776
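The underlying issue is not JAGS-specific: parallel workers that start from identical RNG kind and state reproduce each other exactly. A minimal illustration in Python:

```python
import random

# Three "workers" whose RNGs are initialized identically, as happens when
# every cluster node builds its chain with the same default RNG kind/state.
same = [random.Random(1) for _ in range(3)]
draws_same = [[rng.random() for _ in range(5)] for rng in same]
print(draws_same[0] == draws_same[1] == draws_same[2])  # True: identical "chains"

# Giving each worker distinct RNG state (what parJagsModel arranges by
# assigning a different RNG kind per chain) yields distinct streams.
distinct = [random.Random(seed) for seed in (1, 2, 3)]
draws_distinct = [[rng.random() for _ in range(5)] for rng in distinct]
print(len({tuple(d) for d in draws_distinct}) == 3)  # True
```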



The dclone package has some functions that deal with the issue:

dclone::parJagsModel(cl1, "jm.faster", "scr0-faster.jag", jd.faster, ji,
                            n.adapt=0, n.chains=3)
st <- clusterEvalQ(cl1, dclone:::.DcloneEnvResults$jm.faster$state(internal=TRUE))
lapply(st, function(z) z[[1]][[".RNG.name"]])
## [[1]]
## [1] "base::Super-Duper" #             these are different now
## 
## [[2]]
## [1] "base::Mersenne-Twister"
## 
## [[3]]
## [1] "base::Wichmann-Hill"
stopCluster(cl1)


You might want to check the parallelized functions in the dclone package (jags.parfit, parJagsModel, parUpdate, parCodaSamples), which handle RNGs properly. The design was influenced by early input from Martyn Plummer at the time I wrote it (about 8 years ago, I think).

When more than 4 chains are requested, the RNG types are recycled with different seeds, or the lecuyer module can be used; either way, one has to pass the RNG state to the workers as part of the initial values, which is a lot more complicated than using parJagsModel:

cl1 <- makeCluster(10)
clusterExport(cl1, c("jd.faster", "ji", "M", "n0"))
load.module("lecuyer")
parLoadModule(cl1, "lecuyer")
dclone::parJagsModel(cl1, "jm.faster", "scr0-faster.jag", jd.faster, ji,
                            n.adapt=0, n.chains=10)
stopCluster(cl1)

Cheers,

Peter


--
Péter Sólymos
780-492-8534 | sol...@ualberta.ca | peter.solymos.org
Alberta Biodiversity Monitoring Institute http://www.abmi.ca
Boreal Avian Modelling Project http://www.borealbirds.ca


--
*** Three hierarchical modeling email lists ***
(1) unmarked: for questions specific to the R package unmarked
(2) SCR: for design and Bayesian or non-bayesian analysis of spatial capture-recapture
(3) HMecology (this list): for everything else, especially material covered in the books by Royle & Dorazio (2008), Kéry & Schaub (2012), Kéry & Royle (2015)
---
You received this message because you are subscribed to the Google Groups "hmecology: Hierarchical Modeling in Ecology" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hmecology+unsubscribe@googlegroups.com.
To post to this group, send email to hmec...@googlegroups.com.
Visit this group at https://groups.google.com/group/hmecology.
To view this discussion on the web visit https://groups.google.com/d/msgid/hmecology/CADy23JuCJOaYovp3fqge2pVOU4LsQpAe3dK5VDCf4Q%2BKigJ4aw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Richard Chandler

May 15, 2018, 7:46:07 PM
to Peter Solymos, hmecology: Hierarchical Modeling in Ecology, spatialcapture.

Hi Peter,

Thanks for the comment. I probably shouldn’t have mentioned parallel chains in my post. The speed gain has nothing to do with running chains in parallel. The main comparison I wanted to make was between:

## Standard approach
system.time({
    jm <- jags.model("scr0.jag", jd, ji, n.adapt=200)
    jc <- coda.samples(jm, c("lam0","sigma","N","deviance"), n.iter=200)
})

and

## Faster
system.time({
    jm.faster <- jags.model("scr0-faster.jag", jd.faster, ji, n.adapt=200)
    jc.faster <- coda.samples(jm.faster, c("lam0","sigma","N","deviance"), n.iter=200)
})

The latter should be >2x faster.

Regardless, thanks for the info about your dclone package.

Richard



Mathias Tobler

May 16, 2018, 2:21:58 AM
to hmec...@googlegroups.com

Hi Richard,

That is a nice trick. Alternatively, if you don't have temporal covariates (which most people don't), you can simply not loop over K and get about the same speed. (Actually, this approach is independent of the size of K, so for large K it is quite a bit faster than yours; try K=50):

  for(j in 1:nTraps) {
    dSq.s2x[i,j] <- (s[i,1]-x[j,1])^2 +
                    (s[i,2]-x[j,2])^2
    lambda[i,j] <- lam0*exp(-dSq.s2x[i,j]/(2*sigma^2))*z[i]*opersum[j]
    y[i,j] ~ dpois(lambda[i,j])
  }

with

jd.fastest <- list(y=apply(yaug,c(1,2),sum), M=M, opersum=rowSums(oper),
           z=c(rep(1, n0), rep(NA, M-n0)),
           xlim=xlim, ylim=ylim,
           x=x, nTraps=nTraps, nOcc=nOcc)

As Ben summed it up nicely a few days ago: "I wonder how many months of total computation time has been wasted in these models to date by looping over K, especially for camera trap applications with many sampling occasions."
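The reason dropping the K loop is harmless here (absent temporal covariates) is Poisson additivity: with a constant per-occasion rate, the summed counts are sufficient, and the per-occasion likelihood differs from the collapsed one only by a factor that does not involve the parameters. A quick check in Python with made-up counts:

```python
import math

def pois_logpmf(y, lam):
    # log of the Poisson pmf: y*log(lam) - lam - log(y!)
    return y * math.log(lam) - lam - math.lgamma(y + 1)

# Made-up counts at one trap over K = 5 occasions (no temporal covariates,
# so the per-occasion rate lam is constant).
y = [0, 2, 1, 0, 3]
K, ytot = len(y), sum(y)

def loglik_gap(lam):
    full = sum(pois_logpmf(yk, lam) for yk in y)  # looping over K
    collapsed = pois_logpmf(ytot, K * lam)        # modeling the summed counts
    return full - collapsed

# The gap between the two log-likelihoods does not involve lam, so the two
# likelihoods are proportional and give identical inference:
print(abs(loglik_gap(0.1) - loglik_gap(2.5)) < 1e-9)  # True
```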

Best,

Mathias

fastestSCR.R
scr0-fastest.jag

Jose Jimenez Garcia-Herrera

May 16, 2018, 3:11:01 AM
to spatialcapt...@googlegroups.com, hmecology: Hierarchical Modeling in Ecology

Hi All,

I've used Richard's trick in Nimble, saving about 2x.

Regards,

José

=======

code <- nimbleCode({

  lam0 ~ dunif(0, 3)
  sigma ~ dunif(0, 2)
  psi ~ dbeta(1,1)

  for(i in 1:M) {
    s[i,1] ~ dunif(xlim[1], xlim[2])
    s[i,2] ~ dunif(ylim[1], ylim[2])
    z[i] ~ dbern(psi) ## Provide the first n0 as data, ie z[i]=1
    dSq.s2x[i,1:nTraps] <- (s[i,1]-x[1:nTraps,1])^2 + (s[i,2]-x[1:nTraps,2])^2

    for(j in 1:nTraps) {
      for(k in 1:nOcc) {
        lambda[i,j,k] <- lam0*exp(-dSq.s2x[i,j]/(2*sigma^2))*oper[j,k]
      }
    }
  }

  for(i in 1:n0) {
    for(j in 1:nTraps) {
      for(k in 1:nOcc) {
        y[i,j,k] ~ dpois(lambda[i,j,k])
      }
    }
  }

  for(i in (n0+1):M) {
    zeros[i] ~ dpois(sum(lambda[i,1:nTraps,1:nOcc])*z[i])
  }

  N <- sum(z[1:M])

})

Time: 4.33509 mins

#==================

library(nimble)

## define the model
code <- nimbleCode({

  lam0 ~ dunif(0, 3)
  alpha1 ~ dnorm(0,.1)
  sigma <- sqrt(1/(2*alpha1))
  psi ~ dunif(0,1)

  for(i in 1:M){
    z[i] ~ dbern(psi)
    s[i,1] ~ dunif(xlim[1],xlim[2])
    s[i,2] ~ dunif(ylim[1],ylim[2])
    d[i,1:nTraps] <- pow(s[i,1]-x[1:nTraps,1],2) + pow(s[i,2]-x[1:nTraps,2],2)

    for(j in 1:nTraps){
      p[i,j,1:nOcc] <- lam0*z[i]*exp(-alpha1*d[i,j])*oper[j,1:nOcc]
      for(k in 1:nOcc){
        y[i,j,k] ~ dpois(p[i,j,k])
      }
    }
  }

  N <- sum(z[1:M])

})

Time: 9.071667 mins

#==================


--
You received this message because you are subscribed to the Google Groups "Spatial Capture-Recapture" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialcapturerec...@googlegroups.com.

Jeffrey Royle

May 16, 2018, 5:29:33 AM
to spatialcapt...@googlegroups.com, hmecology: Hierarchical Modeling in Ecology
hi all, this seems like a big deal, thanks a lot Richard and everyone for the discussion!

Jose: I wonder if you wouldn't mind providing the full NIMBLE code so that non-NIMBLE users can simulate data and execute that?  

regards,
andy



Jose Jimenez Garcia-Herrera

May 16, 2018, 5:49:21 AM
to Jeffrey Royle, spatialcapt...@googlegroups.com, hmecology: Hierarchical Modeling in Ecology

Hi Andy,

 

Of course! These are the Nimble versions of Richard's code (faster: fasterSCR_Y.R; normal: fasterSCR_N.R).

 

Regards,

José

 


fasterSCR_Y.R
fasterSCR_N.R

Richard Chandler

May 16, 2018, 7:05:20 AM
to Jose Jimenez Garcia-Herrera, Jeffrey Royle, spatialcapt...@googlegroups.com, hmecology: Hierarchical Modeling in Ecology
Hi everyone,

Thanks for the discussion. Jose, thanks for the NIMBLE code. Mathias, this trick works whether or not you loop over K. I demonstrated it with a K loop just so people can easily extend it to deal with behavioral effects or temporal covariates. However, I completely agree that you should avoid the K loop when possible.

One other note. This is not the same thing as the marginal approach described here: 


That approach will speed things up considerably too, but it can't account for more than one or two sources of individual heterogeneity because it requires numerical integration. Still, it's worth checking out if you have a very slow model with few sources of individual heterogeneity.

Richard



Alexandra Curtis - NOAA Affiliate

Aug 24, 2018, 6:21:53 PM
to Richard Chandler, hmecology: Hierarchical Modeling in Ecology
In case this can be of use to anyone else (or someone can improve on it!), I'm reactivating this thread to share my adaptation of Richard's trick (thanks!) to a simple POPAN model with data augmentation. As far as I was able to work out, the shortcut isn't as clean in this context, so it only provided 10-20% time savings in my JAGS simulations. Model data and inits (appended below) were adapted accordingly based on Richard's code. 

I rewrote the following likelihood (from Kery and Schaub):

 # Likelihood
 for (i in 1:M){
    # First occasion
    # State process
    w[i] ~ dbern(psi)   # Draw latent inclusion
    z[i,1] ~ dbern(nu[1])
    # Observation process
    mu1[i] <- p[i,1] * z[i,1] * w[i]
    y[i,1] ~ dbern(mu1[i])
    # Subsequent occasions
    for (t in 2:n.occasions){
       # State process
       q[i,t-1] <- 1-z[i,t-1]
       mu2[i,t] <- phi[i,t-1] * z[i,t-1] + nu[t] * prod(q[i,1:(t-1)])
       z[i,t] ~ dbern(mu2[i,t])
       # Observation process
       mu3[i,t] <- z[i,t] * p[i,t] * w[i]
       y[i,t] ~ dbern(mu3[i,t])
    } #t
 } #i


as:


# Likelihood
for (i in 1:M){
   # First occasion
   # State process
   w[i] ~ dbern(psi)  # Draw latent inclusion
   z[i,1] ~ dbern(nu[1])
   # Subsequent occasions
   for (t in 2:n.occasions){
      # State process
      q[i,t-1] <- 1-z[i,t-1]
      mu2[i,t] <- phi[i,t-1] * z[i,t-1] + nu[t] * prod(q[i,1:(t-1)])
      z[i,t] ~ dbern(mu2[i,t])
   } #t
} #i

for (i in 1:n0){
   for (t in 1:n.occasions){
      y[i,t] ~ dbern(p[i,t] * z[i,t] * w[i])
   } #t
} #i
for(i in (n0+1):M) {
   for (t in 1:n.occasions){ 
      pzo[i,t] <- 1 - p[i,t] * z[i,t] * w[i]    # probability of not being detected on each occasion
   }
   zeros[i] ~ dbern(1-prod(pzo[i,]))    # probability of at least one detection
}
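As a sanity check on the zeros line above: the quantity prod(pzo[i,]) should be exactly the probability of an all-zero detection history given the alive states. A small Monte Carlo simulation in Python (with made-up p and z values) confirms this:

```python
import random

random.seed(0)  # reproducibility

# Hypothetical occasion-specific detection probabilities and alive states
# for one augmented individual with w = 1 (values made up for illustration).
p = [0.3, 0.5, 0.4, 0.2]
z = [0, 1, 1, 0]   # alive on occasions 2 and 3 only
w = 1

# Analytical probability of an all-zero detection history, i.e. the
# quantity prod(pzo[i,]) that a zeros[i] = 0 datum scores in the model:
prob_never = 1.0
for pt, zt in zip(p, z):
    prob_never *= 1 - pt * zt * w   # = 1 * 0.5 * 0.6 * 1 = 0.3

# Monte Carlo check: simulate detection histories and count the
# fraction with no detections at all.
n = 100_000
never = sum(
    all(random.random() >= pt * zt * w for pt, zt in zip(p, z))
    for _ in range(n)
)
print(prob_never)                          # 0.3 (up to floating point)
print(abs(never / n - prob_never) < 0.01)  # True: MC estimate agrees
```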

modified data and inits in R: 

# data
bugs.data <- list(y = CH, n.occasions=ncol(CH), M=nrow(CH.aug), 
                  zeros=c(rep(NA, nrow(CH)), rep(0, nz)),
                  w=c(rep(1, nrow(CH)), rep(NA, nz)),
                  n0=nrow(CH))

# Initial values
z.init <- matrix(1, nrow(CH.aug), ncol(CH.aug))
w.init <- c(rep(NA, nrow(CH)), rep(1, nz))
inits <- function(){list(z=z.init, w=w.init)}


Alex

-- 
Alexandra Curtis
Contractor with Ocean Associates, Inc.
Marine Mammal and Turtle Division, Southwest Fisheries Science Center
National Marine Fisheries Service, NOAA
8901 La Jolla Shores Drive
La Jolla, CA 92037 USA
Phone: +1-858-546-5625




Natalie Haydt

Jan 16, 2020, 11:12:50 PM
to hmecology: Hierarchical Modeling in Ecology
Hi everyone,

I'm trying to use this trick for an SCR model on turtles in the Chesapeake and Ohio Canal. We caught a ton of turtles and our augmentation numbers are very large. This trick seems like it will substantially decrease our model run time!!

I have a quick question about the concept behind the zeros equation.
Why is zeros[i] associated with the probability of catching an augmented individual at least once, rather than the probability of that individual never being detected? Is it because, if it were the probability of never being caught, the model couldn't separate that from the probability of not being in the population?

For example, why is it:
for(i in (n0+1):M) {
   zeros[i] ~ dbern(1-prod(1 - p[i, 1:n_occasions] * z[i]))    # probability of at least one detection over sampling occasions
}

And not:

for(i in (n0+1):M) {
 zeros[i] ~ dbern(prod(1 - p[i, 1:n_occasions]) * z[i])     # probability of never being detected over sampling occasions
}


Thank you!!

Natalie Haydt

Catherine Sun

Jan 17, 2020, 4:55:51 AM
to Natalie Haydt, hmecology: Hierarchical Modeling in Ecology
Hi Natalie!

Others may correct me because it's been a while since I've used this trick.

Since the Bernoulli trial is about the probability of 'success' (detection, in our case), this trick asks: given the detection probability, what are the chances that the individual was never detected? Because it is framed as the probability of catching an augmented individual, it helps determine whether that individual is really part of the population. If an individual was never detected, then either its detection probability is really low and it truly exists (z[i] is 1), or, if the detection probability is high, it probably doesn't exist (z[i] is 0).

If instead you formulate it as the probability of 'failure'/non-detection, you would be asking for the probability of successfully getting a failure, which is perhaps an unnecessary convolution. Written that way, you would have to swap your definition of 'zeros' so that 1 (success) means never detected and 0 means detected at least once; and then maybe you would want to call it the 'ones' trick instead ;) You wouldn't be able to keep the definition of 'zeros' as currently used in the trick. The z[i] in there is modeled elsewhere, so I don't think it's an issue of identifiability.

More generally, data augmentation is about finding the individuals that do exist but were missed; we don't care about the ones that don't actually exist, and how many of those there are is not meaningful anyway, because it changes with the arbitrary value of M. The psi parameter (the DA parameter) and looping over all M individuals to model individual-level detection probability help find the undetected individuals that do exist, in which case we've already got what we're interested in. Modeling the probability of never detecting an individual seems unnecessary to me, with the additional need to clarify whether it really exists or not via z[i] (which you alluded to).

Interested to hear others' thoughts.








--
Catherine Sun
PhD Student
New York Cooperative Fish and Wildlife Research Unit
Department of Natural Resources
302 Fernow Hall, Cornell University
(mailing address is 111 Fernow Hall)

Ithaca, NY 14853

Richard Chandler

Jan 17, 2020, 8:28:43 AM
to Cat Sun, Natalie Haydt, hmecology: Hierarchical Modeling in Ecology
Hi Natalie,

Just to build on what Cat said, the intuition for this trick is the same as for standard data augmentation. If the encounter history data indicate that Pr(captured at least once) is high, then it is unlikely that you would observe an "all-zero" encounter history for an individual with z=1. It would be more likely in that case that you caught most of the individuals in the population, and therefore z=0 for most of the augmented individuals. 

This trick just implements that idea using a single zero instead of J*K zeros for each augmented individual. Fewer evaluations of probability densities result in a speed improvement.

There are a few other tricks out there that you might also want to check out:




Richard


Natalie Haydt

Jan 17, 2020, 2:58:18 PM
to hmecology: Hierarchical Modeling in Ecology
Hi Catherine and Richard,

Thank you for your quick responses! That makes sense now. I was thinking of the concept as the "ones" trick Catherine mentioned, but then didn't think about having to switch the 1s and 0s (which seems obvious now). Thank you for your explanations and the extra resources!!

All the best,
Natalie


David Christianson

May 20, 2020, 12:26:40 PM
to hmecology: Hierarchical Modeling in Ecology
Hello Richard,
I am working with a hierarchical distance sampling model with individual covariates, in which I loop over i (data-augmented individuals) to estimate detection. I am also modeling availability of individuals from the superpopulation, since I have replicate surveys. Availability much less than 1 requires a very large data augmentation coefficient when setting up the data before a JAGS run, and the model iterations are prohibitively slow. After reading this post, I was wondering: shouldn't the rate of data augmentation be the same for the superpopulation and the available population? And therefore, shouldn't it be possible to limit the data augmentation to the available population? That is, when estimating psi (the data augmentation rate), instead of using

sum(lambda)/(nind+nz)  [Kery and Royle's notation]

with a very large set of data-augmented individuals (augmenting for the entire superpopulation), I could use

(sum(lambda)*phi)/(nind+nz)

with a smaller set of data-augmented individuals (augmenting only the available population)?

Thanks for this excellent post. I find that speeding things up is what I spend most of my time on now!

David Christianson
Assistant Professor
University of Wyoming

Richard Chandler

May 20, 2020, 3:24:53 PM
to David Christianson, hmecology: Hierarchical Modeling in Ecology
Hi David,

I think you could speed this up a lot by avoiding data augmentation altogether. Have you considered the multinomial formulation of the model?

Richard



David Christianson

unread,
May 20, 2020, 5:17:00 PM5/20/20
to hmecology: Hierarchical Modeling in Ecology
Hi Richard,  Thanks for responding so quickly.  I have considered that, but in this case I have a particular interest in fitting the observation-level covariates.  My data set consists of ungulates that vary dramatically in group size (1-500), behavior, and composition in a manner that I think is likely to affect detection.  Further, I am working on a generalized approach I can fit across ungulate species in this system, and the i-fold DA seemed the most likely to succeed (but not yet!).  This ruled out the multinomial formulation as I understand it (the number of groups observed in each replicate follows a multinomial distribution with cell probabilities applied to N, correct?).



Richard Chandler

unread,
May 21, 2020, 8:07:18 AM5/21/20
to David Christianson, hmecology: Hierarchical Modeling in Ecology
Sorry, I forgot to reply to the entire group when I sent the last message. 

Marginalizing N probably wouldn't result in much of a speed improvement, so I wouldn't worry about that approach, especially because you might have to integrate out group size too.

You can't define psi as a function of z, because z depends on psi.

Alternatively, you could just forget about the N(j)~Bin(M, phi) formulation, and simply set the expected value of abundance (and hence density) to be the same in each occasion, which would just mean setting psi(j)=psi and then doing J separate data augmentations. This would be the easiest option, but it wouldn't correspond to a traditional model of temporary emigration. 

I can think of several other options, but they would likely be very slow, or they would require that you can recognize individual groups across occasions (similar to mark-recapture distance sampling). 


On Wed, May 20, 2020 at 11:51 PM David Christianson <daveal...@gmail.com> wrote:
Thank you very much for evaluating this.  The model is running fine and I wouldn't have caught this.  I can't figure out how to marginalize N out of the model, but it should be possible to condition on the realized N.  Since z[i] conforms to 'real' groups in the available population N, sum(z[i]) should be the realized N, and psi = sum(lambda)*phi/sum(z[i]) would then refer to the data augmentation rate in the available population... I think.

On Wed, May 20, 2020 at 7:01 PM Richard Chandler <richard....@gmail.com> wrote:
Yeah, it would be much harder to use the multinomial formulation if you have individual covariates such as group size. Unfortunately, I don't think the data augmentation formulation that you suggested would accomplish what you're after.  

Imagine the availability model is N(j)~Bin(M, phi=1). In that case, it should be true that N(1)=N(2)=...=N(J). However, the formulation you mentioned would simply set the expected values of N equal to one another. The realized values of N would be conditionally independent, and this would result in confounding between phi and the detection parameters. I think you would need to condition the detections on the realized values of N (not the expected values of N), or marginalize the N's out of the model. 
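A quick Monte Carlo sketch of this point (Python; M, J, psi, and the variable names N_super, N_true, N_flawed are all invented for illustration). With phi = 1, conditioning on one realized superpopulation forces N(1) = ... = N(J) exactly, whereas J independent augmentations with merely equal expected values let the realized N(j) drift apart:

```python
import random
random.seed(42)

M, J = 100, 4        # augmentation size and number of occasions (invented)
phi = 1.0            # availability probability
psi = 0.5            # data augmentation rate (invented)

# Conditioning on realized N: one superpopulation, then N(j) ~ Bin(N_super, phi)
N_super = sum(random.random() < psi for _ in range(M))
N_true = [sum(random.random() < phi for _ in range(N_super)) for _ in range(J)]
# With phi = 1, every occasion recovers exactly the same realized N.

# Flawed formulation: J independent augmentations with equal *expected* values
N_flawed = [sum(random.random() < psi * phi for _ in range(M)) for _ in range(J)]

print(len(set(N_true)) == 1)   # True: N(1) = ... = N(J), as the model requires
print(N_flawed)                # the realized N's typically differ by chance
```

The chance variation in N_flawed across occasions is what gets absorbed into the detection parameters, producing the confounding with phi described above.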




David Christianson

unread,
May 23, 2020, 4:52:44 PM5/23/20
to hmecology: Hierarchical Modeling in Ecology
Hello Richard (et al.),
I was wondering about another possible way to speed up data augmentation when working with individuals.  I often play around with the level of data augmentation to minimize the number of nodes, and when the augmentation is too low, an "Invalid parent values" error gets thrown for node z (the data augmentation indicator).  I assume this is because in some iterations sum(lambda) is larger than the size of the augmented data set (the numerator and denominator, respectively, in the estimation of psi), resulting in a rate parameter >1 for z ~ dbern(psi) and causing the error.  This forces one either to keep re-running the model until a chain happens never to encounter this condition, or to add more 'augments' to the data.  Is there a solution that would constrain psi to always be less than 1 once a reasonable augmented data size has been identified?
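To make the failure mode concrete, here is a minimal numeric sketch (Python; the lambda values and augmentation sizes are invented) of how an undersized augmentation pushes psi above 1:

```python
# Hypothetical expected abundances; sum(lambda) = 10.0
lam = [3.0, 4.5, 2.5]

M = 8                    # augmented data set too small
psi = sum(lam) / M       # 1.25: not a valid Bernoulli probability,
print(psi)               # so z ~ dbern(psi) throws "Invalid parent values"

M_ok = 40                # a safely larger augmentation
print(sum(lam) / M_ok)   # 0.25: a valid probability
```

Because lambda is itself being sampled, psi can wander above 1 mid-chain even if it starts below 1, which is why the error can appear many iterations into a run.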

Dave Christianson

Tomas T.

unread,
May 24, 2020, 6:26:10 AM5/24/20
to David Christianson, hmecology: Hierarchical Modeling in Ecology
Hi David,

Not sure if I understand what you need, but since you mention z ~
dbern(psi) and keeping psi < 1... what is your prior for psi? Wouldn't
something like this solve it? :

logit(psi) ~ dnorm(0, 0.01)

(Not sure if you can do it directly like this; you may need
logit(psi) <- alpha and then alpha ~ dnorm(0, 0.01)....)

Tomas

David Christianson

unread,
May 24, 2020, 5:55:35 PM5/24/20
to hmecology: Hierarchical Modeling in Ecology
Hi Tomas, I don't think I can use a distribution that is naturally contained between 0 and 1.  Following the notation of Kéry and Royle (2016), psi is the data augmentation parameter and it is confounded with lambda, so it must be constrained as sum(lambda)/M (where M is the total size of the augmented data set).
Thanks for the suggestion.

Richard Chandler

unread,
May 25, 2020, 9:18:42 AM5/25/20
to David Christianson, hmecology: Hierarchical Modeling in Ecology
Hi David,

I think you're right about that error message. If the initial value of psi is >1, it won't run. For a model like this:

mu[g] <- exp(beta0 + beta1*x[g])*pixelArea # expected abundance in pixel g (log link keeps mu > 0)
EN <- sum(mu[1:nPixels])                   # expected value of N (nPixels = number of pixels)
psi <- EN/M                                # data augmentation parameter
z[i] ~ dbern(psi)                          # for i in 1:M
s[i] ~ dcat(mu[1:nPixels]/EN)              # activity-center pixel, weighted by intensity

I would use a low initial value for the intercept, say beta0=-5, and I would set beta1=0 to ensure that EN<M. Once it's initialized, it will reject values of beta that result in psi>1.  Note that as M approaches infinity, this approximates a Poisson point process (in discrete space).
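A quick sanity check of this initialization advice (Python; the covariate values, pixelArea, and M are invented, and a log link for mu is assumed, which is what makes a low beta0 such as -5 shrink EN toward zero):

```python
import math

x = [-1.0, 0.0, 1.0, 2.0]    # hypothetical pixel covariate values
pixelArea = 1.0
M = 100                      # data augmentation size

beta0, beta1 = -5.0, 0.0     # low intercept, slope zeroed at initialization
mu = [math.exp(beta0 + beta1 * xi) * pixelArea for xi in x]
EN = sum(mu)                 # approx 4 * exp(-5) ~= 0.027, far below M
psi = EN / M
print(psi < 1)               # True: a valid initial Bernoulli parameter
```

Once the sampler is running from this valid state, proposals of beta that would drive EN above M (and hence psi above 1) are simply rejected.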

The alternative formulation would involve something like a psi~dunif(0,1). If you do that, you would exclude the intercept beta0. This corresponds to a conditional intensity function, which might run faster.

Richard


David Christianson

unread,
May 25, 2020, 10:18:50 PM5/25/20
to hmecology: Hierarchical Modeling in Ecology
This takes care of it! I wasn't aware that JAGS would reject proposed parent values of lambda that lead to sum(lambda) > M.  That makes it very clear why the initial values can be so critical (along with an adequate number of augments).