specifying different data for first iteration only?


Rebecca Taylor

Oct 15, 2022, 6:52:25 PM
to nimble-users

Hi all,

I am working with an age-structured Jolly-Seber model (superpopulation parameterization with data augmentation). Ages are observed without error for real individuals. The time 1 population comprises individuals of many ages; entries in time 2 and beyond are by birth only. For the augmented individuals, I would like to initialize all ages as NA so the ages can simply be determined by the code.

That works fine if survival and recapture rates are not age-specific. However, if rates are age-specific, the initialization fails because an NA ends up being used as an index: for example, any individual in time t gets a survival rate of phi[t, NA].

I am able to work around this by assigning all augmented individuals an age in time 1. But that means I need many groups of augmented individuals—e.g., if there are 10 ages, there need to be 11 groups of augmented individuals (the extra group is for not yet born). And all groups of augmented individuals have to be pretty large, so I end up with a much larger augmented population and run time than I would like.

Is there a way I can do something different on the first iteration to get around the initialization problem? Perhaps assign everyone a non-age-dependent survival rate on the first iteration only? Or assign everyone an initial age in the first time period for the first iteration, but then let those initial ages be overwritten in subsequent iterations?

I hope this makes sense. I’ll copy code below. Thank you!!

Rebecca

code <- nimbleCode({
  psi ~ dbeta(1,1)  # psi = inclusion parameter (is individual real or fake)
  for (t in 1:K){                      
    for(aa in 1:max.age){   # age 0 means not yet born
      p[t,aa+1] ~ dbeta(1,1)  # p[t,aa+1] = cap rate for time t, age aa; age 0 cap rate = 0
    }                                
  }
  for (t in 1:(K-1)){                
    for(aa in 1:(max.age-1)){        
      phi[t,aa+1] ~ dbeta(1,1) # phi[t,aa+1] = surv rate for time t, age aa; age 0 surv rate = 0
    }                                  
  }
  beta[1:K] ~ ddirch(b[1:K])   # betas = unconditional entry probabilities
  eta[1] <- beta[1]            # etas = corresponding conditional probabilities
  for(k in 2:K){eta[k] <- beta[k]/(1-sum(beta[1:(k-1)]))}
 
  # piAGE = age distribution of animals alive in the population at time 1
  # piAGEuncond has an additional "age 0" which means "not yet born".
  # Entries after time 1 are by birth only
  piAGE[1:max.age] ~ ddirch(a[1:max.age])
  piAGEuncond[1:(max.age+1)] <- c( (1-eta[1]), eta[1]*piAGE[1:max.age] )

  # Likelihoods
  for (i in 1:M){    # M = size of augmented population
    w[i] ~ dbern(psi)  # w[i] = indicator if individual i is real

    # agePlusOne are time 1 age data + 1
    agePlusOne[i] ~ dcat(piAGEuncond[1:(max.age+1)])
    age[i,1] <- (agePlusOne[i]-1)

    # state process
    u[i,1] <- step(age[i,1]-.1)  # u[i,j] = indicator for alive  
    z[i,1] <- u[i,1]*w[i]

    # Observation process
    yprob_t1[i,1] <- z[i,1]*p[1,age[i,1]+1]
    y[i,1] ~ dbern(yprob_t1[i,1])

    # avail[i,j] = indicator if not yet recruited from superpop to pop
    avail[i,1] <- 1 - u[i,1]        

    for (t in 2:K){
      # State process
      uprob[i,t] <- u[i,t-1]*phi[t-1,age[i,t-1]+1]  + avail[i,t-1]*eta[t]
      u[i,t] ~ dbern(uprob[i,t])
      z[i,t] <- u[i,t]*w[i]

      # Age process: grows older by one year after recruitment
      age[i,t] <- age[i,t-1] + max(u[i,1:t])  

      # Observation process
      yprob[i,t] <- z[i,t]*p[t,age[i,t]+1]
      y[i,t] ~ dbern(yprob[i,t])  
     
      # avail[i,j] = indicator if not yet recruited from superpop to pop
      avail[i,t] <- 1 - max(u[i,1:t])
    } #t
  } #i
})#END model




Daniel Gibson

Oct 15, 2022, 7:37:24 PM
to Rebecca Taylor, nimble-users
A few years ago, Richard Chandler posted a little trick on the HMEcology listserv: you can set spatial CMR likelihoods up in a way that uses the pseudo-individuals only to inform the state process and excludes them completely from the observation process. The idea was extended by Alexandra Curtis to Jolly-Seber population models. I have no idea if this has entered the vetted ecological literature, but I messed around with it a bit, and it seemed pretty sound. So, although this doesn't change the number of pseudo-individuals you are using, it does change the extent to which they can bog down your model.

I am just going to include some example code that I have used in the past and that should be relatively similar to yours. I think this example was originally written for JAGS, but I am not immediately seeing any nimble red flags.

    ######################################
    # State Process
    ######################################
    for (i in 1:M){                       # Loop through real + pseudo individuals
      w[i] ~ dbern(psi)                   # Latent inclusion state
      z[i,1] ~ dbern(eta[1])              # Entry into first occasion

      for (t in 2:n.occasions){
        q[i,t-1] <- 1 - z[i,t-1]                                        # Have not entered yet
        mu1[i,t] <- phi[i,t-1] * z[i,t-1] + eta[t] * prod(q[i,1:(t-1)]) # Presence = survived since last + entered since last
        z[i,t] ~ dbern(mu1[i,t])
      }
    }

    ######################################
    # Observation Process
    ######################################
    for (i in 1:n.ch){                        # Loop through real individuals only
      y[i,1] ~ dbern(z[i,1] * p[1] * w[i])    # Latent state * detection * latent inclusion
      for (t in 2:n.occasions){
        y[i,t] ~ dbern(z[i,t] * p[t] * w[i])  # Latent state * detection * latent inclusion
      }
    }

    ######################################
    # Chandler trick
    ######################################
    for (i in (n.ch+1):M){                      # Loop through pseudo individuals
      pzo[i,1] <- 1 - (p[1] * z[i,1] * w[i])    # Probability of not being detected on first occasion
      for (t in 2:n.occasions){
        pzo[i,t] <- 1 - (p[t] * z[i,t] * w[i])  # Probability of not being detected on each later occasion
      }
      zeros[i] ~ dbern(1 - prod(pzo[i,1:n.occasions]))  # Probability of being detected at least once (zeros = vector of 0's for the augmented individuals)
    }

#################
This last part (the Chandler trick) is the time saver. Instead of doing a Bernoulli trial at every occasion for every pseudo-individual (i x t trials), you do it just once per pseudo-individual, based on the probability of it being detected at least once over the duration of the study.
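As a quick numeric check (made-up values, plain R outside the model), the single trial per pseudo-individual uses the complement of the probability of never being detected:

```r
## Made-up values for one pseudo-individual over 3 occasions:
p <- c(0.30, 0.20, 0.25)   # detection probability per occasion
z <- c(1, 1, 0)            # latent alive states
w <- 1                     # latent inclusion indicator
pzo <- 1 - p * z * w       # prob of NOT being detected on each occasion: 0.70, 0.80, 1.00
1 - prod(pzo)              # prob of >= 1 detection: 1 - 0.70*0.80*1.00 = 0.44
```

One dbern with this probability replaces n.occasions separate Bernoulli likelihood contributions for that individual.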


-Dan

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/3c0504a2-ceb1-4ccd-85d8-255642b7e38en%40googlegroups.com.

Taylor, Rebecca L

Oct 15, 2022, 9:12:58 PM
to Daniel Gibson, nimble-users
Dan,

Thanks for that trick--it looks like it could be very useful for me down the road!  But I don't think it takes care of my current problem.

Since my current model is age-structured, I need to pull out survival rates as phi[t, age] and recap rates as p[t, age]. But if age is specified as NA in the first time period (for the augmented individuals), I get an index-out-of-bounds error. And if I specify a numeric age in the first time period for an augmented individual, I don't think its age can be overwritten in subsequent iterations. Thus, if max.age is the maximum age, I need max.age + 1 separate groups of augmented individuals.

Is there something I can do to separate out the first iteration to fix this? Perhaps assign everyone a non-age-dependent survival rate on the first iteration only? Or assign everyone an initial age in the first time period for the first iteration, but then let those initial ages be overwritten in subsequent iterations?

Thanks!

Rebecca


Rebecca Taylor
Research Statistician
Alaska Science Center
U. S. Geological Survey

Pronouns: She/her
Please note I sometimes send and answer email outside normal work hours. 
This helps me with my schedule but is not intended to pressure others to do the same


Perry de Valpine

Oct 19, 2022, 11:56:04 AM
to Taylor, Rebecca L, Daniel Gibson, nimble-users
Hi Rebecca,

Thanks for this.  And thanks for the suggestion, Dan.

I'm not easily grasping exactly what you need or what the problem is.  Could you try to pinpoint it more clearly (possibly referring to each line of code involved)?  One thing I'm not sure about is whether your code shows your workaround or shows what you would like to have work but doesn't work.

If you need ages of augmented individuals to be latent states with valid initial values, would it work to provide their data entries as NA and their inits entries as valid values?  That is the usual approach.
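For what it's worth, a minimal sketch of that setup (hypothetical object names; assuming M total rows of which n.real are observed individuals) would look something like:

```r
library(nimble)

## Ages observed for real individuals; NA (latent, to be sampled) for augmented ones:
agePlusOneData  <- c(obs.agePlusOne, rep(NA, M - n.real))
## Inits only need valid values where the data are NA:
agePlusOneInits <- c(rep(NA, n.real),
                     sample(1:(max.age + 1), M - n.real, replace = TRUE))

Rmodel <- nimbleModel(code,
                      constants = list(M = M, K = K, max.age = max.age,
                                       a = a, b = b),
                      data  = list(y = y, agePlusOne = agePlusOneData),
                      inits = list(agePlusOne = agePlusOneInits,
                                   psi = 0.5, w = rep(1, M)))
```

The NA entries in the data mark those agePlusOne[i] as latent states, and the inits give the MCMC valid starting values for exactly those entries.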

You might be saying that when w[i] is sampled to be a 1 (turning "on" an augmented individual), then you want an appropriate draw for the age.  It looks like the ages are stochastic.  Can you spell it out more?  I'm not sure whether you are seeing the error at (what we call) initialization, i.e. before the first MCMC iteration, or when a w[i] is sampled (or proposed by a sampler) to be a 1.

A general comment: Data augmentation like this is a setup for inefficient MCMC.  That's because the MCMC will sample dimensions of the model relevant to augmented individuals even when they are not "in the model" (w[i] == 0).  An alternative is reversible jump MCMC customized to this kind of model.  We've experimented with that in simpler cases and it didn't help a lot, but it could help in other cases and just hasn't been something we've had time to do more with.  There are other ideas for sampling these kinds of models more effectively.  But anyway that's a side comment and does not address your problem.

Perry


Taylor, Rebecca L

Oct 21, 2022, 9:12:09 PM
to Perry de Valpine, Daniel Gibson, nimble-users

Hi Perry, 

 

Thank you for your response and for all you do to help us.  

 

The error was indeed on initialization, and the first line to cause the error was 

yprob_t1[i,1] <- z[i,1]*p[1,age[i,1]+1], because age[i,1] equaled “NA” for the augmented individuals, making the index evaluate to p[1, NA+1]. At the time I posted my question, the code only worked if time 1 age data were specified numerically for all individuals, including the augmented ones. 

 

I previously tried providing numerical inits for all of the entries where I wanted to have “NA” data values, but it did not work. However, since you suggested it again, I tried it again, and this time it did work—which means something else was wrong the last time I tried it. I am taking the next online course y’all are offering, so hopefully that will help me become less reliant on this forum. 

 

I see what you are saying about inefficient sampling for this kind of data augmentation—thank you for taking the extra time to mention this! On my to-do list is combining w with the betas, which I think will alleviate the particular problem you are referencing (although please correct me if I am missing something). Next is marginalizing over individuals so I am looping through unique capture histories, not unique individuals. After that is deciding if I need to abandon data augmentation. As you can tell, however, I am taking baby steps….but if there are any written resources you can direct me to for making this more efficient, that would be outstanding. 

 

Thank you again.  

Rebecca 





Perry de Valpine

Oct 26, 2022, 11:33:29 AM
to Taylor, Rebecca L, Daniel Gibson, nimble-users
Hi Rebecca,

Very glad to hear you worked it out.  I know that feeling of having something not work, and then somehow work when someone else tries or suggests it, because something else must have changed while you were working on it.

Your ideas sound good.  A place where some serious effort has gone into reducing computational cost of augmented individuals is in nimbleSCR, as reported in Turek et al. (2021): https://doi.org/10.1002/ecs2.3385

In spatial capture-recapture (SCR), even when augmented individuals are "not in the model", their activity centers will be sampled by MCMC, and ensuing calculations of distances to all detectors and detection probabilities will be done, which are potentially quite costly (time-consuming).  A method to reduce this was to put those calculations into nimbleFunction(s) (which we did for other reasons anyway) and then also accept the indicator variable as an argument to the nimbleFunction(s).  Then the nimbleFunction(s) can know if an individual is not currently in the model and if so can skip to much faster processing steps, essentially skipping some costly calculations.  I hope that makes sense.
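A bare-bones sketch of that pattern (hypothetical names, simplified down to a vector of per-occasion detection probabilities; the real nimbleSCR functions do much more) might look like:

```r
library(nimble)

## User-defined distribution that receives the inclusion indicator w and
## short-circuits the expensive likelihood when the individual is "off".
dDetectionSketch <- nimbleFunction(
  run = function(x       = double(1),   # 0/1 detection history
                 detProb = double(1),   # per-occasion detection probabilities
                 w       = double(0),   # inclusion indicator for this individual
                 log     = integer(0, default = 0)) {
    returnType(double(0))
    if (w == 0) {
      ## Not currently in the model: its (all-zero) history has probability 1,
      ## so skip the per-occasion calculations entirely.
      if (log) return(0) else return(1)
    }
    logL <- 0
    for (t in 1:length(x)) {
      logL <- logL + dbinom(x[t], size = 1, prob = detProb[t], log = TRUE)
    }
    if (log) return(logL) else return(exp(logL))
  }
)
```

In model code, something like y[i,1:K] ~ dDetectionSketch(detProb[i,1:K], w[i]) then lets the likelihood skip the costly part whenever w[i] == 0 (a matching rDetectionSketch simulation function, needed if any sampler simulates from the distribution, is omitted here).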

Good luck with it.

Perry
 

Taylor, Rebecca L

Oct 26, 2022, 7:35:28 PM
to Perry de Valpine, Daniel Gibson, nimble-users
That does make sense! Thank you so much, Perry.

Rebecca

