simulate data with multistate CR nimble model

26 views
Skip to first unread message

Marwan Naciri

unread,
Dec 17, 2025, 12:19:57 PM12/17/25
to nimble-users
Hi nimble team,

I’m trying to simulate CR (and ultimately IPM) data from a nimble. While the CJS model worked fine, for some reason, I can’t get the multistate model to work: I keep getting multiple rows of the following warnings

[Warning] Dynamic index out of bounds: dcat(prob = S[z[i, t_minus_1], i, t_minus_1, 1:3])

[Warning] Dynamic index out of bounds: dcat(prob = E[z[i, t], i, t_minus_1, 1:3])

(with S and E being my transition and observation matrix, respectively).

I am fairly certain that my indexing is fine, and that I have provided all constants used in the loops (the nimble model code works fine when fitted to data simulated using R code). I checked whether I would be able to simulate data using the nimble model despite the warnings, but unsurprisingly, it doesn’t work: I get NA on the occasion of first capture, and NaN on all the following occasions in both the matrix of capture histories and the matix of latent states. I imagine I should specify the latent states at first capture or something along these lines, but I’m not sure how. What am I missing?

The code is attached.

Many thanks in advance,

Marwan


reproducible_example.R

Chris Paciorek

unread,
Dec 22, 2025, 11:08:54 AM12/22/25
to Marwan Naciri, nimble-users
hi Marwan,

Those messages are appearing as you build the model and set data and inits because the z and y values are all NA, as you know.

Then when you call `simulate`, one problem is that you need `includeData=TRUE` to force simulation into nodes that are flagged as data nodes.

But that won't solve the problem either because some of the z's are being set based on some of the y's, and you don't initialize those y's, and they are not in `dependencies`. E.g., nimble can't simulate `z[1,1]` based on `y[1,1]` because `y[1,1]` is only on the right-hand side of a model declaration and you don't initialize it anywhere.

If you make sure to initialize the y's that correspond to "first" and make sure those y's are in `dependencies`, then I think things will work.

-chris

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/nimble-users/c5844d43-c19e-4596-95b3-3b22db79113bn%40googlegroups.com.

Marwan Naciri

unread,
Jan 6, 2026, 10:44:47 AM (5 days ago) Jan 6
to nimble-users

Hi Chris,

Many thanks for your answer! I have now (i) added `includeData=TRUE` when calling  ` simulate `, (ii) added `“y” ` in `dependencies`, and (iii) added initial value of y (on “first” only).

This indeed got rid of the warning messages, and the z’s are now simulated correctly. However, the y’s on occasions following first capture are not simulated, in spite of `includeData=TRUE`(in fact I get the same result as when `includeData=F`). Any idea why?

The updated code is attached  

As always, thanks a lot for taking time to help,

Marwan

reproducible_example.R

Perry de Valpine

unread,
Jan 6, 2026, 4:57:22 PM (5 days ago) Jan 6
to Marwan Naciri, nimble-users
Hi Marwan,
Thanks for the follow-up. Here's the curious situation I see.
When setting data (via nimbleModel or later via the setData method of a model object), any nodes within a variable with NA values are *not set as data*. In your case, all of y is being set to NA via setData, so none of y is actually marked as data. This can be checked with

multistate_model$isData("y") # all FALSE!

Structurally, the y nodes look to nimble like posterior predictive nodes, so that is how they are labeled. This can be seen by

multistate_model$getNodeNames(predictiveOnly=TRUE) # includes all the y nodes!

By default, posterior predictive nodes are not included in the results of model$getDependencies, because they are often not wanted for purposes for which dependencies are used. So your set of simulation nodes does not actually include any nodes in the variable y. This can be seen by

length(grep("^y", "nodesToSim")) # returns 0!

I can see two ways you can do what you want.
Option 1:
Set the elements of y to some value like "1" instead of "NA", so they are actually marked as data. This is probably what you want for later purposes anyway. Those "1" values will not be used once you simulate over them. And since y nodes will now be marked as data, they should appear in the result of getDependencies (but please check).
Option 2:
Include the argument "includePredictive=TRUE" when you call getDependencies. This should include posterior predictive nodes like the y nodes. But if you proceed to do something like MCMC, you'll want them marked as data, in which case you should use Option 1.

I know it can seem hard to find one's way around the methods available in a model object. They are documented in help(modelBaseClass), which is linked from help(nimbleModel).

Also note you can do the simulation step on the uncompiled model if that is easier while you are experimenting to get things right.

HTH
Perry


Marwan Naciri

unread,
Jan 8, 2026, 9:33:37 AM (3 days ago) Jan 8
to nimble-users

Hi Perry,

Thank you for your answer. I have tried both options but none seem to work (unless I did something wrong). Whether or not non-NA values are set for all nodes of y, and whether or not I specify "includePredictive=TRUE" when I call getDependencies, the y’s never appear in the result of getDependencies, and predictably are not simulated when I call simulate.

One thing I noticed is that the z’s are simulated even though they are flagged as posterior predictive nodes (whether or not I specify "includePredictive=TRUE" when I call getDependencies).

Any idea what is going wrong for the y’s?

The reproducible example is attached.

Many thanks,

Marwan

reproducible_example.R

Perry de Valpine

unread,
Jan 8, 2026, 12:22:29 PM (3 days ago) Jan 8
to Marwan Naciri, nimble-users
Hi Marwan,
Thanks for the follow-up.
I see that I gave you a piece of code with a typo.

length(grep("^y", "nodesToSim"))
should have been
length(grep("^y", nodesToSim))

The first version would look for a string beginning with "y" in the string "nodesToSim", while the second would look in the contents of nodesToSim, which is what you want. I must have manually mis-typed it instead of copy-pasting it.

Both fixes can now be seen to work. You do not want to include "y" in your vector called "dependencies" (perhaps better named "parameters"?) because then it will be excluded from the result of multistate_model$getDependencies since you have self=FALSE.

Please let me know if you're good to go now. This looks like a case where the practice of reducing the model size might have aided in debugging. i.e. if n_ind and n_occasions were toy-sized, then the nodesToSim would be easier to inspect instead of being of order 10000s.

HTH
Perry


Marwan Naciri

unread,
Jan 9, 2026, 11:55:20 AM (2 days ago) Jan 9
to nimble-users

Hi Perry,

Thank you for your answer.

Ah I should have seen the typo, I  was too quick.

Indeed, now the y’s appear in the result of getDependencies. However, they are still not simulated and now even the z’s are not simulated anymore. Including “y” back in the ‘dependencies’ vector (now renamed ‘parameters’ as you suggested) seems to ensure that the z’s are simulated (though the pre-defined y values are still not simulated over).

Do you see what is wrong (the code is below, and  the model is toy-sized now)? Sorry for the hassle.

Many thanks,

Marwan

--------------------------------------------------------

library(nimble)

multistate_code <- nimbleCode({
  # Priors and constraints
  for (i in 1:n_ind){
    for (t in 1:(n_occasions-1)){
      phi[i, t] <- mu_phi
      beta[i, t] <- mu_beta
      p[i, t] <- mu_p
    } #t
  } #i
  mu_phi ~ dunif(0, 1)     # Prior for mean survival prob
  mu_beta ~ dunif(0, 1)     # Prior for mean breeding prob
  mu_p ~ dunif(0, 1)     # Prior for mean resighting prob
  
  # Define state-transition and observation matrices
  for (i in 1:n_ind){  
    # Define probabilities of state S(t+1) given S(t)
    for (t in first[i]:(n_occasions-1)){
      S[1, i, t, 1] <- phi[i, t]*(1-beta[i, t])
      S[1, i, t, 2] <- phi[i, t]*beta[i, t]
      S[1, i, t, 3] <- 1-phi[i, t]
      
      S[2, i, t, 1] <- phi[i, t]*(1-beta[i, t])
      S[2, i, t, 2] <- phi[i, t]*beta[i, t]
      S[2, i, t, 3] <- 1-phi[i, t]
      
      S[3, i, t, 1] <- 0
      S[3, i, t, 2] <- 0
      S[3, i, t, 3] <- 1
      
      # Define probabilities of O(t) given S(t)
      E[1, i, t, 1] <- p[i, t]
      E[1, i, t, 2] <- 0
      E[1, i, t, 3] <- 1-p[i, t]
      
      E[2, i, t, 1] <- 0
      E[2, i, t, 2] <- p[i, t]
      E[2, i, t, 3] <- 1-p[i, t]
      
      E[3, i, t, 1] <- 0
      E[3, i, t, 2] <- 0
      E[3, i, t, 3] <- 1
    } #t
  } #i
  
  # Likelihood 
  for (i in 1:n_ind){
    # Define latent state at first capture
    z[i, first[i]] <- y[i, first[i]]
    for (t in (first[i]+1):n_occasions){
      
      # State process: draw S(t) given S(t-1)
      z[i, t] ~ dcat(S[z[i, t-1], i, t-1, 1:3])
      
      # Observation process: draw O(t) given S(t)
      y[i, t] ~ dcat(E[z[i, t], i, t-1, 1:3])
    } #t
  } #i
})


# ~ 2. Define parameters -------------------------------------------------------

{
  n_occasions <- 3
  n_marked <- 3 
  n_ind <- n_marked*n_occasions - n_marked
  mu_phi <- 0.7
  mu_beta <- 0.6
  mu_p <- 0.4
}

# Bundle constants for the model
constants <- list(n_ind = n_ind,
                  n_occasions = n_occasions,
                  first = rep(1:(n_occasions-1), each = n_marked)
)


# Build the model
multistate_model  <-  nimbleModel(multistate_code,
                                  constants = constants)

# ~ 3. Simulate the data -------------------------------------------------------

# First option from Perry ++++++++++++++++++++++++++++++++++++++++++++++++++++++

# Initialize y on the first capture, put 1 on all following occasions
y <- matrix(data = NA, nrow = n_ind, ncol = n_occasions)
first <- rep(1:(n_occasions-1), each = n_marked)
for (i in 1:n_ind) {
  y[i, first[i]] <- sample(x = 1:2, size = 1, prob = c(0.4, 0.6))
  y[i, (first[i]+1):n_occasions] <- 1
}
# Set y as data
multistate_model$setData(list(y = y)) 
multistate_model$isData("y")  # y is marked as data

# Set initial values
initial_values <- list(mu_phi = mu_phi,
                       mu_beta = mu_beta,
                       mu_p = mu_p)
multistate_model$setInits(initial_values)



parameters <- c("mu_phi", "mu_beta", "mu_p", "y") 
nodesToSim <- multistate_model$getDependencies(parameters,
                                               self = F, 
                                               downstream = T)

# Compile the model 
c_multistate_model <- compileNimble(multistate_model) 

# simulate
list.simul <- list()
# simulate 2 datasets a
for (i in 1:2) {
  set.seed(i)
  c_multistate_model$simulate(nodes = nodesToSim,
                              includeData = T)  # forced simulation for nodes flagged as data
  list.simul[[i]] <- list(y = c_multistate_model$y,
                          z = c_multistate_model$z)
  print(i)  
}

list.simul[[1]]$y
list.simul[[1]]$z

-----------------------------------------------------

Perry de Valpine

unread,
Jan 9, 2026, 1:08:32 PM (2 days ago) Jan 9
to Marwan Naciri, nimble-users
Hi Marwan,
I think this next level of issue is coming from the "first" cases.
For reference:

> first
[1] 1 1 1 2 2 2
(makes sense)

When I do:

> parameters <- c("mu_phi", "mu_beta", "mu_p")

> nodesToSim <- multistate_model$getDependencies(parameters,
                                               self = F,
                                               downstream = T)
> nodesToSim
  [1] "phi[1, 1]"     "phi[1, 2]"     "phi[2, 1]"     "phi[2, 2]"     "phi[3, 1]"     "phi[3, 2]"     "phi[4, 1]"    
  [8] "phi[4, 2]"     "phi[5, 1]"     "phi[5, 2]"     "phi[6, 1]"     "phi[6, 2]"     "beta[1, 1]"    "beta[1, 2]"  
 [15] "beta[2, 1]"    "beta[2, 2]"    "beta[3, 1]"    "beta[3, 2]"    "beta[4, 1]"    "beta[4, 2]"    "beta[5, 1]"  
 [22] "beta[5, 2]"    "beta[6, 1]"    "beta[6, 2]"    "p[1, 1]"       "p[1, 2]"       "p[2, 1]"       "p[2, 2]"      
 [29] "p[3, 1]"       "p[3, 2]"       "p[4, 1]"       "p[4, 2]"       "p[5, 1]"       "p[5, 2]"       "p[6, 1]"      
 [36] "p[6, 2]"       "S[1, 1, 1, 3]" "S[2, 1, 1, 3]" "S[1, 1, 2, 3]" "S[2, 1, 2, 3]" "S[1, 2, 1, 3]" "S[2, 2, 1, 3]"
 [43] "S[1, 2, 2, 3]" "S[2, 2, 2, 3]" "S[1, 3, 1, 3]" "S[2, 3, 1, 3]" "S[1, 3, 2, 3]" "S[2, 3, 2, 3]" "S[1, 4, 2, 3]"
 [50] "S[2, 4, 2, 3]" "S[1, 5, 2, 3]" "S[2, 5, 2, 3]" "S[1, 6, 2, 3]" "S[2, 6, 2, 3]" "S[1, 1, 1, 1]" "S[1, 1, 1, 2]"
 [57] "S[2, 1, 1, 1]" "S[2, 1, 1, 2]" "S[1, 1, 2, 1]" "S[1, 1, 2, 2]" "S[2, 1, 2, 1]" "S[2, 1, 2, 2]" "S[1, 2, 1, 1]"
 [64] "S[1, 2, 1, 2]" "S[2, 2, 1, 1]" "S[2, 2, 1, 2]" "S[1, 2, 2, 1]" "S[1, 2, 2, 2]" "S[2, 2, 2, 1]" "S[2, 2, 2, 2]"
 [71] "S[1, 3, 1, 1]" "S[1, 3, 1, 2]" "S[2, 3, 1, 1]" "S[2, 3, 1, 2]" "S[1, 3, 2, 1]" "S[1, 3, 2, 2]" "S[2, 3, 2, 1]"
 [78] "S[2, 3, 2, 2]" "S[1, 4, 2, 1]" "S[1, 4, 2, 2]" "S[2, 4, 2, 1]" "S[2, 4, 2, 2]" "S[1, 5, 2, 1]" "S[1, 5, 2, 2]"
 [85] "S[2, 5, 2, 1]" "S[2, 5, 2, 2]" "S[1, 6, 2, 1]" "S[1, 6, 2, 2]" "S[2, 6, 2, 1]" "S[2, 6, 2, 2]" "E[1, 1, 1, 1]"
 [92] "E[1, 1, 1, 3]" "E[2, 1, 1, 2]" "E[2, 1, 1, 3]" "E[1, 1, 2, 1]" "E[1, 1, 2, 3]" "E[2, 1, 2, 2]" "E[2, 1, 2, 3]"
 [99] "E[1, 2, 1, 1]" "E[1, 2, 1, 3]" "E[2, 2, 1, 2]" "E[2, 2, 1, 3]" "E[1, 2, 2, 1]" "E[1, 2, 2, 3]" "E[2, 2, 2, 2]"
[106] "E[2, 2, 2, 3]" "E[1, 3, 1, 1]" "E[1, 3, 1, 3]" "E[2, 3, 1, 2]" "E[2, 3, 1, 3]" "E[1, 3, 2, 1]" "E[1, 3, 2, 3]"
[113] "E[2, 3, 2, 2]" "E[2, 3, 2, 3]" "E[1, 4, 2, 1]" "E[1, 4, 2, 3]" "E[2, 4, 2, 2]" "E[2, 4, 2, 3]" "E[1, 5, 2, 1]"
[120] "E[1, 5, 2, 3]" "E[2, 5, 2, 2]" "E[2, 5, 2, 3]" "E[1, 6, 2, 1]" "E[1, 6, 2, 3]" "E[2, 6, 2, 2]" "E[2, 6, 2, 3]"
[127] "z[1, 2]"       "z[2, 2]"       "z[3, 2]"       "z[4, 3]"       "z[5, 3]"       "z[6, 3]"       "z[1, 3]"      
[134] "y[1, 2]"       "z[2, 3]"       "y[2, 2]"       "z[3, 3]"       "y[3, 2]"       "y[4, 3]"       "y[5, 3]"      
[141] "y[6, 3]"       "y[1, 3]"       "y[2, 3]"       "y[3, 3]"   

I see that this includes only the elements of z that come *after* z[i, first[i]]. That makes sense because the model declares:
z[i, first[i]] <- y[i, first[i]]

So the z[i, first[i]] elements do not depend on mu_phi, mu_beta, or mu_p.

When you include "y" in the "parameters", you are getting the z[i, first[i] ] elements as dependents because they depend on elements of y, but you are also excluding y from the dependents because you have self=FALSE (to exclude mu_phi, mu_beta, and mu_p, which you don't want).

In addition, I see that z is all NA to start out:

> multistate_model$z
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA
[4,]   NA   NA   NA
[5,]   NA   NA   NA
[6,]   NA   NA   NA

Therefore when you call multistate_model$simulate(nodesToSim, includeData=TRUE), you are getting garbage because the z[i, first[i]] nodes have no value and are not included in what to simulate (which means calculate for deterministic nodes).

Here are a variety of ways to solve the problem:

1. Provide data when you call nimbleModel. By default, calculate=TRUE in nimbleModel, which means that after assigning any provided initial values and data, there will be a call to calculate on the entire model. That will include the "z[i, first[i]] <- y[i, first[i]]" relationships. This needs y[i, first[i] ] values if that declaration is going to populate the z[i, first[i] ] with meaningful values. Example:

multistate_model  <-  nimbleModel(multistate_code,
                                  constants = constants,
                                  data = list(y = y))
multistate_model$setInits(initial_values) # also could do this in nimbleModel
parameters <- c("mu_phi", "mu_beta", "mu_p")

nodesToSim <- multistate_model$getDependencies(parameters,
                                               self = F,
                                               downstream = T)
multistate_model$simulate(nodesToSim, includeData=TRUE)
multistate_model$z
multistate_model$y
     [,1] [,2] [,3]
[1,]    1    3    3
[2,]    2    2    3
[3,]    2    3    3
[4,]   NA    2    1
[5,]   NA    1    2
[6,]   NA    2    3
> multistate_model$y
     [,1] [,2] [,3]
[1,]    1    3    3
[2,]    2    3    3
[3,]    2    3    3
[4,]   NA    2    3
[5,]   NA    1    3
[6,]   NA    2    3
Looks like that worked

2. Build the model as you were but call calculate again after setting the initial values and data, so that the
"z[i, first[i]] <- y[i, first[i]]" relationships will populate z[i, first[i]] with meaningful values. Example

multistate_model  <-  nimbleModel(multistate_code,
                                  constants = constants)
multistate_model$setData(list(y = y))
multistate_model$setInits(initial_values)
multistate_model$calculate()
parameters <- c("mu_phi", "mu_beta", "mu_p")

nodesToSim <- multistate_model$getDependencies(parameters,
                                               self = F,
                                               downstream = T)
multistate_model$simulate(nodesToSim, includeData=TRUE)
> multistate_model$z
     [,1] [,2] [,3]
[1,]    1    2    1
[2,]    2    3    3
[3,]    2    2    3
[4,]   NA    2    3
[5,]   NA    1    2
[6,]   NA    2    2
> multistate_model$y
     [,1] [,2] [,3]
[1,]    1    3    3
[2,]    2    3    3
[3,]    2    2    3
[4,]   NA    2    3
[5,]   NA    1    3
[6,]   NA    2    2
Looks like that worked

3. Provide the z[i, first[i]] values as data or inits, in which case you don't need the model declaration for them. I'll leave that to you if you want to set it up.

4. Include y[i, first[i] ] in the parameters from which you will getDependencies

multistate_model  <-  nimbleModel(multistate_code,
                                  constants = constants)
multistate_model$setData(list(y = y))
multistate_model$setInits(initial_values)
multistate_model$calculate()
parameters <- c("mu_phi", "mu_beta", "mu_p")
for(i in 1:n_ind) {
  parameters <- c(parameters, paste0("y[", i, ", ", first[i], "]"))

}
nodesToSim <- multistate_model$getDependencies(parameters,
                                               self = F,
                                               downstream = T)
multistate_model$simulate(nodesToSim, includeData=TRUE)
> multistate_model$z
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    1    3
[3,]    2    2    3
[4,]   NA    2    2
[5,]   NA    1    2
[6,]   NA    2    2
> multistate_model$y
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    3    3
[3,]    2    2    3
[4,]   NA    2    2
[5,]   NA    1    2
[6,]   NA    2    3
Looks like that worked.
A safer way to create the y[i, first[i]] node names (e.g. "y[1, 1]") is to put the result of the paste0 as the first argument to multistate_model$expandNodeNames, which will format the name string correctly if needed.

5. Manually add all "z" elements to what should be simulated since you can just see it is necessary. (As a general rule this kind of idea is less safe in terms of model-generic programming, but it seems practical in your case.)
multistate_model  <-  nimbleModel(multistate_code,
                                  constants = constants)
multistate_model$setData(list(y = y))
multistate_model$setInits(initial_values)
multistate_model$calculate()
parameters <- c("mu_phi", "mu_beta", "mu_p")

nodesToSim <- multistate_model$getDependencies(parameters,
                                               self = F,
                                               downstream = T)
nodesToSim <- c(nodesToSim, multistate_model$expandNodeNames("z")) |> unique() |> multistate_model$topologicallySortNodes()
multistate_model$simulate(nodesToSim, includeData=TRUE)

> multistate_model$z
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3
[3,]    2    3    3
[4,]   NA    2    3
[5,]   NA    1    1
[6,]   NA    2    2
> multistate_model$y
     [,1] [,2] [,3]
[1,]    1    1    3
[2,]    2    3    3
[3,]    2    3    3
[4,]   NA    2    3
[5,]   NA    1    1
[6,]   NA    2    3
Looks like that worked.

I hope something among those solutions makes sense and will work for you (and I hope I didn't make any typos this time).



Perry de Valpine

unread,
Jan 9, 2026, 1:18:31 PM (2 days ago) Jan 9
to Marwan Naciri, nimble-users
P.S. Since this has evidently been confusing and other readers may be interested, I'd like to offer this piece of code as a suggestion to track exactly what is going on during a call to model$simulate

for(node in nodesToSim) {
  cat("About to simulate ", node, ". Current value is ", multistate_model[[node]], "\n")
  multistate_model$simulate(node)
  cat("After simulating ", node, ", new value is ", multistate_model[[node]], "\nPlease inspect model and hit 'c <Enter>' to continue.\n")
  browser()
}

Potentially this can allow one to see exactly what is missing or where a sequence of simulation steps goes off the rails with NA or NaN values.

HTH
Perry

Reply all
Reply to author
Forward
0 new messages