Categorical sampler and z-matrix outputs

Skip to first unread message

Robin Maritz

Apr 7, 2023, 6:12:12 AM4/7/23
to nimble-users
Hi NIMBLE community,

I am trying to understand the outputs of my multistate capture-recapture model, and I feel I have exhausted all other troubleshooting options. I hope you can assist!

In summary, I had been trying to troubleshoot oddly low population size estimates when I became confused by the categorical sampling of the state process matrix. When I monitor the underlying deterministic parameters and calculate out the probabilities per iteration of sampling (i.e. essentially build the state matrix for each iteration), it appears that certain transitions should be very likely (e.g. 2>2). However, the monitored z-matrix does not suggest this pattern and rather state 7 is the more likely state in most instances. There are no issues in the z-matrix for individuals that have known states assigned the question relates to instances where sampling occurs. 

What I am trying to understand is how the sampling process could produce a z-matrix that is so different from what the calculated probability matrix suggest. The sampling never behaves aberrantly (i.e. unallowed transitions (e.g. 2>1) never occur) so I don't think it is some sort of indexing or referencing issue.  Is there something obvious that I am missing here? 

For context, the purpose of the model is to estimate abundance of a species across N-occasions while accounting for subpopulation-, time-, and individual-dependent covariates. It becomes quite complex and lengthy so I don't want to burden you with detail on the data, model structure and initialization. The important components I think are the following:

1) nind = N (real individuals) + N (pseudo-individuals) # data augmentation with zeros
2) all individuals start in latent state 1 within occasion 1 (dummy occasion)
3) a z.init matrix is provided which has all unknown states as 2 (assume still alive) for real individuals and 1 (not yet entered) for pseudo-individuals 
4) inputted data are y (observation matrix); z (state matrix, with NAs for unknown states)
5) Not all state transitions are possible and probabilities are outlined below along with the likelihood portion of the nimbleCode. 

# -------------------------------------------------------------------
for (i in 1:nind){
    # Define probabilities of state S(t+1) given S(t)
    for (t in 1:(n.occasions-1)){
      ps[1,i,t,1] <- (1-gamma[t]) * (1-intro[t])
      ps[1,i,t,2] <- gamma[t]
      ps[1,i,t,3] <- intro[t] * (1-gamma[t])
      ps[1,i,t,4] <- 0
      ps[1,i,t,5] <- 0
      ps[1,i,t,6] <- 0
      ps[1,i,t,7] <- 0
      ps[2,i,t,1] <- 0
      ps[2,i,t,2] <- s[i,t] * (1-tran[t])
      ps[2,i,t,3] <- 0
      ps[2,i,t,4] <- (1-s[i,t]) * pp[t] * r[i,t]
      ps[2,i,t,5] <- (1-s[i,t]) * (1-pp[t]) * r[i,t]
      ps[2,i,t,6] <- s[i,t] * tran[t]
      ps[2,i,t,7] <- (1-s[i,t]) * (1-r[i,t])
      ps[3,i,t,1] <- 0
      ps[3,i,t,2] <- 1
      ps[3,i,t,3] <- 0
      ps[3,i,t,4] <- 0
      ps[3,i,t,5] <- 0
      ps[3,i,t,6] <- 0
      ps[3,i,t,7] <- 0
      ps[4,i,t,1] <- 0
      ps[4,i,t,2] <- 0
      ps[4,i,t,3] <- 0
      ps[4,i,t,4] <- 0
      ps[4,i,t,5] <- 0
      ps[4,i,t,6] <- 0
      ps[4,i,t,7] <- 1
      ps[5,i,t,1] <- 0
      ps[5,i,t,2] <- 0
      ps[5,i,t,3] <- 0
      ps[5,i,t,4] <- 0
      ps[5,i,t,5] <- 0
      ps[5,i,t,6] <- 0
      ps[5,i,t,7] <- 1
      ps[6,i,t,1] <- 0
      ps[6,i,t,2] <- 0
      ps[6,i,t,3] <- 0
      ps[6,i,t,4] <- 0
      ps[6,i,t,5] <- 0
      ps[6,i,t,6] <- 0
      ps[6,i,t,7] <- 1
      ps[7,i,t,1] <- 0
      ps[7,i,t,2] <- 0
      ps[7,i,t,3] <- 0
      ps[7,i,t,4] <- 0
      ps[7,i,t,5] <- 0
      ps[7,i,t,6] <- 0
      ps[7,i,t,7] <- 1

      # Define probabilities of O(t) given S(t)
      po[1,i,t,1] <- 0
      po[1,i,t,2] <- 0
      po[1,i,t,3] <- 0
      po[1,i,t,4] <- 0
      po[1,i,t,5] <- 0
      po[1,i,t,6] <- 1
      po[2,i,t,1] <- p[i,t]
      po[2,i,t,2] <- 0
      po[2,i,t,3] <- 0
      po[2,i,t,4] <- 0
      po[2,i,t,5] <- 0
      po[2,i,t,6] <- 1-p[i,t]
      po[3,i,t,1] <- 0
      po[3,i,t,2] <- 1
      po[3,i,t,3] <- 0
      po[3,i,t,4] <- 0
      po[3,i,t,5] <- 0
      po[3,i,t,6] <- 0
      po[4,i,t,1] <- 0
      po[4,i,t,2] <- 0
      po[4,i,t,3] <- 1
      po[4,i,t,4] <- 0
      po[4,i,t,5] <- 0
      po[4,i,t,6] <- 0
      po[5,i,t,1] <- 0
      po[5,i,t,2] <- 0
      po[5,i,t,3] <- 0
      po[5,i,t,4] <- 1
      po[5,i,t,5] <- 0
      po[5,i,t,6] <- 0
      po[6,i,t,1] <- 0
      po[6,i,t,2] <- 0
      po[6,i,t,3] <- 0
      po[6,i,t,4] <- 0
      po[6,i,t,5] <- 1
      po[6,i,t,6] <- 0
      po[7,i,t,1] <- 0
      po[7,i,t,2] <- 0
      po[7,i,t,3] <- 0
      po[7,i,t,4] <- 0
      po[7,i,t,5] <- 0
      po[7,i,t,6] <- 1
    } #t
  } #i

  # Likelihood
  for (i in 1:nind){
    z[i,1] <- 1  # Define latent state at first capture
    for (t in 2:n.occasions){
      # State process: draw S(t) given S(t-1)
      z[i,t] ~ dcat(ps[z[i,t-1], i, t-1, 1:7])
      # Observation process: draw O(t) given S(t)
      y[i,t] ~ dcat(po[z[i,t], i, t-1, 1:6])
    } #t
  } #i

I am happy to provide a reproducible example if helpful. I just thought I would put the question out there first to see if there was something obvious that I was missing or if anyone has ever encountered something like this.

Thanks in advance,

Daniel Turek

Apr 7, 2023, 11:58:32 AM4/7/23
to Robin Maritz, nimble-users
Robin, good question.  The categorical sampler functions in a reasonably straightforward manner.  Say:

z[1] ~ dcat(prior)
z[2] ~ dcat( f(z[1]) )
y[1] ~ dcat( g(z[1] )

Then the (categorical) sampling of z[1] will evaluate the (proportional) conditional distribution: p( Z[1]   |   z[2], y[1] )

for all possible values of Z[1].  For the value Z[1] = a, this conditional distribution is proportional to:

p(z[2] | Z[1]=a) * p(y[1] | Z[1] = a),

where both terms come from the categorical distributions in the model.

This is a long-winded way of saying: the categorical sampler samples directly from the full conditional (posterior) distribution of each z[] term.  So, to explain the behavior you're seeing, despite what the elements of the transition matrix might suggest, the terms from the likelihood (the observed values of y), and the downstream latent z values (which themselves are informed by the downstream data) must suggest a different distribution for the latent states you're monitoring.

I have reasonable confidence that the categorical sampler is functioning correctly, for a few reasons including that it's not a new addition, and it's been tested in many different situations. That said, there's always a chance that something funny is going on with your model.  If you really want to get to the bottom of it, since you have the samples of the latent z's, you can recreate (calculate yourself) the exact (proportional) distribution that the categorical sampler is sampling from, and see what the actual (proportional) probabilities are.  This might be insightful regarding what's informing the latent state values, and why.

I hope this helps somewhat.  Maybe someone else will have a more practically helpful response.

Cheers, have a nice weekend,

You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
To view this discussion on the web visit

PierGianLuca Porta Mana

Apr 7, 2023, 4:51:22 PM4/7/23
to, Robin Maritz
I don't know if it can be related to Robin's issue, but I recently experienced some strange behaviour with the categorical sampler. It was outputting values that should have had strictly zero probability, as their prior probability was zero. I tried to understand what was happening; the problem seems to arise from extremely small values of the likelihood, leading to very small *unnormalized* posterior categorical probabilities.

Replacing the categorical sampler with the slice one solved the problem for me. Robin, would it be possible in your case to try that too?

I plan to report reproducible details within a couple of weeks. Nimble version is the latest.

>> <>
>> .

Heather Gaya

Apr 7, 2023, 4:51:26 PM4/7/23
to Robin Maritz, nimble-users
From a less technical perspective, do your priors for s[i,t] and r[i,t] seem reasonable? I assume you've been wise in your prior choice, but it may be of interest to monitor these parameters and run a prior predictive check. An easy way to do that is to only provide NAs instead of data for all y and z, and see what the model estimates. Without any data, it should simply return the joint prior distribution, allowing for a quick check of how all the distributions are interacting. 


Daniel Turek

Apr 7, 2023, 5:43:29 PM4/7/23
to PierGianLuca Porta Mana,, Robin Maritz
Luca, I would be very interested to take a look at a reproducible example demonstrating the categorical sampler misbehaving as you've described, if/whenever you're able to put one together.  Thanks for describing the situation, also.



May 5, 2023, 2:08:33 PM5/5/23
to Daniel Turek,
Hi Daniel,

On 230407 23:42, Daniel Turek wrote:
> Luca, I would be very interested to take a look at a reproducible example demonstrating the categorical sampler misbehaving as you've described, if/whenever you're able to put one together.  Thanks for describing the situation, also.

Very sorry for taking so long, was submerged with other work.

I'm enclosing a minimal code sample and data that present the strange behaviour of dcat:

• 'dcat_strange.R' has the code.

• 'testWbad.csv' has input values (data) that lead to the strange behaviour.

• 'testWgood.csv' has input values that don't lead to the strange behaviour.

Sorry for the cryptic names and strange values in the code, they come from a larger code implementing a finite mixture (discussed in some old posts here).

The problem and some context:

The integer variable 'Alphaindex' has zero prior probability for the values 1 to 3 and 8 to 11, and 0.25 for the values from 4 to 7. Its initial value is 6.

The sampler, however, outputs values (specifically Alphaindex=1) in the prior-forbidden range, when the 'W'-data from the file 'testWbad.csv' are used. The real problem is that strangely I don't receive any warnings about NaNs or similar.

The vector variable 'W' which appears as data in this code is actually a stochastic variable in the original code. The original code starts well, all 'Alphaindex' values stay in the allowed prior range at first. At some iteration, as 'W' goes from the 'testWgood' values to the 'testWbad' ones, I see 'Alphaindex' jumping from the allowed values to the forbidden ones. No warnings as this happens.

Maybe a better example code to show the problem would make 'W' transition from 'good' to 'bad' values at some iteration, to show the jump in 'Alphaindex', but I didn't know how to implement this in a nice way.

Let me know if anything is unclear!

> >> email to <>.
> >> < <>>
> >> .
> >>
> >
> --
> You received this message because you are subscribed to the Google Groups "nimble-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to <>.
> To view this discussion on the web visit <>.


May 5, 2023, 2:12:02 PM5/5/23
to Daniel Turek,
PS: I forgot to say: if I replace the (automatically assigned) categorical sampler for Alphaindex with the slice one, the problem does not appear.

Daniel Turek

May 5, 2023, 8:54:55 PM5/5/23
to PierGianLuca,
Luca, thanks for following up.

The short answer is that the 0 appearing in W (specifically, in element W[32], in testWbad.csv) is causing the problem.  With the 0 element of W, the density evaluation of W returns Inf, which throws the categorical sampler into nonsensical sampling results.

You can also note that something is off, and perhaps we're not in a valid starting location, by running:


and noting the result is Inf.  Also noting that when you use the testWgood.csv input file, this calculation returns a real number.

The operation of the density evaluation of ddirch() in the model does appear to be correct, which feels a little odd to me since it seems the support of the dirichlet distribution has elements in [0, 1] (inclusive), but if you look at the dirichlet PDF, it's not particularly well-defined when an element is exactly 0.

Also, I note the operation of nimble's ddirch density evaluation functions identically to that of LaplacesDemon::ddirichlet, and I don't think there's any issue there.

You mentioned that W is stochastic (non-fixed data) in the larger model.  I'm guessing that in the course of sampling, some of the alpha values are so small that when W is drawn, this element rounds down to 0 (which is just a numerical rounding issue).  There are ways you could work around this, by enforcing that values for W are at a minimum some small value (for example 1E-100, or something like that).  Approaches to doing this would depend on the larger model structure, and the samplers operating on it.  Anyway, that should avoid the problem you're seeing.

I hope this helps clear things up.  Keep at it,



May 6, 2023, 4:26:22 AM5/6/23
to Daniel Turek,
Thank you for the explanation, Daniel!

I understand the main problem. Two things still puzzle me:

1. The sampled Alphaindex value has a logprob=-Inf, or maybe a -Inf+Inf because of the effect of the W having zero value; in any case, there is a -Inf for sure somewhere. Yet there are no warnings about this.

2. If the slice sampler is used instead of the categorical one, it never outputs Alphaindex values that have zero prior probability – even if some W values are zero.

In the original larger algorithm I see that some W value can become 0 at some isolated iterations, yet the *slice* sampler always outputs legitimate values for Alphaindex. The final posterior results look reasonable despite the occasional zero W. With the categorical sample, instead, forbidden Alphaindex values are produced, and the posterior results do have some quirks – that's how I found out this strange behaviour. (I'm enclosing two plots showing the values of min(log(W)) and Alphaindex, just in case you're curious.)

But maybe this behaviour of the categorical sampler is actually saving me from some bad error or nonsense!

I'll try to investigate more and correct the zero-W behaviour, maybe using bounds as you suggested.

In any case, I leave this matter and its relevance to you – it is not a problem for me at this point, and there are probably higher priorities for making Nimble even better :) If in the future you need more data about this behaviour, just let me know.

Thank you again for your kind help!
> >>      >> email to <> < <>>.
> >>      >> To view this discussion on the web visit
> >>      >> <> < <>>
> >>      >> < <> < <>>>
> >>      >> .
> >>      >>
> >>      >
> >>
> >>     --     You received this message because you are subscribed to the Google Groups "nimble-users" group.
> >>     To unsubscribe from this group and stop receiving emails from it, send an email to <> < <>>.
> >>     To view this discussion on the web visit <> < <>>.
> >>

Daniel Turek

May 6, 2023, 12:39:46 PM5/6/23
to PierGianLuca,
Thank you for the follow up, Luca.  I'm glad you're all set moving forward, and I'll aim to investigate this further down the road to see just what is causing this discrepancy.

Thanks also for keeping in touch about any odd-looking results, and your clear explanations and detective work.


Reply all
Reply to author
0 new messages