Building HMMs with continuous observation densities in NIMBLE


Alexander Thomas

Mar 5, 2025, 6:31:01 PM
to nimble-users
Hi there,

I am looking to see if I can use NIMBLE to build an HMM that uses irregularly spaced observation times, which I would ideally like to model continuously, as per Rushing (2023) (https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.13902).

I was unable to find how to do this in the documentation. Is this possible? If so, is there any documentation available on how to do it in NIMBLE?

Thanks, 
Alex 

Perry de Valpine

Mar 5, 2025, 6:35:32 PM
to Alexander Thomas, nimble-users
Dear Alex,
It should be possible. You may need to say more unless someone else can respond who is already familiar with the paper you've linked. One way people build HMMs is with dcat distributions and discrete latent states. Another way is with the marginal HMM distributions in nimbleEcology. If you need to get inside how the marginal HMM probabilities are calculated, you can look at the nimbleFunctions in the nimbleEcology source code, which read like R (and can be run and debugged uncompiled in R). I hope that points you in a starting direction.
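
For concreteness, a minimal sketch of the dcat approach Perry mentions; the names init, trans, and obs and the sizes K and TT are hypothetical, and the probability matrices would be supplied as constants or given priors:

```R
library(nimble)

hmmSketch <- nimbleCode({
  z[1] ~ dcat(init[1:K])              # initial latent state
  y[1] ~ dcat(obs[z[1], 1:K])         # observation given the current state
  for (t in 2:TT) {
    z[t] ~ dcat(trans[z[t-1], 1:K])   # latent state transition
    y[t] ~ dcat(obs[z[t], 1:K])       # emission
  }
})
```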
Perry



Perry de Valpine

Mar 5, 2025, 6:44:35 PM
to Alexander Thomas, nimble-users
I meant also to point you to our collection of workshop training materials, in case you haven't found it: from r-nimble.org, follow the link to GitHub and then to the workshop materials. There are modules from several workshops with ecological content, including HMMs. HTH.
-Perry

Russell Almond

Mar 28, 2025, 8:27:11 PM
to nimble-users
The trick is to use an index for observation times (which could be different for each individual if you are working with panel data, as I am) and pass in a constant array of time differences to work with.  Here is the heart of a simple model for non-stationary Brownian motion.

```
brownianModelCode1 <- nimbleCode({
  for (i in 1:N) {
    theta[i, 1] ~ dnorm(mu_0, sd = sigma_0)
    for (it in 2:NT) {
      ## Variance grows linearly with the elapsed time deltaT.
      theta[i, it] ~ dnorm(theta[i, it - 1] + growthRate[i] * deltaT[i, it - 1],
                           sd = growthSD[i] * sqrt(deltaT[i, it - 1]))
    }
  }
  ## Emission model part omitted.
})
```
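
For concreteness, a hedged sketch of how the time differences might be passed in as constants when building the model; the dimensions and values here are hypothetical:

```R
library(nimble)

## Hypothetical sizes: 5 individuals, 4 occasions (so 3 time gaps each).
deltaT <- matrix(rexp(5 * 3, rate = 1), nrow = 5, ncol = 3)
brownianModel <- nimbleModel(
  brownianModelCode1,
  constants = list(N = 5, NT = 4, deltaT = deltaT, mu_0 = 0, sigma_0 = 1),
  inits = list(growthRate = rep(0.1, 5), growthSD = rep(1, 5))
)
```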

I'm not familiar with Rushing (2023), but I hope this gets you pointed in the right direction.

Devin Johnson - NOAA Federal

Jul 30, 2025, 2:45:04 PM
to nimble-users
I happened to stumble upon this thread, and I am familiar with the Rushing (2023) paper. The main problem in implementing those CTMC or Markov-modulated Poisson process models is that there is no matrix exponential function in NIMBLE. So you would also have to code that with a nimbleFunction, or augment your time with a sufficiently fine grid of times and use an Euler time-discretization approximation.
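
For concreteness, a small sketch of the Euler approach on a fine grid (the generator Q and step dt are toy values): over one step the transition probabilities are approximately I + Q*dt, and longer intervals are built up by repeated multiplication.

```R
Q  <- matrix(c(-0.3, 0.3, 0.1, -0.1), 2, 2, byrow = TRUE)  # toy CTMC generator
dt <- 0.01                                                  # fine grid step
P_step <- diag(2) + Q * dt     # Euler one-step transition matrix
P <- diag(2)                   # accumulate over an interval of length 1
for (k in 1:100) P <- P %*% P_step
round(P, 4)                    # approximates expm::expm(Q * 1)
```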

Paul vdb

Jul 30, 2025, 3:30:50 PM
to nimble-users
It is much more efficient to compute exp(A) %*% v than to take the entire matrix exponential. RTMB has a function called expAv, which I've recently written a version of in NIMBLE.

I'll try to explain how to use it here. Similar to Rushing (2023), where we have Q for a transition (generator) matrix and a diagonal observation rate matrix Lambda, I'll denote p0 as the vector of initial probabilities for each state, and f_i as the vector of observation probabilities for the ith observation, given each state. When the latent states are (alive, dead), f_i = c(1, 0) when the individual is observed. For observation times t_i and a study that ends at time T, the likelihood becomes

L = p0 %*% [ prod_{i=1}^{n} exp((Q - Lambda) * (t_i - t_{i-1})) %*% Lambda %*% f_i ] %*% exp((Q - Lambda) * (T - t_n)) %*% 1

The important thing to recognize is that v_i = Lambda %*% f_i is a vector. Then you see that the trick of never actually computing the matrix exponential, but only a matrix exponential times a vector, applies. For A_i = (Q - Lambda) * (t_i - t_{i-1}), we simply need to compute exp(A_i) %*% v_i. The final right-censored term is simply exp((Q - Lambda) * (T - t_n)) %*% 1. Thus, we can use this nice little trick to avoid full matrix exponentials.

I've written two versions: one that does not use sparse linear algebra but is plenty fast for a small number of states, and one that implements a basic version of sparse linear algebra, where the non-zero values are passed along with their row indices i and column indices j. Note that this function is something I wrote a few weeks ago, and I've not added all the bells and whistles, plus full testing, so make sure you check that it is working correctly for you (compare with expm in R). You can manually change rescale_freq and tol depending on what speed/tolerance in the approximation you are after, and check out RTMB::expAv for details on how it works. You will have to write a custom distribution function to use it, which you can read about in Chapter 12, "Creating user-defined distributions and functions for models", of the NIMBLE User Manual.

Let me know if there are any issues with this. I don't think we will get this into the upcoming release of NIMBLE, but hopefully it will be in the following release. If you have a small matrix exponential and just want to compute it, you can always use a `nimbleRcall` and call the matrix exponential from the expm package in your code, at a cost to efficiency.
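
For reference, a minimal sketch of that nimbleRcall route; the wrapper name expmWrap is mine, and it assumes the expm package is installed:

```R
library(nimble)

expmWrap <- function(A) as.matrix(expm::expm(A))  # plain R wrapper
Rexpm <- nimbleRcall(
  prototype = function(A = double(2)) {},
  returnType = double(2),
  Rfun = 'expmWrap'
)
## Rexpm(A) can then be used inside model code or nimbleFunctions,
## at the cost of calling back into R at every evaluation.
```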


```R
expmAvR <- nimbleFunction(
  run = function(A = double(2), v = double(2)){ ## Note here v is a matrix with a single column.
    N <- 100
    tol <- 1e-8
    rescale_freq <- 50
    C <- max(abs(diag(A)))                  ## uniformization constant
    diag(A) <- diag(A) + C                  ## shift so the series terms are nonnegative
    N <- qpois(tol, C, lower.tail = FALSE)  ## truncation point for the series
    ans <- v
    term <- v
    log_scale <- 0
    for (n in 1:N) {
      term <- A %*% term / n
      ans <- ans + term
      if (n %% rescale_freq == 0) {  ## rescale periodically to avoid numeric overflow
        s <- sum(abs(ans))
        term <- term / s
        ans <- ans / s
        log_scale <- log_scale + log(s)
      }
    }
    ans <- exp(-C + log_scale) * ans
    returnType(double(1))
    return(ans[,1])
  }
)
```

Sparse version:

```R
expmAvsR <- nimbleFunction(
  run = function(i = integer(1), j = integer(1), vals = double(1), v = double(1)){
    q <- length(v)
    N <- 100
    tol <- 1e-8
    rescale_freq <- 50
    C <- max(abs(vals))                     ## uniformization constant
    m <- length(i)                          ## number of non-zero entries
    vals[i == j] <- vals[i == j] + C        ## shift the diagonal entries
    N <- qpois(tol, C, lower.tail = FALSE)  ## truncation point for the series
    ans <- v
    term <- v
    log_scale <- 0
    for (n in 1:N) {
      tmp <- nimNumeric(length = q, value = 0)  ## accumulator for A %*% term
      for (k in 1:m) {
        tmp[i[k]] <- tmp[i[k]] + term[j[k]] * vals[k]  ## sparse mat-vec product
      }
      term <- tmp / n
      ans <- ans + term
      if (n %% rescale_freq == 0) {  ## rescale periodically to avoid numeric overflow
        s <- sum(abs(ans))
        term <- term / s
        ans <- ans / s
        log_scale <- log_scale + log(s)
      }
    }
    ans <- exp(-C + log_scale) * ans
    returnType(double(1))
    return(ans)
  }
)
```
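
For concreteness, a hedged usage sketch of the dense version above on a toy two-state (alive, dead) example; all rates here are hypothetical, and the comparison against expm::expm follows the check suggested above:

```R
Q      <- matrix(c(-0.2, 0.2, 0, 0), 2, 2, byrow = TRUE)  # toy mortality generator
Lambda <- diag(c(0.5, 0))                                  # detection rate while alive
dt     <- 1.3                                              # gap since last observation
A  <- (Q - Lambda) * dt
fi <- matrix(c(1, 0), ncol = 1)    # observation probabilities given each state
vi <- Lambda %*% fi                # v_i = Lambda %*% f_i

expmAvR(A, vi)                     # exp(A) %*% v_i via the nimbleFunction (uncompiled)
as.vector(expm::expm(A) %*% vi)    # reference: full matrix exponential
```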

Paul vdb

Jul 30, 2025, 5:40:34 PM
to nimble-users
To follow up, here is a worked example from Rushing (2023), based on Appendix S2.
Attachment: RushingExample.r

Russell Almond

Jul 31, 2025, 2:54:17 PM
to nimble-users
I found Sheldon Ross's _Introduction to Probability Models_ helpful here.

The trick is that $e^{Qt} = \lim_{n \rightarrow \infty} (I+ (t/n)Q)^n$.

So pick $n = 2^v$ and square repeatedly:

```
eQt <- function(Q, t, v = 10) {
  ## (I + (t/2^v) Q) squared v times equals (I + (t/n) Q)^n with n = 2^v.
  R <- diag(nrow(Q)) + t / 2^v * Q
  for (vv in 1:v)
    R <- R %*% R
  R
}
```
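
A quick sanity check against the expm package (assuming it is installed); the difference shrinks as v grows:

```R
Q <- matrix(c(-0.3, 0.3, 0.1, -0.1), 2, 2, byrow = TRUE)  # toy generator
max(abs(eQt(Q, 2.5, v = 12) - as.matrix(expm::expm(Q * 2.5))))
```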

Alexander Thomas

Aug 5, 2025, 1:59:39 AM
to nimble-users
Hi all, 

I greatly appreciate you taking the time to consider this problem. I will shortly begin implementing my model following your suggestions above and will let you know how I go.

I am hoping to use NIMBLE, but my application is outside capture-recapture data. The data are a large observational cohort of humans (~70k people) in which patients have been observed irregularly over a long period (an average of 4 visits per person, with a median interval of 1.2 years). The model I plan to implement contains 9 states and 15 transitions specifying the disease progression of patients.

I am using a CT-HMM with MCMC to model humans being infected with and recovering from hepatitis C. The reason I am using this type of model is to account for nuance in the data. We don't observe patients regularly, and a consequence of this is that participants are likely to become infected with the disease and then recover without our ever observing them as infected. To circumvent this, I am hoping to specify the model with misclassification of states, then use MCMC estimation of the CT-HMM likelihood to obtain estimates of the transitions between states. My hope is that this will account for changes in test frequency over time and variation in test frequency between patients, and allow for possible unobserved infections. There is some more complexity around the disease model and the potential states, which I won't bore you with; however, I remain interested in whether you still believe the above matrix exponentials can be applied to a model such as this.

Warm regards, 
Alex 

Russell Almond

Aug 5, 2025, 3:33:52 PM
to nimble-users
I too am working with panel data with irregular measurements, and I gave up on the NIMBLE particle filter because it would allow me only one particle for all the individuals in my sample, rather than creating particles within subjects.

I've started work on an R package which handles this differently. It is up at https://github.com/ralmond/pfem, although it is still very preliminary and lacks documentation. If you are interested in trying it, send me a private message and I can walk you through the code.

  --Russell

Paul vdb

Aug 20, 2025, 2:21:05 PM
to nimble-users

For future reference for anyone who comes to this thread: I've added a branch to NIMBLE that now has expmAv (expm(A) %*% v) and expmA (expm(A)) for convenience. Do use them with care and compare your estimates with expm::expm. expmA may be slow for large matrices, as we currently do not have sparse linear algebra. It follows an algorithm very similar to the "scaling and squaring" approach Russell mentioned above, but uses uniformization, which is described in Sherlock (2021) and is reasonably efficient (faster in some cases than expm::expm, and certainly faster than porting back and forth via nimbleRcall).
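
For reference, a hedged usage sketch; the argument order expmAv(A, v) is an assumption from the names above and is not confirmed here, so check the branch's documentation:

```R
## Assumes the development branch where expmAv()/expmA() exist;
## the signatures used here are assumptions, not confirmed.
Q  <- matrix(c(-0.3, 0.3, 0.1, -0.1), 2, 2, byrow = TRUE)  # toy generator
v  <- c(1, 0)
dt <- 2.5
expmAv(Q * dt, v)   # expm(Q * dt) %*% v without forming the full exponential
expmA(Q * dt)       # full matrix exponential; compare with expm::expm(Q * dt)
```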
