Non-homogeneous Poisson Process with thinning method

135 views
Skip to first unread message

Léa P

unread,
Oct 8, 2024, 5:31:02 AM10/8/24
to nimble-users
Hello NIMBLE users,

I am currently working on fitting a non-homogeneous Poisson process using a thinning method to account for covariate impact on the event rate (lambda). However, I am struggling with data formatting, as I have continuous-time data (the time of events for several repetitions, e.g. the time of bus arrivals at several bus stops).

I can store them within a nimbleList (see the code below), but I'm not sure whether I can use this format for model data. I do not have a maximum number of events per repetition, so I cannot really preallocate the length of the vectors, or turn this into a matrix. And I cannot really use a summary variable (e.g. the number of events per repetition), because it would probably not fit well to what I'm trying to fit. I'm new-ish to using bayesian inference and Nimble, so maybe I've missed an obvious solution here.

Here is a reproducible example to illustrate my issue, including a base R data simulation and an unfinished draft of my NIMBLE model code:

```R
library(nimble)
set.seed(123)

# Link between my covariate and lambda (the event rate)
nimCompute_lambda <- nimble::nimbleFunction(
  run = function(lambda_intercept = double(0),
                 my_covariate_coef = double(0),
                 my_covariate = double(1)) {
    return(lambda_intercept + exp(my_covariate) * my_covariate_coef)
    returnType(double(1))
  }
)

# Define a nimble function for linear interpolation for a covariate depending on the time
nimCov_linear_interpolation <- nimbleFunction(
  run = function(time_points = double(1), time_values = double(1), cov_values = double(1)) {
    returnType(double(0))
    nb_covariate_measures <- length(time_values)
    nb_time_points <- length(time_points)
    cov_interp <- numeric(nb_time_points)
    for (j in 1:nb_time_points) {
      time_point <- time_points[j]
      if (time_point <= time_values[1]) {
        cov_interp[j] <- cov_values[1]
      } else if (time_point >= time_values[nb_covariate_measures]) {
        cov_interp[j] <- cov_values[nb_covariate_measures]
      }
      for (i in 1:(nb_covariate_measures - 1)) {
        if (time_point >= time_values[i] & time_point <= time_values[i + 1]) {
          cov_interp[j] <- cov_values[i] +
            (time_point - time_values[i]) / (time_values[i + 1] - time_values[i]) *
            (cov_values[i + 1] - cov_values[i])
        }
      }
    }
    return(cov_interp)
  }
)

# Simulate data: the time of events --------------------------------------------
# With a non-homogeneous Poisson Process, using a thinning method
# (e.g. the time of buses arrivals at a bus stop and this depends on some covariate)

nb_repet <- 10 # Let's say I have 10 repetitions of this PP (e.g. 10 bus stops)

# Let's say I have covariate information every 2 hour for a day (so 13 covariate measures)
nb_covariate_measures <- 13
my_time <- seq(0, 24, length.out = nb_covariate_measures)

# I start by simulating covariates and the associated lambda
# (let's say here that I have only 1 covariate, and it's the same for every the bus stop)
my_covariate_values <- rnorm(nb_covariate_measures, mean = 0)
simul_lambda_intercept <- 0.1
simul_covariate_coef <- 1
my_lambda_values <- nimCompute_lambda(
  lambda_intercept = simul_lambda_intercept,
  my_covariate_coef = simul_covariate_coef,
  my_covariate = my_covariate_values
)

# The maximum lambda for oversampling events (with a margin, hence the * 1.1)
oversampling_lambda <- max(my_lambda_values) * 1.1

# I initialise my nimbleList to store the time of events
nimList_types <- vector("list", length = nb_repet)
for (i in 1:nb_repet) {
  nimList_types[[i]] <- nimble::nimbleType(name = as.character(i), type = "double", dim = 1)
}
nimList_def <- nimble::nimbleList(nimList_types)
nimList_data <- nimList_def$new()

# For each repetition, I'll run my non-homogeneous PP, and store the data in my nimbleList
for (i in 1:nb_repet) {
  # Homogeneous Poisson process with this oversampling_lambda
  oversampling_nb_events <- rpois(n = 1, lambda = oversampling_lambda)
  oversampling_event_time <- sort(runif(n = oversampling_nb_events, min = min(my_time), max = max(my_time)))
  # Association with a lambda: find the lambda at the time of events
  oversampling_expected_lambda <- nimCompute_lambda(
    lambda_intercept = simul_lambda_intercept,
    my_covariate_coef = simul_covariate_coef,
    my_covariate = nimCov_linear_interpolation(
      time_point = oversampling_event_time,
      time_values = my_time,
      cov_values = my_covariate_values
    )
  )
  # I get the ratio between the expected lambda and the max lambda (oversampling_lambda)
  oversampling_lambda_ratio <- oversampling_expected_lambda / oversampling_lambda
  # I make a Bernoulli draw for each point to see which points are kept
  oversampling_event_kept <- rbinom(n = oversampling_nb_events, size = 1, prob = oversampling_lambda_ratio)
  # I end up with those event times and store them in the nimbleList
  nimList_data[[as.character(i)]] <- oversampling_event_time[as.logical(oversampling_event_kept)]
}

nimList_data
# nimbleList object of type nfRefClass_52
# Field "1":
#   numeric(0)
# Field "2":
#   [1] 9.929384 9.949112
# Field "3":
#   [1] 10.61280 19.17420 20.58787
# Field "4":
#   [1] 6.585207 9.215271
# Field "5":
#   [1] 11.40760 17.04438
# Field "6":
#   [1]  4.201264  4.504587 15.762195 21.275257
# Field "7":
#   [1]  1.457294  3.415063  3.530273  9.856555 22.447195
# Field "8":
#   [1] 3.405766
# Field "9":
#   [1] 10.54636 18.87076
# Field "10":
#   numeric(0)


# Nimble model -----------------------------------------------------------------
# Now from this data, I'd like to estimate the link between lambda and my covariate.
# In other words, I want to estimate lambda_intercept and my_covariate_coef here.

my_nimble_code <- nimbleCode({
  # Priors
  lambda_intercept ~ dnorm(0, sd = 1)
  my_covariate_coef ~ dnorm(1, sd = 1)
 
  # Compute lambda with my data
  my_lambda_values <- nimCompute_lambda(
    lambda_intercept = lambda_intercept,
    my_covariate_coef = my_covariate_coef,
    my_covariate = my_covariate_values
  )
 
  # The maximum lambda for oversampling events (with a margin)
  oversampling_lambda <- max(my_lambda_values) * 1.1
 
  # For each repetition, I'll run my non-homogeneous PP
  for (i in 1:nb_repet) {
    # Homogeneous Poisson process with this oversampling_lambda
    oversampling_nb_events ~ dpois(lambda = oversampling_lambda)
    for (j in 1:oversampling_nb_events) {
      oversampling_event_time ~ dunif(min = min(my_time), max = max(my_time))
      # Association with a lambda: find the lambda at the time of this event
      oversampling_expected_lambda <- nimCompute_lambda(
        lambda_intercept = simul_lambda_intercept,
        my_covariate_coef = simul_covariate_coef,
        my_covariate = nimCov_linear_interpolation(
          time_point = oversampling_event_time,
          time_values = my_time,
          cov_values = my_covariate_values
        )
      )
      # I get the ratio between the expected lambda and the max lambda (oversampling_lambda)
      oversampling_lambda_ratio <- oversampling_expected_lambda / oversampling_lambda
      # I make a Bernoulli draw for each point to see which points are kept
      oversampling_event_kept ~ dbern(oversampling_lambda_ratio)
      # I keep this event only if I have 1 above. But how to use the nimbleList as my data here?
    }
  }
})
my_nimble_model <- nimbleModel(code = my_nimble_code)

```
reproducible_example_nimble_nonhomogeneous_pp.R

Perry de Valpine

unread,
Oct 8, 2024, 12:14:57 PM10/8/24
to Léa P, nimble-users
Hi Léa,

Thanks for the question. It looks like there may be some basics of nimble models that you'll need to get up and running here. When using a non-scalar in a model, you must always provide indexing brackets with an explicit index range (or a blank range if nimbleModel can figure it out from other parts of your model code or if you provide the dimensions argument, but I'd suggest explicit index ranges). For example, you would need:

my_lambda_values[1:N] <- nimCompute_lambda(

    lambda_intercept = lambda_intercept,
    my_covariate_coef = my_covariate_coef,
    my_covariate = my_covariate_values[1:N]
  )

The second thing I notice is that you can't re-use variables in model code because it is a declarative language rather than (what most people are used to in most programming) and imperative language. For example, if you need a value of oversampling_nb_events for each i, you would need to do something like

oversampling_nb_events[i] ~ dpois(lambda = oversampling_lambda)

Each declaration (line of model code) creates one or more nodes (vertices) in the directed acyclic graph (DAG) of the model. That's why you can't re-use a variable, A similar point applies to other declarations inside the for loops.

nimbleModel requires that you provide in the constants argument any constant values that are necessary to create a DAG representation from the model code. In this case, you would need to provide a constants list with at least nb_repet, oversampling_nb_events, and any values such as "N" that I gave above that you determine you need for vector or matrix index ranges.

A good way to build up a model when you are learning is to build it up piece by piece and see if it works at each step. For example, you could do only

my_nimble_code <- nimbleCode({
  # Priors
  lambda_intercept ~ dnorm(0, sd = 1)
  my_covariate_coef ~ dnorm(1, sd = 1)
 
  # Compute lambda with my data
  my_lambda_values[1:N] <- nimCompute_lambda(

    lambda_intercept = lambda_intercept,
    my_covariate_coef = my_covariate_coef,
    my_covariate = my_covariate_values[1:N]
  )
 }) 

my_nimble_model <- nimbleModel(code = my_nimble_code, 
   constants = list(N = 10)) # not sure what N should be

# you will get warnings and/or NAs unless you provide inits for lambda_intercept and my_covariate_intercept or do calculate=FALSE.

my_nimble_model$lambda_intercept <- 0
my_nimble_model$my_covariate_coef <- 1 # not sure what values are reasonable
my_nimble_model$calculate()
my_nimble_model$my_lambda_values # are they what they should be?

You can also check that things will compile:
c_my_nimble_model <- compileNimble(my_nimble_model)
c_my_nimble_model$calculate()
# etc...

And then try adding in the next piece.

I'm not offering feedback on the model structure from a statistical perspective (that is usually beyond what we do here), so I am not seeing exactly what you want to do. In general, a nimbleList is not something you can use in model code (it is more for nimbleFunctions), and I am not clear how you were intending to use it. When people have subsets of data with different extents (what I think might be your need), a common scheme is nested indexing. For example, you could create a vector of data extents and use it like this

for(i in 1:nb_repeat) {
 #...
  for(j in 1:oversampling_nb_events[i]) {
   some_matrix[i, 1:length_of_data[j] ] <- some_function(<arguments>)
  }
}

You seem to have one data extent of 0 (but I may not be following) and that would be better to remove in advance.

It is also possible for a nimbleFunction with setup code to retain member data involving information about different extents for different indices. This is a relatively new feature and can be found in the user manual. I'm guessing you've had a look at the user manual (at r-nimble.org) because of your exploration of nimbleLists. In case it is helpful, the early chapters on nimble models may be relevant to some of what you are running into.

HTH
Perry


--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/afdcde01-dd27-43eb-9fdf-66d6d92edb30n%40googlegroups.com.

Léa P

unread,
Oct 10, 2024, 4:51:44 AM10/10/24
to nimble-users
Hi Perry, thank you for your answer!

Yes I'm sorry about the indexing, as I didn't know how to end my model because I could not figure out how to format my data, I did not write the model code in a functionnal way, apologies!

I tried building up the model piece by piece until I had an issue, and I encountered my first issue when adding the j loop (what's commented here)

my_nimble_code <- nimbleCode({
  # Priors
  lambda_intercept ~ dnorm(0, sd = 1)
  my_covariate_coef ~ dnorm(1, sd = 1)
 
  # Compute lambda with my data
  my_lambda_values[1:nb_covariate_measures] <- nimCompute_lambda(

    lambda_intercept = lambda_intercept,
    my_covariate_coef = my_covariate_coef,
    my_covariate = my_covariate_values[1:nb_covariate_measures]

  )
 
  # The maximum lambda for oversampling events (with a margin)
  oversampling_lambda <- max(my_lambda_values[1:nb_covariate_measures]) * 1.1

 
  for (i in 1:nb_repet) {
    # Homogeneous Poisson process with this oversampling_lambda
    oversampling_nb_events[i] ~ dpois(lambda = oversampling_lambda)
    # for (j in 1:oversampling_nb_events[i]) {
    #   oversampling_event_time[i, j] ~ dunif(min = min(my_time[1:nb_covariate_measures]),
    #                                         max = max(my_time[1:nb_covariate_measures]))
    # }
  }
})

my_nimble_model <- nimbleModel(
  code = my_nimble_code,
  constants = list(
    nb_covariate_measures = nb_covariate_measures, # 13
    nb_repet = nb_repet, # 10
    my_time = my_time, # seq(0,24,by=2)
    my_covariate_values = my_covariate_values # rnorm(nb_covariate_measures, mean = 0)
  ),
  inits = list(
    "lambda_intercept" = 0.1,
    "my_covariate_coef" = 1
  )
)

# Check results with init (without the j loop)
my_nimble_model$calculate()
my_nimble_model$my_lambda_values # works, same as my_lambda_values calculated above
my_nimble_model$oversampling_lambda # works
my_nimble_model$oversampling_nb_events # only NAs

# Compile
c_my_nimble_model <- compileNimble(my_nimble_model)
c_my_nimble_model$calculate()



When adding the j loop, it returns this error: Could not evaluate loop syntax: is indexing information provided via 'constants'? . Is it not possible to use loop indexing outside of constants? That's not very practical in my case, I can't really see a workaround.

Regarding the data structure issue, I'm not sure that the matrix form would suit my need very well, as I don't know the dimensions of this matrix. And as you said, I have data with extent 0. Basically, to explain a bit my statistical model process in a non-code way, for a given repetition:

1. I want to simulate a number of events (oversampling_nb_events). I can have 0 event, and there is no known upper limit in the number of events
2. For each event (so I can't know the number of events in advance, can't put it in a constant, cause it depends step 1.):
        2.1. I find the time of this event (oversampling_event_time)
        2.2. I calculate a probability to keep this event (oversampling_lambda_ratio), and that depends on the time of the event (step 2.1.)
        2.3. I run a Bernoulli trial to see if I keep this event or not (so this depends on the the probability calculated in step 2.2.)

Therefore, I have no idea of the dimension of my final data for a given repetition (nor of my intermediate data, whose dimension depends on the result of step 1.). As you said, I indeed have subsets of data with different extents (for my different repetitions).

So I'm not sure if a nested indexing in a matrix would work well. The two main limitations I see are:
- If I choose arbitrarily a matrix size, and if during a simulation (an iteration of the MCMC), the number of events exceed this matrix size, what will happen ? I guess it will return an error
- Otherwise, I might need a gigantic matrix and therefore probably a lot of computing resources
Also, I can not really see how to format the data part of the model so that it matches.

I did not find in the user manual what you talk about in the end of your message, I'm not sure what you mean by that.

Perry de Valpine

unread,
Oct 23, 2024, 8:46:00 PM10/23/24
to Léa P, nimble-users
Dear Léa,

Sorry your follow-up has sat unresponded for almost two weeks. We try but sometimes we have travel etc. (in my case).

The extent of a for loop range must be a constant (either a literal number or a variable provided in the constants list). It cannot be a model node such as oversampling_nb_events[i]. The reason is that the model code is processed as a declarative language, with each line of code (and for each iteration of a for loop) declaring relationships among variables. Those relationships are not dynamically re-determined each time the value of a variable (such as oversampling_nb_events[i]) changes.

In such a situation, it is usually possible to define the model for the maximal needed range and then write code that achieves using only a part of the range in a dynamic way. I'm sorry I'm not seeing exactly what you need (your explanation refers to variables that seem to be in your first message, but there were other issues going on there and I'm not clear if the names are consistent or of exactly what you need). Would it be possible to now make a toy example with all the pieces you want?

Please refer to section 12.3 of the nimble User Manual for the idea I mentioned previously. It's a new section of the manual, so I hope it is clear enough.

I was able to run your code by editing it to use the values shown behind comment tags (#). It works really well if you are able to send minimal reproducible examples that run without any guesswork at all about how we should change what you sent such as uncommenting pieces of it. Thank you!

Perry



Reply all
Reply to author
Forward
0 new messages