non-sequential indexing

46 views
Skip to first unread message

Neil Gilbert

unread,
Dec 2, 2022, 1:18:28 PM12/2/22
to nimble-users

Hi all,

 

I’m playing with the spatially stratified capture-recapture models described in AHM volume 1, section 7.8.4 (pages 358 ff). The basic premise is that we have capture-recapture data for multiple sites, augment unobserved individuals, and the model assigns site (coded as “group” in the model) membership to the augmented individuals.

 

The challenge: I would like to modify this to derive abundance N for each of the sites. Because site membership is a latent variable for the augmented individuals, the data / corresponding latent variable can’t be sorted by site and indexed with start/end constants, e.g.: sum(z[start[i]:end[i]]). This seems to present a problem because my understanding is that Nimble can only handle sequential integer indexing, so something like sum(z[group_indices[s]]), where group_indices is a vector (e.g., c(2, 5, 10)), wouldn't work. Is there a way to get around this with a nimbleFunction or otherwise accomplish this?

 

Below, I provide code to format data/constants and for the basic model:

 

library(AHMbook)

library(tidyverse)

library(nimble)

 

alfl <- read.csv(system.file("csv", "alfl.csv", package = "unmarked")) %>%

  group_by(id) %>%

  mutate(site = cur_group_id())

 

alfl.covs <- read.csv(system.file("csv", "alflCovs.csv", package = "unmarked"), row.names =  1)

 

y <- as.matrix(alfl[, c("interval1", "interval2", "interval3")])

nind <- nrow(y)

M <- 400

y <- rbind(y, matrix(0, nrow = (M - nind), ncol = 3))

site <- alfl$site

site <- c(site, rep(NA, M - nind))

 

sitecovs <- scale(as.matrix(alfl.covs[,c("woody", "struct", "time.1")]))

 

data <- list(y = y,

             X = sitecovs,

             group = site)

 

constants <- list(

  J = 3,

  M = M,

  nsites = 50

)

 

code <- nimbleCode({

 

  p0 ~ dunif(0, 1)

  alpha0 <- log(p0 / (1 - p0))

  alpha1 ~ dnorm(0, sd = 2)

  alpha2 ~ dnorm(0, sd = 2)

  beta0 ~ dnorm(0, sd = 2)

  beta1 ~ dnorm(0, sd = 2)

 

  psi <- sum(lambda[1:nsites]) / M

 

  # Easy to derive N across sites

  N <- sum(z[1:M])

 

  # Challenge: want to derive N[s] by summing the z’s that correspond to group s

  for( s in 1:nsites ){

    log(lambda[s]) <- beta0 + beta1 * X[s, 1]

    probs[s] <- lambda[s] / sum(lambda[1:nsites])

  }

 

  for( i in 1: M){

    group[i] ~ dcat(probs[1:nsites])

    z[i] ~ dbern(psi)

   

    for(j in 1:J){

      logit(p[i, j]) <- alpha0 + alpha1 * X[group[i], 2] + alpha2 * X[group[i], 3]

      pz[i, j] <- p[i, j] * z[i]

      y[i, j] ~ dbern(pz[i, j])

    }

  }

})

 

And, to be clear about what I’d ideally like to do, here’s R code showing the general concept:

 

nsites <- 10

M <- 200

group <- sample(1:nsites,

                size = M,

                replace = TRUE)

z <- sample(c(0, 1),

            size = M,

            replace = TRUE)

 

N <- c()

for( s in 1:nsites ) {

  indices[s] <- list (which( group == s))

  N[s] <- sum(z[indices[[s]]])

  print(N[s])

}


Thanks!! 


Neil

----------------
Neil A. Gilbert, PhD (he/him)
Postdoctoral Research Fellow
Department of Integrative Biology
Michigan State University

Perry de Valpine

unread,
Dec 2, 2022, 1:49:58 PM12/2/22
to Neil Gilbert, nimble-users
Hi Neil,

I think you'd need to write a nimbleFunction to do this.  Non-sequential indexing will work in a nimbleFunction, or you could write a for loop or other approach to getting the sum(s) you want.  Model code is more restrictive because we not only need to get the calculations right but also to process it symbolically to get the graph structure right, so you're correct that an index vector would not work in model code.  E.g. you could have something like:

N_by_site[1:nsites] <- count_by_site(group[1:M)

and do the counting inside nimbleFunction count_by_site 

 Does that make sense?  This is a quick reply but we can flesh it out more if necessary.

Perry
 

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/CAMc-Rig-OM15ztzcPZ1f1Pj55VvSxLbWZrrR6J5NXyhsscA2rQ%40mail.gmail.com.

Neil Gilbert

unread,
Dec 2, 2022, 3:56:30 PM12/2/22
to Perry de Valpine, nimble-users

Hi Perry, 


Thanks! I took a stab at writing a nimbleFunction to do the counting, but I'm getting an error when the model compiles. Any suggestions for making this run (pertinent parts of the code bolded/highlighted below). Thanks!


Neil


library(AHMbook)

library(tidyverse)

library(nimble)

 

alfl <- read.csv(system.file("csv", "alfl.csv", package = "unmarked")) %>%

  group_by(id) %>%

  mutate(site = cur_group_id())

 

alfl.covs <- read.csv(system.file("csv", "alflCovs.csv", package = "unmarked"), row.names =  1)

 

y <- as.matrix(alfl[, c("interval1", "interval2", "interval3")])

nind <- nrow(y)

M <- 400

y <- rbind(y, matrix(0, nrow = (M - nind), ncol = 3))

site <- alfl$site

site <- c(site, rep(NA, M - nind))

 

sitecovs <- scale(as.matrix(alfl.covs[,c("woody", "struct", "time.1")]))

 

data <- list(y = y,

             X = sitecovs,

             group = site)

 

constants <- list(

  J = 3,

  M = M,

  ngroup = length(unique(data$group))

)

 

# function to sum up z's by each group (site)

count_by_site <- nimbleFunction(

  run = function(

    group = double(1),

    z = double(1),

    ngroup = integer(0)

  ) {

   

    N <- c()

    for( i in 1:ngroup) {

      N[i] <- sum(z[c(which(group == i))])

    }

   

    return(N[1:ngroup])

    returnType(double(1))

   

  }

)

 

ngroup <-  10

M <- 200

group <- sample(1:ngroup,

                size = M,

                replace = TRUE)

 

z <- sample(c(0, 1),

            size = M,

            replace = TRUE)

 

# function works outside of model

count_by_site(group = group[1:M], z = z[1:M], ngroup = ngroup)

 

code <- nimbleCode({

 

  p0 ~ dunif(0, 1)

  alpha0 <- log(p0 / (1 - p0))

  alpha1 ~ dnorm(0, sd = 2)

  alpha2 ~ dnorm(0, sd = 2)

  beta0 ~ dnorm(0, sd = 2)

  beta1 ~ dnorm(0, sd = 2)

 

  psi <- sum(lambda[1:ngroup]) / M

 

  for( s in 1:ngroup ){

    log(lambda[s]) <- beta0 + beta1 * X[s, 1]

    probs[s] <- lambda[s] / sum(lambda[1:ngroup])

  }

 

  for( i in 1: M){

    group[i] ~ dcat(probs[1:ngroup])

    z[i] ~ dbern(psi)

   

    for(j in 1:J){

      logit(p[i, j]) <- alpha0 + alpha1 * X[group[i], 2] + alpha2 * X[group[i], 3]

      pz[i, j] <- p[i, j] * z[i]

      y[i, j] ~ dbern(pz[i, j])

    }

  }

 

  N[1:ngroup] <- count_by_site(group = group[1:M],

                               z = z[1:M],

                               ngroup = ngroup)

})

 

 

inits <- function(){

  list(

    N  = sample(1:10, constants$ngroup, replace = TRUE),

    p0 = runif(1, 0, 1),

    alpha1 = rnorm(1, 0, 1),

    alpha2 = rnorm(1, 0, 1),

    beta0 = rnorm(1, 0, 1),

    beta1 = rnorm(1, 0, 1),

    z = c(rep(1, 100), rep(0, 300)),

    group = c(rep(NA, length(alfl$id)),

              sample(1:constants$ngroup,

                     size = length(site) - length(alfl$id),

                     replace = TRUE)))

}

 

nb <- 5000

ni <- nb + 10000

nc <- 1

nt <- 10

 

params <- c("p0", "N", "alpha0", "alpha1", "alpha2", "beta0", "beta1", "psi")

 

m <- nimbleModel(code = code,

                 constants = constants,

                 data = data,

                 inits = inits())

 

# Error when compiling

# Error in rle(isScalar) : 'x' must be a vector of an atomic type

out <- nimbleMCMC(

  code = code,

  constants = constants,

  data = data,

  monitors = params,

  inits = inits(),

  niter = ni,

  nburnin = nb,

  nchains = 1,

  thin = nt

)

----------------
Neil A. Gilbert, PhD (he/him)
Postdoctoral Research Fellow
Department of Integrative Biology
Michigan State University

Perry de Valpine

unread,
Dec 2, 2022, 5:35:08 PM12/2/22
to Neil Gilbert, nimble-users
Hi Neil,

In a nimbleFunction, automatic extension of vectors is not supported.  Please try allocating N in advance, like this:

    N <- numeric(ngroup)

    for( i in 1:ngroup) {

      N[i] <- sum(z[c(which(group == i))])

    }


In addition, I think it should work to replace 


N[i] <- sum(z[c(which(group == i))])


with


N[i] <- sum(z[group == i])


Also remember you can check compilation and functioning of count_by_site outside of the model.


C_count_by_site <- compileNimble(count_by_site)


C_count_by_site(<give some test inputs>)


Does it work?


Perry


Neil Gilbert

unread,
Dec 2, 2022, 6:56:57 PM12/2/22
to Perry de Valpine, nimble-users
Hi Perry, 

Thanks - this works! However, the model is MUCH slower - it took ~40 minutes to run 1k iterations (in comparison, it took ~1 minute without counting by site).  Any ideas for improving efficiency, or do you think this sort of non-sequential indexing is inevitably costly?

Updated function below:

count_by_site <- nimbleFunction(
  run = function(
    group = double(1),
    z = double(1),
    ngroup = integer(0)
  ) {
   
    N <- numeric(ngroup)
    for( i in 1:ngroup) {
      N[i] <- sum(z[group == i])

    }
   
    return(N[1:ngroup])
    returnType(double(1))
   
  }
)

Thanks,

Neil
----------------
Neil A. Gilbert, PhD (he/him)
Postdoctoral Research Fellow
Department of Integrative Biology
Michigan State University

Perry de Valpine

unread,
Dec 2, 2022, 7:17:20 PM12/2/22
to Neil Gilbert, nimble-users
Hi Neil,
Yikes, that's not good.
Try re-coding it along the ideas below (not tested).  Basically, habits people get from R for efficient coding involve vectorizing steps (to get operations done internally, at a lower level than the R evaluator, which is slow), but in a low-level language like C++ (what we generate), for-loops are efficient whereas creating lots of redundant information (often the result of R-style coding like here) is inefficient.  If I follow, your code roughly creates M*ngroup steps (loop over ngroup, logical vector of length M each time).  The code below roughly creates only M steps, checking the group of each element just once. See if this helps.

count_by_site <- nimbleFunction(
  run = function(
    group = double(1),
    z = double(1),
    ngroup = integer(0)
  ) {

  M <- length(group)
  N <- numeric(ngroup)

for(i in 1:M) {
  if(z[i] == 1) {
    N[group[i]] <- N[group[i]] + 1
  }
}   

    return(N) # brackets are only necessary in model code, not nimbleFunction
    returnType(double(1))
   
  }
)

Neil Gilbert

unread,
Dec 2, 2022, 8:16:22 PM12/2/22
to Perry de Valpine, nimble-users
Splendid! That works and is just as fast as the model without the counting. Thanks for the help and insight, Perry!

Cheers,

Neil
----------------
Neil A. Gilbert, PhD (he/him)
Postdoctoral Research Fellow
Department of Integrative Biology
Michigan State University

Reply all
Reply to author
Forward
0 new messages