Vector random indexing?

515 views
Skip to first unread message

Neil Gilbert

unread,
Nov 19, 2021, 11:38:55 AM11/19/21
to nimble-users
Hi folks,

I'm trying to run the autologistic model from Ch 9.6.1.2 from Applied Hierarchical Modeling in Ecology, Volume II (JAGS code here) in Nimble, but am running into an issue with vector random indexing (used to identify neighboring cells in a grid). 

I *think* there should be a simple solution to this by writing a nimbleFunction to do the nested indexing (I'm going off this old post: https://groups.google.com/g/nimble-users/c/TUs5RCw193M), but I'm hitting a wall and my attempts thus far have been unsuccessful. Can anyone help with this? 

Thanks so much! Code below:

Cheers,

Neil

######## AHM 9.6.1.2

library(AHMbook)
library(spdep)
library(nimble)

dat <- AHMbook::simDynoccSpatial(side = 30, nyears = 10, nsurveys = 3,
                                 mean.psi1 = 0.1, range.phi = c(0.5, 0.5),
                                 range.gamma = c(0.2, 0.2), range.p = c(0.4, 0.4),
                                 beta.Xautolog = c(1, 1), seed.XAC = 1, seed = 24,
                                 show.plots = FALSE)

nsites <- dat$side^2
nsurveys <- dat$nsurveys
nyears <- dat$nyears
y <- array(NA, dim = c(nsites, nsurveys, nyears))
for(i in 1:nyears){
  y[,,i] <- dat$umf@y[,(3*i-2):(3*i)]
}

neigh <- spdep::dnearneigh(dat$grid, d1 = 0, d2 = sqrt(2) + 0.1)
str(winnb <- spdep::nb2WB(neigh))
numN <- winnb$num

neighID <- array(NA, dim = c(nsites, 8))
for(i in 1:nsites){
  neighID[i, 1:numN[i]] <- unlist(neigh[i])
}

data <- list(y = y)

constants <- list(nsites = nsites, nsurveys = nsurveys, nyears = nyears,neighID = neighID, numN = numN) 

autologistic1 <- nimble::nimbleCode({
 
  psi1         ~ dbeta(1, 1)
  alpha.lphi   ~ dnorm(0, sd = 3)
  beta.lphi    ~ dnorm(0, sd = 1.5)
  alpha.lgamma ~ dnorm(0, sd = 3)
  beta.lgamma  ~ dnorm(0, sd = 1.5)
  p            ~ dbeta(1, 1)
 
  # likelihood
  # ecological submodel
  for(i in 1:nsites){
    z[i, 1] ~ dbern(psi1)
    for(t in 2:nyears){
      z[i, t] ~ dbern(z[i, t-1]*phi[i, t-1] + (1 - z[i, t-1])*gamma[i, t-1])
      # compute autocovariate and specificy its effects on phi and gamma
     
 ### *** problem here with vector random indexing ***
      autocov[i, t] <- sum(z[neighID[i, 1:numN[i]], t-1]) / numN[i]

      logit(phi[i, t-1]) <- alpha.lphi + beta.lphi*autocov[i, t-1]
      logit(gamma[i, t-1]) <- alpha.lgamma + beta.lgamma*autocov[i, t-1]
    }
  }
 
  # observation model
  for(i in 1:nsites){
    for(j in 1:nsurveys){
      for(t in 1:nyears){
        y[i, j, t] ~ dbern(z[i, t]*p)
      }
    }
  }
})

zst <- array(1, dim = c(nsites, nyears))

inits <- function(){
  list(z = zst, psi1 = rbeta(1, 1, 1), alpha.lphi = rnorm(1, 0, 1),
    beta.lphi = rnorm(1, 0, 1), alpha.lgamma = rnorm(1, 0, 1),
    beta.lgamma = rnorm(1, 0, 1), p = rbeta(1, 1, 1))
}

m <- nimbleModel(code = autologistic1,
                 constants = constants,
                 data = data,
                 inits = inits())

# Error in getSymbolicParentNodesRecurse(x, constNames, indexNames, #nimbleFunctionNames,  :
  #getSymbolicParentNodesRecurse: only scalar random indices are allowed; vector random #indexing found in z[neighID[i, 1:numN[i]], t - 1]

Chris Paciorek

unread,
Nov 20, 2021, 2:58:31 PM11/20/21
to Neil Gilbert, nimble-users
Hi Neil,

1) We're flagging "random indexing" because there are NAs in "neighID", which we can't treat as constants. So for that, please replace any NAs with some arbitrary value (e.g., zero). It doesn't matter what value as those entries are presumably never actually used.

2) Once you do that, you'll get a different error about Nimble not handling vector indexing. To get around that, you're right that you'll need a nimbleFunction to do the summation over the indexed values. For efficiency, I would just calculate all of `autocov` at once so that the looping is done in the nimbleFunction and only one multivariate `autocov` node is needed and not many scalar nodes. 

Something like this should get you started. I haven't tested it.                                                                  

calc_autocov <-
  run = function(z = double(2), neighID = double(2), numN = double(1)) {
      returnType(double(2))
      nsites <- dim(z)[1]
      nyears <- dim(z)[2]
      result <- matrix(nrow = nsites, ncol = nyears, init = FALSE)
      for(i in 1:nsites)
            for(t in 2:nyears)
                result[i, t] <- sum(z[neighID[i, 1:numN[i]], t-1]) / numN[i]
      return(result)
  })

Then in model code, you'll define autocov outside of the loops as

autocov[1:nsites, 1:nyears] <- calc_autocov(Z[1:nsites,1:nyears], neighID[1:nsites, 1:8], numN[1:nsites])

2a) One thing you'll need to deal with is that I don't see that you are defining autocov[,1].

-chris

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/50ae551c-f852-4be6-ad9c-1c9a21997feen%40googlegroups.com.

Chris Paciorek

unread,
Nov 20, 2021, 3:09:41 PM11/20/21
to Neil Gilbert, nimble-users
And actually, given the temporal dependence and the neighborhood structure, calculating autocov all at once might create additional dependencies that could slow down the MCMC sampling for z.

So you might actually want to compute autocov[i,t] one at a time. You should be able to modify my calc_autocov to do that fairly readily by passing in 'i', and 't' and removing the loops and the creation of a matrix. Feel free to follow up if you have questions.

-chris

Neil Gilbert

unread,
Nov 24, 2021, 4:44:05 PM11/24/21
to Chris Paciorek, nimble-users
Hi Chris, 

Thanks so much for the help! I replaced the NAs in neighID with 0's and I tried both of the approaches you suggested, but unfortunately am still stuck. Notes below:

1) Quick note, since you mentioned that autocov[,1] is not defined: the autocov variable is used for the transitions (phi and gamma) but not z itself, and so its dimensions should be autocov[1:nsites, 1:nyears-1]

2) I tried the first implementation you suggested, following your code very closely, except for minor changes to reflect the dimensions of autocov:

calc_autcov <- nimbleFunction(

  run = function(z = double(2), neighID = double(2), numN = double(1)){
    returnType(double(2))
    nsites <- dim(z)[1]
    nyears <- dim(z)[2]
    result <- matrix(nrow = nsites, ncol = nyears-1, init = FALSE)

    for(i in 1:nsites){
      for(t in 2:nyears){
        result[i, t-1] <- sum(z[neighID[i, 1:numN[i]], t-1]) / numN[i]
      }
    }
    return(result)
  })


Within the model, I defined autocov outside the loops with:

autocov[1:nsites, 1:nyears-1] <- calc_autcov(z[1:nsites, 2:nyears], neighID[1:nsites, 1:8], numN[1:nsites])

Which gave the following error:

Error in permute.verticecs(graph, oldGraphID_2_newGraphID) :

At topology.c:2725 : Permute vertices: invalid permutation vector size, Invalid value

In addition: Warning message:

In topological.sort(graph, mode = “out”) :

At structural_properties.c:3346 :graph contains a cycle, partial result is returned


3)  Next, I tried to compute autocov[i, t] one at a time, as you suggested. I think this is the proper way to feed the i and t indices into the function? I tried this: 


calc_autcov <- nimbleFunction(
  run = function(z = double(2), neighID = double(2), numN = double(1), i = integer(), t = integer()){
    returnType(double(1))
    result <- sum(z[neighID[i, 1:numN[i]], t-1]) / numN[i]
    return(result)
  })


And defined autocov[i, t] within the loops, as follows:

for(i in 1:nsites){
    z[i, 1] ~ dbern(psi1)
    for(t in 2:nyears){
      z[i, t] ~ dbern(z[i, t-1]*phi[i, t-1] + (1 - z[i, t-1])*gamma[i, t-1])
      autocov[i, t-1] <- calc_autcov(z[i, t-1], neighID[i, 1:8], numN[i])
      ...

But get the error: Error in neighID[i, 1:numN[i]] : incorrect number of dimensions

Any ideas to troubleshoot this? I have a sneaking feeling I may be making a basic mistake with indexing, but I have hit a wall.

Thanks so much!

Cheers,

Neil
----------------
PhD Candidate
Department of Forest and Wildlife Ecology
University of Wisconsin-Madison

Chris Paciorek

unread,
Nov 29, 2021, 8:29:59 PM11/29/21
to Neil Gilbert, nimble-users
Hi Neil,

1) Given you are defining the various quantities in a lagged fashion, trying to define the entire 'autocov' at once looks like it causes the model graph to not be acyclic. So my suggestion of defining as a block was not a good suggestion. I'd go with the one at a time approach.

2) It looks like you are forgetting to pass "i" and "t" into "calc_autocov". 

-chris

Neil Gilbert

unread,
Dec 6, 2021, 6:50:41 PM12/6/21
to Chris Paciorek, nimble-users
Hi Chris, 
 
Thanks again for the help! I tried passing "i" and "t" into "calc_autocov" within the loops, i.e.:

 autocov[i, t-1] <- calc_autcov(z[i, t-1], neighID[i,], numN[i], i = i, t = t)

But still got the error:  Error in neighID[i, 1:numN[i]] : incorrect number of dimension

Allso, I'm not very practised with nimbleFunctions, but I think I may have incorrectly defined the dimensions in the previous version. I'm assuming that, since the function is being called within the loops, the z, neighID, and numN variables should be double(0), double(1), and double(0), respectively, i.e. below:

calc_autcov <- nimbleFunction(
  run = function(z = double(0), neighID = double(1), numN = double(0), i = integer(), t = integer()){
    returnType(double(0))

    result <- sum(z[neighID[i, 1:numN[i]], t-1]) / numN[i]
    return(result)
  })


Would appreciate any other ideas - I'll continue experimenting with this and if I figure anything out!

Cheers,

Neil  
----------------
PhD Candidate
Department of Forest and Wildlife Ecology
University of Wisconsin-Madison
 

Chris Paciorek

unread,
Dec 7, 2021, 4:04:01 PM12/7/21
to Neil Gilbert, nimble-users
Hi Neil, 

Based on how you use 'z', 'neighID', and 'numN' in the code as matrices (first two) and vector (numN), you'd need to have:

run = function(z = double(2), neighID = double(2), numN = double(1), i = integer(), t = integer())

From your previous emails, you might have tried that before, but I'm not sure. Let me know if that doesn't help.

-chris

Oriol Solà

unread,
Jan 24, 2024, 10:22:38 AM1/24/24
to nimble-users

Hi folks,

Sorry to dig out this old thread. I am new to Nimble and I am trying to run the autologistic dynocc model. I followed the tips from this thread but I get the following error (code below):

Error in topological.sort(graph, mode = "out") :

  At core/properties/dag.c:114 : The graph has cycles; topological sorting is only possible in acyclic graphs, Invalid value


Any ideas what could be wrong?

Neil, were you able to successfully run the model in Nimble in the end? And if so, could you share the code with me?


Thank you very much,


Oriol

 

This is my attempt:


# Autocovariate Nimble function

calc_autocov <- nimbleFunction(

  run = function(i = integer(), t = integer(), z = double(2), neighID = double(2), numN = double(1)){

    returnType(double(0))

    result <- sum(z[neighID[i, 1:numN[i]], t-1]) / numN[i]

    return(result)

  })

 

# NIMBLE model

dynocc_model <- nimble::nimbleCode( {

# Priors

psi1 ~ dunif(0, 1) # Initial occupancy

alpha.lpsi1 <- logit(psi1)

beta.lpsi1 ~ dnorm(0, 0.01)

phi.int ~ dunif(0, 1) # Persistence

alpha.lphi <- logit(phi.int)

beta.lphi ~ dnorm(0, 0.01)

gamma.int ~ dunif(0, 1) # Colonization

alpha.lgamma <- logit(gamma.int)

beta.lgamma ~ dnorm(0, 0.01)

p ~ dunif(0, 1) # Detection

# Likelihood

# Ecological submodel

for (i in 1:nsites){

  z[i,1] ~ dbern( ilogit(alpha.lpsi1 + beta.lpsi1 * field[i]) )

  for (t in 2:nyears){

    autocov[i, t-1] <- calc_autocov(i,t,z[1:nsites,1:nyears], neighID[1:nsites, 1:8], numN[1:nsites])

    logit(phi[i,t-1]) <- alpha.lphi + beta.lphi * autocov[i,t-1]

    logit(gamma[i,t-1]) <- alpha.lgamma + beta.lgamma * autocov[i,t-1]

   

    z[i,t] ~ dbern(z[i,t-1] * phi[i,t-1] + (1-z[i,t-1]) * gamma[i,t-1])

    # Compute autocovariate and specify its effects on phi and gamma

   

  }

}

# Observation model

for (i in 1:nsites){

  for (j in 1:nsurveys){

    for (t in 1:nyears){

      y[i,j,t] ~ dbern(z[i,t] * p)

    }

  }

}

 

}

)

 

# Remove Nas from neighID data

neighID[is.na(neighID)]<-nsites+1

bdata <- list(y = y_nimble, nsites = nsites, nsurveys = nsurveys, nyears = nyears,

              neighID = neighID, numN = numN,

              field=as.vector(field) )

 

# Initial values

zst <- array(1, dim = c(nsites, nyears))

inits <- function(){ list(z = zst)}

# Parameters monitored

params <- c("alpha.lpsi1", "beta.lpsi1", "alpha.lgamma", "beta.lgamma",

            "alpha.lphi", "beta.lphi", "p") # could also monitor "z"

# MCMC settings

na <- 1000 ; ni <- 2000 ; nt <- 2 ; nb <- 1000 ; nc <- 3

 

out1_nimble <-

  nimbleMCMC(code = dynocc_model,

             constants = bdata, inits = inits, monitors = params,

             nburnin = nb, niter = ni, thin = nt, nchains = nc) 


El dia dimarts, 7 de desembre del 2021 a les 22:04:01 UTC+1, Chris Paciorek va escriure:

Daniel Turek

unread,
Jan 24, 2024, 1:16:21 PM1/24/24
to Oriol Solà, nimble-users
Oriol, thanks for sending your question.

Nimble models cannot have "cycles" in them; a "cycle" is a circular dependency structure within the nodes.  For example, a cycle would be:
A depends on B, B depends on C, and C depends on A.

Looking at your code, it looks like the (many) cycles (circular dependencies) are coming from:

Each z[i,t] depends on gamma[i,t-1]
gamma[i,t-1] depends on autocov[i,t-1]
autocov[i,t-1] depends on z[1:nsites,1:nyears], which includes the original z[i,t].

So, in order to have this model work in nimble, you'd have to restructure your model code to *not* have these cycles (circular dependencies).

It's another question of how to go about doing that.

I note that your calc_autocov() function for calculating the value of each autocov[i, t-1] does not end up using every element of z[1:nsites,1:nyears], but rather only a subset of the z[,] values, specifically, those given by z[neighID[i, 1:numN[i]], t-1].  Note, that even if the calculation of autocov[i,t-1] does not end up using the value of z[i,t] in the calculation, nimble will still consider this a "dependency", since calc_autocov() accepts *all* of z[1:nsites,1:nyears] as an argument.

So, it's possible that you can re-write your calc_autocov() function to do exactly the same calculations, but not passing in the z[i,t] node as an argument (or any other z[,] values which would create a cycle).  If you can do this, then your model would hopefully work just fine.

Cheers,
Daniel





Oriol Solà

unread,
Jan 26, 2024, 4:46:39 AM1/26/24
to Daniel Turek, nimble-users

Hi Daniel,


Thank you very much for your answer. Your explanations are very clear.

I modified the nimble function so autocov[i,t-1] depends only on z[1:nsites,t-1]. Now the model seems to run without errors but it takes forever to compile using nimbleMCMC. Any ideas on how to speed up the compilation?


Thank you again,

 

Oriol

 

Daniel Turek

unread,
Jan 26, 2024, 6:49:50 AM1/26/24
to Oriol Solà, nimble-users
Oriol, unfortunately, there's no prescription for this.  Generally, the larger your model size (in terms of sheer number of nodes), the longer compilation will take.

Looking at your model, the triple-indexing on y[i, j, t]  seems to be the worst offender, in terms of contributing the largest number of nodes to your model:

  for(i in 1:nsites){
    for(j in 1:nsurveys){
      for(t in 1:nyears){
        y[i, j, t] ~ dbern(z[i, t]*p)
      }
    }
  }

Since each of the nsurveys elements in y[i, 1:nsurveys, t] has an identical distribution, I would suggest using a vectorized bernoulli distribution to jointly calculation the likelihood of the y[i, 1:nsurveys, t] vector.  That should speed up compilation for you.  To help you get started, there are multiple examples of this in the nimbleSCR package.

Cheers,
Daniel



Message has been deleted

Chris Paciorek

unread,
Feb 4, 2024, 3:20:36 PM2/4/24
to Abby Keller, nimble-users
hi Abby,

On fairly quick glance, it looks like one (the?) issue could be with `totalo` having zeroes, but you use it in the model code as `1:totalo[t,i]`.

-chris

On Thu, Feb 1, 2024 at 8:32 PM 'Abby Keller' via nimble-users <nimble...@googlegroups.com> wrote:
Hi everyone,

Just following up on this thread because I'm getting somewhat similar errors related to loop indexing as Neil back in 2021.

I am trying to loop through sites (i) and time periods (t), where sites have irregular numbers of time periods. Lines 108-109 in the attached code read:
for(i in 1:nsite){
    for(t in 2:totalt[i]){
}}

'totalt' is a vector of length 'nsite'.

If I replace this with the following, the model builds with no error:
for(i in 1:nsite){
    for(t in 2:min(totalt)){
}}

So I'm pretty certain there's something strange happening with totalt[i]. I have 'totalt' as a constant in the model, and there are no 0s in 'totalt'. I'm a bit baffled by this error and am having a hard time troubleshooting. I have attached code in case there is not something obviously off about this set-up. I have also attached most of the data, except 'counts' because it is 24 MB.

Thanks so much as always. Having this Google group as a resource is so amazing.

Best,
Abby Keller

Abby Keller

unread,
Feb 4, 2024, 6:20:47 PM2/4/24
to paci...@stat.berkeley.edu, nimble-users
Ok, that makes sense. I suppose I'm still confused because the 0s in totalo should never be looped through because they are all beyond total[i] for every i. 

For example, when I did not replace NA with 0 in totalo and do:
for(i in 1:nsite){
  if(totalt[i]==sum(!is.na(totalo[,i]))){
    print(TRUE)
  }
  }

I get all TRUE.

Do you suggest a different data structure for totalo?

Abby Keller
PhD Student
Environmental Science, Policy, and Management
(she/her)

Chris Paciorek

unread,
Feb 5, 2024, 10:50:00 AM2/5/24
to Abby Keller, nimble-users
Ah, that makes sense. Clearly my quick glance wasn't helpful.

Can you provide me with the counts object (say, off-list) so I can run the code or create a smaller example that illustrates the problem?

-chris

Abby Keller

unread,
Feb 5, 2024, 6:50:36 PM2/5/24
to nimble-users
Just following up here based on some off-list discussion.

The problem ended up being that a few elements of modelo[t,i] were equal to 1. The values of modelo[t,i] were used as inputs in some nimbleFunctions as the dimensions of arrays. So when nobs = modelo[t,i] == 1, the following code in a nimble function would create a one-dimensional array of dimensions [nsize], rather than a two-dimensional array of dimensions [1, nsize]:

#create empty array
array <- array(init=FALSE,dim=c(nobs,nsize))
Reply all
Reply to author
Forward
0 new messages