Sum all the elements within a matrix / array

373 views
Skip to first unread message

Mattia Falaschi

unread,
Mar 24, 2022, 11:15:20 AM3/24/22
to nimble-users
I was trying to implement a posterior predictive check in an occupancy model, following Kéry and Shaub 2012 book (Bayesian population analysis using WinBUGS).

I tried to implement their code in nimble, and at a certain point I define "f", which has three dimensions i, j, and k.
The problem is that when I try to calculate sum(f), I get the following error message:

"Error: Error: This part of NIMBLE is still under development.
The model definition included the expression 'f[, , ]', which contains missing indices.
There are two options to resolve this:
(1) Explicitly provide the missing indices in the model definition (e.g., 'f[1:10, 1:10, 1:10]'), or
(2) Provide the dimensions of variable 'f' via the 'dimensions' argument to nimbleModel(), e.g., nimbleModel(code, dimensions = list(f = c(dim1_max, dim2_max, dim3_max)))
Thanks for bearing with us."

I tried both to supply dimensions with sum(f[1:i, 1:j, 1:k]) or through the dimensions argument but it still does not work.
I also tried to create a nimble function to do the sum of the array, but I miserably failed.

I solved this by manually calculating the sum outside nimble, but I'd like to directly do it within the model to avoid saving dozens of additional parameters that I do not need.

Thanks for your support! 

Daniel Turek

unread,
Mar 24, 2022, 11:52:44 AM3/24/22
to Mattia Falaschi, nimble-users
Mattia, are you able to provide a self-contained reproducible example code demonstrating the failure?

Providing the indices as "sum(f[1:i, 1:j, 1:k])" should work, so I'm not sure what the issue was.  But without being able to look at code, it's difficult to diagnose.  Feel free to send an example off-list, if you'd prefer not to share the full model.

--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/28e055ef-8eb1-4e7e-b8e5-2f0da4f29935n%40googlegroups.com.

Matthijs Hollanders

unread,
Mar 24, 2022, 3:04:58 PM3/24/22
to db...@williams.edu, Mattia Falaschi, nimble-users
Hey Mattia. Just quickly, did you sum over 1:i, 1:j, and 1:k or 1:n.ind, 1:n.survey, etc, where n.ind and n.survey are the total number of elements in rows and columns?

Mattia Falaschi

unread,
Mar 25, 2022, 5:03:10 AM3/25/22
to nimble-users
Hi Daniel, maybe the issue is due to missing values?
Here is a simplified version of my model with a simulated dataset.
I randomly added missing values to try to mimic my dataset as much as possible.

setwd("C:/Users/matt_/Desktop/nimble group")

library(nimble)

# Create dataset
set.seed(1234)
nsite <- 150
nrep <- 8
nyears <- 2
pa <- rbinom(nsite*nrep*nyears, 1, 0.8)
y <- array(pa, dim = c(150, 8, 2))
# Randomly assign NAs
for(i in 1:nsite){
  for(k in 1:nyears){
    s <- sample(1:(nrep+1),1)
    if(s < 9){ y[i,s:nrep,k] <- NA}
  }
}
# Check that sites contain data at least in one of the two years
na_y1 <- which(is.na(y[,1,1]))
na_y2 <- which(is.na(y[,1,2]))
na12 <- na_y1[na_y1%in%na_y2]
y[na12,1,1] <- 1 # Assign 1 to first year for sites with NA both years

# Model
occupancy_static <- nimbleCode({
 
  ## PRIORS ##
  # DETECTION
  p_int~dnorm(0,0.01)

  # OCCUPANCY
  for(i in 1:nyears){
   psi_year[i] ~ dnorm(0,0.1)  
  }

  ## Ecological model
  for (i in 1:nsite){
    for(k in 1:nyears){
    z[i,k] ~ dbern(psi[i,k])
    logit(psi[i,k]) <- psi_year[year[k]]
    } #k
  } #i

  # Observation model
  for (i in 1:nsite){
    for(k in 1:nyears){
      for(j in 1:J[i,k]){
        muy[i,j,k] <- z[i,k]*p[i,j,k]
        logit(p[i,j,k]) <- p_int
        y[i,j,k] ~ dbern(muy[i,j,k])
       
        # Goodness of Fit
        f[i,j,k] <- abs(y[i,j,k] - p[i,j,k]) # Discrepancy for real data
        y.new[i,j,k] ~ dbern(muy[i,j,k]) # Simulate data
        f.new[i,j,k] <- abs(y.new[i,j,k] - p[i,j,k]) # Discrepancy for simulated data
       
      } #j
    } #k
  } #i
 
  f.sum <- sum(f[,,])

})

# Create an object used to exclune NA samples
J <- matrix(NA, nrow = nrow(y), ncol=2)
for (i in 1:nrow(y)){
  for(k in 1:2){
    if (sum(is.na(y[i,,k])) != ncol(y)) J[i,k] <- ncol(y) - sum(is.na(y[i,,k]))
    else J[i,k] <- NA  
  }
 
}
J[is.na(J)]<-1

# Bundle data
win.data <- list(y = y)
constants <- list(nsite = dim(y)[1], nrep = dim(y)[2], J = J, nyears = dim(y)[3],year = 1:dim(y)[3])

# List of parameters to monitor
params<-c("psi_year","p_int")

# List of inits
inits <-list(psi_year = rnorm(2,0,1),p_int = rnorm(1,0,1))

# MCMC settings
ni <- 4*10^3
nb <- ni*3/4
nt <- (ni-nb)/1000 # Extract 1000 posterior for each chain
nc <- 3

# Nimble model
Rmodel <- nimbleModel(code=occupancy_static, constants=constants, data=win.data, inits = inits)
Rmodel$initializeInfo()
# mcmc
conf <- configureMCMC(Rmodel, print = F)
conf$addMonitors(c("f","f.new"))
Rmcmc <- buildMCMC(conf)
# Compile the model and MCMC algorithm
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

# RUN THE MODEL
start.time <- Sys.time()
out <- runMCMC(mcmc = Cmcmc, niter = ni, nburnin = nb, nchains = nc, thin = nt,
               samplesAsCodaMCMC = T, inits = inits)
Sys.time()-start.time


Matthijs, I tried f.sum <- sum(f[1:150,1:8,1:2]) but it still does not work. 

Daniel Turek

unread,
Mar 25, 2022, 9:29:48 AM3/25/22
to Mattia Falaschi, nimble-users
Mattia, I'm not sure what we're doing differently.  I just tried the code below, and it worked through to building the R model.  See the modified final line of the model code, with the sum() which reads:

    f.sum <- sum(f[1:150,1:8,1:2])

Can you try running the code, just as below, and see if it works for you?


library(nimble)

# Create dataset
set.seed(1234)
nsite <- 150
nrep <- 8
nyears <- 2
pa <- rbinom(nsite*nrep*nyears, 1, 0.8)
y <- array(pa, dim = c(150, 8, 2))
# Randomly assign NAs
for(i in 1:nsite){
    for(k in 1:nyears){
        s <- sample(1:(nrep+1),1)
        if(s < 9){ y[i,s:nrep,k] <- NA}
    }
}
# Check that sites contain data at least in one of the two years
na_y1 <- which(is.na(y[,1,1]))
na_y2 <- which(is.na(y[,1,2]))
na12 <- na_y1[na_y1%in%na_y2]
y[na12,1,1] <- 1 # Assign 1 to first year for sites with NA both years

# Model
code <- occupancy_static <- nimbleCode({
    ##f.sum <- sum(f[,,])

    f.sum <- sum(f[1:150,1:8,1:2])
})

# Create an object used to exclune NA samples
J <- matrix(NA, nrow = nrow(y), ncol=2)
for (i in 1:nrow(y)){
    for(k in 1:2){
        if (sum(is.na(y[i,,k])) != ncol(y)) J[i,k] <- ncol(y) - sum(is.na(y[i,,k]))
        else J[i,k] <- NA  
    }
}
J[is.na(J)]<-1

# Bundle data
data <- win.data <- list(y = y)


constants <- list(nsite = dim(y)[1], nrep = dim(y)[2], J = J, nyears = dim(y)[3],year = 1:dim(y)[3])

# List of parameters to monitor
params<-c("psi_year","p_int")

# List of inits
set.seed(0)

inits <-list(psi_year = rnorm(2,0,1),p_int = rnorm(1,0,1))

# Nimble model
Rmodel <- nimbleModel(code=occupancy_static, constants=constants, data=win.data, inits = inits)


## > Rmodel <- nimbleModel(code=occupancy_static, constants=constants, data=win.data, inits = inits)
## Defining model
## Building model
## Setting data and initial values
## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.
## Checking model sizes and dimensions
##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).




Mattia Falaschi

unread,
Mar 25, 2022, 11:54:07 AM3/25/22
to nimble-users
Hi Daniel, 
until that point everything is fine but the problem comes later when compiling the model.
So, I run the following lines and I get an error when creating "Cmodel" object:
# Nimble model
Rmodel <- nimbleModel(code=occupancy_static, constants=constants, data=win.data, inits = inits)
# Nimble model

Rmodel$initializeInfo()
# mcmc
conf <- configureMCMC(Rmodel, print = F)
conf$addMonitors(c("f","f.new"))
Rmcmc <- buildMCMC(conf)
# Compile the model and MCMC algorithm
Cmodel <- compileNimble(Rmodel)


At this point I get this error message:
> Cmodel <- compileNimble(Rmodel)
Compiling
[Note] This may take a minute.
[Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
Error in eigenizeNameStrided(code, symTab, typeEnv, workEnv) :
  Error, cannot eigenize a map of dimensions > 2
Error: There is some problem at the Eigen processing step for this code:
{
  model$f.sum[1] <<- sum(model$f[1:150, 1:8, 1:2])
}


I don't know how to interpret this message.

Besides this, I also tried Matt's suggestion to sum over J[i,k].
Because the observational model is built as following, there are some non existing nodes, and this might be the reason for the original error message.
# Observation model
  for (i in 1:nsite){
    for(k in 1:nyears){
      for(j in 1:J[i,k]){
        muy[i,j,k] <- z[i,k]*p[i,j,k]
        logit(p[i,j,k]) <- p_int
        y[i,j,k] ~ dbern(muy[i,j,k])
       
        # Goodness of Fit
        f[i,j,k] <- abs(y[i,j,k] - p[i,j,k]) # Discrepancy for real data
        y.new[i,j,k] ~ dbern(muy[i,j,k]) # Simulate data
        f.new[i,j,k] <- abs(y.new[i,j,k] - p[i,j,k]) # Discrepancy for simulated data
       
      } #j
    } #k
  } #i


So, I tried to sum indexing by J, as suggested by Matt:
f.sum <- sum(f[1:nsite, 1:J[i,k], 1:nyears])
but I still get error message:
> Rmodel <- nimbleModel(code=occupancy_static, constants=constants, data=win.data, inits = inits)
Defining model
Error in getSymbolicParentNodesRecurse(x, constNames, indexNames, nimbleFunctionNames,  :
                                         getSymbolicParentNodesRecurse: dynamic indexing of constants is not allowed in J[i, k]. Try adding the dynamically-indexed constant as data instead (using the data argument of nimbleModel).


Forgive me if anything is a bit confusing.
I will try to look into it again in the next few days.

Matthijs Hollanders

unread,
Mar 27, 2022, 2:44:52 AM3/27/22
to nimble-users
Hey Mattia,

I reckon my suggested approach doesn't work because you're summing over J[i,k] when i and k aren't referenced in a loop. My apologies for that. 

But, I do recognize your Eigen error. I believe nimble only allows you to do operations like sums in 2 dimensions at a time, whereas you're trying to do 3. So you can bypass this by having an intermediate step, f.sum1[k] <- sum(f[1:150,1:8, k], and then the final step, f.sum2 <- sum(f.sum1[1:k]). There might still be issues with your indexed J[i,k] approach, where I do seem to remember having missing values can be problematic for the sums. But I'm not sure. Actually, now that I think about it, I wonder if the following would work:

for(i in 1:nsite){
   for(k in 1:nyears){
      f.sum1[i,k] <- sum(f[i,1:J[i,k],k])  # Sum over J[i,k] for each site and year
   }
}
f.sum2 <- sum(f.sum1[1:nsite,1:nyears])   

Cheers,
Matt

Message has been deleted

Mattia Falaschi

unread,
Mar 28, 2022, 2:56:17 AM3/28/22
to nimble-users
Hi Matt,
as you suspected, the problem was due to the missing nodes because of the indexing by J[i,k].
The two-step sum you suggested works fine, and this is the updated model:
  for(i in 1:nsite){
   for(k in 1:nyears){
     fsum[i,k]<- sum(f[i, 1:J[i,k], k])
     nsum[i,k] <- sum(f.new[i, 1:J[i,k], k])
   }
  }
  sum.f <- sum(ysum[1:nsite,1:nyears])
  sum.n <- sum(esum[1:nsite,1:nyears])

})


Thank you very much for your help!

Mattia

Reply all
Reply to author
Forward
0 new messages