calculate single WAIC value from model run in parallel

470 views
Skip to first unread message

Catrin Eden

unread,
Jul 27, 2021, 10:40:54 AM7/27/21
to nimble-users
Please excuse my ignorance on this issue, I am new to both Bayesian and Nimble, so running into a few problems.

I have run some models in parallel using the nimbleMCMC function, but cannot find a way to calculate a single WAIC value for the combined chains, instead I am given a WAIC value for each chain. Is there a way to compile the chains and calculate WAIC after running the model with nimbleMCMC? I have the chains saved and it would be ideal if I didn't need to run the models again.

If there is no other option, I have managed to compile the model separately (although I am unfamiliar with this method) and compiled the chains into Cmcmc$mvsamples to calculate WAIC. However, I have no idea whether this is accurate as I had to compile the model first in the global environment, then run the parallel function (which is set to compile the model also), as the Cmcmc object is not available in my global environment after running parLapply (example code below). My questions with regards to this are 1. is the method I have used correct? 2. Is there a way to do this so that I do not have to compile the model outside of the parallel function?

Your help and patience would be much appreciated!

Example code:

## I have to run this first to get a CmyMCMC object in global environment
  myCode <- nimbleCode({
    a ~ dunif(0, 100)
    b ~ dnorm(0, 100)
  
    for (i in 1:length_y) {
      y[i] ~ dgamma(shape = a, rate = b)
    }
  })
  
  myModel <- nimbleModel(code = myCode,
                          data = list(y = data),
                          constants = list(length_y = 1000),
                          inits = list(a = 0.5, b = 0.5))
  
  CmyModel <- compileNimble(myModel)
  
  myMCMC <- buildMCMC(CmyModel)
  CmyMCMC <- compileNimble(myMCMC)

## then I have to run it all again in parallel
# make cluster
this_cluster <- makeCluster(2)
seed <- set.seed(1234)
# Create a function with all the needed code
run_MCMC_allcode <- function(seed, data) {
  library(nimble)
  
  myCode <- nimbleCode({
    a ~ dunif(0, 100)
    b ~ dnorm(0, 100)
  
    for (i in 1:length_y) {
      y[i] ~ dgamma(shape = a, rate = b)
    }
  })
  
  myModel <- nimbleModel(code = myCode,
                          data = list(y = data),
                          constants = list(length_y = 1000),
                          inits = list(a = 0.5, b = 0.5))
  
  CmyModel <- compileNimble(myModel)
  
  myMCMC <- buildMCMC(CmyModel)
  CmyMCMC <- compileNimble(myMCMC)
  
  results <- runMCMC(CmyMCMC, niter = 10000, setSeed = seed)
  
  return(results)
}

chain_output <- parLapply(cl = this_cluster, X = 1:2,
                          fun = run_MCMC_allcode,
                          data = my.data)
stopCluster(this_cluster)

posteriorSamplesMatrix <- rbind(chain_output[[1]], chain_output[[2]])
colnames(posteriorSamplesMatrix) <- colnames(chain_output[[1]])
nimble:::matrix2mv(posteriorSamplesMatrix, CmyMCMC$mvSamples)
CmyMCMC$calculateWAIC()

Daniel Turek

unread,
Jul 29, 2021, 6:31:31 PM7/29/21
to Catrin Eden, nimble-users
Catrin, thanks for reaching out with a good question.

You're correct, you're run into a shortcoming of NIMBLE's current system for WAIC.  We're aware of this, and there's recently been much development effort put towards a new approach to WAIC calculations in NIMBLE.

That said, you've done a nice job working around the current short-comings.  Just a few comments:

- Using your approach, yes, you unfortunately do have to compile the model and the MCMC in R's global environment, to use the compiled version of the mcmc$calculateWAIC() method.

- Using the toy model you provided, the parameters of the gamma distribution (shape, rate) for y may not be well recovered, because of the highly informative prior (dnorm(0, 100)) that is used for "b".  Note that this implies a normal *precision* of tau = 100, which is equivalent to normal variance = 1/100 = 0.01, or a normal standard deviation of sigma = 0.1.  So, if this toy model doesn't work well, that might be why.  Perhaps you meant to use a dunif(0, 100) for "b", the same as for "a" ?

- Using your approach, you'll either need to set enableWAIC = TRUE when you create the global version of the MCMC:
myMCMC <- buildMCMC(CmyModel, enableWAIC = TRUE)
or otherwise, you'll need to set this variable to TRUE manually, before running the CmyMCMC$calculateWAIC() method:
CmyMCMC$enableWAIC <- TRUE

- Using your approach, you'll also have to execute the global version of the CmyMCMC mcmc for some non-zero number of MCMC iterations, prior to calling nimble:::matrix2mv.  Running the CmyMCMC for some number of iterations initializes an internal variable (thinToUseVec) inside the MCMC, which allows calculateWAIC to operate correctly (otherwise, I was running into a segmentation fault at the time of running calculateWAIC - did you encounter this?).  So, your final workflow would look more like:

.....
posteriorSamplesMatrix <- rbind(chain_output[[1]], chain_output[[2]])
CmyMCMC$run(5)   ## non-zero number of iterations
nimble:::matrix2mv(posteriorSamplesMatrix, CmyMCMC$mvSamples)
CmyMCMC$enableWAIC <- TRUE
CmyMCMC$calculateWAIC()

I'm sorry this is a few more steps, but you mostly have it just right.  And we also apologize for these known short-comings in the current WAIC system, which as I said, are undergoing development at present.

Please keep in touch if this works for you.

Cheers,
Daniel



--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/16d7a70c-1ded-4ea8-875a-cebcfb74f699n%40googlegroups.com.

Catrin Eden

unread,
Aug 10, 2021, 9:49:31 AM8/10/21
to nimble-users
Thanks for your response, Daniel. This has worked well for me. The model example I provided wasn't the one I was using, it was for illustrative purposes only as the model I'm using is a multi-state survival model and would have taken up a lot of space.

The additions you suggested worked well for me, I had also been encountering a fatal error and running some iterations fixed this. I was also able to get WAIC for the models I had already run with nimbleMCMC using the saved chains and this method, which saved me a lot of time!
Thanks again,
Catrin

Message has been deleted

Michael Roast

unread,
Dec 10, 2021, 8:05:24 AM12/10/21
to nimble-users
I think I am having the same/similar issue as the original post here. I'm aiming to have a WAIC value calculated from all chains run in parallel. Nimble has since updated with the online method of WAIC calculation, so I have been able to compute a WAIC efficiently within my parallel model function, which then gets repeated in parallel using parLapply() and I end up with essentially a perChainWAIC. Is there a valid way to 'average' these into a single value? (or can these not be used beyond "as a means of helping assess the stability of the WAIC estimate"? as in runMCMC help) Alternatively, the workaround method suggested in this thread is not working for me, for some reason that I don't understand? I don't include here my parallel function code and parLapply, but the chain_output objects come directly from that, with longer chains. This code is setting and running the model for a non-zero number of iterations in the global environment. 

Thank you in advance if you can offer any advice or solutions,

Michael


y <- matrix(sample(c(0,1),120,replace = TRUE),ncol = 6, nrow = 20)
first <- apply(y, 1, function(x) min(which(x!= 0),na.rm = TRUE))

hmm.model<- nimbleCode({
 
  # Priors
  phi ~ dunif(0,1)    # prior for survival φ
  p ~ dunif(0,1)      # prior for detection p
 
  # Vector of initial state probabilities δ
  delta[1] <- 1     # Alive
  delta[2] <- 0     # Dead
 
  # Transition matrix Γ
  gamma[1,1] <- phi     # transition Pr(alive T t --> alive T t+1)
  gamma[1,2] <- 1-phi    # transition Pr(alive T t --> alive R t+1)
  gamma[2,1] <- 0     # transition Pr(alive R t --> alive T t+1)
  gamma[2,2] <- 1   # transition Pr(alive R t --> alive R t+1)
 
  # Observation matrix Ω
  omega[1,1] <- 1 - p # observation Pr(alive T t --> not detected t)
  omega[1,2] <- p     # observation Pr(alive T t --> detected t)
  omega[2,1] <- 1 # observation Pr(alive R t --> not detected t)
  omega[2,2] <- 0     # observation Pr(alive R t --> detected t)
 
  # Likelihood
  for(i in 1:N){    # Loops over each individual (row)
    z[i,first[i]] ~ dcat(delta[1:2])  # Draws an initial state  
    for(j in (first[i]+1):T){    # Loops through each time point (column)
      z[i,j] ~ dcat(gamma[z[i,j-1], 1:2]) # Draws current state based on previous state
      y[i,j] ~ dcat(omega[z[i,j], 1:2])  # Draws an observation based on current state.
    }
  }
})

# Defines constants in the model
my.constants <- list(N = nrow(y),   # N individual
                     T = ncol(y),   # N of time points
                     first = first) # occasions of first capture

# Defines data in the model
# To remove 0s. 1 is non-detected, 2 is detected
my.data <- list(y = y+1)

# Defines initial values for states and probabilities
zinits <- y + 1
zinits[zinits == 2] <- 1

initial.values <-  list(phi = runif(1,0,1),
                        p = runif(1,0,1),
                        z = zinits)

# Specfies the parameters of interest to record in output
parameters.to.save <- c("phi","p")

#MCMC model details
n.iter <- 10 # non-zero number of iterations
n.burnin <- 2

myModel <- nimbleModel(code = hmm.model,        
                       constants = my.constants,      
                       data = my.data,
                       inits = initial.values)

CmyModel <- compileNimble(myModel)

myMCMC <- buildMCMC(CmyModel, monitors = parameters.to.save, enableWAIC = TRUE)

CmyMCMC <- compileNimble(myMCMC)

results <- runMCMC(CmyMCMC, niter = n.iter, nburnin = n.burnin, nchains = 1)


posteriorSamplesMatrix <- rbind(chain_output[[1]],
                                chain_output[[2]],
                                chain_output[[3]],
                                chain_output[[4]],
                                chain_output[[5]])


CmyMCMC$run(5)   ## non-zero number of iterations
nimble:::matrix2mv(posteriorSamplesMatrix, CmyMCMC$mvSamples)
CmyMCMC$enableWAIC <- TRUE
CmyMCMC$calculateWAIC()

## The following gets returned:
> CmyMCMC$mvSamples CmodelValues object with variables: p, phi.
> CmyMCMC$run(5)  
NULL 
> CmyMCMC$calculateWAIC() 
NA
# Existing chain ouput
> head(chain_output[[1]]) p phi [1,] 0.5465156 0.9609154 [2,] 0.5465156 0.9609154 [3,] 0.5465156 0.9609154 [4,] 0.5465156 0.9609154 [5,] 0.5465156 0.9609154 [6,] 0.5465156 0.9609154 > str(chain_output[[1]]) num [1:800, 1:2] 0.547 0.547 0.547 0.547 0.547 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:2] "p" "phi"

Chris Paciorek

unread,
Dec 10, 2021, 6:58:11 PM12/10/21
to Michael Roast, nimble-users
Hi Michael,

If you use runMCMC or nimbleMCMC with multiple chains, we'll compute a single WAIC from all the chains by default.

Unfortunately, if you are running in parallel, you'll have to do a bit of extra work to get a single WAIC using the samples from your separately-run parallel chains. You should use the current nimble (version 0.12.1). This will use an "offline" version of WAIC that does all the WAIC calculations after the fact, rather than the default WAIC that does calculations at each iteration.

1) Set up your MCMC so that you monitor all of the variables that are involved in the likelihood. Do not set the WAIC argument to runMCMC or nimbleMCMC, just let it default to FALSE.

2) Once you've run the MCMC chains, collect the samples from them as separate matrices. Discard any burn-in, if you haven't already, and then rbind the matrices together so you have as many rows as the total number of samples across all the chains. Then use the `calculateWAIC` function, passing the matrix of samples as the first argument and the model object as the second argument. 

-chris


Tomas T.

unread,
Dec 13, 2021, 9:37:58 AM12/13/21
to Chris Paciorek, Michael Roast, nimble-users
Dear Chris, thanks for this info, so is it not correct to just simply average the WAIC, lppd and pWAIC values from parallel chains? As we do with other estimates? That's what I've been doing up till now. When the model converges, the values from the chains are usually very similar. Perhaps this average should then converge to the correctly calculated WAIC?

I am asking because the approach you described (to monitor all nodes) might result in quite huge sample matrices, much bigger than the values we actually want to monitor. This sound quite inconvenient, so I am trying to find out if the lazy solution wouldn't just work nicely for well-converged models.

Thanks,

Tomas

Chris Paciorek

unread,
Dec 13, 2021, 10:52:10 AM12/13/21
to Tomas T., Michael Roast, nimble-users
Hi Tomas,

This is a good reference to see how WAIC is calculated and what the impact of averaging the per chain values would be. It won't get the same values as treating it as a single chain.

-chris

Michael Roast

unread,
Dec 14, 2021, 6:40:36 AM12/14/21
to Chris Paciorek, nimble-users
Hi Chris, 

Thanks again for your solution and taking the time to help, it all makes sense to me. This works fine for the toy model I made, but I don't think it will be feasible with the hidden markov models (HMM) that I'm using on my much larger real dataset. Parallel computation of chains improves the processing speed, and calculating WAIC using the online method reduces the memory requirement, but if all the latent state nodes of a HMM must still be monitored during parallel computing of chains, then the memory is still a limiting factor that makes post-hoc WAIC calculation across all chains this way unfeasible. I guess there is still a computational limitation at the moment, where you can optimise processing or memory, but not at the same time to calculate the WAIC across chains for these models.

Many thanks,

Michael





On Sat, 11 Dec 2021 at 00:58, Chris Paciorek <christophe...@gmail.com> wrote:


--
Dr Michael J Roast
BSc, PGCertEd, MSc, PhD
Tw: @Roasty247

Chris Paciorek

unread,
Dec 14, 2021, 12:01:18 PM12/14/21
to Michael Roast, nimble-users
Hi Michael, a good option may be monitor the latent states with increased thinning to reduce memory use and then calculate WAIC from all the chains based on this reduced number of saved iterations. You can use "monitors2" and "thin2" to do this. 

-chris

frederic....@gmail.com

unread,
Apr 15, 2024, 9:50:51 AM4/15/24
to nimble-users
Dear all,

in echo to this thread, I have found some surprising results with the code proposed by Michael. In sum, I have found out at least for me that the sequence:

nimble:::matrix2mv(posteriorSamplesMatrix, CmyMCMC$mvSamples)
CmyMCMC$enableWAIC <- TRUE
CmyMCMC$calculateWAIC()

does not do what I thought it was doing, since the WAIC value before and after this code is the same which it should not be. I would suggest replace this with:

calculateWAIC(posteriorSamplesMatrix, CmyModel)

but then the code provided by Michael lacks some parameters (z) for WAIC to be calculated.


This therefore questions me on what does the above code, as well as what does a similar code in runMCMC & so on. 


Sincerely,


Frédéric


################################################################

########### Here is the first code (Michael's adapted):

##################################################################


library(nimble)

 

n.iter <- 100 # non-zero number of iterations


n.burnin <- 2

myModel <- nimbleModel(code = hmm.model,        
                       constants = my.constants,      
                       data = my.data,
                       inits = initial.values)

CmyModel <- compileNimble(myModel)

myMCMC <- buildMCMC(CmyModel, monitors = parameters.to.save, enableWAIC = TRUE)

CmyMCMC <- compileNimble(myMCMC)

results <- runMCMC(CmyMCMC, niter = n.iter, nburnin = n.burnin, nchains = 1)



posteriorSamplesMatrix <-
results[8:20,]

 

CmyMCMC$getWAIC()



CmyMCMC$run(5)   ## non-zero number of iterations

CmyMCMC$getWAIC()



nimble:::matrix2mv(posteriorSamplesMatrix, CmyMCMC$mvSamples)
CmyMCMC$enableWAIC <- TRUE
CmyMCMC$calculateWAIC()

CmyMCMC$getWAIC()

 

calculateWAIC(posteriorSamplesMatrix, CmyModel)

## Error with the following message:

# To calculate WAIC in NIMBLE, all parameters of data nodes in the model must be monitored.

  # Currently, the following parameters are not monitored: z



#############################################################

##### Second code with saving z:

#######################################################

 library(nimble)

 

parameters.to.save <- c("phi","p","z")

#MCMC model details
n.iter <- 100 # non-zero number of iterations


n.burnin <- 2

myModel <- nimbleModel(code = hmm.model,        
                       constants = my.constants,      
                       data = my.data,
                       inits = initial.values)

CmyModel <- compileNimble(myModel)

myMCMC <- buildMCMC(CmyModel, monitors = parameters.to.save, enableWAIC = TRUE)

CmyMCMC <- compileNimble(myMCMC)

results <- runMCMC(CmyMCMC, niter = n.iter, nburnin = n.burnin, nchains = 1)



posteriorSamplesMatrix <-
results[8:20,]

 

CmyMCMC$getWAIC()



CmyMCMC$run(5)   ## non-zero number of iterations

CmyMCMC$getWAIC()



nimble:::matrix2mv(posteriorSamplesMatrix, CmyMCMC$mvSamples)
CmyMCMC$enableWAIC <- TRUE
CmyMCMC$calculateWAIC()

CmyMCMC$getWAIC()

 

calculateWAIC(posteriorSamplesMatrix, CmyModel)

##OK


Chris Paciorek

unread,
Apr 16, 2024, 9:14:48 PM4/16/24
to frederic....@gmail.com, nimble-users
Hi Frédéric,

If you are wanting to calculate WAIC from a saved set of samples, then you're right that you need to run the `calculateWAIC` function. And for that you need to save all parameters needed for the WAIC calculation (generally stochastic parents of data nodes, but not stochastic parents of those parents).

The `calculateWAIC` method of the MCMC object is an internal method not intended for use by users and (I hope) not documented anywhere as being something users would use.

I don't think I will try to recover/interpret the history of the whole discussion unless it ends up being important. I suspect that when we were responding to Michael we overlooked a chunk of code that is not what one would want to use.

-chris

Reply all
Reply to author
Forward
0 new messages