loo package for NIMBLE

Keith Lau

May 27, 2022, 6:47:01 AM
to nimble-users
Hello everyone,

I'm looking for an example of using the loo package with NIMBLE output to obtain leave-one-out (LOO) cross-validation and WAIC. I searched the internet but could not find an exact example. Could anyone point me to one or provide one?

By the way, I understand that WAIC can be obtained by setting `WAIC = TRUE`. I'd like to check whether that WAIC value is identical to the one from the loo package.

Thank you in advance!

Keith

Chris Paciorek

Jun 4, 2022, 12:21:28 PM
to Keith Lau, nimble-users
Hi Keith, 

Sorry for the delay in responding.

I am not familiar with the package, but it should work with output from a NIMBLE MCMC in the same way as using JAGS or WinBUGS. Presumably you just need to get the samples into the right format. Is your question about how to get it into the format needed?

As for the WAIC results: yes, the WAIC from nimble should be the same as the WAIC from the loo package, provided two things hold. First, loo must implement the standard WAIC calculation (WAIC conditional on any random effects/latent processes in the model). Second, the definition of "an observation" must be the same, since the WAIC calculation sums over the predictive density values for the individual "observations".
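
For reference, here is the form of WAIC I am assuming in the paragraph above, with the variance-based penalty. The symbols are my own notation, not taken from either package: $S$ posterior draws $\theta^{(s)}$, $n$ observations $y_i$.

```latex
\begin{aligned}
\mathrm{lppd} &= \sum_{i=1}^{n} \log\!\left( \frac{1}{S} \sum_{s=1}^{S} p\bigl(y_i \mid \theta^{(s)}\bigr) \right) \\
p_{\mathrm{WAIC}} &= \sum_{i=1}^{n} \mathrm{Var}_{s}\!\left( \log p\bigl(y_i \mid \theta^{(s)}\bigr) \right) \\
\mathrm{WAIC} &= -2\,\bigl(\mathrm{lppd} - p_{\mathrm{WAIC}}\bigr)
\end{aligned}
```

Both the sum over $i$ and the per-observation variance depend on how the data are split into "observations", which is why that definition has to match between the two calculations.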

-Chris

John Clare

Jun 4, 2022, 2:32:58 PM
to nimble-users
Hi Keith,

loo::loo and loo::waic simply expect an n_posterior_samples x n_observations matrix of log p(y | parameters) values (an "observation" being whatever unit is germane to the likelihood), or an n_posterior_samples_per_chain x n_chains x n_observations array. This can be generated within the nimble model itself using something like:

for (i in 1:n) {
  y[i] ~ dnorm(mu, sd = sigma)
  loglike[i] <- dnorm(y[i], mu, sd = sigma, log = 1)
}
...

where you'd trace (monitor) "loglike". If you store all of the germane parameters (above, mu and sigma), you can also calculate this after the fact. A non-optimized example might use the following R loop:
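nsamps <- nrow(samps)  # samps: data frame of posterior draws with columns mu and sigma
loglike <- matrix(NA, nsamps, n)
for (i in 1:n) {
  for (s in 1:nsamps) {
    # log density of observation i under posterior draw s
    loglike[s, i] <- dnorm(y[i], samps$mu[s], sd = samps$sigma[s], log = TRUE)
  }
}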
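
The same matrix can be built without the inner loop, and WAIC can then be computed from it directly. This is only a sketch for the normal example: `samps` below is simulated stand-in data rather than real MCMC output, `waic_by_hand` is a hypothetical helper name, and I am assuming the variance-based penalty.

```r
set.seed(1)
n <- 15
y <- rnorm(n, mean = 1, sd = 2)

# Stand-in for posterior draws (assumption: real draws would come from the MCMC)
nsamps <- 500
samps <- data.frame(mu = rnorm(nsamps, 1, 0.5),
                    sigma = runif(nsamps, 1.5, 2.5))

# Row s, column i holds log p(y[i] | mu[s], sigma[s]); dnorm is vectorised over draws
loglike <- sapply(y, function(yi) dnorm(yi, samps$mu, samps$sigma, log = TRUE))

# WAIC from a draws-by-observations log-likelihood matrix
waic_by_hand <- function(ll) {
  # log of the posterior-mean density per observation, using a max shift for stability
  col_max <- apply(ll, 2, max)
  lppd <- col_max + log(colMeans(exp(sweep(ll, 2, col_max))))
  # penalty: posterior variance of the log density, per observation
  p_waic <- apply(ll, 2, var)
  -2 * (sum(lppd) - sum(p_waic))
}
waic_value <- waic_by_hand(loglike)

# If the loo package is installed, it accepts the same matrix
if (requireNamespace("loo", quietly = TRUE)) {
  print(loo::waic(loglike))
}
```

Comparing `waic_value` with the loo output is a quick way to check that the matrix has draws in rows and observations in columns.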

You could also create and compile a nimble function that does something like the above, which could be useful for reusing existing nimble functions or distributions, or for speeding things up.

You can also pass a function (perhaps compiled) to loo::waic that itself takes the germane data and the per-iteration parameters and generates the log-likelihood values that the subsequent WAIC calculations use. Note that loo::waic(x=<function>) requires the data and parameters to be formatted in a certain way.
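
A rough sketch of the function route, as I understand the loo interface: the function takes one row of the data plus the full set of draws, and returns one log-likelihood value per draw. The names `llfun`, `dat`, and `draws` are mine, and the draws are simulated stand-ins rather than real MCMC output.

```r
set.seed(1)
n <- 15
dat <- data.frame(y = rnorm(n, mean = 1, sd = 2))

# Stand-in posterior draws, one row per draw (assumption: real ones come from the MCMC)
draws <- cbind(mu = rnorm(500, 1, 0.5), sigma = runif(500, 1.5, 2.5))

# Log-likelihood of a single observation (one row of dat) under every draw
llfun <- function(data_i, draws) {
  dnorm(data_i$y, mean = draws[, "mu"], sd = draws[, "sigma"], log = TRUE)
}

# Quick check on the first observation: one value per draw
ll_first <- llfun(dat[1, , drop = FALSE], draws)

# With loo installed, the function is evaluated one observation at a time
if (requireNamespace("loo", quietly = TRUE)) {
  print(loo::waic(llfun, data = dat, draws = draws))
}
```

The appeal of this route is that the full draws-by-observations matrix does not have to be built by you up front.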

John

Keith Lau

Jun 14, 2022, 11:58:04 PM
to nimble-users
Thank you for your time and the replies, Chris and John. I've followed John's instructions and confirmed that the loo package gives the same offline WAIC results.

A side note: it took me a while to get the results because I used parallel computation. As shown at https://groups.google.com/g/nimble-users/c/nSNLN7LA2dk , my model needs a lot of memory to collect all the nodes, compile the model again, and call calculateWAIC(). By comparison, with the loo package I only need to collect logProb_response (the log likelihood for each data point) and call loo::waic() to get the same offline WAIC, which saved a lot of memory. It seems there is not yet a simple way to get WAIC from nimble when the chains are run in parallel (?).

Christopher Paciorek

Jan 18, 2023, 1:00:12 PM
to nimble-users
A rather belated reply, perhaps mostly for posterity. One option would be to save the logProb values for the data points by monitoring them (e.g., if `y` is the variable containing the observations, you would monitor `logProb_y`). Then you could concatenate the output from multiple chains and provide the logProb values for the responses to `loo::waic`, rather than using `calculateWAIC`. Of course this is only viable if you don't have so many observations and MCMC samples that you use too much memory in saving all the logProb values. And if saving all the observation logProbs is viable, then monitoring all needed nodes will generally be viable as well (and use less memory).
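
A sketch of the concatenation step, in case it helps. The `chains` list below is a simulated stand-in for the per-chain sample matrices returned by parallel workers, and I am assuming the monitored columns are named like `logProb_y[1]`. The helper `make_chain` is hypothetical and exists only to produce that stand-in.

```r
set.seed(1)
n_obs <- 4
n_iter <- 100

# Stand-in for one chain's samples: logProb columns plus one other monitored node
make_chain <- function() {
  m <- matrix(rnorm(n_iter * (n_obs + 1), mean = -1.5, sd = 0.3),
              nrow = n_iter, ncol = n_obs + 1)
  colnames(m) <- c(paste0("logProb_y[", seq_len(n_obs), "]"), "mu")
  m
}
chains <- list(chain1 = make_chain(), chain2 = make_chain(), chain3 = make_chain())

# Stack the chains, then keep only the observation-level logProb columns
all_samples <- do.call(rbind, chains)
lp_cols <- grep("^logProb_y\\[", colnames(all_samples))
loglike <- all_samples[, lp_cols, drop = FALSE]

# With loo installed, the stacked matrix can be passed straight in
if (requireNamespace("loo", quietly = TRUE)) {
  print(loo::waic(loglike))
}
```

Only the logProb columns need to be kept from each worker, so anything else can be dropped before the chains are combined.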