recommendations for posterior predictive nodes on multivariate distributions?


Scott Brown

unread,
Sep 13, 2023, 11:42:37 AM9/13/23
to nimble-users
Hello,

I am training a basic Gaussian Process model:

gaussianProcess <- nimble::nimbleCode({
  mu0 ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 100)
  rho ~ dunif(0, 5000)

  ones[1:n] <- rep(1, n)
  mu[1:n] <- mu0*ones[1:n]
  cov[1:n, 1:n] <- expcov(sqDists[1:n, 1:n], rho, sigma, nugget)
 
  s[1:n] ~ dmnorm(mu[1:n], cov = cov[1:n, 1:n])
})

I have observations for some subset of the `s`, and would like to use my MCMC run to sample the remainder.  When I pass in `NA` values for the points I would like to sample, this doesn't work as expected, because, per the docs:

> Sometimes one needs a model variable to have a mix of data and non-data, often due to missing data values. In NIMBLE, when data values are provided, any nodes with NA values will not be labeled as data. A node following a multivariate distribution must be either entirely observed or entirely missing.

What is the recommended way to sample my posterior predictive distribution for the unobserved values of `s`?  I could train this model on only the observed sites, then sample from the posterior predictive for `sigma`, `rho`, and `mu0`, and use those values to `rmnorm` for `s`, but this seems like a hassle.  Is there a better way to accomplish this?

Cheers,
-Scott

Matthijs Hollanders

unread,
Sep 13, 2023, 12:39:10 PM9/13/23
to Scott Brown, nimble-users
Hey Scott,

Normally nimble would assign the relevant posterior predictive samplers to those nodes, but IIRC there isn’t one (yet) for the multivariate normal. I think your rmnorm() approach as described is probably what you want to do…

Also, the Cholesky version is faster! You’ll need:

cov_chol[1:n, 1:n] <- chol(cov[1:n, 1:n])
s[1:n] ~ dmnorm(mu[1:n], cholesky = cov_chol[1:n, 1:n], prec_param = 0)


Cheers,

Matt


Scott Brown

unread,
Sep 13, 2023, 1:50:23 PM9/13/23
to nimble-users
"the Cholesky version is faster"  It looks like [the implementation of `dmnorm`](https://github.com/nimble-dev/nimble/blob/02cffc9fc107a5a1c8f2fd5d938d7558b0c0c184/packages/nimble/R/distributions_inputList.R#L206C34-L206C89) does exactly this sequential call of `chol` and `dmnorm_chol`.  Is there a reason why explicitly calling `chol` in the model is faster?

Matthijs Hollanders

unread,
Sep 13, 2023, 4:00:41 PM9/13/23
to Scott Brown, nimble-users
I may have been wrong there. It may be faster for classic covariance matrices, where you can place priors on the SDs and on the Cholesky factors of the correlation matrices. I know what I said is true in Stan, but I think nimble does things differently.

Perry de Valpine

unread,
Sep 15, 2023, 9:42:02 AM9/15/23
to Matthijs Hollanders, Scott Brown, nimble-users
Hi Scott and Matt,

Thanks for the question and discussion.

To give a little more general explanation in case it is of interest to anyone following this: a fundamental difference between nimble and Stan is that nimble represents the model as a set of graphical relationships (nodes, sometimes called vertices, and edges), whereas I believe Stan has a single function that computes the log probability for the entire model, always all together.

This means that, in nimble, samplers (and other algorithms) can set themselves up to calculate only the parts of the model they need, which avoids recalculation when it is unnecessary (whereas in Stan the entire model is always sampled jointly anyway). In some cases, like dmnorm, nimbleModel automatically creates a node to help avoid recalculations when algorithms use the model (and for other reasons). These are called lifted nodes. In the case of dmnorm, whether you use the covariance or the precision parameterization, there will be a lifted node for the Cholesky factor. So, in effect, creating the Cholesky explicitly or not should be totally equivalent. The User Manual explains more about lifted nodes.
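(For anyone who wants to convince themselves of this equivalence numerically, here is a small base-R check, independent of nimble: the MVN log density computed directly from the covariance matches the one computed via its Cholesky factor, which is the quantity the lifted node caches. The matrix and points below are just made-up illustrations.)

```r
## The same MVN log density, once from the covariance directly and once
## via its Cholesky factor -- numerically identical, mirroring how
## nimble's lifted Cholesky node makes the two specifications equivalent.
set.seed(42)
n <- 4
A <- matrix(rnorm(n * n), n)
Sigma <- crossprod(A) + diag(n)   # a positive-definite covariance
mu <- rnorm(n)
x  <- rnorm(n)

## direct: -0.5 [ n log(2 pi) + log det Sigma + (x-mu)' Sigma^{-1} (x-mu) ]
ld_direct <- -0.5 * (n * log(2 * pi) + determinant(Sigma)$modulus +
                     t(x - mu) %*% solve(Sigma, x - mu))

## via Cholesky: Sigma = U'U with U upper triangular (R's convention)
U <- chol(Sigma)
z <- backsolve(U, x - mu, transpose = TRUE)   # solves U' z = x - mu
ld_chol <- -0.5 * n * log(2 * pi) - sum(log(diag(U))) - 0.5 * sum(z^2)

all.equal(as.numeric(ld_direct), as.numeric(ld_chol))  # TRUE
```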

As for the harder question about sampling incompletely observed multivariate normals, unfortunately there is not an automatic way to do that in nimble. I think you'd have to write out the conditional distribution steps yourself and implement them as an R function or a nimbleFunction. This has been on our radar but is not something we've made any kind of more clever system for yet.
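(In case a sketch helps future readers: the conditional steps mentioned above are the standard Gaussian conditioning formulas. Below is a minimal base-R illustration, not nimble code; the exponential covariance kernel, the site coordinates, and the function name `condSampleS` are all assumptions made up for the example. It draws one predictive sample of the missing entries of `s` given the observed entries and one posterior draw of `mu0`, `sigma`, and `rho`.)

```r
## Conditional draw of unobserved s given observed s and one posterior
## draw of (mu0, sigma, rho). Standard MVN conditioning:
##   s_mis | s_obs ~ N(mu_mis + C_mo C_oo^{-1} (s_obs - mu_obs),
##                     C_mm - C_mo C_oo^{-1} C_om)
condSampleS <- function(s_obs, idx_obs, coords, mu0, sigma, rho) {
  n <- nrow(coords)
  idx_mis <- setdiff(seq_len(n), idx_obs)
  ## exponential covariance on pairwise distances (illustrative kernel)
  d <- as.matrix(dist(coords))
  C <- sigma^2 * exp(-d / rho)
  C_oo <- C[idx_obs, idx_obs]
  C_mo <- C[idx_mis, idx_obs, drop = FALSE]
  C_mm <- C[idx_mis, idx_mis]
  A <- C_mo %*% solve(C_oo)                # C_mo C_oo^{-1}
  condMean <- mu0 + A %*% (s_obs - mu0)
  condCov  <- C_mm - A %*% t(C_mo)
  ## draw via the Cholesky factor of the conditional covariance
  L <- chol(condCov)                       # upper triangular
  as.vector(condMean + t(L) %*% rnorm(length(idx_mis)))
}

## Example: 5 sites on a line, 3 observed, 2 to predict
set.seed(1)
coords  <- matrix(seq(0, 4), ncol = 1)
idx_obs <- c(1, 3, 5)
s_obs   <- c(0.2, -0.1, 0.4)
draw <- condSampleS(s_obs, idx_obs, coords, mu0 = 0, sigma = 1, rho = 2)
length(draw)  # 2, one draw per missing site
```

Looping this over the saved MCMC samples of (mu0, sigma, rho) would give posterior predictive draws for the unobserved sites.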

HTH
Perry

