How to implement the truncated multivariate normal distribution?

1,765 views
Skip to first unread message

Greg W

unread,
Sep 10, 2014, 3:04:19 PM9/10/14
to stan-...@googlegroups.com
I'd like to fit a truncated multivariate normal distribution. My model block would look like

model {
  y
~ left_truncated_multi_normal_cholesky(mu, L, lower_bound);
}

But that sampling statement (in bold) isn't available. So my next idea was:

functions {
  real left_truncated_multi_normal_cholesky_log
(vector y, vector mu, matrix L, vector lower_bound) {
   
return( multi_normal_cholesky_log(y, mu, L) - (1 - multi_normal_cholesky_cdf_log(lower_bound, mu, L)) );
    # I'm not sure if this is even the correct syntax, but hopefully it gets the idea across
  }
}
model
{
  y
~ left_truncated_multi_normal_cholesky(mu, L, lower_bound);
}

But those aren't available either.

The former would be easy to implement (I think).

The latter, maybe not: https://github.com/stan-dev/stan/issues/158. That request links to a paper (http://www.math.wsu.edu/faculty/genz/papers/mvn.pdf) that gives an algorithm for computing the CDF of the multivariate normal. However, that algorithm (top of page 3) depends on being able to generate standard uniform random variables. I am also aware that this is also not implemented in Stan outside of the Generated Quantities block, and I feel like it might not be a good idea to try and build a pseudorandom number generator into my Stan program. And I don't know a thing about numerical integration. So basically I have no idea how else I might go about implementing multi_normal_cholesky_cdf_log or, for that matter, left_truncated_multi_normal_cholesky.

Of course, it'd be really nice if I could write a program like

data {
 
int K;
  vector[K] y;
}
transformed data
{
  vector
[K] lower_bound;

  lower_bound
<- to_vector(rep_array(0, K));
}
parameters
{
  vector
[K] mu;
 
}
model
{
  y
~ multi_normal_cholesky(mu, L) T[lower_bound, ];
}

but I imagine that's not going to happen any time soon. Interestingly, this also seems to be impossible in JAGS, but I think it's possible in OpenBUGS. I would much rather use Stan and not OpenBUGS for this project, but as of right now I'm stuck on how.

That said, I'm only using the truncated normal for convenience and generality, not for any substantive reason. The alternative model I had in mind was a Gaussian copula with zero-inflated Gamma marginals. Should I just forget the truncation business and use that instead?

Luc Coffeng

unread,
Sep 10, 2014, 4:58:12 PM9/10/14
to stan-...@googlegroups.com
Hi Greg,

You might want to check some recent threads/posts by Tamas Papp on truncating multivariate normals / censored bivariate data. Shouldn't be hard to find using the search bar.

Luc

Greg W

unread,
Sep 10, 2014, 5:51:05 PM9/10/14
to stan-...@googlegroups.com
Thanks Luc. I did a search and found this thread: https://groups.google.com/d/topic/stan-users/PZE0tkwJ2A4/discussion. Is that what you had in mind?

I'm not sure what in that thread I can use here. It seems to be about estimating a censoring boundary. Instead, I have a known truncation boundary (it's the zero vector), and I just want a correctly normalized posterior density. That thread also doesn't seem to touch on dependence among the response variables, which in my intended use is not a throwaway nuisance parameter (hence my interest in a multivariate sampling statement). And besides, I'm not nearly clever enough to generalize the wisdom from that thread to my 7-dimensional case.

Greg W

unread,
Sep 10, 2014, 5:57:13 PM9/10/14
to stan-...@googlegroups.com
I guess I should restate my question since it might be unclear:

My likelihood is a 7-dimensional Gaussian distribution that is truncated at 0 in each dimension. All I want to do is estimate the parameters of that likelihood, the mean vector and the correlation matrix. How can I do that in Stan?

Ben Goodrich

unread,
Sep 11, 2014, 12:53:43 AM9/11/14
to stan-...@googlegroups.com
I don't think this is possible to do with Stan now or in the foreseeable future. If you were to do a logarithm transformation or some other (perhaps parameterized) transformation from the positive real numbers to the real numbers, it would be possible to do multivariate normal likelihoods for the transformed data.

Ben

Greg W

unread,
Sep 11, 2014, 10:55:28 PM9/11/14
to stan-...@googlegroups.com
Not what I was hoping to hear, but thanks. Should I take that to mean there won't be a multi_normal_cdf function any time soon?

At the very least, is there a viable alternative algorithm that doesn't require generating random numbers? I found some notes on Gauss-Hermite quadrature in the back of one of my textbooks; would that work? Or would any attempt at numerical integration written in pure Stan (as opposed to directly in C++) be too gnarly for the sampler to handle?

Andrew Gelman

unread,
Sep 11, 2014, 11:11:32 PM9/11/14
to stan-...@googlegroups.com
At this point let me invoke the folk theorem and ask whether you’re sure this is the computation you really want to do.  I have no idea, and of course sometimes people really need to do hard computation, but in my experience, a lot of times when people think they want to compute some model with sharp had constraints, it turns out that this model is itself some sort of approximation, and that underlying it all there’s a better model that’s actually easier to fit.
A

--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

unread,
Sep 12, 2014, 12:42:36 AM9/12/14
to stan-...@googlegroups.com
No multi_normal_cdf either. I think we're about to merge some more numerical integration stuff for ODEs but the auto-differentiation is not trivial.

Ben

Asim Ansari

unread,
Sep 13, 2014, 7:51:37 PM9/13/14
to stan-...@googlegroups.com

Hi

This may be helpful.


Asim Ansari
Reply all
Reply to author
Forward
0 new messages