Clarification about Wishart/inv-Wishart distributions in Stan

1,804 views
Skip to first unread message

Christopher Candelaria

unread,
Jan 12, 2015, 3:29:09 PM1/12/15
to stan-...@googlegroups.com
I am in the process of converting a longitudinal model of student performance from WinBUGS to Stan. I am confused about the specification of a Wishart prior in Stan. I reference the "Jaws: repeated measures analysis of variance" example, as it helps point to my source of confusion.  

When using BUGS, I know that you specify distributions in terms of precision and not variance. In a multivariate model, this means that you would specify the inverse of the covariance matrix, rather than the covariance matrix. This is how the Jaws example is coded in BUGS: http://www.openbugs.net/Examples/Jaws.html. The model is multivariate normal, and it is distributed with mean vector m and precision matrix W. 

The Jaws example was coded into Stan (https://github.com/stan-dev/example-models/blob/master/bugs_examples/vol2/jaws/jaws.stan), where you can specify a multivariate normal distribution in terms of a mean vector and the covariance matrix. In the Jaws example, the covariance matrix is called Sigma. 

The Stan model code (excerpt) is as follows:

model {
  beta0 ~ normal(0, 32);
  beta1 ~ normal(0, 32);
  Sigma ~ inv_wishart(4, S); //Relevant part for this post
  for (n in 1:N) 
    Y[n] ~ multi_normal(mu, Sigma); //Relevant part for this post
}

In the example above, Sigma is distributed "inverse Wishart" with scale matrix S and degrees of freedom 4. First, I believe an inverse-Wishart distribution was chosen for Sigma because it is the conjugate prior distribution for the multivariate normal covariance matrix, so that makes sense. 

However, in a general model, shouldn't S be specified as inverse(S). In the Jaws example, S is an identity matrix, so it does not matter, but for a more general scale matrix, it would matter. Am I right about this? According to BDA, if inverse(Sigma) is distributed Wishart(df, S), then Sigma is distributed Inv-Wishart(df, inverse(S)).

Perhaps I am confused. What is the intuitive definition of the scale matrix for a Wishart and inv-Wishart distribution in Stan? For the BUGS Wishart function, the scale matrix is a guess at the order of magnitude of the COVARIANCE matrix (not the precision matrix) for the variable you are modeling. Is it the case that the inv_wishart(.) sampling statement's scale matrix is a guess at the order of magnitude of the precision matrix?

I am relatively new at Bayesian estimation, so I apologize if any of this is obvious. 


Bob Carpenter

unread,
Jan 12, 2015, 5:26:36 PM1/12/15
to stan-...@googlegroups.com
We have a discussion of our preferred priors, which aren't
conjugate, in the manual chapter on regression.

Doubly cool --- I just saw the latest BDA added the LKJ
correlation density!

More inline on Wishart/inverse Wishart, but I'd really urge
you to read the above. And then read the paper by Goodrich et al.
that we cite that compares Wishart, inverse Wishart, and the LKJ
priors we recommend.

> On Jan 12, 2015, at 3:29 PM, Christopher Candelaria <chris.ca...@gmail.com> wrote:
>
> I am in the process of converting a longitudinal model of student performance from WinBUGS to Stan. I am confused about the specification of a Wishart prior in Stan. I reference the "Jaws: repeated measures analysis of variance" example, as it helps point to my source of confusion.
>
> When using BUGS, I know that you specify distributions in terms of precision and not variance. In a multivariate model, this means that you would specify the inverse of the covariance matrix, rather than the covariance matrix. This is how the Jaws example is coded in BUGS: http://www.openbugs.net/Examples/Jaws.html. The model is multivariate normal, and it is distributed with mean vector m and precision matrix W.

Correct. Stan lets you do it either way.

> The Jaws example was coded into Stan (https://github.com/stan-dev/example-models/blob/master/bugs_examples/vol2/jaws/jaws.stan), where you can specify a multivariate normal distribution in terms of a mean vector and the covariance matrix. In the Jaws example, the covariance matrix is called Sigma.
>
> The Stan model code (excerpt) is as follows:
>
> model {
> beta0 ~ normal(0, 32);
> beta1 ~ normal(0, 32);
> Sigma ~ inv_wishart(4, S); //Relevant part for this post
> for (n in 1:N)
> Y[n] ~ multi_normal(mu, Sigma); //Relevant part for this post
> }
>
> In the example above, Sigma is distributed "inverse Wishart" with scale matrix S and degrees of freedom 4.

Correct.

> First, I believe an inverse-Wishart distribution was chosen for Sigma because it is the conjugate prior distribution for the multivariate normal covariance matrix, so that makes sense.

That's why BUGS chose it --- it's all they can do. Our translation
is just attempting to follow their model.

We're going to rewrite all their models with slightly different
priors to correspond to how we think they should be coded.

> However, in a general model, shouldn't S be specified as inverse(S). In the Jaws example, S is an identity matrix, so it does not matter, but for a more general scale matrix, it would matter. Am I right about this? According to BDA, if inverse(Sigma) is distributed Wishart(df, S), then Sigma is distributed Inv-Wishart(df, inverse(S)).


> Perhaps I am confused. What is the intuitive definition of the scale matrix for a Wishart and inv-Wishart distribution in Stan?

Not just you --- these reparameterizations are frustratingly confusing
and we could use some better doc (I'll add a clarification in the next
manual release).

We use a scale (covariance) matrix in both cases in Stan. That
is, our parameter is S in the BDA notation. BDA confusingly writes

Sigma ~ Inv-Wishart(inv(S))

so it's not clear what the parmaeter is.

This is similar to the case for the normal. BDA, like lots of
texts, writes (as many sources do),

y ~ N(mu,sigma^2)

but that makes it unclear what the parameter is --- sigma^2 or sigma?
In Stan, we just write

y ~ normal(mu,sigma)

and make it clear that sigma is the parameter.


> For the BUGS Wishart function, the scale matrix is a guess at the order of magnitude of the COVARIANCE matrix (not the precision matrix) for the variable you are modeling. Is it the case that the inv_wishart(.) sampling statement's scale matrix is a guess at the order of magnitude of the precision matrix?

Look at the expectations in the appendix of BDA. Translated
to Stan notation for dimension k:

E[wishart(nu,S)] = nu * S

E[inv_wishart(nu,S)] = S / (nu - k - 1)

BDA writes inv_wishart(nu,inv(S)) for what we write just
as inv_wishart(nu,S).

You can also try to develop intuitions by thinking about the conjugacy
and the way the Wisharts generalize the gamma distribution. I'm still not
very good at it.

- Bob

Christopher Candelaria

unread,
Jan 13, 2015, 1:51:17 AM1/13/15
to stan-...@googlegroups.com
Thanks, Bob, for the prompt reply! I appreciate the clarification. 

Would you mind sending me the name of the Goodrich et al. article that you had mentioned. I tried to search for it, but I was not sure I found the right paper.

-Chris

Bob Carpenter

unread,
Jan 13, 2015, 2:10:37 AM1/13/15
to stan-...@googlegroups.com
Here's the link:

http://www.stat.columbia.edu/~gelman/research/unpublished/Visualization.pdf

I don't think it ever got published, which is a shame.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages