LKJ prior interpretation...


Krzysztof Sakrejda

Dec 4, 2015, 4:14:24 PM
to Stan users mailing list
Hi, just looking for some clarification.  This bit from the Stan manual (2.8.0) is not making sense to me:

Our final recommendation is to give the correlation matrix Ω an LKJ prior with shape ν ≥ 1,

Ω ∼ LKJcorr(ν).

The LKJ correlation distribution is defined in Section 50.1, but the basic idea for modeling is that as ν increases, the prior increasingly concentrates around the unit correlation matrix (i.e., favors less correlation among the components of βj). At ν = 1, the LKJ correlation distribution reduces to the identity distribution over correlation matrices. The LKJ prior may thus be used to control the expected amount of correlation among the parameters βj.

I think this means that v=1 makes all correlation matrices equally likely (some sort of uninformative prior) and large v suggests mostly uncorrelated variables... is there a way of suggesting that all the variables will be highly correlated? That seems to be something allowed for but not encouraged by v=1... sorry about the loose language.
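[For intuition: in the 2x2 case the determinant of a correlation matrix with off-diagonal r is 1 - r^2, so the LKJ kernel reduces to (1 - r^2)^(nu - 1) on the single correlation r. A quick numerical sketch (not from the manual) confirms the behavior described above:]

```python
import numpy as np

def lkj_kernel_2x2(r, nu):
    """Unnormalized LKJ density over the single correlation r in (-1, 1).

    For a 2x2 correlation matrix Omega with off-diagonal r,
    det(Omega) = 1 - r^2, so f(Omega | nu) is proportional to
    (1 - r^2)^(nu - 1).
    """
    return (1.0 - r**2) ** (nu - 1.0)

r = np.linspace(-0.9, 0.9, 5)
print("nu = 1  :", lkj_kernel_2x2(r, 1.0))   # flat: every r equally likely
print("nu = 10 :", lkj_kernel_2x2(r, 10.0))  # concentrates near r = 0
print("nu = 0.5:", lkj_kernel_2x2(r, 0.5))   # piles up near r = +/-1
```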

The background is that I have a model that I want to make into a discrete mixture, but I don't want to figure out exactly how many mixture components there are, so I was thinking of adding extra components and constraining them using correlation.  I think the horseshoe priors paper does this sort of thing, so I have a lead on sorting this out, but I'd appreciate any pointers.

Krzysztof

Ben Goodrich

Dec 4, 2015, 4:53:37 PM
to Stan users mailing list
On Friday, December 4, 2015 at 4:14:24 PM UTC-5, Krzysztof Sakrejda wrote:
I think this means that v=1 is all kinds of correlation matrices possible (some sort of uninformative) and large v is suggesting mostly uncorrelated variables.... is there a way of suggesting that all the variables will be highly correlated? That seems to be something allowed for but not encouraged by v=1... sorry about the loose language.

For values of the shape parameter between 0 and 1 exclusive, the density is infinite when the correlation matrix is singular because

f(Omega | nu) \propto |Omega|^(nu - 1)

I think what you want is a prior that favors highly correlated variables but still precludes the possibility of them being singular, which might make sense but is not LKJ. Loosely speaking, this would entail declaring a cholesky_factor_corr[K] and putting priors on the diagonal elements that were concentrated near zero. Something like a beta prior with 1 < shape1 < shape2.
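[Loosely, the link between small Cholesky diagonals and high correlation can be seen in the 2x2 case (a sketch, not Ben's exact construction): if L = [[1, 0], [r, sqrt(1 - r^2)]], then L L' is a correlation matrix with off-diagonal r, so pushing the second diagonal element sqrt(1 - r^2) toward zero pushes |r| toward 1:]

```python
import numpy as np

def corr_from_diag(d):
    """Correlation implied by the second diagonal d of a 2x2 Cholesky factor.

    A 2x2 Cholesky factor of a correlation matrix has the form
    L = [[1, 0], [r, d]] with r^2 + d^2 = 1, so shrinking d toward
    zero forces the correlation |r| toward 1.
    """
    r = np.sqrt(1.0 - d**2)
    L = np.array([[1.0, 0.0], [r, d]])
    Omega = L @ L.T
    return Omega[0, 1]

print(corr_from_diag(0.9))  # modest correlation
print(corr_from_diag(0.1))  # correlation near 1, but not singular
```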

But I didn't understand what you said about the motivation. It seems that you would want a simplex vector for the mixture weights with concentration parameter(s) less than 1 to encourage the excess ones to be near zero.
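[The sparse-weight idea is easy to check by simulation (a sketch assuming numpy; the concentration values here are illustrative only):]

```python
import numpy as np

# A symmetric Dirichlet with concentration < 1 puts most of its mass on
# sparse simplex vectors, so superfluous mixture components get weights
# near zero; concentration = 1 is uniform over the simplex.
rng = np.random.default_rng(0)
K = 5
sparse = rng.dirichlet(np.full(K, 0.1), size=1000)   # concentration 0.1
uniform = rng.dirichlet(np.full(K, 1.0), size=1000)  # concentration 1

# Fraction of weights that are essentially zero under each prior:
print("sparse :", np.mean(sparse < 0.01))
print("uniform:", np.mean(uniform < 0.01))
```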

Ben

Krzysztof Sakrejda

Dec 4, 2015, 6:45:25 PM
to Stan users mailing list
On Friday, December 4, 2015 at 4:53:37 PM UTC-5, Ben Goodrich wrote:
On Friday, December 4, 2015 at 4:14:24 PM UTC-5, Krzysztof Sakrejda wrote:
I think this means that v=1 is all kinds of correlation matrices possible (some sort of uninformative) and large v is suggesting mostly uncorrelated variables.... is there a way of suggesting that all the variables will be highly correlated? That seems to be something allowed for but not encouraged by v=1... sorry about the loose language.

For values of the shape parameter between 0 and 1 exclusive, the density is infinite when the correlation matrix is singular because

f(Omega | nu) \propto |Omega|^(nu - 1)

I think what you want is a prior that favors highly correlated variables but still precludes the possibility of them being singular, which might make sense but is not LKJ. Loosely speaking, this would entail declaring a cholesky_factor_corr[K] and putting priors on the diagonal elements that were concentrated near zero. Something like a beta prior with 1 < shape1 < shape2.

OK, that helps.  Thanks!  This is mind-boggling but I guess I can just do some simulation to check if it's what I want and if it behaves.
 

But I didn't understand what you said about the motivation. It seems that you would want a simplex vector for the mixture weights with concentration parameter(s) less than 1 to encourage the excess ones to be near zero.

I don't quite think that's what I want, although I could see how it would work.  I have a model:

y ~ Weibull(alpha, beta)

But it doesn't represent the abundance of very small or medium-large values of y very accurately. It turns out my ultimate results are very sensitive to the CDF for small values of y... so I was thinking of doing a mixture:

p(y|...) = q_1*Weibull(alpha_1, beta_1) + q_2*Weibull(alpha_2, beta_2) + ... + q_k*Weibull(alpha_k, beta_k)

Where q_1, ..., q_k are mixture weights and alpha/beta are Weibull parameters (matrices with two dimensions, indexed by two different factors, so alpha is ultimately indexed as alpha_{k,j,d}).  I was thinking of using a correlation matrix to constrain each set of alpha_{.,j,d} to be similar to each other (I don't need all those parameters, just some flexibility around the Weibull shape).  Your response makes me think there's an easier way (maybe just going to a random effect with one estimated variance, or something like that).
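[The mixture density above can be sketched directly (a toy check with made-up q/alpha/beta values, using Stan's shape/scale convention for the Weibull, suppressing the j,d indexing):]

```python
import numpy as np

def weibull_pdf(y, alpha, beta):
    """Weibull density with shape alpha and scale beta."""
    return (alpha / beta) * (y / beta) ** (alpha - 1) * np.exp(-((y / beta) ** alpha))

def mixture_pdf(y, q, alpha, beta):
    """p(y) = q_1*Weibull(alpha_1, beta_1) + ... + q_k*Weibull(alpha_k, beta_k)."""
    return sum(qk * weibull_pdf(y, ak, bk) for qk, ak, bk in zip(q, alpha, beta))

q = [0.7, 0.3]      # mixture weights (must sum to 1)
alpha = [1.5, 4.0]  # component shapes (illustrative)
beta = [1.0, 3.0]   # component scales (illustrative)
print(mixture_pdf(2.0, q, alpha, beta))
```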

Krzysztof

