On 16/12/2013 19:53, Shravan Vasishth wrote:
> Thanks Chris, I'll try to figure out what's in that article. Andrew, I
> did look at the appendix in BDA. I think that a little bit more
> explanation would help people like me who don't know much.
Well, let me try. "Someone" (!) will surely correct my mistakes.
Defining priors for variances looks easy (it's not that easy, but let's
go on).
Defining priors for 2x2 covariance matrices is easy too (see above): two
variances and one correlation, just a number in [-1,1].
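As a quick sanity check, here is a minimal numpy sketch (the standard deviations and the correlation are made-up numbers) showing that any correlation in (-1, 1) yields a valid, positive-definite 2x2 covariance matrix:

```python
import numpy as np

# Hypothetical values: two standard deviations and one correlation.
sd = np.array([1.5, 0.8])
rho = 0.3

# The 2x2 covariance matrix implied by (sd, rho), written out explicitly.
Sigma = np.array([
    [sd[0] ** 2,            rho * sd[0] * sd[1]],
    [rho * sd[0] * sd[1],   sd[1] ** 2],
])

# Positive definiteness holds for any rho strictly between -1 and 1.
assert np.all(np.linalg.eigvalsh(Sigma) > 0)
```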
But defining priors for KxK, K>2, covariance matrices is difficult
because the correlation parameters are jointly constrained (the matrix
must stay positive definite). So you have to define a prior for a large
(K>2) covariance matrix "en bloc".
One often uses the inverse-Wishart distribution, but its parameters are
difficult to interpret, and a single degrees-of-freedom parameter
controls the precision of all elements of the covariance matrix at once [1].
BTW: reading
http://andrewgelman.com/2012/08/29/more-on-scaled-inverse-wishart-and-prior-independence/
is mandatory!
This is why Barnard, McCulloch and Meng [2] recommend a separation
strategy, i.e.:
Sigma = D R D
where D is a diagonal matrix of std devs and R a correlation matrix.
This is pretty useful if you are willing to express prior beliefs about
the standard deviations, but less willing about the correlation matrix
(as in your models, I'd say).
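In numpy terms, the separation strategy is just this (the standard deviations and the correlation matrix below are hypothetical prior guesses, chosen only to show the algebra):

```python
import numpy as np

# Separate the two ingredients: standard deviations and correlations.
sds = np.array([2.0, 0.5, 1.0])            # diagonal of D
R = np.array([[ 1.0, 0.3, -0.2],           # a valid correlation matrix
              [ 0.3, 1.0,  0.4],
              [-0.2, 0.4,  1.0]])

D = np.diag(sds)
Sigma = D @ R @ D                          # Sigma = D R D

# The decomposition is exact: sds and R are recoverable from Sigma.
assert np.allclose(np.sqrt(np.diag(Sigma)), sds)
assert np.allclose(Sigma / np.outer(sds, sds), R)
```

In a Bayesian model you would then put independent priors on the elements of sds and a separate prior on R.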
But modeling a correlation matrix directly is a tough task, so O'Malley
and Zaslavsky [1] suggested a scaled inverse-Wishart:
Sigma = diag(xi) Q diag(xi), Q ~ Inv-Wishart(...)
For example, you could set the degrees of freedom to K+1, which gives
marginally uniform distributions for the correlations, and then
"correct" the constrained variances with a vector xi of scale parameters.
See Gelman & Hill, pp. 286-287.
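A small sketch of that construction (one Inv-Wishart(K+1, I) draw; the xi vector is fixed here just to show the algebra, whereas in a model it would get its own prior, e.g. lognormal). The key property is that rescaling by diag(xi) changes the variances but leaves the implied correlation matrix of Q untouched:

```python
import numpy as np
from scipy.stats import invwishart

K = 3
# Q ~ Inv-Wishart(K+1, I): marginally uniform correlations,
# but poorly controlled variances...
Q = invwishart.rvs(df=K + 1, scale=np.eye(K), random_state=0)

# ...which the scale vector xi then corrects.
xi = np.array([2.0, 0.5, 1.0])
Sigma = np.diag(xi) @ Q @ np.diag(xi)

# The correlation matrices implied by Q and by Sigma are identical.
corr_Q = Q / np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
corr_S = Sigma / np.sqrt(np.outer(np.diag(Sigma), np.diag(Sigma)))
assert np.allclose(corr_Q, corr_S)
```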
A few years ago, Lewandowski, Kurowicka and Joe [3] found efficient
algorithms to generate random correlation matrices whose distribution
depends on a single parameter, eta; when eta = 1 the distribution is
jointly uniform over the space of correlation matrices.
So the original separation strategy is now feasible and very efficient
in Stan.
Note that joint uniformity makes the marginal priors for the individual
correlations non-uniform: the larger K is, the more they favor values
close to zero over values close to +/-1. This is arguably sensible,
because the positive-definiteness constraint becomes more restrictive
as the correlations move away from zero [2].
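You can see how restrictive that constraint gets with a small Monte Carlo sketch (illustrative only): draw the off-diagonal entries independently from Uniform(-1, 1) and count how often the resulting matrix is actually positive definite. For K = 2 every draw works; the fraction then collapses quickly as K grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def pd_fraction(K, n=5000):
    """Fraction of symmetric matrices with unit diagonal and i.i.d.
    Uniform(-1, 1) off-diagonal entries that are positive definite."""
    count = 0
    iu = np.triu_indices(K, 1)
    for _ in range(n):
        R = np.eye(K)
        vals = rng.uniform(-1, 1, size=len(iu[0]))
        R[iu] = vals
        R[(iu[1], iu[0])] = vals
        if np.all(np.linalg.eigvalsh(R) > 0):
            count += 1
    return count / n

for K in (2, 3, 4, 5):
    print(K, pd_fraction(K))   # the fraction drops sharply with K
```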
HTH
Sergio
--------------------------
[1] A. James O'Malley & Alan M. Zaslavsky, "Domain-Level Covariance
Analysis for Multilevel Survey Data With Structured Nonresponse",
Journal of the American Statistical Association, 103:484, 1405-1418,
http://dx.doi.org/10.1198/016214508000000724
[2] John Barnard, Robert McCulloch and Xiao-Li Meng, "Modeling
Covariance Matrices in Terms of Standard Deviations and Correlations,
with Application to Shrinkage", Statistica Sinica 10(2000), 1281-1311,
http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf
[3] Daniel Lewandowski, Dorota Kurowicka and Harry Joe, "Generating
random correlation matrices based on vines and extended onion method",
Journal of Multivariate Analysis 100(2009), 1989-2001,
http://www.sciencedirect.com/science/article/pii/S0047259X09000876