> On Aug 28, 2016, at 1:33 AM,
kbo...@berkeley.edu wrote:
>
> With the K-1 method, I find that the last element is privileged relative to the other ones and samples poorly.
You mean just the K-th component has worse R-hat or
lower n_eff?
> A sum-to-zero vector is a point in the K-1 dimensional plane that is orthogonal to the (1, 1, 1, ...) vector, and the "basis vectors" of the K-1 method are complete but not orthogonal in this plane.
>
> I believe that a better method would be to define an orthogonal basis that maps out this plane. For example, the basis:
> [1, -1, 0, 0, 0, ...]
> [1/2, 1/2, -1, 0, 0, ...]
> [1/3, 1/3, 1/3, -1, 0, ...]
> ...
> (each normalized by dividing by sqrt(1/n + 1)) will work. There might be a better way of generating such a basis that I am not aware of. Here is some example code to implement this transformation:
>
> parameters {
>   vector[W - 1] T_raw;
> }
> transformed parameters {
>   vector[W] T;
>
>   // positive entries: basis vector w2 contributes 1 / w2 to components 1:w2
>   for (w in 1:(W - 1)) {
>     T[w] = 0;
>     for (w2 in w:(W - 1)) {
>       T[w] = T[w] + T_raw[w2] / w2 / sqrt(1.0 / w2 + 1);  // 1.0 avoids integer division
>     }
>   }
>   T[W] = 0;
>
>   // the -1 entry of basis vector w sits at component w + 1
>   for (w in 1:(W - 1)) {
>     T[w + 1] = T[w + 1] - T_raw[w] / sqrt(1.0 / w + 1);
>   }
> }
> I get better sampling performance with this method compared to either the simplex or K-1 methods. Would it be possible to implement something like this into the Stan language, or have something similar in the manual?
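The basis above is easy to check numerically before worrying about sampling behavior. Here's a minimal NumPy sketch (the function name is just illustrative) that builds the proposed basis, verifies it's orthonormal and lies in the sum-to-zero plane, and applies the same transformation as the Stan code:

```python
import numpy as np

def sum_to_zero_basis(K):
    """Orthonormal basis for the sum-to-zero hyperplane in R^K.

    Row n is [1/n, ..., 1/n, -1, 0, ..., 0] (n copies of 1/n),
    scaled by 1 / sqrt(1/n + 1), matching the basis above.
    """
    B = np.zeros((K - 1, K))
    for n in range(1, K):
        B[n - 1, :n] = 1.0 / n
        B[n - 1, n] = -1.0
        B[n - 1] /= np.sqrt(1.0 / n + 1.0)
    return B

K = 5
B = sum_to_zero_basis(K)

# Rows are orthonormal: B B^T = I.
assert np.allclose(B @ B.T, np.eye(K - 1))
# Each row sums to zero, so every linear combination does too.
assert np.allclose(B.sum(axis=1), 0.0)

# The transformation in the Stan code is T = B^T T_raw.
rng = np.random.default_rng(0)
T_raw = rng.normal(size=K - 1)
T = B.T @ T_raw
assert abs(T.sum()) < 1e-12
```

Because the rows are orthonormal, independent standard-normal draws on T_raw induce an isotropic distribution within the sum-to-zero plane, which is the symmetry the K-1 (last-element-is-negative-sum) parameterization lacks.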
We've thought about implementing a sum-to-zero vector type in
Stan but haven't gotten around to it. We would've gone with
the K-1 vector with the last element being the negative sum
of the first K-1. So I'm curious as to why your approach works
better, at least in your problem.
One issue we have with all these parameterizations is how
to put priors on the elements, as we don't really want to
put priors on all K elements with only K-1 degrees of freedom,
but putting them only on K-1 has that asymmetry problem.
We run into similar problems with multilevel models, where
we have K groups but usually use only K - 1 non-zero parameters
for identifiability, and with multi-logit regression, where
there are K output categories but only K - 1 coefficient
vectors. We've talked about this one with Andrew, too, and
he said the same thing about the asymmetry, but we've never
come up with a solution that both mixes well and keeps the
priors symmetric.
For the unit vectors, we actually take K underlying parameters,
but then put a prior on them for identification. It seems like
we might be able to do something similar here.
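A rough sketch of that idea, in NumPy rather than Stan (this is an assumption about how it might look, not an existing implementation): keep K unconstrained parameters, project onto the sum-to-zero plane by centering, and let a prior on the raw parameters identify the extra degree of freedom, just as the unit-vector type does for its extra radial degree of freedom.

```python
import numpy as np

def centered(T_raw):
    # Subtracting the mean projects onto the sum-to-zero plane;
    # the leftover degree of freedom (the mean itself) is pinned
    # down only by the prior on T_raw, e.g. standard normal.
    return T_raw - T_raw.mean()

rng = np.random.default_rng(1)
T_raw = rng.normal(size=6)  # K = 6 raw parameters, symmetric prior
T = centered(T_raw)
assert abs(T.sum()) < 1e-12
```

This keeps the prior symmetric across all K components, at the cost of one redundant parameter, which is the same trade-off as in the unit-vector case.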
- Bob