The means and medians are OK (should be 0, I believe), but the 90% intervals are not the same for
the six correlation entries. My understanding is that these should all be identical, as each is just
a pairwise correlation.
But then I realized that there's no reason to suppose that just because L is drawn uniformly
from the Cholesky factors of correlation matrices, (L * L') will be uniform over
correlation matrices --- the product-transpose Jacobian is missing!
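To convince myself, here's a quick numeric sketch (my own check, not Stan code from the thread): for K = 3, take the free below-diagonal entries (L21, L31, L32) as coordinates, with the diagonal of L fixed by the unit-row-norm constraint, and finite-difference the map to the free entries of Omega = L * L'. The Jacobian determinant comes out as L22 = sqrt(1 - L21^2), which is not constant, so a uniform density on L cannot push forward to a uniform density on Omega.

```python
import numpy as np

def omega_free(x):
    # x = (L21, L31, L32); L22 is pinned by the unit-norm row constraint
    l21, l31, l32 = x
    l22 = np.sqrt(1 - l21**2)
    # free entries of Omega = L * L': r21, r31, r32
    return np.array([l21, l31, l21 * l31 + l22 * l32])

def jacobian_det(x, eps=1e-6):
    # central-difference Jacobian of the map x -> free entries of Omega
    x = np.asarray(x, dtype=float)
    J = np.empty((3, 3))
    for i in range(3):
        dx = np.zeros(3)
        dx[i] = eps
        J[:, i] = (omega_free(x + dx) - omega_free(x - dx)) / (2 * eps)
    return abs(np.linalg.det(J))

# The determinant varies with L21: it equals sqrt(1 - L21^2)
print(jacobian_det([0.0, 0.2, 0.3]))  # ~ 1.0
print(jacobian_det([0.8, 0.2, 0.3]))  # ~ sqrt(1 - 0.64) = 0.6
```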
But then I tried looking at what we get just from correlation matrices, using an even more trivial
model:
parameters {
  corr_matrix[4] Omega;
}
model {
}
and got:
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
Omega[2,1] -5.3e-03 3.2e-03 4.5e-01 -0.74 -7.1e-03 0.74 19718 5851 1.0e+00
Omega[3,1] 3.0e-03 3.2e-03 4.5e-01 -0.73 1.4e-03 0.74 19948 5919 1.0e+00
Omega[3,2] 3.6e-04 2.9e-03 4.1e-01 -0.67 2.7e-03 0.67 19724 5853 1.0e+00
Omega[4,1] 1.2e-03 3.2e-03 4.5e-01 -0.73 2.8e-03 0.73 19148 5682 1.0e+00
Omega[4,2] -8.1e-04 2.9e-03 4.1e-01 -0.67 -3.9e-03 0.67 20000 5935 1.0e+00
Omega[4,3] 7.4e-04 3.2e-03 4.5e-01 -0.73 4.0e-03 0.73 19601 5816 1.0e+00
There still seems to be some bias in the 90% intervals --- again, this .67 vs. .73 thing
repeats under all conditions I can find (and is reflected in the StdDev values varying, too).
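For reference, if Omega really were uniform over correlation matrices (eta = 1), the LKJ (2009) marginal result says each off-diagonal entry r satisfies (r + 1)/2 ~ Beta(K/2, K/2). A quick scipy check (my own sanity check, not output from the thread) of what the 95% quantile and StdDev should then be for K = 4:

```python
from scipy.stats import beta

K = 4
b = K / 2.0                          # Beta(2, 2) for K = 4, eta = 1
q95 = 2 * beta.ppf(0.95, b, b) - 1   # map the (0,1) Beta quantile to (-1,1)
sd = 2 * beta.std(b, b)              # StdDev of r on (-1,1)
print(round(q95, 2), round(sd, 2))   # 0.73 0.45
```

So the rows with 0.73/0.45 match the uniform marginal, and the 0.67/0.41 rows are the biased ones.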
So my question is whether this is OK and I'm just testing or conceptualizing improperly.
transformed data {
  int K;
  real eta;
  K <- 4;
  eta <- 1;
}
parameters {
  corr_matrix[K] Omega;
}
model {
}
generated quantities {
  corr_matrix[K] Sigma;
  Sigma <- lkj_corr_rng(K, eta);
}

> print(bug, digits = 2)
Inference for Stan model: LKJ2.
4 chains, each with iter=20000; warmup=10000; thin=1;
post-warmup draws per chain=10000, total post-warmup draws=40000.

            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
Omega[1,1]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 40000  NaN
Omega[1,2]  0.00    0.00 0.45 -0.81 -0.35 -0.01  0.34  0.81 32058    1
Omega[1,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 33093    1
Omega[1,4]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 30763    1
Omega[2,1]  0.00    0.00 0.45 -0.81 -0.35 -0.01  0.34  0.81 32058    1
Omega[2,2]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 35833    1
Omega[2,3]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.31  0.76 28157    1
Omega[2,4]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.32  0.76 28589    1
Omega[3,1]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 33093    1
Omega[3,2]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.31  0.76 28157    1
Omega[3,3]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00  1838    1
Omega[3,4]  0.01    0.00 0.45 -0.82 -0.34  0.01  0.35  0.81 28662    1
Omega[4,1]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 30763    1
Omega[4,2]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.32  0.76 28589    1
Omega[4,3]  0.01    0.00 0.45 -0.82 -0.34  0.01  0.35  0.81 28662    1
Omega[4,4]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00   324    1
Sigma[1,1]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 40000  NaN
Sigma[1,2]  0.00    0.00 0.45 -0.81 -0.34  0.00  0.35  0.81 40000    1
Sigma[1,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.34  0.81 40000    1
Sigma[1,4]  0.01    0.00 0.45 -0.81 -0.34  0.01  0.35  0.81 39744    1
Sigma[2,1]  0.00    0.00 0.45 -0.81 -0.34  0.00  0.35  0.81 40000    1
Sigma[2,2]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 40000    1
Sigma[2,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 39279    1
Sigma[2,4]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 40000    1
Sigma[3,1]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.34  0.81 40000    1
Sigma[3,2]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 39279    1
Sigma[3,3]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00   304    1
Sigma[3,4]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 40000    1
Sigma[4,1]  0.01    0.00 0.45 -0.81 -0.34  0.01  0.35  0.81 39744    1
Sigma[4,2]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 40000    1
Sigma[4,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 40000    1
Sigma[4,4]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00  2431    1
lp__       -3.45    0.02 1.94 -8.13 -4.51 -3.10 -2.02 -0.72 13730    1

Omega[1,2:4] # correct
Omega[2,3:4] # incorrect
Omega[3,4]   # correct

Sorry --- didn't see that later message, but that's exactly what
I'm seeing in terms of the behavior.
I'll add a test for the Jacobian. The way I did it for the
correlation matrix Cholesky factor was to use auto-diff to
compute the Jacobian, then Eigen to do the determinant, which
I could compare to the transform. I'll write such a test
for the correlation matrix and see what happens.
I think what Ben's been telling me all along has finally
sunk in.
Ideally, we want to parameterize a multivariate normal of
dimension K as follows [CPC-based parameterization]:
This would be equivalent to the longer and far less efficient form
we currently recommend, declaring Omega to be a corr_matrix[K] and taking:
The intermediate form is what we'll get from the Cholesky factor for a correlation
matrix:
parameters {
  cholesky_factor_corr[K] L;
  vector<lower=0>[K] sigma;
  vector[K] mu;
}
model {
  y ~ multi_normal_cholesky(mu, diag_matrix(sigma) * L);
  L ~ lkj_corr_cholesky(eta);
  sigma ~ ... whatevs ...
}
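For what it's worth, the reason diag_matrix(sigma) * L is the right Cholesky factor there: if Omega = L * L', then Sigma = diag(sigma) * Omega * diag(sigma) factors as (diag(sigma) * L) * (diag(sigma) * L)'. A small numpy check with made-up values (illustration only):

```python
import numpy as np

# a valid 3x3 correlation Cholesky factor: rows have unit norm
L = np.array([[1.0, 0.0, 0.0],
              [0.6, 0.8, 0.0],
              [0.3, 0.4, np.sqrt(1 - 0.3**2 - 0.4**2)]])
sigma = np.array([1.5, 0.5, 2.0])  # arbitrary positive scales

# scaled covariance: Sigma = diag(sigma) * (L * L') * diag(sigma)
Sigma = np.diag(sigma) @ (L @ L.T) @ np.diag(sigma)

# its Cholesky factor is exactly diag(sigma) * L
assert np.allclose(np.linalg.cholesky(Sigma), np.diag(sigma) @ L)
```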
Wouldn't it be even better if we used precision so we wouldn't have to invert? Is
there a way to relate that all back to the cpcs so we can use reasonable priors?
The second question is whether the right thing to do is write:
cholesky_factor_corr[K] L;
L ~ lkj_corr_cholesky(nu);
Will that produce the lkj_corr distribution on (L * L')?
That is probably the right behavior, but I need to rewrite the lkj_corr_cholesky_log function to achieve it. The density of L can be expressed as

  |J| * det(L)^(2 * eta - 2)

where J is triangular with diagonal elements that involve only powers of the diagonal of L.
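A finite-difference sketch of that J (my own check, taking the free parameters to be the below-diagonal entries of L with the diagonal determined by the unit-row-norm constraint): the determinant comes out as prod_j L[j,j]^(K - j), so combined with det(L)^(2 * eta - 2) the density on L should be proportional to prod_j L[j,j]^(K - j + 2 * eta - 2).

```python
import numpy as np

K = 4

def build_L(x):
    # fill the below-diagonal entries row by row; each diagonal entry
    # is pinned by the requirement that rows of L have unit norm
    L = np.eye(K)
    idx = 0
    for k in range(1, K):
        row = x[idx:idx + k]
        idx += k
        L[k, :k] = row
        L[k, k] = np.sqrt(1 - row @ row)
    return L

def omega_free(x):
    # free (below-diagonal) entries of Omega = L * L', row-major
    L = build_L(x)
    return (L @ L.T)[np.tril_indices(K, -1)]

x0 = np.array([0.3, 0.2, -0.4, 0.1, 0.5, -0.2])
eps = 1e-6
n = len(x0)
J = np.empty((n, n))
for i in range(n):
    dx = np.zeros(n)
    dx[i] = eps
    J[:, i] = (omega_free(x0 + dx) - omega_free(x0 - dx)) / (2 * eps)

detJ = abs(np.linalg.det(J))
L0 = build_L(x0)
# predicted |J| = prod_j L[j,j]^(K - j)  (1-indexed j)
predicted = np.prod(np.diag(L0) ** (K - 1 - np.arange(K)))
print(detJ, predicted)  # the two should agree
```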
> I pushed that into the branch. I think the two questions left to answer are
> • Do we want the parser to at least give a warning if someone declares a corr_matrix[K] in the parameters block with an explanation that they should declare a cholesky_factor_corr[K] ?
I would say no to this. I don't think we want to give programming
advice in the form of warning messages. It's a thin line, though.
> • Do we want to introduce the non-square cholesky_factor_corr[R,C] with R >= C type now?
We could. I asked before and you didn't think we'd need it, which is why I didn't bother.
I'm OK with updating it --- the pattern's already there for covariance matrix Cholesky
factors.
> > I pushed that into the branch. I think the two questions left to answer are
> > • Do we want the parser to at least give a warning if someone declares a corr_matrix[K] in the parameters block with an explanation that they should declare a cholesky_factor_corr[K] ?
>
> I would say no to this. I don't think we want to give programming
> advice in the form of warning messages. It's a thin line, though.
>
> The difficulty is that we gave bad programming advice previously via the manual / mailing list when declaring a corr_matrix[K] in parameters was the only way to do it. I think we are going to run into the viral BUGS example problem where there are bad Stan examples floating around on the internet, which people find and then use, which perpetuates the problem.
That's a good point. Luckily we're not quite as entrenched in
printed books as BUGS is.
> Do we want to just throw an error? I don't mind breaking old .stan files as long as the error message clearly says what to do to fix it.
I'm confused --- it's not an error.