Cholesky factor of correlation matrix data type


Bob Carpenter

May 3, 2014, 6:22:03 AM
to stan...@googlegroups.com
I'm hoping Ben (or someone else) can help sort me out on this.

I added a data type for a Cholesky factor of a correlation matrix.
It's all pushed to this branch:

feature/issue-413-cholesky-corr-type

It all seems to work. For unit tests, I checked that autodiff on
the transform followed by a direct determinant calc gives the same
answer as the transform Jacobian (it does) and that the values round
trip back and forth properly under the constrain/free transforms (they do).

The problem I'm left with is conceptual, and it's also left me worried
about correlation matrices.

Consider this trivial model:

  parameters {
    cholesky_factor_corr[4] L;
  }
  model {
  }
  generated quantities {
    matrix[4,4] Omega;
    Omega <- L * L';
  }

The fit is consistently this (under different seeds, longer warmup, etc.):

                  Mean     MCSE   StdDev     5%       50%    95%  N_Eff  N_Eff/s    R_hat
  Omega[2,1]   3.0e-04  4.1e-03  5.8e-01  -0.90   1.1e-03   0.90  19478     7569  1.0e+00
  Omega[3,1]   8.6e-04  3.5e-03  5.0e-01  -0.81   2.6e-03   0.80  19754     7676  1.0e+00
  Omega[3,2]  -1.1e-04  3.6e-03  5.0e-01  -0.81  -2.0e-04   0.80  19731     7667  1.0e+00
  Omega[4,1]  -1.1e-03  3.2e-03  4.5e-01  -0.73  -4.0e-03   0.73  20000     7772  1.0e+00
  Omega[4,2]  -3.8e-03  3.2e-03  4.5e-01  -0.74  -8.1e-03   0.73  20000     7772  1.0e+00
  Omega[4,3]  -8.5e-04  3.2e-03  4.5e-01  -0.73  -1.1e-03   0.73  20000     7772  1.0e+00

The means and medians are OK (should be 0, I believe), but the 90% intervals are not the same for
the six correlation entries. My understanding is that these should all be identical, as each is just
a pairwise correlation.

But then I realized that there's no reason to suppose that just because L is drawn uniformly
from the Cholesky factors for correlation matrices that (L * L') will thus be uniform over
correlation matrices --- it's missing the product-transpose Jacobian!

But then I tried looking at what we get just from correlation matrices, using an even more trivial
model:

  parameters {
    corr_matrix[4] Omega;
  }
  model {
  }

and got:

                  Mean     MCSE   StdDev     5%       50%    95%  N_Eff  N_Eff/s    R_hat
  Omega[2,1]  -5.3e-03  3.2e-03  4.5e-01  -0.74  -7.1e-03   0.74  19718     5851  1.0e+00
  Omega[3,1]   3.0e-03  3.2e-03  4.5e-01  -0.73   1.4e-03   0.74  19948     5919  1.0e+00
  Omega[3,2]   3.6e-04  2.9e-03  4.1e-01  -0.67   2.7e-03   0.67  19724     5853  1.0e+00
  Omega[4,1]   1.2e-03  3.2e-03  4.5e-01  -0.73   2.8e-03   0.73  19148     5682  1.0e+00
  Omega[4,2]  -8.1e-04  2.9e-03  4.1e-01  -0.67  -3.9e-03   0.67  20000     5935  1.0e+00
  Omega[4,3]   7.4e-04  3.2e-03  4.5e-01  -0.73   4.0e-03   0.73  19601     5816  1.0e+00

There still seems to be some bias in the 90% intervals --- again, this .67 vs. .73 thing
repeats under all conditions I can find (and is reflected in the StdDev values varying, too).

So my question is whether this is OK and I'm just testing or conceptualizing improperly.

The second question is whether the right thing to do is write:

cholesky_factor_corr[K] L;
L ~ lkj_corr_cholesky(nu);

Will that produce the lkj_corr distribution on (L * L')?

- Bob



Michael Betancourt

May 3, 2014, 7:19:41 AM
to stan...@googlegroups.com
Ben will have to give the definitive answer, but I think you have
a misinterpretation here.

In the LKJ construction the off-diagonal elements of the Cholesky
matrix are partial (i.e. conditional) correlations, not the pairwise
covariances/correlations of the resulting covariance/correlation matrix.
Think about it this way: the LKJ prior is just the determinant of
the covariance matrix raised to some power, but the determinant
is given entirely by the diagonal elements of the Cholesky factor.
The off diagonal elements are completely unconstrained by the
prior!
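(Michael's determinant point is easy to check numerically: det(L * L') depends only on the diagonal of L. A quick sketch in numpy; the random construction of a valid correlation Cholesky factor here is just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Construct a random lower-triangular matrix with a positive diagonal and
# unit-length rows -- i.e. a valid Cholesky factor of a correlation matrix.
K = 4
L = np.tril(rng.normal(size=(K, K)))
L[np.diag_indices(K)] = np.abs(L[np.diag_indices(K)])
L /= np.linalg.norm(L, axis=1, keepdims=True)

Omega = L @ L.T  # symmetric with unit diagonal: a correlation matrix

# The determinant is given entirely by the diagonal of L:
det_direct = np.linalg.det(Omega)
det_from_diag = np.prod(np.diag(L)) ** 2
print(det_direct, det_from_diag)  # the two agree
```

so a density proportional to det(Omega)^(eta - 1) puts no direct constraint on the off-diagonal elements of L.)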

I’m not sure about the correlation matrix, but I’m also not sure that
it’s wrong. The 0.67 values are the off-diagonal elements of the
second row, which is the central row of the lower triangle (4x4 matrix,
4 diagonal elements, 6 elements forming a triangle below the diagonal
and 6 elements forming a triangle above the diagonal). This seems
structured enough to be consistent with an implicit prior given by the
squared Jacobian alone.

Ben Goodrich

May 3, 2014, 12:56:47 PM
to stan...@googlegroups.com
On Saturday, May 3, 2014 6:22:03 AM UTC-4, Bob Carpenter wrote:
The means and medians are OK (should be 0, I believe), but the 90% intervals are not the same for
the six correlation entries.  My understanding is that these should all be identical, as each is just
a pairwise correlation.

The correlations only have the same marginal distribution under the LKJ(1) prior on the correlation matrix, which isn't achieved by a uniform prior on L.
 
But then I realized that there's no reason to suppose that just because L is drawn uniformly
from the Cholesky factors for correlation matrices that (L * L') will thus be uniform over
correlation matrices --- it's missing the product-transpose Jacobian!

Right. It is going to be a challenge to get users to recognize that they must put a non-uniform prior on L in order to imply a uniform (or even non-uniform LKJ) prior on L * L', when L * L' is typically not even constructed in their model.
 
But then I tried looking at what we get just from correlation matrices, using an even more trivial
model:

  parameters {
    corr_matrix[4] Omega;
  }
  model {
  }

and got:

                      Mean     MCSE   StdDev     5%       50%    95%  N_Eff  N_Eff/s    R_hat
  Omega[2,1]      -5.3e-03  3.2e-03  4.5e-01  -0.74  -7.1e-03   0.74  19718     5851  1.0e+00
  Omega[3,1]       3.0e-03  3.2e-03  4.5e-01  -0.73   1.4e-03   0.74  19948     5919  1.0e+00
  Omega[3,2]       3.6e-04  2.9e-03  4.1e-01  -0.67   2.7e-03   0.67  19724     5853  1.0e+00
  Omega[4,1]       1.2e-03  3.2e-03  4.5e-01  -0.73   2.8e-03   0.73  19148     5682  1.0e+00
  Omega[4,2]      -8.1e-04  2.9e-03  4.1e-01  -0.67  -3.9e-03   0.67  20000     5935  1.0e+00
  Omega[4,3]       7.4e-04  3.2e-03  4.5e-01  -0.73   4.0e-03   0.73  19601     5816  1.0e+00

There still seems to be some bias in the 90% intervals --- again, this .67 vs. .73 thing
repeats under all conditions I can find (and is reflected in the StdDev values varying, too).

I'm not sure what is going on there. It should be .73. More generally, when eta = 1, the marginal distribution of each correlation should be that of

2 * b - 1

where b is distributed beta with both shape parameters equal to K / 2.
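(That margin is easy to sanity-check against the table above, assuming the Beta(K/2, K/2) result Ben describes; scipy sketch:

```python
from scipy.stats import beta

# Under LKJ(1), each correlation should be distributed as 2 * b - 1 with
# b ~ Beta(K / 2, K / 2).  For K = 4 that is Beta(2, 2), so the 5% / 95%
# quantiles should match the +/- 0.73 endpoints in the table above.
K = 4
b = beta(K / 2.0, K / 2.0)
lo = 2 * b.ppf(0.05) - 1
hi = 2 * b.ppf(0.95) - 1
sd = 2 * b.std()  # should be about the 0.45 in the StdDev column
print(round(lo, 2), round(hi, 2), round(sd, 2))  # -0.73 0.73 0.45
```

which reproduces the .73 endpoints and the 0.45 standard deviation, but not the .67 rows.)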
 
The second question is whether the right thing to do is write:

   cholesky_factor_corr[K] L;
   L ~ lkj_corr_cholesky(nu);

Will that produce the lkj_corr distribution on (L * L')?

That is probably the right behavior, but I need to rewrite the lkj_corr_cholesky_log function to achieve it. The density of L can be expressed as

|J| * det(L)^(2 * eta - 2)

where J is triangular with diagonal elements that only involve powers of the diagonal of L.
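(For reference, if those diagonal powers work out the way they do in the standard LKJ derivation -- worth double-checking against the eventual implementation -- the whole density collapses to a product over the diagonal of L:

|J| * det(L)^(2 * eta - 2) = prod_{k=2}^{K} L[k,k]^(K - k + 2 * eta - 2)

so evaluating the log density only needs the logs of the K - 1 free diagonal elements of L.)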

Ben

Ben Goodrich

May 3, 2014, 2:35:37 PM
to stan...@googlegroups.com
On Saturday, May 3, 2014 6:22:03 AM UTC-4, Bob Carpenter wrote:
But then I tried looking at what we get just from correlation matrices, using an even more trivial
model:

  parameters {
    corr_matrix[4] Omega;
  }
  model {
  }

and got:

                      Mean     MCSE   StdDev     5%       50%    95%  N_Eff  N_Eff/s    R_hat
  Omega[2,1]      -5.3e-03  3.2e-03  4.5e-01  -0.74  -7.1e-03   0.74  19718     5851  1.0e+00
  Omega[3,1]       3.0e-03  3.2e-03  4.5e-01  -0.73   1.4e-03   0.74  19948     5919  1.0e+00
  Omega[3,2]       3.6e-04  2.9e-03  4.1e-01  -0.67   2.7e-03   0.67  19724     5853  1.0e+00
  Omega[4,1]       1.2e-03  3.2e-03  4.5e-01  -0.73   2.8e-03   0.73  19148     5682  1.0e+00
  Omega[4,2]      -8.1e-04  2.9e-03  4.1e-01  -0.67  -3.9e-03   0.67  20000     5935  1.0e+00
  Omega[4,3]       7.4e-04  3.2e-03  4.5e-01  -0.73   4.0e-03   0.73  19601     5816  1.0e+00

There still seems to be some bias in the 90% intervals --- again, this .67 vs. .73 thing
repeats under all conditions I can find (and is reflected in the StdDev values varying, too).

So my question is whether this is OK and I'm just testing or conceptualizing improperly.

This looks like a potentially serious bug somewhere. If you compare implicit vs. explicit drawing from an LKJ(1) distribution with

transformed data {
  int K;
  real eta;
  K <- 4;
  eta <- 1;
}
parameters {
  corr_matrix[K] Omega;
}
model {

}
generated quantities {
  corr_matrix[K] Sigma;
  Sigma <- lkj_corr_rng(K,eta);
}

it appears that the Sigma in generated quantities is correct, while some of the margins of the Omega parameter are off:

> print(bug, digits = 2)
Inference for Stan model: LKJ2.
4 chains, each with iter=20000; warmup=10000; thin=1; 
post-warmup draws per chain=10000, total post-warmup draws=40000.

            mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
Omega[1,1]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 40000  NaN
Omega[1,2]  0.00    0.00 0.45 -0.81 -0.35 -0.01  0.34  0.81 32058    1
Omega[1,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 33093    1
Omega[1,4]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 30763    1
Omega[2,1]  0.00    0.00 0.45 -0.81 -0.35 -0.01  0.34  0.81 32058    1
Omega[2,2]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 35833    1
Omega[2,3]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.31  0.76 28157    1
Omega[2,4]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.32  0.76 28589    1
Omega[3,1]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 33093    1
Omega[3,2]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.31  0.76 28157    1
Omega[3,3]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00  1838    1
Omega[3,4]  0.01    0.00 0.45 -0.82 -0.34  0.01  0.35  0.81 28662    1
Omega[4,1]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 30763    1
Omega[4,2]  0.00    0.00 0.41 -0.76 -0.31  0.00  0.32  0.76 28589    1
Omega[4,3]  0.01    0.00 0.45 -0.82 -0.34  0.01  0.35  0.81 28662    1
Omega[4,4]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00   324    1
Sigma[1,1]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 40000  NaN
Sigma[1,2]  0.00    0.00 0.45 -0.81 -0.34  0.00  0.35  0.81 40000    1
Sigma[1,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.34  0.81 40000    1
Sigma[1,4]  0.01    0.00 0.45 -0.81 -0.34  0.01  0.35  0.81 39744    1
Sigma[2,1]  0.00    0.00 0.45 -0.81 -0.34  0.00  0.35  0.81 40000    1
Sigma[2,2]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00 40000    1
Sigma[2,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 39279    1
Sigma[2,4]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 40000    1
Sigma[3,1]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.34  0.81 40000    1
Sigma[3,2]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 39279    1
Sigma[3,3]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00   304    1
Sigma[3,4]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 40000    1
Sigma[4,1]  0.01    0.00 0.45 -0.81 -0.34  0.01  0.35  0.81 39744    1
Sigma[4,2]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.81 40000    1
Sigma[4,3]  0.00    0.00 0.45 -0.81 -0.35  0.00  0.35  0.82 40000    1
Sigma[4,4]  1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00  2431    1
lp__       -3.45    0.02 1.94 -8.13 -4.51 -3.10 -2.02 -0.72 13730    1

So, that would suggest that we have a derivative wrong somewhere. The test_grad checks out, so at least it is consistently wrong. What is odd is that we have the pattern

Omega[1,2:4] # correct
Omega[2,3:4] # incorrect
Omega[3,4]   # correct

I'll look again at the logic of the Jacobians.

Ben
 

Bob Carpenter

May 3, 2014, 2:59:29 PM
to stan...@googlegroups.com
@Michael: I was reporting Omega = L * L', not just L. The only constraint
on L is that it's lower-triangular, the rows have unit length, and the
diagonals are positive.

@Ben Thanks --- the marginal distribution of the correlation matrix entries
is just what I was after to test. I'll also add that to the doc.

I figured that extra Jacobian term would be needed. Should we put the rewrite
in this same branch? I had planned on exposing it again in this branch, so
it makes sense to me.

I'm also not sure what's going on with cholesky_corr, but it consistently
has problems for 4 x 4 matrices. Larger matrices aren't perfect, but they
seem closer to all being the same. The 4 x 4, no matter what seed/id I use,
winds up with a too-narrow Omega[4,2] 90% interval. Cranking adapt delta
up to 0.999 doesn't help (and it gets there, increasing treedepth by 2 and
cutting the step size down to about 1/4 the original).

- Bob

Bob Carpenter

May 3, 2014, 3:02:36 PM
to stan...@googlegroups.com
Sorry --- didn't see that later message, but that's exactly what
I'm seeing in terms of the behavior.

I'll add a test for the Jacobian. The way I did it for the
correlation matrix Cholesky factor was to use auto-diff to
compute the Jacobian, then Eigen to do the determinant, which
I could compare to the transform. I'll write such a test
for the correlation matrix and see what happens.
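In miniature, that testing pattern looks like this (a standalone sketch using a
scalar lower-bound transform and finite differences, not the actual autodiff-based
test code):

```python
import numpy as np

# Constraining transform for a lower-bounded scalar: x = exp(y) + lb.
# Its analytic log absolute Jacobian is log |dx/dy| = y.
lb = 1.5
def constrain(y):
    return np.exp(y) + lb

# Differentiate the transform numerically and take the log determinant
# (for a scalar, just the log absolute derivative) ...
y = 0.7
eps = 1e-6
fd = (constrain(y + eps) - constrain(y - eps)) / (2 * eps)
log_jac_numeric = np.log(abs(fd))

# ... and compare to the analytic value used in the transform code.
log_jac_analytic = y
print(log_jac_numeric, log_jac_analytic)  # should agree closely
```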

- Bob

Bob Carpenter

May 3, 2014, 3:15:02 PM
to stan...@googlegroups.com
I think what Ben's been telling me all along has finally
sunk in.

Ideally, we want to parameterize a multivariate normal of
dimension K as follows:

  data {
    ..
    vector[K] y[N];
  }

  parameters {
    vector<lower=-1,upper=1>[(K * (K - 1)) / 2] cpcs;
    vector<lower=0>[K] sigma;
    vector[K] mu;
  }
  model {
    // likelihood
    y ~ multi_normal_cpcs_scale(mu,cpcs,sigma);

    // prior
    cpcs ~ cpcs_lkj_corr(eta);
    sigma ~ ... whatever Andrew likes for scales ...
  }
  generated quantities {
    corr_matrix[K] Omega;
    Omega <- cpcs_scales_to_cov_matrix(cpcs,sigma);
  }

This would be equivalent to the longer and far less efficient form we currently
recommend, which declares Omega to be a corr_matrix[K] and takes:

  y ~ multi_normal(mu, diag_matrix(sigma) * Omega * diag_matrix(sigma));
  Omega ~ lkj_corr(eta);
  sigma ~ ... whatevs ...

The intermediate form is what we'll get from the Cholesky factor for a correlation
matrix:

  parameters {
    cholesky_factor_corr[K] L;
    vector<lower=0>[K] sigma;
    vector[K] mu;
  }
  model {
    y ~ multi_normal_cholesky(mu, diag_matrix(sigma) * L);

    L ~ lkj_corr_cholesky(eta);
    sigma ~ ... whatevs ...
  }

We have more efficient ops than the way I wrote things with diag_matrix * Omega (* diag_matrix).
I just wrote it that way so it'd be clear to everyone what's going on. This isn't as
good as the raw cpcs version, but is much better than using the whole correlation matrix
because it's much easier to calculate determinants and do the solving with the Cholesky
factor.
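To make the efficiency point concrete, here's a numpy sketch (the names and random
setup are just for illustration): with Sigma = diag(sigma) * L * L' * diag(sigma),
the log determinant falls out of the diagonals, and the quadratic form needs only a
triangular solve rather than a matrix inverse.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
K = 3

# Scales and a correlation Cholesky factor (random, for illustration).
sigma = np.array([0.5, 1.0, 2.0])
A = np.tril(rng.normal(size=(K, K)))
A[np.diag_indices(K)] = np.abs(A[np.diag_indices(K)])
L = A / np.linalg.norm(A, axis=1, keepdims=True)

# diag(sigma) * L is itself the Cholesky factor of Sigma.
LS = np.diag(sigma) @ L
Sigma = LS @ LS.T

# log det Sigma from the diagonals alone -- no dense determinant:
logdet_cheap = 2 * np.sum(np.log(sigma)) + 2 * np.sum(np.log(np.diag(L)))
_, logdet_full = np.linalg.slogdet(Sigma)

# The multivariate normal quadratic form via one triangular solve -- no inverse:
y = rng.normal(size=K)
mu = np.zeros(K)
z = solve_triangular(LS, y - mu, lower=True)
quad_cheap = z @ z
quad_full = (y - mu) @ np.linalg.solve(Sigma, y - mu)
print(logdet_cheap, logdet_full, quad_cheap, quad_full)
```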

Wouldn't it be even better if we used precision so we wouldn't have to invert? Is
there a way to relate that all back to the cpcs so we can use reasonable priors?

- Bob


Ben Goodrich

May 3, 2014, 5:04:47 PM
to stan...@googlegroups.com
On Saturday, May 3, 2014 3:02:36 PM UTC-4, Bob Carpenter wrote:
Sorry --- didn't see that later message, but that's exactly what
I'm seeing in terms of the behavior.

I'll add a test for the Jacobian.  The way I did it for the
correlation matrix Cholesky factor was to use auto-diff to
compute the Jacobian, then Eigen to do the determinant, which
I could compare to the transform.  I'll write such a test
for the correlation matrix and see what happens.

I have a fix to the logic in the Jacobian that I will push as soon as the unit tests finish running.

We have unit tests for the _rng functions that verify that the (margins of the) random variates are correct via a Pearson test. What we also need to verify, for all declarable parameter types that have a proper uniform distribution, is that they behave the same way as the corresponding _rng function.

Ben

Ben Goodrich

May 3, 2014, 5:21:18 PM
to stan...@googlegroups.com
On Saturday, May 3, 2014 3:15:02 PM UTC-4, Bob Carpenter wrote:
I think what Ben's been telling me all along has finally
sunk in.

Ideally, we want to parameterize a multivariate normal of
dimension K as follows [CPCs based parameterization]:

I'm not sure that parameterization is totally ideal, and it applies to the whole elliptical distribution family, but that is basically the concept: make the parameters in the textbook representation of the model into transformed parameters in the .stan representation of the model, and put computationally efficient priors on the primitives that imply the priors you want on the transformed parameters.
 
This would be equivalent to the longer and far less efficient form
we currently recommend declaring Omega to be a corr_matrix[K] and taking:

Yes

The intermediate form is what we'll get from the Cholesky factor for a correlation
matrix:

 parameters {
   cholesky_factor_corr[K] L;
   vector<lower=0>[K] sigma;
   vector[K] mu;
 }
 model {
   y ~ multi_normal_cholesky(mu, diag_matrix(sigma) * L);

   L ~ lkj_corr_cholesky(eta);
   sigma ~ ... whatevs ...
 }

This is perhaps more ideal than the CPC approach because having a cholesky_factor_corr type could be useful in other situations besides just the multi_normal.

Wouldn't it be even better if we used precision so we wouldn't have to invert?  Is
there a way to relate that all back to the cpcs so we can use reasonable priors?

Maybe. I know how to construct a precision matrix such that its inverse is a correlation matrix, but I haven't figured out the distribution yet. The Jacobian is harder than it is for Wishart vs. Inverse Wishart because the diagonal of a correlation matrix is fixed.

Ben
 

Bob Carpenter

May 4, 2014, 6:24:55 AM
to stan...@googlegroups.com
I hadn't thought of doing that, but it makes sense as an
additional test.

I think we should test the actual Jacobian calcs using autodiff
everywhere. It'll also work for the improper types (covar matrix,
Cholesky factor of covar matrix, scalars with a lower or upper
bound but not both, and the two ordered vector types).

- Bob

Ben Goodrich

May 4, 2014, 10:12:48 PM
to stan...@googlegroups.com
On Saturday, May 3, 2014 12:56:47 PM UTC-4, Ben Goodrich wrote: 
The second question is whether the right thing to do is write:

   cholesky_factor_corr[K] L;
   L ~ lkj_corr_cholesky(nu);

Will that produce the lkj_corr distribution on (L * L')?

That is probably the right behavior, but I need to rewrite the lkj_corr_cholesky_log function to achieve it. The density of L can be expressed as

|J| * det(L)^(2 * eta - 2)

and that J is triangular with diagonal elements that only involve powers of the diagonal of L.

I pushed that into the branch. I think the two questions left to answer are
  1. Do we want the parser to at least give a warning if someone declares a corr_matrix[K] in the parameters block with an explanation that they should declare a cholesky_factor_corr[K] ?
  2. Do we want to introduce the non-square cholesky_factor_corr[R,C] with R >= C type now? The main use I could see for that is the matrix of coefficients in models with continuous latent predictors. In general, you can multiply the coefficients by any square matrix and multiply the predictors by the inverse of that matrix without changing the likelihood. Some people use a cholesky_factor_cov[R,C] for the coefficients, but you can still multiply the coefficients by a diagonal matrix with positive diagonal elements and multiply the predictors by the inverse of that matrix without changing the likelihood. But if the rows were normalized to have unit length as they would be with a cholesky_factor_corr[R,C] type, then anything you multiply by except the identity matrix would violate the constraint.
Ben

Bob Carpenter

May 5, 2014, 1:39:03 PM
to stan...@googlegroups.com

On May 5, 2014, at 4:12 AM, Ben Goodrich <goodri...@gmail.com> wrote:

> On Saturday, May 3, 2014 12:56:47 PM UTC-4, Ben Goodrich wrote:
> The second question is whether the right thing to do is write:
>
> cholesky_factor_corr[K] L;
> L ~ lkj_corr_cholesky(nu);
>
> Will that produce the lkj_corr distribution on (L * L')?
>
> That is probably the right behavior, but I need to rewrite the lkj_corr_cholesky_log function to achieve it. The density of L can be expressed as
>
> |J| * det(L)^(2 * eta - 2)
>
> and that J is triangular with diagonal elements that only involve powers of the diagonal of L.
>
> I pushed that into the branch. I think the two questions left to answer are
> • Do we want the parser to at least give a warning if someone declares a corr_matrix[K] in the parameters block with an explanation that they should declare a cholesky_factor_corr[K] ?

I would say no to this. I don't think we want to give programming
advice in the form of warning messages. It's a thin line, though.

> • Do we want to introduce the non-square cholesky_factor_corr[R,C] with R >= C type now?

We could. I asked before and you didn't think we'd need it, which is why I didn't bother.
I'm OK with updating it --- the pattern's already there for covariance matrix Cholesky
factors.

> The main use I could see for that is the matrix of coefficients in models with continuous latent predictors. In general, you can multiply the coefficients by any square matrix and multiply the predictors by the inverse of that matrix without changing the likelihood. Some people use a cholesky_factor_cov[R,C] for the coefficients, but you can still multiply the coefficients by a diagonal matrix with positive diagonal elements and multiply the predictors by the inverse of that matrix without changing the likelihood. But if the rows were normalized to have unit length as they would be with a cholesky_factor_corr[R,C] type, then anything you multiply by except the identity matrix would violate the constraint.

Nifty.

- Bob

Ben Goodrich

May 5, 2014, 3:29:23 PM
to stan...@googlegroups.com
On Monday, May 5, 2014 1:39:03 PM UTC-4, Bob Carpenter wrote:
> I pushed that into the branch. I think the two questions left to answer are
>         • Do we want the parser to at least give a warning if someone declares a corr_matrix[K] in the parameters block with an explanation that they should declare a cholesky_factor_corr[K] ?

I would say no to this.  I don't think we want to give programming
advice in the form of warning messages.  It's a thin line, though.

The difficulty is that we gave bad programming advice previously via the manual / mailing list when declaring a corr_matrix[K] in parameters was the only way to do it. I think we are going to run into the viral BUGS example problem where there are bad Stan examples floating around on the internet, which people find and then use, which perpetuates the problem. Do we want to just throw an error? I don't mind breaking old .stan files as long as the error message clearly says what to do to fix it.
 

>         • Do we want to introduce the non-square cholesky_factor_corr[R,C] with R >= C type now?

We could.  I asked before and you didn't think we'd need it, which is why I didn't bother.
I'm OK with updating it --- the pattern's already there for covariance matrix cholesky
factors.

There are not that many use cases but I guess we might as well.

Ben

Bob Carpenter

May 6, 2014, 4:43:59 AM
to stan...@googlegroups.com

On May 5, 2014, at 9:29 PM, Ben Goodrich <goodri...@gmail.com> wrote:

> On Monday, May 5, 2014 1:39:03 PM UTC-4, Bob Carpenter wrote:
> > I pushed that into the branch. I think the two questions left to answer are
> > • Do we want the parser to at least give a warning if someone declares a corr_matrix[K] in the parameters block with an explanation that they should declare a cholesky_factor_corr[K] ?
>
> I would say no to this. I don't think we want to give programming
> advice in the form of warning messages. It's a thin line, though.
>
> The difficulty is that we gave bad programming advice previously via the manual / mailing list when declaring a corr_matrix[K] in parameters was the only way to do it. I think we are going to run into the viral BUGS example problem where there are bad Stan examples floating around on the internet, which people find and then use, which perpetuates the problem.

That's a good point. Luckily we're not quite as entrenched in
printed books as BUGS is.

My main priority will be fixing our examples. Over the next few weeks,
Andrew and I are going to revise the manual, and we'll clean this kind of
thing up.

> Do we want to just throw an error? I don't mind breaking old .stan files as long as the error message clearly says what to do to fix it.

I'm confused --- it's not an error.

>
>
> > • Do we want to introduce the non-square cholesky_factor_corr[R,C] with R >= C type now?
>
> We could. I asked before and you didn't think we'd need it, which is why I didn't bother.
> I'm OK with updating it --- the pattern's already there for covariance matrix cholesky
> factors.
>
> There are not that many use cases but I guess we might as well.

OK --- I can do that when I get back, assuming I can figure it out. If not,
you'll be hearing from me after we get back from holiday (Thursday) :-)

- Bob

Ben Goodrich

May 6, 2014, 8:50:24 AM
to stan...@googlegroups.com
On Tuesday, May 6, 2014 4:43:59 AM UTC-4, Bob Carpenter wrote:
> > I pushed that into the branch. I think the two questions left to answer are
> >         • Do we want the parser to at least give a warning if someone declares a corr_matrix[K] in the parameters block with an explanation that they should declare a cholesky_factor_corr[K] ?
>
> I would say no to this.  I don't think we want to give programming
> advice in the form of warning messages.  It's a thin line, though.
>
> The difficulty is that we gave bad programming advice previously via the manual / mailing list when declaring a corr_matrix[K] in parameters was the only way to do it. I think we are going to run into the viral BUGS example problem where there are bad Stan examples floating around on the internet, which people find and then use, which perpetuates the problem.

That's a good point.  Luckily we're not quite as entrenched in
printed books as BUGS is.

Yet. Once there are more published articles with Stan and / or a Stan book, it will be harder to change. And with Google upranking Google Groups posts and Stack Overflow posts, it is harder to get mc-stan.org or a GitHub repo to the front of a search.
 
> Do we want to just throw an error? I don't mind breaking old .stan files as long as the error message clearly says what to do to fix it.

I'm confused --- it's not an error.  

We could make it an error to declare a corr_matrix[K] in parameters. I think a warning would be preferable to an error but an error would be preferable to nothing.

Ben
