LDLT Factor for Covariance Matrix not Positive definite, when using lkj_corr_cholesky


sarthak dash

Nov 2, 2014, 8:07:05 PM11/2/14
Hi everyone,

I just decided to start using Stan after attending one of Bob's lectures (this is my second day of using Stan). I coded up a simple finite mixture model and decided to impose a prior on the covariance matrix in terms of a scale vector and a correlation matrix (following the ideas in the manual).

However, I get an error message that says:

"Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:

stan::prob::multi_normal_log(N4stan5agrad3varE): LDLT_Factor of covariance parameterunderlying matrix is not positive definite. LDLT_Factor of covariance parameterlast conditional variance is 0:0."

I need some help fixing this issue. Please find attached my Stan code and the dataset.

PS: I am using CmdStan v2.5

Thanks,

Sarthak

 


BayesianFMM.data
BayesianFMM.stan

Ben Goodrich

Nov 2, 2014, 9:18:03 PM11/2/14
On Sunday, November 2, 2014 8:07:05 PM UTC-5, sarthak dash wrote:
However, I get an error message that says,

"Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:

stan::prob::multi_normal_log(N4stan5agrad3varE): LDLT_Factor of covariance parameterunderlying matrix is not positive definite. LDLT_Factor of covariance parameterlast conditional variance is 0:0."

One informational message is not that big a deal and isn't a fatal error. However, your code can be made more efficient by following these two rules:

1. Don't use any variant of multi_normal() when the left-hand side is NOT data; use the stochastic representation of the multivariate normal distribution instead.
2. Don't use multi_normal() when you already have a Cholesky factor and thus can use multi_normal_cholesky().

So your code could be:
parameters {
  simplex[K] theta;
  cholesky_factor_corr[nDims] L;
  vector[nDims] z_mu[K];
  vector[nDims] z_beta;
  vector[nDims] sigma_diag;
}
transformed parameters {
  vector[nDims] mu[K];
  vector[nDims] beta;
  matrix[nDims, nDims] sigma_L;
  sigma_L <- diag_pre_multiply(sigma_diag, L);
  beta <- Avg_mean + sigma_L * z_beta;
  for (k in 1:K) mu[k] <- beta + sigma_L * z_mu[k];
}
model {
  real ps[K];

  L ~ lkj_corr_cholesky(2);

  sigma_diag[1] ~ normal(201, 5);
  sigma_diag[2] ~ normal(-31, 3);
  sigma_diag[3] ~ normal(15, 4);
  sigma_diag[4] ~ normal(14800, 8100);
  sigma_diag[5] ~ normal(60, 35);

  z_beta ~ normal(0, 1); # implies beta ~ multi_normal_cholesky(Avg_mean, sigma_L)
  for (k in 1:K) {
    z_mu[k] ~ normal(0, 1); # implies mu[k] ~ multi_normal_cholesky(beta, sigma_L)
  }

  for (n in 1:N) {
    for (k in 1:K) {
      ps[k] <- log(theta[k]) + multi_normal_cholesky_log(y[n], mu[k], sigma_L);
    }
    increment_log_prob(log_sum_exp(ps));
  }
}
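As a side note on the model block: the increment_log_prob(log_sum_exp(ps)) line marginalizes the discrete mixture indicator, and the log_sum_exp is what keeps that sum from underflowing. A minimal NumPy sketch of the same computation, with made-up mixing weights and per-component log densities (not from the attached dataset):

```python
import numpy as np

def log_sum_exp(v):
    # Numerically stable log(sum(exp(v))): shift by the max before exponentiating.
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

# One observation's mixture log-likelihood: log sum_k exp(log(theta_k) + log p_k(y))
theta = np.array([0.3, 0.7])               # hypothetical mixing weights
log_comp = np.array([-1000.0, -1001.0])    # hypothetical per-component log densities
ps = np.log(theta) + log_comp
stable = log_sum_exp(ps)                   # finite, around -1000.58
naive = np.log(np.sum(np.exp(ps)))         # exp(-1001) underflows to 0, so this is -inf
```

The naive version fails because exp() of any argument below about -745 is exactly 0 in double precision, which is why Stan provides log_sum_exp as a built-in.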

I'm not quite sure why you had sigma_nu and lambda as the same thing in your code, but I replaced them with their Cholesky factor sigma_L.
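As a sanity check of rule 1 outside Stan: if z ~ N(0, I) and L is the Cholesky factor of Sigma, then mu + L * z has exactly the MVN(mu, Sigma) distribution, which is all the z_beta/z_mu trick relies on. A minimal NumPy sketch with a made-up 2x2 covariance:

```python
import numpy as np

# Stochastic representation: if z ~ N(0, I), then mu + L @ z ~ MVN(mu, L @ L.T).
rng = np.random.default_rng(0)

Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])          # hypothetical covariance matrix
mu = np.array([1.0, -2.0])
L = np.linalg.cholesky(Sigma)           # lower triangular, L @ L.T == Sigma

z = rng.standard_normal((2, 200_000))   # z ~ N(0, I), one column per draw
y = mu[:, None] + L @ z                 # each column ~ MVN(mu, Sigma)

emp_mean = y.mean(axis=1)               # should be close to mu
emp_cov = np.cov(y)                     # should be close to Sigma
```

With 200,000 draws the empirical mean and covariance match mu and Sigma to within sampling error, which is the property the non-centered parameterization exploits.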

Ben

Ben Goodrich

Nov 2, 2014, 9:47:58 PM11/2/14
to stan-...@googlegroups.com
On Sunday, November 2, 2014 9:18:03 PM UTC-5, Ben Goodrich wrote:
However, your code can be made more efficient by following these two rules:

1. Don't use any variant of multi_normal() when the left-hand side is NOT data; use the stochastic representation of the multivariate normal distribution instead.
2. Don't use multi_normal() when you already have a Cholesky factor and thus can use multi_normal_cholesky().

I accidentally omitted the NOT in my previous post; edited now.

Ben

sarthak dash

Nov 3, 2014, 10:37:14 AM11/3/14
to stan-...@googlegroups.com
Hi Ben,

Thanks a lot for helping me with those rules. I understand them now, and they make sense in terms of numerical stability. However, when I execute the code (for sampling), it says something like:

"Gradient evaluation took 0.112778 seconds. 1000 transitions using 10 leapfrog steps per transition would take 1127.78 seconds. Adjust your expectations accordingly!"

Then the execution goes up to 300/2000 iterations and stops (I gave it about 30-45 minutes). I was under the impression that the entire run would take around 1127.78 * 2 seconds (close to 40 minutes), but it seems I am incorrect in this case.

Could you please help me in "adjusting my expectations"?

Thanks again,
Sarthak

Ben Goodrich

Nov 3, 2014, 10:43:42 AM11/3/14
to stan-...@googlegroups.com
On Monday, November 3, 2014 10:37:14 AM UTC-5, sarthak dash wrote:
Hi Ben,

Thanks a lot for helping me with those rules. I understand them now, and they make sense in terms of numerical stability. However, when I execute the code (for sampling), it says something like:

"Gradient evaluation took 0.112778 seconds. 1000 transitions using 10 leapfrog steps per transition would take 1127.78 seconds. Adjust your expectations accordingly!"

Then the execution goes up to 300/2000 iterations and stops (I gave it about 30-45 minutes). I was under the impression that the entire run would take around 1127.78 * 2 seconds (close to 40 minutes), but it seems I am incorrect in this case.

Could you please help me in "adjusting my expectations"?

That time estimate is really rough under the best of circumstances. My guess is that around 300 iterations it hit a place in the posterior where it (thinks it) needs much more than 10 leapfrog steps per transition. Once it eventually finishes, we might be able to figure out what the problem is.
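For reference, the number in that message is just arithmetic on the one timed gradient evaluation, assuming a fixed 10 leapfrog steps per transition; since NUTS adapts the number of steps per transition, the true cost can be far higher or lower:

```python
# Stan's rough estimate: gradient time x assumed 10 leapfrog steps x 1000 transitions.
grad_time = 0.112778                  # seconds per gradient evaluation (from the message)
per_1000 = grad_time * 10 * 1000      # 1127.78 seconds for 1000 transitions
full_run = per_1000 * 2               # 2000 iterations: 2255.56 s, about 37.6 minutes
```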

Ben

sarthak dash

Nov 3, 2014, 10:47:33 AM11/3/14
to stan-...@googlegroups.com
OK, I will restart the code and allow it to run for a couple of hours.
Hopefully that will be enough for the sampler to finish.

Regards,
Sarthak

sarthak dash

Nov 3, 2014, 11:22:34 AM11/3/14
to stan-...@googlegroups.com
Well, this is strange. I recompiled the code and started it again, and the execution was over in about 10 minutes.

"#  Elapsed Time: 251.072 seconds (Warm-up), 245.781 seconds (Sampling), 496.853 seconds (Total)"

whereas last time it got stuck at 300/2000 iterations for about 30 minutes (before I manually terminated it). Any suggestions?

Regards,

Sarthak

Ben Goodrich

Nov 3, 2014, 2:13:30 PM11/3/14
to stan-...@googlegroups.com
On Monday, November 3, 2014 11:22:34 AM UTC-5, sarthak dash wrote:
Well, this is strange. I recompiled the code and started it again, and the execution was over in about 10 minutes.

"#  Elapsed Time: 251.072 seconds (Warm-up), 245.781 seconds (Sampling), 496.853 seconds (Total)"

whereas last time it got stuck at 300/2000 iterations for about 30 minutes (before I manually terminated it). Any suggestions?

The wall time is a random variable. That is part of the reason why the time estimate is very rough.

Ben

Bob Carpenter

Nov 3, 2014, 2:29:20 PM11/3/14
to stan-...@googlegroups.com
More constructively, better-behaved posteriors (tighter priors,
better parameterizations) or better inits (less far out in the tail)
can help reduce the time variation during warmup.

- Bob
