On 9/3/13 10:22 AM, Sergio Polini wrote:
> I'm writing the code for Ch17, but I'm in trouble...
Uh oh. Let's see if we can help.
> I'm trying to rewrite wishart1.bug (data and models attached.)
> My first attempt:
>
> data {
> int<lower=0> N;
> int<lower=0> J;
> real y[N];
> real x[N];
> int county[N];
> }
> transformed data {
> cov_matrix[2] Scale;
> int df;
> for (i in 1:2)
> for (j in 1:2)
> if (i == j) Scale[i,j] <- 1.0; else Scale[i,j] <- 0.0;
> df <- 3;
> }
You can replace the loops with this:
Scale <- diag_matrix(rep_vector(1.0,2));
I like the idea of naming variables like df in the transformed
data block.
This will be faster given that it's in a loop if you first
pull out the Cholesky factor of the scale and then use
multi_normal_cholesky:
Matrix[2,2] L;
L <- cholesky_decompose(Sigma2_B_raw);
...
multi_normal_cholesky(mu_B_raw, L);
> for (i in 1:N)
> y[i] ~ normal(a[county[i]] + b[county[i]] * x[i], sigma_y);
This will be faster if you put the predictors into a vector
and then call the vectorized version of normal. Loops are fast
in Stan, but calls to probability distributions are not.
> generated quantities {
> real<lower=-1,upper=1> rho;
> rho <- Sigma2_B_raw[1,2] / sqrt(Sigma2_B_raw[1,1] * Sigma2_B_raw[2,2]);
> }
> It works, but even after 20,000 iterations:
> a) effective sample size is very low: about 50 for x_a and B_raw[*,1], 7 for x_b, 3 or 4 for B_raw[*,2], a few hundreds
> for most of the others, 8 for sigma_b;
It's a tough model. This is the kind of model where we might
want to standardize (a multi-dimensioanl Matt trick).
> b) Rhat is greater than 1.7 for B_raw[*,2], 1.2 for sigma_b.
How are you initializing? We want to run longer than this.
What is rho on the output? If it's very high (close to 1), it's going to be
problematic for sampling.
> I must say that I don't understand well how statements in the transformed parameters block work.
Have you read the manual language sections on transformed parameters?
In a nutshell, all of the data, transformed data, and parameters are
available when you compute transformed parameters, which will then be
available in the model.
> For instance, may I define a[j] as xi_a * B_raw[j,1] when B_raw is not yet defined, because it is modeled in the model
> block?
B_raw is declared in the parameters block, so it will be
defined at the point the transformed parameter block is
executed. The sampling statmeents do NOT define the variables
on their left side, they just increment the total log probability
based on variables that already have to be defined.
> Yes, I think, but I've tried another way: no transformed parameters block, "a" and "b" replaced with "xi_a * B_raw" in
> the model block, the derived quantities in the generated quantities block:
That'll work, too. If you don't want the output, move
the transformed parameters into local variables in the model,
or just get rid of them altogether.
> ...
>
> model {
> Sigma2_B_raw ~ inv_wishart(df, Scale);
> for (j in 1:J)
> B_raw[j] ~ multi_normal(mu_B_raw, Sigma2_B_raw);
> for (i in 1:N)
> y[i] ~ normal(xi_a * B_raw[county[i],1] + xi_b * B_raw[county[i],2] * x[i], sigma_y);
> }
> generated quantities {
> vector[J] a;
> vector[J] b;
> real mu_a;
> real mu_b;
> real sigma_a;
> real sigma_b;
> real<lower=-1,upper=1> rho;
> for (j in 1:J) {
> a[j] <- xi_a * B_raw[j,1];
> b[j] <- xi_b * B_raw[j,2];
> }
> mu_a <- xi_a * mu_B_raw[1];
> mu_b <- xi_b * mu_B_raw[2];
> sigma_a <- xi_a * sqrt(Sigma2_B_raw[1,1]);
> sigma_b <- xi_b * sqrt(Sigma2_B_raw[2,2]);
> rho <- Sigma2_B_raw[1,2] / sqrt(Sigma2_B_raw[1,1] * Sigma2_B_raw[2,2]);
> }
>
> Result (after 20,000 iterations): lower ESS, higher Rhat!
Probably just be randomization if it's the same model.
I'd try init=0, or better yet, try sensible initial values if
you can compute them before fitting the model.
You don't need to go this far here. Just the computation
of the Cholesky factor up front as I suggest above will save a factorization
for each n in 1:N in the calls to multi_normal, as well as a lot of
error checking.
> Moreover, I can't understand how to recover the covariance matrix, the old Sigma2_B_raw.
I didn't follow the details here --- usually you just
multiply L * L' if L is the Cholesky factor for the matrix.
But I'm guessing that's not the answer to your question.
- Bob