question re adaptation of stan model p 77 of reference manual


John Sutton

Oct 23, 2016, 11:15:52 PM
to Stan users mailing list
I have borrowed the Stan multilevel model with variable slopes and intercepts and group-level predictors for my own work. It runs great. I want to introduce some flexibility by specifying group-level predictors for only some of the variable coefficients--say, the intercept. I'm stumped, and would appreciate any pointers, tips, winks-and-nods, and/or general directions to appropriate sample code that may have surfaced on the user list.

I'll append the code I'm using below, just to avoid ambiguity. Changes to the example are few, the most conspicuous being the variable error scale (sigma_y) to allow for heteroscedasticity. Thanks in advance for attention to my question. 

Best,
John

data {
  int<lower=0> N;                 // num country-years
  int<lower=1> K;                 // num ind predictors
  int<lower=1> J;                 // num countries
  int<lower=1> L;                 // num country predictors
  int<lower=1,upper=J> cid[N];    // country id
  matrix[N,K] x;                  // individual predictors
  matrix[J,L] u;                  // country predictors
  vector[N] y;                    // outcomes
}

parameters {
  matrix[K,J] z;
  cholesky_factor_corr[K] L_Omega;           // prior correlation
  vector<lower=0, upper=pi()/2>[K] tau_unif; 
  matrix[L,K] gamma;                // country-level coeffs
  vector<lower=0>[J] sigma_y;       // heteroscedastic error scale
}

transformed parameters {
  matrix[J,K] beta;
  vector<lower=0>[K] tau;         // prior scale
  for (k in 1:K)
    tau[k] = 2.5 * tan(tau_unif[k]);
  beta = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';
}

model {
  to_vector(z) ~ normal(0,1);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(gamma) ~ normal(0, 5);
  y ~ normal(rows_dot_product(beta[cid], x), sigma_y[cid]);
}

generated quantities {
  vector[N] y_hat;
  corr_matrix[K] Omega;
  Omega = multiply_lower_tri_self_transpose(L_Omega);
  y_hat = rows_dot_product(beta[cid], x);
}
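(For context: the tau_unif construction above is the manual's reparameterization of a half-Cauchy prior. A Uniform(0, pi/2) variable pushed through 2.5 * tan(.) has a half-Cauchy(0, 2.5) distribution, so the transformed parameters block is equivalent, as a sketch and up to sampling efficiency, to declaring tau directly:)

```stan
// equivalent direct declaration (a sketch; the tan() version exists
// to give the sampler a bounded, well-behaved scale to work on):
parameters {
  vector<lower=0>[K] tau;   // prior scale
}
model {
  tau ~ cauchy(0, 2.5);     // half-Cauchy via the <lower=0> constraint
}
```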


PS, and by the way: when I borrow the code from the last line of the optimized model 
y ~ normal(beta[cid] .* x, sigma_y[cid]);

I get an error:

No matches for: 
  vector ~ normal(matrix, vector)

 But it's not a problem for me.
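(The mismatch in the PS is a shape problem: beta[cid] and x are both N x K, so their elementwise product is an N x K matrix, and normal() has no signature taking a matrix location. A sketch of the working alternative, which the model block above already uses:)

```stan
// beta[cid] .* x          -> N x K matrix: no normal(matrix, vector) match
// rows_dot_product(., .)  -> vector[N]: row-wise sums of the products
y ~ normal(rows_dot_product(beta[cid], x), sigma_y[cid]);
```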

Bob Carpenter

Oct 24, 2016, 2:25:04 PM
to stan-...@googlegroups.com
Can you send the whole program that has the syntax error?

I'm not sure what problem you're having, because you should
be able to code up group-level coefficients for any of the groups
you like. You just write out the regression you want and the priors
for the coefficients you want.

- Bob

John Sutton

Oct 27, 2016, 8:15:20 PM
to Stan users mailing list
Sorry if the answer to my question should have been obvious. The problem I had was how to estimate gammas (group-level effects coefficients) for the intercept only, while estimating simple means for all the other effects coefficients. This is simple enough when one isn't using multivariate priors, but gets more complicated (to me) when you go all Cholesky. Assignments like this:

beta = u * gamma + (diag_pre_multiply(tau, L_Omega) * z)';

aren't the thing when you're only interested in the effect of u * gamma on the intercept.

I continued to bang away at it and came up with the attached Stan script (reproduced below), which is in part an elaboration of a map2stan model in Richard McElreath's book. It's inelegant, but it runs fast and converges well, and the estimates for the effects coefficients (g) are spot on. However, the estimates for G0 (intercept and group effect) are, I'm quite sure, wrong, though not entirely implausible. If you can look at it and let me know whether you see anything obviously wrong, it would be a big help.

Thanks and best,
John

data {
  int<lower=0> N;                 // num country-years
  int<lower=1> J;                 // num countries
  int<lower=1> K;                 // num predictors
  int<lower=1> L;                 // num country predictors 
  real y[N];                      // outcome
  matrix[N,K] X;                  // predictors (not incl intercept)
  int<lower=1, upper=J> cid[N];   // country
  matrix[J,L] U;                  // country predictors (incl intercept)
}

parameters {
  matrix[K+1,J] z;
  cholesky_factor_corr[K+1] L_Rho;
  real g0;
  vector[K] g;                  
  vector<lower=0>[J] sigma_y;
  real<lower=0> sigma_b0;
  vector<lower=0>[K] sigma_b;
  vector[L] G0;
}

transformed parameters{
  matrix[J, K+1] b_temp;
  vector[K+1] sigma_b_temp;       // combined scales: [sigma_b0, sigma_b]
  vector[J] b0;                   // country intercepts
  matrix[J,K] b;                  // country parameters
  matrix[K+1,K+1] Rho;
  sigma_b_temp[1] = sigma_b0;
  sigma_b_temp[2:K+1] = sigma_b;
  b_temp = (diag_pre_multiply(sigma_b_temp, L_Rho) * z)';
  b0 = col(b_temp, 1);
  for (k in 1:K)
    b[,k] = b_temp[,k+1];
  Rho = L_Rho * L_Rho';
}

model {
  vector[N] mu;
  vector[N] B0;
  matrix[N,K] B;
  L_Rho ~ lkj_corr_cholesky(2);
  sigma_y ~ cauchy(0, 2);
  sigma_b ~ cauchy(0, 2);
  g0 ~ normal(U * G0, 10);   // note: scalar g0 against vector location U * G0
  g ~ normal(0, 5);
  G0 ~ normal(0, 10);
  to_vector(z) ~ normal(0, 1);
  for (n in 1:N){
    B0[n] = g0 + b0[cid[n]];
  }
  for (n in 1:N){
    for (k in 1:K){
      B[n,k] = g[k] + b[cid[n],k];
    }
  }
  for (n in 1:N){
    mu[n] = B0[n] + B[n] * X[n]';
  }
  y ~ normal(mu, sigma_y[cid]);
}
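(As a side note, the three loops in the model block can be collapsed into one vectorized statement with the same likelihood; a sketch, where rep_matrix(g', N) broadcasts the shared means across observations:)

```stan
{
  vector[N] mu;
  // g0 + country intercept, plus row-wise dot products of
  // (shared means + country offsets) with the predictors
  mu = g0 + b0[cid] + rows_dot_product(rep_matrix(g', N) + b[cid], X);
  y ~ normal(mu, sigma_y[cid]);
}
```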


pris_vi.stan

Bob Carpenter

Oct 27, 2016, 8:33:34 PM
to stan-...@googlegroups.com

> On Oct 27, 2016, at 8:15 PM, John Sutton <jack...@gmail.com> wrote:
>
> Sorry if the answer to my question should have been obvious.

The problem was that I didn't understand the question, not
that I thought the answer was obvious.

> The problem I had was how how to estimate gammas (group-level effects coefficients) only for the intercept and estimate means for all the effects coefficients.

This is the part I don't understand. In Stan, you get a posterior
mean estimate for all of the parameters declared in the parameters
block of a model. So I'm not sure what you're trying to control.

Hopefully someone else will be able to help. It might help
to start a new thread with a more specific question related
to regression in the subject.

- Bob

John Sutton

Oct 30, 2016, 5:34:22 PM
to Stan users mailing list
Sorry. The problem is simple. The model that generalizes the radon analysis in the manual is essentially:

y[i] = a[j[i]] + sum_k b[j[i],k] * X[i,k]
a[j] ~ N(g[a0] + g[a1] * u[j], sigma^2[a])
b[j,k] ~ N(g[bk0] + g[bk1] * u[j], sigma^2[bk])

Now change the last line to:

b[j,k] ~ N(g[bk], sigma^2[bk])

In other words, drop the group-level predictor for the betas. Turns out the answer was pretty obvious, but I had to bang my head on it for a while:
 
data {
  int<lower=0> N;                 // num country-years
  int<lower=1> K;                 // num ind predictors
  int<lower=1> J;                 // num countries
  int<lower=1> L;                 // num country predictors
  int<lower=1,upper=J> cid[N];    // country id
  matrix[N,K] X;                  // individual predictors
  matrix[J,L] R;                  // country predictors
  vector[N] y;                    // outcomes
}

parameters {
  matrix[K,J] z;
  cholesky_factor_corr[K] L_Omega;           // prior correlation
  vector<lower=0, upper=pi()/2>[K] tau_unif; 
  vector[L] g0;                // country-level coeffs
  vector[K-1] g;                  
  vector<lower=0>[J] sigma_y;       // heteroscedastic error scale
}

transformed parameters {
  matrix[J,K] beta;
  matrix[J,K] beta_temp;
  vector<lower=0>[K] tau;         // prior scale
  for (k in 1:K)
    tau[k] = 2.5 * tan(tau_unif[k]);
  beta_temp = (diag_pre_multiply(tau, L_Omega) * z)';
  beta[,1] = R * g0 + beta_temp[,1];
  for (k in 2:K)
    beta[,k] = g[k-1] + beta_temp[,k];
}

model {
  to_vector(z) ~ normal(0,1);
  L_Omega ~ lkj_corr_cholesky(2);
  g0 ~ normal(0,5);
  g ~ normal(0,5); 
  y ~ normal(rows_dot_product(beta[cid], X), sigma_y[cid]);
}

generated quantities {
  vector[N] y_hat;
  corr_matrix[K] Omega;
  Omega = multiply_lower_tri_self_transpose(L_Omega);
  y_hat = rows_dot_product(beta[cid], X);
}
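(To make the generalization concrete: any subset of columns can be given group predictors the same way. A sketch; g3 here is a hypothetical vector[L] parameter, not part of the model above:)

```stan
beta[,1] = R * g0 + beta_temp[,1];   // intercept: group-level predictors
beta[,3] = R * g3 + beta_temp[,3];   // hypothetical: slope 3 as well
// remaining columns keep scalar means g[k] as above
```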


This script works well, and it's generalizable--i.e., any subset of betas can be given group predictors. Still, I'd welcome any suggestions for improvement. Apologies again for my lack of clarity. I think you'd be spared many such questions if you had updated examples of the radon models in ch. 17 of G&H. The ones currently available are outdated and confusing.

 Best,
John