QR factorization in multilevel models


Raghu Naik

Jan 18, 2017, 3:11:09 PM
to stan-...@googlegroups.com
Dear Stan group,

I am trying to figure out how to recover the original beta (J x K) coefficients when the observation-level predictor matrix x (N x K) has been subjected to a QR decomposition in a multilevel model (where N is the number of observations, K is the number of observation-level predictors, and J is the number of groups).

I have gone through the QR reparameterization section (pg 121) in the latest (2.14) Stan manual, but I am not sure how to implement the beta = R_inv * theta step in the generated quantities block of a multilevel model.
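For reference, in the single-level regression of that manual section, this step is (as I understand it) just a matrix-vector solve in generated quantities:

```stan
generated quantities {
  vector[K] beta;
  // theta are the coefficients on Q_ast; map them back to the x scale
  beta = R_ast_inverse * theta;
}
```

My question is how this generalizes when beta is a J x K matrix of group-specific coefficients.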

I am particularly interested in this issue in the context of a multilevel Poisson model that I ran, as below, with
- one first-level predictor and its quadratic effect (squared term), and
- three standardized second-level (group-level) predictors.

Here I had performed the QR decomposition on the first-level predictor and its squared term in R through the poly() function, as below:
x0 <- poly(z, 2)   ## N x 2; z is the individual-level predictor
x <- cbind(1, x0)  ## N x 3; add a column of ones

I had used this x matrix as the observation matrix in the Stan program. The model ran fine. Now I am not sure how to recover the original beta coefficients from the estimated coefficients, which are defined with respect to Q (the orthogonalized observation matrix).

Any feedback would be much appreciated.

Regards,

Raghu

-----------------------------------------------------------------------------stan code----------------------------------------------------

data {
    int<lower=0> N; // 7000 transactions
    int<lower=1> K; // 3 (including intercept)
    int<lower=1> J; // 1000
    int<lower=1> L; // 4 (including intercept)
    int<lower=1,upper=J> jj[N]; // group for individual
    matrix[N,K] x; // 7000 x 3
    matrix[J,L] u; // 1000 x 4
    int<lower=0> y[N];  //  7000 x 1 counts = 0,1,2,...
}

parameters {
    matrix[K,J] z;
    cholesky_factor_corr[K] L_Omega;
    vector<lower=0,upper=pi()/2>[K] tau_unif;
    matrix[L,K] gamma; // group coeffs
}

transformed parameters {
    matrix[J,K] beta;
    vector<lower=0>[K] tau; // prior scale
    for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
    beta = u * gamma + (diag_pre_multiply(tau,L_Omega) * z)';
}

model {
    vector[N] log_lambda;

    // priors
    to_vector(z) ~ normal(0,1);
//    tau ~ cauchy(0,2.5);
    L_Omega ~ lkj_corr_cholesky(2);
    to_vector(gamma) ~ normal(0,1);


    for (n in 1:N)
        log_lambda[n] = x[n] * beta[jj[n]]';


    y ~ poisson_log(log_lambda);

}



Michael Betancourt

Jan 18, 2017, 6:01:31 PM
to Stan users mailing list
You want to do something like

vector[N_covar] beta;
beta = mdivide_right_tri_low(beta_tilde', R')';

Note all of the transposes to make sure that you use the dense part of the R matrix.
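In the multilevel case, where the estimated coefficients form a J x K matrix on the Q scale, the same triangular solve can be applied to all groups at once. A minimal generated-quantities sketch, assuming `beta` is the J x K Q-scale coefficient matrix and `R_ast` is the (scaled) R factor computed in transformed data:

```stan
generated quantities {
  matrix[J, K] beta_orig;
  // Each row satisfies beta[j] = beta_orig[j] * R_ast', so solve against
  // the lower-triangular R_ast' on the right for every group at once.
  beta_orig = mdivide_right_tri_low(beta, R_ast');
}
```

This is just the row-wise version of beta_orig[j] = R_ast_inverse * beta[j]'.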


> On Jan 18, 2017, at 3:11 PM, Raghu Naik <naik....@gmail.com> wrote:
>
> Dear STAN group,
>
> I am trying to figure out how to recover the original beta (J x K) coefficients when the observation predictor matrix x (N x K) matrix has been subjected to a QR decomposition in a multilevel model ( where N is the number of observations, K
> is the number of observation level predictors, and J is the number of groups).
>
> I have gone through the QR reparametrization section (pg 121) in the latest (2.14) stan manual, although I am not sure how to implement beta = R_inv * theta step in the generated quantities block, in multilevel models.
>
> I am particularly interested in this issue in context of a multilevel poisson model that I ran as below with
> - one first level predictor and its quadratic effect (squared term) and
> - three second level group level standardized predictors.
>
> Here I had executed the qr decomp on the first level predictor and its squared term in R as below:
> x0 = [x x^2] ## N x 2
> x1 = qr(x0)
> x2 = qr.Q(x1) ## N x 2
> x = [ones x2] ## N x 3

Raghu Naik

Jan 25, 2017, 10:57:40 AM
to stan-...@googlegroups.com
Thanks Michael.

I have simulated a hierarchical dataset with observation-level quadratic effects to firm up my understanding of how the QR decomposition should be used in these models. The simulation code is as below:

library(MASS) # for mvrnorm

N <- 1000
N_groups <- 100
group <- rep(1:N_groups,each=N/N_groups)
set.seed(5555)
B <- mvrnorm(N_groups,c(-1,0.5,0.25),matrix(c(1,-0.5,-0.5,-0.5,1,0.25,-0.5,0.25,1),3,3))
x1 <- runif(N, 1,5) # rnorm(N)
x2 <- I(x1^2)
cor(x1,x2)
y <- rnorm(N, B[group,1] + B[group,2]*x1 + B[group,3]*x2 ,2) 

## get the data together

dat1 <- list(
    N = N,
    J = N_groups,
    K = 3,
    y = y,
    x = cbind(x1,x2),
    # x = cbind(scale(x1, scale=FALSE),scale(x2, scale = FALSE)),
    group = group
)

I used both a standard approach and a QR-decomposition approach based on section 8.2 (QR Reparameterization) of the manual to estimate the model. I am focusing on recovering the group-level mean and covariance parameters.

The results from the standard non-centered approach:

Inference for Stan model: mixed_simulated2a.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
a            -0.56    0.01 0.48 -1.48 -0.89 -0.57 -0.24  0.40  3546 1.00
b             0.30    0.01 0.36 -0.40  0.06  0.31  0.55  1.01  3494 1.00
c             0.27    0.00 0.11  0.06  0.20  0.27  0.35  0.50  2039 1.00
cov_mat[1,1]  0.44    0.02 0.57  0.00  0.08  0.27  0.62  1.69  1167 1.00
cov_mat[1,2] -0.11    0.01 0.35 -0.93 -0.12 -0.02  0.02  0.11   903 1.00
cov_mat[1,3] -0.10    0.02 0.19 -0.51 -0.21 -0.08  0.01  0.24    79 1.04
cov_mat[2,1] -0.11    0.01 0.35 -0.93 -0.12 -0.02  0.02  0.11   903 1.00
cov_mat[2,2]  0.35    0.02 0.36  0.00  0.11  0.28  0.47  1.19   372 1.00
cov_mat[2,3] -0.10    0.02 0.16 -0.44 -0.20 -0.09  0.00  0.17   107 1.04
cov_mat[3,1] -0.10    0.02 0.19 -0.51 -0.21 -0.08  0.01  0.24    79 1.04
cov_mat[3,2] -0.10    0.02 0.16 -0.44 -0.20 -0.09  0.00  0.17   107 1.04
cov_mat[3,3]  0.89    0.00 0.14  0.65  0.79  0.88  0.97  1.18  1047 1.01

The results for the non-centered QR approach:

Inference for Stan model: mixed_simulated2b.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

               mean se_mean    sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
beta[1]        0.31    0.01  0.38  -0.45   0.05   0.32   0.57   1.04  4000 1.00
beta[2]        0.28    0.00  0.11   0.06   0.20   0.28   0.35   0.50  1397 1.00
a             -0.56    0.01  0.50  -1.53  -0.91  -0.56  -0.23   0.44  4000 1.00
b              4.32    0.04  1.17   2.02   3.54   4.31   5.09   6.62  1118 1.00
c              0.83    0.01  0.33   0.18   0.61   0.82   1.05   1.48  1397 1.00
cov_mat[1,1]   2.22    0.11  2.72   0.00   0.24   1.18   3.18   9.67   565 1.00
cov_mat[1,2]  -3.05    0.54  5.19 -15.65  -5.58  -1.52   0.33   3.90    92 1.04
cov_mat[1,3]   0.60    0.09  1.23  -1.24  -0.09   0.29   1.10   3.81   185 1.03
cov_mat[2,1]  -3.05    0.54  5.19 -15.65  -5.58  -1.52   0.33   3.90    92 1.04
cov_mat[2,2] 115.87    0.90 17.74  85.42 103.19 114.24 127.03 154.68   392 1.01
cov_mat[2,3]  27.87    0.11  4.26  20.53  24.91  27.58  30.54  36.98  1439 1.00
cov_mat[3,1]   0.60    0.09  1.23  -1.24  -0.09   0.29   1.10   3.81   185 1.03
cov_mat[3,2]  27.87    0.11  4.26  20.53  24.91  27.58  30.54  36.98  1439 1.00
cov_mat[3,3]   7.90    0.06  1.37   5.60   6.96   7.75   8.71  10.91   571 1.01

The covariance matrix estimates go haywire here. (Update: I realized this may be because the transformed observation-level variables require correspondingly transformed beta coefficients.)

Also, the QR estimates do not seem much better than the standard estimates for the linear and quadratic effects.

I am wondering whether the way I am implementing the QR decomposition for mixed models is correct.

Would very much appreciate any feedback. I have attached both versions of the code below.

Thanks. 

Raghu

_____________________________________________the codes_______________________________________

standard version
// non-centered version

data{
    int N;
    int J;
    int K; // = 3, number of observation level predictors including intercept
    matrix[N,K-1] x; // taking the x predictors
    real y[N];
    int group[N];
}
parameters{
    real a;
    real b;
    real c;
    real<lower=0> sigma;
    vector<lower=0>[K] tau;
    cholesky_factor_corr[K] L_Omega;
    vector[K] z[J];
}
transformed parameters {
    vector[K] v[J];
    matrix[K,K] Rho;

    Rho = L_Omega * L_Omega';

    for (j in 1:J)
        v[j] =  diag_pre_multiply(tau,L_Omega) * z[j];
}
model{
    vector[N] mu;
    for (j in 1:J)
        z[j] ~ normal(0,1);
    a ~ normal(0,10);
    b ~ normal(0,10);
    L_Omega ~ lkj_corr_cholesky(2);
    sigma ~ cauchy(0,2);
    tau ~ cauchy(0,2);

// likelihood
    for ( i in 1:N )
        mu[i] = a + v[group[i],1] + (b + v[group[i],2])*x[i,1] + (c + v[group[i], 3])*x[i,2] ;
    y ~ normal(mu,sigma);
}
generated quantities {
  matrix[K,K] cov_mat;

  cov_mat =  diag_pre_multiply(tau,L_Omega) * diag_pre_multiply(tau,L_Omega)';
}



non-centered version with QR decomp
// non-centered version with QR decomposition

data{
    int N;
    int J;
    int K; // = 3, number of observation level predictors including intercept
    matrix[N,K-1] x; // taking the x predictors
    real y[N];
    int group[N]; // same as jj notation
}
transformed data {

  matrix[N, K-1] Q_ast;
  matrix[K-1, K-1] R_ast;
  matrix[K-1, K-1] R_ast_inverse;

  Q_ast = qr_Q(x)[, 1:K-1] * sqrt(N - 1);

  R_ast = qr_R(x)[1:K-1, ] / sqrt(N - 1);

  R_ast_inverse = inverse(R_ast);

}
parameters{
    real a;
    real b;
    real c;
    real<lower=0> sigma;
    vector<lower=0>[K] tau;
    cholesky_factor_corr[K] L_Omega;
    vector[K] z[J];
}
transformed parameters {
    vector[K] v[J];
    matrix[K,K] Rho;

    Rho = L_Omega * L_Omega';

    for (j in 1:J)
        v[j] =  diag_pre_multiply(tau,L_Omega) * z[j];
}
model{
    vector[N] mu;
    for (j in 1:J)
        z[j] ~ normal(0,1);
    a ~ normal(0,10);
    b ~ normal(0,10);
    L_Omega ~ lkj_corr_cholesky(2);
    sigma ~ cauchy(0,2);
    tau ~ cauchy(0,2);

// likelihood
    for ( i in 1:N )
        mu[i] = a + v[group[i],1] + (b + v[group[i],2])*Q_ast[i,1] + (c + v[group[i], 3])*Q_ast[i,2] ;
    y ~ normal(mu,sigma);
}
generated quantities {
  matrix[K,K] cov_mat;
  vector[K-1] beta;

  cov_mat =  diag_pre_multiply(tau,L_Omega) * diag_pre_multiply(tau,L_Omega)';

  {
    vector[K-1] beta_tilde;
    beta_tilde[1] = b;
    beta_tilde[2] = c;

    beta = R_ast_inverse * beta_tilde;
  }
}
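Since mu uses Q_ast rather than x, the group-level deviations v are also on the Q scale, so cov_mat above is the covariance of the Q-scale coefficients. A sketch of mapping it back to the original x scale (T is a helper matrix I introduce here; only the slope block is rotated by the QR step, the intercept is untouched, and the declare-and-define syntax assumes a reasonably recent Stan version):

```stan
generated quantities {
  matrix[K, K] cov_mat;       // group-level covariance on the Q scale
  matrix[K, K] cov_mat_orig;  // the same covariance on the original x scale
  vector[K-1] beta;

  cov_mat = diag_pre_multiply(tau, L_Omega) * diag_pre_multiply(tau, L_Omega)';

  {
    // T leaves the intercept alone and applies R_ast_inverse to the slopes,
    // because the Q-scale slopes satisfy theta = R_ast * beta_original.
    matrix[K, K] T = diag_matrix(rep_vector(1, K));
    vector[K-1] beta_tilde;

    T[2:K, 2:K] = R_ast_inverse;
    cov_mat_orig = T * cov_mat * T';

    beta_tilde[1] = b;
    beta_tilde[2] = c;
    beta = R_ast_inverse * beta_tilde;
  }
}
```

If this transformation is right, cov_mat_orig should be comparable to the covariance matrix from the standard (non-QR) fit.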


