Y = X * A + U
Y = [y_{p+1}', ..., y_T']'
X = [x_{p+1}, ..., x_T]' with x_t = [1, y_{t-1}', ..., y_{t-p}']'
S = (Y - XÂ)'(Y - XÂ)
Data block & transformed data block.
I prepare the data mostly in R. In particular, I construct the matrix X in R and hand it over to Stan.
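For concreteness, here is a sketch of that construction in Python/numpy rather than R (the dimensions and data are made up; the thread's actual prep code is in R and not shown):

```python
import numpy as np

# Hypothetical dimensions: T = 6 observations of M = 2 series, lag order p = 2
rng = np.random.default_rng(0)
T, M, p = 6, 2, 2
y = rng.normal(size=(T, M))          # raw series; row t-1 (0-indexed) is y_t'

# Y stacks y_{p+1}', ..., y_T' as rows: a (T-p) x M matrix
Y = y[p:]

# X stacks x_t' = [1, y_{t-1}', ..., y_{t-p}'] for t = p+1, ..., T
X = np.column_stack([np.ones(T - p)] +
                    [y[p - j : T - j] for j in range(1, p + 1)])

assert Y.shape == (T - p, M)          # (T-p) x M
assert X.shape == (T - p, 1 + M * p)  # (T-p) x (1 + M*p)
```

The list comprehension shifts the series by one extra period per lag, which matches the stacking in the equations above.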
data {
int<lower=0> T; // No. of time periods observed
int<lower=0> M; // No. of variables in y_t
int<lower=1> p; // Lag order
matrix[T, M] y; // Matrix of time series
matrix[T-p, M] Y; // Matrix of time series for LHS
matrix[T-p, 1+M*p] X; // Matrix of lags of LHS variables
}
transformed data {
int<lower=1> K; // No. of parameters per equation
int<lower=1> K_slope; // No. of slope parameters per equation
int<lower=0> T_eff; // The effective number of obs. we can use, see below
matrix[M, M] S; // Scale matrix for inv. Wishart prior on Sigma
int nu; // DF for Wishart prior on Sigma
S <- crossprod(Y - X * mdivide_left_spd(crossprod(X), X' * Y)); // residual cross-product from the OLS fit
K <- 1 + M * p;
K_slope <- M * p;
T_eff <- T - p;
nu <- T_eff - K - M - 1;
}
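The line computing S is the residual cross-product from the OLS fit, i.e. S = (Y - XÂ)'(Y - XÂ) with Â = (X'X)^{-1}X'Y from the equations above. A quick numeric sanity check of that formula (a Python/numpy sketch with made-up random data and sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
T_eff, K, M = 50, 5, 2              # hypothetical sizes: (T-p), 1+M*p, M
X = rng.normal(size=(T_eff, K))
Y = rng.normal(size=(T_eff, M))

# OLS estimate A_hat = (X'X)^{-1} X'Y, then S = residual cross-product
A_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ A_hat
S = resid.T @ resid

# S is symmetric positive (semi-)definite, as a Wishart scale matrix must be
assert np.allclose(S, S.T)
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)
```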
Parameters & transformed parameters block.
Parameters are A and Sigma, but for readability I like to separate them into slopes and intercepts.
parameters {
matrix[K, M] A; // Matrix of AR coefficients incl. intercept
cov_matrix[M] Sigma; // Covariance of the error terms
}
transformed parameters {
row_vector[M] intercepts;
matrix[K_slope, M] slopes;
intercepts <- row(A, 1);
slopes <- block(A, 2, 1, K_slope, M);
}
Model block.
Given the matrices X and A, I can write the mean for Y as a function of its own lags as the matrix product X * A. Each row of Y then gets a multivariate normal likelihood with the corresponding row of X * A as its mean:
model {
matrix[T_eff, M] mus;
// A toy prior on the slopes
for(i in 1:K)
A[i] ~ normal(0, 1);
// Inv-Wishart for covariance matrix
Sigma ~ inv_wishart(nu, S);
// Likelihood
mus <- X * A; // ((T-P) x M) matrix of conditional means
for(n in 1:T_eff)
Y[n] ~ multi_normal(mus[n], Sigma);
}
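To test a model like this, one can simulate data from a VAR with known coefficients. Here is a sketch in Python/numpy; the intercepts and lag-1 coefficients are taken from the true values quoted later in the thread, while the error covariance is made up:

```python
import numpy as np

rng = np.random.default_rng(2)
T, M = 200, 2

# True parameters: intercepts and lag-1 coefficients matching the values
# quoted later in the thread; the error covariance below is made up.
a0 = np.array([-0.10, 0.20])
A1 = np.array([[0.85, -0.10],
               [0.05,  0.35]])          # eigenvalues inside the unit circle
L = np.linalg.cholesky(np.array([[0.09, 0.00],
                                 [0.00, 0.16]]))

y = np.zeros((T, M))
for t in range(1, T):
    # conditional mean of y_t is x_t' A with x_t = [1, y_{t-1}']'
    y[t] = a0 + A1 @ y[t - 1] + L @ rng.normal(size=M)
```

Feeding a series like this to the Stan program above should recover a0 and A1 up to posterior uncertainty, which makes sampler problems easy to spot.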
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=... LC_CTYPE=... LC_MONETARY=... LC_NUMERIC=C LC_TIME=...
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.6.0 inline_0.3.13 Rcpp_0.11.5 MHadaptive_1.1-8 MASS_7.3-39 mnormt_1.5-1
loaded via a namespace (and not attached):
[1] stats4_3.1.3 tools_3.1.3
We have advice on how to set up effective priors for sampling in the manual. We recommend (a) working with Cholesky factors, and (b) using a scaled correlation matrix with LKJ priors. Stan does not use any conjugacy information, so using inv_wishart isn't going to be any faster. There are links to a paper by Ben on why we like the LKJ. There's also no efficient inverse Wishart on the Cholesky factor scale. It's on our to-do list, but given that we don't use or recommend (inverse) Wishart priors, it's not a high priority.
You also want to vectorize, so this bit of the model
matrix[T_eff, M] mus;
// A toy prior on the slopes
for(i in 1:K)
A[i] ~ normal(0, 1);
// Likelihood
mus <- X * A; // ((T-p) x M) matrix of conditional means
for(n in 1:T_eff)
Y[n] ~ multi_normal(mus[n], Sigma);

becomes this:
A ~ normal(0, 1);
Y ~ multi_normal(X * A, Sigma);
But you really want to replace that Sigma with a Cholesky factor,
and use:
Y ~ multi_normal_cholesky(X * A, L_Sigma);
I tried this, but I think it is not possible here, because both A and Y are matrices, so I get a syntax error when handing a matrix to 'normal' or 'multi_normal'. What I do instead is resort to the standard workaround: construct a temporary matrix of linear predictors and loop over its rows. Writing Y[n] returns the n-th row of the matrix Y, so the likelihood part of my code
// Likelihood
matrix[T_eff, M] mus;
mus <- X * A; // ((T-P) x M) matrix of conditional means
for(n in 1:T_eff)
Y[n] ~ multi_normal_cholesky(mus[n], L_Omega);
loops over all T_eff rows of Y and mus. What I can vectorize is the prior on A by using the 'to_vector' function (see below) - thanks for that hint!
Data & Transformed data block.
Note that I have added prior hyperparameters to the data, so I do not have to recompile the code every time I play with the prior.
data {
int<lower=0> T; // No. of time periods observed
int<lower=0> M; // No. of variables in y_t
int<lower=1> p; // Lag order
matrix[T-p, M] Y; // Matrix of time series for LHS
matrix[T-p, 1+M*p] X; // Matrix of lags of LHS variables
// Prior hyperparameters
real<lower=0> s_A; // SD of the normal prior on elements of A
real<lower=0> cauchy_scale; // Scale of the half-Cauchy prior on tau
real<lower=0> lkj_shape; // Shape parameter of the LKJ prior
}
transformed data {
int<lower=1> K; // No. of parameters per equation
int<lower=1> K_slope; // No. of slope parameters per equation
int<lower=0> T_eff; // The effective number of obs. we can use, see below
K <- 1 + M * p;
K_slope <- M * p;
T_eff <- T - p;
}
Parameters & Transformed parameters.
parameters {
matrix[K, M] A; // Matrix of AR coefficients incl. intercept
cholesky_factor_corr[M] L_Omega;
vector<lower=0>[M] tau; // Vector of error scales (SDs)
}
transformed parameters {
matrix[M,M] Sigma;
Sigma <- tcrossprod(diag_pre_multiply(tau, L_Omega)); // = diag(tau) * Omega * diag(tau)
}
Model block.
model {
// Likelihood
matrix[T_eff, M] mus;
mus <- X * A; // ((T-P) x M) matrix of conditional means
for(n in 1:T_eff)
Y[n] ~ multi_normal_cholesky(mus[n], L_Omega);
// Priors:
// A toy prior on A
to_vector(A) ~ normal(0, s_A);
// Prior on the error variances
tau ~ cauchy(0, cauchy_scale);
// LKJ prior for cholesky factor of correlation matrix
L_Omega ~ lkj_corr_cholesky(lkj_shape);
}
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
A[1,1] 0.11 0.14 0.14 -0.02 -0.02 0.11 0.25 0.25 1 569.40
A[1,2] 0.49 0.19 0.19 0.30 0.30 0.49 0.67 0.67 1 569.27
A[2,1] 0.75 0.12 0.12 0.64 0.64 0.75 0.87 0.87 1 396.18
A[2,2] -0.08 0.15 0.15 -0.23 -0.23 -0.08 0.08 0.08 1 396.43
A[3,1] 0.07 0.11 0.11 -0.04 -0.04 0.07 0.17 0.17 1 250.54
A[3,2] 0.57 0.14 0.14 0.43 0.43 0.57 0.72 0.72 1 250.65
A[4,1] 0.20 0.10 0.10 0.10 0.10 0.20 0.30 0.30 1 560.83
A[4,2] 0.50 0.13 0.13 0.36 0.36 0.50 0.63 0.63 1 560.71
A[5,1] -0.55 0.03 0.04 -0.59 -0.59 -0.55 -0.52 -0.52 1 93.86
A[5,2] -0.64 0.05 0.05 -0.69 -0.69 -0.64 -0.59 -0.59 1 93.86
L_Omega[1,1] 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 500 NaN
L_Omega[1,2] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 500 NaN
L_Omega[2,1] 1.00 0.00 0.00 1.00 1.00 1.00 1.00 1.00 1 2.36
L_Omega[2,2] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1 2.54
tau[1] 0.28 0.00 0.00 0.28 0.28 0.28 0.29 0.29 1 13.08
tau[2] 0.38 0.00 0.00 0.38 0.38 0.38 0.38 0.38 1 13.08
Sigma[1,1] 0.28 0.00 0.00 0.28 0.28 0.28 0.29 0.29 1 13.08
Sigma[1,2] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 500 NaN
Sigma[2,1] 0.38 0.00 0.00 0.38 0.38 0.38 0.38 0.38 1 13.08
Sigma[2,2] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1 2.66
lp__ 1843.21 17.10 18.08 1821.54 1826.95 1829.11 1861.98 1864.78 1 2.86
The output is still not what we would expect: note the NaNs and the large Rhat values. I also experimented with the lkj_shape parameter, but that does not seem to be the reason.
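One thing worth checking (my conjecture; it is not confirmed above): multi_normal_cholesky expects the Cholesky factor of the covariance matrix, but L_Omega here is the Cholesky factor of a correlation matrix. The covariance factor would be diag_pre_multiply(tau, L_Omega). A quick numpy check of that identity, with made-up tau and correlation values:

```python
import numpy as np

tau = np.array([0.5, 2.0])                    # made-up scales
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])                # made-up correlation matrix
L_Omega = np.linalg.cholesky(Omega)

# Covariance implied by (tau, Omega)
Sigma = np.diag(tau) @ Omega @ np.diag(tau)

# The Cholesky factor of Sigma is diag(tau) @ L_Omega, not L_Omega itself
L_Sigma = np.diag(tau) @ L_Omega
assert np.allclose(L_Sigma @ L_Sigma.T, Sigma)
assert not np.allclose(L_Omega @ L_Omega.T, Sigma)
```

So passing L_Omega alone to multi_normal_cholesky would fix the error scales at 1, which could explain degenerate output like the above.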
row_vector[M] mus[T_eff];
for (t in 1:T_eff)
mus[t] <- X[t] * A;
// Priors:
// A toy prior on A
to_vector(A) ~ normal(0, s_A);
Y ~ multi_normal(mus, Sigma);

To cross-check against a classical estimator in R:

library(vars)
VAR(X[,-1], p = 1)
VAR Estimation Results:
=======================
Estimated coefficients for equation lag.1.1:
============================================
Call:
lag.1.1 = lag.1.1.l1 + lag.1.2.l1 + const
lag.1.1.l1 lag.1.2.l1 const
0.8334306 -0.1056876 -0.1073510
Estimated coefficients for equation lag.1.2:
============================================
Call:
lag.1.2 = lag.1.1.l1 + lag.1.2.l1 + const
lag.1.1.l1 lag.1.2.l1 const
0.02514594 0.34146860 0.18897348
> c(t(cbind(alpha,A.1)))
[1] -0.10 0.85 -0.10 0.20 0.05 0.35
> print(fitted.model)
Inference for Stan model: VAR.
2 chains, each with iter=100; warmup=50; thin=1;
post-warmup draws per chain=50, total post-warmup draws=100.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff
A[1,1] 3.000000e-02 1.000000e-01 1.050000e+00 -1.010000e+00 -1.010000e+00 3.000000e-02 1.070000e+00 1.070000e+00 100
A[1,2] 1.030000e+00 1.400000e-01 1.400000e-01 8.900000e-01 8.900000e-01 1.030000e+00 1.170000e+00 1.170000e+00 1
A[2,1] 7.800000e-01 3.700000e-01 3.700000e-01 4.100000e-01 4.100000e-01 7.800000e-01 1.150000e+00 1.150000e+00 1
A[2,2] 6.700000e-01 8.600000e-01 8.600000e-01 -1.900000e-01 -1.900000e-01 6.700000e-01 1.520000e+00 1.520000e+00 1
A[3,1] -5.100000e-01 8.000000e-02 8.400000e-01 -1.340000e+00 -1.340000e+00 -5.100000e-01 3.300000e-01 3.300000e-01 100
A[3,2] -5.200000e-01 8.000000e-02 8.000000e-02 -6.000000e-01 -6.000000e-01 -5.200000e-01 -4.400000e-01 -4.400000e-01 1
lp__ -1.689988e+17 1.263999e+17 1.270366e+17 -2.953987e+17 -2.953987e+17 -1.689988e+17 -4.259894e+16 -4.259894e+16 1
Rhat
A[1,1] 3.749290e+15
A[1,2] 3.588760e+14
A[2,1] 1.332038e+15
A[2,2] 1.534515e+15
A[3,1] 3.007336e+15
A[3,2] 5.296542e+14
lp__ 8.426658e+15
Perfect, that's exactly what we're looking for. Would you mind sharing the .Rmd?
I can post the .Rmd; however, I used the RMarkdown “notebook” approach, which means all you need to do to create the PDF is point rmarkdown::render (which calls knitr::spin) at the R script; super easy! But I'm pretty sure I can flip a switch to save the intermediate .Rmd as well :)
For the licenses, is there something I can add to the header (or front matter, whatever it's called) to do that? Or what format should I add it in?
-Ryan