Model consultation: Vector Autoregression (VAR) in Stan


Paul Viefers

Jul 7, 2015, 7:02:32 AM
to stan-...@googlegroups.com
Dear Stan users,

I am trying to fit a vector autoregressive model in Stan. For those not familiar with this kind of model, and to keep the post somewhat self-contained, let me sketch what I am trying to estimate.

Model.
Let time be indexed by t=1, ..., T. For every time period t I have a (Mx1) vector of variables y_t. In matrix notation my model reads

Y = X * A + U

where
 
Y = [y_{p+1}', ..., y_T']' 

is the ((T-p) x M) matrix of contemporaneous observations, and 

X = [x_{p+1},..., x_T ]' with x_t = [1, y_{t-1}', ..., y_{t-p}']
 
is the ((T-p) x (1+ M*p)) matrix of lagged observations (incl. an intercept). The matrix of coefficients A is therefore ((1+ M*p) x M) with the first row carrying the intercepts and all remaining rows the slopes.
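
As a concrete illustration of this layout (a numpy sketch of the construction I do in R; the names and toy values here are purely illustrative):

```python
import numpy as np

T, M, p = 8, 2, 2
rng = np.random.default_rng(0)
y = rng.normal(size=(T, M))            # full (T x M) series

# LHS: rows p+1 .. T of the series -> (T-p) x M
Y = y[p:, :]

# RHS: each row x_t = [1, y_{t-1}', ..., y_{t-p}'] -> (T-p) x (1 + M*p)
X = np.ones((T - p, 1 + M * p))
for t in range(p, T):
    X[t - p, 1:] = np.concatenate([y[t - lag, :] for lag in range(1, p + 1)])
```

With this layout, row n of X pairs the intercept with the p most recent lags of row n of Y, so Y = X * A + U holds row by row.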

I assume that U ~ Normal(0, Sigma) and, for the moment, an independent N(0,1) prior for the coefficients in A. The prior for Sigma is inverse-Wishart with scale matrix S and degrees of freedom nu. 

S = (Y - X*Â)'(Y - X*Â)

where Â = (X'X)^{-1} X'Y is the OLS estimate for A and nu = T - p - M - K - 1.
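
For reference, the OLS quantities feeding this prior can be sketched numerically (a numpy illustration on fake data, not the attached R script; T_eff stands for T - p):

```python
import numpy as np

rng = np.random.default_rng(1)
T_eff, M, p = 200, 3, 2
K = 1 + M * p
X = np.column_stack([np.ones(T_eff), rng.normal(size=(T_eff, K - 1))])
A_true = rng.normal(scale=0.3, size=(K, M))
Y = X @ A_true + rng.normal(scale=0.1, size=(T_eff, M))

# OLS estimate A_hat = (X'X)^{-1} X'Y (via lstsq for numerical stability)
A_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Scale matrix of the inverse-Wishart prior: S = (Y - X*A_hat)'(Y - X*A_hat)
resid = Y - X @ A_hat
S = resid.T @ resid

# Degrees of freedom: nu = T_eff - K - M - 1
nu = T_eff - K - M - 1
```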


Implementation in Stan (also see VAR.stan attached).
I run the model from R, using RStan, on an artificial dataset that I generate with the attached R script (see below). Since the model code in Stan is quite short, let me step through it here:

Data block & transformed data block
I prepare the data mostly in R. In particular, I construct the matrix X in R and hand it over to Stan. 
data {
    int<lower=0> T;         // No. of time periods observed
    int<lower=0> M;         // No. of variables in y_t
    int<lower=1> p;         // Lag order
    matrix[T, M] y;         // Matrix of time series
    matrix[T-p, M] Y;       // Matrix of time series for LHS
    matrix[T-p, 1+M*p] X;   // Matrix of lags of LHS variables
}
transformed data {
    int<lower=1> K;         // No. of parameters per equation
    int<lower=1> K_slope;   // No. of slope parameters per equation
    int<lower=0> T_eff;     // The effective number of obs. we can use, see below
    matrix[M, M] S;         // Scale matrix for inv. Wishart prior on Sigma
    int nu;                 // DF for inv. Wishart prior on Sigma

    // S = (Y - X*A_ols)' (Y - X*A_ols), with A_ols = (X'X)^{-1} X'Y
    S <- crossprod(Y - X * mdivide_left_spd(crossprod(X), X' * Y));
    K <- 1 + M * p;
    K_slope <- M * p;
    T_eff <- T - p;
    nu <- T_eff - K - M - 1;
}

Parameters & transformed parameters block.
Parameters are A and Sigma, but for readability I like to separate them into slopes and intercepts.
parameters {
    matrix[K, M] A;         // Matrix of AR coefficients incl. intercept
    cov_matrix[M] Sigma;    // Covariance of the error terms
}
transformed parameters {
    row_vector[M] intercepts;
    matrix[K_slope, M] slopes;

    intercepts <- row(A, 1);
    slopes <- block(A, 2, 1, K_slope, M);
}

Model block.
Given the matrices X and A, I can write the conditional mean of Y as a function of its own lags via the matrix product X * A.
model {
    matrix[T_eff, M] mus;

    // A toy prior on the slopes
    for (i in 1:K)
        A[i] ~ normal(0, 1);

    // Inv-Wishart for covariance matrix
    Sigma ~ inv_wishart(nu, S);

    // Likelihood
    mus <- X * A;           // ((T-p) x M) matrix of conditional means
    for (n in 1:T_eff)
        Y[n] ~ multi_normal(mus[n], Sigma);
}


The file VAR_data.R generates an artificial dataset for this model with T=300, M=3, p=2 and then compiles the model from R and runs two chains. The file VAR.RData contains the artificial dataset and model results in the object 'fitted.model'. 

My sessionInfo()

R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1


locale:
[1] LC_COLLATE=...  LC_CTYPE=...    LC_MONETARY=... LC_NUMERIC=C                    LC_TIME=...    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] rstan_2.6.0      inline_0.3.13    Rcpp_0.11.5      MHadaptive_1.1-8 MASS_7.3-39      mnormt_1.5-1    

loaded via a namespace (and not attached):
[1] stats4_3.1.3 tools_3.1.3


What comes out is not very encouraging. Since sampling takes very long here on my computer (unlike for other models with more parameters), I only ran two chains with 500 iterations each. Both the effective sample size and the Rhat statistic for the parameters are horrible, and I am wondering why. The posterior means of the elements of the covariance are OK, but their Rhat is still terrible. For the parameters in A everything is just terrible. Do I simply need to run things longer?

I would expect a VAR to behave badly in really large dimensions, but this toy model is very small and should converge quickly, no? I would be grateful for any hints at glaring mistakes or inefficiencies in my code - I'm still learning.

Many thanks in advance and cheers, 
Paul




VAR.RData
VAR_data.R
VAR.stan

Bob Carpenter

Jul 7, 2015, 1:16:24 PM
to stan-...@googlegroups.com
We have advice on how to set up effective priors for sampling in
the manual. We recommend (a) working with Cholesky factors, and
(b) using a scaled correlation matrix with LKJ priors. Stan does
not use any conjugacy information, so using inv_wishart isn't going
to be any faster. There are links to a paper by Ben on why we like
the LKJ.

There's also no efficient inverse Wishart on the Cholesky factor
scale. It's on our to-do list, but given that we don't use or
recommend (inverse) Wishart priors, it's not a high priority.

You also want to vectorize, so this bit of the model

matrix[T_eff, M] mus;

// A toy prior on the slopes
for(i in 1:K)
A[i] ~ normal(0, 1);

// Likelihood
mus <- X * A; // ((T-P) x M) matrix of conditional means
for(n in 1:T_eff)
Y[n] ~ multi_normal(mus[n], Sigma);


becomes this:

A ~ normal(0, 1);
Y ~ multi_normal(X * A, Sigma);

But you really want to replace that Sigma with a Cholesky factor,
and use:

Y ~ multi_normal_cholesky(X * A, L_Sigma);
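
To illustrate why the Cholesky route pays off (my own numpy sketch, not Stan internals): with a precomputed factor L of Sigma, each multivariate normal log density needs only a triangular solve and the log of L's diagonal, rather than a fresh determinant and matrix inverse per observation:

```python
import numpy as np

def mvn_logpdf_chol(y, mu, L):
    # log N(y | mu, Sigma) with Sigma = L L':
    # log|Sigma| = 2 * sum(log diag(L)); quadratic form via z = L^{-1}(y - mu).
    z = np.linalg.solve(L, y - mu)   # scipy's solve_triangular would exploit structure
    M = len(y)
    return -0.5 * (M * np.log(2 * np.pi) + z @ z) - np.sum(np.log(np.diag(L)))

rng = np.random.default_rng(2)
M = 3
Aq = rng.normal(size=(M, M))
Sigma = Aq @ Aq.T + M * np.eye(M)    # a well-conditioned toy covariance
L = np.linalg.cholesky(Sigma)
y = rng.normal(size=M)
mu = np.zeros(M)

# Reference density computed the "slow" way, with an explicit inverse and determinant
ref = -0.5 * (M * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma))
              + (y - mu) @ np.linalg.inv(Sigma) @ (y - mu))
```

The two agree, but the Cholesky version factors Sigma once and reuses it for every row of Y.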

I couldn't follow all your linear algebra in the transformed data, so
have no idea what it's doing or whether there might be a problem. Hopefully
you would've tested it all somehow (like with print() statements) to make sure
it's doing what you want.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <VAR.RData><VAR_data.R><VAR.stan>

Paul Viefers

Jul 8, 2015, 11:41:55 AM
to stan-...@googlegroups.com
Bob,

Many thanks for the swift response. It's really awesome you guys provide support here for people to learn.

I spent the afternoon going over the code in an attempt to fix things. 

A. Summary
Before going into details: I have implemented your suggestions as far as possible and it works much better now, but still not well, I'd say. For a 2-dimensional vector process y_t (M=2), the model more or less works, but that's just a toy linear model with 10 parameters in A and 3 parameters in Sigma and no hierarchical structure. Intercepts are recovered much better than slopes; variances and covariances are awful. This indicates to me that the code is not terribly off base - it's just not very good, and I'd like to know why.
Setting M=3, (i) the sampler slows down dramatically - both in terms of wall time and effective sample size, (ii) Rhat explodes, and (iii) the posterior means are nowhere near the actual values. Increasing the number of observed time periods (T) does not help much.

I have attached the newest version of my code and the R file that creates the artificial data (for M=2 and M=3) and runs the model.

B. Details.
Concerning your comments and suggestions: 
  1. Prior on the covariance matrix: Inverse-Wishart -> LKJ + Cholesky

    We have advice on how to set up effective priors for sampling in 
    the manual.  We recommend (a) working with Cholesky factors, and 
    (b) using a scaled correlation matrix with LKJ priors.  Stan does 
    not use any conjugacy information, so using inv_wishart isn't going 
    to be any faster.  There are links to a paper by Ben on why we like 
    the LKJ.
    There's also no efficient inverse Wishart on the Cholesky factor 
    scale.  It's on our to-do list, but given that we don't use or 
    recommend (inverse) Wishart priors, it's not a high priority.

  I suspected nobody here would like to see (inverse-)Wishart priors on covariances, but it served as a first brush and was meant to make things comparable to other implementations, e.g. in R with a Gibbs sampler. I saw you suggest these priors for the covariance matrix of regression coefficients in a hierarchical model, but note that I am applying it to the sampling covariance matrix, so it didn't immediately occur to me to use it for that purpose. But I implemented it the way it is suggested in the manual (an LKJ Cholesky prior and a separate half-Cauchy prior on the vector of scales) and I think it is a major improvement. Previously, the chain wasn't exploring at all and got stuck at the starting values, because all proposals were rejected. Now the chain isn't stuck anymore - it's much better behaved. However, the scale matrix of the inverse-Wishart (S in my notation, see the transformed data block in my previous post) really helped my earlier code recover the parameters of the covariance matrix. The LKJ prior is worse in that respect - assuming I did not make a mistake.
     
  2. Vectorization
You also want to vectorize, so this bit of the model 

    matrix[T_eff, M] mus; 
    
    // A toy prior on the slopes 
    for(i in 1:K) 
        A[i] ~ normal(0, 1); 
    
    // Likelihood 
    mus <- X * A;           // ((T-P) x M) matrix of conditional means 
    for(n in 1:T_eff) 
        Y[n] ~ multi_normal(mus[n], Sigma); 


becomes this: 

  A ~ normal(0, 1); 
  Y ~ multi_normal(X * A, Sigma); 

But you really want to replace that Sigma with a Cholesky factor, 
and use: 

  Y ~ multi_normal_cholesky(X * A, L_Sigma); 
 
I tried this, but I think it is not possible here, because both A and Y are matrices, so I get a syntax error when handing a matrix to 'normal' or 'multi_normal'. What I do instead is the standard workaround: construct a temporary matrix of linear predictors and loop over its rows. That is, Y[n] returns the n-th row of the matrix Y, and the likelihood part of my code 

// Likelihood
matrix[T_eff, M] mus;

mus <- X * A;           // ((T-p) x M) matrix of conditional means
for (n in 1:T_eff)
    Y[n] ~ multi_normal_cholesky(mus[n], L_Omega);

loops over all T_eff rows of Y and mus. What I can vectorize is the prior on A by using the 'to_vector' function (see below) - thanks for that hint!

C. Updated model code.
Following your advice, I changed the priors on the covariance matrix and wrote it in terms of its Cholesky factor, which leads to the following updated code.

Data & Transformed data block.
Note that I have added prior hyperparameters to the data, so I do not have to recompile the code every time I play with the prior.
data {
    int<lower=0> T;         // No. of time periods observed
    int<lower=0> M;         // No. of variables in y_t
    int<lower=1> p;         // Lag order

    matrix[T-p, M] Y;       // Matrix of time series for LHS
    matrix[T-p, 1+M*p] X;   // Matrix of lags of LHS variables

    // Prior hyperparameters
    real<lower=0> s_A;      // SD of the normal prior on elements of A
    real<lower=0> cauchy_scale;
    real<lower=0> lkj_shape;
}
transformed data {
    int<lower=1> K;         // No. of parameters per equation
    int<lower=1> K_slope;   // No. of slope parameters per equation
    int<lower=0> T_eff;     // The effective number of obs. we can use, see below

    K <- 1 + M * p;
    K_slope <- M * p;
    T_eff <- T - p;
}

Parameters & Transformed parameters.
parameters {
    matrix[K, M] A;                   // Matrix of AR coefficients incl. intercept

    cholesky_factor_corr[M] L_Omega;  // Cholesky factor of the error correlation matrix
    vector<lower=0>[M] tau;           // Vector of error scales (SDs)
}
transformed parameters {
    matrix[M, M] Sigma;

    Sigma <- diag_pre_multiply(tau, L_Omega) * diag_pre_multiply(tau, L_Omega)';
}

Model block.
model {
    // Likelihood
    matrix[T_eff, M] mus;
    mus <- X * A;           // ((T-p) x M) matrix of conditional means
    for (n in 1:T_eff)
        Y[n] ~ multi_normal_cholesky(mus[n], L_Omega);

    // Priors:
    // A toy prior on A
    to_vector(A) ~ normal(0, s_A);

    // Prior on the error scales
    tau ~ cauchy(0, cauchy_scale);

    // LKJ prior for Cholesky factor of correlation matrix
    L_Omega ~ lkj_corr_cholesky(lkj_shape);
}

D. Bottom line
This is not a complicated model. As far as I can see it's a simple linear model - no complicated hierarchical structure. Admittedly, VARs are very parameter-rich, but (a) that's supposedly what Stan is good at, and (b) my code isn't even good in low dimensions. I still don't understand why the model does not converge quickly and fails to recover the parameters :-/

Thanks again for the help - I will continue to carefully reconsider the code and the model again.

-Paul
VAR_v04.stan
VAR_data_2_variables.R
VAR_data_3_variables.R

Bob Carpenter

Jul 15, 2015, 1:35:46 AM
to stan-...@googlegroups.com
I have some comments inline, but I'll put the comments on the model code
here.

If you need to compute Sigma, it's more efficient to do that in the
generated quantities block --- that way it doesn't throw a ton of
complicated functions and expressions at autodiff.

Also, you don't ever want to do a complicated computation twice (like diag_pre_multiply),
so this:

Sigma <- diag_pre_multiply(tau,L_Omega) * diag_pre_multiply(tau,L_Omega)';

is better off written as this:

Sigma <- multiply_lower_tri_self_transpose(diag_pre_multiply(tau, L_Omega));

I think the example using post_multiply in the manual is wrong and
this is right.
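
That identity can be checked numerically (a numpy sketch with toy values for tau and L_Omega): scaling the correlation Cholesky factor and multiplying by its transpose reproduces diag(tau) * Omega * diag(tau).

```python
import numpy as np

rng = np.random.default_rng(3)
M = 3
tau = np.abs(rng.normal(size=M)) + 0.1          # positive scales
A = rng.normal(size=(M, M))
Omega = A @ A.T
d = np.sqrt(np.diag(Omega))
Omega = Omega / np.outer(d, d)                  # a valid correlation matrix
L_Omega = np.linalg.cholesky(Omega)             # its Cholesky factor

L_Sigma = np.diag(tau) @ L_Omega                # diag_pre_multiply(tau, L_Omega)
Sigma_a = L_Sigma @ L_Sigma.T                   # multiply_lower_tri_self_transpose(...)
Sigma_b = np.diag(tau) @ Omega @ np.diag(tau)   # diag(tau) * Omega * diag(tau)
```

A side benefit of this parameterization: the marginal standard deviations of the resulting covariance matrix are exactly tau.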

You want to scale L_Omega with a vector of positive-constrained
standard deviations --- as is, you're constraining the diagonal of
the covariance matrix to be zero. There are instructions in the
regression chapter of the manual.

> On Jul 8, 2015, at 11:41 AM, Paul Viefers <paulv...@gmail.com> wrote:
>
> Bob,
>
> Many thanks for the swift response. It's really awesome you guys provide support here for people to learn.
>
> I spent the afternoon going over the code in an attempt to fix things.
>
> A. Summary
> Before going into details, I have implemented your suggestions as far as possible and it works much better now, but still not good I'd say. For a 2-dimensional vector process y_t (M=2), the model works somehow, but that's just a toy linear model with 10 parameters in A and 3 parameters in Sigma and no hierarchical structure.

Often the hierarchical models will fit much better for
something like this by getting the prior standard deviation
right.

> Intercepts are recovered much better than slopes, variances and covariances are awful.

But how is the interval coverage? If you don't have much data, there
will be a lot of posterior uncertainty in a model like this.

> This gives me an indication that the code is not terribly off base - its' just not very good and I'd like to know why.

What do you think is not very good? Speed, or you think you're
getting the wrong answer? Did all the parameters have R-hat values
near 1?

> Setting M=3 and (i) the sampler slows down dramatically - both in terms of wall time and effective sample size, (ii) Rhat explodes and (iii) posteriors means nowhere close to the actual value. Increasing the number of observed time periods (T) does not help dramatically.
>
> I have attached the newest version of my code and the R file that creates the artificial data (for M=2 and M=3) and runs the model.
>
> B. Details.
> Concerning your comments and suggestions:
> • Prior on the covariance matrix: Inverse-Wishart -> LKJ + Cholesky
>
> We have advice on how to set up effective priors for sampling in
> the manual. We recommend (a) working with Cholesky factors, and
> (b) using a scaled correlation matrix with LKJ priors. Stan does
> not use any conjugacy information, so using inv_wishart isn't going
> to be any faster. There are links to a paper by Ben on why we like
> the LKJ.
> There's also no efficient inverse Wishart on the Cholesky factor
> scale. It's on our to-do list, but given that we don't use or
> recommend (inverse) Wishart priors, it's not a high priority.
>
> I suspected nobody here will like to see (inverse-)Wishart priors on covariances, but it served as a first brush and was meant to make thing comparable to other implementations, e.g in R with the Gibbs.

You and a bunch of other people. It's amazing to me how
ingrained standard practices become. It does make sense to
make sure Stan's doing the "right thing", though it's surprisingly
subtle to measure what the right thing is.


> I saw you suggest these priors for the covariance matrix of regression coefficients in a hierarchical model, but note that I am applying it to the sampling covariance matrix, so it didn't come to my mind immediately to use it for that purpose. But I implemented it the way it is suggested in the manual (an LKJ cholesky prior and a separate half-cauchy prior on the vector of variances) and I think it is a major improvement. Previously, the chain wasn't exploring at all and got stuck at the starting values, because all proposals were rejected. Now the chain isn't stuck anymore - so it's much more well-behaved.

Part of that's just the Cholesky factor implementation. If we
did the same thing for Wisharts they'd be more robust. It's on
our to-do list.

> However, the scale matrix of the inverse-Wishart (S in my notation, see transformed parameter block in previous post) in my earlier code really helped to recover the parameters in the covariance matrix. The LKJ prior is worse in that respect - assuming I did not commit a mistake.

It shouldn't be, assuming you were using equally "informative"
versions of the priors. In fact, you can drive the LKJ prior down
to uniform if you want, but we recommend some regularization for
all the usual reasons (here, mainly avoiding degenerate boundary
cases).

>
> • Vectorization
> You also want to vectorize, so this bit of the model
>
> matrix[T_eff, M] mus;
>
> // A toy prior on the slopes
> for(i in 1:K)
> A[i] ~ normal(0, 1);
>
> // Likelihood
> mus <- X * A; // ((T-P) x M) matrix of conditional means
> for(n in 1:T_eff)
> Y[n] ~ multi_normal(mus[n], Sigma);
>
>
> becomes this:
>
> A ~ normal(0, 1);
> Y ~ multi_normal(X * A, Sigma);
>
> But you really want to replace that Sigma with a Cholesky factor,
> and use:
>
> Y ~ multi_normal_cholesky(X * A, L_Sigma);
>
> I tried this, but I think this is not possible here, because both A and Y are matrices.

You need to make Y an array of vectors, and same with X * A. It's
totally worth what looks to an R user like it's going to be slower.
Write the loop, create the array of vectors for X * A, and try it
yourself.

> So I get a syntax error when trying to hand a matrix over to 'normal' or 'multi_normal'. What I do is to continue to resort to the standard workaround and constuct a temporary matrix of linear predictors and loop over its rows. I.e. writing Y[n] returns the n-th row of the matrix Y and the likelihood part of my code
>
> // Likelihood
> matrix[T_eff, M] mus;
> mus <- X * A; // ((T-P) x M) matrix of conditional means
> for(n in 1:T_eff)
> Y[n] ~ multi_normal_cholesky(mus[n], L_Omega);
>
> loops over all T_eff rows of Y and mus. What I can vectorize is the prior on A by using the 'to_vector' function (see below) - thanks for that hint!

At this point, all you need to do is this:

vector[M] mus[T_eff];
for (t in 1:T_eff)
    mus[t] <- (X[t] * A)';   // note the transpose: X[t] * A is a row_vector
Y ~ multi_normal_cholesky(mus, L_Omega);

You need to make sure Y is also an array of vectors.
How much data do you have? Are the parameters getting recovered
in their intervals and just the point estimates are bad? Estimating
covariance matrices is hard with not much data. But I'm assuming
you've already fit it with Gibbs code from somewhere and that worked (you
have to be careful that the Gibbs answers are right --- it can have a hard
time with multivariate structures, which as you note, is one of the reasons
we wrote Stan).

- Bob

andre.p...@googlemail.com

Oct 19, 2015, 3:58:00 PM
to Stan users mailing list
Although the original model is from Paul, I have had a look and made the
suggested corrections.
The output is still not what we would expect: note the NaNs and
the Rhats. I also experimented with the parameter lkj_shape,
but it doesn't seem to be the cause.

                mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff   Rhat
A[1,1]          0.11    0.14  0.14   -0.02   -0.02    0.11    0.25    0.25     1 569.40
A[1,2]          0.49    0.19  0.19    0.30    0.30    0.49    0.67    0.67     1 569.27
A[2,1]          0.75    0.12  0.12    0.64    0.64    0.75    0.87    0.87     1 396.18
A[2,2]         -0.08    0.15  0.15   -0.23   -0.23   -0.08    0.08    0.08     1 396.43
A[3,1]          0.07    0.11  0.11   -0.04   -0.04    0.07    0.17    0.17     1 250.54
A[3,2]          0.57    0.14  0.14    0.43    0.43    0.57    0.72    0.72     1 250.65
A[4,1]          0.20    0.10  0.10    0.10    0.10    0.20    0.30    0.30     1 560.83
A[4,2]          0.50    0.13  0.13    0.36    0.36    0.50    0.63    0.63     1 560.71
A[5,1]         -0.55    0.03  0.04   -0.59   -0.59   -0.55   -0.52   -0.52     1  93.86
A[5,2]         -0.64    0.05  0.05   -0.69   -0.69   -0.64   -0.59   -0.59     1  93.86
L_Omega[1,1]    1.00    0.00  0.00    1.00    1.00    1.00    1.00    1.00   500    NaN
L_Omega[1,2]    0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   500    NaN
L_Omega[2,1]    1.00    0.00  0.00    1.00    1.00    1.00    1.00    1.00     1   2.36
L_Omega[2,2]    0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00     1   2.54
tau[1]          0.28    0.00  0.00    0.28    0.28    0.28    0.29    0.29     1  13.08
tau[2]          0.38    0.00  0.00    0.38    0.38    0.38    0.38    0.38     1  13.08
Sigma[1,1]      0.28    0.00  0.00    0.28    0.28    0.28    0.29    0.29     1  13.08
Sigma[1,2]      0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   500    NaN
Sigma[2,1]      0.38    0.00  0.00    0.38    0.38    0.38    0.38    0.38     1  13.08
Sigma[2,2]      0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00     1   2.66
lp__         1843.21   17.10 18.08 1821.54 1826.95 1829.11 1861.98 1864.78     1   2.86

Andre


VAR_v06.stan
VAR_data_2_variables.R

Ben Goodrich

Oct 19, 2015, 4:21:12 PM
to Stan users mailing list
On Monday, October 19, 2015 at 3:58:00 PM UTC-4, andre.p...@googlemail.com wrote:
The output is still not like what we would expect. The NaNs and 
the Rhats. Did also experiment with the parameter lkj_shape,
but it doesn't seem to be the reason.

Message has been deleted

andre.p...@googlemail.com

Oct 20, 2015, 12:42:07 PM
to Stan users mailing list
Hi everybody,

we know Sigma a priori, so we can pass it in as data. The model then becomes:
    row_vector[M] mus[T_eff];

    for (t in 1:T_eff)
        mus[t] <- X[t] * A;

    // Priors:
    // A toy prior on A
    to_vector(A) ~ normal(0, s_A);

    Y ~ multi_normal(mus, Sigma);

Checking we do the right thing:

library(vars)

VAR(X[,-1], p = 1)

VAR Estimation Results:
=======================

Estimated coefficients for equation lag.1.1:
============================================
Call:
lag.1.1 = lag.1.1.l1 + lag.1.2.l1 + const

lag.1.1.l1 lag.1.2.l1      const
 0.8334306 -0.1056876 -0.1073510


Estimated coefficients for equation lag.1.2:
============================================
Call:
lag.1.2 = lag.1.1.l1 + lag.1.2.l1 + const

lag.1.1.l1 lag.1.2.l1      const
0.02514594 0.34146860 0.18897348

Stan does:

> c(t(cbind(alpha, A.1)))
[1] -0.10  0.85 -0.10  0.20  0.05  0.35
> print(fitted.model)
Inference for Stan model: VAR.
2 chains, each with iter=100; warmup=50; thin=1;
post-warmup draws per chain=50, total post-warmup draws=100.

                mean      se_mean           sd          2.5%           25%           50%           75%         97.5% n_eff
A[1,1]  3.000000e-02 1.000000e-01 1.050000e+00 -1.010000e+00 -1.010000e+00  3.000000e-02  1.070000e+00  1.070000e+00   100
A[1,2]  1.030000e+00 1.400000e-01 1.400000e-01  8.900000e-01  8.900000e-01  1.030000e+00  1.170000e+00  1.170000e+00     1
A[2,1]  7.800000e-01 3.700000e-01 3.700000e-01  4.100000e-01  4.100000e-01  7.800000e-01  1.150000e+00  1.150000e+00     1
A[2,2]  6.700000e-01 8.600000e-01 8.600000e-01 -1.900000e-01 -1.900000e-01  6.700000e-01  1.520000e+00  1.520000e+00     1
A[3,1] -5.100000e-01 8.000000e-02 8.400000e-01 -1.340000e+00 -1.340000e+00 -5.100000e-01  3.300000e-01  3.300000e-01   100
A[3,2] -5.200000e-01 8.000000e-02 8.000000e-02 -6.000000e-01 -6.000000e-01 -5.200000e-01 -4.400000e-01 -4.400000e-01     1
lp__   -1.689988e+17 1.263999e+17 1.270366e+17 -2.953987e+17 -2.953987e+17 -1.689988e+17 -4.259894e+16 -4.259894e+16     1

       Rhat
A[1,1] 3.749290e+15
A[1,2] 3.588760e+14
A[2,1] 1.332038e+15
A[2,2] 1.534515e+15
A[3,1] 3.007336e+15
A[3,2] 5.296542e+14
lp__   8.426658e+15


VAR_v06a.stan
VAR_data_2_variables_ext.R

Bob Carpenter

Oct 20, 2015, 6:28:24 PM
to stan-...@googlegroups.com
Those Rhat values indicate the chains haven't converged to the same
stationary distribution.
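
For readers following along, the classic (non-split) R-hat that flags this can be sketched in a few lines (my illustration; Stan's actual diagnostic additionally splits each chain in half):

```python
import numpy as np

def rhat(chains):
    # chains: (m, n) array of m chains with n draws each.
    # Classic Gelman-Rubin potential scale reduction factor.
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(5)
mixed = rng.normal(size=(2, 1000))             # two chains from the same distribution
stuck = np.stack([rng.normal(0, 1, 1000),      # two chains stuck in different places
                  rng.normal(5, 1, 1000)])
```

Well-mixed chains give R-hat near 1; chains exploring different regions, like the ones in the table above, blow it up.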

Otherwise, I didn't see a question in here for us.

- Bob
> <VAR_v06a.stan><VAR_data_2_variables_ext.R>

andre.p...@googlemail.com

Oct 20, 2015, 6:54:16 PM
to Stan users mailing list
Bob,

Very true, the chains haven't converged. But the model should be very well behaved: little additional
noise, very few parameters, and T=500 is plenty of data to fit it.
How can we expect a model with an unknown correlation matrix to converge to something useful
in more complex settings, when even this toy model has problems?

Don't get me wrong. 
I just want to find out the reason, because you do a lot of good 
work. 

Andre

Rob Trangucci

Oct 20, 2015, 11:54:27 PM
to stan-...@googlegroups.com
I was having trouble understanding the model you attached as it was coded, so I wrote the VAR model naively, and I was able to get it to converge nicely. R file for generating VAR(2) data and the general VAR(p) model attached. I didn't model the first 3 observations.

Rob
VAR_data_2_variables_ext.R
VAR_p.stan
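
For anyone reproducing this thread, a stationary VAR(2) data-generating process along the lines of the attached R scripts can be sketched as follows (numpy; the coefficient values are illustrative, not taken from Rob's files). The companion-matrix eigenvalue check verifies stationarity before fitting:

```python
import numpy as np

rng = np.random.default_rng(4)
M, p, T = 2, 2, 500
c = np.array([0.1, -0.1])
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
Sigma = np.array([[0.09, 0.02], [0.02, 0.06]])
L = np.linalg.cholesky(Sigma)

# Stationarity: all eigenvalues of the companion matrix must lie inside the unit circle
companion = np.zeros((M * p, M * p))
companion[:M, :M] = A1
companion[:M, M:] = A2
companion[M:, :M] = np.eye(M)

# Simulate y_t = c + A1 y_{t-1} + A2 y_{t-2} + u_t, u_t ~ N(0, Sigma)
y = np.zeros((T, M))
for t in range(p, T):
    y[t] = c + A1 @ y[t - 1] + A2 @ y[t - 2] + L @ rng.normal(size=M)
```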

Bob Carpenter

Oct 20, 2015, 11:57:25 PM
to stan-...@googlegroups.com
Thanks, Rob, that's awesome.

- Bob
> <VAR_data_2_variables_ext.R><VAR_p.stan>

andre.p...@googlemail.com

Oct 21, 2015, 6:31:38 AM
to Stan users mailing list
Wow, that's beyond all expectations. Really, really awesome.
With Rob's permission, the model should be put into the Git repository as another
example Stan solution.

My thanks to  Rob, Bob, Ben and Paul.
Andre


Ryan Batt

Dec 9, 2015, 6:47:28 PM
to Stan users mailing list
Was the pitfall in the original model ever discovered? Hoping to learn from others' mistakes!

James Savage

Dec 10, 2015, 12:28:34 PM
to Stan users mailing list

Hi Ryan - 

FWIW, I put together some VAR models a while ago and you may find the code helpful. This one fits two VAR models using two different weighting schemes. Note that I didn't model parameter covariance, which is a bit naughty. I also didn't use a Cholesky factorisation of the covariance matrix -- I'll see if I can find some code with those two modifications. It works well anyhow. 

Ryan Batt

Dec 10, 2015, 1:11:49 PM
to stan-...@googlegroups.com
Thanks James! It's good to have working examples :) But, in all sincerity, I want to know what went wrong in the OP's model, too. I didn't study it closely, but nothing crazy jumped out at me. I'm not asking anyone to figure out why something was wrong, but if the OP has noticed the mistake and points it out, I would be better equipped to avoid the same mistake myself (and because it seemed to be so sneaky, I'm assuming it's an easy one to make).

Nonetheless, I'm sure that everyone on here appreciates your shared efforts, so thanks again!

-Ryan

Ryan Batt

Dec 11, 2015, 3:51:03 PM
to Stan users mailing list
Hi all,

Sorry for cross-posting, but I added covariates to Rob's model and put it in a knitr. https://groups.google.com/d/msg/stan-users/PmGMnqXRD28/AnXr4IVOAgAJ

-Ryan

Bob Carpenter

Dec 11, 2015, 6:21:38 PM
to stan-...@googlegroups.com
Perfect --- that's exactly what we're looking for. Would you mind

* posting the .Rmd

* adding yourself as author (we want to keep copyright with
authors and make it explicit)

* adding a license for the code (we like BSD, but given that it's
R, GPL is fine, too --- both are compatible with GPL); if you
could choose one of the creative commons licenses for the
text, that'd also be great (we use CC BY 4.0 for the manual, but
feel free to choose a more restrictive one)

* removing the link to the paper we don't have the rights
to distribute (I don't like closed source papers, but also
don't want to violate copyright even indirectly this way).

I hope this isn't making it too onerous --- I hate dealing
with this bureaucratic stuff, especially when I'm trying to
contribute something for free!

Then I can put it up in our examples directory when I get
around to setting up the web format, which I should get
to pretty soon (hopefully in the next couple weeks -- it's
a perfect holiday project).

This is also making me come up with a checklist for what we
need, which should make it easier for the next person.

- Bob

Ryan Batt

Dec 11, 2015, 8:16:58 PM
to Stan users mailing list
Perfect --- that's exactly what we're looking for.  Would you mind 
I wouldn't mind at all! I'm glad I can contribute something useful. Helps give back for all the free advice :)  That said, Rob did most of the work in the Stan model; so hopefully he'll see this and give me the OK to put him on that doc as a coauthor.

Also, if you don't mind, I'll just post the updated document on the linked thread; I'll post back here to indicate it's been updated, and when I post the update here I'll link back to this too. Like I said before, sorry for cross-posting. I initially hadn't realized that VAR and MAR were the same.

I can post the .Rmd; however, I used the RMarkdown “notebook” approach, which means all you need to do to create the .PDF is point rmarkdown::render (which calls knitr::spin) in the direction of the R script; super easy! But I'm pretty sure I can flip some switch to get the above call to save the intermediate .rmd as well :)

For the licenses, is there something I can add to the header (or front matter, whatever it's called) to do that? Or what format should I add it in?

-Ryan

Bob Carpenter

Dec 11, 2015, 8:21:45 PM
to stan-...@googlegroups.com

> On Dec 11, 2015, at 8:16 PM, Ryan Batt <bat...@gmail.com> wrote:
>
> Perfect --- that's exactly what we're looking for. Would you mind
> I wouldn't mind at all! I'm glad I can contribute something useful. Helps give back for all the free advice :) That said, Rob did most of the work in the Stan model; so hopefully he'll see this and give me the OK to put him on that doc as a coauthor.

I'll leave that up to you and him. The other alternative is
to cite and/or link to what he did.

> Also, if you don't mind, I'll just post the updated document on the linked thread; I'll post back here to indicate it's been updated, and when I post the update here I'll link back to this too. Like I said before, sorry for cross-posting. I initially hadn't realized that VAR and MAR were the same.

Works for me.

> I can post the .Rmd; however, I used the RMarkdown “notebook” approach, which means all you need to do to create the .PDF is point rmarkdown::render (which calls knitr::spin) in the direction of the R script; super easy! But I'm pretty sure I can flip some switch to get the above call to save the intermediate .rmd as well :)

Thanks for explaining. No, we don't want the .rmd if it's
complete without it.

> For the licenses, is there something I can add to the header (or front matter, whatever it's called) to do that? Or what format should I add it in?

Good point. A footnote for the text of the paper makes
sense, and then a comment at the top of the script file.

- Bob

Ryan Batt

Apr 12, 2017, 10:04:05 AM
to Stan users mailing list
Ashamedly, I still have not completed these tasks.

As a minor step forward, here's a link to the R script on a GitHub repo (the repo was for a different project) so that others may find this code until I get around to making it pretty:


-Ryan