Rhat NaNs problem in Seemingly unrelated regression example with two time series


Raghu Naik

Aug 5, 2015, 9:53:58 AM
to Stan users mailing list
Folks,

I am playing with seemingly unrelated regressions (pg. 71 of the 2.7 manual) using the Grunfeld investment data available in the R systemfit package. I use both the multi_normal and multi_normal_cholesky approaches provided in the manual to model two time series. 

The first approach goes well. The second, LKJ-priors approach throws up Rhat NaNs on some correlation coefficients. I wanted to find out how that can be fixed.

Here is the data code:

library(systemfit)
library(rstan)

data("GrunfeldGreene")

## use the GE and Westinghouse data
ge <- subset(GrunfeldGreene, firm %in% c("General Electric"))
west <- subset(GrunfeldGreene, firm %in% c("Westinghouse"))

N <- nrow(ge)  ### number of obs

## standardize predictors
ge$value_std <- (ge$value - mean(ge$value)) / sd(ge$value)
west$value_std <- (west$value - mean(west$value)) / sd(west$value)

ge$capital_std <- (ge$capital - mean(ge$capital)) / sd(ge$capital)
west$capital_std <- (west$capital - mean(west$capital)) / sd(west$capital)

y <- cbind(ge$invest, west$invest)
x1 <- cbind(rep(1, N), ge$value_std, ge$capital_std)     ## GE predictors with intercept
x2 <- cbind(rep(1, N), west$value_std, west$capital_std) ## Westinghouse predictors with intercept

K <- ncol(y)   ## number of time series / equations
J <- ncol(x1)  ## number of predictors

data_ge_west <- list(y = y, x1 = x1, x2 = x2, N = N, K = K, J = J)



Code for the multi_normal approach that works well:

data {
  int<lower=1> K;
  int<lower=1> J;
  int<lower=0> N;
  vector[J] x1[N];
  vector[J] x2[N];
  vector[K] y[N];
}
parameters {
  matrix[K,J] beta;
  cov_matrix[K] Sigma;
}
model {
  vector[K] mu[N];
  for (n in 1:N) {
    // mu[n] <- beta * x[n];
    mu[n][1] <- beta[1] * x1[n];
    mu[n][2] <- beta[2] * x2[n];
  }
  y ~ multi_normal(mu, Sigma);
}
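The mean structure here gives each equation its own design matrix while sharing the K x J coefficient matrix beta row-wise: mu[n][k] is the dot product of row k of beta with equation k's predictor vector at observation n. A small numeric sketch in Python/NumPy (an assumption on my part — the thread's code is R and Stan, and the numbers below are made up, not the Grunfeld data):

```python
import numpy as np

# Hypothetical dimensions and random values, only to illustrate the
# per-equation mean computation in the Stan model above.
N, K, J = 4, 2, 3
rng = np.random.default_rng(0)
beta = rng.normal(size=(K, J))
x1 = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # eq. 1 design
x2 = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # eq. 2 design

mu = np.empty((N, K))
for n in range(N):
    mu[n, 0] = beta[0] @ x1[n]   # mu[n][1] <- beta[1] * x1[n]
    mu[n, 1] = beta[1] @ x2[n]   # mu[n][2] <- beta[2] * x2[n]

# equivalent vectorized form: one matrix-vector product per equation
mu_vec = np.column_stack([x1 @ beta[0], x2 @ beta[1]])
print(np.allclose(mu, mu_vec))   # True
```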



The results of this approach with default warmups, iterations, chains, and thinning:

4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean     sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta[1,1]   102.16    0.19   8.08   85.52   97.01  102.31  107.19  118.29  1749    1
beta[1,2]    17.88    0.25   7.97    2.86   12.61   17.58   22.88   34.69  1049    1
beta[1,3]    32.89    0.20   8.50   14.59   27.77   33.15   38.40   48.90  1727    1
beta[2,1]    42.87    0.07   2.88   37.06   41.03   42.84   44.65   48.64  1815    1
beta[2,2]    13.66    0.11   4.03    5.52   11.10   13.68   16.30   21.77  1278    1
beta[2,3]     2.71    0.14   4.28   -5.92    0.02    2.83    5.43   10.86   984    1
Sigma[1,1] 1317.66   21.91 687.11  558.48  865.51 1136.43 1570.10 3039.40   983    1
Sigma[1,2]  364.40    7.25 216.43  116.93  224.89  309.80  446.86  907.93   892    1
Sigma[2,1]  364.40    7.25 216.43  116.93  224.89  309.80  446.86  907.93   892    1
Sigma[2,2]  167.72    2.45  85.06   72.43  111.66  146.81  200.49  382.09  1202    1
lp__       -111.92    0.08   2.57 -118.14 -113.31 -111.45 -110.04 -108.25   976    1


Code for the LKJ prior approach:

data {
  int<lower=1> K;
  int<lower=1> J;
  int<lower=0> N;
  vector[J] x1[N];
  vector[J] x2[N];
  vector[K] y[N];
}
parameters {
  matrix[K,J] beta;
  cholesky_factor_corr[K] L_Omega;
  vector<lower=0>[K] L_sigma;
}
model {
  vector[K] mu[N];
  matrix[K,K] L_Sigma;

  L_Sigma <- diag_post_multiply(L_Omega, L_sigma);

  // priors
  to_vector(beta) ~ normal(0, 50);
  L_Omega ~ lkj_corr_cholesky(2);
  L_sigma ~ cauchy(0, 2.5);

  // likelihood
  for (n in 1:N) {
    // mu[n] <- beta * x[n];
    mu[n][1] <- beta[1] * x1[n];
    mu[n][2] <- beta[2] * x2[n];
  }
  y ~ multi_normal_cholesky(mu, L_Sigma);
}


Results of this approach:

4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta[1,1]     100.64    0.15 6.51   87.87   96.33  100.62  104.92  113.46  1786    1
beta[1,2]      16.13    0.17 6.52    3.79   11.92   15.97   20.18   29.53  1507    1
beta[1,3]      33.62    0.16 6.68   19.53   29.46   33.85   38.29   46.14  1819    1
beta[2,1]      42.42    0.06 2.37   37.90   40.80   42.44   43.96   47.19  1793    1
beta[2,2]      12.94    0.09 3.41    6.01   10.77   12.98   15.21   19.61  1444    1
beta[2,3]       3.49    0.10 3.65   -3.85    1.10    3.44    5.91   10.57  1406    1
L_Omega[1,1]    1.00    0.00 0.00    1.00    1.00    1.00    1.00    1.00  4000  NaN
L_Omega[1,2]    0.00    0.00 0.00    0.00    0.00    0.00    0.00    0.00  4000  NaN
L_Omega[2,1]    0.27    0.00 0.06    0.15    0.23    0.27    0.31    0.39  2190    1
L_Omega[2,2]    0.96    0.00 0.02    0.92    0.95    0.96    0.97    0.99  2057    1
L_sigma[1]     28.97    0.13 5.26   20.93   25.13   28.31   31.87   41.41  1705    1
L_sigma[2]      7.30    0.04 1.44    5.08    6.30    7.09    8.08   10.64  1380    1
lp__         -131.08    0.07 2.35 -136.51 -132.44 -130.72 -129.34 -127.55  1302    1

Thanks for all your help.

Raghu

Ben Goodrich

Aug 5, 2015, 10:21:17 AM
to Stan users mailing list
On Wednesday, August 5, 2015 at 9:53:58 AM UTC-4, Raghu Naik wrote:
The first approach goes well. The second, LKJ-priors approach throws up Rhat NaNs on some correlation coefficients. I wanted to find out how that can be fixed.

For a Cholesky factor, L, the L[i,j] elements for j > i are fixed at 0. Moreover, for a Cholesky factor of a correlation matrix, L[1,1] is fixed at 1. Therefore, no function of their "distributions" --- including Rhat --- is well-defined.
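This can be checked numerically. A minimal sketch in Python/NumPy (my choice of language for illustration — the thread's code is R and Stan, and the correlation values below are made up): for any 2x2 correlation matrix, the lower-triangular Cholesky factor has L[1,1] = 1 and L[1,2] = 0 regardless of the correlation, so those entries are constant across all posterior draws and their Rhat is 0/0 = NaN.

```python
import numpy as np

# For several hypothetical correlations rho, the Cholesky factor of the
# 2x2 correlation matrix always has a fixed first row: [1, 0].
for rho in (-0.9, 0.0, 0.27, 0.8):
    Omega = np.array([[1.0, rho],
                      [rho, 1.0]])
    L = np.linalg.cholesky(Omega)   # lower-triangular factor
    assert L[0, 0] == 1.0           # fixed by the unit diagonal of Omega
    assert L[0, 1] == 0.0           # fixed by lower-triangularity
    # only the second row varies with rho:
    assert np.isclose(L[1, 0], rho)
    assert np.isclose(L[1, 1], np.sqrt(1 - rho**2))
print("first row of L is fixed for every rho")
```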

However, this

L_Sigma <- diag_post_multiply(L_Omega, L_sigma);

is miscoded and should be

L_Sigma <- diag_pre_multiply(L_sigma, L_Omega);

so that

L_Sigma * transpose(L_Sigma) = diag(L_sigma) * L_Omega * transpose(L_Omega) * diag(L_sigma)

is a covariance matrix.
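The difference between the two orderings can be verified numerically. A sketch in Python/NumPy (an assumption on my part — the thread uses R and Stan, and the scale and correlation values below are invented): pre-multiplying by diag(sigma) scales the rows of L_Omega, so the reconstructed matrix has variances sigma[i]^2 on the diagonal; post-multiplying scales the columns, which generally does not.

```python
import numpy as np

# Hypothetical values: correlation 0.6, scales sigma = (3, 5).
Omega = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
sigma = np.array([3.0, 5.0])
L_Omega = np.linalg.cholesky(Omega)

# Correct: diag_pre_multiply(L_sigma, L_Omega) = diag(sigma) @ L_Omega
L_pre = np.diag(sigma) @ L_Omega
Sigma_pre = L_pre @ L_pre.T          # = diag(sigma) @ Omega @ diag(sigma)

# Miscoded: diag_post_multiply(L_Omega, L_sigma) = L_Omega @ diag(sigma)
L_post = L_Omega @ np.diag(sigma)
Sigma_post = L_post @ L_post.T       # wrong variances off the first row/column

target = np.diag(sigma) @ Omega @ np.diag(sigma)  # intended covariance
print(np.allclose(Sigma_pre, target))    # True
print(np.allclose(Sigma_post, target))   # False
```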

Ben

Raghu Naik

Aug 5, 2015, 11:36:13 AM
to Stan users mailing list
Thanks, Ben. I should have thought of that :-).

Bob Carpenter

Aug 5, 2015, 12:49:33 PM
to stan-...@googlegroups.com
That's my fault --- I messed it up in the manual. It'll get fixed
for the next release:

https://github.com/stan-dev/stan/issues/1526#issuecomment-128067607

- Bob

Raghu Naik

Aug 5, 2015, 1:09:58 PM
to Stan users mailing list
All the same, thanks for putting in the section on SURs.