Hi Bob: I wrote the Stan and R code below, and the estimates for lambda near 1 still aren't very good. But I could be doing something odd/wrong in the Stan code, so if you or anyone else wants to
take a look at it, it's appreciated. The intention of the Stan code for the three parameters (sigmasquared, beta, and lambda respectively) was for
A) sigmasquared to be restricted between zero and infinity and have the prior for sigmasquared be 1/sigmasquared.
B) beta to be unrestricted and have its prior be flat and improper.
C) lambda to be restricted between 0 and 1 and have its prior be (1-lambda)^2.
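To make sure I have the intent straight, here is the log prior (up to additive constants) that A)-C) above imply, written out in R; any mistake here would of course also be a mistake in my Stan code:

```r
# Log prior implied by A)-C) above, up to additive constants:
#   A) p(sigmasquared) proportional to 1/sigmasquared  ->  -log(sigmasquared)
#   B) p(beta) flat and improper                       ->  contributes 0
#   C) p(lambda) proportional to (1 - lambda)^2        ->  2 * log(1 - lambda)
log_prior <- function(beta, lambda, sigmasquared) {
  2 * log1p(-lambda) - log(sigmasquared)
}
```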
Based on your previous useful explanation, I think that's what the code at the bottom of this posting does. It's essentially an attempt at implementing the fix to the ridge-at-lambda-equals-1 problem described in adl_gibbs2.pdf. (I also tried the version where beta1mlambda is made into one variable and then divided by (1 - lambda) in the generated quantities block, but that didn't work either.)
I also have a couple of questions, because I have not dared to try to open the hood of Stan. I've written code in C and C++ (more C than C++), but I would need infrastructure training/tutelage to get involved with the internals of Stan, and my guess is that this would require more time than the Stan team has at the moment.
1) Does Stan construct U(x) = -log(p(x)) symbolically? For example, similarly to how Mathematica would construct an expression. I ask because the "fix" described in adl_gibbs2.pdf might be specific to the Gibbs sampling framework, and I don't know enough about the details of Stan to know whether the fix should work in Stan as well. Essentially, at least in the Gibbs sampling framework, giving lambda the prior above cancels the (1 - lambda) in the denominator of the posterior density of beta conditional on lambda, and one ends up with (1 - lambda) in the numerator, which is fine because the density of beta conditional on lambda = 1 becomes zero. But I'm not sure whether, in Stan, U(x) gets constructed symbolically using the prior for lambda. Michael mentioned earlier that the lambda parameter gets unconstrained, so that the unconstrained version lives on (-inf, +inf), but I don't see how that happens. If it does, doesn't the posterior density need to be adjusted for the change of variables?
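For reference, my current understanding of the unconstraining (pieced together from the manual, so please correct me if it's wrong) is that Stan maps a (0, 1)-constrained parameter through an inverse logit and adds the log Jacobian of that transform to the log density automatically, roughly like this in R (u = 0.5 is just a hypothetical unconstrained value):

```r
# My understanding of what Stan does with a <lower=0, upper=1> parameter:
# sample u on (-Inf, Inf), set lambda = inv_logit(u), and add the log
# Jacobian |d lambda / d u| = lambda * (1 - lambda) to the log density.
inv_logit <- function(u) 1 / (1 + exp(-u))
u <- 0.5
lambda <- inv_logit(u)
log_jacobian <- log(lambda) + log(1 - lambda)
```

If that is right, the Jacobian adjustment is handled internally, which is the part I wasn't sure about above.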
2) Does the Stan team ever offer tutoring/training sessions for Stan newbies who have a decent amount of programming experience but no experience dealing with something of the size and nature of Stan? I'd like to understand Stan in more depth, and I think the only way to accomplish that would be to get inside it.
Thanks for your great help. If the code below seems fine to you, then I may have hit another brick wall with the koyck model. But I'm used to that happening with Mr. Koyck.
# R CODE
#====================================================================
library("rstan");
set.seed(123)
T <- 200;
x <- rnorm(T,0,1);
#=========================================
# Results for the (beta, lambda) settings tried so far, with
# sigmasquared = 1.0 throughout. The number after the verdict is the
# estimate obtained:
#
#   beta   lambda   verdict : estimate
#   1.0    0.95     NOT OKAY : 0.3        <-- active below
#   2.6    0.95     NOT OKAY : 1.2
#   0.5    0.95     NOT OKAY : -0.1
#   1.0    0.90     NOT OKAY : 0.6
#   2.6    0.90     NOT GREAT : 1.8
#   0.5    0.90     NOT SO GOOD : 0.2
#   1.0    0.85     NOT BAD : 0.8
#   2.6    0.85     SO SO : 2.0
#   0.5    0.85     SO SO : 0.3
#   1.0    0.80     SO SO : 0.7
#   2.6    0.80     DECENT : 2.2
#   0.5    0.80     SO SO : 0.3
#   1.0    0.75     OKAY : 0.8
#   2.6    0.75     GOOD : 2.7
#   0.5    0.75     STILL SO SO : 0.3
#   1.0    0.70     OKAY : 0.8
#   2.6    0.70     GOOD : 2.4
#   0.5    0.70     GOOD : 0.4
#=========================================
beta <- 1.0;
lambda <- 0.95
sigmasquared <- 1.0
#===========================================
y <- rep(NA,T);
y[1] <- rnorm(1,beta * (1 - lambda) * x[1], sqrt(sigmasquared));
for (t in 2:T)
y[t] <- rnorm(1, beta * (1 - lambda) * x[t] + lambda * y[t-1], sqrt(sigmasquared));
fit <- stan(file = "koyck_paperb.stan", data = list(T = T, y = y, x = x), iter = 1000, chains = 4);
print(fit)
#=====================================================================================
# STAN CODE IMPLEMENTING THE RIDGE FIX DESCRIBED IN adl_gibbs2.pdf. THE PRIOR ON LAMBDA
# IS (1-LAMBDA)^2, THE PRIOR ON BETA IS IMPROPER AND FLAT, AND THE PRIOR ON
# SIGMASQUARED IS 1/SIGMASQUARED
#=====================================================================================
data {
  int<lower=0> T;                  // number of time points
  real y[T];                       // output at time t
  real x[T];                       // predictor for time t
}
parameters {
  real beta;                       // part of x coeff
  real<lower=0, upper=1> lambda;   // lagged coeff
  real<lower=0> sigmasquared;      // noise scale
}
model {
  y[1] ~ normal(beta * (1 - lambda) * x[1], sqrt(sigmasquared));
  for (t in 2:T)
    y[t] ~ normal(beta * (1 - lambda) * x[t] + lambda * y[t - 1], sqrt(sigmasquared));
  target += 2 * log1m(lambda);    // prior p(lambda) proportional to (1 - lambda)^2
  target += -log(sigmasquared);   // prior p(sigmasquared) proportional to 1/sigmasquared
}
#============================================================================================================