Hi Folks,
First, thank you all so much for rstan and for staffing this forum. My research would be approximately 5000 times harder without you, and I'm forever grateful.
Right now, I'm simulating Gaussian processes in rstan and trying to fit models for inference about the covariance functions. I find that this sometimes produces relatively poor inference on the covariance kernel's lengthscale parameter ("rho_sq" -- really an inverse squared lengthscale in this parameterization), and I'm wondering if I'm making some obvious mistake that I just don't have the experience to spot. I've attached my R file and the session log file, including the output of sessionInfo().
To generate data, I'm using the Stan simulation model from section 15.2 of the manual (page 160 in version 2.9.0). First I construct an "x" vector of 100 points on [0,1], then draw from a GP over that space, like this:
data {
  int<lower=1> N;
  real x[N];
}
transformed data {
  vector[N] mu;
  cov_matrix[N] Sigma;
  for (i in 1:N)
    mu[i] <- 0;
  for (i in 1:N)
    for (j in 1:N)
      Sigma[i,j] <- exp(-0.1 * pow(x[i] - x[j], 2))
                    + if_else(i==j, 3, 0.0);
}
parameters {
  vector[N] y;
}
model {
  y ~ multi_normal(mu, Sigma);
}
So you can see I've set eta to 1, rho_sq to 0.1, and sigma_sq to 3 (the 3 added on the diagonal). I then fit the following simplified version of the Stan model from section 15.3 of the manual to one of the (vector) draws from the GP above. This is model 1 in the R file:
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  vector[N] mu;
  for (i in 1:N) mu[i] <- 0;
}
parameters {
  real<lower=0> inv_rho_sq;
  real<lower=0> sigma_sq;
}
transformed parameters {
  real<lower=0> rho_sq;
  rho_sq <- inv(inv_rho_sq);
}
model {
  matrix[N,N] Sigma;
  // off-diagonal elements
  for (i in 1:(N-1)) {
    for (j in (i+1):N) {
      Sigma[i,j] <- exp(-rho_sq * pow(x[i] - x[j], 2));
      Sigma[j,i] <- Sigma[i,j];
    }
  }
  // diagonal elements
  for (k in 1:N)
    Sigma[k,k] <- 1 + sigma_sq; // + jitter
  inv_rho_sq ~ cauchy(0, 5);
  sigma_sq ~ cauchy(0, 5);
  y ~ multi_normal(mu, Sigma);
}
This model works pretty well; however, the inference on rho_sq is patchy relative to the inference on sigma_sq. The mean of the rho_sq posterior jumps around from run to run because of the heavy right tail, and it is quite sensitive to the dispersion of the priors (compare the output of models 1 and 2 in the attached file). I tried to get rid of the heavy tail using uniform priors, but that makes inference worse overall (see the output of model 3). Inference breaks down entirely if I do either of these two things: (1) use improper uniform priors to get something like an MLE (model 4), or (2) put the prior on rho_sq directly instead of on inv_rho_sq (model 5).
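For concreteness, model 5 differs from model 1 only in which parameter carries the prior: rho_sq is declared directly and inv_rho_sq is gone. Roughly (a sketch -- the Sigma construction is exactly as in model 1, elided here):

```stan
parameters {
  real<lower=0> rho_sq;     // sampled directly; no inv_rho_sq
  real<lower=0> sigma_sq;
}
model {
  matrix[N,N] Sigma;
  // ... Sigma filled in exactly as in model 1 ...
  rho_sq ~ cauchy(0, 5);    // half-Cauchy, via the <lower=0> constraint
  sigma_sq ~ cauchy(0, 5);
  y ~ multi_normal(mu, Sigma);
}
```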
You can see I've dispensed with inference on the eta parameter for now, fixing it at its correct value of 1. I did this because at first I thought the instances of poor inference were due to a weak identification issue involving eta, but the model performs no better without it. Fortunately, the inference on sigma_sq is always really good, which is the main goal of my project -- but I'd still like to know what's going on with rho_sq.
Does anyone know of a way to avoid the fat right tail on the rho_sq posterior without the pathologies of the uniform priors? Should I be using (half-)normals here instead -- and would that be generally helpful, or only for this particular rho_sq? And do we know why inference breaks down when I put a prior on rho_sq directly, or use an improper uniform prior?
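What I have in mind by normals is replacing the half-Cauchys with half-normals (half-normal automatically, given the existing real<lower=0> declarations), i.e. changing only the prior lines to something like this sketch (the scale of 1 is just a guess on my part):

```stan
inv_rho_sq ~ normal(0, 1);  // half-normal, since inv_rho_sq is <lower=0>
sigma_sq ~ normal(0, 1);    // likewise half-normal
```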
As I said, I'm attaching an R file that puts all of this together. I usually save the Stan models in separate files, as the manual suggests, but I figured it would be more convenient for you to have everything in one place. I'm also attaching the log file from running this R script on the Unix server I have access to at MIT.
I hope this is clear; let me know if I should provide further information or clarification. Thanks in advance for any input or suggestions you have!
Cheers
Rachael