Hi,
I'm comparing a Gamma regression model (log link) using pure rstan and rstanarm and I'm unsure if it's the same.
Generate data:
set.seed(12345)
N<-100 #sample size
dat <- data.frame(x1=runif(N,-1,1),x2=runif(N,-1,1))
# model matrix
X<-model.matrix(~x1*x2,dat)
K<-dim(X)[2] #number of regression params
betas<-runif(K,-2,2) # betas
# true betas :
# [1] 0.17718465 0.78518356 -0.75241019 0.02661803
phi <- 5
#simulate gamma data
mus <- exp(X%*%betas)
y < -rgamma(100,shape=mus**2/phi,rate=mus/phi)
# the model matrix contains the intercept and the x1*x2 interaction, we remove these for rstanarm
# and define it in the formula syntax
df <- data.frame(y=y_g, X[,-c(1,4)])
# rstanarm model
mm <- stan_glm(y~x1+x2+x1*x2, data=df, family = Gamma(link="log"), iter=5000)
#results
> print(mm, digits=2)
stan_glm(formula = y ~ x1 + x2 + x1 * x2, family = Gamma(link = "log"),
data = df, iter = 5000)
Estimates:
Median MAD_SD
(Intercept) -0.17 0.60
x1 3.03 1.06
x2 -2.86 0.94
x1:x2 3.37 1.94
shape 0.05 0.00
these estimates are quite far away from the truth.
When I fit this model in pure stan:
model_string <- "
data {
int N; //the number of observations
int K; //the number of columns in the model matrix
real y[N]; //the response
matrix[N,K] X; //the model matrix
}
parameters {
vector[K] betas; //the regression parameters
real<lower=0> phi; //the variance parameter
}
transformed parameters {
vector[N] mu; //the expected values (linear predictor)
vector[N] alpha; //shape parameter for the gamma distribution
vector[N] beta; //rate parameter for the gamma distribution
mu <- exp(X*betas); //using the log link
alpha <- mu .* mu / phi;
beta <- mu / phi;
}
model {
betas[1] ~ cauchy(0,10); //prior for the intercept following Gelman 2008
for(i in 2:K)
betas[i] ~ cauchy(0,2.5);//prior for the slopes following Gelman 2008
y ~ gamma(alpha,beta);
}
generated quantities {
vector[N] y_rep;
for(n in 1:N){
y_rep[n] <- gamma_rng(alpha[n],beta[n]); //posterior draws to get posterior predictive checks
}
}
"
m2 <- stan(model_code = model_string, data = list(N=N,K=K,X=X,y=y_g),pars=c("betas","phi","y_rep"),
chains = 3, cores=3)
here the estimates are better (except for the interaction term):
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
betas[1] 0.06 0.01 0.16 -0.22 -0.05 0.06 0.17 0.39 931 1
betas[2] 0.82 0.00 0.10 0.63 0.75 0.82 0.89 1.03 1390 1
betas[3] -0.68 0.00 0.10 -0.87 -0.74 -0.68 -0.61 -0.48 1165 1
betas[4] -0.45 0.01 0.18 -0.80 -0.57 -0.45 -0.33 -0.11 1249 1
What am I doing wrong?
Best,
Christoph