Please let me clarify this a bit more. I'm basically trying to understand how to do model comparisons with Stan 1.1.1.
I coded up a toy example: predicting linear data with 3 (nested) models.
1. Linear regression. Y = beta0 + beta1 * X
2. 2nd order polynomial regression. Y = beta0 + beta1 * X + beta2 * X^2
3. 3rd order polynomial regression. Y = beta0 + beta1 * X + beta2 * X^2 + beta3 * X^3
Here is the data for this toy example (Toy_data.pdf): y = 10 + 0.1*x + noise ~ normal(mean = 0, sd = 1), with N = 100 data points.
I used 1000 burn-in samples and 9000 samples after burn-in, with just 1 chain. I set the seed to a fixed number (1) for replication.
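For reference, the toy data can be regenerated with a short sketch like this (in Python rather than the attached R code; the range of x is my assumption, since it isn't stated above):

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for replication, as in the post
N = 100
x = rng.uniform(0, 10, size=N)  # assumed range for x; not given in the post
y = 10 + 0.1 * x + rng.normal(0, 1, size=N)  # y = 10 + 0.1*x + normal(0, 1) noise
```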
1. Linear regression. Y = intercept + slope*X. Here is an actual Stan code. The R code is also attached (toyExample_linear.R)
model {
  # priors
  beta0 ~ normal(0, 1.0E12);  # normal prior w/ mean = 0, SD = 1.0E12
  beta1 ~ normal(0, 1.0E12);  # normal prior w/ mean = 0, SD = 1.0E12
  sigma ~ uniform(0, 10);     # uniform prior for sigma
  # likelihood: Y_n = beta0 + beta1 * X_n + e_n, where e_n ~ normal(0, sigma)
  for (i in 1:N) {
    y[i] ~ normal(beta0 + beta1 * x[i], sigma);
  }
}
Here is its output; the posterior mean of lp__ is -40.4.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta0 10.1 0 0.2 9.8 10.0 10.1 10.2 10.5 2443 1
beta1 0.1 0 0.0 0.1 0.1 0.1 0.1 0.1 2520 1
sigma 0.9 0 0.1 0.8 0.9 0.9 1.0 1.1 2841 1
lp__ -40.4 0 1.3 -43.6 -41.0 -40.0 -39.5 -39.0 2417 1
2. This is the 2nd model: Y = intercept + slope*X + beta2*X^2 (toyExample_2poly.R).
model {
  # priors
  beta0 ~ normal(0, 1.0E12);
  beta1 ~ normal(0, 1.0E12);
  beta2 ~ normal(0, 1.0E12);
  sigma ~ uniform(0, 10);
  # likelihood: Y_n = beta0 + beta1 * X_n + beta2 * X_n^2 + e_n, where e_n ~ normal(0, sigma)
  for (i in 1:N) {
    y[i] ~ normal(beta0 + beta1 * x[i] + beta2 * pow(x[i], 2), sigma);
  }
}
Here is its output; the posterior mean of lp__ is -40.9.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta0 10.1 0 0.3 9.5 9.9 10.1 10.3 10.6 876 1
beta1 0.1 0 0.0 0.1 0.1 0.1 0.1 0.1 723 1
beta2 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 735 1
sigma 0.9 0 0.1 0.8 0.9 0.9 1.0 1.1 378 1
lp__ -40.9 0 1.4 -44.3 -41.7 -40.6 -39.8 -39.1 1063 1
3. This is the 3rd model: Y = intercept + slope*X + beta2*X^2 + beta3*X^3 (toyExample_3poly.R).
model {
  # priors
  beta0 ~ normal(0, 1.0E12);
  beta1 ~ normal(0, 1.0E12);
  beta2 ~ normal(0, 1.0E12);
  beta3 ~ normal(0, 1.0E12);
  sigma ~ uniform(0, 10);
  for (i in 1:N) {
    y[i] ~ normal(beta0 + beta1 * x[i] + beta2 * pow(x[i], 2) + beta3 * pow(x[i], 3), sigma);
  }
}
Here is its output; the posterior mean of lp__ is -41.2.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
beta0 10.1 0.0 0.4 9.4 9.8 10.2 10.4 10.9 86 1
beta1 0.1 0.0 0.0 0.0 0.1 0.1 0.1 0.2 90 1
beta2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 91 1
beta3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 92 1
sigma 0.9 0.0 0.1 0.8 0.9 0.9 1.0 1.1 26 1
lp__ -41.2 0.1 1.5 -45.0 -42.0 -40.9 -40.1 -39.2 393 1
So,
1. Is it correct that the lp__ value "penalizes" the complexity of the functions (and if so, by how much?), but not necessarily the number of parameters?
2. Do you suggest using AIC (rather than BIC or another adjustment) to account for the number of parameters? I'm aware of the various assumptions underlying AIC vs. BIC.
3. I noticed that n_eff is smaller for the more complex models (presumably because of correlations between parameters) and wonder whether it can be related to lp__.
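To make question 2 concrete, here is the back-of-the-envelope comparison I have in mind, sketched in Python. One big assumption is flagged in the comments: I plug in the posterior-mean lp__ as a stand-in for the maximized log-likelihood, even though lp__ also includes the log-priors and is only defined up to a constant, so a careful comparison would recompute the log-likelihood itself.

```python
import math

# Posterior-mean lp__ from the three fits above, used as a rough stand-in
# for the maximized log-likelihood (an assumption: lp__ also includes the
# log-priors and is defined only up to a constant).
log_lik = {"linear": -40.4, "poly2": -40.9, "poly3": -41.2}
n_params = {"linear": 3, "poly2": 4, "poly3": 5}  # betas + sigma
N = 100                                           # number of data points

aic = {m: 2 * n_params[m] - 2 * log_lik[m] for m in log_lik}          # AIC = 2k - 2 log L
bic = {m: n_params[m] * math.log(N) - 2 * log_lik[m] for m in log_lik} # BIC = k ln N - 2 log L
for m in log_lik:
    print(f"{m}: AIC = {aic[m]:.1f}, BIC = {bic[m]:.1f}")
```

Under this rough calculation both criteria prefer the linear model, with BIC penalizing the extra parameters more heavily than AIC, which matches the true data-generating process.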
Stan is (really) awesome, but I would also like to use it to compare various models properly. I wish I could read Chapter 7 of the new BDA now, but any other useful papers or chapters related to this topic would be appreciated. Thanks a lot in advance for your help.
I'm having trouble uploading small files, so let me upload a zip file containing all the code and the figure.