Here is a comparison of lmer vs Stan output on a mildly complicated dataset from a psychology experiment (Kliegl et al. 2011). The data are attached.
Data and paper, along with lots of challenging datasets for Stan, are available from:
I should say that datasets from psychology and psycholinguistics can be much more complicated than this example, so this was only a modest test of Stan.
The basic result is that I was able to recover in Stan the parameter estimates (fixed effects) that were primarily of interest, as compared with the lmer output. The SDs of the variance components all come out pretty much the same in Stan and lmer. The correlations estimated in Stan are much smaller than in lmer, but this seems to be normal: Bayesian models seem to be more conservative when it comes to estimating correlations between random effects. (I know Andrew does not like the terms "random effects" and "fixed effects", but they are convenient.)
Traceplots are attached. They look generally fine to me.
One very important fact about lmer vs Stan: lmer took 23 seconds to return an answer, whereas Stan took 18,814 seconds (about 5.2 hours), running 500 iterations and 2 chains.
One caveat is that I still have to figure out how to speed up Stan so that we get the best performance out of it that is possible. But could the Stan developers provide some fully worked examples for a mildly challenging dataset like this? What's the absolute best that Stan can do with professional intervention, compared to lmer? (One reparameterization idea is sketched after the model code below.)
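One immediate wall-clock saving, independent of the model itself, is to run the two chains in parallel. A sketch using rstan's sflist2stanfit() helper, with the model string and data list as defined further below (the seed, core count, and short compile-only run are placeholders):

library(parallel)
## short run just to compile; then reuse the compiled model per chain:
fit0 <- stan(model_code = klieglvaryingintslopes_codefast,
             data = dat2, iter = 10, chains = 1)
sflist <- mclapply(1:2, function(i)
            stan(fit = fit0, data = dat2, iter = 500,
                 chains = 1, chain_id = i, seed = 12345),
            mc.cores = 2)  # fork-based, so not on Windows
fit2 <- sflist2stanfit(sflist)

At best this halves the wall-clock time; the per-chain cost is untouched.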
If the present analysis can be sped up a lot more, I have another dataset in the Stan queue, with 150 subjects, three contrasts, and an additional random factor (items), on which I will test Stan against lmer.
Details:
key:
beta[1] = Intercept
beta[2] = c1
beta[3] = c2
beta[4] = c3
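In equation form (this just restates the Stan model listed at the end of this message):

  rt[n] ~ Normal(mu[n], sigma_e)
  mu[n] = (beta[1] + u[id[n],1]) + (beta[2] + u[id[n],2]) * c1[n]
          + (beta[3] + u[id[n],3]) * c2[n] + (beta[4] + u[id[n],4]) * c3[n]
  u[i]  ~ MultiNormal(0, Sigma),   i = 1,...,61 subjects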
1. Fixed effects, Stan vs lmer:
Stan:
Predictors:
          mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1]  381.1     3.5 7.0 367.5 376.2 381.1 385.9 394.1     4  1.4
beta[2]   31.6     0.8 3.5  23.9  29.7  32.0  34.1  37.9    17  1.2
beta[3]   14.0     0.1 2.2   9.9  12.5  13.8  15.4  18.3   292  1.0
beta[4]    2.9     0.1 2.0  -1.3   1.5   3.0   4.2   7.0   251  1.0

lmer:
         Estimate Std. Error t value
beta[1]    389.73       7.15    54.5
beta[2]     33.78       3.31    10.2
beta[3]     13.98       2.32     6.0
beta[4]      2.75       2.22     1.2
2. Random effects, Stan vs lmer:
## I created this by hand to match the lmer output:
Stan:          Var    sd  Corr
# id (Int)  3152.9  56.2
# c1         531.6  23.1   0.51
# c2          89.3   9.5  -0.02  0.11
# c3          77.5   8.8  -0.09 -0.47  0.03
# Residual  4886.0  69.9

lmer:          Var    sd  Corr
# id (Int)  3098.3 55.66
# c1         550.8 23.47   0.60
# c2         121.0 11.00  -0.13 -0.01
# c3          93.2  9.65  -0.25 -0.85  0.38
# Residual  4877.1 69.84
## calculations of correlations from Stan output
## (the sds are the square roots of the diagonal of Sigma):
r12 = 654.3 / (56.151 * 23.056) =  0.5054
r13 =  -8.4 / (56.151 * 9.4499) = -0.0158
r14 = -46.5 / (56.151 * 8.8034) = -0.0941
r23 =  24.7 / (23.056 * 9.4499) =  0.1134
r24 = -95.7 / (23.056 * 8.8034) = -0.4715
r34 =   2.4 / (9.4499 * 8.8034) =  0.0288
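The same numbers come out in one step from R's cov2cor() applied to the matrix of posterior means (a sketch; Sigma_hat is just the Sigma table below retyped):

Sigma_hat <- matrix(c(3152.9,  654.3,  -8.4, -46.5,
                       654.3,  531.6,  24.7, -95.7,
                        -8.4,   24.7,  89.3,   2.4,
                       -46.5,  -95.7,   2.4,  77.5), nrow = 4)
round(cov2cor(Sigma_hat), 3)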
## original Stan output (columns as in the fixed-effects table above):
Sigma[1,1] 3152.9 32.4 589.5 2197.9 2729.1 3090.3 3522.1 4432.8 331 1.0
Sigma[1,2] 654.3 8.5 180.7 367.8 522.5 628.3 767.5 1022.8 455 1.0
Sigma[1,3] -8.4 6.4 91.8 -203.6 -58.8 -4.5 49.7 161.7 207 1.0
Sigma[1,4] -46.5 7.4 91.1 -235.8 -101.9 -41.4 11.0 118.9 151 1.0
Sigma[2,1] 654.3 8.5 180.7 367.8 522.5 628.3 767.5 1022.8 455 1.0
Sigma[2,2] 531.6 6.4 109.0 357.4 455.4 517.5 599.2 780.8 289 1.0
Sigma[2,3] 24.7 2.9 37.3 -50.4 0.6 26.0 49.8 94.7 162 1.0
Sigma[2,4] -95.7 4.2 43.6 -186.6 -121.0 -91.4 -68.0 -16.6 109 1.0
Sigma[3,1] -8.4 6.4 91.8 -203.6 -58.8 -4.5 49.7 161.7 207 1.0
Sigma[3,2] 24.7 2.9 37.3 -50.4 0.6 26.0 49.8 94.7 162 1.0
Sigma[3,3] 89.3 11.6 53.1 11.5 50.0 85.7 120.6 207.4 21 1.1
Sigma[3,4] 2.4 2.8 21.2 -31.6 -11.0 -0.6 13.4 49.1 56 1.0
Sigma[4,1] -46.5 7.4 91.1 -235.8 -101.9 -41.4 11.0 118.9 151 1.0
Sigma[4,2] -95.7 4.2 43.6 -186.6 -121.0 -91.4 -68.0 -16.6 109 1.0
Sigma[4,3] 2.4 2.8 21.2 -31.6 -11.0 -0.6 13.4 49.1 56 1.0
Sigma[4,4] 77.5 9.7 44.5 11.2 45.5 68.6 102.4 175.3 21 1.1
#LMER fit: time taken: 23 seconds
# Groups Name Variance Std.Dev. Corr
# id (Intercept) 3098.3 55.66
# c1 550.8 23.47 0.603
# c2 121.0 11.00 -0.129 -0.014
# c3 93.2 9.65 -0.247 -0.846 0.376
# Residual 4877.1 69.84
#Number of obs: 28710, groups: id, 61
#
#Fixed effects:
# Estimate Std. Error t value
#(Intercept) 389.73 7.15 54.5
#c1 33.78 3.31 10.2
#c2 13.98 2.32 6.0
#c3 2.75 2.22 1.2
##Stan: time taken: 18,814 seconds (5.2261 hours)
> fit2 <- stan(model_code = klieglvaryingintslopes_codefast,
+ data = dat2,
+ iter = 500, chains = 2)
(CHAIN 1).
Iteration: 500 / 500 [100%] (Sampling)
Elapsed Time: 8255.26 seconds (Warm-up)
              1650.07 seconds (Sampling)
              9905.33 seconds (Total)
(CHAIN 2).
Iteration: 500 / 500 [100%] (Sampling)
Elapsed Time: 8062.87 seconds (Warm-up)
              845.716 seconds (Sampling)
              8908.59 seconds (Total)
> print(fit2)
Inference for Stan model: klieglvaryingintslopes_codefast.
2 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=500.
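The traceplots mentioned at the top can be reproduced with rstan's traceplot(); something like the following (the pars and inc_warmup arguments are assumed to be available in this rstan version):

traceplot(fit2, pars = "beta", inc_warmup = FALSE)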
The model code:
klieglvaryingintslopes_codefast <- 'data {
  int<lower=1> N;
  real rt[N];                   //outcome
  real c1[N];                   //predictor
  real c2[N];                   //predictor
  real c3[N];                   //predictor
  int<lower=1> I;               //number of subjects
  int<lower=1, upper=I> id[N];  //subject id
  vector[4] mu_prior;           //vector of zeros passed in from R
}
parameters {
  vector[4] beta;               // intercept and slopes
  vector[4] u[I];               // random intercept and slopes by subject
  real<lower=0> sigma_e;        // residual sd
  vector<lower=0>[4] sigma_u;   // subj sds
  corr_matrix[4] Omega;         // correlation matrix for random intercepts and slopes
}
transformed parameters {
  matrix[4,4] D;
  D <- diag_matrix(sigma_u);
}
model {
  matrix[4,4] L;
  matrix[4,4] DL;
  real mu[N];                   // mu for likelihood
  //priors:
  beta ~ normal(0,50);
  sigma_e ~ cauchy(0,2);
  sigma_u ~ cauchy(0,2);
  Omega ~ lkj_corr(4.0);
  L <- cholesky_decompose(Omega);
  DL <- D * L;
  for (i in 1:I)                // loop for subj random effects
    u[i] ~ multi_normal_cholesky(mu_prior, DL);
  for (n in 1:N) {
    mu[n] <- beta[1] + beta[2]*c1[n] + beta[3]*c2[n] + beta[4]*c3[n]
             + u[id[n], 1] + u[id[n], 2]*c1[n]
             + u[id[n], 3]*c2[n] + u[id[n], 4]*c3[n];
  }
  rt ~ normal(mu,sigma_e);      // likelihood
}
generated quantities {
  cov_matrix[4] Sigma;
  Sigma <- D * Omega * D;
}
'
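As flagged above, here is the kind of reparameterization I would try first for speed. This is a sketch only, not run on these data, and it assumes a Stan version that has the Cholesky-based types and functions used here (cholesky_factor_corr, lkj_corr_cholesky, diag_pre_multiply, to_vector, multiply_lower_tri_self_transpose). The idea is the standard non-centered parameterization: sample independent standard normals z_u and construct u from them, so cholesky_decompose() is never called inside the model block and the hierarchical geometry is easier for the sampler:

klieglvaryingintslopes_codefaster <- 'data {
  int<lower=1> N;
  real rt[N];                      //outcome
  real c1[N];                      //predictor
  real c2[N];                      //predictor
  real c3[N];                      //predictor
  int<lower=1> I;                  //number of subjects
  int<lower=1, upper=I> id[N];     //subject id
}
parameters {
  vector[4] beta;                  // intercept and slopes
  matrix[4,I] z_u;                 // standardized random effects
  real<lower=0> sigma_e;           // residual sd
  vector<lower=0>[4] sigma_u;      // subj sds
  cholesky_factor_corr[4] L_Omega; // Cholesky factor of Omega
}
transformed parameters {
  matrix[4,I] u;
  // non-centered parameterization: column i of u is multi_normal(0, Sigma)
  u <- diag_pre_multiply(sigma_u, L_Omega) * z_u;
}
model {
  real mu[N];
  beta ~ normal(0,50);
  sigma_e ~ cauchy(0,2);
  sigma_u ~ cauchy(0,2);
  L_Omega ~ lkj_corr_cholesky(4.0);
  to_vector(z_u) ~ normal(0,1);
  for (n in 1:N) {
    mu[n] <- beta[1] + beta[2]*c1[n] + beta[3]*c2[n] + beta[4]*c3[n]
             + u[1,id[n]] + u[2,id[n]]*c1[n]
             + u[3,id[n]]*c2[n] + u[4,id[n]]*c3[n];
  }
  rt ~ normal(mu,sigma_e);
}
generated quantities {
  cov_matrix[4] Sigma;
  // Sigma = D * Omega * D, recovered from its Cholesky factor:
  Sigma <- multiply_lower_tri_self_transpose(diag_pre_multiply(sigma_u, L_Omega));
}
'

Note that mu_prior is no longer needed, so it is dropped from the data block; the correlations can still be read off Sigma (or cov2cor() of its posterior means) as above.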
## data loading and lmer call:
library(lme4)
load("KWDYZ_test.rda")
summary(m2 <- lmer(rt ~ 1 + c1 + c2 + c3 + (1 + c1 + c2 + c3 | id), data = dat))
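For reference, the data list dat2 passed to stan() above can be assembled from the same data frame (a sketch; I am assuming the relevant columns of dat carry the same names as in the Stan data block):

dat2 <- list(N        = nrow(dat),
             rt       = dat$rt,
             c1       = dat$c1,
             c2       = dat$c2,
             c3       = dat$c3,
             I        = length(unique(dat$id)),
             id       = as.integer(factor(dat$id)),
             mu_prior = rep(0, 4))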
--
Shravan Vasishth
Professor of Psycholinguistics and Neurolinguistics