Error terms - lme4 vs rstanarm


Paul Scutti

Oct 16, 2016, 12:59:48 PM
to Stan users mailing list
Hi Stan users,

I am running a basic random-intercept model in both lme4 and rstanarm. While I am getting identical results for the fixed effects, the same cannot be said for the random effects. The standard deviation of the residual is fine (0.74 for both), but for the intercept I get 0.31 in lme4 and more than double that (0.63) in rstanarm. Any thoughts?

Below you will find the summaries and sessionInfo(); I am also attaching the workspace should you need further information.

Cheers,
Paul


mod_lmer_1
Linear mixed model fit by REML ['lmerMod']
Formula: DV1 ~ VarF1 + VarF2 + VarF3 + VarF4 + VarF5 + VarF6 + VarF7 +  
    VarF8 + VarF9 + VarT1 + VarT2 + VarT3 + (1 | Level1)
   Data: mydata
REML criterion at convergence: 7546.922
Random effects:
 Groups   Name        Std.Dev.
 Level1   (Intercept) 0.3072  
 Residual             0.7386  
Number of obs: 3346, groups:  Level1, 2
Fixed Effects:
(Intercept)        VarF1        VarF2        VarF3        VarF4        VarF5        VarF6  
  -0.031232     0.036068     0.100994     0.050455     0.029563    -0.038123    -0.029646  
      VarF7        VarF8        VarF9        VarT1        VarT2        VarT3  
  -0.006044     0.003626    -0.024349     0.243189     0.271287     0.070910 


mod_rstanarm_1
stan_glmer(formula = DV1 ~ VarF1 + VarF2 + VarF3 + VarF4 + VarF5 + 
    VarF6 + VarF7 + VarF8 + VarF9 + VarT1 + VarT2 + VarT3 + (1 | 
    Level1), data = mydata, cores = 2, prior = priorC, prior_intercept = priorI, 
    prior_covariance = decov(regularization = 2), adapt_delta = 0.9999)

Estimates:
            Median MAD_SD
(Intercept) 0.0    0.3   
VarF1       0.0    0.0   
VarF2       0.1    0.0   
VarF3       0.1    0.0   
VarF4       0.0    0.0   
VarF5       0.0    0.0   
VarF6       0.0    0.0   
VarF7       0.0    0.0   
VarF8       0.0    0.0   
VarF9       0.0    0.0   
VarT1       0.2    0.0   
VarT2       0.3    0.0   
VarT3       0.1    0.0   
sigma       0.7    0.0   

Error terms:
 Groups   Name        Std.Dev.
 Level1   (Intercept) 0.62    
 Residual             0.74    
Num. levels: Level1 2 

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.0    0.0   

Observations: 3346  Number of unconstrained parameters: 18  



sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rstan_2.12.1       StanHeaders_2.12.0 ggplot2_2.1.0      arm_1.9-1          lme4_1.1-12       
 [6] Matrix_1.2-6       MASS_7.3-45        rstanarm_2.12.1    Rcpp_0.12.7        shiny_0.14        

loaded via a namespace (and not attached):
 [1] gtools_3.5.0       zoo_1.7-13         shinyjs_0.7        reshape2_1.4.1     splines_3.3.1     
 [6] lattice_0.20-33    colorspace_1.2-6   miniUI_0.1.1       htmltools_0.3.5    stats4_3.3.1      
[11] loo_0.1.6          base64enc_0.1-3    nloptr_1.0.4       matrixStats_0.50.2 plyr_1.8.4        
[16] stringr_1.1.0      munsell_0.4.3      gtable_0.2.0       htmlwidgets_0.7    threejs_0.2.2     
[21] codetools_0.2-14   coda_0.18-1        colourpicker_0.2   inline_0.3.14      httpuv_1.3.3      
[26] parallel_3.3.1     markdown_0.7.7     xts_0.9-7          xtable_1.8-2       scales_0.4.0      
[31] DT_0.2             shinystan_2.2.1    jsonlite_1.1       abind_1.4-5        mime_0.5          
[36] gridExtra_2.2.1    digest_0.6.10      stringi_1.1.1      grid_3.3.1         tools_3.3.1       
[41] magrittr_1.5       shinythemes_1.1    rsconnect_0.4.3    dygraphs_1.1.1.2   rstudioapi_0.6    
[46] minqa_1.2.4        R6_2.1.3           nlme_3.1-128      
Attachment: ErrorTerms_161016.RData

Ben Goodrich

Oct 16, 2016, 1:15:27 PM
to Stan users mailing list
On Sunday, October 16, 2016 at 12:59:48 PM UTC-4, Paul Scutti wrote:
I am running a basic random intercept model both in lme4 and rstanarm. While I am getting identical results for the fixed effects, the same cannot be said for the random effects. The standard deviation for the residual is ok (0.74 for both), but for the intercept I get 0.31 in lme4 and more than double (0.63) in rstanarm. Any thoughts? 

The prior on the standard deviation; only having two groups with which to estimate that standard deviation; mode of the likelihood vs. mean of the posterior; the irrelevance of comparing point estimates; etc.

Ben

P.S. Setting regularization = 2 does not do anything when there is only one hyperparameter to estimate.

Jonah Gabry

Oct 16, 2016, 6:10:32 PM
to Stan users mailing list
Hi Paul, 

The estimates from rstanarm are not intended to always mirror those from lme4. lme4 does a type of maximum likelihood estimation, giving you point estimates at the mode of the likelihood. rstanarm does full Bayesian inference: we estimate the full joint posterior distribution of all parameters, which is a compromise between the likelihood and the prior, and when we summarize the posterior we typically look at expectations, which can give you something quite different from the mode of the likelihood. So even before estimating anything we shouldn't necessarily expect to get the same answers, especially if those answers are point estimates. For some model+data combinations there won't be much difference, and for others there might be substantial differences. In most cases, when there are important differences I would trust the estimates from Stan over those from lme4 (and not only for the estimates of the parameters themselves but also for prediction), assuming of course that there were no red flags when running the MCMC.
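To make the mode-vs-mean distinction concrete, here is a small sketch with hypothetical numbers (not the data from this thread): the posterior of a group-level SD when only two group effects are observed. The likelihood mode, roughly what an lme4-style point estimate targets, can sit well below the posterior mean that full Bayes reports.

```python
import numpy as np

# Toy sketch (hypothetical numbers, not this thread's data): posterior of
# a group-level SD tau when only J = 2 group effects are observed, each
# with a known, small standard error.
group_effects = np.array([-0.3, 0.3])   # hypothetical group intercepts
se = 0.05                               # hypothetical standard error

tau = np.linspace(1e-3, 5.0, 5000)      # grid over the group-level SD
dt = tau[1] - tau[0]
v = tau[:, None] ** 2 + se ** 2         # marginal variance per grid point
# log-likelihood: each group effect ~ Normal(0, tau^2 + se^2)
loglik = (-0.5 * np.log(2 * np.pi * v)
          - group_effects ** 2 / (2 * v)).sum(axis=1)
log_prior = -0.5 * tau ** 2             # half-Normal(0, 1) prior on tau
log_post = loglik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * dt                 # normalize on the grid

mode = tau[np.argmax(loglik)]           # likelihood mode
mean = (tau * post).sum() * dt          # posterior mean
print(f"likelihood mode: {mode:.2f}  posterior mean: {mean:.2f}")
```

With these toy numbers the likelihood mode lands near 0.30 while the posterior mean is noticeably larger, the same qualitative gap as between the two summaries above, without either estimate being "wrong".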

If you do want to reproduce the lme4 standard deviation, this should come closer:

stan_glmer(
  DV1 ~ VarF1 + VarF2 + VarF3 + VarF4 + VarF5 +
    VarF6 + VarF7 + VarF8 + VarF9 + VarT1 + 
    VarT2 + VarT3 + (1 | Level1),
  data = mydata,
  QR = TRUE,
  prior = normal(0,1),
  prior_covariance = decov(scale = 0.25),
  prior_intercept = priorI,
  adapt_delta = 0.99,
  cores = 2
)

although I'm not sure whether or not it's the right thing to do in this particular case. Often with only a few group levels it's good to be skeptical of the lme4 estimate of the standard deviation of the "random effects". The uncertainty estimates are probably too small. 
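To see how little information two group levels carry, here is a quick hypothetical simulation (again, not this dataset): even if the two group effects could be observed without any noise, their sample SD would still be a very shaky estimate of the true between-group SD.

```python
import numpy as np

# Hypothetical simulation: how variable is the sample SD of just two
# group effects, even when each effect is observed exactly? Draw pairs
# of effects from Normal(0, tau = 0.3) and look at the spread of the
# resulting SD estimates.
rng = np.random.default_rng(1)
true_tau = 0.3
draws = rng.normal(0.0, true_tau, size=(10_000, 2))
est = draws.std(axis=1, ddof=1)         # sample SD of each pair
q10, q50, q90 = np.quantile(est, [0.1, 0.5, 0.9])
print(f"10%: {q10:.2f}  median: {q50:.2f}  90%: {q90:.2f}")
```

The middle 80% of these estimates spans roughly 0.04 to 0.5 around a true value of 0.3, so a single point estimate of this SD from two levels says very little on its own.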

Hope that helps, 

Jonah

P.S. See this recent thread for more on why Ben's P.S. is true.

Paul Scutti

Oct 17, 2016, 11:45:38 AM
to Stan users mailing list
Hi Jonah and Ben,

Thank you for taking the time to clarify some important points: it was really helpful. I re-ran the model with prior_covariance = decov(scale = 0.25) and the standard deviation of the random intercept is now in line with what I was expecting.

In my random-intercept model, that regularization = 2 was a left-over from a previous model with random slopes, and I was not expecting it to play any role in the new one. So, Jonah, thank you for pointing out that prior_covariance can indeed be applied in the 1x1 covariance matrix scenario.


Best regards,
Paul

Bruno Nicenboim

Oct 17, 2016, 1:52:50 PM
to Stan users mailing list
Hi Paul,

You might be interested in having a look at this paper.
They show that "even under convergence, overparameterization may lead to uninterpretable models" in lme4.

Bests,
Bruno

Andrew Gelman

Oct 17, 2016, 4:21:36 PM
to stan-...@googlegroups.com
Yes, I've become more and more convinced of the effectiveness of informative priors in these situations.
A


Paul Scutti

Oct 19, 2016, 9:22:34 AM
to Stan users mailing list
Hi Bruno,

Thank you for the link. I read the paper today, installed the RePsychLing package to test it out, and promptly got some interesting results out of my analysis.
I work in market research, where multilevel models are basically unheard of, but, given the nature of our data, I actually believe we should use them all the time. The paper you referred me to, and this mailing list in general, go a long way toward bridging the gap between research and everyday application.


Regards,
Paul

Ben Goodrich

Oct 19, 2016, 11:15:54 AM
to Stan users mailing list
On Wednesday, October 19, 2016 at 9:22:34 AM UTC-4, Paul Scutti wrote:
Thank you for the link. I read the paper today, installed the RePsychLing package to test it out, and promptly got some interesting results out of my analysis.
I work in market research where multilevel models are basically unheard of but, given the nature of our data, I actually believe we should use them all the time. The paper you referred me to and this mailing list in general go a long way in bridging the gap between research and everyday application.

The RePsychLing package was started on GitHub before rstanarm was on CRAN. I don't think RePsychLing does anything that rstanarm can't do and rstanarm can do a lot of additional things and is more convenient. This (pre-Stan) Bayesian book

http://www.perossi.org/home/bsm-1

has a lot of marketing applications and data, although you would have to use the Stan language to implement most of those case studies.

Ben

Paul Scutti

Oct 19, 2016, 11:28:24 AM
to Stan users mailing list
Hi Ben,

Yes, I found RePsychLing useful in the context of lme4 only. Generally speaking, though, rstanarm seems to me more intuitive and much better at explaining your data and your models.

Thank you for the link to Rossi's website.


Best,
Paul

Bob Carpenter

Oct 19, 2016, 11:31:26 AM
to stan-...@googlegroups.com
It'd be great if someone wanted to translate all of
the models in that book to Stan; we already have an
issue for it:

https://github.com/stan-dev/example-models/issues/65

We're happy to help with feedback on Stan code. Or
we're happy to take case studies if someone just wants
to do one of the models in more depth.

- Bob