Stan (MCMC) with a vague prior gives different results than MLE or "bayesglm"?


Burak Gunhan

unread,
Jan 16, 2017, 8:52:07 AM1/16/17
to Stan users mailing list
Dear Stan experts,

I am trying to fit a GLM (binomial likelihood, logit link) in Stan when there is so-called "sparse data bias" as defined by Greenland et al. (2016) (http://www.bmj.com/content/352/bmj.i1981), or a separation problem as discussed in Gelman et al. (2008) (http://www.stat.columbia.edu/~gelman/research/published/priors11.pdf).

The dataset comes from a randomized controlled trial (the original source is: https://www.ncbi.nlm.nih.gov/pubmed/9920948):

# dataset
# responders sampleSize    treatment
#          1         30      control
#         23         59 experimental

The observed odds ratio is 23 * 29 / 36 = 18.5, so the log OR is 2.92.
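For reference, this arithmetic can be checked directly from the 2x2 table (a base-R sketch using the counts given above):

```r
# 2x2 table from the trial: responders / non-responders by arm
a <- 23; b <- 59 - 23   # experimental: responders, non-responders
c <- 1;  d <- 30 - 1    # control: responders, non-responders

or  <- (a * d) / (b * c)  # odds ratio: 23 * 29 / 36
lor <- log(or)            # log odds ratio

round(or, 1)   # 18.5
round(lor, 2)  # 2.92
```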

First, I fitted the model by MLE (via the glm function) and got an estimate of 2.92 with standard error 1.05.
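That MLE fit can be reproduced in base R with aggregated binomial data (a minimal sketch; the attached MCMC_vs_MLE.R is the authoritative code):

```r
# MLE fit with base R's glm(); aggregated binomial response as cbind(successes, failures)
responders <- c(1, 23)
sampleSize <- c(30, 59)
treatment  <- factor(c("control", "experimental"))  # control is the reference level

fit <- glm(cbind(responders, sampleSize - responders) ~ treatment,
           family = binomial(link = "logit"))

# treatment coefficient is the log odds ratio
summary(fit)$coefficients["treatmentexperimental", c("Estimate", "Std. Error")]
# Estimate ~ 2.92, Std. Error ~ 1.05
```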

Second, I used the "bayesglm" function from the "arm" package with a normal prior with mean zero and variance 10000 (i.e. standard deviation 100); the estimate is 2.91 (sd 1.05).

Third, I used the "stan_glm" function from the "rstanarm" package with the same normal prior with mean zero and variance 10000, BUT then the posterior mean estimate is 3.54 (sd 1.37). As you can see, it is substantially different from the previous values. It can't be an "rstanarm" issue, since fitting the model directly in Stan gives the same result, and JAGS also gives a result very close to 3.54 (sd 1.37). So maybe it is an MCMC issue.

This is strange to me. I know that I should use penalization for MLE, or a weakly informative prior such as cauchy(0, 2.5), but that is not the issue here: somehow MCMC overestimates more on this specific dataset compared to MLE. With the Firth correction the estimate is 2.54 (SE 0.9), and with bayesglm and a cauchy(0, 2.5) prior it is 2.55 (sd 0.86). BUT with Stan and cauchy(0, 2.5) the estimate is 2.93 (sd 1.1). Hence MCMC still gives a larger treatment effect.

My question is: am I missing something trivial? Or what would be a possible explanation for the MCMC results differing from the "arm" and MLE results?

I attached the R code which produces these results.

Thank you very much for your help,

Burak
 
My sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8   
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] logistf_1.22    arm_1.9-3       lme4_1.1-12     Matrix_1.2-7.1  MASS_7.3-45     rstanarm_2.13.1 Rcpp_0.12.8   

loaded via a namespace (and not attached):
 [1] lattice_0.20-34    zoo_1.7-14         gtools_3.5.0       assertthat_0.1     digest_0.6.11      mime_0.5         
 [7] R6_2.2.0           plyr_1.8.4         stats4_3.3.2       coda_0.19-1        ggplot2_2.2.1      colourpicker_0.3 
[13] lazyeval_0.2.0     minqa_1.2.4        miniUI_0.1.1       nloptr_1.0.4       rpart_4.1-10       DT_0.2           
[19] shinythemes_1.1.1  splines_3.3.2      shinyjs_0.9        stringr_1.1.0      htmlwidgets_0.8    loo_1.0.0        
[25] munsell_0.4.3      shiny_0.14.2       httpuv_1.3.3       rstan_2.14.1       base64enc_0.1-3    mgcv_1.8-16      
[31] rstantools_1.1.0   htmltools_0.3.5    nnet_7.3-12        tibble_1.2         gridExtra_2.2.1    threejs_0.2.2    
[37] codetools_0.2-15   matrixStats_0.51.0 dplyr_0.5.0        grid_3.3.2         nlme_3.1-128       jsonlite_1.2     
[43] xtable_1.8-2       gtable_0.2.0       DBI_0.5-1          magrittr_1.5       StanHeaders_2.14.0 scales_0.4.1     
[49] stringi_1.1.2      reshape2_1.4.2     mice_2.25          dygraphs_1.1.1.4   xts_0.9-7          tools_3.3.2      
[55] shinystan_2.2.1    markdown_0.7.7     survival_2.40-1    rsconnect_0.7      abind_1.4-5        parallel_3.3.2   
[61] inline_0.3.14      colorspace_1.3-2   bayesplot_1.1.0 





MCMC_vs_MLE.R

kent

unread,
Jan 16, 2017, 9:47:43 AM1/16/17
to stan-...@googlegroups.com

It looks to me as if the problem is with your prior. If I run your code but use the default priors in stan_glm(), I get:

 

Estimates:
                mean    sd      2.5%    25%     50%     75%     97.5%
(Intercept)    -3.290   0.900  -5.320  -3.833  -3.191  -2.626  -1.811
treatment       2.804   0.922   1.268   2.145   2.709   3.370   4.835
mean_PPD        0.270   0.061   0.157   0.225   0.270   0.315   0.393
log-posterior -45.459   1.023 -48.282 -45.847 -45.139 -44.729 -44.459

Diagnostics:
              mcse  Rhat  n_eff
(Intercept)   0.031 1.005  834
treatment     0.030 1.003  918
mean_PPD      0.001 1.000 2419
log-posterior 0.031 1.002 1065

A normal prior with a scale of 100 on the log-odds puts virtually all of the prior weight at probabilities of 0 or 1.
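A quick base-R simulation illustrates this point (a sketch; the threshold of 0.01/0.99 is an arbitrary choice to make "near 0 or 1" concrete):

```r
# Push N(0, sd = 100) prior draws for the log-odds through the inverse logit
set.seed(1)
p <- plogis(rnorm(1e5, mean = 0, sd = 100))

# Fraction of prior mass at probabilities below 0.01 or above 0.99
mean(p < 0.01 | p > 0.99)  # well above 0.9: almost all mass piles up near 0 or 1
```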

 

Kent


Burak Gunhan

unread,
Jan 16, 2017, 10:00:47 AM1/16/17
to Stan users mailing list
Thank you for your reply.

I am aware that there is a "data sparsity" problem, so I need to use weakly informative priors. For example, as you suggested, the default priors of "stan_glm" are:

Intercept    ~ normal(location = 0, scale = 10)
Coefficients ~ normal(location = 0, scale = 2.5)

But when I fit the same model with these weakly informative priors via "arm::bayesglm", the coefficient estimate is 2.53 with standard deviation 0.85. As you can see, MCMC still gives a different result.

My questions are: why do the results of MCMC with flat priors differ from the MLE results, and why does MCMC with more informative priors differ from arm::bayesglm with the same priors?

Burak

Ben Goodrich

unread,
Jan 16, 2017, 10:30:22 AM1/16/17
to Stan users mailing list
On Monday, January 16, 2017 at 10:00:47 AM UTC-5, Burak Gunhan wrote:
But when I fit the same model with these weakly informative priors via "arm::bayesglm", the coefficient estimate is 2.53 with standard deviation 0.85. As you can see, MCMC still gives a different result.

My questions are: why do the results of MCMC with flat priors differ from the MLE results, and why does MCMC with more informative priors differ from arm::bayesglm with the same priors?

The arm::bayesglm function essentially returns a posterior mode; the rstanarm::stan_glm function returns a posterior median (or you can calculate the posterior mean), both of which are preferable to a mode. If the posterior is skewed, you can get results like this.

Ben

Burak Gunhan

unread,
Jan 16, 2017, 10:56:59 AM1/16/17
to Stan users mailing list
Thank you for your reply.
I understand that it may not be a good idea to compare arm::bayesglm's results with rstanarm::stan_glm's results. And indeed, the posterior is skewed.

stan_glm with flat priors gives a posterior median of 3.26 with 95% CI (1.43, 6.48) (while the MLE is 2.92).

OK, now my question is as follows:
Is it normal for MCMC with flat priors to differ from MLE results when the posterior is skewed?

And normally I think I should trust MCMC results more than MLE results, but in this application, don't you think MCMC gives more biased results than MLE? MCMC gives a median estimate for the odds ratio of 26, while MLE gives 18 and the Firth correction gives 12. So does MCMC overestimate the treatment effect more?

Thank you very much for your answer,

Burak

Ben Goodrich

unread,
Jan 16, 2017, 1:23:36 PM1/16/17
to Stan users mailing list
On Monday, January 16, 2017 at 10:56:59 AM UTC-5, Burak Gunhan wrote:
OK, now my question is as follows:
Is it normal for MCMC with flat priors to differ from MLE results when the posterior is skewed?

Yes. But it is more specific than "MCMC with flat priors": it is tautological to say that the posterior mean or median will differ from the mode of the likelihood function when the posterior distribution is skewed. I would also expect the posterior to often be skewed in applications like this.
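This mode < median < mean ordering can be illustrated with any standard right-skewed distribution (a generic example, not the actual posterior in this problem; the lognormal values below are textbook closed forms):

```r
# Lognormal(0, 1) as a stand-in for a right-skewed posterior:
# mode = exp(mu - sigma^2) = exp(-1), median = exp(mu) = 1, mean = exp(mu + sigma^2/2)
mode_   <- exp(-1)    # ~0.37
median_ <- exp(0)     # 1
mean_   <- exp(0.5)   # ~1.65

c(mode = mode_, median = median_, mean = mean_)  # mode < median < mean
```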
 
And normally I think I should trust MCMC results more than MLE results, but in this application, don't you think MCMC gives more biased results than MLE? MCMC gives a median estimate for the odds ratio of 26, while MLE gives 18 and the Firth correction gives 12. So does MCMC overestimate the treatment effect more?

If the posterior distribution is skewed, then the asymptotic assumptions behind the MLE are not holding, because the posterior distribution and the sampling distribution are both multivariate normal under the same asymptotic conditions.

Ben
 

Bob Carpenter

unread,
Jan 16, 2017, 4:35:20 PM1/16/17
to stan-...@googlegroups.com
In hierarchical models and mixture models, you often don't even have an MLE, or it may degenerate to a boundary. In many of these cases, the posterior mean will still be well defined.

As Ben says, if the posterior is skewed, the mode won't equal the mean.
The posterior mean is the point estimate minimizing expected squared error
vs. the true value; the posterior median minimizes expected absolute
error. The posterior mode has no such interpretation and depends on the parameterization.
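The decision-theoretic claim can be checked numerically (a sketch using an arbitrary skewed sample in place of posterior draws):

```r
# Which constant point estimate minimizes each expected loss over a sample?
set.seed(2)
x <- rexp(1e5)  # any right-skewed sample, standing in for posterior draws

sq_loss  <- function(est) mean((x - est)^2)   # expected squared error
abs_loss <- function(est) mean(abs(x - est))  # expected absolute error

# The squared-error minimizer lands on the sample mean,
# the absolute-error minimizer on the sample median
optimize(sq_loss,  interval = c(0, 10))$minimum   # ~ mean(x)
optimize(abs_loss, interval = c(0, 10))$minimum   # ~ median(x)
```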

If you're doing MLE, you can think of the prior as a penalty term. Or
you can think in terms of calculating the posterior mode of the Bayesian
model (which doesn't really have an interpretation as it depends on
the parameterization, unlike the posterior mean).

This might help:

http://mc-stan.org/documentation/case-studies/mle-params.html

- Bob

Burak Gunhan

unread,
Jan 19, 2017, 6:38:39 AM1/19/17
to Stan users mailing list
After reading your messages several times, and the part of the Stan manual that explains MLE and Bayesian mode estimation, I think I understand the point.

Thank you very much; I really appreciate it, because you answer not only Stan-related questions but also MCMC, MLE, and statistical-computing questions.

Best regards,

Burak

Bob Carpenter

unread,
Jan 19, 2017, 2:59:26 PM1/19/17
to stan-...@googlegroups.com


> On Jan 19, 2017, at 6:38 AM, Burak Gunhan <burak...@gmail.com> wrote:
>
> After reading your messages several times, and the part of the Stan manual that explains MLE and Bayesian mode estimation, I think I understand the point.
>
> Thank you very much; I really appreciate it, because you answer not only Stan-related questions but also MCMC, MLE, and statistical-computing questions.

You're welcome.

It's all part of the bigger message. The textbooks and
literature out there, especially in machine learning,
often conflate the model and the algorithms used to fit it.
The same model behaves differently with MLE than with MCMC
because one searches for the posterior mode while the other
samples from the posterior (which lets you calculate the
posterior mean and median, but not the mode).

Andrew, Vince and Sophia wrote a paper on how to avoid
degenerate maximum likelihood estimates with appropriate
priors/penalty functions:

Official:
http://biostats.bepress.com/ucbbiostat/paper284/

Preprint:
http://www.stat.columbia.edu/~gelman/bayescomputation/bayescomputation/Chung2011.pdf

- Bob

Andrew Gelman

unread,
Jan 19, 2017, 4:08:02 PM1/19/17
to stan-...@googlegroups.com
We wrote 2 papers! One for the univariate and one for the multivariate case:
http://www.stat.columbia.edu/~gelman/research/published/chung_cov_matrices.pdf
http://www.stat.columbia.edu/~gelman/research/published/chung_etal_Pmetrika2013.pdf
Authors for both papers are Yeojin Chung, Sophia Rabe-Hesketh, Andrew Gelman, Jingchen Liu, and Vincent Dorie.

Bob Carpenter

unread,
Jan 20, 2017, 1:45:42 PM1/20/17
to stan-...@googlegroups.com
I forgot the main author, Yeojin Chung --- must've been
done before I got to Columbia.

- Bob