choice of priors in stan

Linas Mockus

Jun 30, 2015, 12:28:32 PM
to stan-...@googlegroups.com
Hi,

What are the recommended choices of priors in Stan?  I always struggle with variance: should it be half-Cauchy or uniform if I know that the variance is different from 0?  I gathered that for very small variances it is better to use half-Cauchy.

What about neg_binomial_2 with a hierarchical scale parameter phi: neg_binomial_2(mu, phi[i])?  I am never sure whether to use phi ~ gamma(loc_phi, scale_phi) or phi ~ lognormal(loc_phi, scale_phi).  If I use lognormal, should I use a normal prior for loc_phi and a half-Cauchy (or uniform) prior for scale_phi?  It would be great to have a cookbook with recommendations on which priors for parameters/hyperparameters work or don't work in Stan, since we don't have to worry about conjugacy.

Thank you,
Linas

Bob Carpenter

Jun 30, 2015, 2:46:19 PM
to stan-...@googlegroups.com
There's a discussion in the regression chapter of the manual.
The recommendations aren't really specific to Stan, either.


We don't discuss negative binomial --- Andrew may be able to help
here.

I believe Daniel created a Wiki page where Andrew was going to
write responses to questions about priors. But I'm not even sure where
it's at --- I don't see any public repos on:

https://github.com/andrewgelman?tab=repositories

Let's see if this inspires the first entry.

- Bob

Andrew Gelman

Jun 30, 2015, 4:28:18 PM
to stan-...@googlegroups.com
Lately we’ve been using N+(0,1) priors for sd parameters (N+ indicates normal constrained to be positive).  This is a bit stronger than Cauchy and assumes the model has been scaled so that parameters should rarely be more than 1.
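
A minimal sketch of what such a prior could look like in Stan, assuming a simple normal model for data that have already been rescaled to roughly unit scale. The data layout, the names N, y, mu, and sigma, and the prior on mu are placeholders for illustration, not something taken from this thread:

```stan
data {
  int<lower=1> N;
  vector[N] y;              // assumed to be rescaled to roughly unit scale
}
parameters {
  real mu;
  real<lower=0> sigma;      // the lower bound restricts sigma to positive values
}
model {
  mu ~ normal(0, 1);        // placeholder prior, not from the thread
  sigma ~ normal(0, 1);     // with lower=0 above, this is the N+(0,1) prior
  y ~ normal(mu, sigma);
}
```

The lower=0 bound is what turns normal(0, 1) into N+(0,1). No explicit truncation statement should be needed here, because the prior's location and scale are fixed constants.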


Linas Mockus

Jun 30, 2015, 6:44:46 PM
to stan-...@googlegroups.com
Dumb question - how to scale the model? 

Thank you,
Linas

Andrew Gelman

Jun 30, 2015, 7:10:06 PM
to stan-...@googlegroups.com
It depends on the context. It just means you need to understand the parameters well enough to get a sense of how large they can be.

Richard McElreath

Jun 30, 2015, 9:29:41 PM
to stan-...@googlegroups.com
I've found half-Cauchy fine in many cases, but with non-linear models it is easy for a variance parameter to be weakly identified, due to ceiling/floor effects. In those cases, something like an exponential prior or half-Gaussian works much better, because both have much thinner tails than the Cauchy.

I think Daniel Simpson likes exponential priors for scale parameters, based upon a subtle argument having to do with constant penalty with distance from a base model with variance = zero. http://arxiv.org/abs/1403.4630
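
One rough way to see the difference in tails is to draw from each prior and count how often the draw is large. The sketch below assumes unit scale for all three priors and an arbitrary cutoff of 10; both are illustrative choices, not values from this thread. The program has no parameters, so it would be run with the fixed_param sampler:

```stan
generated quantities {
  // Draws from three candidate priors for a scale parameter, all on unit scale.
  // abs() on reals assumes a recent Stan release; older releases used fabs().
  real s_cauchy = abs(cauchy_rng(0, 1));   // half-Cauchy(0, 1)
  real s_normal = abs(normal_rng(0, 1));   // half-normal(0, 1)
  real s_expon = exponential_rng(1);       // exponential(1)

  // Indicators: their means across draws estimate P(scale > 10) under each prior.
  int big_cauchy = s_cauchy > 10;
  int big_normal = s_normal > 10;
  int big_expon = s_expon > 10;
}
```

Analytically, the half-Cauchy puts about 6% of its mass above 10, the exponential about 0.005%, and the half-normal essentially none. That extra tail mass is what lets a weakly identified scale parameter wander to very large values under the half-Cauchy.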

Andrew Gelman

Jun 30, 2015, 9:47:36 PM
to stan-...@googlegroups.com
I’ve been doing half-normal but I could be persuaded to switch to exponential.

Linas Mockus

Jul 1, 2015, 8:24:20 AM
to stan-...@googlegroups.com
If I understand correctly, the best practice for an sd prior is:
- half-Cauchy works fine for sd
- if we know that most of the sd mass is below 2, then we scale the model:
    real<lower=0> sigma;          // parameters block; named sigma because sd() is a built-in function
    y ~ normal(mu, 2 * sigma);    // model block: the actual sd is 2 * sigma
    sigma ~ normal(0, 1);         // half-normal, because of the lower=0 bound
- use exponential
- what if we know that sd is concentrated around 1?  Then half-Cauchy may not be the most efficient choice.  I don't know whether this holds in general, but I found sd ~ uniform(0, 2) to be more efficient than half-Cauchy.
- what prior should be used for the neg_binomial_2 scale parameter? Is it gamma or lognormal or something else?  What should the hyperpriors be?
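
On the uniform(0, 2) option in the list above: in Stan the declared bounds of a parameter need to match the support of a uniform prior, otherwise the sampler can propose values the prior rejects. A minimal sketch, assuming a simple normal model with placeholder names N, y, mu, and sigma that are not from this thread:

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;                          // left with an implicit flat prior in this sketch
  real<lower=0, upper=2> sigma;     // bounds match the support of uniform(0, 2)
}
model {
  sigma ~ uniform(0, 2);            // optional: the bounds alone already imply this prior
  y ~ normal(mu, sigma);
}
```

Note that this prior rules out sd > 2 entirely, which is a much stronger statement than any of the half-Cauchy, half-normal, or exponential options.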

Thank you,
Linas

Andrew Gelman

Jul 3, 2015, 4:02:34 PM
to stan-...@googlegroups.com
Neg_binomial_2 scale parameter . . . hmmm . . . Let me just clarify that we’re talking about the same thing.  neg_binomial has two parameters, alpha and beta, where alpha is the shape parameter and 1/beta is the scale parameter.

neg_binomial_2 has 2 parameters, mu and phi.  mu = alpha/beta and phi = alpha.  At least, I think I’m getting the algebra right.  I suppose we could call mu a scale parameter here.  But I think section 39.2 of the manual is wrong:  there it says that phi is a precision parameter and that 1/phi represents overdispersion, and I think neither of these statements is correct.  I think (if I’m not getting confused) that phi in the neg_binomial_2 distribution is simply alpha, the shape parameter.

The overdispersion (as usually defined) is 1 + 1/beta, hence is always at least 1.
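
The algebra can be checked by matching moments. The worked equations below assume the mean and variance that Stan documents for the two parameterizations, and take "overdispersion" to mean the variance-to-mean ratio:

```latex
\begin{align*}
\text{neg\_binomial}(\alpha, \beta):\quad
  & \mathrm{E}[y] = \frac{\alpha}{\beta}, \qquad
    \mathrm{Var}[y] = \frac{\alpha}{\beta^{2}}\,(\beta + 1) \\
\text{neg\_binomial\_2}(\mu, \phi):\quad
  & \mathrm{E}[y] = \mu, \qquad
    \mathrm{Var}[y] = \mu + \frac{\mu^{2}}{\phi} \\
\text{with } \mu = \frac{\alpha}{\beta},\ \phi = \alpha:\quad
  & \mu + \frac{\mu^{2}}{\phi}
    = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^{2}}
    = \frac{\alpha}{\beta^{2}}\,(\beta + 1) \\
\text{variance-to-mean ratio}:\quad
  & \frac{\mathrm{Var}[y]}{\mathrm{E}[y]}
    = 1 + \frac{\mu}{\phi}
    = 1 + \frac{1}{\beta}
\end{align*}
```

So the two parameterizations agree under mu = alpha/beta and phi = alpha, and the variance-to-mean ratio is 1 + 1/beta, which is always greater than 1.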

I think this might explain some of the confusion we’ve been having lately. It also suggests we should have a negative_binomial_3 that is given in terms of mu and the overdispersion.

A

Linas Mockus

Jul 3, 2015, 7:24:48 PM
to stan-...@googlegroups.com
I always thought that the first parameter of neg_binomial_2 represents the mean.  Since the mean in a GLM context is exp(beta'x), where beta is a parameter and x is the independent variable, I never intended to put a prior on mu.  However, I wanted to put a hierarchical prior on the second parameter, phi.  My expectation was that the variance is expressed as mu + mu^2/phi, so by giving a hierarchical prior on phi I implicitly attributed different variability to each sample (in my case, particle count).  So what is correct?

Anyway, what hierarchical prior (lognormal, gamma, etc.) should phi have, e.g. phi ~ lognormal(loc, scale), and what hyperpriors should be used, e.g. loc ~ normal(0, 5) and scale ~ half-Cauchy(0, 2)?
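
For concreteness, the model being asked about might be written roughly as follows. This only spells out the question and is not a recommendation. The data layout, the names N, K, G, x, g, and y, and the prior on beta are assumptions for illustration; the hyperpriors are the ones mentioned above. The array syntax assumes a recent Stan release:

```stan
data {
  int<lower=1> N;                       // number of observations
  int<lower=1> K;                       // number of predictors
  int<lower=1> G;                       // number of samples (groups)
  matrix[N, K] x;                       // predictors
  array[N] int<lower=1, upper=G> g;     // sample index of each observation
  array[N] int<lower=0> y;              // counts
}
parameters {
  vector[K] beta;
  real loc_phi;
  real<lower=0> scale_phi;
  vector<lower=0>[G] phi;               // one phi per sample
}
model {
  beta ~ normal(0, 2.5);                // placeholder prior, not from the thread
  loc_phi ~ normal(0, 5);
  scale_phi ~ cauchy(0, 2);             // half-Cauchy because of the lower=0 bound
  phi ~ lognormal(loc_phi, scale_phi);
  // Mean is exp(x * beta); variance is mu + mu^2 / phi for each observation.
  y ~ neg_binomial_2_log(x * beta, phi[g]);
}
```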

Thank you,
Linas

Bob Carpenter

Jul 4, 2015, 3:35:30 PM
to stan-...@googlegroups.com
Negative binomial is inherently confusing given that there
are four standard variants in play in the stats literature.

neg_binomial(y | alpha, beta) is exactly as defined in BDA,
with shape alpha and inverse scale beta, so I'm assuming that
one's OK.

I added an issue for neg_binomial_2():

https://github.com/stan-dev/stan/pull/1523#issuecomment-118544690

Any suggestions on what to call the neg_binomial_2 parameters and
what Greek letters to use for them?

Note there's also neg_binomial_2_log, which takes a parameter on
the log scale.
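
A small sketch of how the two relate, assuming arbitrary illustrative values for y, eta, and phi that are not from this thread. The log-scale variant takes eta = log(mu), so the two log probabilities below should agree up to floating-point error. The program has no parameters, so it would be run with the fixed_param sampler:

```stan
transformed data {
  int y = 3;          // illustrative count
  real eta = 1.2;     // illustrative linear predictor on the log scale
  real phi = 2.5;     // illustrative value for the second parameter
}
generated quantities {
  real lp_log = neg_binomial_2_log_lpmf(y | eta, phi);      // mean given as log(mu)
  real lp_nat = neg_binomial_2_lpmf(y | exp(eta), phi);     // mean given as mu
  real diff = lp_log - lp_nat;                              // expected to be near 0
}
```

In a GLM the log-scale version is usually the more convenient one, since the linear predictor can be passed in directly without calling exp().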

- Bob

Andrew Gelman

Jul 4, 2015, 5:25:28 PM
to stan-...@googlegroups.com
Hi, I’m not sure.  If you set phi < infinity you have overdispersion.  The lower phi, the more overdispersion.  It’s hard for me to get a handle on this in the abstract; I think it depends on context.
A

Michael Betancourt

Jul 4, 2015, 5:36:30 PM
to stan-...@googlegroups.com
Just think about the variance: the overdispersion is
significant when mu^{2} / phi is of the same order as mu,
so

mu^{2} / phi ~ mu
mu / phi ~ 1
mu ~ phi

Consequently the absolute amount of overdispersion
depends on the mean, so you have to think about your prior
in terms of relative overdispersion.
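
A short worked example of the point above, using the variance mu + mu^2/phi and an arbitrary illustrative value phi = 10 (not a value from this thread):

```latex
\[
\frac{\mathrm{Var}[y]}{\mathrm{E}[y]}
  = \frac{\mu + \mu^{2}/\phi}{\mu}
  = 1 + \frac{\mu}{\phi}
\]
\[
\phi = 10:\qquad
\mu = 1 \;\Rightarrow\; 1.1, \qquad
\mu = 10 \;\Rightarrow\; 2, \qquad
\mu = 100 \;\Rightarrow\; 11
\]
```

The same phi is close to Poisson for small means and heavily overdispersed for large ones, so a prior on phi only makes sense relative to the range of mu in the data.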

Michael Betancourt

Jul 4, 2015, 5:38:48 PM
to stan-...@googlegroups.com
This book, http://www.amazon.com/Negative-Binomial-Regression-Joseph-Hilbe/dp/0521198151,
has something like 20 parameterizations. Don’t ever trust someone when they say “negative binomial”
without providing a formula.

As far as I’ve seen the mu, phi names as used in the manual are pretty standard for the negative binomial
GLM parameterization.

Luc Coffeng

Jul 5, 2015, 2:08:21 AM
to stan-...@googlegroups.com
In infectious disease epidemiology, the NB2 parameters are referred to as mean \mu and aggregation k (amount of clumping of infectious particles within hosts :) ).