Choice of priors in Stan

Linas Mockus

Jun 30, 2015, 12:28:32 PM

to stan-...@googlegroups.com

Hi,

What is the recommended choice of priors in Stan? I always struggle with the variance: should it be half-Cauchy or uniform if I know that the variance is different from 0? I gathered that for very small variances it is better to use half-Cauchy.

What about neg_binomial_2 with a hierarchical scale parameter phi: neg_binomial_2(mu, phi[i])? I am never sure whether to use phi ~ gamma(loc_phi, scale_phi) or phi ~ lognormal(loc_phi, scale_phi). If I use lognormal, should I use a normal prior for loc_phi and a half-Cauchy (or uniform) prior for scale_phi? It would be great to have a cookbook with recommendations on which priors for parameters and hyperparameters work or don't work in Stan, since we don't have to worry about conjugacy.

Thank you,
Linas

Bob Carpenter

Jun 30, 2015, 2:46:19 PM

to stan-...@googlegroups.com

There's a discussion in the regression chapter of the manual.
The recommendations aren't really specific to Stan, either.

We don't discuss negative binomial --- Andrew may be able to help
here.

I believe Daniel created a Wiki page where Andrew was going to
write responses to questions about priors. But I'm not even sure where
it's at --- I don't see any public repos on:

https://github.com/andrewgelman?tab=repositories

Let's see if this inspires the first entry.

- Bob


Andrew Gelman

Jun 30, 2015, 4:28:18 PM

to stan-...@googlegroups.com

Lately we’ve been using N+(0,1) priors for sd parameters (N+ indicates normal constrained to be positive). This is a bit stronger than Cauchy and assumes the model has been scaled so that parameters should rarely be more than 1.
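
A minimal Stan sketch of this prior, assuming a plain normal model. The names y, mu and sigma are illustrative and do not come from the thread. Declaring sigma with lower=0 is what turns normal(0, 1) into the half-normal N+(0,1).

```stan
data {
  int<lower=1> N;
  vector[N] y;            // assumed to be rescaled so that its sd is of order 1
}
parameters {
  real mu;
  real<lower=0> sigma;    // the lower bound restricts the prior to positive values
}
model {
  mu ~ normal(0, 1);      // illustrative prior, not a recommendation from the thread
  sigma ~ normal(0, 1);   // N+(0,1): half-normal because sigma >= 0
  y ~ normal(mu, sigma);
}
```

No explicit truncation statement is needed here, since the truncation point is a constant and only changes the density by a constant factor.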

Linas Mockus

Jun 30, 2015, 6:44:46 PM

to stan-...@googlegroups.com

Dumb question: how do I scale the model?

Thank you,
Linas

Andrew Gelman

Jun 30, 2015, 7:10:06 PM

to stan-...@googlegroups.com

It depends on the context; it just means you need to understand the parameters well enough to get a sense of how large they can be.
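
One common way to get parameters onto a unit scale is to standardize the data before fitting. This is offered as an assumption about what "scaling" could look like in a simple regression, not as the specific recipe meant in the reply above. The names x, y, alpha, beta and sigma are illustrative.

```stan
data {
  int<lower=2> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  // standardize so that the slope and residual sd are expected to be of order 1
  vector[N] x_std = (x - mean(x)) / sd(x);
  vector[N] y_std = (y - mean(y)) / sd(y);
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ normal(0, 1);   // half-normal via the lower=0 bound
  y_std ~ normal(alpha + beta * x_std, sigma);
}
```

With both variables standardized, the residual sd cannot sensibly exceed 1 by much, so a unit-scale prior on sigma is not restrictive. Estimates then have to be transformed back to the original units when reporting.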

Richard McElreath

Jun 30, 2015, 9:29:41 PM

to stan-...@googlegroups.com

I've found half-Cauchy fine in many cases, but with non-linear models it is easy for a variance parameter to be weakly identified due to ceiling/floor effects. In those cases, something like an exponential or half-Gaussian prior works much better, because both have much thinner tails than the Cauchy.

I think Daniel Simpson likes exponential priors for scale parameters, based upon a subtle argument having to do with constant penalty with distance from a base model with variance = zero. http://arxiv.org/abs/1403.4630
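
To make the tail comparison concrete, here is a small worked calculation. It assumes unit scale for all three priors (half-Cauchy(0,1), half-normal(0,1), exponential with rate 1), which is an illustrative choice and not something specified in the thread.

```latex
\begin{align*}
\text{half-Cauchy}(0,1):\quad & P(\sigma > t) = 1 - \tfrac{2}{\pi}\arctan(t)
  && P(\sigma > 10) \approx 0.063 \\
\text{half-normal}(0,1):\quad & P(\sigma > t) = \operatorname{erfc}\!\left(t/\sqrt{2}\right)
  && P(\sigma > 10) \approx 1.5 \times 10^{-23} \\
\text{exponential}(1):\quad & P(\sigma > t) = e^{-t}
  && P(\sigma > 10) \approx 4.5 \times 10^{-5}
\end{align*}
```

Under these assumptions the half-Cauchy leaves about 6% of its mass above 10, so when the data say little about a scale parameter, the posterior can still put noticeable weight on very large values.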

Andrew Gelman

Jun 30, 2015, 9:47:36 PM

to stan-...@googlegroups.com

I’ve been doing half-normal but I could be persuaded to switch to exponential.

Linas Mockus

Jul 1, 2015, 8:24:20 AM

to stan-...@googlegroups.com

If I understand correctly, the best practice for the sd prior is:

- half-Cauchy works fine for sd

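- if we know that most sd values are < 2, then we scale the model:

      real<lower=0> sigma_raw;        // parameters block; renamed from sd, which can clash with Stan's sd() function
      sigma_raw ~ normal(0, 1);       // half-normal because of the lower=0 bound
      y ~ normal(mu, 2 * sigma_raw);  // the actual sd is 2 * sigma_raw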

- use exponential

- what if we know that sd is concentrated around 1? Then half-Cauchy may not be the most efficient choice. I don't know whether this holds in general, but I found sd ~ uniform(0,2) to be more efficient than half-Cauchy.

- what prior should be used for neg_binomial_2 scale parameter? Is it gamma or lognormal or something else? What should be the hyperpriors?

Thank you,
Linas
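
On the uniform(0,2) option in the list above: in Stan the declared bounds of a parameter need to match the support of its prior, otherwise the sampler can reach values to which the prior assigns zero density. A minimal sketch, assuming a plain normal model with illustrative names y, mu and sigma:

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0, upper=2> sigma;   // bounds match the support of uniform(0, 2)
}
model {
  mu ~ normal(0, 5);              // illustrative prior, not from the thread
  sigma ~ uniform(0, 2);          // optional: the bounds alone already imply a flat prior on (0, 2)
  y ~ normal(mu, sigma);
}
```

A hard upper bound also rules out any sd above 2 regardless of the data, which is a stronger statement than a half-normal or exponential prior makes.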


Andrew Gelman

Jul 3, 2015, 4:02:34 PM

to stan-...@googlegroups.com

Neg_binomial_2 scale parameter . . . hmmm . . . Let me just clarify that we’re talking about the same thing. neg_binomial has two parameters, alpha and beta, where alpha is the shape parameter and 1/beta is the scale parameter.

neg_binomial_2 has two parameters, mu and phi, with mu = alpha/beta and phi = alpha. At least, I think I’m getting the algebra right. I suppose we could call mu a scale parameter here. But I think section 39.2 of the manual is wrong: there it says that phi is a precision parameter and 1/phi represents overdispersion, and I think neither of these statements is correct. I think (if I’m not getting confused) that phi in the neg_binomial_2 distribution is simply alpha, the shape parameter.

The overdispersion (as usually defined) is 1 + 1/beta, hence is always at least 1.

I think this might explain some of the confusion we’ve been having lately. It also suggests we should have a negative_binomial_3 that is given in terms of mu and the overdispersion.

A
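
The algebra in this message can be checked against the moments of the two parameterizations. The derivation below uses the standard mean and variance of neg_binomial (shape alpha, inverse scale beta) and of neg_binomial_2 (mean mu, dispersion phi); it is a sketch for the reader, not part of the original message.

```latex
\begin{align*}
\text{neg\_binomial}(y \mid \alpha, \beta):\quad
  & \mathrm{E}[y] = \frac{\alpha}{\beta}, \qquad
    \mathrm{Var}[y] = \frac{\alpha}{\beta^{2}}(\beta + 1) \\
\text{neg\_binomial\_2}(y \mid \mu, \phi):\quad
  & \mathrm{E}[y] = \mu, \qquad
    \mathrm{Var}[y] = \mu + \frac{\mu^{2}}{\phi} \\
\text{with } \mu = \frac{\alpha}{\beta},\ \phi = \alpha:\quad
  & \mu + \frac{\mu^{2}}{\phi}
    = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^{2}}
    = \frac{\alpha}{\beta^{2}}(\beta + 1) \\
\text{variance-to-mean ratio}:\quad
  & \frac{\mathrm{Var}[y]}{\mathrm{E}[y]}
    = 1 + \frac{1}{\beta}
    = 1 + \frac{\mu}{\phi}
\end{align*}
```

So the mapping mu = alpha/beta, phi = alpha reproduces the same variance, and the ratio 1 + 1/beta depends on both mu and phi when written in the neg_binomial_2 parameters.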

Linas Mockus

Jul 3, 2015, 7:24:48 PM

to stan-...@googlegroups.com

I always thought that the first parameter of neg_binomial_2 represents the mean. Since the mean in a GLM context is exp(beta'x), where beta is a parameter and x is the independent variable, I never intended to put a prior on mu. However, I wanted to put a hierarchical prior on the second parameter, phi. My expectation was that the variance is mu + mu^2/phi, so by giving phi a hierarchical prior I implicitly attribute different variability to each sample (in my case, particle count). So what is correct?

Anyway, what hierarchical prior (lognormal, gamma, etc.) should phi have, e.g. phi ~ lognormal(loc, scale), and what hyperpriors should be used, e.g. loc ~ normal(0,5) and scale ~ half-cauchy(0,2)?

Thank you,
Linas
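
For reference, this is roughly how the lognormal option in the question would be written as a complete Stan program. It is a sketch under assumptions: the data layout (N, K, J, X, group) and the prior on beta are hypothetical, the hyperpriors are the ones floated in the question, and current Stan array syntax is used. It shows how the model is expressed, not which prior is best.

```stan
data {
  int<lower=1> N;                          // observations
  int<lower=1> K;                          // predictors
  int<lower=1> J;                          // groups, e.g. samples
  matrix[N, K] X;
  array[N] int<lower=1, upper=J> group;    // group index of each observation
  array[N] int<lower=0> y;                 // counts
}
parameters {
  vector[K] beta;
  vector<lower=0>[J] phi;                  // one dispersion parameter per group
  real loc_phi;
  real<lower=0> scale_phi;
}
model {
  beta ~ normal(0, 1);                     // illustrative, assumes scaled predictors
  loc_phi ~ normal(0, 5);                  // hyperprior from the question
  scale_phi ~ cauchy(0, 2);                // half-Cauchy via the lower=0 bound
  phi ~ lognormal(loc_phi, scale_phi);
  // same as neg_binomial_2(exp(X * beta), phi[group])
  y ~ neg_binomial_2_log(X * beta, phi[group]);
}
```

Because phi enters the variance as mu^2/phi, it is worth simulating from these hyperpriors first to see what range of overdispersion they imply for the expected counts.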


Bob Carpenter

Jul 4, 2015, 3:35:30 PM

to stan-...@googlegroups.com

Negative binomial is inherently confusing given that there
are four standard variants in play in the stats literature.

neg_binomial(y | alpha, beta) is exactly as defined in BDA,
with shape alpha and inverse scale beta, so I'm assuming that
one's OK.

I added an issue for neg_binomial_2():

https://github.com/stan-dev/stan/pull/1523#issuecomment-118544690

Any suggestions on what to call the neg_binomial_2 parameters and
what Greek letters to use for them?

Note there's also neg_binomial_2_log, which takes a parameter on
the log scale.

- Bob

Andrew Gelman

Jul 4, 2015, 5:25:28 PM

to stan-...@googlegroups.com

Hi, I’m not sure. If you set phi < infinity you have overdispersion. The lower phi, the more overdispersion. It’s hard for me to get a handle on this in the abstract; I think it depends on context.

A

Michael Betancourt

Jul 4, 2015, 5:36:30 PM

to stan-...@googlegroups.com

Just think about the variance: the overdispersion is significant when mu^{2} / phi is of the same order as mu, so

    mu^{2} / phi ~ mu
    mu / phi ~ 1
    mu ~ phi

Consequently the absolute amount of overdispersion depends on the mean, so you have to think about your prior in terms of relative overdispersion.
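
A worked example of this point, using the variance mu + mu^{2}/phi with an arbitrarily chosen phi = 10 (the value is only for illustration):

```latex
\begin{align*}
\mu = 1:   &\quad \mathrm{Var}[y] = 1 + \tfrac{1}{10} = 1.1,        &\quad \mathrm{Var}[y]/\mu &= 1.1 \\
\mu = 10:  &\quad \mathrm{Var}[y] = 10 + \tfrac{100}{10} = 20,      &\quad \mathrm{Var}[y]/\mu &= 2 \\
\mu = 100: &\quad \mathrm{Var}[y] = 100 + \tfrac{10000}{10} = 1100, &\quad \mathrm{Var}[y]/\mu &= 11
\end{align*}
```

The same phi gives almost Poisson behavior at mu = 1 and an elevenfold variance inflation at mu = 100, so a prior on phi only has a clear meaning once the typical size of mu is known.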

Michael Betancourt

Jul 4, 2015, 5:38:48 PM

to stan-...@googlegroups.com

This book, http://www.amazon.com/Negative-Binomial-Regression-Joseph-Hilbe/dp/0521198151,
has something like 20 parameterizations. Don’t ever trust someone when they say “negative binomial”
without providing a formula.

As far as I’ve seen the mu, phi names as used in the manual are pretty standard for the negative binomial
GLM parameterization.

Luc Coffeng

Jul 5, 2015, 2:08:21 AM

to stan-...@googlegroups.com

In infectious disease epidemiology, the NB2 parameters are referred to as mean \mu and aggregation k (amount of clumping of infectious particles within hosts :) ).
