Default priors vs. uniform distributions


lizzy2016

Aug 15, 2016, 1:41:08 PM
to Stan users mailing list

I was wondering if someone could help me understand why applying uniformly distributed priors vs. the default priors produces drastically different results. Aren't the default priors actually uniform distributions? Basically, I ran a linear regression via the simple Stan code below. If I comment out the following two priors
beta ~ uniform(-5000, 5000); 
sigma ~ uniform(0, 100);

The result is similar to lm, as expected, because MLE is equivalent to Bayesian inference with uniformly distributed priors. 

However, if I include them, the results from Stan are terrible; the chains don't even converge. From running lm, I know the coefficients are bounded by [-5000, 5000], which is why I set the prior beta ~ uniform(-5000, 5000); 

Can someone help? Thanks a lot. 

stanmodelcode = "
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  vector[N] mu;
  mu <- X * beta;

  // beta ~ uniform(-5000, 5000);
  // sigma ~ uniform(0, 100);

  y ~ normal(mu, sigma);
}
"

Michael Weylandt

Aug 15, 2016, 5:28:17 PM
to stan-...@googlegroups.com
I believe the issue is that the support of the priors needs to match
the constraints on the parameters.

Oversimplifying a bit: when you put the uniform priors in, you're
saying that values of sigma > 100 have zero probability, yet the
declared constraints say those values are still valid (and similarly
for beta), so HMC will still try to explore them. This causes
problems for sampling, since the sampler tries to explore the whole
distribution and hits the zero-probability regions. If you do mode
finding (optimization), there isn't an issue, since it focuses Stan's
attention on a fairly small region near the mode (which, under your
construction, is in the support of the priors).

If you really want to use those uniform priors, you also need to
constrain beta and sigma to match the support. It would, of course, be
better to use better priors (see [1] for some guidance) but I can
understand using these sort of priors as a teaching example /
robustness check.

The default "uniform priors" [2] are uniform over the entire range of
the parameters so they avoid this issue.
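To make this concrete, here is a sketch of the change Michael describes, using the same parameter names as the posted model: the declared bounds are widened or narrowed to match the support of the uniform priors exactly.

```stan
parameters {
  // Bounds now match the support of the uniform priors below,
  // so the sampler never proposes values with zero density.
  vector<lower=-5000, upper=5000>[K] beta;
  real<lower=0, upper=100> sigma;
}
model {
  // These statements are now redundant with the constraints
  // (uniform over the constrained range is Stan's default),
  // but stating them keeps the intent explicit.
  beta ~ uniform(-5000, 5000);
  sigma ~ uniform(0, 100);
}
```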

When you say:

> MLE is equivalent to Bayesian + Uniformly distributed priors

That's only half true.

MLE is equivalent to _finding the posterior mode_ with a uniform prior
(Stan's "optimizing" mode). Bayesian inference explores the entire
posterior (typically with MCMC - Stan's "sampling" mode) and typically
reports 'whole distribution' summaries (posterior means, medians, or
credible intervals) - not just the mode.

Cheers,
Michael

[1] https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
[2] In theory, there are other issues because "Uniform on R" isn't a
valid distribution, but with enough data, those problems vanish in
this example. I'd counsel against depending on this phenomenon and
just use weak priors.

lizzy2016

Aug 15, 2016, 5:37:50 PM
to Stan users mailing list
This is very helpful! Thanks a lot, Michael!

Andrew Gelman

Aug 15, 2016, 6:47:27 PM
to stan-...@googlegroups.com
Hi, these uniform priors are bad news.  You can't "know" that beta is between -5000 and 5000.  If you think beta is kinda near zero, just use a prior such as
beta ~ normal(0, 10);
Or even beta ~ normal(0, 5000); if you feel the need, but in this case your parameters are not scale-free, and this could be causing you other problems.
And similarly for sigma.  Do sigma ~ normal(0, 10); or sigma ~ normal(0, 100); or whatever, but also rescale your problem so that you would not see such big numbers.
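A minimal version of the model along these lines, with the weakly informative normal priors suggested above in place of the hard-boundary uniforms (same data and parameter names as the original post; the specific scales are just illustrative):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // Weakly informative priors; the <lower=0> constraint makes
  // the prior on sigma a half-normal. No hard boundaries anywhere.
  beta ~ normal(0, 10);
  sigma ~ normal(0, 10);
  y ~ normal(X * beta, sigma);
}
```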

Also, you should put your Stan model in a separate file rather than as a character string.

A


Bob Carpenter

Aug 15, 2016, 7:03:10 PM
to stan-...@googlegroups.com

> On Aug 15, 2016, at 7:41 PM, 'lizzy2016' via Stan users mailing list <stan-...@googlegroups.com> wrote:
>
>
> I was wondering if someone could help me understand why applying uniformly distributed priors vs. the default priors produces drastically different results. Aren't the default priors actually uniform distributions?

Stan's default priors are uniform on the declared constraint.
So if you declare a variable with <lower=0>, then the prior
is uniform on (0, infinity).

> Basically, I ran a linear regression via the simple Stan code below. If I comment out the following two priors
> beta ~ uniform(-5000, 5000);
> sigma ~ uniform(0, 100);
>
> The result is similar to lm, as expected, because MLE is equivalent to Bayesian inference with uniformly distributed priors.

I don't know what you mean by "Bayesian" here. If you're
talking about posterior modes, then yes: if there are no
reparameterizations involved, the posterior mode (often
called the maximum a posteriori or "MAP" estimate) will be the
same as the maximum likelihood estimate (MLE).

But the MLE isn't a Bayesian estimate. We usually use the
posterior mean, not the posterior mode, because it minimizes
expected square error of the estimated parameters.

>
> However, if I include them, the results from Stan are terrible; the chains don't even converge. From running lm, I know the coefficients are bounded by [-5000, 5000], which is why I set the prior beta ~ uniform(-5000, 5000);

Even if you know the MLE for the parameters falls in (-5000, 5000),
that doesn't mean putting a (-5000, 5000) prior won't affect the
posterior mean. You didn't show us how you're fitting here.

What was the MLE for the parameters? If it's near the boundaries
of (-5000, 5000), these uniform priors are going to have a big effect.

If you're just using Stan's MLE, then uniform priors won't have
much effect if the priors are away from the boundaries.

And yes, you want to constrain the parameters to <lower=-5000, upper=5000>
if you have a uniform(-5000, 5000) distribution, because Stan requires
the model to have support (non-zero density, finite log density) for
every value of the parameters meeting the constraints. If the parameters
are near the boundaries of (-5000, 5000), then you can run into
computational issues due to transforms.

But as Andrew says, this is probably not what you want to be doing anyway.

- Bob



lizzy2016

Aug 15, 2016, 8:47:42 PM
to Stan users mailing list
Thanks a lot for all your comments! Very informative!

The parameters are directly from lm, not from Stan's MLE. I enlarged the bounds to 

beta ~ uniform(-500000, 500000);
sigma ~ uniform(0, 100000);

and then got results similar to lm. One parameter is around 4000. Based on your comments, it's close to the boundary I set before (5000), and hence the chains couldn't converge, correct?

Also, would you always suggest standardizing the input variables so that Stan can handle them better? Thanks.

Andrew Gelman

Aug 15, 2016, 9:11:56 PM
to stan-...@googlegroups.com
I'd rescale your model so that the parameters are roughly on unit scale.  In a regression context, this can come from rescaling predictors.  It's not just about Stan running better, it's also about being able to assign reasonable prior distributions and get the most out of your data.
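One way to do this rescaling inside the model itself is a transformed data block that standardizes each predictor column. This is a sketch only: the fitted coefficients are then on the standardized scale and would need to be converted back for interpretation on the original scale.

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;
  vector[N] y;
}
transformed data {
  matrix[N, K] X_std;
  for (k in 1:K) {
    real mu_k;
    real sd_k;
    mu_k <- mean(col(X, k));
    sd_k <- sd(col(X, k));
    for (n in 1:N)
      X_std[n, k] <- (X[n, k] - mu_k) / sd_k;  // center and scale column k
  }
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // With standardized predictors, unit-scale priors are reasonable.
  beta ~ normal(0, 10);
  sigma ~ normal(0, 10);
  y ~ normal(X_std * beta, sigma);
}
```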

A

Bob Carpenter

Aug 15, 2016, 9:26:10 PM
to stan-...@googlegroups.com
It depends on what the variance is. That shouldn't stop
Stan's no-U-turn sampler from converging, but it
may require more warmup than the default to find the
scale when the parameters are on the order of 4000.

But what you'll find is that if the posterior standard deviation is
more than 500 for the parameter with marginal mode of 4000,
then putting a uniform(-5000,5000) prior will skew the posterior
mean to the low side compared to wider priors.

So as Andrew keeps saying, we really don't recommend these
hard boundaries in priors for just this reason---they don't
do what people think they're going to do. And as I keep
saying, I need to write some case studies to show this.

- Bob

lizzy2016

Aug 16, 2016, 1:02:35 AM
to Stan users mailing list
Thank you very much, Andrew and Bob!