Question about half-Cauchy priors


Kent Holsinger

Dec 5, 2015, 7:09:41 PM
to Stan users mailing list
I am confused about something very simple, and I'm sure the answer to my confusion is in the documentation, but I can't find it.

I'm fitting a relatively simple hierarchical linear regression with a random intercept that is the sum of two random factors.

data {
  int<lower=0> n_obs;
  int<lower=0> n_years;
  int<lower=0> n_indiv;
  int<lower=0> n_sites;
  int<lower=0> n_months;
  matrix[n_years,n_months] ppt;
  matrix[n_years,n_months] tmn;
  vector[n_obs] gi;
  int<lower=0> year[n_obs];
  int<lower=0> indiv[n_obs];
  int<lower=0> site[n_indiv];
  // for index calculations
  vector[n_months] ppt_mean;
  vector[n_months] tmn_mean;
}
parameters {
  vector[n_months] beta_ppt;
  vector[n_months] beta_tmn;
  vector[n_indiv] mu_indiv;
  vector[n_sites] mu_site;
  real beta_0;
  real<lower=1> nu;             // t degrees of freedom (used in the likelihood below)
  real<lower=0> sigma_resid;
  real<lower=0> sigma_indiv;
  real<lower=0> sigma_site;
}
transformed parameters {
  matrix[n_years,n_indiv] mu_year_indiv;
  vector[n_years] mu_year;

  // n_months is number of months of prior weather included in
  // calculating expectation
  //
  // beta_0 incorporated into intercept for mu_year_indiv through mu_indiv
  //
  mu_year <- ppt*beta_ppt + tmn*beta_tmn;
  // expectation for individual j in year i is sum of year
  // and individual effects
  //
  for (i in 1:n_years) {
    for (j in 1:n_indiv) {
      mu_year_indiv[i,j] <- mu_year[i] + mu_indiv[j];
    }
  }
}
model {
  // priors
  //
  beta_0 ~ normal(0.0, 1.0);
  beta_ppt ~ normal(0.0, 1.0);
  beta_tmn ~ normal(0.0, 1.0);
  sigma_resid ~ cauchy(0.0, 2.5);
  sigma_indiv ~ cauchy(0.0, 2.5);
  sigma_site ~ cauchy(0.0, 2.5);

  // likelihood
  //
  for (i in 1:n_indiv) {
    mu_indiv[i] ~ normal(mu_site[site[i]], sigma_indiv);
  }
  mu_site ~ normal(beta_0, sigma_site);
  // individual site x year combinations
  //
  for (i in 1:n_obs) {
    gi[i] ~ student_t(nu, mu_year_indiv[year[i],indiv[i]], sigma_resid);
  }
}

My question concerns the priors for sigma_resid, sigma_indiv, and sigma_site. The reference manual for 2.8.0 says on p. 48 that "the half-Cauchy (coded implicitly in Stan) [is] a good default choice for scale parameters." Does that mean that the explicit Cauchy priors coded above aren't needed? How is this related to the statement on p. 45 that "[t]he default in Stan is to provide uniform (or 'flat') priors over their legal values as determined by their declared constraints"? The latter statement would seem to imply a uniform prior for the sigma's on (0, \infty).

If it makes a difference, I've rescaled all of the covariates and the response variable to a mean of 0 and a standard deviation of 1.

Kent
 

Krzysztof Sakrejda

Dec 5, 2015, 7:51:48 PM
to Stan users mailing list
If you want Cauchy priors you have to specify them explicitly, as you do. The "good default choice" passage is about what a modeler might pick as a sensible default prior, whereas the p. 45 default is about what the code does when no prior is given.

Krzysztof

Kent Holsinger

Dec 5, 2015, 10:16:05 PM
to stan-...@googlegroups.com
Sorry to be dense, but what then does the "(coded implicitly in Stan)" mean?

Kent Holsinger
Professor of Biology

andre.p...@googlemail.com

Dec 6, 2015, 5:00:32 PM
to Stan users mailing list
Kent,

The default in Stan is to provide uniform (or "flat") priors on parameters over their legal values as determined by their declared constraints. A parameter declared without constraints is thus given a uniform prior on (−∞, ∞) by default, whereas a scale parameter declared with a lower bound of zero gets an improper uniform prior on (0, ∞). Both of these priors are improper in the sense that there is no way to formulate a density function for them that integrates to 1 over its support.

With cauchy(0, 2.5) you constrain the scale to plausible values. Gamma distributions used to be the standard choice for scale parameters, but studies have shown that they can create problems, so their use is no longer recommended.

I think the statement refers to the fact that you write an explicit cauchy prior, but the <lower=0> constraint on the parameter implicitly turns it into a half-cauchy.

Andre

Krzysztof Sakrejda

Dec 6, 2015, 6:02:33 PM
to Stan users mailing list


On Sunday, December 6, 2015 at 5:00:32 PM UTC-5, andre.p...@googlemail.com wrote:
> With cauchy(0, 2.5) you constrain the scale to plausible values. Gamma distributions
> used to be the standard choice for scale parameters, but studies have shown that they
> can create problems, so their use is no longer recommended.

Gamma distributions are fine; it's the attempt at having uninformative gamma priors that's silly (the BUGS dgamma(0.001,0.001) idiom).
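To put a rough number on why that idiom is silly: nearly all of the mass of Gamma(0.001, 0.001) sits at astronomically small values. A quick sanity check in plain numpy (outside Stan; numpy parameterizes by scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(0)

# BUGS dgamma(0.001, 0.001): shape = 0.001, rate = 0.001 (numpy scale = 1/rate = 1000).
draws = rng.gamma(shape=0.001, scale=1000.0, size=100_000)

# Despite a mean of 1, the vast majority of draws are effectively zero.
frac_tiny = np.mean(draws < 1e-10)
print(frac_tiny)  # roughly 0.97
```

So about 97% of the prior mass is below 1e-10, which is "uninformative" only in a very misleading sense.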
 

Krzysztof Sakrejda

Dec 6, 2015, 6:20:28 PM
to Stan users mailing list
On Saturday, December 5, 2015 at 10:16:05 PM UTC-5, Kent Holsinger wrote:
> Sorry to be dense, but what then does the "(coded implicitly in Stan)" mean?

HMC (or anything else, for that matter) doesn't do well sampling from a very fat-tailed distribution like the Cauchy, so if you have

sigma ~ cauchy(0, 2.5);

and no data about sigma you might get poor performance. The usual solution is to instead do:

parameters {
  real<lower=-pi()/2,upper=pi()/2> x_raw;
}

transformed parameters {
  real sigma;
  sigma <- 2.5 * fabs(tan(x_raw));  // scale by 2.5 to match cauchy(0, 2.5)
}

model {
  // x_raw ~ uniform(-pi()/2, pi()/2);  // implied by the constraints
}

... and this gives sigma a half-Cauchy prior....
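As a quick numerical sanity check of the trick (plain numpy, not Stan): pushing a uniform angle on (-pi/2, pi/2) through |tan(.)| should reproduce a standard half-Cauchy, whose median is tan(pi/4) = 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Uniform angle on (-pi/2, pi/2), as the constrained x_raw declaration implies.
x_raw = rng.uniform(-np.pi / 2, np.pi / 2, size=200_000)
sigma = np.abs(np.tan(x_raw))  # half-Cauchy(0, 1) draws

# The half-Cauchy(0, 1) median is tan(pi/4) = 1.
print(np.median(sigma))  # close to 1
```

Multiplying the draws by 2.5 gives half-Cauchy(0, 2.5), matching the prior in the model.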

I didn't write the manual, but that's what it sounds like (?)



 

Kent Holsinger

Dec 6, 2015, 7:19:56 PM
to stan-...@googlegroups.com
That helps. Thanks. 

Kent 


Kent Holsinger
Professor of Biology

andre.p...@googlemail.com

Dec 6, 2015, 8:10:56 PM
to Stan users mailing list

> Gamma distributions are fine; it's the attempt at having uninformative gamma priors
> that's silly (the BUGS dgamma(0.001,0.001) idiom).

Gamma distributions create problems near 0, at the open boundary of (0, ∞):

Andre
 sort(rgamma(n = 1000, shape = 0.01, rate = 0.01))
   [1]  0.000000e+00  0.000000e+00 4.255981e-295 3.613961e-281 6.495086e-280 5.245848e-243 7.648001e-231 6.213213e-226 2.992256e-216
  [10] 1.177523e-212 8.946648e-208 6.991148e-189 1.766772e-188 1.106023e-185 4.415470e-184 2.080586e-182 1.654261e-180 3.669441e-174
  [19] 3.113691e-165 5.595780e-163 1.998831e-160 7.104659e-160 7.526827e-160 3.950458e-157 9.700742e-157 1.483864e-156 4.532592e-155
  [28] 2.605218e-152 5.262874e-147 1.045855e-145 1.105727e-142 4.010348e-142 1.085546e-141 3.595015e-140 6.498320e-140 3.261144e-139
  [37] 4.251182e-139 9.160886e-139 1.480462e-138 1.978534e-138 4.114618e-138 1.585053e-137 6.075135e-137 1.066873e-136 1.381337e-133


Christopher Peterson

Dec 7, 2015, 11:48:34 AM
to stan-...@googlegroups.com
I'm pretty sure that the "coded implicitly in Stan" bit just refers to the fact that a half-Cauchy is coded as a Cauchy on a parameter declared with a lower bound of 0.

parameters {
  real<lower=0> cauchyOne;
  real<lower=0,upper=pi()/2> cauchyTwo_raw;  // non-centered version, for more efficient sampling
}
transformed parameters {
  real cauchyTwo;
  cauchyTwo <- tan(cauchyTwo_raw);  // multiply by scale and add center, if needed
}
model {
  cauchyOne ~ cauchy(0,1);
  // cauchyTwo gets its implicit uniform prior on the raw scale and is transformed
}

Kent Holsinger

Dec 7, 2015, 5:52:05 PM
to stan-...@googlegroups.com
Thanks. That makes a lot of sense.

Kent

On 12/7/15 11:48 AM, Christopher Peterson wrote:
> I'm pretty sure that the "coded implicitly in Stan" bit refers to the fact that a half-Cauchy is coded as a Cauchy with a lower bound of 0.
Bob Carpenter

Dec 7, 2015, 6:01:22 PM
to stan-...@googlegroups.com
All I meant is that if you have a constraint like

real<lower=0, upper=1> theta;
real<lower=0> sigma;
real beta;

then theta is uniform(0,1), sigma is uniform(0,inf), and
beta is uniform(-inf,inf). By "implicit," I just meant
that there are no terms added to the log density on the
natural scale (that is, the scale where sigma is in (0, inf)).

By "explicit", I just meant a statement in the model like

sigma ~ cauchy(0, 1);

This adds

log cauchy(sigma | 0, 1)

to the density. The uniforms don't add anything --- that's
the sense in which they're implicit.

Now everything's a bit trickier, because what happens under
the hood is that constrained variables get transformed to
(-inf, inf). For example, sigma is log transformed to (-inf, inf).
The inverse transform is exp : (-inf, inf) -> (0, inf).
Its Jacobian adjustment is exp'(x) = exp(x), so the log
Jacobian is log exp(x) = x. So x (the unconstrained value)
gets added to the log density. This is what makes sigma uniform
on (0, inf). We do the change-of-variables adjustment under
the hood. There's a chapter in the manual on constraint transforms
that goes through the gory details for all our constrained
variable types.
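That Jacobian claim is easy to verify numerically. A small numpy sketch, substituting a proper half-Cauchy(0, 1) density for the improper uniform: if sigma = exp(x) and we multiply the density by the Jacobian exp(x), the result should still integrate to 1 on the unconstrained scale.

```python
import numpy as np

# Transform sigma = exp(x): the density of x is p_sigma(exp(x)) * exp(x),
# i.e. the log density picks up the log-Jacobian log(exp(x)) = x.

def halfcauchy_pdf(s):
    # half-Cauchy(0, 1) density on (0, inf)
    return 2.0 / (np.pi * (1.0 + s ** 2))

x = np.linspace(-30.0, 10.0, 400_000)        # unconstrained scale
p_x = halfcauchy_pdf(np.exp(x)) * np.exp(x)  # pdf times Jacobian exp(x)

dx = x[1] - x[0]
print(np.sum(p_x) * dx)  # very close to 1, so the adjustment is correct
```

Dropping the `* np.exp(x)` factor makes the integral come out wrong, which is exactly the bug the automatic adjustment prevents.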

- Bob





