A counterintuitive example of change of variables


Marco Inacio

Jul 6, 2014, 1:49:01 AM
to stan-...@googlegroups.com
Suppose I have a gamma distribution as the prior for the precision parameter of a Gaussian likelihood (the conjugate prior). I start by writing the following model:

parameters {
  real <lower=0> tau; //precision
}
transformed parameters {
  real <lower=0> sigma; //standard deviation
  sigma <- 1/sqrt(tau);
}
model {
  1.0 ~ normal(0, sigma); //likelihood
  tau ~ gamma(0.1,  0.1); //prior
}




Everything is fine up to here. But now, the following Stan program would also be correct, right?

parameters {
  real <lower=0> tau; //precision
}
transformed parameters {
  real <lower=0> sigma; //standard deviation
  sigma <- 1/sqrt(tau);
}
model {
  1.0 ~ normal(0, sigma);
  sigma ~ inv_gamma(0.1,  0.1);
}




If so, then the following one would need the log absolute determinant of the Jacobian of the transform, right?

parameters {
  real <lower=0> sigma; //standard deviation
}
model {
  1.0 ~ normal(0, sigma);
  sigma ~ inv_gamma(0.1,  0.1);
}


If my idea is correct, the second Stan program is correct but generates a warning, and the third one is wrong but lacks a warning. I know why it works this way, and I'm not complaining; I just want someone to confirm my idea or tell me I'm wrong.

Luc Coffeng

Jul 6, 2014, 11:35:11 AM
to stan-...@googlegroups.com
Hi Marco,

I always have to think hard when it comes to changes of variables. So if anybody wants to correct me in turn, please do :).

First of all, it's sigma^2 that follows an inverse gamma distribution, not sigma itself. Second, you should perform an adjustment of the posterior in the second example, as you are defining a density for a transformed parameter. So the second one should be (somebody correct me if I'm wrong):


parameters {
  real <lower=0> tau;  //precision
}
transformed parameters {
  real <lower=0> sigma2;  // variance

  real <lower=0> sigma;  //standard deviation
  sigma2 <- 1/tau;
  sigma <- sqrt(sigma2);
}
model {
  1.0 ~ normal(0, sigma);
  sigma2 ~ inv_gamma(0.1,  0.1);
  increment_log_prob(-log(tau));  // adjust for change of variables 1 / tau [log(f') = -2 * log(tau)]
}


The third one is indeed wrong, but only because you are plugging a variance (sigma) into the normal distribution, rather than a standard deviation. So the third should be:

parameters {
  real <lower=0> sigma2;  // variance
}
transformed parameters {
  real <lower=0> sigma;  // standard deviation
  sigma <- sqrt(sigma2);
}
model {
  1.0 ~ normal(0, sigma);
  sigma2 ~ inv_gamma(0.1,  0.1);
}

No adjustment of the posterior is necessary here as we are sampling before transforming.
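
(To spell out the general rule at work here: the prior is attached directly to the declared parameter sigma2, and sigma = sqrt(sigma2) appears only downstream in the likelihood, so the posterior is

  p(sigma2 | y) ∝ p(y | sigma = sqrt(sigma2)) * p(sigma2)

and no density is ever re-expressed in new coordinates. A Jacobian term only arises when a density is attached, via ~, to a nonlinear function of a declared parameter.)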

Bob Carpenter

Jul 6, 2014, 2:46:18 PM
to stan-...@googlegroups.com

My mailer config's messed up and I keep fat-fingering my messages into
the ether with Luxsci's terrible web interface. Let's try again.

On July 6, 2014 01:48:59 AM EDT, Marco Inacio <marcoig...@gmail.com> wrote:

> Suppose I have a gamma distribution as prior for the precision parameter of
> a Gaussian likelihood (conjugate prior). I start writing the following
> model:
>
> parameters {
> real <lower=0> tau; //precision
> }
> transformed parameters {
> real <lower=0> sigma; //standard deviation
> sigma <- 1/sqrt(tau);
> }
> model {
> 1.0 ~ normal(0, sigma); //likelihood
> tau ~ gamma(0.1, 0.1); //prior
> }
>
>
>
>
> Everything is fine till here,

Yes, this is fine. tau has a gamma distribution and 1/tau = sigma^2
has an inverse gamma distribution.

> but now, the following Stan program would
> also be correct, right(?):
>
> parameters {
> real <lower=0> tau; //precision
> }
> transformed parameters {
> real <lower=0> sigma; //standard deviation
> sigma <- 1/sqrt(tau);
> }
> model {
> 1.0 ~ normal(0, sigma);
> sigma ~ inv_gamma(0.1, 0.1);
> }
>

The one above needs the Jacobian adjustment because sigma is
a non-linearly transformed param on the left of a ~ statement.

>
>
>
> If so, then the following one would need the log absolute determinant of
> the Jacobian of the transform, right(?):
>
> parameters {
> real <lower=0>sigma; //standard deviation
> }
> model {
> 1.0 ~ normal(0, sigma);
> sigma ~ inv_gamma(0.1, 0.1);
> }

This one is fine. sigma gets an inverse gamma distribution. If you follow
through the definition of the inverse gamma, you'll see that the Jacobian for
the inversion is built into its density function.
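
Spelling that out (a standard change-of-variables calculation): if tau ~ gamma(a, b), with density

  p(tau) = b^a / Gamma(a) * tau^(a-1) * exp(-b * tau),

then s = 1/tau has density

  p(s) = p(tau = 1/s) * |d tau / d s|
       = b^a / Gamma(a) * s^(-(a-1)) * exp(-b / s) * s^(-2)
       = b^a / Gamma(a) * s^(-a-1) * exp(-b / s),

which is exactly the inv_gamma(a, b) density --- the s^(-2) factor is the Jacobian, already baked in.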

It's not the same model, though, because in the above it was sigma^2 that
had an inverse gamma distribution. So you probably want:

parameters {
  real<lower=0> sigma_squared;
}
transformed parameters {
  real<lower=0> sigma;
  sigma <- sqrt(sigma_squared);
}
model {
  1.0 ~ normal(0, sigma);
  sigma_squared ~ inv_gamma(0.1, 0.1);
}

>
> If my idea is correct, the second Stan program is correct, but generates a
> warning, and the third one is wrong, but lacks a warning. I know the reason
> for this to be like that, and I'm not complaining, I just want someone to
> confirm my idea or tell me I'm wrong.

See above.

- Bob

Ben Goodrich

Jul 6, 2014, 2:58:18 PM
to stan-...@googlegroups.com
On Sunday, July 6, 2014 11:35:11 AM UTC-4, Luc Coffeng wrote:
I always have to think hard when it comes to changes of variables. So if anybody wants to correct me in turn, please do :).

I think Luc has the right code. It does seem very challenging to explain change-of-variables to Stan users, so here is another attempt. The main point is that the priors must always pertain to the variables declared in the parameters block; the only issue is whether the priors are expressed directly on the parameters or indirectly via functions of those parameters.

[Technically, the priors actually pertain to the unbounded variables that are transformed into the variables declared in the parameters block, but Stan handles those (and only those) change-of-variables automatically, which is one reason why the log-posterior calculated by Stan differs from what you would intuitively calculate in R or Python. Anyway, the user is only responsible for direct or indirect priors on the possibly bounded variables declared in the parameters block].

Direct priors are pretty straightforward, and it is often possible to respecify your model so that the variables that you have informative prior information about are declared in the parameters block and functions thereof are used in the log-likelihood. In that case, you can just specify your priors, and there is no need for a change-of-variables adjustment.
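
As a minimal sketch of such a respecification (a hypothetical model with made-up priors, not code from this thread): declare the quantity you have prior information about, and let functions of it appear only downstream:

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;  // the scale we hold prior beliefs about
}
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);   // direct prior on a declared parameter: no Jacobian
  y ~ normal(mu, sigma);  // sigma (or any function of it) may be used here
}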

Direct priors only get confusing when you declare a primitive variable in the parameters block that has no substantive meaning in order to sample a transformed parameter more efficiently. For example, you may want to put a Cauchy prior with a median of zero and a scale of gamma on a regression coefficient, but unless the data are fairly informative, the marginal posterior distribution of that regression coefficient may have heavy tails that are difficult to sample from efficiently. So, it can be more efficient to do something like

parameters {
  real z;
}
transformed parameters {
  real beta;
  // gamma (the Cauchy scale) is assumed to be declared elsewhere, e.g. as data
  beta <- gamma * tan( pi() * (Phi_approx(z) - 0.5) );
}
model {
  // use beta in the log-likelihood ...
  z ~ normal(0,1); // implies: beta ~ cauchy(0,gamma)
}

There is no need for a change-of-variables adjustment because the standard normal prior is expressed directly on z, but z is only introduced because it is easier for NUTS to do its thing in a space that is approximately standard normal. What you care about is beta, which is related to z by an opaque formula (that happens to be the inverse-CDF transformation for the Cauchy distribution). But, in order for Stan to work really well in a non-trivial set of situations, you often have to gunk up your Stan program with primitives like z, which is why I favor writing a // implies:  comment whenever you do that.
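
(For reference, the formula is opaque but standard: the Cauchy(0, gamma) CDF is F(x) = 1/2 + atan(x / gamma) / pi, so its inverse is F_inv(u) = gamma * tan(pi * (u - 0.5)); with u = Phi_approx(z) approximately uniform on (0, 1) for standard normal z, beta is approximately a Cauchy(0, gamma) draw.)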

Indirect priors are legal but confusing and should be avoided if it is possible to express your program using direct priors. For the sake of example, suppose you wanted efficiency but thought the above model block was too unintuitive for your audience. So, you wanted to write

parameters {
  real z;
}
transformed parameters {
  real beta;
  // gamma (the Cauchy scale) is again assumed to be declared elsewhere
  beta <- gamma * tan( pi() * (Phi_approx(z) - 0.5) );
}
model {
  // use beta in the log-likelihood ...
  beta ~ cauchy(0,gamma);
  // wrong due to absence of change-of-variables adjustment
}

This model block does not imply that z is distributed standard normal a priori; thus Stan would not be sampling efficiently and moreover, it would not be sampling from the posterior distribution you intended (i.e. the posterior distribution that would arise if you eliminated z and declared beta in the parameters block). In order to restore coherence, you would have to manually increment_log_prob() by the logarithm of the absolute value of the derivative of beta with respect to z and thus you would have to invoke the chain rule three times without making a typo. And even if you did that correctly, it would still be slower to execute than if you had just written z ~ normal(0,1) as above. Hence, the coherent model block would be even more gunked up with no benefit.
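
For concreteness, here is a sketch of what that adjustment would look like. To keep the derivative tractable, the sketch uses the exact Phi() in place of Phi_approx() (so the derivative of the CDF is just the standard normal density; with Phi_approx() you would also need the derivative of the approximation), and it assumes gamma is declared elsewhere:

parameters {
  real z;
}
transformed parameters {
  real beta;
  beta <- gamma * tan( pi() * (Phi(z) - 0.5) );  // exact Phi in this sketch
}
model {
  // use beta in the log-likelihood ...
  beta ~ cauchy(0, gamma);
  // change-of-variables adjustment; by the chain rule,
  // log |d beta / d z| = log(gamma) + log(pi)
  //                      - 2 * log(cos(pi() * (Phi(z) - 0.5)))
  //                      + log(standard normal density at z)
  increment_log_prob(log(gamma) + log(pi())
                     - 2 * log(cos(pi() * (Phi(z) - 0.5)))
                     + normal_log(z, 0, 1));
}

With the adjustment included, this is equivalent to the direct z ~ normal(0,1) version, which is exactly the point: the indirect version buys you nothing --- same posterior, more work per leapfrog step.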

I think the parser would warn you about having to manually increment_log_prob() in this case, but there can be "false" positives when the parser sees a sampling statement on a function of the parameters but isn't programmed to recognize that the transformation is linear (including an identity transformation such as col(Y, j)). Under a linear transformation, manually incrementing the log probability by the (constant) log Jacobian just slows everything down and does not affect the probability that a proposal will be accepted.
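
As a toy illustration of such a "false" positive (hypothetical code, not from this thread): a pure shift is linear, so the warning can be ignored:

parameters {
  real mu;
}
transformed parameters {
  real mu_centered;
  mu_centered <- mu - 5;  // linear transform: d(mu_centered)/d(mu) = 1
}
model {
  mu_centered ~ normal(0, 1);  // parser may warn, but log|Jacobian| = log(1) = 0
                               // is constant, so no adjustment is needed
}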

All of this is even less transparent with multivariate and/or many-to-one transformations, but the basic principle still applies: specify your program with direct priors if at all possible, and if that is not possible, make sure to increment_log_prob() correctly.

Ben

Bob Carpenter

Jul 6, 2014, 3:02:42 PM
to stan-...@googlegroups.com

Sorry -- didn't see Luc's answer before I responded. The online mailer
also doesn't do ^&*() threading.

The change of variables is from tau to sigma2 = 1/tau, so the
derivative of the (inverse) transform tau^(-1) is

  d(sigma2)/d(tau) = -tau^(-2)

  log(| -tau^(-2) |) = -2 * log(tau)

so you need

  increment_log_prob(-2 * log(tau));

not just

> increment_log_prob(-log(tau)); // adjust for change of variables 1 / tau

- Bob

On July 6, 2014 11:35:11 AM EDT, Luc Coffeng <lucco...@gmail.com> wrote:

> Hi Marco,
>
> I always have to think hard when it comes to changes to variables. So if
> anybody wants to correct me in turn, please do :).
>
> First of all, it's sigma^2 that follows an inverse gamma distribution, not
> sigma itself. Second, you should perform an adjustment of the posterior in
> the second example, as you are defining a density for a transformed
> parameter. So the second one should be (somebody correct me if I'm wrong):
>
> parameters {
> real <lower=0> tau; //precision
> }
> transformed parameters {
> real <lower=0> sigma2; // variance
> real <lower=0> sigma; //standard deviation
> sigma2 <- 1/tau;
> sigma <- sqrt(sigma2);
> }
> model {
> 1.0 ~ normal(0, sigma);
> sigma2 ~ inv_gamma(0.1, 0.1);
> increment_log_prob(-log(tau)); // adjust for change of variables 1 / tau
> [log(f') = -2 * log(tau)]
> }
>
>
> The third one is indeed wrong, but only because you are plugging a variance
> (sigma) into the normal distribution, rather than a standard deviation. So
> the third should be:
>
> parameters {
> real <lower=0>sigma2; // variance
> }
> transformed parameters {
> real <lower=0> sigma; // standard deviation
> sigma <- sqrt(sigma2);
> }
> model {
> 1.0 ~ normal(0, sigma);
> sigma2 ~ inv_gamma(0.1, 0.1);
> }
>
> No adjustment of the posterior is necessary here as we are sampling before
> transforming.
>

Luc Coffeng

Jul 6, 2014, 9:01:44 PM
to stan-...@googlegroups.com, ca...@alias-i.com
Thanks for correcting me, Bob! In a lapse of mind, I treated the factor of 2 as a constant that could be dropped, which it obviously can't be.

Marco Inacio

Jul 7, 2014, 3:01:55 PM
to stan-...@googlegroups.com

If you follow
through the definition of the inverse gamma, you'll see that the Jacobian for
the inversion is built into its density function. 

Now, this is something I really wasn't expecting. I thought that:


parameters {
  real <lower=0> tau; //precision
}
transformed parameters {
  real <lower=0> sigma2; //variance
  sigma2 <- 1/tau;
}
model {
  1.0 ~ normal(0, sigma2); //likelihood
  tau ~ gamma(0.1,  0.1); //prior
}

was completely equivalent to:


parameters {
  real <lower=0> tau; //precision
}
transformed parameters {
  real <lower=0> sigma2; //variance
  sigma2 <- 1/tau;
}
model {
  1.0 ~ normal(0, sigma2); //likelihood
  sigma2 ~ inv_gamma(0.1,  0.1); //prior
}

despite the warning. Do they add different things to lp__?

Marco Inacio

Jul 7, 2014, 3:01:57 PM
to stan-...@googlegroups.com
Hi y'all, thanks for the input. Somehow my email filters got messed up and I wasn't receiving the stan-users mail in the correct folder, so I only just now noticed the many answers to this thread.



On 14-07-06 12:35 PM, Luc Coffeng wrote:
Hi Marco,

I always have to think hard when it comes to changes of variables.
Hi, me too.



So if anybody wants to correct me in turn, please do :).

First of all, it's sigma^2 that follows an inverse gamma distribution, not sigma itself. Second, you should perform an adjustment of the posterior in the second example, as you are defining a density for a transformed parameter. So the second one should be (somebody correct me if I'm wrong):

Yes, I messed up here. Luckily, it seems everybody got what I meant =]



parameters {
  real <lower=0> tau;  //precision
}
transformed parameters {
  real <lower=0> sigma2;  // variance
  real <lower=0> sigma;  //standard deviation
  sigma2 <- 1/tau;
  sigma <- sqrt(sigma2);
}
model {
  1.0 ~ normal(0, sigma);
  sigma2 ~ inv_gamma(0.1,  0.1);
  increment_log_prob(-log(tau));  // adjust for change of variables 1 / tau [log(f') = -2 * log(tau)]
}


The third one is indeed wrong, but only because you are plugging a variance (sigma) into the normal distribution, rather than a standard deviation. So the third should be:

parameters {
  real <lower=0> sigma2;  // variance
}
transformed parameters {
  real <lower=0> sigma;  // standard deviation
  sigma <- sqrt(sigma2);
}
model {
  1.0 ~ normal(0, sigma);
  sigma2 ~ inv_gamma(0.1,  0.1);
}

No adjustment of the posterior is necessary here as we are sampling before transforming.

Marco Inacio

Jul 7, 2014, 3:11:06 PM
to stan-...@googlegroups.com
Hi, thanks very much for the valuable input.

Your example of a difficult posterior is particularly interesting; that's something I've always found hard to deal with in Stan (or any other software/method). I'd suggest adding more examples of dealing with difficult posteriors to the manual (or filing an issue on GitHub, for future reference).

Bob Carpenter

Jul 7, 2014, 3:11:26 PM
to stan-...@googlegroups.com
Given that you have Stan, you can try all of these
models and look at the answers. Be sure to run enough
samples to get good estimates of the tail intervals and
variances.

- Bob

Bob Carpenter

Jul 7, 2014, 3:12:55 PM
to stan-...@googlegroups.com
Feel free to add examples to the manual or file concrete
requests on the issue tracker --- I'll lose anything that's just
mentioned on the mailing list.

- Bob

Marco Inacio

Jul 7, 2014, 4:53:13 PM
to stan-...@googlegroups.com


On 14-07-07 04:11 PM, Bob Carpenter wrote:
> Given that you have Stan, you can try all of these
> models and look at the answers. Be sure to run enough
> samples to get good estimates of the tail intervals and
> variances.
>
> - Bob

Yes, this is the first thing I should have done!

I've confirmed what you all said and came to understand things better:
parameterizing a Gaussian by its precision with a gamma prior
expresses the same uncertainty as parameterizing it by its
variance with an inverse-gamma prior. Sorry to state the obvious; I
guess that's what the inverse gamma is for in the first place.

Thanks very much.


And here's the code and the results, in case anyone is interested:

require(rstan)

model_code1 <- "parameters {
  real <lower=0> tau; // precision
  real x;
}
transformed parameters {
  real <lower=0> sigma2; // variance
  real <lower=0> sigma; // standard deviation
  sigma2 <- 1/tau;
  sigma <- sqrt(sigma2);
}
model {
  x ~ normal(0, sigma); // likelihood
  tau ~ gamma(1, 1); // prior
}"



model_code2 <- "parameters {
  real <lower=0> tau; // precision
  real x;
}
transformed parameters {
  real <lower=0> sigma2; // variance
  real <lower=0> sigma; // standard deviation
  sigma2 <- 1/tau;
  sigma <- sqrt(sigma2);
}
model {
  x ~ normal(0, sigma);
  sigma2 ~ inv_gamma(1, 1);
}"


model_code3 <- "parameters {
  real <lower=0> sigma2; // variance
  real x;
}
transformed parameters {
  real <lower=0> sigma; // standard deviation
  sigma <- sqrt(sigma2);
}
model {
  x ~ normal(0, sigma);
  sigma2 ~ inv_gamma(1, 1);
}"


stan(model_code=model_code1, iter=100000) -> fit1
stan(model_code=model_code2, iter=100000) -> fit2
stan(model_code=model_code3, iter=100000) -> fit3


> fit1
Inference for Stan model: model_code1.
4 chains, each with iter=1e+05; warmup=50000; thin=1;
post-warmup draws per chain=50000, total post-warmup draws=2e+05.

          mean se_mean     sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
tau       1.01    0.01   1.01  0.03  0.29  0.70  1.41  3.71 28693    1
x        -0.01    0.03   3.01 -4.31 -0.82 -0.01  0.80  4.17 10804    1
sigma2   11.01    2.72 608.53  0.27  0.71  1.42  3.41 39.36 50107    1
sigma     1.75    0.02   2.82  0.52  0.84  1.19  1.85  6.27 12727    1
lp__     -2.36    0.01   1.48 -6.36 -2.92 -1.90 -1.31 -0.93 15222    1


> fit2
Inference for Stan model: model_code2.
4 chains, each with iter=1e+05; warmup=50000; thin=1;
post-warmup draws per chain=50000, total post-warmup draws=2e+05.

          mean se_mean   sd  2.5%   25%  50%  75% 97.5% n_eff Rhat
tau       3.00    0.01 1.72  0.63  1.74 2.68 3.92  7.19 70603    1
x         0.00    0.00 0.71 -1.41 -0.41 0.00 0.41  1.40 54350    1
sigma2    0.50    0.00 0.50  0.14  0.26 0.37 0.57  1.60 43959    1
sigma     0.66    0.00 0.24  0.37  0.51 0.61 0.76  1.26 51083    1
lp__     -0.26    0.01 1.16 -3.34 -0.69 0.10 0.56  0.86 41528    1

> fit3
Inference for Stan model: model_code3.
4 chains, each with iter=1e+05; warmup=50000; thin=1;
post-warmup draws per chain=50000, total post-warmup draws=2e+05.

          mean se_mean     sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
sigma2   17.51    8.42 872.01  0.27  0.71  1.41  3.38 36.93 10724    1
x         0.09    0.06   4.33 -4.16 -0.80  0.00  0.80  4.28  5314    1
sigma     1.76    0.05   3.79  0.52  0.84  1.19  1.84  6.08  6399    1
lp__     -2.34    0.01   1.47 -6.27 -2.90 -1.89 -1.30 -0.93 14549    1


(The draws of x from fit1 and fit3 were pretty similar, even in the tails.)
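
One way to see exactly what model 2 sampled (a back-of-the-envelope check): without the Jacobian term, the prior contribution to lp__ is the inv_gamma log density evaluated at sigma2 = 1/tau,

  inv_gamma(1/tau | a, b) ∝ (1/tau)^(-a-1) * exp(-b * tau) = tau^(a+1) * exp(-b * tau),

which, viewed as a function of tau, is the kernel of a gamma(a + 2, b) distribution. With a = b = 1 that is gamma(3, 1), with mean 3 and sd sqrt(3) ≈ 1.73 --- matching tau's reported mean of 3.00 and sd of 1.72 in fit2. (Since x is simulated inside the model rather than observed, tau's marginal is just this implied prior.)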

Luc Coffeng

Jul 7, 2014, 9:22:04 PM
to stan-...@googlegroups.com
Hi Marco,

You should add a correction for the change of variables to your second model, where you put a density on sigma2, a transformation of the parameter tau. When Bob said the adjustment is already part of the inv_gamma distribution, he meant the case where the inverse-gamma-distributed quantity is itself the declared parameter (compare the probability density functions of the gamma and inverse-gamma distributions and you will see that the inverse gamma contains the Jacobian adjustment for the inversion). The adjustment to the posterior in model 2 should be:

increment_log_prob(-2 * log(tau));
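
Put together, a corrected version of model 2 would be (a sketch combining the code above with this adjustment):

parameters {
  real <lower=0> tau;  // precision
  real x;
}
transformed parameters {
  real <lower=0> sigma2;  // variance
  real <lower=0> sigma;   // standard deviation
  sigma2 <- 1/tau;
  sigma <- sqrt(sigma2);
}
model {
  x ~ normal(0, sigma);
  sigma2 ~ inv_gamma(1, 1);
  increment_log_prob(-2 * log(tau));  // Jacobian of sigma2 = 1/tau
}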


- Luc

Marco Inacio

Jul 8, 2014, 12:30:58 AM
to stan-...@googlegroups.com
Yes, I know. These are the same 3 examples I used in the first email (with some small improvements); I just ran them to confirm the conclusions were correct.