Intuitive example of Jacobian adjustment


Krzysztof Sakrejda

Oct 20, 2015, 4:43:28 PM
to Stan users mailing list
I recently did a brief intro to Stan and I tried to come up with an example of
a model that needs a Jacobian adjustment due to a transformation and can
be explained in fairly intuitive terms.  The density I used was one with a normally
distributed radius parameter and a uniformly distributed angle parameter.  The first
Stan model was:

data {
  real r_mu;
  real r_sd;
}

parameters {
  real x;
  real y;
}

model {
  real r;
  r <- sqrt(x^2 + y^2);
  r ~ normal(r_mu, r_sd); # no Jacobian adjustment: samples the wrong density
}

This version of the model, when looking at the samples of r_mu, estimates something like
8.5 instead of the 8 used to simulate the data.  With the Jacobian adjustment the model was:

model {
  real r;
  matrix[2,2] J;
  real lad_J;
  r <- sqrt(x^2+y^2);
  r ~ normal(r_mu, r_sd);

  # Sigh, this model needs more maths:
  J[1,1] <- x/sqrt(x^2+y^2);
  J[1,2] <- y/sqrt(x^2+y^2);
  J[2,1] <- (x^2*(x^2+y^2)^(-3.0/2.0) - (x^2+y^2)^(-1.0/2.0)) / sqrt(1-(x^2)/(x^2+y^2));
  J[2,2] <- (x*y) / ((x^2+y^2)^(3.0/2.0) * sqrt(1-(x^2)/(x^2+y^2)));
  lad_J <- log(fabs(J[1,1]*J[2,2] - J[1,2]*J[2,1]));
  increment_log_prob(lad_J);
}

This nails r_mu (and the difference between the two models is clearly visible with something like 4k samples of r_mu).  It's sort of intuitive that without the adjustment the sampler would spend too much time on the outer ring and too little on the inner ring, leading to an over-estimate of r.
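
As a quick numeric sanity check (an editorial sketch in Python/NumPy, not part of the original post): the 2x2 determinant built from the entries above collapses to 1/r, so the whole adjustment is just -log(r).

```python
import numpy as np

def log_abs_det_jacobian(x, y):
    """log |det J| for (x, y) -> (r, theta), using the same entries as the Stan model."""
    r2 = x**2 + y**2
    J = np.array([
        [x / np.sqrt(r2), y / np.sqrt(r2)],
        [(x**2 * r2**-1.5 - r2**-0.5) / np.sqrt(1 - x**2 / r2),
         (x * y) / (r2**1.5 * np.sqrt(1 - x**2 / r2))],
    ])
    return np.log(abs(np.linalg.det(J)))

# The determinant simplifies to 1/r, so the adjustment is -log(r):
x, y = 3.0, 4.0
r = np.hypot(x, y)
print(np.isclose(log_abs_det_jacobian(x, y), -np.log(r)))  # True
```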

I guess the "log(x) ~ normal()" example might be simpler in some ways, but this one had the benefit of forcing me to figure out which derivatives go where. 

Do you have a favorite model for explaining the whole Jacobian adjustment thing? 

Bob Carpenter

Oct 20, 2015, 5:37:02 PM
to stan-...@googlegroups.com
The only problem with circular distributions is that their
manifold doesn't map continuously to R^n. So I might stay
away from that in a simple example.

There's a related example on making a uniform distribution on the
circle I did in the shapes BUGS example in Vol3 (if I remember right).

I'd start with univariate then go onto something multivariate.
Transforms to and from the unit interval are tightly related to
CDFs, but I'm not sure that makes it easier or harder for novices.
Most of our multivariates have diagonal or triangular Jacobians,
though. Then you can go onto something needing a full Jacobian calc.

I couldn't quite follow the function whose Jacobian you give.
Is it (x,y) -> (r,theta) where theta is the angle? I always
find it much easier to understand these things when I have
two complete models, not fragments and everything's fully defined.

It'd be a good idea to assign (x^2 + y^2) to a local variable
and reuse it.

- Bob

Krzysztof Sakrejda

Oct 20, 2015, 5:51:14 PM
to stan-...@googlegroups.com
On Tuesday, October 20, 2015 at 5:37:02 PM UTC-4, Bob Carpenter wrote:
The only problem with circular distributions is that their
manifold doesn't map continuously to R^n.  So I might stay
away from that in a simple example.

Yeah... I can't figure out if there are any implications for the path taken by the
sampler.  Granted, I haven't spent much time on it.
 
There's a related example on making a uniform distribution on the
circle I did in the shapes BUGS example in Vol3 (if I remember right
here).

I'd start with univariate then go onto something multivariate.
Transforms to and from the unit interval are tightly related to
CDFs, but I'm not sure that makes it easier or harder for novices.
Most of our multivariates have diagonal or triangular Jacobians,
though.  Then you can go onto something needing a full Jacobian calc.

I couldn't quite follow the function whose Jacobian you give.
Is it (x,y) -> (r,theta) where theta is the angle?  I always
find it much easier to understand these things when I have
two complete models, not fragments and everything's fully defined.

Yep, so the derivatives are dr/dx, dtheta/dx, etc...
 

Michael Betancourt

Oct 21, 2015, 9:39:22 AM
to stan-...@googlegroups.com
The problem with Jacobians is that to really motivate them you have to
get into the non-uniqueness of densities, i.e. there is some abstract
probabilistic system that we can represent with infinitely many
parameterizations.  Samples, however, depend only on that abstract
probabilistic system and not on any particular density, so they must
remain invariant under reparameterization.

One thing to do would be to take a density, p(theta), and some transform,
f(theta), and compare

1. Samples drawn from p(theta) then mapped by f, f_{i} = f(theta_{i})
2. Samples drawn from p(f(theta)).

They should not be the same, as easily established by comparing
histograms of the samples, or a histogram of (1) against the
density p(f(theta)).  Then move on to the properly transformed density,

p(f(theta)) |df/dtheta|,

and show that they are the same.
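
This recipe can be checked without plots. An editorial Python/NumPy sketch (the choice of p = N(0,1) and f = exp is mine): map normal draws through exp and confirm they match the Jacobian-adjusted density, whose CDF at t is Phi(log t).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=200_000)   # samples from p(theta) = N(0, 1)
y = np.exp(theta)                  # (1) mapped samples f_i = exp(theta_i)

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Proper transformed density: p(log y) * |d(log y)/dy| = N(log y) / y,
# whose CDF is Phi(log t).  The naive density N(log y), with no Jacobian,
# has a different CDF and cannot match the mapped samples.
t = 2.0
empirical = np.mean(y <= t)
print(abs(empirical - Phi(math.log(t))) < 0.01)  # True: matches the adjusted density
```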

I’m not sure if there’s anything more intuitive, as you’re always
working with two spaces (the original and the transformed) and
there’s always going to be confusion between the two.


Bob Carpenter

Oct 21, 2015, 1:42:42 PM
to stan-...@googlegroups.com
This is exactly how I finally understood them (thanks, Michael!)

I completely agree that simulations really help. Both ways.
Simulate x ~ uniform(0,1), plot x and note it's uniform, plot
logit(x) and note it's not. In fact, it's the logistic, because
the logit function is the inverse CDF for the logistic distribution
(that makes inverse logit the CDF). You can do the same thing
with the inv_Phi() function (the inverse unit normal CDF) and the normal.
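
The uniform/logit simulation takes a few lines; a Python/NumPy sketch (editorial, not from the thread) that verifies logit of a uniform is logistic-distributed by checking its CDF:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=200_000)   # x ~ uniform(0, 1)
z = np.log(x / (1 - x))         # logit(x)

# The logistic CDF is the inverse-logit, so P(logit(X) <= t) should equal
# 1 / (1 + exp(-t)), i.e. logit(uniform) is logistic-distributed.
t = 1.5
print(abs(np.mean(z <= t) - 1 / (1 + np.exp(-t))) < 0.01)  # True
```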

I really like Krzysztof's example of the Stan program that samples
from a different density without the Jacobian adjustment. We're
also doing that in some of our education examples --- trying to
show what happens when things go wrong, how to diagnose it, and how
to debug the model and fix the problem.

- Bob

Krzysztof Sakrejda

Oct 23, 2015, 12:09:02 PM
to Stan users mailing list
On Wednesday, October 21, 2015 at 1:42:42 PM UTC-4, Bob Carpenter wrote:
This is exactly how I finally understood them (thanks, Michael!)

I completely agree that simulations really help.  Both ways.
Simulate x ~ uniform(0,1), plot x and note it's uniform, plot
logit(x) and note it's not.  In fact, it's the logistic, because
the logit function is the inverse CDF for the logistic distribution
(that makes inverse logit the CDF).  You can do the same thing
with the inv_Phi() function (the inverse unit normal CDF) and the normal.

Yeah, the 1-D examples are probably easier to explain, although the
2-D case is helpful since it turns the question of geometry into one of
change-of-area, which I think can be more intuitive.  So likely both.
 

I really like Krzysztof's example of the Stan program that samples
from a different density without the Jacobian adjustment.  We're
also doing that in some of our education examples --- trying to
show what happens when things go wrong, how to diagnose it, and how
to debug the model and fix the problem.

I attached the full models and code to run them in case you're interested in the example.  
This was rushed so I hard-coded some directories... sorry!

Krzysztof
donut-sim-corrected.stan
donut-sim.stan
donut-sim.R

Bob Carpenter

Oct 23, 2015, 12:15:37 PM
to stan-...@googlegroups.com
For me, I understood the 1D case, and then understanding
the idea of change of variables and the definition of Jacobian
adjustments in 2D finally led me to understanding
the purpose of a determinant. Same way that understanding
covariance led me to understand positive definiteness,
and the multivariate normal led me to understand
rotation, scaling and eigendecomposition. In fact, it might
be fun to write an intro to these topics based on the univariate
and multivariate normal.

- Bob

Michael Betancourt

Oct 23, 2015, 12:28:30 PM
to stan-...@googlegroups.com
2D would be fine if done carefully. For example, consider f: x \mapsto x^{3},
with a Gaussian density pi(x) and an integral \int_{A} pi(x)dx.

You start with drawing a grid to represent dx, a gradient showing pi(x)
and a line circumscribing the region of integration A (make it a circle
for simplicity). You can then add samples drawn from a proper Markov
chain and compare the MCMC estimate to the true integral.

Then apply the map. First you’ll see the grid change and the density warp.
Then when you integrate over f(A) you should get a different value. Draw
samples and see that they don’t line up with f(old samples).

Finally add the Jacobian to the density, which yields the right integral and
proper samples that line up with f(old samples).
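
The x \mapsto x^{3} walk-through can also be checked numerically. An editorial Python/NumPy sketch (the specific test point t is my choice): with the Jacobian dx/dy = (1/3)|y|^{-2/3}, the transformed CDF at t is Phi(cbrt(t)), and the mapped samples agree with it, while naive substitution of Phi(t) does not.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200_000)   # samples from pi(x) = N(0, 1)
y = x**3                       # apply the map f: x -> x^3

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# With the Jacobian (1/3)|y|^(-2/3), the transformed CDF is Phi(t^(1/3));
# reusing Phi(t) without the Jacobian gives a visibly different answer.
t = 2.0
empirical = np.mean(y <= t)
print(abs(empirical - Phi(t ** (1/3))) < 0.01)  # matches the adjusted density
print(abs(empirical - Phi(t)) > 0.05)           # naive substitution does not
```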

Bob Carpenter

Oct 23, 2015, 2:08:15 PM
to stan-...@googlegroups.com
This is so going in the book! Or the manual or wiki. Or
all of those good things we have left to do.

- Bob

Jonah

Oct 24, 2015, 1:50:03 AM
to Stan users mailing list
On Friday, October 23, 2015 at 12:15:37 PM UTC-4, Bob Carpenter wrote:
In fact, it might be fun to write an intro to these topics based on the univariate
and multivariate normal. 

This would be great. 

Vishv Jeet

Dec 8, 2016, 4:03:25 PM
to Stan users mailing list

Hi Krzysztof,

Can you explain the Jacobian components? I don't understand why the Jacobian is a matrix in the first place; shouldn't it be a vector? Also, the Jacobian components should be symmetric in x and y, but J[2, 1] and J[2, 2] seem to care more about x than y.

regards,  
vishy

Michael Betancourt

Dec 8, 2016, 4:11:08 PM
to stan-...@googlegroups.com
The Jacobian is a matrix of partial derivatives of the form

J_{ij} = ∂f_{i} / ∂x_{j}.

If f is a function that maps N inputs to M outputs then it
is an M x N matrix.  If M = 1 then it reduces to a vector;
perhaps that is what you are thinking of?

It is also evident from the definition that the Jacobian
matrix is _not_ symmetric.  
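
A small editorial Python/NumPy sketch of this definition (the finite-difference helper and the polar-coordinate example are mine): build J[i, j] = ∂f_i/∂x_j numerically for (x, y) -> (r, theta) and observe that it is M x N and not symmetric.

```python
import numpy as np

def f(v):
    # (x, y) -> (r, theta): N = 2 inputs, M = 2 outputs
    x, y = v
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def jacobian(f, v, h=1e-6):
    """Forward finite-difference Jacobian: J[i, j] = d f_i / d x_j (M x N)."""
    v = np.asarray(v, dtype=float)
    f0 = f(v)
    J = np.empty((f0.size, v.size))
    for j in range(v.size):
        dv = np.zeros_like(v)
        dv[j] = h
        J[:, j] = (f(v + dv) - f0) / h
    return J

J = jacobian(f, [3.0, 4.0])
print(J.shape)                  # (2, 2): M outputs by N inputs
print(not np.allclose(J, J.T))  # True: not symmetric in general
```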

Vishv Jeet

Dec 8, 2016, 4:39:32 PM
to Stan users mailing list
Got it.

I was confused by Krzysztof's Jacobian, which can be simplified a lot.

J[1, 1] = x / sqrt(x^2+y^2);

J[1, 2] = y / sqrt(x^2+y^2);

J[2, 1] = - y / (x^2 + y^2);

J[2, 2] = x / (x^2 + y^2);


You can see that these are symmetric in how x and y appear across the different terms.

regards,
vishy

Vishv Jeet

Dec 8, 2016, 4:46:19 PM
to Stan users mailing list
Thanks Michael,

So when the Jacobian is indeed a vector, how do we compute its determinant?

regards,
vishy


Bob Carpenter

Dec 8, 2016, 4:47:51 PM
to stan-...@googlegroups.com
Always square, not necessarily symmetric.

We're not making this stuff up. It's all standard calculus.
See:

https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant

And then try to find something on change of variables in
calculus.

The real problem is finding simple explanations. I tried in
the chapter on variable transforms due to constraints near the
end of the manual, but it's going to go too fast if you don't
already know a lot of this material.

- Bob


Michael Betancourt

Dec 9, 2016, 8:43:49 AM
to stan-...@googlegroups.com
Please see my previous email: if the transformation is _not_ 1-1, then
the Jacobian is not square, the determinant is not well-defined, and
it’s not a “change of variables”.  Again, you have to do a 1-1
transformation first and then marginalize out the excess parameters.
This is all probability theory and is required understanding before
trying to build a model of this complexity.

Vishv Jeet

Dec 9, 2016, 9:54:14 AM
to Stan users mailing list
I appreciate all the help I received in understanding Krzysztof's model. My model is really simple; see the attached Stan code. I believe the Jacobian (for the transformation from delta to y) is a vector in this case.
ShareModel.stan

Michael Betancourt

Dec 9, 2016, 10:31:17 AM
to stan-...@googlegroups.com
Vishv, let me elaborate again.

In order to perform a change of variables on a probability density
you need a 1-to-1 transformation.  Given the parameters x and a
density pi(x), the density for y = f(x) is given by

pi(y) = pi(f^{-1}(y)) | det J_{f^{-1}}(y) |

Because f is 1-to-1 (same number of inputs as outputs) the Jacobian
will always be square and the Jacobian determinant will be well-
defined.

In your case, (x, y) -> z is _not_ 1-to-1 and so does not admit
a change of variables formula.  Instead you have to induce a 1-to-1
transformation by padding the output and then marginalizing.  So,
for example, you could do (x, y) -> (z, theta) then 

pi(z) = \int d(theta) pi(z, theta).

Or you could do (x, y) -> (z, x) with

pi(z) = \int dx pi(z, x),

etc.  Lots of potential ways, but each yields the same marginal
density.  The art is figuring out a clever way that facilitates analytical
calculations.
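
The pad-then-marginalize recipe can be checked by simulation. An editorial Python/NumPy sketch (the choice z = x + y with standard normal x, y is mine): the padded map (x, y) -> (z, x) has unit Jacobian determinant, and marginalizing x out gives z ~ N(0, sqrt(2)), which the samples of z should match.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)
y = rng.normal(size=200_000)
z = x + y   # (x, y) -> z: 2 inputs, 1 output, so not 1-to-1

def Phi(t):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# Pad to the 1-to-1 map (x, y) -> (z, x), whose Jacobian determinant is 1,
# then marginalize x out.  For standard normals the marginal is
# N(0, sqrt(2)), so P(Z <= t) should equal Phi(t / sqrt(2)).
t = 1.0
print(abs(np.mean(z <= t) - Phi(t / math.sqrt(2.0))) < 0.01)  # True
```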

