Definition of 'divergent transition'


Tran

Apr 5, 2017, 3:12:37 PM
to Stan users mailing list
Hi all,

When fitting models with Stan I often see warnings about 'divergent transitions'. The manual says:


"The primary cause of divergent transitions in Euclidean HMC (other than bugs in
the code) is highly varying posterior curvature, ..."


and in this article
"One of these behaviors is the appearance of divergences that indicate the Hamiltonian Markov chain has encountered regions of high curvature in the target distribution which it cannot adequately explore."

I have also read the NUTS paper (Hoffman and Gelman, 2014), but I could not find a definition there.

Could you point me to a (mathematical) definition of 'divergent transition'? Saying 'high curvature' seems subjective to me.

Kind regards,
Tran.

Bob Carpenter

Apr 5, 2017, 4:20:30 PM
to stan-...@googlegroups.com
Trying to precisely define "high curvature" out of context makes
no more sense than trying to define "stiff ordinary differential
equation" out of the context of a solver. High curvature is just
the amount of curvature that will cause divergences in our
Hamiltonian simulations.

By definition, the Hamiltonian is the sum of the potential energy plus
the kinetic energy. In physical systems with no outside forces imparted,
the Hamiltonian remains constant over time (as the particle moves).

In Hamiltonian Monte Carlo, the potential energy is the negative
log density at the position of a particle representing the parameters,
whereas the kinetic energy is derived from the momentum, which is
sampled from a standard normal distribution at the start of each iteration.

We simulate the particle's trajectory using the leapfrog
integrator, which tends to preserve the Hamiltonian to within a very
small tolerance (1e-8 or so). We raise divergence errors when the
Hamiltonian drifts very far away (something like a factor of 1e100 or
something ridiculous like that) from its initial value.
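For intuition, the leapfrog update and the divergence check can be sketched like this (a toy Python sketch, not Stan's actual code; the threshold here is purely illustrative):

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Leapfrog trajectory for the Hamiltonian H(q, p) = U(q) + p^2 / 2."""
    p = p - 0.5 * eps * grad_U(q)        # initial half step for momentum
    for _ in range(n_steps - 1):
        q = q + eps * p                  # full step for position
        p = p - eps * grad_U(q)          # full step for momentum
    q = q + eps * p                      # last full step for position
    p = p - 0.5 * eps * grad_U(q)        # final half step for momentum
    return q, p

# Example: standard normal target, so U(q) = q^2 / 2 and grad_U(q) = q.
U = lambda q: 0.5 * q**2
grad_U = lambda q: q
H = lambda q, p: U(q) + 0.5 * p**2

q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, grad_U, eps=0.1, n_steps=20)

# Divergence check: flag the transition if the Hamiltonian has drifted
# too far from its initial value (the threshold below is made up for
# illustration; it is not Stan's).
MAX_DELTA_H = 1000.0
divergent = (H(q1, p1) - H(q0, p0)) > MAX_DELTA_H
```

With a well-behaved target and a small step size, the leapfrog integrator keeps the Hamiltonian nearly constant and no divergence is flagged; only when the step size is too large for the local curvature does the energy error explode.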

The full details and practical ramifications are described here:

http://mc-stan.org/documentation/case-studies/divergences_and_bias.html

I'll also update the manual to include a crisp definition:

https://github.com/stan-dev/stan/issues/2122#issuecomment-291982907

- Bob
--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Stephen Martin

Apr 5, 2017, 9:11:00 PM
to Stan users mailing list
It's a bit like if you're walking down a slope; you want to stay smoothly on the slope.  One divergence would be if you go 'too fast' and become airborne (like a car going over a hill), right? Another would be if your feet started entering the ground?

Tran

Apr 6, 2017, 5:05:24 AM
to Stan users mailing list
Thank you Bob for a clear explanation!

So if we are at a position with high curvature, then after L leapfrog steps the proposal has a low chance of being accepted, because the Hamiltonian at the proposed position is far from the current one. If the proposal is rejected we stay at the current position, a new proposal is generated, and again it has a low probability of being accepted. In other words, the chain only moves when H(new) - H(current) is negative or slightly positive, since the MH acceptance probability is

min(1, exp(H(current) - H(new))).

In that case only moves over short distances are accepted, which explains why we often get 'stuck' in regions of high curvature. Is that correct?
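In code, the acceptance rule I mean is something like this (a toy sketch, not Stan's implementation):

```python
import math

def accept_prob(h_current, h_new):
    # Metropolis acceptance probability for static HMC:
    # min(1, exp(H(current) - H(new))).
    return min(1.0, math.exp(h_current - h_new))

print(accept_prob(10.0, 9.0))   # proposal lowers H: always accepted
print(accept_prob(10.0, 12.0))  # H rises by 2: accepted with prob exp(-2)
```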


Does it hold in the other direction? High curvature gives a divergence warning, but does a divergence imply high curvature?

Best,
Tran.

Bob Carpenter

Apr 6, 2017, 10:42:30 AM
to stan-...@googlegroups.com
You're talking about Metropolis with standard HMC.

We're not doing that with NUTS. Instead we take all
the points on the trajectory (which moves forward and
backward in time randomly) and choose one randomly
with probability proportional to the density with
an adjustment to favor draws from the second half of
the trajectory (the bias toward the second half is
in the original NUTS paper as is the NUTS criterion;
it's explained more thoroughly in Michael's exhaustive
sampling paper).

It has roughly the same effect---we want to take the
next draw from the true Hamiltonian trajectory.

It's the step size relative to the curvature. You're
using small steps along a gradient (only first order
approximation) to try to follow something whose geometry
is typically better approximated with the curvature
information in second derivatives (the Hessian matrix).
In that way it's like gradient descent vs. Newton steps in
optimization.
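The "choose one point with probability proportional to the density" step can be sketched as follows (a toy illustration only; real NUTS builds the trajectory recursively and biases toward the second half, both of which this omits):

```python
import math
import random

def sample_from_trajectory(hamiltonians, rng):
    # Pick an index with probability proportional to exp(-H), i.e. to the
    # (unnormalized) density at each point of the simulated trajectory.
    weights = [math.exp(-h) for h in hamiltonians]
    total = sum(weights)
    u = rng.random() * total
    running = 0.0
    for i, w in enumerate(weights):
        running += w
        if u <= running:
            return i
    return len(weights) - 1

rng = random.Random(0)

# Points where the integrator tracked H well get most of the weight;
# a point where H drifted upward (here H = 5.0) is rarely chosen.
counts = [0] * 4
for _ in range(10_000):
    counts[sample_from_trajectory([1.0, 1.0, 1.0, 5.0], rng)] += 1
```

The point is that there is no single accept/reject decision as in static HMC: every point on the trajectory is a candidate, weighted by how well the integrator preserved the Hamiltonian on the way there.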

- Bob

Cole Monnahan

Apr 6, 2017, 11:42:32 AM
to Stan users mailing list
Tran: I try to give an intuitive explanation of these concepts in the following paper (i.e., few equations, illustrative figures, no prior knowledge expected). It may be useful as a primer to the more technical papers.


Cheers, 
Cole

Trung Dung Tran

Apr 6, 2017, 12:55:18 PM
to stan-...@googlegroups.com
Thank you so much for your help, Bob and Cole!

Tran.


Bob Carpenter

Apr 6, 2017, 1:58:32 PM
to stan-...@googlegroups.com
I just added more precise definitions to the manual
and provided links to Michael's cotangent disintegration paper
on divergences (on arXiv).

If you really want to understand this, I'd suggest reading
Michael's case study (on the Stan web site) or watching
his video from StanCon (also linked from the documentation page
of the web site). Cole's overview is a great place to start.

- Bob

Trung Dung Tran

Apr 6, 2017, 2:54:12 PM
to stan-...@googlegroups.com
Hi Bob,

Yes, I really want to understand at least the basics of the algorithm behind Stan, so that I can see why a particular model does not recover the true values. Otherwise it takes a long time to guess why a model does not work on a real data set.

I used to think that Stan also has problems at parameter boundaries (with constraints), even though (if I understand correctly) Stan transforms from the constrained to the unconstrained space, and that in such boundary cases Stan reports divergence warnings.

For example, with a scale parameter sigma we set sigma > 0, and on the log scale it ranges from -infinity to +infinity.

As sigma -> 0, log(sigma) -> -infinity, and even a small change in sigma (0.1 -> 0.05) changes log(sigma) by about 0.7, which can lead to a large change in the Hamiltonian. I thought Stan would also give warnings in this case.

Is this understanding correct?

Best regards,
Tran.


Bob Carpenter

Apr 6, 2017, 3:12:32 PM
to stan-...@googlegroups.com

> On Apr 6, 2017, at 2:54 PM, Trung Dung Tran <trungd...@gmail.com> wrote:
>
> Hi Bob,
>
> Yes, I really want to understand at least the basics of the algorithm behind Stan, so that I can see why a particular model does not recover the true values.

You want to check interval coverage, not "values". See the
Cook, Gelman, and Rubin paper, for example.

> Otherwise it takes a long time to guess why a model does not work on a real data set.
>
> I used to think that Stan also has problems at parameter boundaries (with constraints), even though (if I understand correctly) Stan transforms from the constrained to the unconstrained space, and that in such boundary cases Stan reports divergence warnings.
>
> For example, with a scale parameter sigma we set sigma > 0, and on the log scale it ranges from -infinity to +infinity.
>
> As sigma -> 0, log(sigma) -> -infinity, and even a small change in sigma (0.1 -> 0.05) changes log(sigma) by about 0.7, which can lead to a large change in the Hamiltonian. I thought Stan would also give warnings in this case.
>
> Is this understanding correct?

No. The Hamiltonian is always conserved.

What happens is that we wander off in the unconstrained space, then
when we transform back (using, e.g., exp()), we get underflow or
overflow, which then leads to warnings when we plug the values into
distributions that require finite parameters.
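The underflow is easy to see directly in floating point (a toy sketch, not Stan's code; the log/exp transform is the one used for a lower-bounded parameter like sigma > 0):

```python
import math

# A positive-constrained parameter sigma is sampled on the unconstrained
# scale y = log(sigma) and mapped back with sigma = exp(y).
def constrain(y):
    return math.exp(y)

print(constrain(0.0))      # y = 0  ->  sigma = 1.0: fine
print(constrain(-1000.0))  # underflows to exactly 0.0 in double precision

# sigma == 0.0 is then an illegal scale for a distribution such as
# normal(mu, sigma), which is what triggers the warnings Bob describes.
```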

- Bob

Trung Dung Tran

Apr 6, 2017, 3:22:25 PM
to stan-...@googlegroups.com
Thank you so much for a useful explanation and correction!

I was attracted to Stan when I saw that it handles ordinal variables better when fitting an ordered logit model. I will continue using it, not only because of its performance but also because of the help from team members like you.

Tran.

Cole Monnahan

Apr 6, 2017, 6:04:22 PM
to stan-...@googlegroups.com
It may be worth pointing out that divergences can be caused by large approximation errors in the Hamiltonian, or by a NaN. The latter case would obviously be divergent if the Hamiltonian could be calculated, so the two are related. As far as I know there is no way to tell these cases apart in Stan.

In some models in fisheries we have holes in the posterior where the log density is undefined (essentially zero probability). This results from poorly parameterized models. The behavior of these divergences is different from cases where the step size is simply too big.

Daniel Lee

Apr 6, 2017, 9:00:12 PM
to stan-...@googlegroups.com

> On Apr 6, 2017, at 6:03 PM, Cole Monnahan <mon...@uw.edu> wrote:
>
> It may be worth pointing out that divergences can be caused by large approximation errors in the Hamiltonian, or by a NaN. The latter case would obviously be divergent if the Hamiltonian could be calculated, so the two are related. As far as I know there is no way to tell these cases apart in Stan.
>
> In some models in fisheries we have holes in the posterior where the log density is undefined (essentially zero probability). This results from poorly parameterized models. The behavior of these divergences is different from cases where the step size is simply too big.

+1!!!!

Bob Carpenter

Apr 7, 2017, 1:23:32 PM
to stan-...@googlegroups.com

> On Apr 6, 2017, at 6:03 PM, Cole Monnahan <mon...@uw.edu> wrote:
>
> It may be worth pointing out that divergences can be caused by large approximation errors in the Hamiltonian, or by a NaN. The latter case would obviously be divergent if the Hamiltonian could be calculated, so the two are related. As far as I know there is no way to tell these cases apart in Stan.
>
> In some models in fisheries we have holes in the posterior where the log density is undefined (essentially zero probability). This results from poorly parameterized models. The behavior of these divergences is different from cases where the step size is simply too big.

There are really four ways I can think of to diverge:

1. Hamiltonian diverges but remains finite (finite divergence)

2. update target with NaN or -infinity (non-finite divergence)

3. a. execute a reject() statement (exception)

b. pass an illegal argument to a function (exception)

Cases (2) or (3) may arise from

* an infinite step in the Hamiltonian because floating-point arithmetic
fails for a well-defined Hamiltonian

* there are holes in your density

The arithmetic causes make (2) and (3) just like (1), as Cole points
out in his first paragraph.

Non-arithmetic causes, like holes in the density, can be equally
disastrous for estimation, as they cause HMC to fall back into a kind of
rejection sampler, which will usually mix poorly if there's not a smooth
boundary. Whatever you do, you want to be careful to test the calibration
of the model on simulated data.

Stan reports these slightly differently. If the NaN or infinite value
gets all the way through (unlikely), we'll report a divergence as usual.
Otherwise, we try to report the root cause from the function that
got an illegal argument or the reject statement.

- Bob

