Sampler diagnostic parameter questions


Jonathan Gilligan

Jun 29, 2015, 10:15:46 PM
to stan-...@googlegroups.com
I am trying to understand how to use the sampler diagnostic parameters returned by get_sampler_params() to identify and fix possible problems with my models. I've found and read several relevant threads on this list, but I have a few questions:

First, I am confused about the relationship between treedepth__ and  max_treedepth that indicates a failed NUTS step. The Stan manual says (p. 453) that if treedepth__ equals the maximum, NUTS gives up and terminates that step prematurely. But when I look at the diagnostics from some model runs, sometimes treedepth__ == max_treedepth + 1 and that suggests that the steps with treedepth__ == max_treedepth were successful and it's the ones with treedepth__ > max_treedepth that were unsuccessful.

Second, should I worry if I get a few instances of excessive treedepth__ or nonzero n_divergent__ during warmup, or is it only a sign of trouble if it happens during the post-warmup sampling?

Third, is there a simpler way than the following code to get max_treedepth from a stanfit object in Rstan?

lapply(my_stanfit_obj@sim$samples, function(x) attr(x, 'args')$control$max_treedepth)

Finally, I'm not finding much documentation on how to interpret the sampler diagnostic parameters. I can get a feel for what treedepth__ is telling me from the section of the Stan manual that describes the NUTS algorithm, but I don't know whether I should be looking for anything other than checking that treedepth__ <= max_treedepth. And I can't find anything about what I should be learning from the other parameters, such as n_divergent__. I've looked in the Stan manual (but it's big and I might not have looked in the right places or searched on the right terms), the RStan manual, and Appendix C of BDA3 with no success.  Is there something else a basic stan user should read to understand these diagnostic parameters?

Ben Goodrich

Jun 29, 2015, 11:45:22 PM
to stan-...@googlegroups.com, jonathan...@gmail.com
On Monday, June 29, 2015 at 10:15:46 PM UTC-4, Jonathan Gilligan wrote:
First, I am confused about the relationship between treedepth__ and  max_treedepth that indicates a failed NUTS step. The Stan manual says (p. 453) that if treedepth__ equals the maximum, NUTS gives up and terminates that step prematurely. But when I look at the diagnostics from some model runs, sometimes treedepth__ == max_treedepth + 1 and that suggests that the steps with treedepth__ == max_treedepth were successful and it's the ones with treedepth__ > max_treedepth that were unsuccessful.

Treedepth starts counting at zero. So, if max_treedepth is 10 (the default) and treedepth goes to 11, then it terminated prematurely.
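As a quick check you can scan the matrices returned by get_sampler_params() and count how many post-warmup iterations blew past the limit. This is a hedged sketch, not an official API; the default of 10 for max_treedepth is an assumption, so pass your own value if you changed it:

```r
# Hedged sketch: count saturated iterations given the list of per-chain
# matrices from rstan::get_sampler_params(fit, inc_warmup = FALSE).
# Since treedepth counts from zero, a terminated step shows up as
# treedepth__ > max_treedepth.
count_saturated <- function(sampler_params, max_treedepth = 10) {
  sum(sapply(sampler_params, function(x)
    sum(x[, "treedepth__"] > max_treedepth)))
}
```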
 
Second, should I worry if I get a few instances of excessive treedepth__ or nonzero n_divergent__ during warmup, or is it only a sign of trouble if it happens during the post-warmup sampling?

There are almost always some divergent transitions during the warmup phase, but we don't really worry about them. During warmup the sampler is trying to find a stepsize that achieves the target acceptance probability without diverging and without hitting max_treedepth.
 
Third, is there a simpler way than the following code to get max_treedepth from a stanfit object in Rstan?

lapply(my_stanfit_obj@sim$samples, function(x) attr(x, 'args')$control$max_treedepth)

No.
 
Finally, I'm not finding much documentation on how to interpret the sampler diagnostic parameters.

Unfortunately, these things are better explained in the CmdStan manual

https://github.com/stan-dev/cmdstan/releases/download/v2.6.2/cmdstan-guide-2.6.2.pdf

Ben

Bob Carpenter

Jun 30, 2015, 12:00:25 AM
to stan-...@googlegroups.com

> On Monday, June 29, 2015 at 10:15:46 PM UTC-4, Jonathan Gilligan wrote:
> ...

> Finally, I'm not finding much documentation on how to interpret the sampler diagnostic parameters.
>
> Unfortunately, these things are better explained in the CmdStan manual
>
> https://github.com/stan-dev/cmdstan/releases/download/v2.6.2/cmdstan-guide-2.6.2.pdf

Now that's something I understand how to fix! I'll move the discussion
over to the Stan manual and just leave a reference in the CmdStan manual.
Here's the issue tracker link:

https://github.com/stan-dev/stan/issues/1508#issuecomment-116940132

- Bob

Michael Betancourt

Jun 30, 2015, 4:40:26 AM
to stan-...@googlegroups.com
Unfortunately we’re overdue for a serious rewrite of the manual,
especially given our rapidly evolving understanding of subtle
theoretical MCMC issues. I have everything collected into
slides, but turning it into text will have to wait until I have some
free time. In the meantime I’ll summarize the results here.

MCMC is just a method for numerically estimating integrals,
which is hard in high dimensions because of something called
concentration of measure. Essentially the only parameters
that contribute to integrals lie on a hypersurface in parameter
space, and if we want to estimate integrals then we need to
find that hypersurface and then explore it. That’s exactly
what a Markov chain does — eventually we have a series
of states distributed across the hypersurface, { y_n }, that
we can use as a grid for doing really simple quadrature,

\int dy p(y) f(y) ≈ (1/N) \sum_{n=1}^{N} f(y_n)
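To make that quadrature concrete, here is a toy base-R illustration. It uses independent draws rather than actual Markov chain states, which is an idealization, but the estimator is exactly the sum above:

```r
# Estimate E[f(y)] with f(y) = y^2 under a standard normal, whose
# true value is 1, by averaging f over the draws.
set.seed(1)
y <- rnorm(1e5)          # stand-in for the Markov chain states y_n
estimate <- mean(y^2)    # (1/N) * sum_n f(y_n)
```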

Everything relies on our being able to find and then explore
the _entire_ hypersurface (which we call the typical set).
This is incredibly hard to verify theoretically, so we’re
left with a bunch of necessary-but-not-sufficient diagnostics
to identify when we’re not exploring everything.

All of these diagnostics apply only to the post-warmup samples —
because we dynamically adapt the sampler parameters during
warmup, those samples will behave very differently. Again,
ignore anything you see in warmup.

Traceplots:

The characteristic way pathologies manifest is the Markov
chain getting stuck at the boundary of a pathological region
of parameter space. If you run long enough you’ll literally
see the chain get stuck for long periods of time. You can
also compare chains to identify multimodality and other
problems.

R-hat:

If our Markov chain can explore the entire target distribution
then any realization of it should look the same. R-hat
essentially performs an analysis of variance on a bunch of
chains (and subsets of chains) looking for any deviations.
The more chains you use the more sensitive you’ll be to
potential pathologies.
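For intuition, here is a minimal base-R sketch of the classic (non-split) potential scale reduction statistic; rstan computes a more refined split-chain version for you, so this is just to show the variance comparison at work:

```r
# Hedged sketch of the Gelman-Rubin R-hat for a single parameter.
# draws: iterations-by-chains matrix of post-warmup samples.
rhat <- function(draws) {
  n <- nrow(draws)
  B <- n * var(colMeans(draws))    # between-chain variance
  W <- mean(apply(draws, 2, var))  # within-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}
```

When all chains explore the same distribution the between- and within-chain variances agree and R-hat is near 1; a chain stuck somewhere else inflates B and pushes R-hat well above 1.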

n-divergent:

Hamiltonian Monte Carlo has a unique diagnostic — regions
of the typical set that are hard to explore induce particular
numerical divergences that we can capture and report. This
is much more sensitive than looking at trace plots! If
you see any divergences in sampling then you have to be
careful. For a few divergences you can increase the target
acceptance probability, but for any nontrivial number of
divergences you’ll probably have to tweak your model to
use stronger priors or a different parameterization.

In RStan you can grab the divergences with

fit <- stan(file = 'model_name.stan', data = input_data,
            iter = 2000, chains = 1, seed = 4938483)

count_divergences <- function(fit) {
  sampler_params <- get_sampler_params(fit, inc_warmup = FALSE)
  # sum the n_divergent__ column across all chains
  sum(sapply(sampler_params, function(x) sum(x[, 'n_divergent__'])))
}
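If only a handful of divergences show up, one thing to try (sketched here with the same placeholder model file and data as above) is raising the target acceptance probability through the control argument:

```r
# adapt_delta defaults to 0.8; pushing it toward 1 forces the adapted
# stepsize to be smaller, which often removes occasional divergences.
fit <- stan(file = 'model_name.stan', data = input_data,
            iter = 2000, chains = 1, seed = 4938483,
            control = list(adapt_delta = 0.99))
```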

Finally, the treedepth issue is not so much a diagnostic of a
bad Markov chain but rather a safeguard we’ve built in to
avoid infinite loops caused by ill-formed posteriors, which
would require the sampler to explore all the way out to infinity
and back (requiring an infinite tree depth and an infinite
amount of time). Sometimes your sampler will need to explore
beyond our default safeguard value, in which case you have to
increase it manually — this doesn’t affect the validity of your
results, just the performance of the algorithm.
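Raising the safeguard in RStan is likewise a control setting (again with the placeholder model file and data from above):

```r
# If legitimate trajectories need deeper trees, raise the cap;
# this costs performance but does not bias the results.
fit <- stan(file = 'model_name.stan', data = input_data,
            iter = 2000, chains = 1, seed = 4938483,
            control = list(max_treedepth = 15))
```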

I like looking at a histogram of the treedepths,

hist_treedepth <- function(fit) {
  sampler_params <- get_sampler_params(fit, inc_warmup = FALSE)
  # pool the treedepth__ column across all chains
  treedepths <- unlist(sapply(sampler_params, function(x) x[, 'treedepth__']))
  hist(treedepths, breaks = 0:20, main = "", xlab = "Treedepth")
  abline(v = 10, col = 2, lty = 1)  # default max_treedepth
}

Technically the sampler tries one more step after maxdepth
which is why you might see it exceed that value by one. I
can never remember the indexing and ordering to state whether
you should see maxdepth or maxdepth + 1 but in the end it
doesn’t really matter as the bad behavior (histogram saturating
at a large value) is pretty obvious.

Jonathan Gilligan

Jun 30, 2015, 10:53:05 AM
to stan-...@googlegroups.com
Thank you. This is very clear and helpful. I felt that I understood using R-hat, traceplots, and autocorrelation plots as diagnostics, but I did not understand the HMC/NUTS-specific diagnostics at all. This summary is a big help. Thanks also to Ben Goodrich for his answers and for pointing me to the CmdStan manual.

Andrew Gelman

Jun 30, 2015, 4:23:59 PM
to stan-...@googlegroups.com
A paper needs to be written on all this!

Andrew Gelman

Jun 30, 2015, 11:57:20 PM
to stan-...@googlegroups.com
“Essentially the only parameters that contribute to integrals lie on a hyper surface in parameter space“ . . .
Exactly! That’s the intuition behind my paper with Roberts and Gilks: with Metropolis you have to crawl along the surface of a sphere.

Michael Betancourt

Jul 1, 2015, 4:32:11 AM
to stan-...@googlegroups.com
Yup. You’re walking on a knife edge and if you try to move
in random directions then you’re going to fall off!

Sebastian Weber

Jul 1, 2015, 4:46:50 AM
to stan-...@googlegroups.com
Great insights here! Could we link this to the FAQ please?