NUTS differences in Stan vs paper


Cole Monnahan

Oct 25, 2016, 7:07:34 PM
to stan...@googlegroups.com
I'm wondering if there have been any changes to the NUTS algorithm since the paper was published.

I coded up algorithm 6 from the paper, then ran a model with a fixed step size, unit diagonal mass matrix and no adaptation. I ran the same model in RStan using metric=unit_e and the same step size, and turned off adaptation with adapt_engaged=FALSE. I'm getting the same posteriors, but slightly different treedepth and n_leapfrog distributions. Should I expect the quantities in sampler_params to be identical? Before I dive into debugging my code I thought I'd check whether my expectation is wrong.

For example, I know there was talk of dropping slice sampling for multinomial sampling. Would something like that have an effect on n_leapfrog? I wouldn't imagine so since, as I understand, U-turns are independent of the sampling from the set "C".

If I use unit_e, is that equivalent to iid standard normal masses?

Any thoughts would be helpful.

Thanks,
Cole

Bob Carpenter

Oct 25, 2016, 8:41:20 PM
to stan...@googlegroups.com

> On Oct 25, 2016, at 7:07 PM, Cole Monnahan <mon...@uw.edu> wrote:
>
> I'm wondering if there have been any changes to the NUTS algorithm since the paper was published.

Yes, several. I realize the algorithms section of the manual
is out of date.

> I coded up algorithm 6 from the paper, then ran a model with a fixed step size, unit diagonal mass matrix and no adaptation. I ran the same model in RStan using metric=unit_e and the same step size, and turned off adaptation with adapt_engaged=FALSE. I'm getting the same posteriors, but slightly different treedepth and n_leapfrog distributions. Should I expect the quantities in sampler_params to be identical? Before I dive into debugging my code I thought I'd check whether my expectation is wrong.
>
> For example, I know there was talk of dropping slice sampling for multinomial sampling.

More than talk --- it happened. The points are selected with
probability proportional to density.
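
To illustrate what "proportional to density" means here, a minimal sketch in Python (not Stan's actual C++; the function name and toy values are mine): each candidate state in the trajectory is selected with probability proportional to its unnormalized density exp(-H).

```python
import numpy as np

def multinomial_select(log_densities, rng):
    """Pick a trajectory state with probability proportional to its
    unnormalized density exp(log_density), i.e. exp(-H).
    Illustrative sketch only, not Stan's implementation."""
    w = np.exp(log_densities - np.max(log_densities))  # stabilize the exponentials
    return rng.choice(len(log_densities), p=w / w.sum())

# Toy trajectory of three states with log-densities -1, -2, -0.5:
rng = np.random.default_rng(0)
idx = multinomial_select(np.array([-1.0, -2.0, -0.5]), rng)
```

By contrast, algorithm 6 in the paper keeps the states whose density exceeds a slice variable u and picks uniformly among those.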

Looks like we never updated the HMC chapter of the manual to match.
I added a to-do item for the next manual to update it:

https://github.com/stan-dev/stan/issues/2051#issuecomment-256217382

I don't know if Michael Betancourt has it written up anywhere
other than the code.

> Would something like that have an effect on n_leapfrog?

Yes, because it'll affect which parameters get sampled, which
impacts just about everything.

> I wouldn't imagine so since, as I understand, U-turns are independent of the sampling from the set "C".
>
> If I use unit_e, is that equivalent to iid standard normal masses?

The masses don't get a distribution, but assuming a unit mass
matrix makes the kinetic energy distribution unit normal.
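
A quick numerical check of that statement (my own sketch, not Stan code): with a unit metric the momenta are refreshed as iid standard normals, so the kinetic energy p·p/2 is half a chi-squared with d degrees of freedom, with mean d/2.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10  # arbitrary dimension for the check

# With a unit Euclidean metric the momentum refresh is p ~ N(0, I):
Ks = []
for _ in range(20000):
    p = rng.standard_normal(d)
    Ks.append(0.5 * p @ p)  # kinetic energy K(p) = p.p / 2

mean_K = np.mean(Ks)  # should be close to d/2 = 5
```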

- Bob

Michael Betancourt

Oct 25, 2016, 10:10:37 PM
to stan...@googlegroups.com

> I'm wondering if there have been any changes to the NUTS algorithm since the paper was published.

Yes. See https://arxiv.org/abs/1601.00225, Section 2.

> I coded up algorithm 6 from the paper, then ran a model with a fixed step size, unit diagonal mass matrix and no adaptation. I ran the same model in RStan using metric=unit_e and the same step size, and turned off adaptation with adapt_engaged=FALSE. I'm getting the same posteriors, but slightly different treedepth and n_leapfrog distributions. Should I expect the quantities in sampler_params to be identical? Before I dive into debugging my code I thought I'd check whether my expectation is wrong.
>
> For example, I know there was talk of dropping slice sampling for multinomial sampling. Would something like that have an effect on n_leapfrog? I wouldn't imagine so since, as I understand, U-turns are independent of the sampling from the set "C".

The change in the sampling will not affect the size of
the tree, and hence n_leapfrog. That said, there was
a very small change in the calculation of the No-U-Turn
criterion that can cause small changes in the size of
the trajectories (affecting whether an endpoint is included
or not), and the slice-sampling-to-multinomial transition made
a very slight change in the definition of a divergence, which
could also change the trajectory sizes slightly.
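
For reference, the termination criterion from the original paper checks whether further integration would shrink the distance between the trajectory endpoints. A sketch (variable names mine; Stan's current generalized criterion differs in small details, as discussed in this thread):

```python
import numpy as np

def u_turn(theta_minus, theta_plus, p_minus, p_plus):
    """Original NUTS stopping rule (Hoffman & Gelman): terminate when
    the endpoint-to-endpoint displacement projects negatively onto the
    momentum at either end. Sketch only, not Stan's current code."""
    dtheta = theta_plus - theta_minus
    return bool(dtheta @ p_minus < 0 or dtheta @ p_plus < 0)

# Endpoints still moving apart -> no U-turn yet:
still_ok = u_turn(np.zeros(2), np.array([1.0, 0.0]),
                  np.array([1.0, 0.0]), np.array([1.0, 0.0]))

# Momentum at the forward endpoint has reversed -> U-turn:
turned = u_turn(np.zeros(2), np.array([1.0, 0.0]),
                np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
```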

But that is only for the same initial condition. Selecting
different points leads to different chains, which can have
slightly different n_leapfrog distributions. In particular,
the more extreme quantiles may be noisy and sensitive
to these changes.

> If I use unit_e, is that equivalent to iid standard normal masses?

That is equivalent to iid standardized normal distributions
for the momenta.
