"The estimated Bayesian Fraction of Missing Information is . . ."


Andrew Gelman

Jul 20, 2016, 12:25:57 PM
to stan development mailing list
In Stan 2.10 (or maybe it's just rstan 2.10), when you print out the model fit it gives a paragraph about The estimated Bayesian Fraction of Missing Information, along with numbers such as .7 or .9 for each chain.

I have a couple questions here. First, what is this? Second, is it a good idea for this to be in the default output? This seems to be just one more way to get users upset for no reason. Isn't it enough that we have R-hat?

A


Ben Goodrich

Jul 20, 2016, 12:43:37 PM
to stan development mailing list, gel...@stat.columbia.edu
On Wednesday, July 20, 2016 at 12:25:57 PM UTC-4, Andrew Gelman wrote:
In Stan 2.10 (or maybe it's just rstan 2.10), when you print out the model fit it gives a paragraph about The estimated Bayesian Fraction of Missing Information, along with numbers such as .7 or .9 for each chain.

I have a couple questions here.  First, what is this?
 Second, is it a good idea for this to be in the default output?  This seems to be just one more way to get users upset for no reason.  Isn't it enough that we have R-hat?

I believe a BFMI close to 1 would imply that the chain is mixing well, in which case all the R-hats should be close to 1. I like BFMI somewhat because the variance in the energy is a function of the joint posterior kernel rather than some margin of it. What I don't like as much is that we could plot the joint dependence between adjacent draws directly and not have to worry about what implies what.

Ben
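
A minimal sketch, in R, of the adjacent-draw dependence plot Ben is describing, assuming fit is a stanfit object from rstan; the plotting choices are illustrative, not an existing rstan convenience function:

# Dependence between adjacent draws of lp__ for chain 1.
lp <- rstan::extract(fit, pars = "lp__", permuted = FALSE)[, 1, 1]
n  <- length(lp)
plot(lp[-n], lp[-1],
     xlab = "lp__ at iteration t", ylab = "lp__ at iteration t + 1")
acf(lp)  # the autocorrelation function summarizes the same dependence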

Michael Betancourt

Jul 20, 2016, 2:06:25 PM
to stan...@googlegroups.com
This Hamiltonian-based BFMI quantifies how hard it is to
sample the level sets at each iteration — if it’s very small then
that sampling is hard and we might even lose central limit
theorems.  Another way to think about it is that this value
quantifies how heavy-tailed the target distribution is and how
much we should be worried about the sampler being able to
explore adequately.  It is important information orthogonal to
what R-hat gives, although it’s not yet clear what a good
default threshold is.
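
For concreteness, a sketch in R of the E-BFMI estimator from Michael's energy-diagnostic paper (linked later in the thread), assuming energy is the vector of post-warmup energies for a single chain (exposed as the energy__ sampler parameter in sufficiently recent rstan versions):

# E-BFMI: mean squared change in energy between iterations relative to the
# marginal variance of the energy. Values near 1 suggest the energy level
# sets are being explored efficiently; small values indicate trouble.
bfmi <- function(energy) {
  sum(diff(energy)^2) / sum((energy - mean(energy))^2)
}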


Andrew Gelman

Jul 20, 2016, 9:04:02 PM
to stan...@googlegroups.com
Hi, it sounds to me that this should be one of the things saved in the Stan object but that it should not be printed by default.  I think that printing it by default will have the net result of confusing most users, and perhaps deterring some people from using Stan.  (This relates to Bob's point that some users are intimidated by Stan's frequent issuing of warning messages.)

Ben Goodrich

Jul 20, 2016, 9:26:25 PM
to stan development mailing list, gel...@stat.columbia.edu
On Wednesday, July 20, 2016 at 9:04:02 PM UTC-4, Andrew Gelman wrote:
Hi, it sounds to me that this should be one of the things saved in the Stan object but that it should not be printed by default.  I think that printing it by default will have the net result of confusing most users, and perhaps deterring some people from using Stan.  (This relates to Bob's point that some users are intimidated by Stan's frequent issuing of warning messages.)

The same could be said of the warning about the divergent transitions or the warning about hitting the maximum treedepth. A similar argument could be made against showing the MCSE and n_eff. Perhaps people understand Rhat somewhat better but only because it has been around for more than 20 years. We have to show people things that they don't understand for them to learn what they are.


Andrew Gelman

Jul 20, 2016, 9:32:02 PM
to Ben Goodrich, stan development mailing list
I agree about the warnings about divergent transitions and treedepth.  If R-hat indicates poor mixing, then it can be useful to hear about divergent transitions and treedepth.  If mixing is ok, I don't see the advantage of bothering people about transitions and treedepth.

MCSE I'm not sure about.  I had mixed feelings about including it in the default output.  I do like n_eff, though; I think that's pretty intuitive.  In fact, when I explain MCSE now, I explain it by saying it's the posterior sd divided by the sqrt of n_eff.
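
The relationship Andrew uses to explain MCSE, as a one-line sketch in R; this assumes s is one row of the matrix returned by rstan's summary(fit)$summary, which has "sd" and "n_eff" columns:

# MCSE is roughly the posterior sd divided by the square root of n_eff.
mcse_from_summary <- function(s) unname(s["sd"] / sqrt(s["n_eff"]))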

This new thing I really hate because it always appears, it takes up several lines of output, and it always looks bad.  We're just asking users to say, "Hey, what's wrong, how come these numbers aren't 1, like they should be?"

I think it's fine for this stuff to be available but I think it's bad news for it to be always output.  We're making Stan look less reliable than it is. 

A

Ben Goodrich

Jul 20, 2016, 10:02:07 PM
to stan development mailing list, gel...@stat.columbia.edu
On Wednesday, July 20, 2016 at 9:32:02 PM UTC-4, Andrew Gelman wrote:
I agree about the warnings about divergent transitions and treedepth.  If R-hat indicates poor mixing, then it can be useful to hear about divergent transitions and treedepth.  If mixing is ok, I don't see the advantage of bothering people about transitions and treedepth.

I think that is not correct. If there are divergent transitions, leapfrogs hitting the maximum treedepth, low BFMI, etc. then the R-hat can be close to 1 without the chain(s) having explored the whole posterior.

I'm not a fan of making users look at (and understand) several different things that collectively speak to the dependence between draws when it is easy to just plot the dependence between draws.
 
This new thing I really hate because it always appears, it takes up several lines of output, and it always looks bad.  We're just asking users to say, "Hey, what's wrong, how come these numbers aren't 1, like they should be?"

I think it's fine for this stuff to be available but I think it's bad news for it to be always output.  We're making Stan look less reliable than it is. 

I don't think we want users to presume that models they write will yield good draws from the posterior. New Stan users usually are writing models that don't yield good draws from the posterior.

Ben

Andrew Gelman

Jul 20, 2016, 10:13:03 PM
to stan...@googlegroups.com
Hi, see below.
On Jul 20, 2016, at 10:02 PM, Ben Goodrich <goodri...@gmail.com> wrote:

On Wednesday, July 20, 2016 at 9:32:02 PM UTC-4, Andrew Gelman wrote:
I agree about the warnings about divergent transitions and treedepth.  If R-hat indicates poor mixing, then it can be useful to hear about divergent transitions and treedepth.  If mixing is ok, I don't see the advantage of bothering people about transitions and treedepth.

I think that is not correct. If there are divergent transitions, leapfrogs hitting the maximum treedepth, low BFMI, etc. then the R-hat can be close to 1 without the chain(s) having explored the whole posterior.

That _could_ happen, but it shouldn't, in that if your chains do not explore the whole posterior you should see this in poor mixing.  I don't see that max treedepth etc have anything to do with it.  What I _do_ see all the time is models that are just fine but I still get the damn max treedepth thing or that other message and I just have to ignore it.

I'm not a fan of making users look at (and understand) several different things that collectively speak to the dependence between draws when it is easy to just plot the dependence between draws.

Plotting dependence between draws is fine.  I recommend that when R-hat does not go down to 1 after several thousand iterations, users spend some effort trying to figure out what's going on.  But what I see now is settings where we do 2000 iterations, R-hat is around 1.02, n_eff is in the hundreds or thousands, and there are still a few of these messages, just there to give people an unnecessary worry.

 
This new thing I really hate because it always appears, it takes up several lines of output, and it always looks bad.  We're just asking users to say, "Hey, what's wrong, how come these numbers aren't 1, like they should be?"

I think it's fine for this stuff to be available but I think it's bad news for it to be always output.  We're making Stan look less reliable than it is. 

I don't think we want users to presume that models they write will yield good draws from the posterior. New Stan users usually are writing models that don't yield good draws from the posterior.

But "The estimated Bayesian Fraction of Missing Information is . . ." 0.8 does _not_ mean there's any problem with the draws.  It's just a way to screw with people.

We could possibly have a setting in the rstan call: hassle=TRUE would give all these messages and hassle=FALSE would only give them if there's a problem with R-hat.

A

Michael Betancourt

Jul 20, 2016, 10:27:34 PM
to stan...@googlegroups.com
>>
>> I think that is not correct. If there are divergent transitions, leapfrogs hitting the maximum treedepth, low BFMI, etc. then the R-hat can be close to 1 without the chain(s) having explored the whole posterior.
>
> That _could_ happen, but it shouldn't, in that if your chains do not explore the whole posterior you should see this in poor mixing. I don't see that max treedepth etc have anything to do with it. What I _do_ see all the time is models that are just fine but I still get the damn max treedepth thing or that other message and I just have to ignore it.

No, it really can happen. R-hat is great, but there are lots of pathologies where you
need tens of chains to be sensitive to the problems. These HMC-specific diagnostics
are super sensitive to these potential problems and identify them long before they
manifest in bad R-hats.

You ignore at your own peril, but you really should be taking these warnings seriously.


> Plotting dependence between draws is fine. I recommend that when R-hat does not go down to 1 after several thousand iterations, users spend some effort trying to figure out what's going on. But what I see now is settings where we do 2000 iterations, R-hat is around 1.02, n_eff is in the hundreds or thousands, and there are still a few of these messages, just there to give people an unnecessary worry.

Again, it’s not unnecessary worry. R-hat is a noisy measure and it doesn’t always
identify pathological behavior quickly enough, especially when the mixing is really
bad.

I’m not sure how actually having information that can identify pathological behavior,
something we get from no other sampler, is bad. It’s like compiler warnings — you
ignore them at your own peril, but you really really should get in the habit of fixing
them if you want to be building robust models/code.

> But "The estimated Bayesian Fraction of Missing Information is . . ." 0.8 does _not_ mean there's any problem with the draws. It's just a way to screw with people.
>
> We could possibly have a setting in the rstan call, hassle=TRUE will give all these messages and hassle=FALSE will only give them if there's a problem with R-hat.

As I mentioned before we don’t have a strong idea of what threshold to apply. I’m
happy taking something conservative, say 0.3 or 0.25 and not printing the value
if the BFMI is above that value, but it should be printed if it’s smaller.

Hiding this info may appease new users but it will only encourage them to build
fragile analyses.
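
A sketch, in R, of the thresholding Michael describes; the function name and the 0.3 cutoff are illustrative assumptions taken from this discussion, not an existing rstan interface:

# Only surface the E-BFMI when it falls below a conservative cutoff.
check_bfmi <- function(bfmi_per_chain, threshold = 0.3) {
  low <- which(bfmi_per_chain < threshold)
  if (length(low) > 0)
    warning("E-BFMI below ", threshold, " for chain(s): ",
            paste(low, collapse = ", "))
  invisible(bfmi_per_chain)
}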

Andrew Gelman

Jul 20, 2016, 10:33:26 PM
to stan...@googlegroups.com
Yes, it just happened today. I had a perfectly well behaved model that had a couple of those divergent transitions or whatever during warmup. I'd say that _most_ of the models I fit spit out that divergent transitions message, and I mostly fit pretty easy models.

I'm not saying R-hat is perfect, but currently we're sending so many warnings that I think we're doing little more than bugging people. And I think it's really really bad to print out things that are .7 .8 .9 and then tell people that ideally these should be 1. Again, just asking for trouble.

Michael Betancourt

Jul 20, 2016, 10:52:20 PM
to stan...@googlegroups.com

> Yes, it just happened today. I had a perfectly well behaved model that had a couple of those divergent transitions or whatever during warmup. I'd say that _most_ of the models I fit spit out that divergent transitions message, and I mostly fit pretty easy models.

Perfectly well-behaved based on what criteria? Any divergent iterations indicate
some pathologies are still present and even a few can skew the higher quantiles
significantly. The right answer is not to ignore them but to decrease the step size
and rerun to ensure that you’re covering all the parameter space that needs to
be covered.

Also, any divergent transitions during warmup are not reported so they’d have
to be real problems if you’re seeing them in RStan.

> I'm not saying R-hat is perfect, but currently we're sending so many warnings that I think we're doing little more than bugging people. And I think it's really really bad to print out things that are .7 .8 .9 and then tell people that ideally these should be 1. Again, just asking for trouble.

I said it last week and I’ll say it again — we can only hide the real difficulties
with MCMC for so long. I would not be at all surprised if most of the models
that people fit suffer from some small pathologies and hence small (or even large)
biases. Everyone has just convinced themselves that MCMC should work
without a problem because no other algorithms can identify the problems so
quickly!

Having Rhat, divergences, and the other diagnostics makes us uniquely
suited to ensuring robust fits. The default should be “expect problems and
tweak the model/settings until they go away” not “problems are rare, my
fit is probably fine and I can ignore all of these warnings”. Again, this is
not an issue of boy crying wolf, it’s an issue of wolves being everywhere
and we’re only now seeing them.

I’m fine with the next version of RStan putting a threshold on the BFMI,
but at this point that threshold would be entirely heuristic. We haven’t
developed enough empirical understanding yet to know whether or not
0.7 is okay or could be a sign of something bad.

Bob Carpenter

Jul 21, 2016, 12:36:02 AM
to stan...@googlegroups.com
Andrew and Michael and Ben:

Can you construct a few examples that manifest these
problems concretely (things that split R-hat diagnoses
but R-hat doesn't, that Michael's metric diagnoses, but
R-hat doesn't, etc.)? I think that'd be a huge help for
both our users and the field as a whole.

From what they tell me, our (potential) users are being
freaked out by all the warning messages. Examples will
convince them the warnings are justified.

- Bob

Ben Goodrich

Jul 21, 2016, 1:04:28 AM
to stan development mailing list
On Thursday, July 21, 2016 at 12:36:02 AM UTC-4, Bob Carpenter wrote:
Andrew and Michael and Ben:

Can you construct a few examples that manifest these
problems concretely (things that split R-hat diagnoses
but R-hat doesn't, that Michael's metric diagnoses, but
R-hat doesn't, etc.)?  I think that'd be a huge help for
both our users and the field as a whole.

I haven't redone the examples in the bulleted list here

https://groups.google.com/d/msg/stan-dev/W4voz8g2Vm0/b-Cj2oFmk-cJ

with BFMI but the problems were pretty apparent from the dependence plots.


Sebastian Weber

Jul 21, 2016, 3:26:16 AM
to stan development mailing list, gel...@stat.columbia.edu
Hi!

It would be really great to have Stan being able to run in a "quiet" mode which only spits out messages when there is a really bad problem.

Having exposed people in industry to Stan, I can assure you that the warnings that come out of Stan very often scare people, even when they can safely be ignored. I know that education here is the best thing, but not all people who want (or should) use Stan have the time to go into such depth; still, MCMC is valuable to their work.

There are just too many false positives right now - a mode which would allow for some false negatives would be beneficial (I am not saying to make this a default!).

Sebastian

Michael Betancourt

Jul 21, 2016, 9:21:21 AM
to stan...@googlegroups.com
When did everyone get the idea that these are ignorable warnings?
Did everyone forget how many BUGS/JAGS fits we’ve seen that were
biased due to the sampler not behaving well? HMC is better at these
problems but it’s not immune to pathologies, and the huge advantage
that we have over the older algorithms is that we can diagnose the
pathologies in practice!

The only false positive is the occasional Metropolis rejection warning
due to numerical instabilities, and even then it would be best to tweak
the model to avoid the warnings altogether. The HMC warnings are
_not false positives_. They indicate real issues.

Remember how problems in the original 8-schools model didn’t show
up until Andy ran HMC for way longer than anyone would reasonably
run it? These diagnostics find those problems within a reasonable run.
I’m completely overwhelmed by how forgetful everyone seems to be.

Bob — the energy diagnostic is discussed in http://arxiv.org/abs/1604.00695
(I write these papers for a reason!). There is a collection of examples
demonstrating the utility of this information at the end. Ultimately the
energy diagnostic is complementary to divergences — whereas divergences
identify light tails that prevent complete sampling, the energy diagnostic
identifies heavy tails that prevent complete sampling. Heavy tails are
particularly hard problems that can easily sneak around R-hat unless
you run many chains.

Again, we absolutely cannot reinforce the myth that MCMC (or any
computational algorithm) can be run automatically with no validation
of the results. Statistics is not automatic, and anybody who values
automation over robustness is doomed to their own hubris.

_angry rant over_
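
For readers who want to inspect these sampler-level diagnostics themselves, a hedged sketch using rstan::get_sampler_params(); the column names have changed across rstan releases (e.g., n_divergent__ vs. divergent__, with energy__ only in later versions), so treat them as assumptions for your version:

sp <- rstan::get_sampler_params(fit, inc_warmup = FALSE)  # one matrix per chain
n_divergent <- sum(sapply(sp, function(x) sum(x[, "divergent__"])))
# rough count of iterations that saturated the default max treedepth of 10
depth_saturated <- sum(sapply(sp, function(x) sum(x[, "treedepth__"] >= 10)))
# per-chain E-BFMI from the energy column
bfmi_per_chain <- sapply(sp, function(x) {
  e <- x[, "energy__"]
  sum(diff(e)^2) / sum((e - mean(e))^2)
})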

Bob Carpenter

Jul 21, 2016, 1:40:47 PM
to stan...@googlegroups.com
Just to clarify my own position:

* we should leave the diagnostics on by default

* we should have examples we can show users that will scare
them more than the warnings

Sebastian's right that our users are freaking out and abandoning
Stan because they interpret the warnings as an indication that
Stan's broken. We could let them just go back to
JAGS and run around bad-mouthing Stan, but I think it would
be a better service to the field if we could educate people
with examples they could understand.

- Bob

Sebastian Weber

Jul 21, 2016, 3:20:44 PM
to stan development mailing list
Just to emphasize - I like the warnings, I just would love to have the option to turn them off or get fewer of them...

To put this into perspective, if we get ANY warning here during our analyses, then this leads to delays as we always have to explain that stuff. This is the reason why SAS by default will never say anything unless it has to STOP for some huge problem (of course, reporting level can be changed).

Sebastian

Ben Goodrich

Jul 21, 2016, 3:39:14 PM
to stan development mailing list
On Thursday, July 21, 2016 at 3:20:44 PM UTC-4, Sebastian Weber wrote:
Just to emphasize - I like the warnings, I just would love to have the option to turn them off or get fewer of them...

We do this in rstanarm, but I don't think this would work well in general practice. If we provide and publicize such an option, users who do not want to see them will utilize that option in lieu of modifying their Stan program or the settings. 
 

To put this into perspective, if we get ANY warning here during our analyses, then this leads to delays as we always have to explain that stuff.


I think that is the appropriate policy, although I can imagine it is frustrating when the person you have to explain it to does not have much background in the subject.

Ben

Michael Betancourt

Jul 21, 2016, 7:44:09 PM
to stan...@googlegroups.com

To put this into perspective, if we get ANY warning here during our analyses, then this leads to delays as we always have to explain that stuff.


I think that is the appropriate policy, although I can imagine it is frustrating when the person you have to explain it to does not have much background in the subject.

+N

Andrew Gelman

Jul 21, 2016, 8:27:42 PM
to stan...@googlegroups.com
I want to push back against this. Sure, I think it's fine to leave on _some_ diagnostics by default. But "The estimated Bayesian Fraction of Missing Information" is not even a diagnostic. It's something that _every time_, for _every model_, no matter how good, reports numbers less than 1 and then tells the user that these numbers are supposed to be 1. That's horrible, horrible, horrible!

To put it another way, had we had this conversation a few months earlier, and Bob had said, "we should leave the diagnostics on by default," these diagnostics would not have included "The estimated Bayesian Fraction of Missing Information." I very strongly oppose this ratcheting in which more and more warnings get spewed out, thus giving people the idea that Stan is fragile.

Andrew Gelman

Jul 21, 2016, 9:51:57 PM
to stan...@googlegroups.com
We have an example in BDA where split R-hat diagnoses the problem but R-hat doesn't. It's not a real model; I just generated some traceplots that show the problem. The original example that motivated split R-hat was some model that Ken Shirley was fitting several years ago with 3 chains. The more chains you have, the less likely you'll need split R-hat. But with 3 chains, it came in handy. Also, though, that was an example before we had our current n_eff measure that combines between- and within-chain information. We used to use an n_eff measure that only used between-chain info, and that was very noisy. I think split R-hat and the new n_eff catch a lot.
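
A sketch of split R-hat for a single parameter, using the standard between/within-variance formulation of the idea Andrew describes; draws is assumed to be an iterations-by-chains matrix of post-warmup draws:

# Split each chain in half, then compute the usual R-hat over the half-chains.
split_rhat <- function(draws) {
  n <- floor(nrow(draws) / 2)
  halves <- cbind(draws[1:n, , drop = FALSE],
                  draws[(nrow(draws) - n + 1):nrow(draws), , drop = FALSE])
  chain_means <- colMeans(halves)
  chain_vars  <- apply(halves, 2, var)
  B <- n * var(chain_means)   # between-(half-)chain variance
  W <- mean(chain_vars)       # within-(half-)chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}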

Andrew Gelman

Jul 21, 2016, 9:59:36 PM
to stan...@googlegroups.com
Noooooooooooooooooooooo!

Andrew Gelman

Jul 21, 2016, 10:37:21 PM
to stan development mailing list
To elaborate . . . I think we can all agree this is a matter of communication. In one of our next Stan meetings, perhaps we can discuss what we want to communicate to users and how we can best do this. Right now I think we're crying wolf, but I think there's a larger issue in that we would like users to check just about everything using fake-data simulation, but this is not always so easy to do.
A

Michael Betancourt

Jul 22, 2016, 5:11:03 PM
to stan...@googlegroups.com
Fake data simulation consistency is a strong condition
but not a sufficient one for success on real data. To
build a robust analysis we need _all_ diagnostics
(multiple chains and Rhat, sampler diagnostics, fake
data simulation checks) as they all provide unique
views into potential problems.

Andrew Gelman

Jul 22, 2016, 7:42:57 PM
to stan...@googlegroups.com
Currently we are not communicating this well, and I don't think that spewing out more and more warnings is helping. It would be better for this info to be accessible, something less intrusive than warnings, and easier and less platform-dependent than ShinyStan. Instead of trying to cram everything into the printed output, maybe we should have a new function that gives a bunch of diagnostics with explanations and suggestions.
A

Ben Goodrich

Jul 25, 2016, 9:14:36 PM
to stan development mailing list, gel...@stat.columbia.edu
On Thursday, July 21, 2016 at 8:27:42 PM UTC-4, Andrew Gelman wrote:
To put it another way, had we had this conversation a few months earlier, and Bob had said, "we should leave the diagnostics on by default," these diagnostics would not have included "The estimated Bayesian Fraction of Missing Information."  I very strongly oppose this ratcheting in which more and more warnings get spewed out, thus giving people the idea that Stan is fragile.

Below is an example where the BFMIs are all very low (0.2, 0.1, 0.1, 0.3), but most users would consider the chains to be fine. It is a linear hierarchical model for soybean weight with two types of soil, each with 10 locations, observed at 4 different times. The estimated n_eff for the intercepts is low relative to 4000, but most people would care primarily about the coefficients, which look fine except in a couple of the locations. So the only red flag in the pre-2.10 output is that the Rhat for lp__ is high, but that is typically discounted by users. Otherwise, they would be analyzing chains that didn't converge in distribution.

Ben
                        mean se_mean    sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
(Intercept):1988F4      0.18    0.03  0.55   -0.90   -0.16    0.17    0.53    1.24   394 1.01
(Intercept):1988F2      0.20    0.03  0.56   -0.96   -0.15    0.20    0.57    1.28   417 1.01
(Intercept):1988F1      0.19    0.03  0.58   -0.95   -0.16    0.20    0.57    1.33   433 1.01
(Intercept):1988F7      0.19    0.03  0.57   -1.00   -0.17    0.19    0.56    1.28   424 1.01
(Intercept):1988F5      0.21    0.03  0.59   -0.97   -0.15    0.22    0.59    1.38   404 1.01
(Intercept):1988F8      0.25    0.03  0.56   -0.83   -0.12    0.22    0.62    1.37   413 1.00
(Intercept):1988F6      0.18    0.03  0.56   -0.93   -0.16    0.18    0.54    1.23   421 1.01
(Intercept):1988F3      0.24    0.03  0.55   -0.86   -0.12    0.23    0.61    1.28   424 1.00
(Intercept):1988P1      0.17    0.03  0.58   -1.03   -0.18    0.17    0.55    1.25   420 1.01
(Intercept):1988P5      0.21    0.03  0.57   -0.91   -0.14    0.20    0.58    1.34   459 1.00
(Intercept):1988P4      0.20    0.03  0.57   -0.93   -0.16    0.19    0.56    1.33   454 1.01
(Intercept):1988P8      0.18    0.03  0.59   -1.01   -0.17    0.18    0.55    1.29   472 1.00
(Intercept):1988P7      0.17    0.03  0.58   -0.99   -0.18    0.17    0.55    1.27   399 1.01
(Intercept):1988P3      0.20    0.03  0.59   -0.99   -0.15    0.20    0.58    1.35   424 1.00
(Intercept):1988P2      0.29    0.03  0.57   -0.82   -0.07    0.26    0.66    1.45   363 1.01
(Intercept):1988P6      0.26    0.03  0.57   -0.86   -0.10    0.25    0.64    1.40   345 1.01
(Intercept):1989F6      0.17    0.03  0.58   -1.00   -0.19    0.18    0.56    1.26   439 1.01
(Intercept):1989F5      0.16    0.03  0.57   -1.03   -0.18    0.15    0.54    1.23   435 1.01
(Intercept):1989F4      0.26    0.03  0.56   -0.84   -0.10    0.23    0.63    1.38   390 1.01
(Intercept):1989F1      0.16    0.03  0.57   -1.01   -0.19    0.16    0.54    1.24   434 1.01
(Intercept):1989F2      0.17    0.03  0.56   -0.99   -0.18    0.17    0.53    1.21   415 1.01
(Intercept):1989F7      0.13    0.03  0.54   -0.95   -0.20    0.14    0.49    1.14   347 1.01
(Intercept):1989F8      0.18    0.03  0.59   -1.01   -0.17    0.19    0.56    1.30   416 1.01
(Intercept):1989F3      0.16    0.03  0.57   -1.02   -0.20    0.16    0.53    1.24   391 1.01
(Intercept):1989P7     -0.30    0.03  0.52   -1.39   -0.63   -0.28    0.05    0.65   282 1.01
(Intercept):1989P4     -0.22    0.03  0.50   -1.21   -0.54   -0.20    0.11    0.73   360 1.01
(Intercept):1989P6     -0.30    0.03  0.53   -1.43   -0.64   -0.28    0.04    0.68   289 1.01
(Intercept):1989P5     -0.31    0.03  0.54   -1.45   -0.64   -0.28    0.05    0.67   284 1.02
(Intercept):1989P1     -0.30    0.03  0.56   -1.48   -0.64   -0.27    0.06    0.67   289 1.01
(Intercept):1989P3     -0.33    0.03  0.53   -1.46   -0.66   -0.30    0.04    0.64   272 1.02
(Intercept):1989P2     -0.28    0.03  0.55   -1.41   -0.63   -0.26    0.07    0.74   339 1.01
(Intercept):1989P8     -0.29    0.03  0.53   -1.38   -0.64   -0.26    0.07    0.67   274 1.02
(Intercept):1990F2     -0.30    0.03  0.52   -1.36   -0.64   -0.28    0.05    0.63   265 1.02
(Intercept):1990F3     -0.19    0.03  0.47   -1.16   -0.50   -0.17    0.13    0.64   329 1.01
(Intercept):1990F4     -0.37    0.03  0.52   -1.46   -0.70   -0.34   -0.01    0.54   235 1.02
(Intercept):1990F5     -0.15    0.03  0.47   -1.09   -0.45   -0.13    0.17    0.73   336 1.01
(Intercept):1990F1     -0.31    0.03  0.53   -1.39   -0.63   -0.29    0.05    0.69   277 1.02
(Intercept):1990F8     -0.31    0.03  0.52   -1.41   -0.64   -0.28    0.05    0.63   281 1.02
(Intercept):1990F7     -0.30    0.03  0.53   -1.42   -0.62   -0.27    0.06    0.68   266 1.02
(Intercept):1990F6     -0.30    0.03  0.54   -1.40   -0.63   -0.27    0.06    0.68   293 1.01
(Intercept):1990P8     -0.30    0.03  0.56   -1.45   -0.64   -0.28    0.07    0.67   278 1.02
(Intercept):1990P7     -0.23    0.03  0.49   -1.20   -0.55   -0.21    0.10    0.69   343 1.01
(Intercept):1990P3     -0.29    0.03  0.55   -1.41   -0.63   -0.27    0.06    0.75   308 1.01
(Intercept):1990P1     -0.28    0.03  0.54   -1.42   -0.61   -0.26    0.07    0.71   324 1.01
(Intercept):1990P6     -0.30    0.03  0.55   -1.48   -0.62   -0.27    0.06    0.69   285 1.01
(Intercept):1990P5     -0.29    0.03  0.53   -1.40   -0.62   -0.27    0.06    0.70   307 1.01
(Intercept):1990P2     -0.34    0.03  0.54   -1.48   -0.66   -0.31    0.04    0.61   254 1.02
(Intercept):1990P4     -0.30    0.03  0.54   -1.43   -0.63   -0.27    0.05    0.66   287 1.02
Time:1988F4            -0.09    0.00  0.04   -0.16   -0.11   -0.09   -0.06   -0.01   725 1.00
I(Time^2):1988F4        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  1111 1.00
Time:1988F2            -0.10    0.00  0.04   -0.18   -0.12   -0.10   -0.07   -0.01   773 1.00
I(Time^2):1988F2        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  1159 1.00
Time:1988F1            -0.13    0.00  0.06   -0.23   -0.16   -0.13   -0.09    0.00  1084 1.00
I(Time^2):1988F1        0.01    0.00  0.00    0.00    0.01    0.01    0.01    0.01  1492 1.00
Time:1988F7            -0.06    0.00  0.06   -0.19   -0.10   -0.06   -0.02    0.07  1442 1.00
I(Time^2):1988F7        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1988F5             0.01    0.02  0.18   -0.21   -0.14   -0.09    0.19    0.33    83 1.02
I(Time^2):1988F5        0.00    0.00  0.00    0.00    0.00    0.00    0.01    0.01    79 1.02
Time:1988F8            -0.09    0.00  0.04   -0.17   -0.12   -0.09   -0.07   -0.02   800 1.00
I(Time^2):1988F8        0.00    0.00  0.00    0.00    0.00    0.00    0.01    0.01  1270 1.00
Time:1988F6            -0.12    0.00  0.05   -0.21   -0.15   -0.13   -0.10   -0.03  1079 1.00
I(Time^2):1988F6        0.00    0.00  0.00    0.00    0.00    0.00    0.01    0.01  1774 1.00
Time:1988F3            -0.13    0.00  0.04   -0.21   -0.16   -0.13   -0.10   -0.05   774 1.01
I(Time^2):1988F3        0.01    0.00  0.00    0.00    0.00    0.01    0.01    0.01  1186 1.00
Time:1988P1            -0.03    0.00  0.07   -0.16   -0.07   -0.03    0.01    0.11  1739 1.00
I(Time^2):1988P1        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1988P5            -0.16    0.00  0.06   -0.28   -0.20   -0.16   -0.12   -0.03  1806 1.00
I(Time^2):1988P5        0.01    0.00  0.00    0.00    0.01    0.01    0.01    0.01  4000 1.00
Time:1988P4            -0.11    0.00  0.07   -0.24   -0.15   -0.11   -0.07    0.02  1875 1.00
I(Time^2):1988P4        0.01    0.00  0.00    0.00    0.00    0.01    0.01    0.01  4000 1.00
Time:1988P8            -0.06    0.00  0.06   -0.17   -0.10   -0.06   -0.02    0.06  1443 1.00
I(Time^2):1988P8        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1988P7            -0.05    0.00  0.06   -0.17   -0.09   -0.05   -0.02    0.07  1479 1.00
I(Time^2):1988P7        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  2370 1.00
Time:1988P3            -0.14    0.00  0.09   -0.31   -0.20   -0.15   -0.09    0.03  1900 1.00
I(Time^2):1988P3        0.01    0.00  0.00    0.00    0.01    0.01    0.01    0.01  2397 1.00
Time:1988P2            -0.15    0.00  0.04   -0.21   -0.17   -0.15   -0.12   -0.08   566 1.01
I(Time^2):1988P2        0.01    0.00  0.00    0.00    0.01    0.01    0.01    0.01   880 1.01
Time:1988P6            -0.13    0.00  0.04   -0.20   -0.16   -0.13   -0.11   -0.05   656 1.01
I(Time^2):1988P6        0.01    0.00  0.00    0.00    0.01    0.01    0.01    0.01  1052 1.00
Time:1989F6            -0.03    0.00  0.05   -0.12   -0.06   -0.03    0.00    0.07  1283 1.00
I(Time^2):1989F6        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  2209 1.00
Time:1989F5            -0.02    0.00  0.04   -0.11   -0.05   -0.02    0.01    0.07  1013 1.00
I(Time^2):1989F5        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1599 1.00
Time:1989F4            -0.10    0.00  0.04   -0.17   -0.12   -0.10   -0.07   -0.02   648 1.00
I(Time^2):1989F4        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   884 1.00
Time:1989F1            -0.03    0.00  0.05   -0.11   -0.06   -0.03    0.00    0.07  1305 1.00
I(Time^2):1989F1        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1972 1.00
Time:1989F2            -0.01    0.00  0.04   -0.09   -0.03    0.00    0.02    0.08   938 1.00
I(Time^2):1989F2        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1606 1.00
Time:1989F7            -0.02    0.00  0.03   -0.09   -0.04   -0.02    0.00    0.04   518 1.01
I(Time^2):1989F7        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   872 1.00
Time:1989F8            -0.01    0.00  0.05   -0.11   -0.04    0.00    0.03    0.09  1103 1.00
I(Time^2):1989F8        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1850 1.00
Time:1989F3            -0.01    0.00  0.05   -0.10   -0.04   -0.01    0.02    0.08  1294 1.00
I(Time^2):1989F3        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  4000 1.00
Time:1989P7            -0.04    0.00  0.04   -0.12   -0.07   -0.04   -0.02    0.04   890 1.01
I(Time^2):1989P7        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1814 1.00
Time:1989P4            -0.07    0.00  0.03   -0.13   -0.09   -0.08   -0.06   -0.01   528 1.01
I(Time^2):1989P4        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   740 1.00
Time:1989P6            -0.01    0.00  0.07   -0.15   -0.05   -0.01    0.03    0.12  1993 1.00
I(Time^2):1989P6        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  2752 1.00
Time:1989P5            -0.03    0.00  0.05   -0.12   -0.06   -0.03    0.00    0.07   999 1.00
I(Time^2):1989P5        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1750 1.00
Time:1989P1             0.00    0.00  0.07   -0.16   -0.05    0.00    0.04    0.13  1582 1.00
I(Time^2):1989P1        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1989P3            -0.02    0.00  0.04   -0.10   -0.05   -0.02    0.00    0.05   746 1.00
I(Time^2):1989P3        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1375 1.00
Time:1989P2             0.03    0.00  0.07   -0.10   -0.01    0.04    0.08    0.15  1503 1.00
I(Time^2):1989P2        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  2750 1.00
Time:1989P8            -0.03    0.00  0.09   -0.22   -0.08   -0.02    0.04    0.14  4000 1.00
I(Time^2):1989P8        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1990F2            -0.04    0.00  0.05   -0.13   -0.07   -0.04   -0.01    0.05  1103 1.01
I(Time^2):1990F2        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  4000 1.00
Time:1990F3            -0.04    0.00  0.03   -0.09   -0.06   -0.05   -0.03    0.01   424 1.01
I(Time^2):1990F3        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   794 1.01
Time:1990F4             0.00    0.00  0.03   -0.06   -0.02    0.00    0.02    0.06   409 1.01
I(Time^2):1990F4        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   784 1.01
Time:1990F5            -0.05    0.00  0.02   -0.10   -0.07   -0.06   -0.04   -0.01   415 1.01
I(Time^2):1990F5        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   530 1.01
Time:1990F1            -0.01    0.00  0.06   -0.12   -0.05   -0.01    0.02    0.10  1249 1.00
I(Time^2):1990F1        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  2009 1.00
Time:1990F8            -0.03    0.00  0.05   -0.12   -0.06   -0.03    0.00    0.08  1089 1.00
I(Time^2):1990F8        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  2246 1.00
Time:1990F7            -0.06    0.00  0.06   -0.18   -0.10   -0.06   -0.02    0.06  1648 1.00
I(Time^2):1990F7        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  2485 1.00
Time:1990F6            -0.06    0.00  0.06   -0.16   -0.09   -0.06   -0.02    0.06  1254 1.01
I(Time^2):1990F6        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  2484 1.00
Time:1990P8             0.02    0.00  0.07   -0.12   -0.02    0.02    0.06    0.14  1744 1.00
I(Time^2):1990P8        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1990P7            -0.07    0.00  0.03   -0.13   -0.09   -0.07   -0.05   -0.01   554 1.01
I(Time^2):1990P7        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00   793 1.00
Time:1990P3             0.03    0.00  0.06   -0.09   -0.01    0.03    0.07    0.14  1381 1.00
I(Time^2):1990P3        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  4000 1.00
Time:1990P1            -0.06    0.00  0.05   -0.15   -0.09   -0.06   -0.03    0.05  1047 1.00
I(Time^2):1990P1        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  1560 1.00
Time:1990P6             0.00    0.00  0.07   -0.14   -0.05    0.00    0.04    0.12  1369 1.00
I(Time^2):1990P6        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1990P5             0.00    0.00  0.07   -0.15   -0.04    0.01    0.05    0.13  2064 1.00
I(Time^2):1990P5        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
Time:1990P2            -0.01    0.00  0.04   -0.09   -0.04   -0.01    0.02    0.07   677 1.00
I(Time^2):1990P2        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.00  1322 1.00
Time:1990P4             0.00    0.00  0.06   -0.13   -0.04    0.00    0.03    0.11  1612 1.00
I(Time^2):1990P4        0.00    0.00  0.00    0.00    0.00    0.00    0.00    0.01  4000 1.00
sigma:1988F4            1.01    0.00  0.30    0.61    0.80    0.95    1.16    1.76  4000 1.00
sigma:1988F2            1.25    0.01  0.43    0.72    0.96    1.15    1.43    2.36  2350 1.00
sigma:1988F1            1.92    0.01  0.59    1.15    1.51    1.80    2.18    3.41  2672 1.00
sigma:1988F7            2.35    0.01  0.64    1.46    1.91    2.24    2.65    3.94  4000 1.00
sigma:1988F5            2.48    0.22  2.13    0.46    0.75    1.23    4.22    7.14    93 1.01
sigma:1988F8            1.00    0.01  0.33    0.57    0.77    0.93    1.15    1.82  2726 1.00
sigma:1988F6            1.41    0.01  0.41    0.84    1.12    1.33    1.61    2.45  4000 1.00
sigma:1988F3            1.15    0.01  0.34    0.68    0.92    1.10    1.32    1.98  4000 1.00
sigma:1988P1            2.40    0.01  0.69    1.46    1.91    2.27    2.73    4.04  4000 1.00
sigma:1988P5            2.18    0.01  0.64    1.33    1.73    2.06    2.50    3.72  4000 1.00
sigma:1988P4            2.28    0.01  0.69    1.39    1.81    2.15    2.60    3.95  4000 1.00
sigma:1988P8            1.91    0.01  0.55    1.16    1.52    1.80    2.16    3.25  2591 1.00
sigma:1988P7            2.01    0.01  0.61    1.20    1.59    1.89    2.29    3.45  2685 1.00
sigma:1988P3            3.18    0.02  0.90    1.94    2.56    3.01    3.60    5.45  2459 1.00
sigma:1988P2            0.81    0.01  0.26    0.47    0.64    0.77    0.92    1.46  1587 1.00
sigma:1988P6            1.00    0.01  0.33    0.59    0.78    0.93    1.14    1.85  1984 1.00
sigma:1989F6            1.80    0.01  0.59    1.01    1.40    1.69    2.07    3.25  2529 1.00
sigma:1989F5            1.61    0.01  0.55    0.90    1.22    1.49    1.86    3.01  2064 1.00
sigma:1989F4            1.02    0.01  0.40    0.55    0.75    0.92    1.16    2.03  1468 1.00
sigma:1989F1            1.66    0.01  0.57    0.94    1.28    1.54    1.92    3.06  2540 1.00
sigma:1989F2            1.72    0.01  0.55    1.00    1.35    1.61    1.98    3.09  2549 1.00
sigma:1989F7            0.87    0.01  0.30    0.48    0.66    0.80    1.01    1.63  3015 1.00
sigma:1989F8            2.20    0.01  0.68    1.28    1.72    2.06    2.53    3.83  4000 1.00
sigma:1989F3            1.76    0.01  0.55    1.00    1.36    1.65    2.04    3.16  4000 1.00
sigma:1989P7            1.34    0.01  0.49    0.73    1.02    1.23    1.55    2.57  2009 1.00
sigma:1989P4            0.70    0.01  0.27    0.37    0.51    0.64    0.81    1.39  1690 1.01
sigma:1989P6            2.86    0.02  0.99    1.59    2.18    2.66    3.29    5.28  2724 1.00
sigma:1989P5            1.79    0.01  0.63    1.01    1.35    1.65    2.06    3.44  2098 1.00
sigma:1989P1            3.52    0.02  1.08    2.07    2.76    3.32    4.03    6.24  2755 1.00
sigma:1989P3            1.25    0.01  0.49    0.67    0.92    1.14    1.44    2.56  1760 1.00
sigma:1989P2            3.24    0.02  1.04    1.86    2.51    3.03    3.72    5.89  2685 1.00
sigma:1989P8            4.69    0.02  1.43    2.76    3.70    4.40    5.33    8.22  4000 1.00
sigma:1990F2            1.65    0.01  0.59    0.90    1.25    1.51    1.91    3.17  2527 1.00
sigma:1990F3            0.48    0.00  0.20    0.25    0.35    0.44    0.55    0.99  1979 1.00
sigma:1990F4            0.81    0.01  0.31    0.43    0.60    0.74    0.94    1.60  1755 1.00
sigma:1990F5            0.41    0.00  0.17    0.20    0.29    0.37    0.47    0.84  1803 1.00
sigma:1990F1            2.09    0.02  0.77    1.16    1.57    1.92    2.41    4.03  1508 1.00
sigma:1990F8            1.71    0.01  0.60    0.94    1.30    1.59    1.98    3.21  2400 1.00
sigma:1990F7            2.24    0.02  0.80    1.21    1.67    2.06    2.59    4.35  2225 1.00
sigma:1990F6            2.09    0.02  0.71    1.18    1.60    1.94    2.40    3.84  2235 1.00
sigma:1990P8            2.80    0.02  0.88    1.64    2.18    2.61    3.21    5.00  2466 1.00
sigma:1990P7            0.67    0.01  0.27    0.35    0.49    0.61    0.77    1.37  1840 1.00
sigma:1990P3            2.34    0.02  0.79    1.33    1.80    2.17    2.69    4.26  2334 1.00
sigma:1990P1            1.78    0.01  0.65    0.97    1.34    1.63    2.03    3.46  1920 1.00
sigma:1990P6            2.93    0.02  0.96    1.66    2.27    2.72    3.36    5.40  2840 1.00
sigma:1990P5            3.10    0.02  0.99    1.79    2.40    2.89    3.57    5.65  2127 1.00
sigma:1990P2            1.28    0.01  0.46    0.69    0.97    1.19    1.49    2.43  2248 1.00
sigma:1990P4            2.45    0.02  0.81    1.40    1.90    2.29    2.81    4.47  2593 1.00
log-posterior        -873.06    8.53 42.48 -940.16 -902.44 -880.40 -850.96 -770.97    25 1.20

Andrew Gelman

Jul 25, 2016, 9:46:38 PM
to Ben Goodrich, stan development mailing list
Hi, yes, I agree that I would not stop when R-hat is 1.2.  I'm just curious:  in this particular example are there some parameters where the inferences are way off?

More generally, I like the idea of the program spitting out some condition number, like 0 if everything seems perfect (no divergent transitions, all R-hats less than 1.1, all n_eff's greater than 100, etc), 1 if there are some potential problems, 2 if there are more problems, etc.  Then we have another function that gives out diagnostics.

A
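
A minimal sketch, in R, of the condition number Andrew proposes above; the inputs and the scoring are illustrative assumptions, not an existing rstan interface:

# 0 = the usual checks pass; higher values = more red flags.
fit_condition <- function(rhats, n_effs, n_divergent) {
  sum(c(any(rhats > 1.1),
        any(n_effs < 100),
        n_divergent > 0))
}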

Ben Goodrich

Jul 25, 2016, 9:58:51 PM
to stan development mailing list, gel...@stat.columbia.edu
On Monday, July 25, 2016 at 9:46:38 PM UTC-4, Andrew Gelman wrote:
Hi, yes, I agree that I would not stop when R-hat is 1.2.  I'm just curious:  in this particular example are there some parameters where the inferences are way off?

It is real data so we don't know what the true values are. We do know that the intercepts are systematically different across chains despite the Rhats being 1.00 to 1.02. The Rhat for lp__ is a bit like the BFMI in that it depends on all the margins. But we have tended to say not to worry about the Rhat for lp__ because the marginal distribution of lp__ tends to be not normal. The BFMI being <= 0.3 was the most obvious numerical indicator of danger.

Ben

Andrew Gelman

Jul 25, 2016, 11:22:46 PM
to stan...@googlegroups.com
When I say "inferences are way off," I didn't mean relative to true values, I meant relative to the posterior distribution.

Regarding lp__ being nonnormal:  R-hat should still be interpretable for nonnormal distributions, unless you're talking about distributions with such long tails that the 2nd moment doesn't exist.  But that shouldn't be a problem with lp__.  More generally, we do have various R-hat-like measures that will work for long-tailed distributions, so we could consider these.

A

Ben Goodrich

Jul 25, 2016, 11:36:14 PM
to stan development mailing list, gel...@stat.columbia.edu
On Monday, July 25, 2016 at 11:22:46 PM UTC-4, Andrew Gelman wrote:
When I say "inferences are way off," I didn't mean relative to true values, I meant relative to the posterior distribution.

Well, we don't yet have good draws from the posterior distribution either, but the chains are way off from each other.

Michael Betancourt

Jul 26, 2016, 10:10:37 AM
to stan...@googlegroups.com
There will certainly be a strong overlap between strong autocorrelations in lp__
and a small BFMI (maybe we should call it E-BFMI to specify that it’s the BFMI
of the energy transitions?), but the overlap isn’t complete so there is useful
information in both.  Ultimately the E-BFMI most readily identifies the heavy-tailed
distributions which may obstruct geometric ergodicity and hence for which we cannot
trust the MCSE.  This looks like an excellent example of that behavior.

Bob Carpenter

Jul 26, 2016, 4:33:32 PM
to stan...@googlegroups.com
You guys should write all these examples up in a paper focused
on MCMC diagnostics. I'm happy to help translating to and running
JAGS.

- Bob

Michael Betancourt

Jul 26, 2016, 5:55:43 PM
to stan...@googlegroups.com
To be clear, outside of R-hat there aren’t really any useful MCMC
diagnostics. Divergences and this E-BFMI are unique to HMC,
which is another reason why HMC is so special!

Andrew Gelman

Jul 26, 2016, 8:45:53 PM
to stan...@googlegroups.com
There are other R-hat-like diagnostics, for example computing between-chain coverage of within-chain intervals. Somewhere I have a paper mentioning a couple of these ideas.
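
A sketch, in R, of the R-hat-like check Andrew mentions: compute how often draws from the other chains land inside each chain's central interval. Here draws is assumed to be an iterations-by-chains matrix, and the 80% interval is an arbitrary choice:

# Coverage should be near `prob` for every chain if the chains agree.
interval_coverage <- function(draws, prob = 0.8) {
  sapply(seq_len(ncol(draws)), function(j) {
    ci <- quantile(draws[, j], probs = c((1 - prob) / 2, 1 - (1 - prob) / 2))
    mean(draws[, -j] >= ci[1] & draws[, -j] <= ci[2])
  })
}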

Ben Goodrich

Jul 26, 2016, 8:57:37 PM
to stan development mailing list
On Tuesday, July 26, 2016 at 5:55:43 PM UTC-4, Michael Betancourt wrote:
To be clear, outside of R-hat there aren’t really any useful MCMC
diagnostics.
 
Disagree


Michael Betancourt

Jul 27, 2016, 4:32:47 AM
to stan...@googlegroups.com
Everything else is just some tweak of an ANOVA-like test or
a measure of autocorrelation.  Without having sampler-specific
information this is literally all the Markov chain provides that 
can be tested.  Rhat wins over every other ANOVA-like test that I’ve
ever seen (the multiple chains being critical), whereas the
autocorrelation measures are all limited by interpretability,
at least in my opinion.  

Again, the novel thing in the HMC diagnostics is that they
are direct diagnostics on the _interaction_ of the sampler
and the target distribution, and not just an obfuscated 
manifestation of that interaction.


Seth Flaxman

Jul 27, 2016, 5:44:15 AM
to stan...@googlegroups.com
Has anyone looked into the recent Stein stuff [2016a,2016b,2016c] or Grosse et al's bidirectional thing [2016]? (they implemented it to work with Stan apparently.)

Michael Betancourt

Jul 27, 2016, 6:03:50 AM
to stan...@googlegroups.com
Yes and yes.  The Stein stuff at this point is weaker than our existing
theoretical work (http://arxiv.org/abs/1601.08057) and I couldn’t find
any way to spin it into a heuristic diagnostic (although I might have
some new ideas on that).  Roger’s bidirectional thing requires having
a tempering schedule and all kinds of other tunings that make it
rather fragile to implement well in practice.  