rough version of document for users about warning messages


Jonah Gabry

Jul 26, 2016, 6:15:25 PM
to stan development mailing list
As discussed towards the end of the meeting today, I just wrote up something quickly describing a few of the more common warning messages users see and how they should react when seeing them. Draft is attached.

Three things:

- Does anyone object or have anything to add to the "Why does Stan give so many warnings?" section?

- Does anyone see anything wrong with my descriptions of what the warnings mean or the recommendations for how to proceed?

- Which other warnings should I include? I was thinking maybe the warning about Jacobians, which is important if not a false positive, but potentially confusing if a false positive.


Jonah

warnings.html

Eric Novik

Jul 26, 2016, 10:42:21 PM
to stan...@googlegroups.com
Should we have a workflow section? I agree with Andrew that this is not emphasized enough even though for me it was (and still is) a very important part of Bayesian adaptation. The workflow is like a vaccine against many pathologies, which is why I think it belongs in this document.




--
You received this message because you are subscribed to the Google Groups "stan development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

Jul 26, 2016, 11:28:50 PM
to stan development mailing list
On Tuesday, July 26, 2016 at 6:15:25 PM UTC-4, Jonah Gabry wrote:
> As discussed towards the end of the meeting today, I just wrote up something quickly describing a few of the more common warning messages users see and how they should react when seeing them. Draft is attached.

Just push it to mc-stan.org. We need the link to exist in order to refer people to it in RStan. People can edit the wordings more later.

Ben

Jonah Gabry

Jul 26, 2016, 11:51:56 PM
to stan development mailing list
On Tuesday, July 26, 2016 at 11:28:50 PM UTC-4, Ben Goodrich wrote:

> Just push it to mc-stan.org. We need
> the link to exist in order to refer people to it in RStan. People can
> edit the wordings more later.
>
> Ben

Ok, not sure it's ready for users yet, but I just put it in the misc folder you started:

https://github.com/stan-dev/stan-dev.github.io/tree/master/misc
http://mc-stan.org/misc/warnings.html

Jonah Gabry

Jul 26, 2016, 11:55:58 PM
to stan development mailing list
On Tuesday, July 26, 2016 at 10:42:21 PM UTC-4, Eric Novik wrote:
> Should we have a workflow section? I agree with Andrew that this is not emphasized enough even though for me it was (and still is) a very important part of Bayesian adaptation. The workflow is like a vaccine against many pathologies, which is why I think it belongs in this document.
>

Sure, could make sense to put that in the same document.

Bob Carpenter

Jul 27, 2016, 12:26:11 AM
to stan...@googlegroups.com
Why put this on mc-stan rather than a wiki page? If
the answer is because it's going to be permanent, then
I'd like to find a more permanent home than misc.

- Bob

Jonah Sol Gabry

Jul 27, 2016, 12:42:07 AM
to stan...@googlegroups.com
Over on stan-dev Ben asked me to put it up so he could link to it from rstan. Probably should go somewhere other than misc, but not sure where. Maybe doc/? I worry about adding to the wiki labyrinth. 

Sebastian Weber

Jul 27, 2016, 2:59:48 AM
to stan development mailing list
Thanks much for putting this up, I think our users need this!

One additional recommendation for "Exception … metropolis proposal rejected" may be to suggest

- Use numerically stable expressions, i.e. prefer calculations on the log-scale and use numerically robust functions like log_sum_exp and the like if appropriate

... or something like this. People should calculate directly in logs and never switch to the natural scale unless really needed (I remember Bob saying something like "seeing exp in a program should scare you"). This is already quite an advanced tip, but in my experience doing so silences Stan quite a bit with respect to these warnings.
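
To make the log-scale advice concrete, here is a minimal Python sketch (not Stan code) of why a log-sum-exp helper is safer than exponentiating directly. The function name mirrors Stan's log_sum_exp, but the implementation below is only an illustration of the usual max-shift trick, and the input values are made up.

```python
import math

def log_sum_exp(log_terms):
    """Return log(sum(exp(x) for x in log_terms)) without overflow."""
    m = max(log_terms)
    if math.isinf(m):
        # Either every term is -inf (the sum is 0) or some term is +inf.
        return m
    # Shift by the maximum so the largest exponent evaluated is exp(0) = 1.
    return m + math.log(sum(math.exp(x - m) for x in log_terms))

log_terms = [1000.0, 1000.0]

# Naive evaluation fails: math.exp(1000.0) raises OverflowError.
try:
    naive = math.log(sum(math.exp(x) for x in log_terms))
except OverflowError:
    naive = None

stable = log_sum_exp(log_terms)  # equals 1000 + log(2)
```

The same quantity is computed in both cases; only the stable version survives large log-scale inputs.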

Best,
Sebastian

On Wednesday, July 27, 2016 at 6:42:07 AM UTC+2, Jonah Gabry wrote:
> Over on stan-dev Ben asked me to put it up so he could link to it from rstan. Probably should go somewhere other than misc, but not sure where. Maybe doc/? I worry about adding to the wiki labyrinth. 
>
> On Wednesday, July 27, 2016, Bob Carpenter wrote:
> Why put this on mc-stan rather than a wiki page? If
> the answer is because it's going to be permanent, then
> I'd like to find a more permanent home than misc.
>
> - Bob
>
> > On Jul 26, 2016, at 11:55 PM, Jonah Gabry

Michael Betancourt

Jul 27, 2016, 4:27:34 AM
to stan...@googlegroups.com
The intro needs to be stronger.  We need to make it absolutely clear that
no statistical algorithm is guaranteed to get the right results on all models
and care is always needed.  This is especially confusing for MCMC which
is always quoted as being “unbiased” — but this is true only when you can
run the chains infinitely long.  Whether or not MCMC yields good values
in finite time is a much more challenging problem.
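
As a toy illustration of the finite-time point (not something from the draft document), assume an AR(1) Markov chain whose stationary distribution is N(0, 1), started far from stationarity. The expected running mean then has a closed form, so no simulation is needed. The chain, the starting value, and the function name are all assumptions chosen for illustration.

```python
def expected_running_mean(x0, rho, n):
    """Expected mean of the first n states x_0..x_{n-1} of the AR(1) chain
    x_{t+1} = rho * x_t + sqrt(1 - rho**2) * noise, started at x0.

    The stationary mean is 0, so the returned value is the bias of the
    finite-chain estimate of the mean.
    """
    # E[x_t] = rho**t * x0; summing the geometric series over t = 0..n-1.
    return x0 * (1.0 - rho ** n) / (n * (1.0 - rho))

# The bias shrinks as the chain gets longer but is nonzero for every finite n.
for n in (10, 100, 1000, 10000):
    print(n, expected_running_mean(x0=10.0, rho=0.9, n=n))
```

Discarding warmup iterations removes much of this particular bias in practice; the sketch only shows that the "unbiased" guarantee is a statement about the limit.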

Also, it’s not just a design choice on our part that we show more diagnostics.
We _have_ more diagnostics to show.  This is a huge _feature_, not some
arbitrary choice.

I hate people taking the “particle moving through the distribution” analogy
too seriously.  I think it’s much clearer to say something like “the step size
controls the resolution of the sampler — for particularly hard problems
there are features of the target distribution that are too small for this
resolution.  Consequently the sampler misses those features and returns
biased estimates.  Fortunately, this mismatch of scales manifests as
_divergences_ which provide a practical diagnostic.”
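
The resolution idea can be sketched with a leapfrog integrator on a one-dimensional Gaussian target with standard deviation sigma. This is a toy, not Stan's sampler: the target, the step sizes, the starting point, and the function name are all assumptions. For this particular target the leapfrog integrator is stable only when the step size is below 2 * sigma; above that the energy error grows without bound, which is the kind of behavior a divergence flags.

```python
def leapfrog_energy_error(step_size, sigma, n_steps=50):
    """Integrate H(q, p) = q**2 / (2 * sigma**2) + p**2 / 2 with leapfrog
    steps and return the largest absolute energy error along the path."""
    def energy(q, p):
        return 0.5 * (q / sigma) ** 2 + 0.5 * p ** 2

    def grad_potential(q):
        return q / sigma ** 2

    q, p = sigma, 1.0  # arbitrary starting point
    h0 = energy(q, p)
    worst = 0.0
    for _ in range(n_steps):
        p -= 0.5 * step_size * grad_potential(q)  # half step in momentum
        q += step_size * p                        # full step in position
        p -= 0.5 * step_size * grad_potential(q)  # half step in momentum
        worst = max(worst, abs(energy(q, p) - h0))
    return worst

sigma = 0.01  # a narrow feature of the target
fine = leapfrog_energy_error(step_size=0.1 * sigma, sigma=sigma)    # stays small
coarse = leapfrog_energy_error(step_size=2.5 * sigma, sigma=sigma)  # blows up
```

With the step size well below the scale of the feature the energy error stays tiny; with a step size larger than the feature can resolve, the error explodes within a few steps.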

As for recommendations, the increased adapt_delta will work only when
there are a few, say O(1%), divergences.  If there are many more then
there are modeling problems.  Either the model is wrong or a serious
reparameterization is needed.
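
A rough sketch of that rule of thumb, assuming the per-iteration divergence indicators have already been pulled out of the fit as 0/1 values. The 1% threshold is only the order-of-magnitude figure mentioned above, not an official cutoff, and the function name is hypothetical.

```python
def divergence_advice(divergent_flags, threshold=0.01):
    """Suggest a next step from per-iteration divergence indicators (0 or 1)."""
    n = len(divergent_flags)
    if n == 0:
        raise ValueError("no post-warmup iterations supplied")
    fraction = sum(divergent_flags) / n
    if fraction == 0:
        advice = "no divergences"
    elif fraction <= threshold:
        advice = "try increasing adapt_delta"
    else:
        advice = "check the model or reparameterize"
    return fraction, advice

flags = [0] * 995 + [1] * 5  # 5 divergences in 1000 iterations
fraction, advice = divergence_advice(flags)
```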

In the Metropolis section we need to add that the user should check
that the support of their parameters matches the distributions.

Regarding workflow, this is the workflow I recommend in my tutorials:


Seth Flaxman

Jul 27, 2016, 7:00:57 AM
to stan...@googlegroups.com
Jonah, something to throw on there that I just saw the other day is a Metropolis step that's rejected because of a dimensionality mismatch. (If you don't know what I'm talking about I can go find a reproducible example.) In this case, of course, it has nothing to do with sampling; the user just needs to check where their dimensions got screwed up. But when someone asked me about this, they hadn't read the error carefully and thought it had something to do with an actual sampling issue...

Seth

Jonah Sol Gabry

Jul 27, 2016, 11:56:34 AM
to stan...@googlegroups.com
On Wed, Jul 27, 2016 at 4:27 AM, Michael Betancourt <betan...@gmail.com> wrote:
> The intro needs to be stronger. We need to make it absolutely clear that
> no statistical algorithm is guaranteed to get the right results on all models
> and care is always needed. This is especially confusing for MCMC which
> is always quoted as being “unbiased” -- but this is true only when you can
> run the chains infinitely long. Whether or not MCMC yields good values
> in finite time is a much more challenging problem.

Ok I'll make it more forceful.

> Also, it’s not just a design choice on our part that we show more diagnostics.
> We _have_ more diagnostics to show. This is a huge _feature_, not some
> arbitrary choice.

Yes, good point. I'll emphasize that diagnostics (e.g. divergences) are unique to HMC,
although the problems they indicate are not.

> I hate people taking the “particle moving through the distribution” analogy
> too seriously.

Ok, I don't take it too seriously but I do like to have some sort of analogy. I think
you've used planetary motion in some of your presentations. How about that one?
Or I just skip the analogies and just link to one of your talks?

> I think it’s much clearer to say something like “the step size
> controls the resolution of the sampler -- for particularly hard problems
> there are features of the target distribution that are too small for this
> resolution. Consequently the sampler misses those features and returns
> biased estimates. Fortunately, this mismatch of scales manifests as
> _divergences_ which provide a practical diagnostic.”

Ok, nice. I'll use this.

> As for recommendations, the increased adapt_delta will work only when
> there are a few, say O(1%), divergences. If there are many more then
> there are modeling problems. Either the model is wrong or a serious
> reparameterization is needed.
>
> In the Metropolis section we need to add that the user should check
> that the support of their parameters matches the distributions.

Ok will add.

Thanks for comments!

Jonah Sol Gabry

Jul 27, 2016, 1:15:14 PM
to stan...@googlegroups.com
Ok, I've incorporated Michael and Sebastian's advice. There's an updated version here, and I'm happy to keep making changes if people have more comments or don't like any of what I just added. 

Seth, I know the warning you're referring to, but haven't come across it in a while. If you have a small example that'd be great. 

Jonah

Krzysztof Sakrejda

Jul 27, 2016, 2:35:23 PM
to stan development mailing list
On Wednesday, July 27, 2016 at 1:15:14 PM UTC-4, Jonah Gabry wrote:
> Ok, I've incorporated Michael and Sebastian's advice. There's an updated version here, and I'm happy to keep making changes if people have more comments or don't like any of what I just added. 
>

This part: "While reading this document it is important to keep in mind that there is no statistical algorithm that is guaranteed to get the right results on all models. Markov chain Monte Carlo is no exception. Although it is often advertised as being “unbiased”, this is not a guarantee unless the Markov chains are infinitely long. Whether or not MCMC yields good values in finite time is a much more challenging problem."

Could be something like: "While reading this document it is important to keep in mind that there is no statistical algorithm that is guaranteed to get the right results on all models. [Optimization can get stuck in local modes and even a simple linear regression can give nonsense answers in the presence of collinearity.] Markov chain Monte Carlo is no exception. Although it is often advertised as being “unbiased”, this is not a guarantee unless the Markov chains are infinitely long. Whether or not MCMC yields good values in finite time is a much more challenging problem."

The maximum tree depth section could give a rough idea of how much memory Stan will use as the depth increases. I don't recall offhand... [number of parameters]*2^treedepth (?)
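
For a sense of scale, here is that tentative formula worked out in Python. It is only the guess from this message ([number of parameters]*2^treedepth), not a confirmed description of Stan's memory use; the 8 bytes per value (double precision) and the function name are further assumptions.

```python
def guessed_tree_memory_bytes(n_params, treedepth, bytes_per_value=8):
    """Memory under the unverified guess of n_params * 2**treedepth values."""
    return n_params * 2 ** treedepth * bytes_per_value

# Hypothetical model with 1000 parameters at a few tree depths.
for depth in (10, 12, 15):
    mib = guessed_tree_memory_bytes(n_params=1000, treedepth=depth) / 2 ** 20
    print(f"treedepth={depth}: about {mib:.1f} MiB under this guess")
```

Whatever the right constant turns out to be, each extra level of tree depth doubles the maximum number of leapfrog steps per iteration, so the cost of a saturated tree grows like 2^treedepth.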

Should there be a section about the BFMI? I feel like we lack guidance about the exact cut-off but we could give an example for when it is low and why it's low (something from Michael's write-up would be fine). For example this snippet gives low BFMI:

library(rstan)
# One Cauchy-distributed parameter; sampling this model gives low BFMI.
fit <- stan(model_code = 'parameters { real x; } model { x ~ cauchy(0, 1); }')

The BFMIs are near 0.3, and we know this is an example that runs and gives fine R-hats but doesn't give correct quantiles. The current warning seems to be the worst case: we give an ominous warning with no context.
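
For context, a minimal Python sketch of how an energy-based BFMI estimate can be computed from the per-iteration energies of one chain (the energy__ column in Stan's sampler output). The estimator is assumed here to be the sum of squared successive energy differences divided by the sum of squared deviations from the mean energy; the function name and the toy energy values are hypothetical.

```python
def bfmi(energies):
    """Energy-based BFMI estimate for one chain of per-iteration energies."""
    if len(energies) < 2:
        raise ValueError("need at least two energy values")
    mean = sum(energies) / len(energies)
    # Squared changes in energy between consecutive iterations.
    numerator = sum((b - a) ** 2 for a, b in zip(energies, energies[1:]))
    # Squared deviations of the energy from its mean over the chain.
    denominator = sum((e - mean) ** 2 for e in energies)
    return numerator / denominator

slow = bfmi([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])  # energy drifts slowly: low value
fast = bfmi([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # energy changes freely: high value
```

A low value means the energy changes little between iterations compared with how much it varies over the whole chain, which is the situation the warning is about.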

Krzysztof

Bob Carpenter

Jul 27, 2016, 7:40:25 PM
to stan...@googlegroups.com
Just assign a size 3 array to a size 2 array in
the transformed parameters or model block.

- Bob