@Michael Betancourt: are your slides from the Stan Conference 2017 available online?

Christian Schuhegger

Mar 11, 2017, 12:28:03 PM
to Stan users mailing list
Hello Michael Betancourt,

I was watching the video about your talk: "Everything You Should Have Learned About Markov Chain Monte Carlo" and your explanations helped me a lot to understand what is going on in MCMC! Really a great talk!!

I am planning to organize a dev-meetup at our company to present MCMC (PyStan and PyMC3) to the larger developer group and wanted to ask if I may use some of your slides that explain the geometry of the problem.
I was searching for the slides on the web, but couldn't find them.

Thanks a lot,
Christian

Michael Betancourt

Mar 11, 2017, 1:32:06 PM
to stan-...@googlegroups.com
Christian,

My slides are not available online.  You can, however, find the source
of most of my figures in the conceptual introduction on the arXiv,

Christian Schuhegger

Mar 11, 2017, 1:59:26 PM
to Stan users mailing list
Thank you very much!

Christian Schuhegger

Mar 13, 2017, 2:09:07 AM
to Stan users mailing list
Hi Michael,

thanks a lot for the link to your paper! This was exactly what I needed for developing a better understanding. My background is physics and your explanation of the analogy of HMC to gravitational systems in phase space was very helpful. It also makes clear to me now what the word “Hamiltonian” means in that context. I still need to read more and your references in that paper are on my reading list now :)

I understand that NUTS is a variant/improvement of HMC, but otherwise the same “rules” apply for HMC as for NUTS. Is there any reason to continue to use the HMC sampler in Stan rather than NUTS?

If this analogy with the gravitational system is right, then I have a question about multi-modal distributions. Continuing the analogy, in a multi-modal system there could be more than one stable orbit, e.g. a stable orbit around the Earth plus a stable orbit around the Moon. Will HMC/NUTS find all of the orbits, or do I have to be careful in such scenarios? Would R_hat still be a valid “problem indicator” there? I guess R_hat would differ from 1 if two different chains find two different orbits? So my question is: how should one deal with multi-modal systems?

The original problem I am trying to solve, which put me on the track of your video, is a mixture model of shifted gammas. This is not a real-world problem, but just an exercise that I’ve chosen for myself in order to learn MCMC techniques. My problem with this exercise is described here: http://stackoverflow.com/questions/42735489/fit-a-mixture-model-of-two-shifted-gamma-distributions-in-pymc3

And I created a PyMC3 version and a Stan version as gists on github: https://gist.github.com/cs224/36482b4f52885310cec6ef975fd20184

I guess I am making more fundamental mistakes than “just” trying to fit a multi-modal model, but nevertheless my question is: how best to do something like this in Stan?
Thanks for any further hints that help me find my way into the world of MCMC.

Thanks,
Christian

Christian Schuhegger

Mar 13, 2017, 2:10:31 AM
to Stan users mailing list
Hello Michael,

as a side question: in your paper you start to draw a “family tree” of MCMC algorithms; at least you mention the Gibbs/Metropolis and the HMC branches. How would the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010) fit into this picture? I am asking because it seems that this algorithm can be parallelized and put onto GPUs. But all of that will not help if (talking in your geometrical picture) the affine-invariant ensemble sampler cannot deal with high-dimensional spaces in a conceptually “good” way like HMC does. What is your view on this?

Here is the link to some development to put the affine-invariant ensemble sampler on a GPU: http://dragan.rocks/articles/17/Bayadera-Bayes-Clojure-GPU-slides-Bobkonferenz 
And here is a video where Dragan Djuric talks about his implementation: https://www.youtube.com/watch?v=bEOOYbscyTs

Thanks,
Christian

Christian Schuhegger

Mar 13, 2017, 2:11:24 AM
to Stan users mailing list
Hello Michael,

I have another side question about your talk at StanCon. I simply quote two passages:

-- snip start --
Statistics was this menagerie of techniques and approaches, and there was no unified theory underneath all of it. In Bayesian inference there are really only two steps: there is the modeling step, where you choose the prior and the likelihood, and there is the computation step, where you compute expectations with respect to the posterior. … how is that computation trying to do an integral? If it’s not clear how your computational algorithm is trying to do an integral, that’s probably a good sign you’re doing something weird.

Now what’s nice about this is that you can invert the logic and use it to evaluate all of the algorithms you might know about. Statistical computation is all about computing integrals, and computing integrals is all about quantifying this typical set. That must mean that all of these statistical algorithms are just different ways of quantifying this typical set. And indeed you can analyse them all from this perspective and start seeing why they fail.
-- snip end --

I could not have summarized my own problem with traditional statistics any better than you did! This is also my main attraction to the Bayesian approach: it is clean and clear from a conceptual perspective.

Do you have pointers that show how you analyse traditional statistical algorithms from this perspective of computing integrals? I would be very much interested in reading more about this.

Thanks a lot!
Christian

Bob Carpenter

Mar 13, 2017, 2:23:31 PM
to stan-...@googlegroups.com

> On Mar 13, 2017, at 2:09 AM, Christian Schuhegger <christian....@gmail.com> wrote:
>
> Hi Michael,
>
> thanks a lot for the link to your paper! This was exactly what I needed for developing a better understanding. My background is physics and your explanation of the analogy of HMC to gravitational systems in phase space was very helpful. It also makes clear to me now what the word “Hamiltonian” means in that context. I still need to read more and your references in that paper are on my reading list now :)
>
> I understand that NUTS is a variant/improvement of HMC, but otherwise the same “rules” apply for HMC as for NUTS. Is there any reason to continue to use the HMC sampler in Stan rather than NUTS?

Only for comparison and for use as bits of other algorithms.

> If this analogy with the gravitational system is right

Potential = negative log density
Kinetic = random standard normal at start of iteration

The algorithm then uses the leapfrog integrator to simulate the
Hamiltonian dynamics and draws a point along the trajectory. NUTS then
just controls how long the Hamiltonian dynamics are simulated forward
and backward in time and how a point is selected from along
the trajectory for the next draw.
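
Bob's description maps directly onto code. Below is a minimal sketch (my own illustration, not Stan's implementation) of the leapfrog integrator for a standard-normal target, with the potential equal to the negative log density and the momentum drawn fresh from a standard normal, as described above; the step size, step count, and starting point are arbitrary choices:

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator.

    q: position (the parameters), p: momentum,
    grad_U: gradient of the potential U(q) = -log density."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)        # initial half step for momentum
    for _ in range(n_steps - 1):
        q += eps * p                  # full step for position
        p -= eps * grad_U(q)          # full step for momentum
    q += eps * p                      # last full position step
    p -= 0.5 * eps * grad_U(q)        # final half step for momentum
    return q, p

# Standard normal target, so U(q) = 0.5 * q.q and grad_U(q) = q.
rng = np.random.default_rng(0)
q0 = np.array([1.0, -0.5])
p0 = rng.standard_normal(2)           # fresh momentum at the start of an iteration
q1, p1 = leapfrog(q0, p0, lambda q: q, eps=0.1, n_steps=20)

# The integrator approximately conserves H(q, p) = U(q) + 0.5 * p.p:
H0 = 0.5 * q0 @ q0 + 0.5 * p0 @ p0
H1 = 0.5 * q1 @ q1 + 0.5 * p1 @ p1
print(abs(H1 - H0))                   # small for a stable step size
```

A full HMC transition would then accept or reject (q1, p1) with a Metropolis correction based on the energy difference H1 - H0; NUTS replaces the fixed n_steps with its automatic termination criterion.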

> then I have a question about multi-modal distributions. If I take the analogy that in a multi modal system it could happen that there are more than one stable orbits, e.g. a stable orbit around earth plus a stable orbit around the moon. Will HMC/NUTS find all of the orbits or do I have to be careful in such scenarios?

No. Nothing will. If there are only a few, finding them
won't be the big problem---deciding how much time to spend
on each will be.

> Would R_hat still be a valid “problem indicator” in such a scenario? I guess that R_hat would be different from 1 if two different chains find two different orbits? So my question would be how to deal with multi-modal systems?

Rhat will explode if the different chains fall into different modes.
That's one thing we use it to diagnose.
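
To make that concrete, here is a toy illustration (my own; it uses the classic Gelman--Rubin formula rather than Stan's split, rank-normalized variant) showing R-hat near 1 for chains exploring the same mode and R-hat exploding when chains fall into different modes:

```python
import numpy as np

def rhat(chains):
    """Basic potential scale reduction factor for an array of shape
    (n_chains, n_draws). Classic Gelman--Rubin formula, simpler than
    Stan's split/rank-normalized version."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_plus = (n - 1) / n * W + B / n       # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(1)
# Four chains exploring the same mode: R-hat near 1.
good = rng.normal(0.0, 1.0, size=(4, 1000))
# Two chains stuck in a mode at -5, two in a mode at +5: R-hat explodes.
stuck = np.vstack([rng.normal(-5.0, 1.0, size=(2, 1000)),
                   rng.normal(+5.0, 1.0, size=(2, 1000))])
print(rhat(good))    # ~ 1.0
print(rhat(stuck))   # >> 1
```

The second case is exactly the "two chains find two different orbits" scenario: the between-chain variance B dwarfs the within-chain variance W, so R-hat is far from 1.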

> My original problem that I try to solve that put me on the track of your video is a mixture model of shifted gammas. This is not a real world problem, but just an exercise that I’ve chosen for myself in order to learn MCMC techniques.

You probably want to read Michael's case study (see our web site
under DOCS) on mixture models.

> My problem with this exercise is described here: http://stackoverflow.com/questions/42735489/fit-a-mixture-model-of-two-shifted-gamma-distributions-in-pymc3
>
> And I created a PyMC3 version and a Stan version as gists on github: https://gist.github.com/cs224/36482b4f52885310cec6ef975fd20184
>
> I guess I am making more fundamental mistakes than “just” trying to fit a multi-modal model, but nevertheless my question would be how to do something like this best in Stan?

There aren't really any good techniques for fitting general multimodal
models. Mixture models we usually code directly, and they wind up not
being so multimodal in their parameters.
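
"Code directly" here means marginalizing the discrete component indicator out of the likelihood and working on the log scale. As a sketch (mine, not from the case study; it uses NumPy rather than Stan, and all parameter values are made up for illustration), the marginalized log-likelihood of a two-component shifted-gamma mixture looks like this; Stan's log_mix / log_sum_exp play the same role in the model block:

```python
import numpy as np
from math import lgamma

def shifted_gamma_logpdf(y, a, loc):
    """Log density of a Gamma(shape=a, rate=1) shifted by loc; -inf off support."""
    z = np.asarray(y, dtype=float) - loc
    with np.errstate(divide="ignore", invalid="ignore"):
        lp = (a - 1) * np.log(z) - z - lgamma(a)
    return np.where(z > 0, lp, -np.inf)

def mixture_loglik(y, theta, a1, loc1, a2, loc2):
    """Two-component shifted-gamma mixture with the discrete component
    indicator marginalized out, computed stably on the log scale:
      log p(y) = logaddexp(log theta + log p1(y), log(1-theta) + log p2(y))."""
    lp1 = np.log(theta) + shifted_gamma_logpdf(y, a1, loc1)
    lp2 = np.log1p(-theta) + shifted_gamma_logpdf(y, a2, loc2)
    return np.logaddexp(lp1, lp2).sum()

# Synthetic data: two gamma components, the second shifted by 10.
rng = np.random.default_rng(2)
y = np.concatenate([rng.gamma(3.0, size=200),
                    rng.gamma(3.0, size=200) + 10.0])
print(mixture_loglik(y, 0.5, 3.0, 0.0, 3.0, 10.0))
```

The marginalized likelihood is smooth in the continuous parameters, which is what makes it tractable for HMC; the remaining multimodality from label switching is usually handled by ordering constraints, as the case study discusses.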

- Bob

Bob Carpenter

Mar 13, 2017, 2:29:29 PM
to stan-...@googlegroups.com
The ensemble samplers go in the "doesn't scale well with dimension"
category. We implemented the Goodman and Weare sampler, and it
fared as poorly in high dimensions as we expected it would. So
we left it on a branch rather than encouraging our users to waste
time on it.

The problem is that a point interpolated between two points in the typical set
is unlikely to fall back in the typical set in high dimensions,
because the typical set is a thin shell around the mode.
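
This thin-shell effect is easy to check numerically. The following sketch (my illustration; the dimension is an arbitrary choice) shows that the midpoint of two independent draws from a high-dimensional standard normal falls well inside the shell of radius ~sqrt(d) where the typical set lives:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 1000                              # dimension (an arbitrary illustrative choice)
x, y = rng.standard_normal(d), rng.standard_normal(d)

# For a standard normal the typical set is a thin shell of radius ~ sqrt(d):
print(np.linalg.norm(x), np.sqrt(d))  # both ~ 31.6

# The midpoint of two independent draws has norm ~ sqrt(d / 2) instead:
mid = 0.5 * (x + y)
print(np.linalg.norm(mid))            # ~ 22.4, far inside the shell
```

Since the shell has width O(1), a point at radius ~22 is many standard deviations away from where essentially all of the probability mass sits, so such a proposal is all but guaranteed to be rejected.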

It's tempting to speed up these poor algorithms because they're
easy to parallelize. It's a good example of the "streetlight effect"
(colloquially, "drunk under a lamp post"):

https://en.wikipedia.org/wiki/Streetlight_effect

- Bob

Bob Carpenter

unread,
Mar 13, 2017, 2:33:29 PM3/13/17
to stan-...@googlegroups.com

> On Mar 13, 2017, at 2:11 AM, Christian Schuhegger <christian....@gmail.com> wrote:
>
> ...
> Do you have pointers that show how you analyse traditional statistical algorithms from this perspective of computing integrals? I would be very much interested in reading more about this.

MCMC methods were defined to compute integrals. Specifically,
expectations of functions over densities. The typical set's just
the set over which you need to integrate to compute expectations
w.r.t. a density. In Bayes, the densities are posteriors of parameters
conditioned on data.

Event probabilities are expectations of indicator functions over params.
Bayesian estimates in the form of posterior means are expectations
of the parameters themselves. Predictive inference is just another
integral of one of these forms depending on how you're doing prediction.

MCMC methods approximate the posterior density in the typical set,
and the typical set is the region you need to integrate over to get
the right result (by definition).
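
Each of these expectations reduces to averaging a function of the draws. A minimal sketch (my own; it uses exact standard-normal draws as a stand-in for MCMC output, so there is no actual sampler here) of all three cases:

```python
import numpy as np

rng = np.random.default_rng(4)
draws = rng.standard_normal(100_000)  # stand-in for posterior draws of theta

# Posterior mean: expectation of the parameter itself.
print(draws.mean())                   # ~ 0

# Event probability Pr[theta > 1]: expectation of an indicator function.
print((draws > 1.0).mean())           # ~ 0.1587 = 1 - Phi(1)

# Posterior predictive draws: push each posterior draw of theta through
# the sampling distribution y ~ normal(theta, 1).
y_rep = rng.normal(loc=draws, scale=1.0)
print(y_rep.var())                    # ~ 2: posterior variance + sampling variance
```

The point is that every quantity of interest is a Monte Carlo average, and the average is accurate exactly when the draws cover the typical set.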

- Bob

Michael Betancourt

Mar 14, 2017, 10:28:59 AM
to Stan users mailing list
I understand that NUTS is a variant/improvement of HMC, but otherwise the same “rules” apply for HMC as for NUTS. Is there any reason to continue to use the HMC sampler in Stan rather than NUTS?

I refer to Hamiltonian Monte Carlo as the general mathematical method.
NUTS is an implementation of that method.  So is “static HMC” or “vanilla
HMC”, which is how the algorithm was originally implemented.

There is no reason to use static HMC unless you know that your target
distribution is very very very close to a Gaussian.

If this analogy with the gravitational system is right then I have a question about multi-modal distributions. If I take the analogy that in a multi modal system it could happen that there are more than one stable orbits, e.g. a stable orbit around earth plus a stable orbit around the moon. Will HMC/NUTS find all of the orbits or do I have to be careful in such scenarios? Would R_hat still be a valid “problem indicator” in such a scenario? I guess that R_hat would be different from 1 if two different chains find two different orbits? So my question would be how to deal with multi-modal systems?

HMC can do okay for weakly multimodal problems, but if the modes are
well separated then it will not perform well.  R_hat with multiple chains is
a critical diagnostic.

To tackle multimodal problems you have to either make them unimodal
or consider algorithms in development, https://arxiv.org/abs/1405.3489.

My original problem that I try to solve that put me on the track of your video is a mixture model of shifted gammas. This is not a real world problem, but just an exercise that I’ve chosen for myself in order to learn MCMC techniques. My problem with this exercise is described here: http://stackoverflow.com/questions/42735489/fit-a-mixture-model-of-two-shifted-gamma-distributions-in-pymc3

This is not a great model for learning MCMC as multimodal pathologies

Michael Betancourt

Mar 14, 2017, 10:31:42 AM
to stan-...@googlegroups.com
Goodman and Weare, and other ensemble samplers such as differential evolution,
do not scale up to more than O(10) dimensions.  It’s straightforward to see why if 
you take a cartoon typical set, distribute walkers across it, and then see what the
ensemble moves would do.
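
To make the cartoon concrete, here is a sketch (my own; the dimension, stretch parameter, and trial count are arbitrary choices) of the Goodman--Weare stretch move applied to walkers already in the typical set of a d-dimensional standard normal. The proposal interpolates/extrapolates along the line between two walkers, and the average acceptance probability collapses as d grows:

```python
import numpy as np

rng = np.random.default_rng(5)
d, a, n = 500, 2.0, 2000   # dimension, stretch parameter, number of trials

# Pairs of walkers already sitting in the typical set of the target.
xk = rng.standard_normal((n, d))
xj = rng.standard_normal((n, d))

# Stretch factor: z ~ g(z) propto 1/sqrt(z) on [1/a, a], via inverse-CDF sampling.
z = (1 + (a - 1) * rng.random(n)) ** 2 / a

# Proposal moves walker k along the line through walker j:
proposal = xj + z[:, None] * (xk - xj)

# Goodman-Weare acceptance probability: min(1, z^(d-1) * pi(proposal) / pi(xk)).
logpi = lambda x: -0.5 * np.sum(x * x, axis=1)
log_accept = (d - 1) * np.log(z) + logpi(proposal) - logpi(xk)
accept_rate = np.minimum(1.0, np.exp(log_accept)).mean()
print(accept_rate)   # small: only proposals with z very close to 1 survive
```

In a handful of dimensions the same move accepts at a healthy rate, which is why these samplers are popular for low-dimensional problems; at d = 500 almost every proposal leaves the thin shell where the probability mass lives.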

GPUs will not solve statistics.  GPUs may be helpful in very particular circumstances
where scalable algorithms admit parallelizable sub-calculations.

Michael Betancourt

Mar 14, 2017, 10:32:39 AM
to stan-...@googlegroups.com
Do you have pointers that show how you analyse traditional statistical algorithms from this perspective of computing integrals? I would be very much interested in reading more about this.

Christian Schuhegger

Mar 14, 2017, 9:55:31 PM
to Stan users mailing list
@Bob and @Michael: thanks a lot for your very helpful explanations!
I am currently reading the Stan manual cover to cover and will then go into the case studies and your further links. After that I will continue to read more of the papers and references you pointed me to.
At least I now have a good plan for working my way into this topic area!

Thanks,
Christian

Christian Schuhegger

May 6, 2017, 12:52:43 AM
to Stan users mailing list
I am a bit behind my own schedule, but I've now finished reading the complete Stan manual cover to cover, as well as the stan_intro. As a next step I'll work through the examples in the Stan manual.

I wanted to take the chance to thank everybody who was involved in creating the Stan manual and the stan_intro. The stan_intro especially answered MANY of the questions I had accumulated over the years! Great job!