Use adapt_diag_e_nuts outside of Stan

John Jumper

Apr 30, 2017, 10:30:33 PM
to Stan users mailing list
I would like to use the really nice adaptive NUTS sampler with a non-Stan model.  The model itself uses a deep learning framework as well as Markov random fields, so it would be painful to port to the Stan language.  I should be able to use the adapt_diag_e_nuts class for the sampling, but I need to provide a valid Model class.

Is the interface for a Model class documented anywhere?  It is unfortunately a template parameter everywhere, so there is no base class.

Thanks,
John

Daniel Lee

Apr 30, 2017, 11:08:08 PM
to stan-...@googlegroups.com
This is the place to start:

We use a template concept, not polymorphism, but shouldn't be too hard to follow. 



Daniel
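For anyone following along, the core of the concept the samplers template on looks roughly like the sketch below. The names and signatures here are approximate (the C++ that stanc generates from any Stan program is the authoritative reference), and the full concept also includes I/O methods such as transform_inits and write_array that are omitted here.

```cpp
#include <cstddef>
#include <ostream>
#include <vector>

// Approximate sketch of a class satisfying the Model template concept:
// a standard normal on one unconstrained parameter, up to a constant.
class my_model {
 public:
  // number of unconstrained real (and integer) parameters
  std::size_t num_params_r() const { return 1; }
  std::size_t num_params_i() const { return 0; }

  // log density on the unconstrained scale; T may be double or an autodiff type
  // (propto: drop constant terms; jacobian: include the change-of-variables term)
  template <bool propto, bool jacobian, typename T>
  T log_prob(std::vector<T>& params_r, std::vector<int>& params_i,
             std::ostream* msgs) const {
    T x = params_r[0];
    return -0.5 * x * x;
  }
};
```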

Andrew Gelman

Apr 30, 2017, 11:12:54 PM
to Stan users mailing list
To channel Bob and Michael and maybe Aki too:  It sounds like the resulting posterior distributions will have all sorts of nooks and crannies, so maybe MCMC is not the best way to go.  Maybe something like EP would work better.

That said, I'm a big fan of trying MCMC to see how it will work.  I recommend multiple chains, with starting points taken from different places in the target distribution, so you can get a good idea of mixing.

I don't know anything about the model class stuff, but I guess the Stan coders will know this one.

Andrew


Bob Carpenter

Apr 30, 2017, 11:19:51 PM
to stan-...@googlegroups.com
Sorry, that's the same link as I sent. I replied to Andrew's
reply, which somehow got a new subject along the way, so got
unthreaded in my mailer.

- Bob

Andrew Gelman

Apr 30, 2017, 11:22:05 PM
to stan-...@googlegroups.com
Hi, yes, I gave it a new thread on purpose because I was addressing a general question so I thought it could be interesting to people who wouldn't care about adapt_diag_e_nuts in particular.
A

John Jumper

May 1, 2017, 12:30:11 AM
to Stan users mailing list, gel...@stat.columbia.edu
To be fair, I am not planning to use NUTS on the neural network parameters themselves.  I have a trained variational autoencoder (a la Kingma and Welling https://arxiv.org/abs/1312.6114) using PyTorch to handle the deep learning model.  The parameters on which I will do HMC are the latent dimensions of the autoencoder.  I expect that this space should be clean enough by construction of the autoencoder, otherwise the Gaussian variational posterior would not achieve a good entropy bound.  In the worst case, I might need parallel tempering (which I assume Stan doesn't have).

It would be really nice if Stan's MCMC engine were split out into a proper library that could be used outside of Stan.  Please correct me if I am wrong, but is it as simple as writing a virtual base class that implements the Model concept?  I will probably do that anyway for my project, to avoid the hassle of runtime recompilation of the optimizer.

John Jumper

May 1, 2017, 12:34:04 AM
to Stan users mailing list, gel...@stat.columbia.edu
Also, I don't see a second thread that was created in the mailing list.  Am I missing something?

Aki Vehtari

May 1, 2017, 2:42:38 PM
to Stan users mailing list, gel...@stat.columbia.edu
On Monday, May 1, 2017 at 6:12:54 AM UTC+3, Andrew Gelman wrote:
To channel Bob and Michael and maybe Aki too:  It sounds like the resulting posterior distributions will have all sorts of nooks and crannies, so maybe MCMC is not the best way to go.

Around 2000 we were close to writing a paper on a "funnel" problem in neural network posteriors, but nobody else seemed to care, and since there were too many problems with multimodality, I moved from neural networks to Gaussian processes.
Due to the multimodality, convergence diagnostics for MCMC for neural networks are really difficult.

Aki

John Jumper

May 1, 2017, 2:50:56 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
I currently work in protein folding, so I can sympathize with funnel problems (I assume the terminology comes from protein folding funnels). 

I am planning to run Bayesian sampling in the latent space of a variational auto-encoder, not its parameter space, so I am hoping these issues don't apply. 



Bob Carpenter

May 2, 2017, 7:49:52 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
That'd be one way to do it. Many of those
features are dependent on the particular model structure
of Stan.


> On May 1, 2017, at 12:30 AM, John Jumper <john.m...@gmail.com> wrote:
>
> To be fair, I am not planning to use NUTS on the neural network parameters themselves. I have a trained variational autoencoder (a la Kingma and Welling https://arxiv.org/abs/1312.6114) using PyTorch to handle the deep learning model. The parameters on which I will do HMC are the latent dimensions of the autoencoder. I expect that this space should be clean enough by construction of the autoencoder, otherwise the gaussian variational posterior would not achieve a good entropy bound. In the worst case, I might need parallel tempering (which I assume Stan doesn't have).

Nope, no parallel tempering.

> It would be really nice if Stan's MCMC engine were split out into a proper library that could be used outside of Stan. Please correct me if I am wrong, but is it as simple as writing a virtual base class that implements the Model concept? I will probably do that anyway for my project, to avoid the hassle of runtime recompilation of the optimizer.

That'd be the least invasive way to do it. There's a lot of functionality in
those methods that depend on our constraining and unconstraining
transforms and the blocks in a Stan program. And it's probably
assuming you have a templated log_prob function that gets autodiffed to
form a gradient. It used to have a log_prob_grad function that rolled
them together.
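For anyone curious what "gets autodiffed to form a gradient" looks like in practice, here's a minimal sketch using the stan::math::gradient functional on a hand-written templated log density (assuming a reasonably recent Stan Math; the exact headers and names may differ in your version):

```cpp
#include <stan/math.hpp>
#include <Eigen/Dense>
#include <iostream>

// A hand-written templated log density, standing in for a model's log_prob.
struct toy_log_density {
  template <typename T>
  T operator()(const Eigen::Matrix<T, Eigen::Dynamic, 1>& x) const {
    T lp = 0;
    for (int i = 0; i < x.size(); ++i)
      lp += x(i) * x(i);
    return -0.5 * lp;  // standard normal, up to a constant
  }
};

int main() {
  Eigen::VectorXd x(2);
  x << 1.0, -2.0;
  double lp;
  Eigen::VectorXd grad;
  // Reverse-mode autodiff: evaluates the functor with stan::math::var
  // and fills in the gradient, which is what the samplers need.
  stan::math::gradient(toy_log_density(), x, lp, grad);
  std::cout << "lp = " << lp << ", grad = " << grad.transpose() << "\n";
  return 0;
}
```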

One of the things we need to do for Stan 3 is rethink that model
class. So we'll probably be doing this ourselves over the next year
or two.


A better approach would probably be to more cleanly abstract the
algorithms. That's never been a huge concern for us because we only
have one language. Implementing NUTS is relatively easy (though it's
subtle and there are a lot of pitfalls---I just meant in terms of
amount of code). It's all the derivatives and tie-ins to the language
that are hard.

If you can figure out how to refactor the MCMC lib into a standalone library
that doesn't depend on our model concept that we can use elsewhere, that'd be
great. When Alp and Dustin built ADVI, they wound up writing their own
optimizer because the L-BFGS built into Stan is also tied up with our model class.
So it'd be useful to us internally, too.

- Bob

Bob Carpenter

May 2, 2017, 7:54:54 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu

> On May 1, 2017, at 2:50 PM, John Jumper <john.m...@gmail.com> wrote:
>
> I currently work in protein folding, so I can sympathize with funnel problems (I assume the terminology comes from protein folding funnels).

Don't know what those are, but the terminology was
from Radford Neal's example, where the posterior in
a hierarchical model with no data looks like a funnel
when projected down to two dimensions.
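For readers who haven't seen it, a minimal version of that example (Neal 2003) is: y ~ normal(0, 3) and x_i | y ~ normal(0, exp(y/2)) for i = 1, ..., 9, with no data. As y decreases, the conditional scale of the x_i shrinks exponentially, so the joint density pinches into the narrow neck that gives the funnel its name.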

> I am planning to run Bayesian sampling in the latent space of a variational auto-encoder, not its parameter space, so I am hoping these issues don't apply.

Is the posterior multimodal or unimodal? Is the
Hessian constant over the posterior or varying?

- Bob

John Jumper

May 2, 2017, 8:21:38 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu

I am unsure at this point about the structure of the posterior, especially if it is multimodal. I have decided to start with variational inference or even maximum likelihood with multiple starts on my model to see if I can get a better picture of the posterior structure before attempting full Bayesian sampling. I am still finishing the implementation of the model. 


John Jumper

May 2, 2017, 8:28:08 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
I will let you guys know if I abstract the model class. I saw the templated methods for log_prob (or grad) but it looks like there is no particular efficiency gain from templating. If I abstract it, I will probably write a templated-method ModelHolder class that essentially forwards the calls to a contained object with a non-templated virtual Model class interface. I think I can then interface the contained class back to Python.
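A minimal sketch of the kind of adapter John describes; the names (LogDensity, ModelHolder) and the single forwarded method are hypothetical, and only the double-valued path is shown, since the real Model concept is also instantiated with autodiff scalar types, which comes up later in the thread.

```cpp
#include <Eigen/Dense>
#include <cstddef>
#include <memory>

// Hypothetical non-templated interface: just a size and a log density with gradient.
struct LogDensity {
  virtual ~LogDensity() {}
  virtual std::size_t dim() const = 0;
  virtual double log_density_grad(const Eigen::VectorXd& x,
                                  Eigen::VectorXd& grad) const = 0;
};

// Hypothetical adapter that forwards the calls a sampler makes to the
// contained LogDensity (which could itself be a thin wrapper around a
// Python callback).  Only double is handled here.
class ModelHolder {
 public:
  explicit ModelHolder(std::shared_ptr<const LogDensity> target)
      : target_(std::move(target)) {}

  std::size_t num_params_r() const { return target_->dim(); }

  double log_prob_grad(const Eigen::VectorXd& x, Eigen::VectorXd& grad) const {
    return target_->log_density_grad(x, grad);
  }

 private:
  std::shared_ptr<const LogDensity> target_;
};
```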

Bob Carpenter

May 3, 2017, 3:57:12 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu

> On May 2, 2017, at 8:27 PM, John Jumper <john.m...@gmail.com> wrote:
>
> I will let you guys know if I abstract the model class. I saw the templated methods for prob (or grad) but it looks like there is no particular efficiency gain from templating.

Not really in this case. Nothing to really inline as all the
calls are to big functions and the important inlining is
within the model.

> If I abstract it, I will probably write a templated-method ModelHolder class that essentially forwards the calls to a contained object with non-templated virtual Model class interface. I think I can then interface the contained class back to Python.

This kind of forgetfulness of type can be difficult to manage
with concepts. There are approaches like the CRTP that can help.
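For readers unfamiliar with the pattern, a minimal CRTP sketch (names hypothetical): the base class template knows the derived type at compile time, so shared plumbing can be written once while calls still resolve statically and the derived class still satisfies a template concept.

```cpp
#include <Eigen/Dense>
#include <cstddef>

// CRTP base: shared plumbing, dispatching to Derived without virtual calls.
template <typename Derived>
class model_crtp {
 public:
  // Convenience wrapper written once for every model type.
  double log_prob_grad(const Eigen::VectorXd& x, Eigen::VectorXd& grad) const {
    return derived().log_prob(x, grad);
  }

 private:
  const Derived& derived() const { return static_cast<const Derived&>(*this); }
};

// A concrete model only supplies the pieces the base forwards to.
class toy_model : public model_crtp<toy_model> {
 public:
  std::size_t num_params_r() const { return 2; }
  double log_prob(const Eigen::VectorXd& x, Eigen::VectorXd& grad) const {
    grad = -x;                     // gradient of a standard normal log density
    return -0.5 * x.squaredNorm(); // log density up to a constant
  }
};
```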

What would be nice from my perspective would be if there was
a lower-level abstraction class that only exposed size and
log density. I think that's where a lot of algorithms want to work.
The details of loading data and transforming variables are not
of concern most of the time. But then you often need them for
user interfaces as users want to work on the conventional constrained
scale for both writing models and dealing with I/O.

- Bob

John Jumper

May 3, 2017, 5:23:08 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
My plan was essentially to provide your ideal interface and have ModelHolder as an adapter to the current interface that contains the ideal LogDensity object as well as some callback objects.  It is certainly not what you want for Stan 3, but it might give you a halfway house on the refactoring.  As an aside, you likely also want the notions of propto and Jacobian adjustment in your ideal LogDensity interface, but they can just be regular boolean parameters.

I realize that I may have underestimated the difficulty of the type parameter T. I missed that it can be instantiated as an autodiff type.  It looks like it is normally instantiated as a std::vector or Eigen column vector whose scalar type is stan::math::var (or maybe double).  Anyway, I could probably work around that in the ModelHolder, since I don't plan to use the autodiff types in my LogDensity interface.

I should say that I have job interviews coming up, so I may not get time on this anyway for a while.

