That'd be one way to do it. Many of those
features depend on Stan's particular model structure.
> On May 1, 2017, at 12:30 AM, John Jumper <john.m...@gmail.com> wrote:
>
> To be fair, I am not planning to use NUTS on the neural network parameters themselves. I have a trained variational autoencoder (a la Kingma and Welling
> https://arxiv.org/abs/1312.6114), using PyTorch to handle the deep learning model. The parameters on which I will do HMC are the latent dimensions of the autoencoder. I expect this space to be clean enough by construction of the autoencoder; otherwise the Gaussian variational posterior would not achieve a good entropy bound. In the worst case, I might need parallel tempering (which I assume Stan doesn't have).
Nope, no parallel tempering.
> It would be really nice if Stan's MCMC engine were split out into a proper library that could be used outside of Stan. Please correct me if I am wrong, but is it as simple as writing a virtual base class that implements the Model concept? I will probably do that anyway for my project, to avoid the hassle of runtime recompilation of the optimizer.
That'd be the least invasive way to do it. There's a lot of functionality in
those methods that depends on our constraining and unconstraining
transforms and on the blocks in a Stan program. It also probably
assumes you have a templated log_prob function that gets autodiffed to
form a gradient. It used to have a log_prob_grad function that rolled
the two together.
One of the things we need to do for Stan 3 is rethink that model
class. So we'll probably be doing this ourselves over the next year
or two.
A better approach would probably be to more cleanly abstract the
algorithms. That's never been a huge concern for us because we only
have one language. Implementing NUTS is relatively easy in terms of the
amount of code, though it's subtle and there are a lot of pitfalls.
It's all the derivatives and tie-ins to the language
that are hard.
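To give a sense of how little code the core is: at the heart of HMC and NUTS
is just the leapfrog integrator. Here's a minimal sketch (hypothetical code,
not Stan's implementation) where grad_fn fills in the gradient of the log
density at theta:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// One leapfrog step of size eps for Hamiltonian dynamics on a log
// density: half step on momentum, full step on position, half step
// on momentum again. grad_fn(theta, grad) writes the gradient of the
// log density at theta into grad.
void leapfrog(std::vector<double>& theta, std::vector<double>& momentum,
              double eps,
              const std::function<void(const std::vector<double>&,
                                       std::vector<double>&)>& grad_fn) {
  std::vector<double> grad(theta.size());
  grad_fn(theta, grad);
  for (std::size_t i = 0; i < theta.size(); ++i)
    momentum[i] += 0.5 * eps * grad[i];   // half step for momentum
  for (std::size_t i = 0; i < theta.size(); ++i)
    theta[i] += eps * momentum[i];        // full step for position
  grad_fn(theta, grad);
  for (std::size_t i = 0; i < theta.size(); ++i)
    momentum[i] += 0.5 * eps * grad[i];   // second half step for momentum
}
```

The pitfalls are all in what goes around this: the tree building and
multinomial/slice sampling in NUTS, step-size and mass-matrix adaptation,
and handling divergences.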
If you can figure out how to refactor the MCMC lib into a standalone library
that doesn't depend on our model concept and that we can use elsewhere,
that'd be great. When Alp and Dustin built ADVI, they wound up writing their own
optimizer because the L-BFGS built into Stan is also tied up with our model class.
So it'd be useful to us internally, too.
- Bob