Evaluating normalizing constant in Stan


Hamed Nikbakht

Apr 12, 2017, 11:58:14 AM
to Stan users mailing list
Dear all,

Can we evaluate the normalizing constant with NUTS? (Maybe using Adiabatic Monte Carlo?)

I'm asking that because in my problem, the probability of failure is exactly the normalizing constant.

Thank you very much for your help in advance.

Regards,
Hamed

Bob Carpenter

Apr 13, 2017, 2:30:36 PM
to stan-...@googlegroups.com
Not directly. NUTS just does MCMC using an unnormalized density.

If the probability of failure is an event probability, then
you should be able to calculate that conditioned on data.
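To sketch what this looks like in practice (a toy example with illustrative names, assuming you already have posterior and posterior-predictive draws from a fit), the event probability is just the Monte Carlo average of an indicator over the draws:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: posterior draws of a parameter theta and
# posterior-predictive draws of an outcome y_rep given theta.
theta_draws = rng.normal(loc=1.0, scale=0.5, size=10_000)
y_rep = rng.normal(loc=theta_draws, scale=1.0)

# P(failure | data), where "failure" is the event y_rep < 0, is
# estimated by the fraction of draws in which the event occurs.
p_failure = np.mean(y_rep < 0.0)
```

In Stan itself the indicator would typically be computed in the generated quantities block, with its posterior mean giving the event probability.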

- Bob

Hamed Nikbakht

Apr 14, 2017, 1:58:12 PM
to Stan users mailing list
Thank you very much.

But do you have any plans to add it (Adiabatic Monte Carlo) to Stan in the near future?

I'm asking based on Dr. Gelman's post on his weblog in May 2014:

"This is what Betancourt’s thermodynamic Monte Carlo algorithm does. I don’t follow the details of the algorithm (I’m waiting till it appears in Stan) but the basic idea makes a lot of sense to me, as it follows the Folk Theorem and the Pinocchio principle."

I highly appreciate your consideration.
Hamed

Bob Carpenter

Apr 14, 2017, 2:44:34 PM
to stan-...@googlegroups.com
I'm sure we'll get there eventually, but we need
to get the higher-order autodiff solidified first.
So not until late this year or next year, probably.

- Bob

Henrik Singmann

Apr 14, 2017, 5:01:53 PM
to Stan users mailing list
I am not sure if this is exactly what you need, but in the current development version of the bridgesampling package you can pass a fitted rstan object (i.e., one which contains posterior samples) directly to the bridge sampler to obtain the marginal likelihood. It is not Adiabatic MC, but the bridge sampler appears solid at getting the job done.

You currently have to use the stan_interface branch though: https://github.com/quentingronau/bridgesampling/tree/stan_interface
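Not the bridgesampling package itself, but a minimal numerical sketch of the identity it exploits (using the simple geometric bridge function rather than the iteratively optimal one, on a toy 1-D Gaussian "posterior" with a known normalizing constant; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Unnormalized "posterior": q1(x) = exp(-(x - 0.5)^2 / (2 * 0.8^2)).
# Its true normalizing constant is 0.8 * sqrt(2 * pi) ~= 2.0053.
def log_q1(x):
    return -((x - 0.5) ** 2) / (2 * 0.8**2)

# Normalized proposal density: standard normal.
def log_p2(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

# "Posterior samples" (drawn exactly here; in practice these come from MCMC)
x1 = rng.normal(0.5, 0.8, size=n)
# Proposal samples
x2 = rng.normal(0.0, 1.0, size=n)

# Bridge estimator with the geometric bridge h = 1 / sqrt(q1 * p2):
#   Z ~= E_p2[sqrt(q1 / p2)] / E_q1[sqrt(p2 / q1)]
num = np.mean(np.exp(0.5 * (log_q1(x2) - log_p2(x2))))
den = np.mean(np.exp(0.5 * (log_p2(x1) - log_q1(x1))))
z_hat = num / den
```

The package handles the hard parts this sketch skips: choosing a good proposal from the posterior draws and iterating to the optimal bridge function.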

Andrew Gelman

Apr 14, 2017, 5:45:08 PM
to stan-...@googlegroups.com
Wow, that looks great!
A

Henrik Singmann

Apr 14, 2017, 6:11:17 PM
to Stan users mailing list, gel...@stat.columbia.edu
Thanks a lot Andrew. That is very encouraging feedback.

Quentin (who is the main author of the bridge sampler code and the vignettes) is a PhD student of E.J. Wagenmakers in Amsterdam. And we (i.e., Quentin, E.J., and I) are definitely planning to keep you guys in the loop about the progress of the package. Right now we are still a little in the testing stage (but there do not appear to be any more major bugs).

As I already mentioned in a previous mail on the list, the main issues for us right now with the stan interface are related to the current rstan version. Our wishlist currently is:
- A method that recompiles the C++ code of a model from the fitted stanfit object. Currently we cannot call rstan::log_prob() or rstan::unconstrain_pars() if the model was fitted in a different R session. Given that the stanfit object contains everything needed for compiling (minus the original data object), that seems in principle solvable.
- Ability to pass more than one set of samples per call to rstan::unconstrain_pars().
But we are obviously happy to wait until rstan3, as we found workarounds for both problems (in the first case the user needs to compile the model again, just without samples, and pass both objects).

Best,
Henrik

Hamed Nikbakht

Apr 15, 2017, 7:52:09 PM
to Stan users mailing list
Thank you very much for the pointer.

Regards,
Hamed

Hamed Nikbakht

Apr 15, 2017, 7:53:19 PM
to Stan users mailing list
Thank you very much for letting me know about the bridge sampling package.

Sounds pretty good!

Regards
Hamed

Matt Graham

Apr 16, 2017, 8:16:07 PM
to stan-...@googlegroups.com
Apologies for the self-advertising, but (I think) it is relevant:

We just put up a pre-print, 'Continuously tempered Hamiltonian Monte Carlo', on arXiv, which presents a method for calculating normalising constants within HMC (or NUTS) frameworks such as Stan or PyMC3. It is related to Adiabatic Monte Carlo, but uses a different Hamiltonian formulation which makes it simpler to implement (it just requires integrating a standard Hamiltonian dynamic on a slightly extended state space), with the trade-off of less coherent movement through the inverse temperature dimension than Adiabatic Monte Carlo.

Technically it can already be applied using Stan, as it just requires defining an augmented target log density and then running NUTS (or static HMC) in that augmented space. For example, Stan code for the experiments of a previous workshop version of the paper is available in Appendix G here
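The augmented target can be sketched in a few lines (a toy, language-agnostic illustration with made-up densities, not the paper's actual code): the log density geometrically interpolates between a tractable base density at inverse temperature beta = 0 and the unnormalized target at beta = 1.

```python
import numpy as np

def log_p_tilde(x):
    # Unnormalized target: a Gaussian with mean 2, sd 0.5 (toy example).
    return -((x - 2.0) ** 2) / (2 * 0.5**2)

def log_q(x):
    # Normalized base density: standard normal (in practice, e.g. an ADVI fit).
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_augmented(x, beta):
    # Geometric interpolation between base (beta = 0) and target (beta = 1);
    # HMC/NUTS is then run jointly on (x, beta), with beta mapped to an
    # unconstrained variable.
    return beta * log_p_tilde(x) + (1.0 - beta) * log_q(x)
```

In Stan this amounts to declaring the (transformed) inverse temperature as an extra parameter and writing the interpolated log density in the model block.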

A key factor in the performance on more complex models, however, is how 'close' the base density sampled from at zero inverse temperature is to the target density. In the paper we suggest using a variational method such as ADVI to fit a parametric approximation to the target density (e.g. Gaussian) and then using this as the base density. Although the ADVI implementation in Stan can be used to fit such an approximate density, from my explorations of the PyStan source / documentation it does not currently seem possible to extract the mean / covariance parameters of the fitted ADVI Gaussian approximation directly, only to draw samples from the approximation (if there is a way to do this in PyStan or one of the other interfaces, it would be great to find out how). Although the samples can be used to estimate the mean / covariance indirectly, this is a bit clunky and slow. As an alternative, the base density can be fitted externally and its parameters fed in as 'data' to an extended Stan model (this was the approach used in the aforementioned Stan experiments in the workshop paper).

If using Python is an option (and if it's not too heretical to mention alternative packages on this mailing list!), the current ADVI implementation in PyMC3 does allow directly accessing the parameters of the fitted ADVI density and, together with the NUTS implementation in PyMC3, can be used to run the method in the paper and estimate normalising constants. A Jupyter notebook giving a PyMC3 implementation performing marginal likelihood estimation for the Bayesian hierarchical regression problem discussed in section 7.3 of the paper is available here (the PyMC3 model there is an adaptation of the example Stan model here).

Regards,
Matt

Bob Carpenter

Apr 17, 2017, 2:29:36 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu

> On Apr 14, 2017, at 6:11 PM, Henrik Singmann <sing...@gmail.com> wrote:
>
> Thanks a lot Andrew. That is very encouraging feedback.
>
> Quentin (who is the main author of the bridge sampler code and the vignettes) is a PhD student of E.J. Wagenmakers in Amsterdam. And we (i.e., Quentin, E.,J., and I) are definitely planning to keep you guys in the loop about the progress of the package. Right now we are still a little in the testing stage (but there do not appear to be any more major bugs).
>
> As I already mentioned in a previous mail on the list, the main issue for us right now for the stan interface are related to the current rstan version. Our wishlist currently is:
> - A method that recompiles the C++ code of a model from the fitted stanfit object. Currently we cannot call and rstan::log_prob() or rstan::unconstrain_pars() if the model was fitted in a different R session. Given that the stanfit object contains everything needed for compiling (minus the original data object), that seems in principle solvable.

I've been lobbying for exactly the opposite approach. If you have
your model code somewhere (ideally in a standalone file), you don't
need to reconstitute a fit object.

But I'm not a heavy or native R user, so take this with a grain of
salt. I think they're going to keep the
god-object approach that's so common in R (e.g., return object for glm()):

https://en.wikipedia.org/wiki/God_object

> - Ability to pass more than one set of samples per call to rstan::unconstrain_pars().
> But we are obviously happy to wait until rstan3 as we found workarounds for both problems (in the first case the user needs to compile the model again just without samples and pass both).

This is just a simple mapper function (something that applies a function to
a sequence)? You provide a sequence of parameter values and get a sequence of
unconstrained values?

- Bob

Henrik Singmann

Apr 18, 2017, 3:21:22 AM
to Stan users mailing list, gel...@stat.columbia.edu
Hi Bob,

To respond to your points.

Am Montag, 17. April 2017 20:29:36 UTC+2 schrieb Bob Carpenter:
[...]

> - A method that recompiles the C++ code of a model from the fitted stanfit object. Currently we cannot call and rstan::log_prob() or rstan::unconstrain_pars() if the model was fitted in a different R session. Given that the stanfit object contains everything needed for compiling (minus the original data object), that seems in principle solvable.

> I've been lobbying for exactly the opposite approach. If you have
> your model code somewhere (ideally in a standalone file), you don't
> need to reconstitute a fit object.
>
> But I'm not a heavy or native R user, so take this with a grain of
> salt. I think they're going to keep the
> god-object approach that's so common in R (e.g., return object for glm()):
>
> https://en.wikipedia.org/wiki/God_object

I can see that for stanfit objects, which can be of considerable size, it might make sense to restrict them to the necessary bare bones (such as just the posterior samples and not also part of the model). But I also believe that for us it does not really matter in the end. We just need a reliable method to call log_prob() whether the samples were obtained in the current R session or a previous one.  
 

> - Ability to pass more than one set of samples per call to rstan::unconstrain_pars().
> But we are obviously happy to wait until rstan3 as we found workarounds for both problems (in the first case the user needs to compile the model again just without samples and pass both).

> This is just a simple mapper function (something that applies a function to
> a sequence)? You provide a sequence of parameter values and get a sequence of
> unconstrained values?


True, just a simple wrapper and only a question of interface. Currently this is just not optimal, as there are too many steps: we first need to extract the samples, then transform each posterior sample individually into a list which can be passed to the unconstrain_pars() function, and then combine the results again. It would be optimal if this function simply accepted the extracted samples without intermediary steps. But this is obviously just cosmetics.
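The mapper pattern Bob describes can be sketched generically (illustrative names, not rstan's API; as a stand-in per-draw transform I use the log map that Stan really applies to a positive-constrained parameter):

```python
import numpy as np

def unconstrain_one(draw):
    # One draw (mu, sigma): mu is unconstrained; sigma > 0 maps to log(sigma),
    # mirroring Stan's transform for lower-bounded parameters.
    mu, sigma = draw
    return np.array([mu, np.log(sigma)])

def unconstrain_all(draws):
    # draws: (n_draws, n_params) array of constrained draws.
    # Apply the per-draw transform across all rows and restack the results,
    # hiding the extract / transform-each / recombine steps behind one call.
    return np.stack([unconstrain_one(d) for d in draws])

draws = np.array([[0.1, 1.0], [-0.3, 2.0]])
upars = unconstrain_all(draws)
```

The whole wishlist item amounts to rstan exposing something like `unconstrain_all` so users don't have to write the loop themselves.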

Thanks again,
Henrik 

Bob Carpenter

Apr 19, 2017, 3:44:36 PM
to stan-...@googlegroups.com

> On Apr 16, 2017, at 8:16 PM, Matt Graham <matthew....@gmail.com> wrote:
>
> Apologies this is fairly shameless self-advertising but as (I think) it is relevant:

Citations to relevant papers are always welcome, self-advertising or otherwise.

...
> If using Python is an option (and if its not too heretical to mention alternative packages on this mailing list!),

Not too heretical. We don't want to start getting queries from people
wanting help with packages we don't maintain, but discussion's fine.

...
> the current ADVI implementation in PyMC3 does allow directly accessing the parameters of the fitted ADVI density

I'm pretty sure you can get the fitted ADVI parameters out
of RStan and CmdStan. I think the PyStan ADVI interface
was awkward because the underlying ADVI code wasn't originally
integrated the same way as other algorithms. Allen just
implemented a stopgap measure until the Stan 3 refactors land,
which is happening now.

How are you evaluating the stability of ADVI? We're having
massive problems with convergence across multiple model types.
So any tips would be appreciated.

- Bob

Bob Carpenter

Apr 19, 2017, 3:49:36 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu

> On Apr 18, 2017, at 3:21 AM, Henrik Singmann <sing...@gmail.com> wrote:
>
> H Bob,
>
> To respond to your points.
>
> Am Montag, 17. April 2017 20:29:36 UTC+2 schrieb Bob Carpenter:
> [...]
> > - A method that recompiles the C++ code of a model from the fitted stanfit object. Currently we cannot call and rstan::log_prob() or rstan::unconstrain_pars() if the model was fitted in a different R session. Given that the stanfit object contains everything needed for compiling (minus the original data object), that seems in principle solvable.
>
> I've been lobbying for exactly the opposite approach. If you have
> your model code somewhere (ideally in a standalone file), you don't
> need to reconsistitute a fit object.
>
> But I'm not a heavy or native R user, so take this with a grain of
> salt. I think they're going to keep the
> god-object approach that's so common in R (e.g., return object for glm()):
>
> https://en.wikipedia.org/wiki/God_object
>
> I can see that for stanfit objects, which can be of considerable size, it might make sense to restrict them to the necessary bare bones (such as just the posterior samples and not also part of the model). But I also believe that for us it does not really matter in the end. We just need a reliable method to call log_prob() whether the samples were obtained in the current R session or a previous one.

I think that's fine if you're on the same hardware platform.
When you change hardware, R versions, C++ libs, etc., you run
into incompatibilities.

The bit you need is the compiled model, which is specifically what
you're having trouble reconstituting. No way to get the log densities,
gradients, or transforms out without it (and with data loaded).


>
>
>
> > - Ability to pass more than one set of samples per call to rstan::unconstrain_pars().
> > But we are obviously happy to wait until rstan3 as we found workarounds for both problems (in the first case the user needs to compile the model again just without samples and pass both).
>
> This is just a simple mapper function (something that applies a function to
> a sequence)? You provide a sequence of parameter values and get a sequence of
> unconstrained values?
>
>
> True, just a simple wrapper and only a question of interface. Currently this is just not optimal as there are too many steps: We first need to extract the samples, then transform each posterior sample individually into a list which can then be passed to the unconstrain_par() function, and then combine the resutls again.. It would be optimal if this function simply accepts the extracted samples without intermediary steps. But this is obviously just cosmetics.

My own philosophy here is to provide minimal tools that can then be
combined in useful ways (you can tell I "borrowed" my philosophy from unix).
Again, that doesn't seem like the R way of doing things.

But why not just write the function yourself to do these things?

The problem with dumping things in packages is that then we have to anticipate
all the uses, doc them, test them, and then answer questions about them on
our lists. If we provide only simple primitives, they're much easier to
document, test, and understand for users. I think the R philosophy is that
users are better at poking around in the doc of big packages than writing glue
functions (exactly the opposite of my own experience with R, but then I'm not
a native R coder).

- Bob

Henrik Singmann

Apr 19, 2017, 7:18:32 PM
to Stan users mailing list, gel...@stat.columbia.edu


Am Mittwoch, 19. April 2017 21:49:36 UTC+2 schrieb Bob Carpenter:

> On Apr 18, 2017, at 3:21 AM, Henrik Singmann <sing...@gmail.com> wrote:
>
> H Bob,
>
> To respond to your points.
>
> Am Montag, 17. April 2017 20:29:36 UTC+2 schrieb Bob Carpenter:
> [...]
> > - A method that recompiles the C++ code of a model from the fitted stanfit object. Currently we cannot call and rstan::log_prob() or rstan::unconstrain_pars() if the model was fitted in a different R session. Given that the stanfit object contains everything needed for compiling (minus the original data object), that seems in principle solvable.
>
> I've been lobbying for exactly the opposite approach.  If you have
> your model code somewhere (ideally in a standalone file), you don't
> need to reconsistitute a fit object.
>
> But I'm not a heavy or native R user, so take this with a grain of
> salt.  I think they're going to keep the
> god-object approach that's so common in R (e.g., return object for glm()):
>
> https://en.wikipedia.org/wiki/God_object
>
> I can see that for stanfit objects, which can be of considerable size, it might make sense to restrict them to the necessary bare bones (such as just the posterior samples and not also part of the model). But I also believe that for us it does not really matter in the end. We just need a reliable method to call log_prob() whether the samples were obtained in the current R session or a previous one.

> I think that's fine if you're on the same hardware platform.
> When you change hardware, R versions, C++ libs, etc., you run
> into incompatibilities.
>
> The bit you need is the compiled model, which is specifically what
> you're having trouble reconstituting. No way to get the log densities,
> gradients, or transforms out without it (and with data loaded).


Yes, of course. Sorry for being unclear here. What I meant is that I would be very happy with a log_prob function that always needs to receive model code and data in addition to the desired parameter set, rather than only a god object that might or might not fail. A function that requires the user to specify more information, but works more reliably, is what we need. I hope that makes it clearer that I agree with you here.

Matt Graham

Apr 22, 2017, 11:35:25 AM
to Stan users mailing list

Thanks for your reply Bob.

> I'm pretty sure you can get the fitted ADVI parameters out of RStan and CmdStan.

I couldn't see any reference for this in the documentation for RStan / CmdStan - is this something that hasn't been documented yet? The closest I can see is the get_posterior_mean method of the stanfit object returned by the RStan vb function; however, from the description it seems this may be estimated from samples, or may simply not be applicable to a vb stanfit output.

> How are you evaluating the stability of ADVI? We're having massive problems with convergence across multiple model types. So any tips would be appreciated.

We get an indication of how well ADVI fits the posterior from the marginal density of the inverse temperature parameter when sampling in the extended space. If ADVI converges to a good fit, so that the KL from the base density (the ADVI fit) to the target density is low, then the marginal density of the inverse temperature should be relatively flat (in particular, the log ratio of the densities at inverse temperatures one and zero is equal in expectation to the KL divergence from the ADVI base density to the target density). If the inverse temperature samples are all close to one, this suggests there is a large remaining KL between the variational approximation and the target, though it can also be due to poor convergence of the MCMC chain in the extended space, so it is only partially useful as a diagnostic. Also, even if ADVI has converged, the optimum it converged to may still have a large KL from the target if the family used for the variational approximation is a poor fit to the true target; this is more a diagnosis of the goodness of the variational fit than of stability / convergence. In our PyMC3 experiments, we also used a long annealed importance sampling run as a proxy ground truth to evaluate both the ADVI fit and the estimate of the marginal likelihood from our method.
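The expectation Matt mentions can be sanity-checked on a toy case (illustrative densities, not the method itself): the KL from a base density q to a target p is the expectation under q of the log density ratio, and for two unit-variance Gaussians KL(N(m1, 1) || N(m2, 1)) = (m1 - m2)^2 / 2 in closed form.

```python
import numpy as np

rng = np.random.default_rng(2)

# Base q = N(0, 1) (e.g. a variational fit), target p = N(1, 1);
# closed-form KL(q || p) = (0 - 1)^2 / 2 = 0.5.
def log_q(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_p(x):
    return -0.5 * (x - 1.0) ** 2 - 0.5 * np.log(2 * np.pi)

x = rng.normal(0.0, 1.0, size=200_000)   # samples from the base density
kl_mc = np.mean(log_q(x) - log_p(x))     # Monte Carlo estimate of KL(q || p)
```

A large estimated KL is exactly the situation where the inverse temperature marginal would pile up near one rather than staying flat.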

Bob Carpenter

Apr 23, 2017, 1:06:38 AM
to stan-...@googlegroups.com
I also don't see any doc for the return other than as a
stanfit object. I think one of the iterations may be the
final fit, but I don't see that indicated anywhere.

And thanks for the report on evaluating ADVI.

- Bob