paper on black-box stochastic variational inference


a...@ariddell.org

Jul 5, 2016, 2:56:37 PM
to stan...@googlegroups.com
Just spotted this. Cites/discusses Stan.

Black-Box Stochastic Variational Inference
in Five Lines of Python

http://www.cs.toronto.edu/~duvenaud/papers/blackbox.pdf

Several large software engineering projects have been undertaken to support
black-box inference methods. In contrast, we emphasize how easy it is to
construct scalable and easy-to-use automatic inference methods using only
automatic differentiation. We present a small function which computes stochastic
gradients of the evidence lower bound for any differentiable posterior. As an
example, we perform stochastic variational inference in a deep Bayesian neural network.
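
For the curious, the core trick looks roughly like this. This is my own sketch, not the paper's exact code, using autograd's grad and the reparameterization trick for a diagonal-Gaussian variational family (make_elbo_grad and its arguments are names I made up):

    import autograd.numpy as np
    from autograd import grad

    def make_elbo_grad(logprob, D, num_samples=10, rs=np.random.RandomState(0)):
        # logprob(z) must return the log joint density at each row of z and
        # must be written with autograd.numpy so autograd can trace it.
        def elbo(params):
            mean, log_std = params[:D], params[D:]
            # Reparameterization trick: z = mean + std * eps, eps ~ N(0, I).
            samples = rs.randn(num_samples, D) * np.exp(log_std) + mean
            # Closed-form entropy of the diagonal Gaussian plus a Monte Carlo
            # estimate of E_q[log p(z)].
            entropy = 0.5 * D * (1.0 + np.log(2 * np.pi)) + np.sum(log_std)
            return entropy + np.mean(logprob(samples))
        return grad(elbo)  # fresh noise on each call, so gradients are stochastic

Feed that gradient into any stochastic optimizer and you have black-box variational inference.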

Andrew Gelman

Jul 5, 2016, 4:46:59 PM
to stan...@googlegroups.com
They write, "Theano (Bastien et al., 2012), Torch (Collobert et al., 2002), and Stan Stan (2015) require learning a new syntax in which to express basic operations, essentially acting as interpreters for a restricted mini-language."
I don't know Theano, but is Stan really a restricted mini-language? I thought it was just C++?

They write, "Autograd can handle Python code containing control flow primitives such as for loops, while loops, recursion, if statements, closures, classes, list indexing, dictionary indexing, arrays, array slicing and broadcasting."
But Stan can handle all this too, no?
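
To make that concrete, here's the kind of thing Autograd handles; this toy example is mine, not from the paper:

    from autograd import grad

    def taylor_exp(x, terms=20):
        # Ordinary Python control flow; autograd traces the loop as it runs.
        total, term = 0.0, 1.0
        for n in range(1, terms + 1):
            total = total + term
            term = term * x / n
        return total

    print(grad(taylor_exp)(1.0))  # ~2.71828, since d/dx exp(x) = exp(x)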

Is it just that they want to work in Python, not in C++?

This does remind me, though, that I'd like to see the Stan implementation of the neural network for the digit recognition problem. I think Daniel's writing that one up?
A


Michael Betancourt

Jul 5, 2016, 4:59:55 PM
to stan...@googlegroups.com
Yes, it’s the same argument PyMC makes — building the model in
Python would be easier than in a domain-specific language like the
Stan Modeling Language. The problem, of course, is that you have
to deal with the failures that arise when Autograd reaches a function
through which it can't autodiff (or can't autodiff accurately). In a very
real sense it offers a false sense of security.
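
To illustrate with a toy example of my own (the exact failure depends on the function): call an unwrapped SciPy routine inside the model and the trace breaks:

    import scipy.special
    from autograd import grad

    def logpost(x):
        # scipy.special.gammaln is not wrapped by autograd, so it can't be
        # traced; autograd ships autograd.scipy.special.gammaln for exactly
        # this reason.
        return scipy.special.gammaln(x) + x ** 2

    grad(logpost)(3.0)  # typically dies at trace time with a TypeError

At least that case fails loudly; the quieter danger is a function that traces fine but gives inaccurate gradients.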

Andrew Gelman

Jul 5, 2016, 5:04:44 PM
to stan...@googlegroups.com
In practice, though, their program is very similar to the Stan math library, no? The tradeoffs are:

1. Stan is easier if you know statistics, or if you know C++.
2. Stan is faster.
3. Stan has lots of preprogrammed probability distributions, math functions, ODE solvers, etc.
4. Autograd is better for certain sorts of hacking.
5. Autograd is easier if you know Python.

So it depends on the importance of items 4 and 5 for the user, compared to items 1, 2, and 3. Does that sound fair? I feel the paper is somewhat misleading in that it presents Autograd as qualitatively different from Stan and these other programs, when actually they are all different versions of the same thing, just written in different languages and with different features programmed in. Would that be a fair description?

A

Michael Betancourt

Jul 5, 2016, 5:22:09 PM
to stan...@googlegroups.com
Yes and yes.

Bob Carpenter

Jul 5, 2016, 8:52:19 PM
to stan...@googlegroups.com
Stan and Stan Math work in plain old C++. But most of
our users use our domain-specific language (DSL), which
is C-like, but not C.

I'm pretty sure PyMC is different in that it has you
build a graphical model using predefined components, not
try to autodiff an arbitrary Python program. I don't know
how easy it is to add components. But it's definitely an
API you need to learn. It's not magic.

Theano, which PyMC uses, is fundamentally different
in that it does symbolic differentiation that generates
code, rather than the template-overloading autodiff that Stan
uses. So they may be targeting Theano when they talk
about branching, etc., which I don't think Theano can do.
(I hate saying all this on public forums without being
sure, hence the hedging.)
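
For anyone who hasn't seen the operator-overloading style, here's a deliberately tiny Python caricature of what Stan Math does with C++ templates (all names are mine): the "graph" is just a tape recorded while ordinary code runs, which is why loops and branches come for free:

    tape = []

    class Var:
        def __init__(self, value, parents=()):
            self.value, self.parents, self.adj = value, parents, 0.0
            tape.append(self)  # creation order is a valid reverse-sweep order

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

    def backward(output):
        output.adj = 1.0
        for node in reversed(tape):  # reverse-mode sweep over the tape
            for parent, local_grad in node.parents:
                parent.adj += local_grad * node.adj

    x = Var(3.0)
    y = x * x + x  # recorded on the tape as ordinary Python executes
    backward(y)
    print(x.adj)   # dy/dx = 2*3 + 1 = 7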

Also, anything can be written in 1 line of code with
enough back-end scripting. Like when you call ADVI
from RStan :-)

I don't know how Autograd compares to Stan in speed.

And I also agree we need to ping Daniel---wasn't he going
to write that neural net? If he doesn't have time, I can
do it.

- Bob

Dustin Tran

Jul 5, 2016, 9:12:34 PM
to stan...@googlegroups.com
We discussed this when it first showed up, back around last December. I recall we specifically corrected some of the incorrect statements and told David (one of the authors). I don't think he got around to making the changes, though.

In general, yes, I think the distinction is really symbolic vs. reverse-mode autodiff, and if I recall correctly that's what they mistook Stan Math for.

Dustin