Autodiff and Laplace approx from ADMB (arXiv paper link)

Bob Carpenter

Feb 5, 2016, 11:45:03 AM2/5/16
to stan...@googlegroups.com
An arXiv paper from the ADMB folks (they just invited me to
go to their next meeting in June in Seattle and sent me this
link) on how they're doing the Laplace approximation in TMB (an R
package that wraps ADMB and apparently adds more functionality):

http://arxiv.org/pdf/1509.00660v1.pdf

I haven't read it yet, but it's next on my queue.

- Bob

Michael Betancourt

Feb 5, 2016, 11:53:21 AM2/5/16
to stan...@googlegroups.com
This is basically an MML (maximum marginal likelihood) scheme.

Bob Carpenter

Feb 5, 2016, 12:58:03 PM2/5/16
to stan...@googlegroups.com
Exactly why I thought Andrew, Dustin, Alp, etc. would
be interested :-)

- Bob

Michael Betancourt

Feb 5, 2016, 1:39:50 PM2/5/16
to stan...@googlegroups.com
I told everyone the calculations that would be involved a long time ago. :-p

Bob Carpenter

Feb 5, 2016, 3:25:07 PM2/5/16
to stan...@googlegroups.com
I couldn't follow their discussion of automatic
sparsity detection and how it leads into equation (8),
so maybe we could talk about that when we're face to
face at some point. Or I can spend more time squinting
at it.

And when they say "order 0 forward," do you think they're
just building up the expression graph without any evaluations?
I found all the descriptions pretty confusing.

And where do you need the gradient of the directional
derivative they talk about in (6)?
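
(For the record, my guess at the identity behind (6), not something
they spell out: the gradient of a directional derivative is a
Hessian-vector product,

    \nabla_x ( v^T \nabla f(x) ) = \nabla^2 f(x) \, v,

which autodiff gets from a reverse sweep over a first-order forward
sweep, so presumably they use it to pull out columns of the Hessian
for the Laplace term without ever forming the whole thing.)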

You can see that they're taping once and reusing the tape, which
limits what their C++ code can express, but is much faster for
them because CppAD's taping is relatively slow.
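
Here's the pattern I mean, as a toy CppAD sketch (made-up function
and values, and this is just my reading of CppAD's API, not their
code). Note that an order-0 forward sweep here does evaluate --- it
replays the recorded tape with new values rather than just building
the graph:

    #include <cppad/cppad.hpp>
    #include <vector>

    int main() {
      using CppAD::AD;

      // Tape once: record the operation sequence on AD<double>.
      std::vector<AD<double> > ax(2);
      ax[0] = 1.0;  ax[1] = 2.0;
      CppAD::Independent(ax);
      std::vector<AD<double> > ay(1);
      ay[0] = ax[0] * ax[1] + sin(ax[0]);
      CppAD::ADFun<double> f(ax, ay);  // stops taping

      // Reuse the tape: order-0 forward replays the recorded
      // operations at a new point --- no retaping.
      std::vector<double> x(2);
      x[0] = 0.5;  x[1] = 3.0;
      std::vector<double> y = f.Forward(0, x);

      // First-order reverse sweep over the same tape: the gradient.
      std::vector<double> w(1, 1.0);
      std::vector<double> grad = f.Reverse(1, w);
      return 0;
    }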

Their "cheap gradient principle" is misleading. It may take
only a few more arithmetic operators, but there's also memory
locality and essentially interpreter overhead.
a factor of 4 slowdown --- it totally depends on the kinds
of operations going on. We measured 32* and 16* slowdown
on sums and products for CppAD (though that included taping)
and about a 4* slowdown for pow().
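
Just to pin down what I'm objecting to: the principle as usually
stated (this is Griewank & Walther's bound, from memory) counts
arithmetic only,

    \mathrm{cost}(\nabla f) \le c \cdot \mathrm{cost}(f),
    \quad c \approx 4,

and says nothing about memory traffic or tape-interpreter overhead,
which is exactly where those 16x and 32x factors come from.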

And we really need to work on parallelization for matrix
ops! Maybe we can talk about it in the next big-data thingy
we apply for.

- Bob

Dustin Tran

Feb 5, 2016, 7:09:24 PM2/5/16
to Andrew Gelman, Alp Kucukelbir, stan...@googlegroups.com
I read the paper. The main innovation seems to be the autodiff
implementation, and the MML is an added bonus. It's faster and more
general than INLA, although it's simpler as an approximation. In
general it's very traditional; their angle is that they can automate
Laplace approximations, in the sense that they use autodiff to get
the gradients needed to maximize an objective. The objective function
is the Laplace approximation of the marginal likelihood.
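
Concretely, if I have the standard form right (notation mine, not
theirs): with f(u, \theta) the joint negative log-density and u the
n random effects, they maximize the Laplace approximation

    L^*(\theta) = (2\pi)^{n/2} \det(H(\theta))^{-1/2}
                  \exp(-f(\hat{u}(\theta), \theta)),

where \hat{u}(\theta) = \arg\min_u f(u, \theta) and H(\theta) is the
Hessian of f in u at \hat{u}(\theta), with autodiff supplying the
gradient of -\log L^*(\theta) for the outer optimizer.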

GMO is more accurate from an estimator point of view: it only uses
an approximation (e.g., Laplace or ADVI) as a proposal distribution
for importance sampling. GMO is also faster (at least in principle):
they use the very standard L-BFGS, whereas we use stochastic
gradients.
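
(Roughly, for the proposal step --- notation mine, with q the
Laplace or ADVI fit: draw u_s ~ q(u | \theta) and form

    \hat{L}(\theta) = \frac{1}{S} \sum_{s=1}^{S}
                      \frac{p(y, u_s | \theta)}{q(u_s | \theta)},

an importance-sampling estimate of the marginal likelihood, so the
approximation only has to be a decent proposal, not exact.)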

Dustin
> On Feb 5, 2016, at 4:49 PM, Andrew Gelman <gel...@stat.columbia.edu> wrote:
>
> Yup. I’m just hoping/expecting that GMO will be a faster, more scalable MML.
>
>> On Feb 5, 2016, at 4:33 PM, Alp Kucukelbir <a...@cs.columbia.edu> wrote:
>>
>> looks like MML to me (at a quick glance).
>>
>> On Fri, Feb 5, 2016 at 1:17 PM, Andrew Gelman <gel...@stat.columbia.edu> wrote:
>>> Hi, I haven’t looked at this but if it’s MML we can compare it to GMO at some point…
>>>
>>>> On Feb 5, 2016, at 11:53 AM, Michael Betancourt <betan...@gmail.com> wrote:
>>>>

Andrew Gelman

Feb 5, 2016, 7:15:29 PM2/5/16
to Dustin Tran, Alp Kucukelbir, stan...@googlegroups.com
That’s good: once we have GMO really working, it’s good to have one
more existing method that we beat!