derivatives for MML


Bob Carpenter

Jul 2, 2014, 3:25:52 PM
to stan...@googlegroups.com
The "outer loop" of Andrew's general MML algorithm requires
us to compute a vector g:

g = gradient_alpha log | det H |

where H is the Hessian (2nd order derivative matrix) of a
function f(alpha).

Question: Is H going to be positive definite?

BRUTE FORCE: One approach would be to use fvar<fvar<var>> types:
compute the Hessian, whose entries will be of type var, take the log
determinant, which will also be a var, then propagate backward.
Overall it's an O(N^3) operation, but each of those operations is
expensive, because every scalar carries nested autodiff.
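
To make the final step concrete, here's a minimal sketch in plain
Eigen with double scalars (the brute-force version would push
fvar<fvar<var>> through the same computation; log_abs_det is my name,
not Stan's):

#include <Eigen/Dense>

// log |det H| via an LDLT factorization: L is unit lower triangular,
// so |det H| = prod |d_i| and log |det H| = sum log |d_i|.
// The factorization itself is the O(N^3) step.
double log_abs_det(const Eigen::MatrixXd& H) {
  Eigen::LDLT<Eigen::MatrixXd> ldlt(H);
  return ldlt.vectorD().array().abs().log().sum();
}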

BETTER: Use the algebraic force (which, for me, means looking up the
answer in the Matrix Cookbook):

partial log(det(X)) = Tr(inv(X) * partial X)

I think this means we can do the Hessian with fvar<var> types, which
only requires N forward-over-reverse sweeps (one per dimension of
alpha). Then we can compute the gradient of the log determinant
analytically.
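
Concretely, something like this sketch, assuming we've already pulled
out a plain double Hessian H and its partials dH[i] = dH/dalpha_i from
the fvar<var> sweeps (grad_log_det and dH are my names):

#include <Eigen/Dense>
#include <vector>

// g_i = d/dalpha_i log |det H| = tr(H^{-1} dH/dalpha_i).
// One O(N^3) inverse is shared across all N components; each trace is
// then only O(N^2), since both matrices are symmetric:
// tr(A B) = sum_{j,k} A_{jk} B_{kj} = sum_{j,k} A_{jk} B_{jk}.
Eigen::VectorXd grad_log_det(const Eigen::MatrixXd& H,
                             const std::vector<Eigen::MatrixXd>& dH) {
  Eigen::MatrixXd Hinv = H.inverse();
  Eigen::VectorXd g(dH.size());
  for (std::size_t i = 0; i < dH.size(); ++i)
    g(i) = Hinv.cwiseProduct(dH[i]).sum();
  return g;
}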

Thoughts?

- Bob

Bob Carpenter

Jul 2, 2014, 3:45:52 PM
to stan...@googlegroups.com
Obviously the "BETTER" algorithm is also O(N^3) because
it involves an inverse.

I don't know if there's a simpler way to get the trace of an inverse
other than running a solver.
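
"Running a solver" would look something like this sketch (names mine):
factor once and reuse the factorization, never forming H^{-1}
explicitly. But a solve against a full-matrix right-hand side is still
O(N^3), so nothing's saved asymptotically:

#include <Eigen/Dense>

// tr(H^{-1} A): the LDLT is O(N^3) once; each column solve is O(N^2),
// so solving against all N columns of A is O(N^3) again.
double trace_solve(const Eigen::LDLT<Eigen::MatrixXd>& ldlt,
                   const Eigen::MatrixXd& A) {
  Eigen::MatrixXd X = ldlt.solve(A);  // X = H^{-1} A
  return X.trace();
}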

- Bob

Michael Betancourt

Jul 2, 2014, 3:51:58 PM
to stan...@googlegroups.com
The same calculation arises in RHMC, which is the reason for the
grad-trace-matrix-times-Hessian autodiff function. Once the inverse
has been calculated (an O(N^3) cost unless we're really clever about
the "arrowhead" structure) the derivative can be done in O(N).
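
A usage sketch, assuming the grad_tr_mat_times_hessian signature from
Stan Math's mix functors (the header path and the quartic functor are
my assumptions, just stand-ins for the real objective):

#include <stan/math/mix.hpp>  // assumed header; the layout has moved around
#include <Eigen/Dense>

// Hypothetical stand-in for the MML objective f(alpha), templated so
// the nested autodiff scalar types can flow through it.
struct quartic {
  template <typename T>
  T operator()(const Eigen::Matrix<T, Eigen::Dynamic, 1>& alpha) const {
    T sum(0.0);
    for (int i = 0; i < alpha.size(); ++i)
      sum += alpha(i) * alpha(i) * alpha(i) * alpha(i);
    return sum;
  }
};

int main() {
  Eigen::VectorXd x = Eigen::VectorXd::Ones(3);
  Eigen::MatrixXd M = Eigen::MatrixXd::Identity(3, 3);  // plays the role of H^{-1}
  Eigen::VectorXd g;
  stan::math::grad_tr_mat_times_hessian(quartic(), x, M, g);
  // g = grad_alpha tr(M * H(alpha)); with M = H^{-1} held fixed,
  // that's exactly grad_alpha log |det H|.
}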

But you do have to worry about being in a region where the Hessian
is positive definite. We could think about finding some way of
defining a smooth absolute value… ;-)

Bob Carpenter

Jul 2, 2014, 4:01:52 PM
to stan...@googlegroups.com

On Jul 2, 2014, at 3:51 PM, Michael Betancourt <betan...@gmail.com> wrote:

> The same calculation arises in RHMC, which is the reason for the
> grad-trace-matrix-times-Hessian autodiff function.

Right. I thought it was starting to look familiar.

> Once the inverse
> has been calculated (an O(N^3) cost unless we're really clever
> about the "arrowhead" structure) the derivative can be done in
> O(N).

OK --- that's the best I could come up with.

> But you do have to worry about being in a region where the Hessian
> is positive definite. We could think about finding some way of
> defining a smooth absolute value... ;-)

That's why SoftAbs came up in the discussion. Seems like if
we're going to take second derivatives to model curvature, we
might want to smooth them.

Andrew's already talking about doing a diagonal-only version now that
we've discussed the computational cost, which makes this look even
more like SoftAbs!
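
For anyone following along, SoftAbs (from Michael's RHMC metric paper)
is just a smooth eigenvalue transform. A rough sketch, with alpha as
the sharpness parameter (my variable names):

#include <Eigen/Dense>
#include <cmath>

// SoftAbs regularization of a Hessian: map each eigenvalue lambda to
// lambda * coth(alpha * lambda), a smooth surrogate for |lambda| that
// is bounded below by 1/alpha, so the result is always positive
// definite.
Eigen::MatrixXd softabs(const Eigen::MatrixXd& H, double alpha) {
  Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> eig(H);
  Eigen::VectorXd lam = eig.eigenvalues();
  for (int i = 0; i < lam.size(); ++i) {
    double x = alpha * lam(i);
    // lambda / tanh(alpha * lambda) -> |lambda| for large |x| and
    // -> 1/alpha as lambda -> 0 (guard the 0/0 at the origin)
    lam(i) = (std::abs(x) < 1e-8) ? 1.0 / alpha : lam(i) / std::tanh(x);
  }
  return eig.eigenvectors() * lam.asDiagonal() * eig.eigenvectors().transpose();
}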

- Bob
