Solutions for problems with complicated gradient & Hessian?


zhang...@gmail.com

Nov 20, 2016, 12:48:46 AM
to Manopt
In some cases, even though we can compute the Fréchet derivative of the target function, we can hardly obtain an explicit expression for the Euclidean gradient. In such cases, is it possible to use Manopt to run the trust-region method?

For example, suppose X is a row vector of length d*n, and write
X = (x_1, x_2, ..., x_n),
where each x_i is a row vector of length d. Define
f(X) = \sum_{i, j} ||x_i - x_j||^{2p},
where p is an integer and || || is the Euclidean norm.

Following the trick in
http://thousandfold.net/cz/2013/11/12/a-useful-trick-for-computing-gradients-w-r-t-matrix-arguments-with-some-examples/,
we have
Df(X)[V] = \sum_{i, j} 2p ||x_i - x_j||^{2p-2} <x_i - x_j, v_i - v_j>
and
(1/2) <V, \nabla^2 f(X)[V]> = \sum_{i, j} 2p(p-1) ||x_i - x_j||^{2p-4} <x_i - x_j, v_i - v_j>^2 + \sum_{i, j} p ||x_i - x_j||^{2p-2} ||v_i - v_j||^2.
(Similarly, we can write down a formula for <W, \nabla^2 f(X)[V]> even when W and V are different.)
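
For concreteness, here is a rough MATLAB sketch of how these quantities could be evaluated numerically (the function name is made up, and X and V are stored as d-by-n matrices whose columns are x_i and v_i, just for convenience):

% Rough sketch: f(X) = sum_{i,j} ||x_i - x_j||^{2p} and the directional
% derivative Df(X)[V]; columns of X and V hold the blocks x_i and v_i.
function [fX, DfXV] = pairwise_power_cost(X, V, p)
    n = size(X, 2);
    fX = 0;
    DfXV = 0;
    for i = 1 : n
        for j = 1 : n
            dx = X(:, i) - X(:, j);      % x_i - x_j
            dv = V(:, i) - V(:, j);      % v_i - v_j
            nrm2 = dx.' * dx;            % ||x_i - x_j||^2
            fX = fX + nrm2^p;
            DfXV = DfXV + 2*p * nrm2^(p - 1) * (dx.' * dv);
        end
    end
end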

However, it is difficult to express \nabla f(X) and \nabla^2 f(X)[V] explicitly. Since we do have formulas for the directional derivative and for the Hessian quadratic form, I guess we can still run a second-order trust-region method.

Another example is
f(X) = \sum_i ||x_i \times r_i||^2, where \times denotes the cross product and the r_i are given vectors.

Instead of providing problem.egrad and problem.ehess explicitly, can we just supply functions
first_order_term(X, V) = Df(X)[V]
and
second_order_term(X, V, W) = <W, \nabla^2 f(X)[V]>
to run the trust-region method?


Nicolas Boumal

Nov 21, 2016, 9:33:53 AM
to manopt...@googlegroups.com
Hello,

If you only have an expression for the directional derivatives, Df(x)[u] (directional derivative of f at x along direction u), you can specify that in

problem.diff = @(x, u) ...;
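
For instance, a rough sketch with placeholder names (the manifold and the two helpers below are purely illustrative; adapt them to your actual problem):

% Rough sketch, not a complete solution: plug a cost and its directional
% derivative into Manopt and run the trust-region solver.
problem.M = obliquefactory(d, n);        % placeholder manifold choice
problem.cost = @(X) mycost(X);           % hypothetical helper: evaluates f(X)
problem.diff = @(X, U) mydiff(X, U);     % hypothetical helper: evaluates Df(X)[U]
X = trustregions(problem);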


Since Manopt 3.0, Manopt knows how to get the gradient from the directional derivatives, slowly. The reason it's slow is that it will generate an orthonormal basis u_1, ..., u_n of the tangent space at x, and compute Df(x)[u_k] along each direction, using the fact that

grad f(x) = sum_{k = 1}^n  Df(x)[u_k] * u_k
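
To illustrate the idea (this is not the actual Manopt code, just a sketch; tangentorthobasis is a tool that ships with Manopt, if I remember correctly):

% Sketch: recover the gradient at x from directional derivatives only.
M = problem.M;
B = tangentorthobasis(M, x, M.dim());    % orthonormal basis u_1, ..., u_n of the tangent space at x
g = M.zerovec(x);
for k = 1 : numel(B)
    % g <- g + Df(x)[u_k] * u_k
    g = M.lincomb(x, 1, g, problem.diff(x, B{k}), B{k});
end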

So a first answer to your question is: for prototyping, specifying directional derivatives can be enough. But speed will improve dramatically if you implement a proper formula for the gradient.

For the Hessian, there is no such trick implemented (it would get very expensive very fast, to the point that using second-order methods would make no sense, I expect).

To find grad f(x) from Df(x)[u], as you seem to know, one should use the identity

<grad f(x), u> = Df(x)[u] for all u,

which means that if

Df(x)[u] = <something, u>,

then that something is grad f(x). I can't work out all the details right now, but this is really all you need at this point: manipulate the expression for the directional derivative until it has the form <something, u>.
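
That said, with the first cost in your message the manipulation could go roughly like this (please double-check the algebra): split the inner product and use the symmetry between i and j to get

Df(X)[V] = \sum_{i, j} 2p ||x_i - x_j||^{2p-2} <x_i - x_j, v_i - v_j>
         = \sum_i < 4p \sum_j ||x_i - x_j||^{2p-2} (x_i - x_j), v_i >,

so the block of grad f(X) paired with x_i should be 4p \sum_j ||x_i - x_j||^{2p-2} (x_i - x_j).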

For the Hessian, I would recommend first getting an expression for the gradient, and then doing a Fréchet derivative of that.
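
Concretely, differentiating the gradient block above along V should give the block of \nabla^2 f(X)[V] associated with x_i (again, worth verifying):

4p \sum_j [ (2p-2) ||x_i - x_j||^{2p-4} <x_i - x_j, v_i - v_j> (x_i - x_j) + ||x_i - x_j||^{2p-2} (v_i - v_j) ].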

Best of luck!
Nicolas