How to handle a non-positive-definite Hessian matrix

张亦弛

Oct 5, 2024, 3:04:03 AM
to Manopt
Hello. Thank you for the open-source textbook "An Introduction to Optimization on Smooth Manifolds". The book is a really good starting point for someone like me who was unfamiliar with this field before.

Having read a few chapters of the book, I want to write a toy example of optimization on a manifold on my own. To go through the whole procedure for the first time, I would like to set the existing Manopt toolbox aside and take care of every programming detail myself.

The toy problem is minimization of the quadratic function L(x) = (1/2) x^T A x + B x over the open simplex M = { x in R^n | x_i > 0, sum_i x_i = 1 }. According to the literature and my own derivation, the important objects are as follows.
Tangent space: T_x M = { d in R^n | sum_i d_i = 0 }
Inner product: <d, p>_x = sum_i d_i p_i / x_i
Projector: Proj_x = I - 1 * x, where I is the identity matrix, 1 is an n-by-n matrix whose elements are all ones, and * is column-wise multiplication (explicitly, Proj_x v = v - (sum_i v_i) x).
Gradient: Rgrad(x) = Proj_x [ Egrad(x) . x ], where "." is element-wise multiplication.
Hessian: Rhess(x)[d] = Proj_x [ Diag(x) A + Egrad(x) - 1 * ( Egrad(x) . x ) - (1/2) Rgrad(x)/x ] d, where Diag() converts a vector into a diagonal matrix and the last "/" is element-wise division. Rhess(x)[d] satisfies the self-adjointness condition with respect to the inner product above, and it is linear in d since it is the product of a matrix (depending only on x) and the vector d.
Exponential map: Obtained from "Image Labeling by Assignment".
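
For concreteness, here is a minimal sketch of these objects in NumPy. It assumes A is symmetric and writes the linear term as b^T x; the exponential map from "Image Labeling by Assignment" is not reproduced here.

import numpy as np

def egrad(x, A, b):
    """Euclidean gradient of L(x) = (1/2) x^T A x + b^T x (A assumed symmetric)."""
    return A @ x + b

def inner(x, u, v):
    """Inner product on the tangent space: <u, v>_x = sum_i u_i v_i / x_i."""
    return np.sum(u * v / x)

def proj(x, v):
    """Projection onto T_x M = {d : sum(d) = 0}: (I - 1*x) v = v - (sum_i v_i) x."""
    return v - v.sum() * x

def rgrad(x, A, b):
    """Riemannian gradient: Proj_x[ Egrad(x) . x ]."""
    return proj(x, egrad(x, A, b) * x)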

With all these objects at hand, I wrote a Riemannian Newton algorithm. However, occasionally L(x) converges (with a sufficiently small Riemannian gradient norm) to a point whose objective value is higher than at the initial guess x0, so I suspect the Riemannian Hessian was not positive definite. Is there any way to handle this?

I know that, in the unconstrained case, the Hessian can be "shifted": H_new = H + mu I, where mu is a scalar slightly larger than the absolute value of the most negative eigenvalue of H. Can this technique be carried over to Riemannian optimization?
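
In code, the unconstrained trick I have in mind is simply (a minimal sketch, assuming a dense symmetric H):

import numpy as np

def shift_hessian(H, margin=1e-8):
    # Add mu * I with mu slightly larger than |most negative eigenvalue|,
    # so that the shifted matrix is positive definite.
    lam_min = np.linalg.eigvalsh(H).min()
    mu = max(0.0, -lam_min) + margin
    return H + mu * np.eye(H.shape[0])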

If the answer is yes, I have two follow-up questions.
1. How can I obtain the eigenvalues and eigenvectors of Rhess(x)? Inspired by the Rayleigh quotient, my guess is to minimize <d, Rhess(x)[d]>_x / <d, d>_x over the tangent space, where <,>_x is the inner product defined above (see the sketch after these two questions).
2. How can I construct a new Rhess(x)[d] based on the smallest negative eigenvalue? In the unconstrained case the trick is H_new = H + mu I; what is the analogue in the Riemannian case?
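
To make my guess in question 1 concrete: since the extrema of the Rayleigh quotient <d, Rhess(x)[d]>_x / <d, d>_x over the tangent space are exactly the eigenvalues of the Hessian operator, I imagine computing them through a generalized symmetric eigenproblem in a basis of T_x M. A minimal sketch, where rhess(x, d) is a placeholder for my own Hessian-vector routine:

import numpy as np
from scipy.linalg import eigh, null_space

def tangent_spectrum(x, rhess):
    # Eigenvalues/eigenvectors of the Riemannian Hessian at x restricted to
    # T_x M = {d : sum(d) = 0}, self-adjoint w.r.t. <u, v>_x = sum(u * v / x).
    n = x.size
    G = np.diag(1.0 / x)                    # metric matrix at x
    B = null_space(np.ones((1, n)))         # n-by-(n-1) basis of the tangent space
    HB = np.column_stack([rhess(x, B[:, j]) for j in range(n - 1)])
    A = B.T @ G @ HB                        # entries <b_i, Rhess[b_j]>_x
    A = 0.5 * (A + A.T)                     # symmetrize against round-off
    M = B.T @ G @ B                         # Gram matrix of the basis in the metric
    lam, C = eigh(A, M)                     # generalized symmetric eigenproblem
    return lam, B @ C                       # eigenvalues and tangent eigenvectors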

Thanks for any help.

Nicolas Boumal

Oct 7, 2024, 3:56:59 AM
to Manopt
Hello,

The main issue here may be related to Newton's method itself. Indeed, Newton's method is not a descent method: it does not guarantee f(x_{k+1}) < f(x_k), as you observed.

My main recommendations would be:

1) Experiment with a different method, e.g., gradient descent.

2) Consider the option of parameterizing the simplex through a sphere. Namely, if S is the unit sphere in R^n and you pick a point x on S, then x.^2 (entry-wise squared) is a point in the simplex. Moreover, every point in the simplex can be written in this way. Thus, if you have a cost function f(y) on the simplex, you can instead optimize g(x) = f(x.^2) on the sphere. This is nice because the sphere is complete, whereas the simplex (with constraints x_i > 0) is open: the missing boundary can cause trouble for optimization.
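
For what it's worth, here is a minimal sketch combining both suggestions: plain Riemannian gradient descent on the sphere with a fixed step size. The name egrad_simplex is a placeholder for the Euclidean gradient of your cost on the simplex; this is not Manopt code.

import numpy as np

def sphere_gradient_descent(egrad_simplex, x0, step=1e-2, iters=1000):
    # Minimize g(x) = f(x.^2) over the unit sphere.
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        y = x**2                              # corresponding point in the simplex
        eg = 2.0 * x * egrad_simplex(y)       # Euclidean gradient of g, by the chain rule
        rg = eg - (x @ eg) * x                # project onto the tangent space of the sphere
        x = x - step * rg                     # gradient step
        x = x / np.linalg.norm(x)             # retract back to the sphere
    return x**2                               # return the simplex point y = x.^2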

Best,
Nicolas