Complex Circle manifold, Riemannian Gradient

Robert Qiu

Aug 14, 2021, 12:17:55 PM
to Manopt
I have a quartic objective function, shown in the attached image. Could you please help me calculate the Euclidean gradient and Hessian matrix of the objective function and transform them into their Riemannian versions?
[Attachment: 捕获.JPG]

Nicolas Boumal

Aug 16, 2021, 8:59:47 AM
to Manopt
Hello Robert,

Could you tell us more about what you have tried so far? This will make it easier to identify the concept that needs clarification.

Best,
Nicolas

Robert Qiu

Aug 19, 2021, 8:10:06 AM
to Manopt

Hello Nicolas,
Thanks very much for your kind help.
I don’t know how to use the file-upload function, so I uploaded a few pictures.
The work I have tried so far is attached below.
Best,
Robert
[Attachments: 1.JPG, 2.JPG, 3.JPG]

Nicolas Boumal

Aug 20, 2021, 3:04:12 AM
to Manopt
Hello,
There is no need to differentiate "with respect to $s$ or $s^*$".
Formally, here's how to proceed:
  • We think of $C^n$ as a real vector space of dimension $2n$. Indeed, if $e_1, \ldots, e_n$ are the columns of the identity matrix, then $e_1, i e_1, e_2, i e_2, \ldots, e_n, i e_n$ are $2n$ vectors which form a basis for $C^n$: every vector in $C^n$ can be expressed (uniquely) as a linear combination of those $2n$ vectors, with real coefficients.
  • On $C^n$, we define a real inner product: see the Euclidean reminders early in Chapter 3. Explicitly, for $u, v \in C^n$, we define $<u, v> = Re(u^* v)$, where $Re$ extracts the real part and $u^*$ is the conjugate transpose of $u$.
  • If $f : C^n \to R$ is your cost function, then its gradient at $x$ is defined as the vector $grad f(x) \in C^n$ such that $D f(x)[v] = <grad f(x), v>$ for all $v \in C^n$. The left-hand side is the directional derivative, $D f(x)[v] = lim_{t \to 0} (f(x+tv) - f(x))/t$: just work out that limit the usual way. For example, complex conjugation is a linear map (since we allow only real coefficients in linear combinations), and it is easy to differentiate through linear maps. Then you need to manipulate the expression you find for $D f(x)[v]$ until it looks like $<...., v>$, where $<., .>$ is the inner product defined above. The $....$ is then the gradient of $f$ on $C^n$.
  • You can give this gradient as problem.egrad in Manopt: the toolbox takes care of transforming it into a Riemannian gradient whenever your manifold is a Riemannian submanifold of $C^n$, such as the phase manifolds for example. See the sketch below.
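To make this concrete, here is a minimal Manopt sketch. It does not use your exact cost (which is in the attached images); instead it uses a stand-in quartic $f(s) = \sum_k (|a_k^* s|^2 - b_k)^2$ with made-up data, where the $k$-th row of $A$ plays the role of $a_k^*$:

% Stand-in quartic cost on the phase manifold, with made-up data A, b.
n = 10; m = 30;
A = randn(m, n) + 1i*randn(m, n);
b = rand(m, 1);

problem.M = complexcirclefactory(n);   % s in C^n with |s_j| = 1 for all j

problem.cost  = @(s) sum((abs(A*s).^2 - b).^2);

% Euclidean gradient with respect to <u, v> = real(u'*v):
%   grad f(s) = sum_k 4*(|a_k^* s|^2 - b_k) * a_k * (a_k^* s)
problem.egrad = @(s) 4 * (A' * ((abs(A*s).^2 - b) .* (A*s)));

checkgradient(problem);   % slope of 2 in the plot means the formula is consistent
s = trustregions(problem);

The same structure applies once you substitute your own cost and Euclidean gradient: checkgradient compares problem.egrad against finite differences, which is a good way to catch mistakes before running a solver.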

I hope this clarifies things.
Best wishes,
Nicolas

Robert Qiu

Aug 27, 2021, 10:44:58 PM
to Manopt
Hello Nicolas,
Thanks very much for your time and kind help.
I worked out the gradient and Hessian matrix of the objective function.
Now I want to solve this minimization problem using the ARC algorithm in the Manopt toolbox.
On page 26 of your paper "Adaptive regularization with cubics on manifolds", for ARC, equation (3) is not checked when solving the subproblem. As far as I know, to avoid the saddle points of an optimization problem, the gradient of the cost function should be 0 and the Hessian matrix should be positive definite. If only the first-order condition is considered, can it be guaranteed that saddle points are avoided?
Moreover, in equations (2) and (3) of the paper, the gradient is not required to be exactly 0, only smaller than some value, and the matrix is not required to be strictly positive definite, only to have its smallest eigenvalue greater than some value. What is the reason for doing this?
Thanks for your time again.
Best wishes,
Robert

Nicolas Boumal

Aug 29, 2021, 12:30:26 PM
to Manopt
Hello Roberts,

>   If only the first-order condition is considered, can it be guaranteed that saddle points are avoided?

Guaranteed, no. However, in practice it is unlikely that you would converge to a saddle point, because saddle points are unstable. They can slow down algorithms significantly, though. Sometimes people add small random perturbations at certain places in their algorithms to escape saddle points. For example, with the trustregions algorithm in Manopt you can set "options.useRand = true;". This is a heuristic.
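Concretely, that heuristic is just an option passed to the solver (assuming your problem structure is already set up):

options.useRand = true;                        % injects randomness into the inner (tCG) solve
[s, fs, info] = trustregions(problem, [], options);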

> Moreover, in equations (2) and (3) of the paper, the gradient is not required to be exactly 0, only smaller than some value, and the matrix is not required to be strictly positive definite, only to have its smallest eigenvalue greater than some value. What is the reason for doing this?

The reason is that it is not possible to guarantee the computation of an exact critical point (be it first or second order) in finite time. This is for the same reason that it is not possible to find the exact roots of a polynomial of degree >= 5 in finite time: nonlinear problems are just difficult in this way. The best we can do is find points which approximately satisfy the conditions we aim for.
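In Manopt, those approximate conditions show up as tolerances in the options structure; for example (a sketch, using the solvers' standard option names):

options.tolgradnorm = 1e-6;    % accept a point once the gradient norm drops below 1e-6
s = arc(problem, [], options); % ARC solver; trustregions(problem, [], options) works the same way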


Best,
Nicolas

Nicolas Boumal

Aug 29, 2021, 12:30:56 PM
to Manopt
(sorry for the extraneous 's' in your name!)