This discussion touches on a topic of interest to many Manopt users, namely: how to escape saddle points when using Riemannian Trust Regions?
Saddle points are points where the gradient is zero but the Hessian has at least one negative eigenvalue (thus, a saddle point is not even a local optimizer, yet if you are not careful you might get stuck there).
Hello Ju,
Here is my take on truncated conjugate gradients (tCG), the option "useRand" (on/off) and the issue of escaping saddle points, with Riemannian Trust Regions (RTR) in Manopt.
With useRand "off", tCG starts with the zero vector and iterates a certain procedure. The procedure is such that, if eta_0 = 0, then eta_1 is the Cauchy point. As I am sure you know, the Cauchy point is the optimizer of the TR inner problem restricted to the (negative) gradient direction. Theory guarantees that if RTR achieves at least (a fraction of) the Cauchy decrease in the model at each iteration, then we get convergence to critical points. Combine that with the fact that tCG is monotonically improving, and you get convergence for RTR with tCG, regardless of when you stop tCG. (For fast local convergence, you need more work of course.)
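To make that concrete, here is what the Cauchy point looks like in plain Euclidean notation (ignoring the manifold structure for a moment). This is only a MATLAB sketch, not the actual Manopt code; g, H and Delta are assumed to be the gradient vector, a symmetric Hessian matrix and the TR radius:

    % Cauchy point for the TR subproblem
    %   min_eta  g'*eta + 0.5*eta'*H*eta   subject to   norm(eta) <= Delta,
    % i.e., the minimizer of the model along -g, clipped at the TR boundary.
    gHg = g' * (H * g);
    if gHg <= 0
        tau = 1;                                   % negative curvature along -g: go to the boundary
    else
        tau = min(1, norm(g)^3 / (Delta * gHg));   % interior minimizer along -g, possibly clipped
    end
    eta_C = -tau * (Delta / norm(g)) * g;          % the Cauchy point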
Now, if you are at a saddle point, the gradient is zero and there is no Cauchy point to speak of. In those cases (and similarly, if you are at a point where the gradient is very small), an alternative strategy is to compute a negative-curvature direction and follow that direction all the way to the boundary of the TR. You can show that if you do that, you get convergence to second-order critical points (this will appear in a paper with Absil and Cartis, soon).
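In the same Euclidean notation, that strategy is roughly the following sketch, assuming v is a direction with v'*H*v < 0 (again, just an illustration, not the code from the paper):

    % Follow a negative-curvature direction v all the way to the TR boundary,
    % picking the sign so that the linear term of the model does not increase.
    if g' * v > 0
        v = -v;                        % ensure <g, v> <= 0
    end
    eta = (Delta / norm(v)) * v;       % step of length Delta along v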
You can cover both scenarios if you have a method which solves the inner problem optimally at each step (or at least attains a fixed fraction of the optimal model improvement). If you do that, you are certain to perform at least as well as a fraction of the Cauchy step and at least as well as a fraction of the "most negative curvature step" (I call these eigensteps). Thus, you get convergence to second-order critical points as well. But... you have to solve this subproblem globally. It can be done in polynomial time, but it's a bit expensive (overkill).
Setting useRand to "on" is an optimist's take on this state of affairs. You don't want to compute eigensteps, and you don't want to solve the TR inner problem optimally. So, instead, you pick your eta_0 at random. This allows you to explore directions other than the Cauchy step, thus giving you better chances of discovering negative curvature if it is overwhelmingly present (many negative eigenvalues in the Hessian). In the code, there is also a check to make sure we do at least as well as the Cauchy point, so as not to harm convergence to critical points.
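That safeguard amounts to something like this (reusing g, H, Delta and eta_C from the sketch above; again an illustration, not the actual implementation):

    % After running tCG from a random eta_0, keep its output eta only if it is
    % at least as good as the Cauchy point in the model; otherwise fall back.
    m = @(eta) g' * eta + 0.5 * (eta' * (H * eta));   % quadratic model (without the constant term)
    if m(eta) > m(eta_C)
        eta = eta_C;                   % the Cauchy point safeguard
    end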
Of course, it makes a lot more sense to set useRand to "on" only when the gradient is small. I think that was the originally intended use when Chris Baker, Absil and Gallivan coded the original software, GenRTR: solve with useRand "off"; then, just to be a little safer, run again from your small-gradient point with useRand "on". If that doesn't make you move away, your confidence increases (it's a heuristic).
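In Manopt, that heuristic would look roughly like this (assuming you already have a valid problem structure; check the documentation of your version for the exact option names):

    options = struct();
    options.useRand = false;
    [x, xcost, info] = trustregions(problem, [], options);   % usual run, useRand off
    options.useRand = true;
    x2 = trustregions(problem, x, options);                   % rerun from x with a random initial eta_0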
The thing about eigensteps is that it's deceptively difficult to find good literature on how hard they are (or aren't) to compute. There are some really good (but old) papers about it, but the focus back then was on different things, so they don't quite answer the precise questions that are useful today. They are also pretty difficult to get into, so it's not trivial to adapt them. We'll try to discuss that briefly in the paper with Absil and Cartis.
I hope this makes sense and partly addresses your questions.
Cheers,
Nicolas