Performance and Computation Time of Riemannian Trust Region and Conjugate Gradient Algorithm

lineya...@gmail.com

unread,

Oct 13, 2016, 1:42:46 PM10/13/16

to Manopt

Hi,

Very sorry to bother you again. When I'm trying to minimize a logistic loss function log(1+exp(x/sigma)), I found that the trust region algorithm is much faster than conjugate gradient algorithm. I use default parameters except for the maximum iterations which is set as 500. The factory I use is fixed_rank_2factors_preconditioned (the result remains without "preconditioned"). But I'm confused about this kind of outcome, why is trust region faster than conjugate gradient? Is this kind of phenomenon normal? Thanks for your time.

Kai

Nicolas Boumal

unread,

Oct 13, 2016, 5:18:00 PM10/13/16

to Manopt

Hello Kai,

No worries, that's what the forum is for.

I imagine you did not specify the Hessian of the cost, and I suppose you judge that one is faster than the other based on time, and not based on iterations count? (Iterations in trust regions are typically much more expensive than for CG).

Your observation is familiar: I also tend to find trust regions (RTR) are faster than non-linear conjugate gradients (CG) (at least, as they are implemented in Manopt, with default parameters, which certainly are not adequate in all situations.)

I do not have a satisfying answer for why that is. Of course, we have convergence results for RTR which state explicitly that we can expect superlinear convergence with it (under some assumptions, and if the Hessian approximation is good enough), whereas superlinear convergence results for CG on manifolds are hard to come by (although there has been recent work on Riemannian CG and I am not up to date.) So, theoretically, I suppose we could expect this outcome; but I don't think that explains it. CG appears to converge superlinearly in many situations also. Why RTR routinely beats CG is a mystery to me.

Sorry that I don't have an answer.

Best,

Nicolas

lineya...@gmail.com

unread,

Oct 13, 2016, 11:32:48 PM10/13/16

to Manopt

Thanks for your insightful views. Actually,I compared the overall time for the above question. In past, I tried to compare the time cost in each iteration (just like what you do in your paper, X-axis is time, Y-axis shows the decreasing of objective value). Then trust region is a little more time consuming but they are quite close to each other, and after a few iteration, CG has a long tail while trust region can find its optimal point more quickly. Perhaps it's because the objective function may be approximated not that good in CG, I think. Thanks very much.

Kai

lineya...@gmail.com

unread,

Oct 14, 2016, 8:06:18 AM10/14/16

to Manopt

What's more, I did specify the hessian, and use storage to reduce the computation cost.

Nicolas Boumal

unread,

Oct 14, 2016, 9:32:11 AM10/14/16

to Manopt

In that case it's different: RTR will use the Hessian, whereas CG will not. This means RTR has access to more information about the cost function.

This is especially meaningful if the cost function is ill-conditioned, in which case Hessian information is a big help.