Why always transport to the latest iterate (instead of the first one) in L-BFGS

张亦弛

Aug 17, 2025, 3:48:55 AM
to Manopt
Hello. I have a question on Riemannian L-BFGS.

As far as I know, in each iteration of L-BFGS, all the previously stored steps s_k and gradient differences y_k are transported to the tangent space of the current point, where the new search direction is computed. As the optimization goes on, more and more vectors have to be transported in a single iteration. Isn't that too computationally intensive?

Why not instead transport the vectors of the current iterate to a fixed reference point (e.g., the first iterate)? The new step would be computed at the reference point and transported back to the current point. That way, vector transport is done only twice per iteration (once backward, once forward).
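To make the bookkeeping concrete, here is a small numpy sketch of the standard scheme on the unit sphere (this is illustrative, not Manopt code: the transport formula is the usual parallel transport along the minimal geodesic, and the dimensions and memory size are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)

def transport(x, y, v):
    """Parallel transport of tangent v from x to y on the unit sphere."""
    return v - (np.dot(v, y) / (1.0 + np.dot(x, y))) * (x + y)

def project(x, v):
    """Orthogonal projection of v onto the tangent space at x."""
    return v - np.dot(x, v) * x

n, m = 5, 4                              # ambient dimension, memory size (toy values)
x = np.zeros(n); x[0] = 1.0
memory = [project(x, rng.standard_normal(n)) for _ in range(2 * m)]  # the {s_j, y_j}

# One iteration of the usual scheme: retract to the next point, then move
# all 2m stored vectors there -- 2m transport calls per iteration.
step = 0.1 * project(x, rng.standard_normal(n))
x_new = (x + step) / np.linalg.norm(x + step)
memory = [transport(x, x_new, v) for v in memory]

# The stored vectors are again tangent at x_new, as they must be.
tangency = max(abs(np.dot(x_new, v)) for v in memory)
print(tangency < 1e-12)                  # True

# The reference-point scheme in the question instead needs a constant number
# of transports per iteration (s_k and y_k to x_0, the direction back): 3 vs 2m.
```

The point of the comparison is only the count per iteration: 2m transports for the standard memory update versus a constant number for the reference-point variant.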

Thanks for answering.

Nicolas Boumal

Aug 18, 2025, 4:08:14 AM
to Manopt
Hello,

That's an interesting point.

In L-BFGS, the L stands for "limited memory", which means the work we did at x_0 will vanish from memory after a while, but let's entertain the idea nevertheless. After all, x_0 still exists, so whichever tangent vectors we decide to keep in memory, we can certainly store them "as transported to x_0", even if none of those vectors originated at x_0.

In my mind, the main reason for transporting everything to the most recent point is that vector transport "loses meaning" over long distances. What I mean is that the tangent spaces at points x and y are quite similar when x and y are near one another (so one can make reasonable proposals for how to transport vectors from one tangent space to the other), but when x and y are far apart, comparisons (and therefore transports) become trickier.

As an example, consider the sphere in R³: if y = -x (antipodal points), there really is no good way to transport vectors from x to y. Even the Riemannian parallel transport isn't well defined, because there exist infinitely many minimal geodesics from x to y.
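Numerically, this shows up as extreme sensitivity near the antipode. A small illustration (the closed-form transport below is the standard parallel transport along the minimal geodesic of the sphere): two target points that nearly coincide, both close to -x but approached along different great circles, receive transported vectors pointing in opposite directions.

```python
import numpy as np

def transport(x, y, v):
    # Parallel transport along the minimal geodesic of the unit sphere;
    # undefined at y = -x (infinitely many minimal geodesics), and the
    # denominator 1 + <x, y> vanishes there.
    return v - (np.dot(v, y) / (1.0 + np.dot(x, y))) * (x + y)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])          # tangent vector at x

eps = 1e-6
y1 = np.array([-np.cos(eps), np.sin(eps), 0.0])  # near -x, tilted toward e2
y2 = np.array([-np.cos(eps), 0.0, np.sin(eps)])  # near -x, tilted toward e3

w1 = transport(x, y1, v)
w2 = transport(x, y2, v)
print(np.linalg.norm(y1 - y2))   # ~1.4e-6: the two targets nearly coincide
print(np.linalg.norm(w1 - w2))   # ~2.0: the transported vectors are opposite
```

So even slightly away from the antipode, where the transport is still well defined, its outcome depends violently on which side of -x the target sits.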

Coming back to optimization, the premise is that when we generate a sequence of iterates x_0, x_1, x_2, ..., then two consecutive points x_k and x_{k+1} are typically near each other, but x_k may be quite far from x_0 as k grows. Thus, we "feel comfortable" transporting a bunch of vectors from x_k to x_{k+1}, but we may feel less comfortable transporting vectors from x_k all the way back to x_0.

That said, BFGS and its variants are already heuristic to a large extent, so why not try?

It would certainly be interesting to implement the variant you propose, with some heuristic choices for what the transports might look like. One might even heuristically decide to "restart" the process (that is, "reset" the index to 0) whenever x_k has travelled too far from x_0 (which could be determined heuristically as well, e.g., based on the sum of norms of the retracted tangent vectors used to generate the sequence). This way, x_0 would never be too far from x_k, and perhaps that would help.
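A minimal sketch of that restart rule, where both the travel budget and the trigger are placeholder heuristics to be tuned, not a recommendation:

```python
# Hypothetical restart rule: accumulate the norms of the retracted steps
# since the last reference point x_0, and trigger a restart (redefine
# x_0 := x_k, then clear or transport the memory) once a budget is exceeded.
def restart_indices(step_norms, budget):
    """Return the iterate indices at which a restart would trigger."""
    restarts, travelled = [], 0.0
    for k, t in enumerate(step_norms):
        travelled += t
        if travelled > budget:
            restarts.append(k)   # x_0 is redefined here
            travelled = 0.0
    return restarts

# Toy trace of step sizes: a restart fires each time ~1.0 of length accrues.
print(restart_indices([0.4, 0.4, 0.4, 0.1, 0.8, 0.3], budget=1.0))  # [2, 5]
```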

When we redefine x_0, we could either transport all the info from our previous x_0 to the new one, or we could just discard it all and really, actually restart.

I'll be curious to hear back if you try it out!

PS: BFGS is especially well suited to manifolds that have a lot of symmetries, such as Lie groups for example. There, we do have a reference tangent space (the tangent space at the identity, called the Lie algebra), and it's quite natural to keep all information in that reference tangent space at all times. Then, there are no vectors transports at all (it's free, nothing to do!).
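A sketch of that reference-tangent-space idea on the rotation group SO(3), with illustrative names rather than any library's API: a tangent vector at R can be written R @ Omega with Omega skew-symmetric, i.e., Omega in the Lie algebra so(3), so storing Omega keeps all memory in one fixed linear space.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def to_algebra(R, V):
    """Pull a tangent vector at R back to the Lie algebra (left translation)."""
    return R.T @ V

def to_tangent(R, Omega):
    """Push a Lie algebra element out to the tangent space at R."""
    return R @ Omega

Omega = hat(np.array([0.3, -0.1, 0.2]))     # stored memory lives here, always
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotation about e3

V = to_tangent(R, Omega)    # realized at the current R only when needed
A = to_algebra(R, V)
print(np.allclose(A + A.T, 0), np.allclose(A, Omega))  # True True
```

No matter how far the iterates travel in the group, the stored Omega never moves: translating to and from the Lie algebra replaces vector transport entirely.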

Best,
Nicolas

Pierre Absil

Aug 18, 2025, 4:52:28 AM
to Manopt
For the above idea to yield a significant speedup, the vector transport had better be the computational bottleneck. This will require taking m (the limited-memory size) "large enough" (to be determined experimentally). In proof-of-concept experiments, it may help to choose a computationally cheap objective function, e.g., take N small and the C_i's sparse in the problem described in Section 7.1 of https://www.math.fsu.edu/~whuang2/papers/ARBMDRNOP.htm. -PA