Hello,
That's an interesting point.
In L-BFGS, the L stands from "limited memory", which means the work we did at x_0 will vanish from memory after a while, but let's entertain the idea nevertheless. After all, x_0 still exists, so whichever tangent vectors we decided to keep in memory, we surely can memorize "as transported to x_0", even if none of those vectors were at x_0 originally.
In my mind, the main reason for transporting everything to the most recent point is because vector transport "loses meaning" over long distances. What I mean by that is that tangent spaces at points x and y are quite similar if x and y are near one another (and therefore one can make reasonable proposals for how to transport vectors from one tangent space to the other), but if x and y are far appart then comparison (and therefore transports) become trickier.
As an example, consider the sphere in R³: if y = -x (antipodal points), there really is no good way to transport vectors from x to y. Even the Riemannian parallel transport isn't well defined, because there exist infinitely many minimal geodesics from x to y.
Coming back to optimization, the premise is that when we generate a sequence of iterates x_0, x_1, x_2, ..., then two consecutive points x_k and x_{k+1} are typically near each other, but x_k may be quite far from x_0 as k grows. Thus, we "feel comfortable" transporting a bunch of vectors from x_k to x_{k+1}, but we may feel less comfortable transporting vectors from x_k all the way back to x_0.
That said, BFGS and its variants are already heuristic to a large extent, so why not try?
It would certainly be interesting to implement the variant you propose, with some heuristic choices for what the transports might look like. One might even heuristically decide to "restart" the process (that is, "reset" the index to 0) whenever $x_k$ has travelled too far from $x_0$ (which could be determined heuristically as well, e.g., based on the sum of norms of retracted tangent vectors used to generate the sequence). This way, x_0 would never be too far from x_k, and perhaps that would help.
When we redefine x_0, we could either transport all the info from our previous x_0 to the new one, or we could just discard it all and really, actually restart.
I'll be curious to hear back if you try it out!
PS: BFGS is especially well suited to manifolds that have a lot of symmetries, such as Lie groups for example. There, we do have a reference tangent space (the tangent space at the identity, called the Lie algebra), and it's quite natural to keep all information in that reference tangent space at all times. Then, there are no vectors transports at all (it's free, nothing to do!).
Best,
Nicolas