Optimization of functions with an affine subspace as domain


Christian Franssen

Apr 12, 2023, 7:10:29 AM
to Manopt
Dear,

I am working on the optimization of a function defined on Markov chain transition matrices, each of which is in fact a concatenation of probability mass functions. We can view this as optimization over a (bounded part of an) affine manifold, which can in turn be transformed into optimization over a linear manifold.

The problem I am facing is that my function is not defined for matrices that are not Markov chain transition matrices, so I cannot evaluate the Euclidean gradient there. In other words, there are implicit constraints on the domain of my function, and the function is only differentiable within these constraints.

Is there a simple way to deal with these implicit constraints when computing the Riemannian gradient (which, I believe, this package derives from the Euclidean gradient)?

Best,
Chris

Nicolas Boumal

Apr 12, 2023, 7:25:27 AM
to Manopt
Hello Chris,

I would suggest you try out parameterizing your transition matrix via a product of spheres -- maybe not what you had in mind, but it has the merit of fitting into manopt fairly easily.

Specifically, let problem.M = obliquefactory(n, n);  This defines a product of n spheres in R^n.

Then, each matrix X on that manifold is a matrix of size n x n with the property that each column has unit 2-norm. As a result, if you consider P = X.^2 (that is, entry-wise square the matrix), then that matrix P has nonnegative entries and its columns sum to 1.

If you would rather have the rows sum to 1, you can call obliquefactory(n, n, true) instead (which transposes the matrices to fit the other common convention).

Now you can specify your cost function as problem.cost = @(X) ...; in that expression, simply write X.^2 (or X.*X) wherever P would appear.
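
For concreteness, here is a minimal sketch of how the pieces could fit together in Manopt. The cost 0.5*||X.^2 - A||_F^2 with a random target A is purely a placeholder to make the script self-contained; you would substitute your own function of the transition matrix and its gradient.

% Toy setup (placeholder cost): n-by-n column-stochastic matrices
% parameterized as P = X.^2, with X on the oblique manifold.
n = 5;
A = rand(n, n);
A = A ./ sum(A, 1);                        % placeholder target, column-stochastic

problem.M = obliquefactory(n, n);          % product of n unit spheres in R^n
problem.cost  = @(X) 0.5*norm(X.^2 - A, 'fro')^2;   % f(P) evaluated at P = X.^2
problem.egrad = @(X) 2*X.*(X.^2 - A);      % chain rule: 2*X .* (gradient of f at P)

X = trustregions(problem);                 % any Manopt solver would do
P = X.^2;                                  % recover the stochastic matrix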

My sense is that this might work quite well, but I'm curious to hear what you find.


>  finding the Riemannian gradient (which I believe is determined using the Euclidean gradient in this package)

Manopt allows you to define the Riemannian gradient directly, for example by setting problem.grad = ...;  but it's true that most users (myself included) define the Euclidean gradient (or, more generally, the "gradient in the embedding space") by setting problem.egrad = ...; and then count on Manopt to do the appropriate transformation.
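
To see concretely what that transformation amounts to here (continuing the hypothetical toy example above): on the oblique manifold, turning the Euclidean gradient into the Riemannian one is just an orthogonal projection onto the tangent space at X, and you can call it explicitly if you like.

M = obliquefactory(n, n);
egrad = 2*X.*(X.^2 - A);             % Euclidean gradient of the toy cost at X
rgrad = M.egrad2rgrad(X, egrad);     % what Manopt computes from problem.egrad
% For this manifold, egrad2rgrad coincides with M.proj(X, egrad): each column
% of egrad is projected orthogonally to the corresponding unit-norm column of X.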

Christian Franssen

Apr 12, 2023, 9:13:14 AM
to Manopt
Hi Nicolas,

That is a good suggestion. I have experimented with such a parameterization and with others, such as a softmax-like parameterization and a spherical-coordinates parameterization. These generally work well, except when the minimizer of my function (assumed unique, for simplicity) is a boundary point. Squeezing Euclidean space into the probability simplices through a parameterization inevitably creates vanishing gradients, which complicates the search for boundary points.
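
To make the issue concrete with a hypothetical one-dimensional slice (not my actual cost): for g(x) = f(x^2), the chain rule gives g'(x) = 2*x*f'(x^2), so the gradient in x vanishes wherever x = 0, even if f has a nonzero slope at the corresponding boundary point y = 0.

f_prime = @(y) ones(size(y));        % toy f(y) = y: nonzero slope everywhere, including y = 0
g_prime = @(x) 2*x .* f_prime(x.^2); % gradient of g(x) = f(x.^2)
g_prime(0)                           % returns 0: a spurious critical point at the boundary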

I will continue my search in the direction of primal methods.

Best,
Chris  

On Wednesday, April 12, 2023 at 1:25:27 PM UTC+2, Nicolas Boumal wrote:

Nicolas Boumal

Apr 13, 2023, 2:01:54 AM
to Manopt
Hello Christian,

It's true that the parameterization X -> X.*X can introduce spurious critical points (the vanishing gradient problem you mention) when X contains some zeros, which is often exactly where you want the solution to be.

Spurious means that X could be a critical point of the parameterized problem even though Y = X.*X is not a critical point of the original problem.

However, it is also true that second-order critical points always map to first-order critical points. That is: if X is a second-order critical point of the parameterized problem, then Y = X.*X is necessarily a first-order critical point of the original problem.
See for example this paper with my co-authors Levin and Kileel, https://arxiv.org/abs/2207.03512 (Example 4.4), for a general theory, and also a paper by other authors for the specific case at hand: https://arxiv.org/abs/2112.05273.

Since "reasonable" optimization algorithms "typically" do not converge to saddle points, this provides some reassurance that the spurious critical points are not a huge threat in practice.

Best,
Nicolas

Nicolas Boumal

Apr 13, 2023, 2:11:52 AM
to Manopt
(In v1 of our paper with Levin and Kileel on arXiv, the statement is more readily accessible from Corollary 4.13, where the item regarding "sphere -> simplex" refers to the map y -> y.^2, that is, entry-wise squaring. It is presented as part of a more general result in Example 4.4, which involves heavier notation, but it is "just" that.)