function Y = riemannian_conjugate_gradient(A, B, Y0, max_iter, tol)
    % Riemannian Conjugate Gradient Method for the Orthogonal Procrustes Problem
    %   min_Y 0.5*||A*Y - B||_F^2  subject to  Y'*Y = I  (Y on the Stiefel manifold)
    % Input:
    %   A: m x n matrix
    %   B: m x p matrix
    %   Y0: n x p initial guess with orthonormal columns (Y0'*Y0 = I)
    %   max_iter: maximum number of iterations
    %   tol: tolerance for the stopping criterion (Riemannian gradient norm)
    % Output:
    %   Y: n x p matrix with orthonormal columns, a (local) minimizer of 0.5*||A*Y - B||_F^2
    % Initialize
    Y = Y0;
    G = A' * (A * Y - B); % Euclidean gradient
    % Riemannian gradient: project G onto the tangent space of the Stiefel
    % manifold at Y, i.e. grad_F = G - Y*sym(Y'*G) with sym(M) = (M + M')/2
    grad_F = G - Y * ((Y' * G + G' * Y) / 2);
    P = -grad_F; % initial search direction (steepest descent)
    iter = 0;
    while iter < max_iter
        % Save the previous Riemannian gradient, then recompute at the current Y
        grad_F_prev = grad_F;
        G = A' * (A * Y - B); % Euclidean gradient
        % Project the Euclidean gradient onto the tangent space at Y
        grad_F = G - Y * ((Y' * G + G' * Y) / 2); % Riemannian gradient
        % Compute objective value and gradient norm
        F = 0.5 * norm(A * Y - B, 'fro')^2;
        grad_norm = norm(grad_F, 'fro');
        % Print iteration details for debugging
        fprintf('Iter %d: F = %.6f, ||grad_F|| = %.6f\n', iter, F, grad_norm);
        % Check stopping criterion (Riemannian gradient norm)
        if grad_norm < tol
            fprintf('Converged after %d iterations.\n', iter);
            break;
        end
        % Update the search direction (Riemannian conjugate gradient)
        if iter == 0
            P = -grad_F; % first iteration: steepest descent
        else
            % Riemannian Polak-Ribiere parameter: transport the previous
            % gradient to the current tangent space (by projection) and
            % normalize by the previous gradient norm squared
            grad_prev_t = grad_F_prev - Y * ((Y' * grad_F_prev + grad_F_prev' * Y) / 2);
            beta = trace(grad_F' * (grad_F - grad_prev_t)) / trace(grad_F_prev' * grad_F_prev);
            beta = max(0, beta); % PR+ rule: keep beta non-negative (restarts as steepest descent)
            P = -grad_F + beta * P; % update search direction
        end
        % Project the combined direction onto the tangent space at Y; this also
        % serves as a (projection-based) vector transport of the old direction
        P = P - Y * ((Y' * P + P' * Y) / 2);
        % Safeguard: fall back to steepest descent if P is not a descent direction
        if trace(grad_F' * P) >= 0
            P = -grad_F;
        end
        % Backtracking line search for the step size
        alpha = line_search(Y, P, A, B, grad_F);
        % Floor the step size (backtracking from alpha = 1 already ensures alpha <= 1)
        alpha = max(alpha, 1e-10);
        % Update Y using the QR-based retraction
        Y = retract(Y, P, alpha);
        iter = iter + 1;
    end
    if iter == max_iter
        fprintf('Reached maximum iterations (%d).\n', max_iter);
    end
    % Verify that the Riemannian gradient is close to zero at the solution
    G = A' * (A * Y - B);
    grad_F_final = G - Y * ((Y' * G + G' * Y) / 2);
    grad_norm_final = norm(grad_F_final, 'fro');
    fprintf('Final Riemannian gradient norm: %.6f\n', grad_norm_final);
end
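
% Usage sketch (illustrative, not part of the original solver): build a
% consistent problem with a known Stiefel minimizer and check that the
% returned Y stays feasible. Being a local function, call it from within
% this file or copy it out to run standalone; all dimensions are arbitrary.
function demo_riemannian_cg()
    rng(0);
    m = 20; n = 8; p = 3;
    A = randn(m, n);
    [Y_star, ~] = qr(randn(n, p), 0);  % random point on the Stiefel manifold
    B = A * Y_star;                    % right-hand side with a known minimizer
    [Y0, ~] = qr(randn(n, p), 0);      % feasible initial guess
    Y = riemannian_conjugate_gradient(A, B, Y0, 200, 1e-8);
    fprintf('Feasibility ||Y''*Y - I||_F = %.2e, residual ||A*Y - B||_F = %.2e\n', ...
        norm(Y' * Y - eye(p), 'fro'), norm(A * Y - B, 'fro'));
end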
function alpha = line_search(Y, P, A, B, grad_F)
    % Backtracking line search for the step size alpha (Armijo condition only)
    alpha = 1;   % initial step size
    rho = 0.5;   % backtracking reduction factor
    c = 1e-4;    % Armijo constant
    max_line_search_iters = 20;
    F = @(Z) 0.5 * norm(A * Z - B, 'fro')^2; % objective function
    F0 = F(Y); % current objective value, computed once
    directional_derivative = trace(grad_F' * P); % negative for a descent direction
    % Armijo backtracking
    for i = 1:max_line_search_iters
        Y_new = retract(Y, P, alpha);
        if F(Y_new) <= F0 + c * alpha * directional_derivative
            break; % Armijo condition satisfied
        end
        alpha = rho * alpha; % reduce the step size
    end
end
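
% Optional sanity check (a sketch, not called by the solver): verify that the
% analytic directional derivative trace(grad_F'*P) matches a central finite
% difference of the objective along the retracted curve t -> retract(Y, P, t).
% Assumes P is a tangent vector at Y; the name and step size are illustrative.
function err = check_directional_derivative(A, B, Y, P)
    F = @(Z) 0.5 * norm(A * Z - B, 'fro')^2;
    G = A' * (A * Y - B);
    grad_F = G - Y * ((Y' * G + G' * Y) / 2); % same projection as the solver
    analytic = trace(grad_F' * P);            % d/dt F(retract(Y, P, t)) at t = 0
    t = 1e-6;
    numeric = (F(retract(Y, P, t)) - F(retract(Y, P, -t))) / (2 * t);
    err = abs(analytic - numeric) / max(1, abs(analytic)); % relative mismatch
end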
function Y_new = retract(Y, P, alpha)
    % Retraction onto the Stiefel manifold via thin QR decomposition
    [Q, R] = qr(Y + alpha * P, 0);
    % Resolve the sign ambiguity of QR so that diag(R) > 0; this makes the
    % retraction well defined and guarantees retract(Y, P, 0) == Y
    Y_new = Q * diag(sign(sign(diag(R)) + 0.5));
end
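
% Alternative retraction (a sketch; the solver above uses the QR retraction).
% The polar retraction maps Y + alpha*P to its nearest point on the Stiefel
% manifold in Frobenius norm, via the orthogonal factor of a thin SVD.
function Y_new = retract_polar(Y, P, alpha)
    [U, ~, V] = svd(Y + alpha * P, 'econ');
    Y_new = U * V';
end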