Find low rank matrix for multivariable linear regression

Luan Le

Oct 16, 2024, 5:19:26 PM
to Manopt
Thank you for this nice package! I want to find a low-rank matrix A that minimizes the Frobenius norm of Y - M*A, where Y and M are known. Both M and Y are large, and M is sparse. Here is my attempt on a simulated dataset: 

% Matrix dimensions and parameters
m = 1500; n = 700; p = 650;          % Matrix dimensions
rank_values = 0:100:600;                   % Range of rank values to test
rank_values(1) = 1;
lambda = 0;                        % Regularization parameter
k_folds = 5;                          % Number of cross-validation folds

% Generate sparse matrix M and a true low-rank A.
M = sprandn(m, p, 0.1);              % 10% non-zero elements in M.
M(M~=0) = 1;
A_true = randn(p, 200) * randn(200, n);
Y = M * A_true + 10 * randn(m, n);    % Noisy matrix multiplication

% Initialize arrays to store losses for each rank and fold.
train_loss = zeros(length(rank_values), k_folds);
test_loss = zeros(length(rank_values), k_folds);

% Cross-validation: generate random fold indices for k folds.
indices = crossvalind('Kfold', m, k_folds);  % crossvalind requires the Bioinformatics Toolbox

% Loop through each rank value.
for i = 1:length(rank_values)
    rank_val = rank_values(i);
    fprintf('Testing rank: %d\n', rank_val);
   
    % Define the manifold for matrix A with fixed rank.
    manifold = fixedrankembeddedfactory(p, n, rank_val);
   
    % Loop over k folds.
    for fold = 1:k_folds
        fprintf('Fold: %d\n', fold);
       
        % Split the data into training and testing based on the fold.
        test_idx = (indices == fold);
        train_idx = ~test_idx;
       
        % Create training and testing matrices.
        Y_train = Y(train_idx, :);
        Y_test = Y(test_idx, :);
        M_train = M(train_idx, :);
        M_test = M(test_idx, :);
       
        % Define the optimization problem for the current fold.
        problem.M = manifold;
        problem.cost = @(A) 0.5 * norm(Y_train - M_train * A.U * A.S * A.V', 'fro')^2;
                         
        problem.egrad = @(A) M_train' * (M_train * A.U * A.S * A.V' - Y_train);
        %https://www.manopt.org/lifts.html
        %problem.ehess = @(A, Adot) M_train'*M_train*Adot;

        % Set optimizer options (trust-region).
        options.verbosity = 1;
        options.maxiter = 100;
       
        % Initialize A randomly on the manifold.
        A_init = problem.M.rand();
       
        % Run the optimization.
        [A_opt, ~, ~] = trustregions(problem, A_init, options);

        % Reconstruct the optimized matrix A.
        A_matrix = A_opt.U * A_opt.S * A_opt.V';

        % Compute and store the training and testing losses for the fold.
        train_loss(i, fold) = 0.5 * norm(Y_train - M_train * A_matrix, 'fro')^2;
        test_loss(i, fold) = 0.5 * norm(Y_test - M_test * A_matrix, 'fro')^2;
    end
end

% Compute the average loss across all folds for each rank.
avg_train_loss = mean(train_loss, 2);
avg_test_loss = mean(test_loss, 2);

%% Plot training and testing losses vs. rank.
figure;
plot(rank_values, avg_train_loss, '-o', 'LineWidth', 2);
hold on;
plot(rank_values, avg_test_loss, '-s', 'LineWidth', 2);
xlabel('Rank');
ylabel('Loss');
legend('Training Loss', 'Testing Loss');
title('Training and Testing Loss vs. Rank');
grid on;
hold off;

There are three problems I face:
1. When I try to include the Hessian, I get an error saying a dot product cannot be performed on a structure. 
2. I want to apply lasso regularization to the matrix A, but I don't know how to extend the vector example given at https://www.manopt.org/lifts.html to a matrix. 
3. How do I run the 5 folds in parallel for a given rank? 

Thank you very much! 


Nicolas Boumal

Oct 17, 2024, 4:58:06 AM
to Manopt
Hello,

You could take inspiration from the three code versions here:

The tool "euclideanlargefactory" is new to Manopt 8.0. It won't do exactly what you need here, I believe, because your Y matrix seems to be a general large matrix (dense, not low-rank). So you would likely want to ignore that part.

Overall, this means the code won't be very efficient on large matrices, but this is more due to the fact that the gradient and Hessian of f do not admit parsimonious representations.

See code below.

About your specific questions:
  1. The issue is that points, ambient vectors in R^(mxn), and tangent vectors to the manifold are not represented as a single matrix, but rather as structures that contain matrices; the documentation explains how those matrices correspond to the underlying matrix of size mxn. This is done for efficiency when m and n are large.
  2. Lasso regularization (1-norm) could be extended to matrices. But would you not prefer nuclear norm regularization? (That is already implemented.)
  3. For parallel computing, you would likely want to use Matlab's Parallel Computing Toolbox, which notably allows you to use "parfor" instead of "for"; see the sketch after this list.
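
As a minimal sketch of point 3 (assuming the Parallel Computing Toolbox is available, and reusing the variable names from your script), it suffices to make each fold's iteration self-contained and swap "for" with "parfor":

% Sketch: run the k folds in parallel for one fixed rank.
fold_test_loss = zeros(k_folds, 1);
parfor fold = 1:k_folds
    test_idx  = (indices == fold);
    train_idx = ~test_idx;
    Y_tr = Y(train_idx, :);  M_tr = M(train_idx, :);
    Y_te = Y(test_idx, :);   M_te = M(test_idx, :);
    prob = struct('M', manifold);   % each worker builds its own problem structure
    prob.cost  = @(A) 0.5 * norm(Y_tr - M_tr * (A.U * A.S * A.V'), 'fro')^2;
    prob.egrad = @(A) M_tr' * (M_tr * (A.U * A.S * A.V') - Y_tr);
    A_opt = trustregions(prob, manifold.rand(), options);
    A_mat = A_opt.U * A_opt.S * A_opt.V';
    fold_test_loss(fold) = 0.5 * norm(Y_te - M_te * A_mat, 'fro')^2;
end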

Best,
Nicolas


clear; clf; clc;

% Matrix dimensions and parameters
m = 1500; n = 700; p = 650; % Matrix dimensions
r = 100; % Optimization rank

% Generate sparse matrix M and true low-rank A.
M = sprandn(m, p, 0.1); % 10% non-zero elements in M.
M(M~=0) = 1;
rank_true = 200;
A_true = randn(p, rank_true) * randn(rank_true, n);
Y = M * A_true + 10 * randn(m, n); % Noisy matrix multiplication

% This is a type of manifold lift that lets us optimize over bounded-rank
% matrices, as opposed to optimizing over matrices of fixed rank.
desing = desingularizationfactory(p, n, r);
problem.M = desing;
Mtimes = @(A) M*desing.triplet2matrix(A);
problem.cost = @(A) .5*norm(Mtimes(A) - Y, 'fro')^2;
problem.egrad = @(A) M.'*(Mtimes(A) - Y);
problem.ehess = @(A, Adot) MtM(M, desing.tangent2ambient(A, Adot).X);

figure(1);
checkgradient(problem);
figure(2);
checkhessian(problem);
X = trustregions(problem);

% Apply M.'*M to an ambient vector given in factored form LR.L*LR.R'.
% (Local functions must appear at the end of a script file.)
function eh = MtM(M, LR)
    eh.L = M.'*(M*(LR.L));
    eh.R = LR.R;
end

Luan Le

Feb 28, 2025, 12:46:17 PM
to Manopt
Hello Nicolas,

Thank you for your help! I've managed to figure out how to use Burer-Monteiro for general bounded rank, for the problem of finding a low-rank, sparse matrix X that minimizes:

0.5*||B*X - A||_F^2 + 0.5*||C*X - D||_F^2

where A, B, C, D are sparse matrices. 

I have some questions: 
  1. I use B * Rkn.to_matrix(X), which I think is not optimal for speed. It would probably be better to compute (B*X.L)*X.R', but euclideanlargefactory does not seem to offer a function for that. My question is whether my syntax will slow things down significantly for much larger matrices (e.g., A being 200,000 x 800 and X being 800 x 1000). 
  2. The optimization takes a long time if I increase the maximal allowed rank r_guess, e.g., from 5 to 60. I do this because I want to plot how the test error varies with rank. In this case, can I use the Burer-Monteiro method, or do I need to switch to the fixed-rank method and test each rank individually? 
  3. Running with regularization takes longer than running without, and eventually crashes my laptop (Apple M2, 16GB). Can you suggest how to use regularization for large matrices?
  4. I'm really motivated to learn optimization on manifolds from your software and book. However, my limited knowledge of linear algebra and math in general still hinders my attempts. I only know basic linear algebra (e.g., PCA, SVD). Can you advise me on what background I need in order to study your book? 
Thank you very much! 

Regards,

Luan

The code is attached for your reference.

% Problem Setup
m = 1000;
n = 1500;
r = 5;
r_guess = 5;
k = 800; % Row dimension of X (X is k-by-n)
% Generate low-rank sparse matrix X that needs to be found
L_X = randn(k, r);
L_X(abs(L_X) <= 1) = 0;
R_X = randn(n, r);
R_X(abs(R_X) <= 1) = 0;
X_true = sparse(L_X * R_X'); % True low-rank matrix
% Generate random sparse matrices B and C
B = randn(500, k);
B(abs(B) <= 1) = 0;
B = sparse(B);
C = randn(700, k);
C(abs(C) <= 1) = 0;
C = sparse(C);
% Generate A and D with noise
noise_level = 0.1; % Adjust noise level as needed
A = B * X_true + noise_level * randn(500, n);
A = sparse(A);
D = C * X_true + noise_level * randn(700, n);
D = sparse(D);

% Euclidean Large Factory for X (k x n)
Rkn = euclideanlargefactory(k, n);
% Define the downstairs problem
dwnstrs.M = Rkn;
% dwnstrs.cost = @(X) 0.5 * Rkn.dist(B * Rkn.to_matrix(X), A)^2 + 0.5 * Rkn.dist(C * Rkn.to_matrix(X), D)^2;
dwnstrs.cost = @(X) 0.5*norm(B * Rkn.to_matrix(X) - A, 'fro')^2 + 0.5*norm(C * Rkn.to_matrix(X) - D, 'fro')^2;
dwnstrs.grad = @(X) B' * (B * Rkn.to_matrix(X) - A) + C' * (C * Rkn.to_matrix(X) - D);
dwnstrs.hess = @(X, Xdot) B' * B * Rkn.to_matrix(Xdot) + C' * C * Rkn.to_matrix(Xdot);
% Define the lifting (Burer-Monteiro)
lift = burermonteiroLRlift(k, n, r_guess);
% upstairs = manoptlift(dwnstrs, lift);
% For adding a regularizer:
lambda = 0.1;
upstairs = manoptlift(dwnstrs, lift, [], lambda);
% Initialization close to zero
LR0.L = randn(k, r_guess) / sqrt(k*r_guess) / 1000;
LR0.R = randn(n, r_guess) / sqrt(n*r_guess) / 1000;
% Solve using trust-regions
LR = trustregions(upstairs, LR0);
% Reconstruct X_predict
X_predict = LR.L * LR.R';


% Plot predicted vs. true entries.
figure;
plot(full(X_true(:)), X_predict(:), 'k.', 'DisplayName', 'Entries of X');
hold on;
plot([-10, 10], [-10, 10], 'r-', 'DisplayName', 'y = x'); % Reference line
hold off;
xlabel('X\_true');   % Escape underscores so they do not render as subscripts
ylabel('X\_predict');
title('Scatter Plot: X\_true vs X\_predict');
legend('Location', 'best');

% Print singular values for verification
[Ql, Rl] = qr(LR.L, 0);
[Qr, Rr] = qr(LR.R, 0);
singular_values_LR = svd(Rl*Rr')';
fprintf('Singular Values (LR): %s\n', mat2str(singular_values_LR));

Nicolas Boumal

Mar 3, 2025, 9:30:27 AM
to Manopt
Hello,

About question 1, please check out the times() functions in euclideanlargefactory:

They are designed to compute the product of two matrices without forming them explicitly (unlike the function to_matrix(), which builds the full matrix).
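
To give the intuition with a minimal sketch (this is just the idea, not the exact times() syntax; see the factory's documentation for that), using the LR.L and LR.R factors from your script:

% With X stored in factored form X = LR.L * LR.R', a product such as B*X can be
% computed factor by factor, so the full k-by-n matrix X is never materialized:
BL = B * LR.L;     % sparse times tall-skinny: cheap when the rank is small
BX = BL * LR.R';   % small times wide: the only large dense product
R1 = BX - A;       % residual, as it appears in the cost and the gradient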

About question 4, I do not have a specific recommendation, but I would suggest you look for MOOCs or other open courses about numerical analysis, numerical linear algebra and continuous optimization. These are good topics to get introduced to before going into Riemannian optimization.

Best,
Nicolas