Hi all,
I have a factor model where I assume the data Y are generated from latent variables X according to:
Y = CX + s
where s is Gaussian noise. Given Y, I have implemented an EM algorithm to optimize over X, C and s. In the case of C, I want the optimization restricted to matrices with orthonormal columns (the Stiefel manifold), hence manopt.
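For concreteness, here is a minimal sketch of how data can be simulated from this model (the sizes and the noise scale below are illustrative, not the ones I actually use):

n = 50; m = 5; N = 1000;        % observed dim, latent dim, number of samples
[C, ~] = qr(randn(n, m), 0);    % C with orthonormal columns (a Stiefel point)
X = randn(m, N);                % latent variables
s = 0.1*randn(n, N);            % Gaussian noise
Y = C*X + s;                    % observed data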
I find that:
1.- When I run manopt on simulated data, using the SAME number of latent dimensions as in the simulation, manopt does its job perfectly and the EM algorithm converges to the correct minimum.
2.- When I run manopt on simulated data, but imposing a DIFFERENT number of latent dimensions (a situation more akin to what happens with real data), it has some convergence issues. Specifically, there are cases in which the gradient norm stabilizes at a plateau for many outer iterations. Eventually, though, it does converge. I have been able to considerably reduce this problem by tweaking Delta_bar and Delta0.
3.- When I run manopt on real data, the gradient norm stays on the plateau forever. Changes to Delta_bar and/or Delta0 seem to be irrelevant.
So my question is: is there something else I could do to achieve convergence? Or is there a reason why convergence would be impossible for this problem?
best, Daniel
function [tempW, tempnoise, Tcost] = Mstep0(Y, Wold, X, CovX, Wabs, epsold)
% M step for the EM algorithm for the case with no shared dimensions
%
% Inputs:
% Y -> nxN matrix of data measurements
% Wold -> OLD orthogonal W matrix (component of C), used to warm-start the solver
% X -> mxN matrix of latent variable measurements
% CovX -> mxm matrix of total covariance of latent variables
% Wabs -> mxm matrix, the other component of C (fixed during this M step)
% epsold -> OLD noise vector, used to warm-start the solver
% Outputs (results from the optimization):
% tempW -> orthogonal matrix that is a component of C
% tempnoise -> noise vector
% Tcost -> cost value at the returned point
%% Define useful variables
[num_total, N] = size(Y);
[twice_num_lat, ~] = size(X);
num_neu = num_total;
Yn = Y(1:num_neu,:);
L = eye(twice_num_lat);
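% L is the identity in this no-shared-dimensions case; presumably it
% differs in the shared-dimensions variant.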
%% Define manifold, cost and gradient (add Hessian?)
tuple.Wtemp = stiefelfactory(num_neu, twice_num_lat);
tuple.epsntemp = euclideanfactory(num_neu, 1);
manifold = productmanifold(tuple);
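% Note: the cost below contains log(epsntemp), so epsntemp must stay
% positive; euclideanfactory does not enforce that constraint.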
%% Compute-only-once variables (these are combinations of variables that appear in the cost and that are constant throughout each call to Mstep0)
YnToWabs = Yn*X'*L'*Wabs';
WabsToWabs = Wabs*L*CovX*L'*Wabs';
YnYn = Yn*Yn';
function store = prepare(T, store)
% Cache the two quantities shared by the cost and the gradient.
if ~all(isfield(store, {'eYn', 'eW'}))
store.eYn = diag(T.epsntemp)*YnToWabs;
store.eW = diag(T.epsntemp)*T.Wtemp*WabsToWabs;
end
end
%% COST FUNCTION
function [c, store] = cost(T, store)
store = prepare(T, store);
c = -trace(store.eYn*T.Wtemp') + trace(store.eW*T.Wtemp')/2 + ...
trace(YnYn*diag(T.epsntemp))/2 - N*sum(log(T.epsntemp))/2;
end
% GRADIENT (returns the Riemannian gradient)
function [g, store] = grad(T, store)
store = prepare(T, store);
egrad.Wtemp = -store.eYn + store.eW;
egrad.epsntemp = -diag(YnToWabs*T.Wtemp') + diag(T.Wtemp*WabsToWabs*T.Wtemp')/2 + ...
diag(YnYn)/2 - (N/2)*ones(num_neu, 1)./T.epsntemp;
g = manifold.egrad2rgrad(T, egrad);
end
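% No Hessian is supplied, so trustregions approximates it with finite
% differences of the gradient; supplying the exact Hessian (the "add
% Hessian?" question above) might help with the plateaus.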
%% Problem definition
problem.M = manifold;
problem.cost = @cost;
% grad already applies egrad2rgrad, so register it as the Riemannian
% gradient (problem.grad), not as the Euclidean one (problem.egrad).
problem.grad = @grad;
% checkgradient(problem); % numerical check; enable only while debugging
%% Solver
Told.Wtemp = Wold;
Told.epsntemp = epsold(1:num_neu);
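% Delta_bar is the maximum trust-region radius and Delta0 the initial
% radius; these are the two parameters I have been tweaking (see above).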
options.tolgradnorm = 1e-07;
options.maxiter = 1000;
options.maxinner = 100;
options.Delta_bar = 200;
options.Delta0 = 50;
[T, Tcost] = trustregions(problem, Told, options);
%% M-step assignments
tempW = T.Wtemp;
tempnoise = T.epsntemp;
end
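For reference, a minimal sketch of how Mstep0 could be called; all sizes and inputs below are illustrative placeholders (in the real EM loop, EX and CovX would come from the E step):

n = 50; m = 5; N = 1000;
Ydat = randn(n, N);                 % data, or simulated as sketched above
[W0, ~] = qr(randn(n, m), 0);       % initial point on the Stiefel manifold
Wabs0 = eye(m);                     % placeholder for the fixed component of C
eps0 = ones(n, 1);                  % initial noise vector
EX = randn(m, N);                   % placeholder posterior means of X
CovX0 = EX*EX' + N*eye(m);          % placeholder total covariance of X
[W1, noise1, c1] = Mstep0(Ydat, W0, EX, CovX0, Wabs0, eps0);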