Why is the number of calls to the cost and the gradient different during optimization?


grandowife

May 17, 2019, 10:53:12 PM
to Manopt
Hi there, 

Recently, I have been working on an optimization problem with Manopt and MADMM [1], which can roughly be described as follows:

1. profile on;
2. for i = 1:maxOuterIter          ---> MADMM core idea
       Unew = optU(U, E);
       Enew = optE(U, E);
   end

3. function Unew = optU(U, E)      ---> basic Manopt solve
       manifold = stiefelfactory(d, p);
       problem.M = manifold;
       problem.cost = @(U) mycost4U(U);
       egrad = @(U) mygrad4U(U);
       problem.grad = @(Y) manifold.egrad2rgrad(Y, egrad(Y));
       Unew = conjugategradient(problem, U, opts);
   end
4. profile report;

Note that optE is similar to optU, so it is not shown here.
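
For reference, here is a minimal self-contained sketch of the same Manopt pattern as optU, but with a toy cost (the dominant p-dimensional subspace of a symmetric matrix A) standing in for mycost4U / mygrad4U; it only illustrates the standard Manopt calls used above, not the actual subproblem:

% Self-contained sketch of the optU pattern, with a toy cost in place of
% mycost4U / mygrad4U; only the Manopt calls matter here.
d = 100; p = 5;                              % small toy sizes
A = randn(d); A = 0.5 * (A + A');            % toy symmetric data matrix

manifold = stiefelfactory(d, p);
problem.M = manifold;
problem.cost = @(U) -trace(U' * A * U);      % toy cost, not mycost4U
egrad        = @(U) -2 * A * U;              % its Euclidean gradient
problem.grad = @(U) manifold.egrad2rgrad(U, egrad(U));

opts.maxiter = 50;
U0   = manifold.rand();                      % random point on the Stiefel manifold
Unew = conjugategradient(problem, U0, opts);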

However, I found that the above procedure is extremely time-consuming. While investigating the cause of the slowness, I observed the following:

1. When using the MATLAB profiler to measure performance, the report shows:

[profiler screenshot: the 'calls' count differs between mycost4U and mygrad4U]
  
Could you explain why the 'calls' count is different for the cost (function mycost4U) and the gradient (function mygrad4U)?

2. Do you have any suggestions for speeding up the MADMM and basic Manopt procedure?
The detailed code for computing the cost and gradient is:

function obj = mycost4U(U)
    Ut = U';
    Q  = max(0, Ut * D + thresh - E .* E);   % hinge term, shared with the gradient
    M1 = Xt - U * Z;
    M2 = Ut * Xt - Z + Y1 / rho;
    M3 = H - Ut + Y2 / rho;

    obj = sum(sum(M1.^2, 2)) ...
        + lambda * sum(sum(Q)) ...
        + rho * sum(sum(M2.^2, 2)) ...
        + rho * sum(sum(M3.^2, 2));
    obj = 0.5 * obj;
end

function gU = mygrad4U(U)
    Ut = U';
    Q  = max(0, Ut * D + thresh - E .* E);   % same hinge term as in mycost4U

    gU = Xt * (Y1' + rho * (oneData * U) - (1 + rho) * Z') ...
       + rho * (U - H') - Y2' ...
       + 0.5 * lambda * (D * sign(Q)');      % (sub)gradient of the hinge term
end

The sizes of the variables are: U \in Stiefel(1024, 5); D \in Euclidean(1024, 2016); E \in Euclidean(5, 2016); thresh is a constant; Xt \in Euclidean(1024, 64); Z \in Euclidean(5, 2016); Y1 \in Euclidean(5, 64); Y2 \in Euclidean(5, 1024); H \in Euclidean(5, 1024).

BM

May 17, 2019, 11:31:40 PM
to Manopt
Hello,

On the conjugate gradient solver using more cost evaluations than gradient evaluations: this is due to the backtracking line-search procedure, and it is typical of many algorithms.
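
To illustrate the mechanism, here is a toy plain-MATLAB sketch (not Manopt's actual linesearch code): the gradient is needed once per iteration to form the search direction, but the cost may be evaluated several times while the step size is shrunk until a sufficient-decrease condition holds.

% Toy illustration, not Manopt internals: one gradient call per iteration,
% but several cost calls inside the backtracking loop.
f     = @(x) 50 * sum(x.^2);        % toy cost (deliberately steep)
gradf = @(x) 100 * x;               % its gradient

x = [3; -2];
for iter = 1:10
    g  = gradf(x);                  % gradient: evaluated once per iteration
    f0 = f(x);                      % first cost call of this iteration
    t  = 1;                         % initial step size
    % Armijo backtracking: each test of the condition is one more cost call.
    while f(x - t * g) > f0 - 1e-4 * t * (g' * g)
        t = 0.5 * t;
    end
    x = x - t * g;
end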

On accelerating the overall procedure: you may want to set opts.maxiter to a small value, since you may not need to solve the optU and optE subproblems to high accuracy; a few iterations should suffice.
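
Concretely, something along these lines (opts.maxiter and opts.verbosity are standard Manopt options; the particular values are only an example):

% Limit the inner solver, as suggested above; the values are illustrative.
opts.maxiter   = 2;   % only a few CG iterations per MADMM subproblem
opts.verbosity = 0;   % silence the inner solver's per-iteration output
Unew = conjugategradient(problem, U, opts);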

Regards
Bamdev

grandowife

May 18, 2019, 1:11:24 AM
to Manopt
Hi Bamdev,

I have tried your suggestion, and the loss decreases over the iterations as shown below:

[plot: loss versus iteration — it drops sharply at first, then slowly increases]


It is a little strange that the loss first decreases significantly and then slowly increases.

This happens no matter what value the hyper-parameter opts.maxiter is set to. (I set it to 6 in my experiments and have just now tried 2 as you suggested.)

Given this behavior, do you have any suggestions? Is something wrong with my code?


Best 

qiuying shi




On Saturday, May 18, 2019 at 11:31:40 AM UTC+8, BM wrote: