log example and required toolboxes


jjg...@gmail.com

Aug 23, 2013, 9:36:27 PM8/23/13
to manopt...@googlegroups.com
Hello,

First, thank you for all of the work that you've devoted to this project. It's an incredible resource!

I am working on the following problem and I would really appreciate some guidance.

I have the following:

- Tx1 vector 'Z'
- TxM matrix 'Y'
- NxT matrix 'X'

I am trying to find the MxN matrix 'A' of rank 'k' that minimizes the following function:

function f = log_optimal(A)
% Average negative log cost; Z, Y, X and T are the data described above.
    f = 0;
    for t = 1:T
        f = f - log( Z(t) + Y(t,:)*A*X(:,t) );
    end
    f = f/T;
end

I'm comfortable computing the Euclidean gradient, but if it's not too much trouble, I could use a quick refresher on the Euclidean Hessian.

Also, I would really appreciate some advice on which of the fixed-rank factory manifolds in the toolbox is best suited to my purpose. I've been trying to work from the example for fixedrank_2factors (http://www.manopt.org/manifold_documentation_fixedrank_2factors.html), but I'm only beginning to learn about manifolds and at this point I don't understand them well enough to know whether this is the best fit for my problem.

Finally, could you provide a list of the Matlab toolboxes required to run your toolbox? For example, I discovered today that I need the Control System Toolbox installed to have access to the 'lyap' function. Are there any other required toolboxes?

Thanks again for sharing all of your hard work!

Jeremy Graveline


Nicolas Boumal

Aug 24, 2013, 10:10:36 AM8/24/13
to manopt...@googlegroups.com
Hello Jeremy,

Thank you for your message, this is interesting feedback!

About lyap: we didn't realize it required a toolbox; it's good you pointed it out. Here is a replacement function you can use for small problems (say, ranks not larger than 30) if you do not have lyap available:

function X = my_lyap(A, B, C)
% Replacement for lyap if it is not available.
% A, B and C must be square and have the same size.
% This is excruciatingly slow compared to what it could be.

    mat = @(x) reshape(x, size(A));
    vec = @(X) X(:);
    
    % % This exploits the structure better, but doesn't work very well and
    % % is actually slow.
    % tol = 1e-8;  maxit = 1000;
    % [x, ~] = gmres(@(x) vec(A*mat(x) + mat(x)*B), -vec(C), size(A, 1), tol, maxit);

    % This will break down for sizes larger than, say, 30, but is otherwise
    % pretty accurate (it's just slow though).
    I = eye(size(A));
    x = -(kron(I, A) + kron(B', I))\vec(C);

    X = mat(x);
    
end
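
For example, here is a quick sanity check of the replacement (random 5-by-5 matrices, purely for illustration), using the fact that lyap(A, B, C) solves the Sylvester equation A*X + X*B + C = 0:

    A = randn(5); B = randn(5); C = randn(5);
    X = my_lyap(A, B, C);
    disp(norm(A*X + X*B + C, 'fro'));   % should be close to machine precision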


(In what follows, I write Z, T, M and N as lower-case letters z, t, m and n, out of habit for scalars and vectors; my capital letters denote matrices.)

About your optimization problem: could you tell us a little bit more about the properties of the data X, Y, z and the sizes t, m, n, k?

I wrote a bit of Matlab code for your cost and its Euclidean gradient and generated random data, and I find that the code crashes (lyap fails):

 
function manopt_jeremy_graveline()

    % Arbitrary sizes
    m = 10;
    n = 12;
    t = 6;
    k = 3;

    % Random data
    z = 1+rand(t, 1);
    X = randn(n, t);
    Y = randn(t, m);
    
    problem.M = fixedrankfactory_2factors(m, n, k);
    
    problem.cost = @cost;
    function f = cost(A)
        % A is a structure representing a rank-k matrix L*R'
        L = A.L;
        R = A.R;
        f = -mean(log(z + diag((Y*L)*(R'*X))));
    end

    problem.egrad = @egrad;
    function g = egrad(A)
        L = A.L;
        R = A.R;
        % We differentiate with respect to L and R separately
        w = z + diag((Y*L)*(R'*X));
        RtX = R'*X;
        YL = Y*L;
        g = problem.M.zerovec(A);
        for i = 1 : t
            g.L = g.L - (Y(i, :)'*RtX(:, i)')/w(i);
            g.R = g.R - (X(:, i)*YL(i, :))/w(i);
        end
        g.L = g.L / t;
        g.R = g.R / t;
    end

    checkgradient(problem); pause;

    options.tolgradnorm = 1e-12;
    A = conjugategradient(problem, [], options);

end
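
As an aside, the gradient loop above can be vectorized. The following sketch should be equivalent; it would sit inside the same script, next to egrad, so that it sees the same z, Y, X and t:

    function g = egrad_vectorized(A)
        L = A.L;
        R = A.R;
        w = z + diag((Y*L)*(R'*X));                   % t-by-1 vector
        % g.L = -(1/t) * Y' * diag(1./w) * (X'*R), and similarly for g.R:
        g.L = -(Y' * bsxfun(@times, X'*R, 1./w)) / t;
        g.R = -(X  * bsxfun(@times, Y*L,  1./w)) / t;
    end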

What happens, as far as I can tell (this is based on just a little bit of playing with the code, so correct me if I'm wrong), is that the cost function is not bounded below: it seems to go down forever (although admittedly very slowly). The matrix A keeps growing in norm as the algorithm iterates, and lyap eventually can't keep up.

To test this idea, I replaced the manifold by this simple product of Euclidean spaces:

    problem.M = productmanifold(struct( 'L', euclideanfactory(m, k), ...
                                        'R', euclideanfactory(n, k)  ));

If you use this (instead of fixedrankfactory_2factors), there is no fancy geometry anymore (just two matrices L and R free to be whatever they want), and no lyap involved either. Run this, and you will see that the matrix A grows large (A.L and A.R are both on the order of 10^11 in my case), suggesting that nothing in the cost function prevents them from growing.

Perhaps there is something missing in the problem description, or the data should have some structure which will prevent this?

By the way, it is instructive to plot the cost function along a random direction to see what it looks like:

    A = problem.M.rand();
    V = problem.M.randvec(A);
    plotprofile(problem, A, V, linspace(-50, 50, 301));

You'll see that it has a funny behavior close to the random point A (around 0), and then decreases, apparently without ever stopping.

My suggestion, then, is that before we investigate the Hessian and which geometry is best suited, you could look into the problem definition again and see whether I am wrong, or whether you can change the definition somewhat so that the cost is bounded below (and the problem hence well-defined).

About toolboxes: I thought we didn't need any. Now I know about lyap, but I don't know whether we missed a dependency elsewhere. I would love to hear about it if you run into trouble.
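
In the meantime, if you want to avoid the hard dependency in your own copy, one option (a sketch, using the my_lyap replacement above) is to dispatch on availability:

    if exist('lyap', 'file')   % Control System Toolbox available
        X = lyap(A, B, C);
    else
        X = my_lyap(A, B, C);
    end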

Thanks again for contacting us, it's a pleasure to see how people are using Manopt!

Cheers,
Nicolas

jjg...@gmail.com

Aug 27, 2013, 1:08:14 PM8/27/13
to manopt...@googlegroups.com
Hi Nicolas,

Thanks very much for your reply.

I experimented a bit more, and the problem seems to be that the argument to 'log' can go negative during the optimization, so that the cost function can return a complex value. I added some code to the cost function to essentially guard the argument to log and ensure that it remains strictly positive. With that correction, the algorithm seems to converge (at least for most random data that I generate). I've pasted some code below (sorry, but I couldn't figure out how to give the code a different font to make it stand out).

I have some questions that I'd really appreciate your input on:

1) As I mentioned, I'm just learning about manifolds, and I don't have a good sense of the role that different geometries play. Is there a particular geometry that's best suited to the problem I'm trying to solve, or should I just experiment?

2) Is there a more practical/elegant way to keep the cost function real-valued (i.e., the argument to log strictly positive)?

3) Can you help me understand the specific first-order conditions that the optimization routine tries to set to 0?

Here's the code:

function test_manopt()
%% Example to test Manopt capabilities

    t = 1000; m = 15; n = 8; k = 2;

    X = [ones(1,t); randn(n-1,t)];

    dt = 1/52; sig = 0.25; r = 0.03;

    %% Matrix of coefficients on "predictor"
    A.R = randn(n,k); A.L = randn(m,k);
    A.L = 0.05*A.L./repmat(mean(((A.L*A.R')*X)')',1,k);

    %% m x t matrix of excess returns
    Y = exp( (r + (A.L*A.R')*X + 0.5*sig^2).*dt + sqrt(dt).*sig*randn(m,t) );
    r = exp(repmat(r.*dt,1,t));
    Y = Y - ones(m,1)*r;

    %% Finished generating random data

    %% Create the problem structure
    problem.M = fixedrankfactory_2factors(m,n,k);

    %% Define the problem cost function
    problem.cost = @cost;
    function f = cost(Mat)
        w = r + dot(Mat.L'*Y, Mat.R'*X);
        %% Make sure the argument to log is strictly positive
        if all(w > 0)
            f = -mean(log(w));
        else
            f = 10;
        end
    end

    %% Define the gradient of the cost function
    problem.grad = @(Mat) problem.M.egrad2rgrad(Mat, egrad(Mat));
    function g = egrad(Mat)
        LY = Mat.L'*Y;
        RX = Mat.R'*X;
        w = r + dot(LY, RX);
        g = problem.M.zerovec(Mat);
        for i = 1:t
            g.L = g.L - (Y(:,i)*RX(:,i)')/w(i);
            g.R = g.R - (X(:,i)*LY(:,i)')/w(i);
        end
        g.L = g.L / t;
        g.R = g.R / t;
    end

    checkgradient(problem); pause;

    options.tolgradnorm = 1e-12; options.maxiter = 5000;

    %% Reasonable starting guess for the problem
    guessA = A; guessA.L = guessA.L./sig;

    [solA1, cost1, info1] = conjugategradient(problem, guessA, options);

end

BM

Aug 27, 2013, 3:43:02 PM8/27/13
to manopt...@googlegroups.com
Hello Jeremy,

I am not getting a slope of 2 for your Euclidean gradient derivation. Could you check it again?

Regards,
BM

BM

Aug 27, 2013, 7:01:18 PM8/27/13
to manopt...@googlegroups.com
Hello Jeremy,

As I see it, you have two issues with the cost function: positivity and unboundedness.

First, your earlier hack makes the cost function non-smooth. Here are two standard suggestions for smooth cost functions:

i) Replace log(x), x > 0, with log(sqrt(x^2 + epsilon^2)), where epsilon is a small value, e.g., 1e-5 (see the sketch below).
ii) Replace log(x), x > 0, with log(y^2), i.e., substitute x = y^2 with no constraints on y.
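
For instance, with the variables from your script, suggestion (i) could look like this in the cost function (a minimal sketch; epsilon = 1e-5 is just an example value):

    epsilon = 1e-5;
    w = r + dot(Mat.L'*Y, Mat.R'*X);
    f = -mean(log(sqrt(w.^2 + epsilon^2)));   % smooth surrogate for -mean(log(w)), w > 0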

Second, the cost function is unbounded, i.e., log(x) with x > 0 is unbounded above. Therefore, the solution is always x* = inf, irrespective of the earlier hack. To see this, consider scaling the argument by a large number (this does not affect the rank).
To make the problem bounded you should impose some kind of norm constraint on x. Effectively, this amounts to adding a regularization term lambda*trace((L'*L)*(R'*R)) to the cost function, where lambda is a parameter that you control.

Regards,
Bamdev

jjg...@gmail.com

Aug 27, 2013, 9:46:27 PM8/27/13
to manopt...@googlegroups.com
Hi Bamdev,

Thanks again for your help and all of your work on this toolbox.

The cost function is bounded for the specific problem I'm working on (Y can be negative). However, not every random data set I generate has a bounded solution. For the ones that do, the toolbox is working great with my rough hack!

Also, thanks for your advice on keeping the cost function smooth. -log(x) goes to infinity as x goes to 0, but my understanding is that I need to adjust the cost function so that it stays smooth yet increases more quickly as x approaches 0, so that the algorithm doesn't jump to an x < 0. Is that correct?

Can I steal some advice on which built-in fixed-rank manifold I should start with?

Thanks again!

Jeremy

jjg...@gmail.com

Aug 27, 2013, 9:55:41 PM8/27/13
to manopt...@googlegroups.com
Hi Bamdev,

One more thing that I forgot to mention. I'm trying to use checkhessian to test whether I've coded in the Hessian for my problem correctly. I get the message "The slope should be 3. It appears to be: 2."

However, I get the same message when I run the example function, fixedrank_test(), at http://www.manopt.org/manifold_documentation_fixedrank_2factors.html

Perhaps there is a bug somewhere?

Best,

Jeremy

jjg...@gmail.com

Aug 27, 2013, 10:52:38 PM8/27/13
to manopt...@googlegroups.com
Oops. I just noticed the note below the example. Sorry for the hassle. Best, Jeremy

BM

Aug 28, 2013, 6:20:13 AM8/28/13
to manopt...@googlegroups.com
Hello Jeremy,

In my opinion, fixedrankfactory_2factors should be a good starting point. You can also experiment with others, such as:

i) *_2factors_preconditioned (designed for a least-squares formulation, but in limited testing it seems to also work better for your problem), to see whether you are a bit faster (or slower).
ii) *_2factors_subspace_projections imposes an orthogonality constraint on L, so you have to take care at initialization.
iii) Going to the *_3factors counterparts can also be interesting, but it primarily depends on what you want to accomplish.

Deciding between these could come down to factors like robustness or numerical conditioning. Your experience will definitely help us better understand their differences and similarities.

Below is the code (everything after the data generation):

    %% Create the problem structure
    %     problem.M = fixedrankfactory_2factors(m,n,k);
    problem.M = fixedrankfactory_2factors_preconditioned(m,n,k);

   
    %% Define the problem cost function
    lambda = 0; % Regularization parameter (try, e.g., 1e-8)

    problem.cost = @cost;
    function f = cost(Mat)
        w = r + dot(Mat.L'*Y, Mat.R'*X);

        %% make sure argument to log is strictly positive
        if all(w>0)
            f = -mean(log(w)) + lambda*trace((Mat.L'*Mat.L)*(Mat.R'*Mat.R));

        else
            f = 10;
        end
    end
   
    %% Define gradient of cost function
    problem.grad = @(Mat) problem.M.egrad2rgrad(Mat, egrad(Mat));
    function g = egrad(Mat)
        LY = Mat.L'*Y;
        RX = Mat.R'*X;
        w = r + dot(LY,RX);
        g = problem.M.zerovec(Mat);
        for i=1:t
            g.L = g.L - (Y(:,i)*RX(:,i)')/w(i);
            g.R = g.R - (X(:,i)*LY(:,i)')/w(i);
           
        end
        g.L = g.L / t;
        g.R = g.R / t;
        g.L = g.L + 2*lambda*Mat.L*(Mat.R'*Mat.R);
        g.R = g.R + 2*lambda*Mat.R*(Mat.L'*Mat.L);
    end

Regards,
Bamdev

Nicolas Boumal

Sep 2, 2013, 4:04:03 AM9/2/13
to manopt...@googlegroups.com
Hello Jeremy,

About other questions you asked in the previous messages:

 * For a smooth cost function, the first-order optimality condition that the algorithms try to satisfy is simply grad f(x) = 0, where grad f is the Riemannian gradient of f. What this condition looks like mathematically depends on the geometry (the manifold) you picked. If you're using this one: http://www.manopt.org/manifold_documentation_fixedrank_2factors.html, then what you want to look at in that documentation page is the egrad2rgrad function. Since you know the Euclidean gradient of your cost function, the optimality condition is egrad2rgrad( Euclidean gradient of f ) = 0 (see the small sketch after these two points). If you use a different geometry for which we have not written a documentation page yet, you should go to the research paper where that geometry is described; it will be somewhere in there. We can help you with that.

 * About the slope of 2 for the Hessian check: for some geometries we do not have an expression for the exponential map (geodesics), so we use retractions instead. Those may not be second-order retractions, which unfortunately makes the checkhessian tool unusable in those cases.
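
Here is the sketch mentioned in the first point: once a solver returns a point A, you can check the first-order condition numerically (reusing the 'problem' structure from the scripts earlier in this thread; getGradient returns the Riemannian gradient):

    rgrad = getGradient(problem, A);   % Riemannian gradient at A
    disp(problem.M.norm(A, rgrad));    % should be close to zero at a critical point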

Cheers,

Nicolas