Symbolic differentiation


bahu....@gmail.com

Aug 7, 2014, 3:31:04 PM
to manopt...@googlegroups.com
While using your remarkably fast program, it usually takes some time to compute the derivatives, especially the Hessians in the case of product manifolds.
Do you know of any tool which helps with this task? Or do you even plan to add such a tool to your program? It might contain all the rules of the Matrix Cookbook and do the nasty computations automatically.
As an example, I just came up with the function (of two matrix variables) ||X'Y - B_1||^2 + ||Y'X - B_2||^2 (Frobenius norm), and it took me a while to compute the gradient correctly. Now I have lost the energy for computing the Hessians ...

BM

Aug 8, 2014, 6:03:52 AM
to manopt...@googlegroups.com
Thank you for the suggestion on symbolic differentiation. It is certainly worth looking at. Having said that, I am not so sure about the usefulness of symbolic differentiation here.

In your case of least-squares cost functions with the Frobenius norm, it might be much better (and simpler) to derive the gradients and Hessians directly. Once we have the gradient expression, the step of computing the Hessian is, IMHO, more straightforward. In fact, for me personally, the hard work is in working out the gradient expression itself. Thereafter, the chain rule takes over.

Below is an attempt to convince you. 

Consider the cost function 0.5||X'Y - B_1||^2 + 0.5||Y'X - B_2||^2 (Frobenius norm). I have added the factor 0.5 in both terms to make the expressions below simpler.


A few quick comments:

1) It is a least-squares cost function, linear in each variable separately. This is a big plus.

2) It is completely symmetric with respect to the variables X and Y, except that the roles of B_1 and B_2 swap when moving from X to Y.

3) Hence, the main work is in computing the partial derivative of the cost function w.r.t. X only. For Y, it is then straightforward. The expressions are below.

The partial derivative w.r.t. X is   Y(Y'X - B_1') + Y(Y'X - B_2).
The partial derivative w.r.t. Y is   X(X'Y - B_2') + X(X'Y - B_1).


4) The Hessian w.r.t. X applied along (Z_X, Z_Y) is
Z_Y(Y'X - B_1') + Y(Z_Y'X + Y'Z_X) + Z_Y(Y'X - B_2) + Y(Z_Y'X + Y'Z_X).

If you look carefully at the above expression, I have only applied the chain rule to the partial derivative w.r.t. X. Nothing more.

Similarly, the Hessian w.r.t. Y applied along (Z_X, Z_Y) is
Z_X(X'Y - B_2') + X(Z_X'Y + X'Z_Y) + Z_X(X'Y - B_1) + X(Z_X'Y + X'Z_Y).
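
To make this concrete, here is a small sketch (not part of the original derivation) that plugs these expressions into Manopt. The function name, sizes n and p, and the random B_1, B_2 are placeholders, and X and Y are placed on a product of two Euclidean spaces, so the Euclidean and Riemannian gradient/Hessian coincide and checkgradient/checkhessian apply directly.

function product_cost_check()
% Sketch: the cost, gradient and Hessian above on a product of two Euclidean
% spaces. Sizes n, p and the matrices B1, B2 are placeholders.
    n = 5; p = 3;
    B1 = randn(p); B2 = randn(p);

    elems.X = euclideanfactory(n, p);
    elems.Y = euclideanfactory(n, p);
    problem.M = productmanifold(elems);

    problem.cost  = @(Z) 0.5*norm(Z.X'*Z.Y - B1, 'fro')^2 + ...
                         0.5*norm(Z.Y'*Z.X - B2, 'fro')^2;
    problem.egrad = @egrad;
    problem.ehess = @ehess;

    function G = egrad(Z)
        X = Z.X; Y = Z.Y;
        G.X = Y*(Y'*X - B1') + Y*(Y'*X - B2);   % partial derivative w.r.t. X
        G.Y = X*(X'*Y - B2') + X*(X'*Y - B1);   % partial derivative w.r.t. Y
    end

    function H = ehess(Z, U)
        X = Z.X; Y = Z.Y; ZX = U.X; ZY = U.Y;
        H.X = ZY*(Y'*X - B1') + Y*(ZY'*X + Y'*ZX) + ...
              ZY*(Y'*X - B2)  + Y*(ZY'*X + Y'*ZX);
        H.Y = ZX*(X'*Y - B2') + X*(ZX'*Y + X'*ZY) + ...
              ZX*(X'*Y - B1)  + X*(ZX'*Y + X'*ZY);
    end

    checkgradient(problem); drawnow;
    checkhessian(problem);
end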


Regards,
BM

Nicolas Boumal

Aug 8, 2014, 7:39:17 AM
to manopt...@googlegroups.com
Thanks, Bamdev, for this detailed explanation.

And hello Bahu. Thanks for letting us know about this potential difficulty. Automatic or symbolic differentiation would be a nice addition indeed. We have considered it a few times, but at the moment it appears that the time it would take to add such a feature to Manopt would be too much compared to other features we judge more important (for now). Also, such differentiation tools are not something we are very knowledgeable about, and perhaps that's just it: we don't clearly see how it would all fit together.

We do recognize that computing gradients of matrix functions is not something everyone is familiar with. We intend to produce some tutorial material on the matter to help with this task. Hopefully, sometime soon :).

Meanwhile, trial and error is a good way to get acquainted with the techniques, and the checkgradient tool should help along the way.

Cheers,

Nicolas

Nicolas Boumal

Aug 8, 2014, 7:46:33 AM
to manopt...@googlegroups.com
I should add, about computing the classical, Euclidean gradient of a matrix function f(X), where X is a matrix and f(X) is a real number:

The starting point is the definition of the gradient: grad f(X) is a matrix the same size as X such that, for all other matrices U the same size as X, this holds:

(1)   D f(X)[U] = <grad f(X), U>,

where D f(X)[U] is the directional derivative of f at X along U, and <A, B> = Trace(A^T B) is the classical Euclidean inner product.

The directional derivative is usually easy to compute: it is defined as

D f(X)[U] = lim_{t -> 0} ( f(X+tU) - f(X) ) / t

so it's the usual thing. All the differentiation rules you learned in high school / your bachelor's apply. Except that you're working with matrices, so be careful with non-commutativity, but that's it.
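
To see this numerically, here is a small sanity check (a sketch, not part of the original post): it approximates D f(X)[U] by the finite difference above and compares with the analytic value, for the example f(X) = 0.5*||X - A||^2 worked out just below. A, X, U and the step t are placeholders.

% Finite-difference check of D f(X)[U] (sketch; all matrices are placeholders).
A = randn(4, 3);
f = @(X) 0.5*norm(X - A, 'fro')^2;
X = randn(4, 3);
U = randn(4, 3);
t = 1e-7;
Df_fd       = (f(X + t*U) - f(X)) / t;   % the definition above, with small t
Df_analytic = trace((X - A)' * U);       % <X - A, U>, as derived below
fprintf('finite difference: %g, analytic: %g\n', Df_fd, Df_analytic);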

The trick is then to write down D f(X)[U] and manipulate the expression until it looks like the definition (1). Then your gradient is standing right there.

Example:

f(X) = 0.5 * ||X - A||^2 = 0.5 <X-A, X-A>

(inner products differentiate like products: (fg)' = f'g + fg')

Df(X)[U] = 0.5*(<U, X-A> + <X-A, U>) = <X-A, U> = <grad f(X), U>

Hence, grad f(X) = X-A. That's it.
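
A quick way to double-check such a derivation is Manopt's checkgradient on a plain Euclidean space. A minimal sketch (the size 4x3 and the matrix A are placeholders):

% Sketch: verify grad f(X) = X - A numerically with checkgradient.
A = randn(4, 3);                               % placeholder data
problem.M = euclideanfactory(4, 3);            % Euclidean space of 4x3 matrices
problem.cost  = @(X) 0.5*norm(X - A, 'fro')^2;
problem.egrad = @(X) X - A;                    % the gradient derived above
checkgradient(problem);                        % expect slope 2 in the loglog plot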

For the more complicated example you had in mind, see Bamdev's post.

Cheers,

Nicolas

bahu....@gmail.com

Aug 9, 2014, 1:31:49 AM
to manopt...@googlegroups.com
Thanks for your computation. The results are identical to the ones I obtained. I did not trust my result regarding the Hessian, because the following little test program did not confirm the correctness of the Hessian calculation:

Please try the following function with B1 = rand(3); B2 = rand(3):

function phi=gradtest2(B1,B2)

d=size(B1,1);

M = stiefelfactory(d, d,2);
problem.M = M;

% Define the problem cost function
problem.cost = @cost;

function f = cost(X)
f=0;
f=f+(1/2)*norm(X(:,:,1)'*X(:,:,2)-B1,'fro')^2;
f=f+(1/2)*norm(X(:,:,2)'*X(:,:,1)-B2,'fro')^2;
end

% The gradient
problem.grad = @(X) problem.M.egrad2rgrad(X, egrad(X));

function g = egrad(X)
C=(B1'+B2)/2;
g(:,:,1)=2*X(:,:,2)*(X(:,:,2)'*X(:,:,1)-C);

C=(B2'+B1)/2;
g(:,:,2)=2*X(:,:,1)*(X(:,:,1)'*X(:,:,2)-C ) ;
end

% The Hessian
problem.hess = @(X, eta) problem.M.ehess2rhess(X, egrad(X), ehess(X, eta), eta);

function Hess = ehess(X, eta)
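% Note: the transposes on the second eta(:,:,2) below and on the second
% eta(:,:,1) in Hess(:,:,2) are the error identified later in this thread.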
C=(B1'+B2)/2;
Hess(:,:,1)=2*X(:,:,2)*(eta(:,:,2)'*X(:,:,1)+X(:,:,2)'*eta(:,:,1))...
+2*eta(:,:,2)'*(X(:,:,2)'*X(:,:,1)-C);
C=(B2'+B1)/2;
Hess(:,:,2)=2*X(:,:,1)*(eta(:,:,1)'*X(:,:,2)+X(:,:,1)'*eta(:,:,2))...
+2*eta(:,:,1)'*(X(:,:,1)'*X(:,:,2)-C);
end

checkgradient(problem);
drawnow;
pause;
checkhessian(problem);
drawnow;
pause;

end

From bad experience, I usually do not trust my own code ... so please check!

BM

Aug 9, 2014, 5:33:50 AM
to manopt...@googlegroups.com
Hello, 

Two files are attached. 

The first file, 'Test_samyak_general.m', deals with 0.5||X'Y - B_1||^2 + 0.5||Y'X - B_2||^2 in full generality, i.e., X and Y can have different sizes.

The second file, 'Test_samyak_specific.m', deals with the same cost function, but with X and Y on the same Stiefel manifold.

Also note that in the .m files the variables X and Y are replaced by L and R, respectively.

Do let us know if you require anything else.

Regards,
Bamdev

Test_samyak_general.m
Test_samyak_specific.m

bahu....@gmail.com

Aug 9, 2014, 9:17:31 AM
to manopt...@googlegroups.com
Your code helped me find the error in my calculation of the Hessian: the transpose on the second occurrence of eta(:,:,2) in Hess(:,:,1), as well as the transpose on the second occurrence of eta(:,:,1) in Hess(:,:,2), is incorrect.
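
For the record, here is a sketch of the corrected ehess for the script above, obtained simply by dropping those two transposes, which matches Bamdev's Hessian expressions:

function Hess = ehess(X, eta)
    C = (B1' + B2)/2;
    Hess(:,:,1) = 2*X(:,:,2)*(eta(:,:,2)'*X(:,:,1) + X(:,:,2)'*eta(:,:,1)) ...
                + 2*eta(:,:,2)*(X(:,:,2)'*X(:,:,1) - C);   % eta(:,:,2), not eta(:,:,2)'
    C = (B2' + B1)/2;
    Hess(:,:,2) = 2*X(:,:,1)*(eta(:,:,1)'*X(:,:,2) + X(:,:,1)'*eta(:,:,2)) ...
                + 2*eta(:,:,1)*(X(:,:,1)'*X(:,:,2) - C);   % eta(:,:,1), not eta(:,:,1)'
end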

Considering the amount of time which I (and you!) have invested in this calculation again brings me back to my point about automatic differentiation - if only for problems with Frobenius norms and polynomial matrix functions. But I understand well that this would be a major task, and I suppose other items are more urgent.

Thanks again for your wonderful program, which deserves wide adoption!

BM

Aug 9, 2014, 3:43:35 PM
to manopt...@googlegroups.com
Hello Bahu,

We are glad that the discussion and code helped you find the correct computation. Nicolas' post on gradient computation nicely lays out the steps involved.

Regards,
BM

BM

Aug 9, 2014, 3:49:30 PM
to manopt...@googlegroups.com
Hello Nicolas, 

Thanks for the nice explanation. It indeed resonates with my understanding.

Regards,
Bamdev

Nicolas Boumal

Aug 12, 2014, 4:02:09 AM
to manopt...@googlegroups.com
Hello Bahu,
 
> Considering the amount of time which I (and you!) have invested in this calculation again brings me back to my point about automatic differentiation - if only for problems with Frobenius norms and polynomial matrix functions.

That's a fair point. We'll think about it.

Cheers,
Nicolas

Nicolas Boumal

Aug 18, 2014, 4:57:18 AM
to manopt...@googlegroups.com
By the way, this blog post is quite informative on the whole business of computing gradients of matrix functions:


One of the commenters recommends this resource:

http://research.microsoft.com/en-us/um/people/minka/papers/matrix/minka-matrix.pdf

And Wikipedia hosts a long list of worked-out formulas:

stochasti...@yahoo.com

Dec 15, 2019, 5:33:12 PM
to Manopt
ADiMAT (https://www.sc.informatik.tu-darmstadt.de/res/sw/adimat/general/index.en.jsp) provides matrix-level automatic differentiation in MATLAB. Unfortunately, it doesn't appear to have been under very active development recently.

The reverse mode of automatic differentiation works out quite nicely for gradient computation, but Hessians can be rather computationally intensive to evaluate.

Mark L. Stone

Dec 15, 2019, 6:05:45 PM
to Manopt
Whoops. I see the Manopt developers are already familiar with ADiMAT: https://groups.google.com/forum/#!topic/manopttoolbox/JoQsGHUy8OU

Nicolas Boumal

Dec 15, 2019, 6:30:05 PM
to Manopt
Thanks for sharing anyway!