Find a matrix that simultaneously diagonalizes a set of matrices.


Muntadher

unread,
Apr 9, 2018, 6:50:24 AM4/9/18
to Manopt
Dear Nicolas,

I would like to thank you for your valuable information and your help in answering my questions. 

I just want to ask you one more question if possible. 

Suppose we have the function F(X) = sum_{k=1}^{K} (X^H R_k X), which is unitarily invariant, and X satisfies the constraint X^H X = I. 
The R_k are a set of rank-deficient matrices that are not orthogonal to each other.  

I would like to exploit the properties of the manifolds in order to obtain a single matrix X (unitary or non-unitary does not matter) that simultaneously/jointly diagonalizes the set of matrices R_k (or achieves an approximate diagonalization). 

Is this possible? I would really appreciate your help. 

Really appreciate your kind help. 
Muntadher 

Nicolas Boumal

unread,
Apr 9, 2018, 10:30:27 PM4/9/18
to Manopt
Hello Muntadher,

I believe the problem you describe is exactly what is addressed in this paper: https://sites.uclouvain.be/absil/Publi/JD-Stiefel.htm 

Gradient and Hessian are given in Section 2.3 and could be implemented in Manopt rather directly, using the stiefelfactory (or the complex version) for the Stiefel manifold.
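
If it helps, here is a minimal sketch of how such a problem could be set up in Manopt (the sizes, the random data and the helper names below are placeholders of my own, I assume each R_k is Hermitian, and I flip the sign because Manopt minimizes while the paper maximizes the diagonal energy):

    N = 8; P = 4; K = 5;                          % placeholder sizes
    R = zeros(N, N, K);
    for k = 1 : K
        A = randn(N) + 1i*randn(N);
        R(:, :, k) = A*A';                        % placeholder Hermitian data matrices
    end

    problem.M = stiefelcomplexfactory(N, P);      % points X with X'*X = eye(P)

    problem.cost  = @(X) -diag_energy(X, R);      % Manopt minimizes, so flip the sign
    problem.egrad = @(X) -diag_energy_egrad(X, R);

    checkgradient(problem);                       % sanity check before optimizing
    X = trustregions(problem);

    function f = diag_energy(X, R)
        f = 0;
        for k = 1 : size(R, 3)
            d = real(diag(X' * R(:, :, k) * X));  % real because R(:,:,k) is Hermitian
            f = f + sum(d.^2);
        end
    end

    function G = diag_energy_egrad(X, R)
        G = zeros(size(X));
        for k = 1 : size(R, 3)
            RkX = R(:, :, k) * X;
            G = G + 4 * RkX * diag(real(diag(X' * RkX)));
        end
    end

This only shows the structure; for the exact cost, gradient and Hessian you should follow the formulas in the paper (the Hessian can be supplied later through problem.ehess).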

I hope this helps,
Best,
Nicolas

Muntadher

unread,
Apr 10, 2018, 12:56:01 PM4/10/18
to Manopt
Dear Nicolas,

I would like to thank you very much for your kind help. Thanks for your valuable information. 

I will read it, and I will definitely use Manopt. If I face any problem, I will let you know. 

My best regards,
Muntadher 

Muntadher

unread,
Apr 12, 2018, 2:51:45 PM4/12/18
to Manopt
Dear Nicolas,

Following our discussion about the joint diagonalization of a set of matrices, I have attached the papers that I am trying to implement using Manopt. 

I think there is some error, because it does not converge. I have attached code for both papers (JD_2002 and JD_2009); the 2009 paper is the one you recommended to me.

I believe that you could figure out why it does not converge to the unitary matrix X that approximately jointly diagonalizes the matrices. I have attached all the functions that you need to test it yourself. I hope you can help me with this. 

I believe it is possible to do at least an approximate diagonalization using Manopt. I would really appreciate your help. 

My best regards,
Muntadher 




On Tuesday, 10 April 2018 03:30:27 UTC+1, Nicolas Boumal wrote:
JD_2002.zip
JD_2009.zip
Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold.pdf
JOINT DIAGONALIZATION OF CORRELATION MATRICES BY USING GRADIENT METHODS WITH APPLICATION TO BLIND SIGNAL SEPARATION.pdf

Nicolas Boumal

unread,
Apr 13, 2018, 10:05:30 AM4/13/18
to Manopt
Hello Muntadher,

In the files for JD_2009, in the gradient function, you have these two lines:

%     DY = DY + real(trace((diag(Y'*Rk_K(:,:,p)*Y))*(diag(Y'*Rk_K(:,:,p)*Y))'));

      % This is given in the paper
      DY = DY + 4*Rk_K(:,:,p)*Y*diag(diag(Y'*Rk_K(:,:,p)*Y));


Previously, you had commented out the formula given in the paper and implemented the other line (which is commented out above). When I use the formula from the paper, I do see convergence, and the checkgradient test passes (it does not with the other formula).
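
(For reference, the gradient check is just Manopt's built-in tool: once the problem structure has the manifold, the cost and the Euclidean gradient filled in, it is a single call,

    checkgradient(problem);   % the log-log error plot should show a slope of about 2

and checkhessian does the same for the Hessian.)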


Best,
Nicolas

Muntadher

unread,
Apr 13, 2018, 11:59:07 AM4/13/18
to Manopt
Dear Nicolas,

I would like to thank you very much for your answer and kind cooperation. Unfortunately, it does not give me the desired results, and it seems to converge to something different every time I run it. 

I believe that there is something incorrect. I also believe that there should be a way to do it in Manopt. The criterion should be to find the unitary X such that the products X^H R_k X have diagonals that are as large as possible and off-diagonals that are as small as possible.

I really need your help if possible.   

My best regards,
Muntadher 

Nicolas Boumal

unread,
Apr 13, 2018, 3:02:28 PM4/13/18
to Manopt
Hello Muntadher,

This is a non-convex problem: it is quite likely that there are numerous suboptimal local optima. There is no guarantee in general that local optimization methods such as steepest descent or trust regions etc. would converge to a global optimum.

One thing that can help a lot is to pick your initial guess better. Perhaps you can try computing the exact diagonalizers for each one of the data matrices, and use each of these as an initial guess? Then, you run optimization multiple times, and you see which outcome is the best?
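
In code, the loop I have in mind would look roughly like this (a sketch with made-up names: R holds the K data matrices, P is the number of columns of X, and 'problem' is the Manopt problem structure built from all of them):

    best_cost = +inf;
    for k = 1 : K
        [V, ~] = eig(R(:, :, k));          % eigenvectors: exact diagonalizer of the k-th matrix
        X0 = V(:, 1:P);                    % orthonormal columns if R(:,:,k) is Hermitian
        [Xk, costk] = trustregions(problem, X0);
        if costk < best_cost
            best_cost = costk;
            Xbest = Xk;
        end
    end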


Best,
Nicolas

Muntadher

unread,
Apr 13, 2018, 3:19:14 PM4/13/18
to Manopt
Dear Nicolas,

Thank you very much for the valuable information. You are definitely correct: this is a non-convex problem.

For a single R_k, the best (exact) diagonalizer would be the eigenvectors of R_k. For multiple matrices (the set of R_k), I have tried using the sum of the eigenvector matrices of the individual R_k, but the result is still not the desired one. 

If I understand you correctly (please correct me if I am wrong), your advice is to take the eigenvectors of the first R_k and try them as a single initial guess for all the matrices, then try the eigenvectors of the second R_k and check, and so on, until I have found the best matrix that can be used for all the matrices? Is that what you mean, or something else? And I still use Manopt, correct? 

Really appreciate your help.

Best regards,
Muntadher    

Nicolas Boumal

unread,
Apr 13, 2018, 3:34:49 PM4/13/18
to Manopt
For each Rk, you get a unitary matrix. Use it as the initial iterate (initial guess) for optimization with Manopt (the optimization involves all the Rk's). That means you solve the same optimization problem K times (where K is the number of Rk's), except that, every time, you start from a different initial guess. Then, see which output is the best.

Muntadher

unread,
Apr 13, 2018, 3:52:03 PM4/13/18
to Manopt
Dear Nicolas,

I will try it and see what kind of improvement I can get. I will inform you if I get something. Thank you very much for your kind help. 

My best regards,
Muntadher 

Muntadher

unread,
Apr 24, 2018, 8:16:05 AM4/24/18
to Manopt
Dear Nicolas,

Back again to our problem. I have modified the cost function a bit; I hope this will work much better. Consider the following function: 

$F(X)=\sum_{k=1}^{K} || \text{off}\{ X^{H}R_{k} X\} ||^{2}_{F}$,

where $R_{k} \in C^{N \times N}$ and, in this case, $X$ is a square matrix, i.e. $X \in C^{N\times N}$. 

$||\cdot||^{2}_{F}$ is the squared Frobenius norm, and $\text{off}\{\cdot\}$ keeps only the off-diagonal elements of a matrix (so each term is the sum of the squared magnitudes of the off-diagonal elements).   
 
The Euclidean gradient of the above function w.r.t. $X$ is given by

$Gr(X)= 2 \sum_{k=1}^{K} R_{k} X \big[X^{H}R_{k} X - I \odot (X^{H}R_{k} X)\big]$,

where $\odot$ is the element-wise (Hadamard) matrix product.   

Now, assume that we are looking for a non-square matrix $Q$, i.e. $Q \in C^{N \times P}$, so we define a new function,

$F(Q)=\sum_{k=1}^{K} || \text{off}\{ Q^{H}R_{k} Q\} ||^{2}_{F}$,

where $Q = X M$ and $M = I_{N,P}$ is the identity matrix with $N$ rows and $P$ columns, used to truncate the number of columns of $X$. 
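
Just to be explicit about the notation, in MATLAB terms this is simply

    M = eye(N, P);     % N-by-P truncated identity
    Q = X * M;         % identical to X(:, 1:P): keeps the first P columns of X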

I am looking to compute the Euclidean gradient of the above function w.r.t. $X$ in order to proceed with the optimization over the manifold. Do you have any idea how to compute the Euclidean gradient of this new function? I really appreciate your help. 

Best regards,


On Friday, 13 April 2018 20:34:49 UTC+1, Nicolas Boumal wrote:

Nicolas Boumal

unread,
Apr 24, 2018, 9:43:47 AM4/24/18
to Manopt
Hello,

On this forum, one of the first topics in the list is about computing gradients of matrix functions. There is also a link to a website that can automate much of the computations for real matrices (then there is a bit of guessing to do to adapt to complex matrices; there is no guessing required if you understand the underlying strategy, which is described in one of the webpages linked in that topic.)

Best of luck,
Nicolas

Muntadher

unread,
Apr 24, 2018, 2:33:58 PM4/24/18
to Manopt
Dear Nicolas,

Thank you very much for your reply.

In fact, I have tried the link (matrix calculus), but the problem is that the matrices we have are complex. I would really appreciate it if you could help me with this. 

Best regards,

Nicolas Boumal

unread,
Apr 25, 2018, 12:27:10 PM4/25/18
to Manopt
Hello,

The only change for complex geometries is that, typically, manopt uses the following real inner product:

<U, V> = real(trace(U'*V))

Remember the definition of the gradient of f:

For a given X on the manifold, for any tangent vector U at X, we have:

Df(X)[U] = <grad f(X), U>

So, work out an expression for the directional derivative Df(X)[U], and work on this expression until it looks like real(trace([something]'*U)) --- the something is your gradient.

If f is defined for all matrices (not just on the manifold) -- that's typically the case -- then Df(X)[U] = lim_{t -> 0}  ( f(X+tU) - f(X) ) / t.
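
If you want to check a candidate gradient numerically before plugging it into Manopt, a quick sketch (f is your cost, G your candidate gradient, n and p are whatever sizes you use):

    X = randn(n, p) + 1i*randn(n, p);     % any point where f is defined
    U = randn(n, p) + 1i*randn(n, p);     % any direction
    t = 1e-7;
    dd = (f(X + t*U) - f(X)) / t;         % finite-difference estimate of Df(X)[U]
    ip = real(trace(G(X)' * U));          % <grad f(X), U> with the real inner product above
    abs(dd - ip)                          % should be small if G is the correct gradient

This is essentially what Manopt's checkgradient tool automates.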

Best,
Nicolas

Muntadher Q Alsabah

unread,
Sep 6, 2018, 3:30:33 PM9/6/18
to Nicolas Boumal, Manopt
Dear Nicolas, 

I have still not solved my problem. Just to remind you, I have the following function:

F(X)=\sum_{k=1}^{K} || \text{off}\{ X^{H}A_{k} X\} ||^{2}_{F},

where off{R} = R - diag(diag(R)), A_{k} \in C^{N \times N}, and X \in C^{N \times N}. Here ||R||^{2}_{F} is the squared Frobenius norm, which equals the trace of R R^{H}, where R^{H} is the conjugate (Hermitian) transpose, i.e.,

||R||^{2}_{F} = \text{Tr}(R R^{H})


The Euclidean gradient of the above function w.r.t. X is given by

Gr(X) = 2 \sum_{k=1}^{K} A_{k} X \big[X^{H} A_{k} X - I \odot (X^{H} A_{k} X)\big],

where \odot is the element-wise matrix multiplication (Hadamard product).   

Suppose that I have changed the above function to a new one:

F(X)=\sum_{k=1}^{K} || \text{off}\{ (X J)^{H}A_{k} XJ\} ||^{2}_{F},

This is obtained by introducing the non-square matrix J \in C^{N \times P}, J = I_{N,P}, which is just an identity matrix used to trim the columns of the matrix X, in other words to truncate the number of columns of X.  

I am not good at complex matrix differentiation or at using commutativity of matrices inside the trace. 

I was advised to differentiate the scalar function with matrix argument F(X) by expanding the terms of F(X + dX).

The idea is to identify all the terms involving dX and collect them into a single term (by taking a common factor); the transpose of the other factor is the gradient. However, I still could not find the Euclidean gradient of the new function w.r.t. X, even by using element-wise differentiation with the classical differentiation rules for scalars.  

I need your help to find the Euclidean gradient, if possible, in order to find X using one of the methods in Manopt. I have attached two papers that discuss some techniques for complex matrix differentiation; however, I still could not use those tips to find the Euclidean gradient of the new function. 

Could you please help me with that? It would be really appreciated. 

Best regards  









Nicolas Boumal

unread,
Sep 6, 2018, 8:44:52 PM9/6/18
to Manopt
Hello Muntadher,

Your original cost function is F(X).

Let me call your new cost function H(X). They are related as follows: H(X) = F(XJ), where J is some fixed matrix.

You already have a formula for the gradient of F, let me call it G(X).

You want a formula for the gradient of H, let me call it Q(X).

Recall the definition of gradient: G(X) is the matrix such that, for any dX (a perturbation of X), it holds that DF(X)[dX] = <G(X), dX>, where the left-hand side is the directional derivative of F at X along dX, and in the right-hand side we have an inner product (it doesn't really matter that the inputs are complex: the output is real.)

The same definition holds for Q (the gradient of H). Let's figure out the directional derivatives of H. By the chain rule:

DH(X)[dX] = DF(XJ)[dX J] = <G(XJ), dX J> = <G(XJ) J*, dX>, where J* is the Hermitian conjugate of J.

Thus, by identification with the definition of the gradient of H, we have:

Q(X) = G(XJ) J*
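
In Manopt terms, if egradF implements the gradient formula you already have for F, then the Euclidean gradient of H is a one-liner (a sketch; egradF is a name I made up, and J is your fixed truncation matrix):

    problem.egrad = @(X) egradF(X * J) * J';   % Q(X) = G(XJ) J*, since J' is the conjugate transpose in MATLAB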

I hope that makes sense -- can you check it?

Best,
Nicolas

Muntadher Q Alsabah

unread,
Sep 8, 2018, 7:04:12 AM9/8/18
to Nicolas Boumal, Manopt
Dear Nicolas, 

Thank you very much for your kind reply and help. Really appreciated your effort. 

In fact, that looks like a sensible approach. However, I have tried this, and I got the gradient as a matrix Q(X) \in C^{N \times N}, whereas I expected it to be in C^{N \times P}.

So maybe this is not correct. Do you have any idea? 

Best regards,
Muntadher 



Muntadher Q Alsabah

unread,
Sep 8, 2018, 1:15:56 PM9/8/18
to Nicolas Boumal, Manopt
Dear Nicolas,

You are correct. I have found that the problem is not in the gradient. The problem seems to be that the method minimizes the main diagonal as well, which does not give the desired results: it minimizes the off-diagonal entries, but it also minimizes the main diagonal.
As a result, trace(X^H Rk X) is very small compared with trace(Rk). It should not be as small as what I get; it should be as close as possible to trace(Rk). I wonder if we could add some constraint. Otherwise, how can we keep the main diagonal from being minimized? 

Anyway, thank you very much for your help.

Best regards
