I've been struggling with this problem for a while and I can't seem to find much about it. I have a large number of 3x3 matrices which represent linear operators. I'm interested in the properties of these operators and have thus decomposed them into eigenvectors and eigenvalues. Most of these matrices are decomposable over the real field (have real eigenvalues) and these are useful to me, but a small minority have complex eigenvalues, which are of no use to me. So my question is: given a linear operator with complex eigenvalues, how does one find the nearest linear operator with real eigenvalues? Some measure of nearness is important as well.
I reckon this would require some use of the MATLAB function 'norm', but I can't seem to make any progress on this problem. Any suggestions, pointers to relevant texts (as this may be an obscure maths problem), or pointers to relevant MATLAB functions would be greatly appreciated.
Cheers,
Gary
The simple answer is to find the nearest symmetric matrix.
However, there are some non-symmetric matrices with real
eigenvalues.
>> B = rand(3)
B =
0.39223 0.70605 0.046171
0.65548 0.031833 0.097132
0.17119 0.27692 0.82346
>> eig(B)
ans =
-0.49599
1.0481
0.69536
My immediate thought would be to solve it as an
fmincon optimization problem.
>> A = randn(3)
A =
0.55253 0.085931 -1.0616
1.1006 -1.4916 2.3505
1.5442 -0.7423 -0.6156
>> eig(A)
ans =
0.082915
-0.81879 + 1.5839i
-0.81879 - 1.5839i
Thus, find the minimal perturbation to the 3x3 matrix A,
such that eig(A + delta) is entirely real.
delta will be a 3x3 matrix, with easy to choose starting
values.
>> delta0 = (A + A.')/2 - A;
>> obj = @(delta) sum(delta(:).^2);
I might also have chosen to use norm as a measure here.
Regardless, our initial value for the additive perturbation is
the matrix that sends A to a symmetric matrix.
Supply a set of nonlinear equality constraints as an m-file.
% =============================
function [C,Ceq] = mynonlcon(delta,A)
% nonlinear constraints for fmincon eigenvalue problem
C = [];
Ceq = imag(eig(A + delta));
% =============================
Set some options.
>> opts = optimset('fmincon');
>> opts.Algorithm = 'interior-point';
And solve the problem.
>> delta = fmincon(obj,delta0,[],[],[],[],[],[],@(delta) mynonlcon(delta,A),opts)
delta =
-0.041618 0.3092 -0.35434
0.71895 -0.31954 0.15115
-0.70376 0.48807 0.36116
How does this compare to delta0?
>> delta0
delta0 =
0 0.50734 1.3029
-0.50734 0 -1.5464
-1.3029 1.5464 0
Clearly delta0 is bigger.
>> norm(delta)
ans =
1.1796
>> norm(delta0)
ans =
2.0848
>> eig(A + delta0)
ans =
0.8651
-0.38934
-2.0304
>> eig(A)
ans =
0.082915
-0.81879 + 1.5839i
-0.81879 - 1.5839i
Just a quick thought. There may be better solutions, but this
seems to work.
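The same optimization can be sketched with SciPy (a rough transliteration, with scipy.optimize.minimize and SLSQP standing in for fmincon; the eigenvalue constraint is collapsed here into a single sum-of-squares equality, which shares the non-smoothness caveat raised later in the thread):

```python
# SciPy sketch: minimize the squared Frobenius norm of delta subject to
# eig(A + delta) being real, starting from the symmetrizing perturbation.
import numpy as np
from scipy.optimize import minimize

A = np.array([[0.55253,  0.085931, -1.0616],
              [1.1006,  -1.4916,    2.3505],
              [1.5442,  -0.7423,   -0.6156]])

def obj(d):
    return np.sum(d**2)                 # squared Frobenius norm of delta

def imag_parts(d):
    # equality constraint: imaginary parts of eig(A + delta) must vanish
    return np.sum(np.imag(np.linalg.eigvals(A + d.reshape(3, 3)))**2)

delta0 = ((A + A.T) / 2 - A).ravel()    # symmetrizing start, already feasible
res = minimize(obj, delta0, method='SLSQP',
               constraints=[{'type': 'eq', 'fun': imag_parts}])
delta = res.x.reshape(3, 3)
```

Since A + delta0 is symmetric, the starting point already satisfies the constraint, and the solver only has to shrink the perturbation from there.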
John
[V,D] = eig(A);
A1 = real(V*real(D)/V);
The eigenvalues of A1 should be the same as the real parts of A's eigenvalues, and if the imaginary parts were small, then the norm of A-A1 should be small. If the eigenvalues of A were real, then A and A1 should be equal (except for roundoff error).
Note: The outer 'real' operator is only there to remove imaginary parts that are due to roundoff error. The inner 'real' operator produces the essential change here.
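For readers outside MATLAB, the same two lines can be rendered in NumPy (a transliteration, using John's example matrix from above; the eigenvector scaling returned by eig may differ, but A1 does not depend on it):

```python
# NumPy rendering of Roger's construction: keep A's eigenvectors,
# replace each eigenvalue by its real part.
import numpy as np

A = np.array([[0.55253,  0.085931, -1.0616],
              [1.1006,  -1.4916,    2.3505],
              [1.5442,  -0.7423,   -0.6156]])

w, V = np.linalg.eig(A)                 # A = V @ diag(w) @ inv(V)
A1 = np.real(V @ np.diag(np.real(w)) @ np.linalg.inv(V))
# eig(A1) equals real(eig(A)); the outer real() only strips roundoff
```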
Roger Stafford
I conjecture Roger's solution is actually optimal in the sense that it minimizes the spectral norm, norm(A-A1,2); however I can't see an easy proof (I haven't tried hard either). Roger, what is your opinion?
Bruno
abs(det(A-A1)) = abs(det(V*D*inv(V) - V*real(D)*inv(V)))
= abs(det(V*i*imag(D)*inv(V)))
= abs(det(i*imag(D)))
= abs(det(imag(D))) = prod(diag(abs(imag(D))))
V does have an inverse because we are talking about a matrix with three distinct roots, one real and two complex conjugates.
That this is optimal in some sense remains purely conjectural on my part. It just "feels" good.
Roger Stafford
I guess the question is, what metric will you use to define
small?
John
John, I just wonder: in your code, would you be able to change the Frobenius norm to the spectral norm and see if the residual matrix you get is smaller than Roger's?
Thanks,
Bruno
Thanks very much guys, you have been most helpful. I used Roger's solution as it was a bit simpler and more intuitive. I could in any case measure empirically how well the matrix performed relative to the one with complex eigenvalues, as each matrix was a linear transformation between two sets of measured variables (representing a dynamic I was interested in), so I could see how well the new matrix predicted the measured values relative to the old matrix. In each case the difference was negligible, so leaving proofs and such aside, for my particular problem it worked a treat. Thanks again.
Cheers,
Gary
Just thought I'd point out that this is a non-differentiable constraint, so there are certain risks using fmincon. Perhaps only on a set of measure zero, though...
Also, even though Gary has a solution he's happy with, I'll just throw out one more idea. If A is the given matrix with Jordan decomposition
A=inv(S)*J*S
then one could use the solution
Anew=inv(S)*real(J)*S
One can easily show that this minimizes the distance metric
Dist=@(B) norm(S*(A-B)*inv(S),'fro')
subject to the real eigenvalues constraint. Don't know why this metric would be worse than any other. Unfortunately, I haven't come across a way to automate Jordan decomposition in MATLAB...
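For what it's worth, the construction can be prototyped symbolically (a SymPy sketch; MATLAB's jordan lives in the Symbolic Math Toolbox, and sympy's jordan_form plays the same role here — note sympy returns A == P*J*inv(P), so S in the notation above corresponds to inv(P)):

```python
# SymPy sketch of the Jordan-based variant, on the example matrix
# that appears later in the thread; eigenvalues (1 ± i*sqrt(3))/2 and 0.
import sympy as sp

A = sp.Matrix([[1, 1, 0], [-1, 0, 0], [0, 0, 0]])
P, J = A.jordan_form()                  # A == P * J * P.inv()
Anew = P * J.applyfunc(sp.re) * P.inv() # real parts of the Jordan blocks
# For this A, Anew simplifies to diag(1/2, 1/2, 0).
```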
> Unfortunately, I haven't come across a way to automate Jordan decomposition in MATLAB...
============
And that's possibly because it's numerically unstable. But anyway, any similarity transform which exposes the eigenvalues as entries of the matrix could be the basis for this approach.
> V does have an inverse because we are talking about a matrix with three distinct roots, one real and two complex conjugates.
>
> That this is optimal in some sense remains purely conjectural on my part. It just "feels" good.
=======
In that case, it minimizes Dist(A,B) = norm(inv(V)*(A-B)*V,'fro')
The usual way people do optimization is: one first defines the norm, then minimizes it. This logic is not to be turned upside-down. If one picks a specific solution in the admissible set, then arranges the definition of the norm (a definition that depends on the input) so that this solution is the projection, that's not a rigorous method, because one does not really know what is being minimized, and it does not allow the various bound estimates. Furthermore, the eigenvectors can be chosen arbitrarily up to scaling, and that can also change the norm. Almost everything is floating in this norm. That's not good.
Bruno
I would like to have proved that the Frobenius norm, or else some other standard norm, of just the difference A-B was minimized in a way that is independent of A and B, but depending only on their difference.
Roger Stafford
>
> I would like to have proved that the Frobenius norm, or else some other standard norm, of just the difference A-B was minimized in a way that is independent of A and B, but depending only on their difference.
I would think your solution minimizes the 2-norm rather than the Frobenius norm. I do not have an exact proof, but below is the reason why I think it's true:
If we take the Rayleigh–Ritz ratio at x, a normalized eigenvector of A associated with a complex eigenvalue (B being your solution, so A-B is the residual matrix):
A*x = lambda*x, lambda a complex eigenvalue
|x| = 1
then
x'*(A-B)*x / (x'*x) = i*imag(lambda)
This *seems* to be the best one can do if the eigenvalues of B are real. Thus my conjecture is that the 2-norm is minimized (since the 2-norm is bounded below by the Rayleigh–Ritz ratio).
Bruno
Also if S is nearly singular, norm(inv(S)*M*S,'fro') can be very much larger than norm(M,'fro') and therefore perhaps of no particular interest in minimizing.
========
I don't see that, Roger. For instance, for any M of the form c*eye(3), the two norms coincide exactly. Regardless, if S is nearly singular, perhaps it means that you need a more demanding criterion than norm(M,'fro') to get a meaningful approximation of A anyway.
> I would like to have proved that the Frobenius norm, or else some other standard norm, of just the difference A-B was minimized in a way that is independent of A and B, but depending only on their difference.
==========
No, I don't think that's what we want to do. Nor do I agree that commutativity is necessarily of value here. Again, it's reasonable to suppose that different A might require different degrees of approximation for the task at hand and therefore that you have to make the norm adaptive to A.
Suppose, for example, that the task was instead to approximate the function
1/x, with rational numbers y substituted for various points x. Suppose you had some criterion
abs(1/x - 1/y) <= const.
Clearly, the distance measure you would be trying to control is
abs(x-y)/x^2, to account for the greater sensitivity of 1/x near x = 0.
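A tiny numeric illustration of that sensitivity claim (the sample points are arbitrary):

```python
# First-order check: abs(1/x - 1/y) is approximately abs(x - y)/x**2
# for y near x, so a fixed tolerance on the former pins y ever more
# tightly as x shrinks.
x, y = 0.1, 0.1001          # sample points chosen just for illustration
exact = abs(1/x - 1/y)
approx = abs(x - y) / x**2
```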
Bruno, although there are "floating" things in this norm, i.e., the scaling of the eigenvectors, the solution is invariant to them. Perhaps it's worth pointing out that the above norm minimization can be rewritten as the minimization of the following cost function, for any choice of the eigenvector matrix V:
Dist(B) = norm( inv(V)*B*V - D ,'fro')
Moreover, with the change of variables Z=inv(V)*B*V
Dist(Z) = norm( Z - D, 'fro')
you always obtain the solution Z=real(D). Because Z is diagonal, the corresponding B is invariant to how V was scaled columnwise.
>
> Bruno, although there are "floating" things in this norm, i.e., the scaling of the eigenvectors, the solution is invariant to them.
But that's even more disturbing. There is a family of "norms" that depends on the scaling and on the input, whereas the solution does not depend on the scaling. This solution cannot be *characterized* as the minimum of a "clean" cost function. The minimum of the norm is merely a property.
I'll caricature here, but I could equally well claim:
B = V*real(D)*inv(V), the solution proposed by Roger, minimizes the cost function
norm(Ap-B, 'fro'), where Ap := V*real(D)*inv(V) and [V,D] = eig(A).
That claim is of course certainly true, but it has no intrinsic value. In a way you are doing more or less the same thing: you define a norm that is floating.
Bruno
I don't follow that. So what if a solution happens to be the minimum of multiple norms? We know that given an invertible linear equation y=A*x, the solution x=inv(A)*y can be obtained by minimizing ||A*x-y|| for any norm whatsoever. Does that invalidate any of those norms?
> I'll caricature here, but I could equally well claim:
>
> B = V*real(D)*inv(V), the solution proposed by Roger, minimizes the cost function
>
> norm(Ap-B, 'fro'), where Ap := V*real(D)*inv(V) and [V,D] = eig(A).
>
> That claim is of course certainly true, but it has no intrinsic value. In a way you are doing more or less the same thing: you define a norm that is floating.
=========
That's a pretty strong caricature. My cost function is an actual norm of A-B. Your example on the other hand measures the distance between B and something else entirely...
A = %given matrix
1 1 0
-1 0 0
0 0 0
B1 = %Roger's solution
0.5000 0.0000 0
0.0000 0.5000 0
0 0 0
B = %alternative to Roger's solution
0.9000 0 0
-2.0000 -0.1000 0
0 0 0
Clearly both B and B1 have real eigenvalues, but one can directly verify that B achieves lower norms, both Frobenius and spectral:
>> NormsB1 =[norm(A-B1,'fro'), norm(A-B1)]
NormsB1 =
1.5811 1.5000
>> NormsB = [norm(A-B,'fro'), norm(A-B)],
NormsB =
1.4213 1.1000
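The figures above can be reproduced in a few lines (a NumPy transliteration, not the original session):

```python
# Numeric check of the counterexample: B has a real spectrum by
# inspection (it is lower triangular), yet it is strictly closer
# to A than Roger's B1 in both the Frobenius and spectral norms.
import numpy as np

A  = np.array([[ 1.0,  1.0, 0.0], [-1.0,  0.0, 0.0], [0.0, 0.0, 0.0]])
B1 = np.array([[ 0.5,  0.0, 0.0], [ 0.0,  0.5, 0.0], [0.0, 0.0, 0.0]])
B  = np.array([[ 0.9,  0.0, 0.0], [-2.0, -0.1, 0.0], [0.0, 0.0, 0.0]])

fro1, spec1 = np.linalg.norm(A - B1, 'fro'), np.linalg.norm(A - B1, 2)
fro,  spec  = np.linalg.norm(A - B,  'fro'), np.linalg.norm(A - B,  2)
```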
Ah!!! good finding.
Bruno