large condition number after solving linear systems


Simon

Oct 13, 2022, 7:08:13 AM
to deal.II User Group
Dear all,

I am optimising parameters which are part of the pde I am working on.

To this end, I am solving n linear systems, where n is the number of parameters:
K * sensitivity[i] = rhs[i]    i=1,...,n
K is the (symmetric) tangent matrix which I already use to solve for the solution of my pde, that is,
I just assemble the rhs vectors anew.
The result is the Jacobian matrix
J = [ sensitivity[1], ..., sensitivity[n] ],
which has as many rows as my pde has degrees of freedom.
J has dimensions 320x8, that is,
my pde has 320 dofs and I want to optimise 8 parameters.
The solution of my pde (the dofs) is in the interval [0, 1] and the parameters are in the interval [0, 1e5], i.e., the entries of J are rather small by nature (1e-6 to 1e-10).
I evaluated the condition number of J in matlab: cond(J) = 1e13, which is too large given that the optimization algorithm works on J^t*J.

Currently, I solve the above n linear systems using the direct solver UMFPACK. I know that there is no such thing as a preconditioner for direct solvers.
Thus, I solved the linear systems using the iterative SolverCG with a preconditioner:
        const int solver_its = 1000;
        const double tol_sol = 1e-15;
        SolverControl solver_control(solver_its, tol_sol);
        GrowingVectorMemory<Vector<double> > GVM;
        SolverCG<Vector<double> > solver_CG(solver_control, GVM);
        PreconditionSSOR<> preconditioner;
        preconditioner.initialize(tangent_matrix, 1.2);
        //solve the linear systems...
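The solve step itself then looks like this (just a sketch; n_parameters, sensitivity[i], and rhs[i] are placeholder names for my loop bound and Vector<double> objects):
        for (unsigned int i = 0; i < n_parameters; ++i)
          solver_CG.solve(tangent_matrix, sensitivity[i], rhs[i], preconditioner);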
However, the condition number of J is still around 1e13.

My idea was to reduce the condition number of J by at least a few orders of magnitude by applying a preconditioner to my tangent matrix.
Is this consideration reasonable?

Best
Simon

Wolfgang Bangerth

Oct 13, 2022, 12:38:01 PM
to dea...@googlegroups.com
Hi Simon,

> To this end, I am solving n linear systems, where n is the number of
> parameters:
> K * sensitivity[i] = rhs[i]    i=1,...,n
> K is the (symmetric) tangent matrix which I already use to solve for the
> solution of my pde, that is,
> I just assemble the rhs vectors anew.
> The result is the Jacobian matrix
> J = [ sensitivity[1], ..., sensitivity[n] ],
> which has as many rows as my pde has degrees of freedom.
> J has dimensions 320x8, that is,
> my pde has 320 dofs and I want to optimise 8 parameters.
> The solution of my pde (the dofs) is in the interval [0, 1] and the
> parameters are in the interval [0, 1e5], i.e., the entries of J are
> rather small by nature (1e-6 to 1e-10).
> I evaluated the condition number of J in matlab: cond(J) = 1e13, which
> is too large given that the optimization algorithm works on J^t*J.

I might be missing something, but how do you define the condition number
of a non-square matrix?

Separately, though: what is the relationship between the condition
number of J and solving linear systems with K? If I understand you right...

> Currently, I solve the above n linear systems using the direct solver
> UMFPACK. I know that there is no such thing like a preconditioner for
> direct solvers.
> Thus, I solved the linear systems using the iterative SolverCG with a
> preconditioner:
>         const int solver_its = 1000;
>         const double tol_sol = 1e-15;
>         SolverControl solver_control(solver_its, tol_sol);
>         GrowingVectorMemory<Vector<double> > GVM;
>         SolverCG<Vector<double> > solver_CG(solver_control, GVM);
>         PreconditionSSOR<> preconditioner;
>         preconditioner.initialize(tangent_matrix, 1.2);
>         //solve the linear systems...
> However, the condition number of J is still around 1e13.


...then this is the code you use to solve with K. K being whatever it
is, the preconditioner you use only affects *how* you solve the linear
systems with K, but it has no effect on the *solution* of the linear
systems, and so this piece of code has no influence on J: J is what it
is, and the condition number of J is what it is independent of the code
you use to *solve for J*.

> My idea was to reduce the condition number of J by at least a few orders of
> magnitude by applying a preconditioner to my tangent matrix.
> Is this consideration reasonable?

The condition number of J is determined by your choice of parameters. If
I understand things right, your J^T J is 8x8, and if it is poorly
conditioned, you need to choose the 8 parameters differently, for
example (i) scaled in a different way, and (ii) as linear combinations
of what you use right now.

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Simon Wiesheier

Oct 13, 2022, 1:56:00 PM
to dea...@googlegroups.com
Thank you for your reply!

" I might be missing something, but how do you define the condition number
of a non-square matrix? "

The matlab documentation states:
cond(J) is the 2-norm condition number for inversion,
equal to the ratio of the largest singular value of J to the smallest.
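(Equivalently, in matlab notation: cond(J) = max(svd(J)) / min(svd(J)).)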

" The condition number of J is determined by your choice of parameters. If
I understand things right, your J^T J is 8x8,"

Correct, J^T J is 8x8.

" and if it is poorly
conditioned, you need to choose the 8 parameters differently, for
example (i) scaled in a different way, and (ii) as linear combinations
of what you use right now. "

I double-checked that my J has full rank.
If I interpret this correctly, there is no correlation between my parameters,
and what you describe as your point (ii) is not necessary in my case.

I definitely have to scale my parameters somehow.
Do you have a recommendation for how to scale the parameters?


Best
Simon



Wolfgang Bangerth

Oct 13, 2022, 4:03:15 PM
to dea...@googlegroups.com
On 10/13/22 11:55, Simon Wiesheier wrote:
>
> " and if it is poorly
> conditioned, you need to choose the 8 parameters differently, for
> example (i) scaled in a different way, and (ii) as linear combinations
> of what you use right now. "
>
> I double-checked that my J has full rank.
> If I interpret this correctly, there is no correlation between my
> parameters,
> and what you describe as your point (ii) is not necessary in my case.

No, that is definitely not true. The correlations are expressed by the
eigenvectors associated with the eigenvalues. J^T J is symmetric and
positive definite, so all of the eigenvectors are mutually orthogonal.
But they will generally not lie along the coordinate axes, and so
express correlations between the parameters.


> I definitely have to scale my parameters somehow.
> Do you have a recommendation for how to scale the parameters?

You would generally choose the (normalized) eigenvectors of the J^T J
you have as the unit directions and the new set of parameters are then
multipliers for these directions. (So far your parameters are
multipliers for the unit vectors e_i in your 8-dimensional parameter
space.) The matrix J^T J using this basis for your parameter space will
then already be diagonal, though still poorly conditioned. But if you
scale the eigenvectors by something like the square root of the
eigenvalues, you'll get J^T J to be the identity matrix.

Of course, all of this requires computing the eigenvalues and
eigenvectors of the current J^T J once.
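In deal.II terms, a rough sketch could look like the following (all names are placeholders; lambda[i] and ev[i] are the eigenvalues and normalized eigenvectors of J^T J, which you compute once with whatever symmetric eigensolver you have at hand):

    #include <deal.II/lac/full_matrix.h>
    #include <deal.II/lac/vector.h>
    #include <cmath>
    #include <vector>

    using namespace dealii;

    // Old parameters p from new coefficients P in the scaled eigenvector
    // basis E_i = sqrt(lambda_i) * ev_i:  p = V P, where the columns of V
    // are the E_i.
    Vector<double> new_to_old(const std::vector<double>         &lambda,
                              const std::vector<Vector<double>> &ev,
                              const Vector<double>              &P)
    {
      const unsigned int n = lambda.size();
      FullMatrix<double> V(n, n);
      for (unsigned int i = 0; i < n; ++i)
        for (unsigned int j = 0; j < n; ++j)
          V(j, i) = std::sqrt(lambda[i]) * ev[i](j);

      Vector<double> p(n);
      V.vmult(p, P);          // p = V * P
      return p;
    }

    // The opposite direction is just a projection, because the ev_i are
    // orthonormal:  P_i = (ev_i . p) / sqrt(lambda_i).
    Vector<double> old_to_new(const std::vector<double>         &lambda,
                              const std::vector<Vector<double>> &ev,
                              const Vector<double>              &p)
    {
      Vector<double> P(lambda.size());
      for (unsigned int i = 0; i < lambda.size(); ++i)
        P(i) = (ev[i] * p) / std::sqrt(lambda[i]);
      return P;
    }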

Simon Wiesheier

Oct 14, 2022, 3:25:14 AM
to dea...@googlegroups.com
" No, that is definitely not true. The correlations are expressed by the
eigenvectors associated with the eigenvalues. J^T J is symmetric and
positive definite, so all of the eigenvectors are mutually orthogonal.
But they will generally not lie along the coordinate axes, and so
express correlations between the parameters."

I see. My point pertains to the following:
If J has full rank, this means that none of the 8 columns is linearly dependent
on another column or, equivalently, no parameter is linearly dependent on another parameter.

" You would generally choose the (normalizes) eigenvectors of the J^T J
you have as the unit directions and the new set of parameters are then
multipliers for these directions. "

Is it reasonable to choose \sum_{i=1}^8 ev_i (ev_i are the normalized eigenvectors)
as the new set of parameters?

" Of course, all of this requires computing the eigenvalues and
eigenvectors of the current J^T J once. "

If I get the gist correctly, I first run my dealii program to compute J like I do it right now,
thence compute the eigenvectors, and based on them the new set of parameters.
Finally, I run my dealii program again to compute the new J with the new parameters.
Correct?
I did not read about such a scaling technique in the context of parameter estimation so far.
Do you have any reference where the procedure is described?

I have to check whether the above scaling works for my problem or not:
For instance, if there are three old parameters {0, 20,000, 300,000} and your proposed method
produces the new set of parameters {5,000, 40,000, 90,000},
it may happen that my pde solver does not converge anymore.
Also, the old ranking (0 < 20,000 < 300,000) has to be retained during scaling.


Best
Simon


Wolfgang Bangerth

Oct 14, 2022, 4:27:29 PM
to dea...@googlegroups.com

Simon:

> If J has full rank, this means that none of the 8 columns is linearly
> dependent
> on another column or, equivalently, no parameter is linearly dependent
> on another parameter.

Yes.


> " You would generally choose the (normalizes) eigenvectors of the J^T J
> you have as the unit directions and the new set of parameters are then
> multipliers for these directions. "
>
> Is it reasonable to choose \sum_{i=1}^8 ev_i(ev_i are the normalized
> eigenvectors)
> as the new set of parameters?

The sum you show is only one vector. You need 8 basis vectors.

Think of it this way: You have 8 parameters, which I'm going to call
a,b,c,d,e,f,g,h. The parameter space is 8-dimensional, and the solution
of the parameter estimation problem is a point in this parameter space
which I'm going to call
x^*
and you choose to represent this point in 8-space via
x^* = a^* e_a + b^* e_b + ...
where
e_a=(1,0,0,0,0,0,0,0)
e_b=(0,1,0,0,0,0,0,0)
etc.

But you could have chosen to represent x^* as well via
x^* = A^* E_1 + B^* E_2 + ...
where
E_1 = \sqrt{lambda_1} ev_1
E_2 = \sqrt{lambda_2} ev_2
etc
(or some such) and then you are seeking the coefficients A, B, C, ...


> " Of course, all of this requires computing the eigenvalues and
> eigenvectors of the current J^T J once."
>
> If I get the gist correctly, I first run my dealii program to compute J
> like I do it right now,
> thence compute the eigenvectors, and based on them the new set of
> parameters.
> Finally, I run my dealii program again to compute the new J with the new
> parameters.
> Correct?

Yes, something like this.


> I did not read about such a scaling technique in the context of
> parameter estimation so far.
> Do you have any reference where the procedure is described?

Sorry, I don't. But I have a test problem for you: Try to minimize the
function
f(x,y) = 10000*(x-y)^2 + (x+y)^2
It has a unique minimum, but the x,y-values along the valley of this
function are highly correlated and so a steepest descent method will be
very ill-conditioned. The eigenvectors of the Hessian of this function
give you the uncorrelated coordinate directions:
ev_1 = 1/sqrt(2) [1 1]
ev_2 = 1/sqrt(2) [1 -1]
If you introduce variables
A = (x-y)*sqrt(10000)
B = (x+y)*sqrt(1)
your objective function will become
f(A,B) = A^2 + B^2
which is much easier to minimize. Once you have the solution A^*, B^*,
you can back out to x^*, y^*.
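If you want to see this in code, here is a tiny standalone sketch (nothing deal.II-specific; the numbers are arbitrary test values):

    #include <cmath>
    #include <cstdio>

    // Objective in the original, correlated variables:
    double f_xy(const double x, const double y)
    {
      return 10000. * (x - y) * (x - y) + (x + y) * (x + y);
    }

    // The same objective after the change of variables
    // A = (x-y)*sqrt(10000), B = (x+y)*sqrt(1):
    double f_AB(const double A, const double B)
    {
      return A * A + B * B;
    }

    int main()
    {
      const double x = 0.3, y = -0.2;
      const double A = (x - y) * std::sqrt(10000.);
      const double B = (x + y) * std::sqrt(1.);

      // Both evaluations give the same value; only the coordinates differ.
      // In (A,B) the level sets are circles, so descent methods behave well.
      std::printf("f(x,y) = %g,  f(A,B) = %g\n", f_xy(x, y), f_AB(A, B));

      // Backing out the original variables:
      //   x = (A/sqrt(10000) + B)/2,   y = (B - A/sqrt(10000))/2
      return 0;
    }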

Simon Wiesheier

Oct 15, 2022, 5:15:15 AM
to dea...@googlegroups.com
Thanks for taking the time to come back to this!
I will give it a try to implement your suggested scaling.

" The sum you show is only one vector. You need 8 basis vectors.

Think of it this way: You have 8 parameters, which I'm going to call
a,b,c,d,e,f,g,h. The parameter space is 8-dimensional, and the solution
of the parameter estimation problem is a point in this parameter space
which I'm going to call
   x^*
and you choose to represent this point in 8-space via
   x^* = a^* e_a + b^* e_b + ...
where
   e_a=(1,0,0,0,0,0,0,0)
   e_b=(0,1,0,0,0,0,0,0)
   etc.

But you could have chosen to represent x^* as well via
   x^* = A^* E_1 + B^* E_2 + ...
where
   E_1 = \sqrt{lambda_1} ev_1
   E_2 = \sqrt{lambda_2} ev_2
   etc
(or some such) and then you are seeking the coefficient A,B,C, ... "

This makes sense.
So, given the scaled eigenvectors E_1,...,E_8, how can I find the coefficients A^*,...,H^* ?
Is it just a matrix multiplication
P* =  (E_1; ... ; E_8) \times p* ,
where P* = (A^*,...,H^*) are the new parameters and p* = (a^*,...,h^*) are the old parameters?

Assuming that my pde solver still converges for the new parameters, the overall procedure would be as follows:
1. run dealii program to compute J with old parameters p*
2. compute the new basis (EV_i) and the new parameters P*
3. run dealii program to compute the new J with the new parameters P*
4. compute p* =  (E_1; ... ; E_8)^-1  \times P*
Repeat 1-4 for all iterations of the optimisation algorithm (Levenberg-Marquardt).
Is that correct?

At the end, the resulting parameters have to be the same, no matter
whether I use the above scaling or not. The sole difference is that
the scaled version improves (among other things) the condition number of J and may lead to
a better convergence of the optimisation algorithm, right?


Best
Simon




Wolfgang Bangerth

Oct 17, 2022, 12:14:44 AM
to dea...@googlegroups.com
On 10/15/22 03:15, Simon Wiesheier wrote:
>
> This makes sense.
> So, given the scaled eigenvectors E_1,...,E_8, how can I find the coefficients
> A^*,...,H^* ?
> Is it just a matrix multiplication
> P* =  (E_1; ... ; E_8) \times p* ,
> where P* = (A^*,...,H^*) are the new parameters and p* = (a^*,...,h^*) are the
> old parameters?

Something of the sort. It's the same point in 8-space, you're just expressing
it with regard to different bases.


> Assuming that my pde solver still converges for the new parameters, the
> overall procedure would be as follows:
> 1. run dealii program to compute J with old parameters p*
> 2. compute the new basis (EV_i) and the new parameters P*
> 3. run dealii program to compute the new J with the new parameters P*
> 4. compute p* =  (E_1; ... ; E_8)^-1  \times P*
> Repeat 1-4 for all iterations of the optimisation algorithm (Levenberg-Marquardt).
> Is that correct?

Conceptually, this is correct. In practice, it may not be necessary to
actually do it that way: All you're looking for is a well-conditioned basis.
You don't need the exact vectors E_i. In some applications you can guess such
vectors (like in the model case I drew up), in others you compute them once
for one case and re-use for other cases where maybe they are not exactly the
eigenvectors of the matrix J^T J. Or you live with the ill-conditionedness.


> At the end, the resulting parameters have to be the same, no matter
> whether I use the above scaling or not. The sole difference is that
> the scaled version improves (among other things) the condition number of J and may
> lead to
> a better convergence of the optimisation algorithm, right?
Yes.

Simon Wiesheier

Oct 17, 2022, 2:44:47 AM
to dea...@googlegroups.com
One last thing:

" Or you live with the ill-conditionedness. "

I plan to work with the correlation matrix (related to the inverse of J^t J).
If J has condition number about 1e13, do you think I can work on J^t J
without getting into numerical problems?

Best
Simon



Wolfgang Bangerth

Oct 17, 2022, 9:47:38 AM
to dea...@googlegroups.com
On 10/17/22 00:44, Simon Wiesheier wrote:
>
> I plan to work with the correlation matrix (related to the inverse of J^t J).
> If J has condition number about 1e13, do you think I can work on J^t J
> without getting into numerical problems?

It always depends on what you want to do with it. If your goal is to show that
the problem has a few large and a few small singular vectors, you don't need
to know them accurately. If you want to multiply with the matrix, the
ill-conditioning is likely also not going to hurt very much. If you want to
solve a linear system with it, that might be a different story.
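(Keep in mind that cond(J^T J) = cond(J)^2, so with cond(J) around 1e13 you are looking at roughly 1e26 for J^T J, which is far beyond what double precision, with its roughly 16 significant digits, can resolve when solving linear systems.)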

Simon Wiesheier

Oct 17, 2022, 10:12:39 AM
to dea...@googlegroups.com
" It always depends on what you want to do with it. If your goal is to show that
the problem has a few large and a few small singular vectors, you don't need
to know them accurately. If you want to multiply with the matrix, the
ill-conditioning is likely also not going to hurt very much. If you want to
solve a linear system with it, that might be a different story. "

I do not plan to do anything other than compute (J^t J)^-1
and some matrix multiplications with it.
Based on what you said, this should not cause any undue problems, right?

Best
Simon


Wolfgang Bangerth

Oct 17, 2022, 10:20:26 AM
to dea...@googlegroups.com
On 10/17/22 08:12, Simon Wiesheier wrote:
>
> I do not plan to do anything else than computing (J^t J)^-1
> and some matrix multiplications with it.
> Based on what you said, this should not cause any undue problems, right?
>

Probably not.

Simon Wiesheier

Oct 24, 2022, 12:12:17 PM
to dea...@googlegroups.com
Sorry to come back to this again, but I would appreciate your feedback on the issue I am confronted with:

" You would generally choose the (normalizes) eigenvectors of the J^T J
you have as the unit directions and the new set of parameters are then
multipliers for these directions. (So far your parameters are
multipliers for the unit vectors e_i in your 8-dimensional parameter
space.) The matrix J^T J using this basis for your parameter space will
then already be diagonal, though still poorly conditioned. "

I tried to implement this for only 3 parameters and the 3x3 Hessian evaluated at the start vector reads
H = J^T J = [  4.91e-12   1.07e-12  -5.98e-12
                           6.50e-13  -1.72e-12
               sym                    7.70e-12  ]
The eigenvectors are
EV_1 = [0.58; 0.58; 0.58]
EV_2 = [0.54; -0.80; 0.26]
EV_3 = [-0.61; -0.16; 0.77]
with associated eigenvalues {-8.72e-28, 4.80e-13, 1.28e-11}.
Clearly, I cannot scale the eigenvectors by 1/sqrt(eigenvalue), as the eigenvalues are numerically negative or very small.
Thus, I transformed the
old parameters [500000;0;500000]
using the transformation matrix [EV_1; EV_2; EV_3] and
obtained the
new parameters [-16778; 206381; 675972].
The set of new parameters is inadmissible in my case, as the first and last parameters have to be greater than the second one.
There is probably no straightforward way to incorporate this constraint into the transformation, is there?

But anyway, it seems that I have to scale my parameters already before I compute J^T J.
The best alternative is then to linearly scale the parameters p_i so that they are normalized to
-1 < p_i < +1.
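(Concretely, something like p_scaled_i = 2*(p_i - p_min_i)/(p_max_i - p_min_i) - 1, where p_min_i and p_max_i are bounds I choose for each parameter.)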
Right?

Best,
Simon
