Conjugate Gradient for Schur complement, serial vs parallel discrepancy in solution and effect of tolerance.

Dimitris Ntogkas

Nov 27, 2017, 3:31:07 PM

to deal.II User Group

Dear all,

I have a question regarding the behavior of the Conjugate Gradient (CG) method in serial and in parallel. I am using version 8.5 of deal.II and I have a parallel implementation based on Trilinos.

The system matrix is sparse and in block format, with blocks A, B, B^T and 0. The right-hand side has two blocks, f_0 and f_1 = 0. I am using a Schur complement similar to step-20, but in parallel, to solve the system, and I am facing an issue with the first step of the solve routine, where I use CG to solve for y_1. At this point I am not using any preconditioning.
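For readers following along, the solve described above can be sketched in a few lines of plain Python. This is only an illustration, not the deal.II/Trilinos code from the post: it assumes the block layout [A B; B^T 0] with A symmetric positive definite (as in step-20), and the small matrices `A`, `B`, `f0` and the helpers `cg`, `solve_A`, `apply_schur` are made up for the example.

```python
import math


def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(apply_op, b, tol=1e-12, max_iter=200):
    """Unpreconditioned CG for an SPD operator given as a function v -> Op v.
    Stops when the l2 norm of the (recursively updated) residual is <= tol."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol:
            break
        Ap = apply_op(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Hypothetical small blocks: A is SPD, B has full column rank.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Bt = [list(col) for col in zip(*B)]
f0 = [1.0, 2.0, 3.0]


def solve_A(v):
    """Inner solve A^{-1} v, itself done with CG."""
    return cg(lambda w: matvec(A, w), v)


def apply_schur(v):
    """Action of the Schur complement S = B^T A^{-1} B, never formed explicitly."""
    return matvec(Bt, solve_A(matvec(B, v)))


# Step 1: S y1 = B^T A^{-1} f0 (f1 = 0).  Step 2: y0 = A^{-1} (f0 - B y1).
y1 = cg(apply_schur, matvec(Bt, solve_A(f0)), tol=1e-10)
y0 = solve_A([f - g for f, g in zip(f0, matvec(B, y1))])

# Residuals of the two block rows of the original system.
res0 = [a + b - f for a, b, f in zip(matvec(A, y0), matvec(B, y1), f0)]
res1 = matvec(Bt, y0)
print(max(abs(v) for v in res0 + res1))
```

Note that the outer CG only ever sees S through `apply_schur`, so every outer iteration hides an inner solve with A; the accuracy of that inner solve also affects what the outer residual really measures.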

I have exported the matrices and vectors and converted them to an appropriate format so that I can also work with them in Matlab. When I compare the system matrix assembled in serial with the one assembled in parallel (say mpirun -n2), their maximum difference in absolute value is of order 1e-11. The right-hand sides assembled in serial and in parallel are identical. However, the solutions of the system, computed with a CG tolerance of 1e-8, have a maximum difference of order 1e-4. For this particular calculation the condition number of the Schur complement is of order 1e+5 (calculated in Matlab). Moreover, when I use Matlab to do the Schur solve with CG for those matrices and the same tolerance, the resulting solutions differ by an amount of order 1e-12.

The above discrepancy in the solution decreases by two orders of magnitude if I tighten the CG tolerance to order 1e-11 for both the serial and the parallel execution.

My question is: given this difference in the matrices and this condition number, why do I see such a large difference in the solution? Could this be related to how CG is implemented in parallel and how the tolerance is enforced in parallel vs. in serial?

Thanks,

Dimitris

Bruno Turcksin

Nov 27, 2017, 4:24:16 PM

to deal.II User Group

Dimitris,

The same implementation of CG will give you (slightly) different results in serial and in parallel because the round-off errors will be different. These round-off errors are amplified if you have a large condition number (see https://en.wikipedia.org/wiki/Condition_number). So if you precondition your system and the condition number decreases, you can expect better results. This explains why there is a difference between the serial and the parallel run. Now, about the maximum value of the change: I think what you are doing is wrong. You are looking at the maximum difference, i.e., at the L infinity norm, but the tolerance is computed in the L2 norm. A tolerance of 1e-8 in the L2 norm does not mean that you will also get a tolerance of 1e-8 in the L infinity norm.
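To illustrate the round-off point, here is a minimal Python sketch (not deal.II or Trilinos code) of why the inner products inside CG can differ between one process and two. It assumes the parallel run adds up per-process partial sums, which changes the order of the floating-point additions; the function names `dot` and `dot_two_ranks` are made up for the example.

```python
def dot(x, y):
    """Left-to-right dot product, as a single process would accumulate it."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


def dot_two_ranks(x, y):
    """Same dot product, but each half is summed separately and the two
    partial sums are added at the end, mimicking a reduction over two ranks."""
    half = len(x) // 2
    return dot(x[:half], y[:half]) + dot(x[half:], y[half:])


x = [0.1] * 10
y = [1.0] * 10

serial = dot(x, y)
parallel = dot_two_ranks(x, y)

# Same data, same formula, different summation order.
print(serial, parallel, abs(serial - parallel))
```

The two results differ only in the last bit here, but CG feeds such inner products back into its step lengths at every iteration, so the serial and parallel iterates drift apart over many iterations.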

Best,

Bruno

Dimitris Ntogkas

Nov 27, 2017, 5:10:27 PM

to deal.II User Group

Hi Bruno,

Thanks for your quick response! You are right about the l2 vs max norm. However, the error is 1e-4 in the l2 norm too. Just a clarification to make sure I understand your response. I was indeed thinking of the condition number, that's why I checked it, but in my case the 1e-11 should lose up to 5 more digits, which is still better than 1e-4. However, probably your point is that since I was using cg with tolerance of 1e-8, this is already a loss of accuracy that I did not take into account in the above calculation. Is this correct?
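For reference, the digit-counting argument above corresponds to the standard first-order perturbation bound for a linear system S y = b. This is only a sketch: it assumes the 1e-11 difference can be read as a relative perturbation of the Schur complement S (the number reported earlier is an absolute entrywise difference of the full system matrices), and the symbols \delta S and \delta y are introduced here just for illustration. The right-hand sides are identical, so there is no perturbation of b.

```latex
\frac{\|\delta y\|}{\|y\|} \;\lesssim\; \kappa(S)\,\frac{\|\delta S\|}{\|S\|}
\;\approx\; 10^{5} \cdot 10^{-11} \;=\; 10^{-6}
```

This estimate only covers the perturbation of the matrix; it says nothing about the error left over from stopping CG at a finite tolerance.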

Thanks again,

Dimitris

Bruno Turcksin

Nov 27, 2017, 6:57:31 PM

to dea...@googlegroups.com

Dimitris,

2017-11-27 17:10 GMT-05:00 Dimitris Ntogkas <di...@math.umd.edu>:

> Thanks for your quick response! You are right about the l2 vs max norm. However, the error is 1e-4 in the l2 norm too. Just a clarification to make sure I understand your response. I was indeed thinking of the condition number, that's why I checked it, but in my case the 1e-11 should lose up to 5 more digits, which is still better than 1e-4. However, probably your point is that since I was using cg with tolerance of 1e-8, this is already a loss of accuracy that I did not take into account in the above calculation. Is this correct?

Here is what I am thinking. The tolerance is 1e-8, so if the residual of the serial run is 1e-9 and the residual of the parallel run is 3e-9, they both satisfy the tolerance. Because the condition number is 1e5, a difference of 1e-9 in the residual can give you a difference of 1e-4 in the error (which is what you have). So now the question is: can a difference of 1e-11 in the matrix plus round-off errors create a difference of 1e-9 in the residual? This sounds possible, but you have to check by looking at what is going on at each step of the algorithm.
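The argument above can be checked on a toy problem. The following Python sketch uses a hypothetical 2x2 diagonal matrix with condition number 1e5 as a stand-in for the Schur complement; the matrix `S`, the vectors `x_serial` and `x_parallel`, and the residual values are invented to match the numbers in this message, not taken from the actual runs.

```python
import math

# Hypothetical SPD stand-in for the Schur complement: kappa = 1 / 1e-5 = 1e5.
S = [[1.0, 0.0], [0.0, 1.0e-5]]
b = [1.0, 1.0e-5]          # chosen so that the exact solution is (1, 1)
tol = 1.0e-8               # CG stopping tolerance on the residual l2 norm


def l2(v):
    return math.sqrt(sum(vi * vi for vi in v))


def residual_norm(x):
    """l2 norm of b - S x."""
    r = [b[i] - sum(S[i][j] * x[j] for j in range(2)) for i in range(2)]
    return l2(r)


# Two approximate solutions that differ only along the "weak" direction.
x_serial = [1.0, 1.0 + 1.0e-4]     # residual of about 1e-9
x_parallel = [1.0, 1.0 + 3.0e-4]   # residual of about 3e-9

diff = l2([p - s for p, s in zip(x_parallel, x_serial)])

print(residual_norm(x_serial), residual_norm(x_parallel), diff)
```

Both vectors would be accepted by a residual-based stopping test with tolerance 1e-8, yet they differ from each other at the 1e-4 level, which is consistent with the estimate of condition number times residual difference.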

Best,

Bruno

Dimitris Ntogkas

Nov 28, 2017, 6:12:02 PM

to deal.II User Group

Bruno,

Yeah, this makes sense. Numerically that's what I observe too, some difference in the residual of order 1e-9 that then propagates to the solution. In the meantime I also figured out where the difference in the matrices was coming from, so now they are identical. This improved the overall behavior of the method, although the difference in the residual is still present. Thanks again for your help!
