Reusing a preconditioner when updating the matrix and RHS.


C B

Jan 12, 2021, 6:14:42 PM
to Denis Demidov, amgcl
Denis,

First of all, I wanted to express my gratitude to you because (IMHO) AMGCL is such a great tool!

Coming back to reusing the preconditioner: you provided us this example, which I tried, and it works very well when only the RHS changes.

Now, when I tried changing the matrix a little (while keeping the matrix rank constant),
I realized (I think) that this example does not update the actual matrix A needed to compute the new residual |A x - f| (while iterating with the old preconditioner).
Please could you tell me if this is the case, and if so, how we can update the matrix while keeping the old preconditioner?
Thanks in advance!


C B

Jan 12, 2021, 9:36:13 PM
to Denis Demidov, amgcl
Denis,

Reviewing your first email on this subject, I realized that I would have to modify the call to the solver to something like this:
      //std::tie(iters, error) = (*solve)(rhs, x);
      std::tie(iters, error) = (*solve)( std::tie(nsize, ptr, col, valmod), rhs, x);

And it works, but unfortunately I found that a relatively small change to the matrix slows down the convergence rate a lot, to the point where there is almost no gain.
You had mentioned this ...
I am attaching the source code of your example, where, as a modification, I just scaled the diagonal of the matrix, a change that I thought would be very benign
because it would make the matrix more diagonally dominant.

Please let me know what you think, if there is anything that could be done to speed up convergence.
Thanks,

amgcl_time_dependent_reuse_preconditioner.cxx

Denis Demidov

Jan 13, 2021, 12:06:30 AM
to amgcl
Hi Carl,

A couple of problems with the code:

First, you are updating valmod, but you are using the unmodified val to set up the solver.

Second, the matrix update and the solver rebuild are in the wrong order (not that it matters in view of the first point):
you rebuild the solver for the matrix from the previous iteration, which means that even immediately after the rebuild the preconditioner is set up for the (slightly) wrong system.
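
For clarity, a minimal sketch of the intended ordering, reusing the variable names from the snippets above (this is only a sketch, not the updated example itself):

// Sketch only: update first, rebuild from the updated values when needed,
// and always pass the current matrix to the solve call.
for (int step = 0; step < nsteps; ++step) {
    update_values(valmod); // hypothetical helper applying this step's matrix change

    if (!solve || last_iters > iter_limit)
        solve = std::make_shared<Solver>(std::tie(nsize, ptr, col, valmod), prm);

    // Passing the matrix here makes the residual |A x - f| use the updated A,
    // even while the (possibly stale) preconditioner is reused.
    std::tie(last_iters, error) = (*solve)(std::tie(nsize, ptr, col, valmod), rhs, x);
}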

I have fixed these and updated the example so that it outputs the timings for individual setups and solves, and allows controlling the system size and the iteration limit from the command line:

Here is the output for the 64^3 system, with iter_limit set to 10:

./amgcl_time_dependent_reuse_preconditioner 64 10
0: 6 (6.59412e-07)
1: 33 (6.47397e-07)
2: 3 (1.61887e-07)
3: 4 (4.7899e-07)
4: 5 (5.0403e-07)

[Profile:        2.272 s] (100.00%)
[ self:          0.014 s] (  0.60%)
[  assemble:     0.012 s] (  0.52%)
[  setup:        0.637 s] ( 28.02%)
[    0:          0.480 s] ( 21.12%)
[    2:          0.157 s] (  6.90%)
[  solve:        1.610 s] ( 70.86%)
[    0:          0.174 s] (  7.65%)
[    1:          1.077 s] ( 47.38%)
[    2:          0.102 s] (  4.48%)
[    3:          0.116 s] (  5.10%)
[    4:          0.142 s] (  6.25%)

The number of iterations jumps after the first matrix update, and on the next iteration (step 2) the solver is rebuilt.
After that, the same solver is used for the rest of the iterations and seems to work ok.

We can see if we gained anything by reducing the iteration limit, which forces the solver to be rebuilt on each step:

./amgcl_time_dependent_reuse_preconditioner 64 1
0: 6 (6.59412e-07)
1: 4 (2.25966e-07)
2: 3 (1.61887e-07)
3: 3 (1.24298e-07)
4: 2 (2.67272e-07)

[Profile:        1.641 s] (100.00%)
[ self:          0.013 s] (  0.80%)
[  assemble:     0.013 s] (  0.78%)
[  setup:        1.070 s] ( 65.18%)
[    0:          0.476 s] ( 29.02%)
[    1:          0.218 s] ( 13.31%)
[    2:          0.153 s] (  9.34%)
[    3:          0.116 s] (  7.05%)
[    4:          0.106 s] (  6.46%)
[  solve:        0.546 s] ( 33.24%)
[    0:          0.178 s] ( 10.83%)
[    1:          0.108 s] (  6.58%)
[    2:          0.105 s] (  6.38%)
[    3:          0.090 s] (  5.48%)
[    4:          0.065 s] (  3.98%)

It does not seem like it. The setup + solve without reusing the preconditioner costs 1.070s + 0.546s = 1.616s, and with reuse it is 0.637s + 1.610s = 2.247s. However, the setup time was reduced, and the majority of the solve time was spent on step 1, right after the first update of the matrix; this would potentially matter less in a longer run (more time steps).

We can also try to tune the solution by adjusting the iteration limit, so that the solver is rebuilt after step 0:

./amgcl_time_dependent_reuse_preconditioner 64 5
0: 6 (6.59412e-07)
1: 4 (2.25966e-07)
2: 6 (2.14156e-07)
3: 3 (1.24298e-07)
4: 3 (1.73928e-07)

[Profile:        1.519 s] (100.00%)
[ self:          0.013 s] (  0.85%)
[  assemble:     0.012 s] (  0.79%)
[  setup:        0.827 s] ( 54.43%)
[    0:          0.490 s] ( 32.25%)
[    1:          0.227 s] ( 14.93%)
[    3:          0.110 s] (  7.25%)
[  solve:        0.667 s] ( 43.94%)
[    0:          0.193 s] ( 12.70%)
[    1:          0.118 s] (  7.78%)
[    2:          0.180 s] ( 11.87%)
[    3:          0.079 s] (  5.19%)
[    4:          0.097 s] (  6.40%)

The total time now is 0.827s + 0.667s = 1.494s which is slightly better than the rebuild-every-step variant. I'd say this needs more testing with a more realistic example.

C B

Jan 13, 2021, 6:09:40 PM
to Denis Demidov, amgcl
Hi Denis,
Thank you so much for taking the time to review this.

You are absolutely right in your comments about the code sequence for a real transient / matrix-changing implementation.
My intention with this code snippet was just to factorize the matrix once and see whether the associated precond was OK for getting solutions with slightly different matrices thereafter.
I think that the first matrix is important because it has the main physics of conservation: for internal nodes the sum of the entries on a row/col is zero...
In your demonstration it is clear that the precond from this matrix is not good enough, because it takes 33 iters to converge when the first matrix perturbation is used.
When the second precond is created from the newly perturbed matrix, convergence is good, but this new precond is based on the "more diagonally dominant"
matrix, which in this case happens to resemble the subsequent "perturbations"; in general, though, we won't know the nature of the perturbations, so the starting matrix is all we know.

In my real application the size of the system increases, so the perturbation is the appearance of new rows/cols associated with the additional dofs (~0.02 percent of the total dofs per iteration).
In this case, to create a precond that could be reused, I extend the matrices with a "unit" diagonal representing the "extra" dofs that could be used later on, provided the precond is good enough to converge.
But I am finding that this precond does not lead to quick convergence, similar to what we observed in your example with the original matrix and a perturbed diagonal.

I guess I may be asking too much in expecting the AMG precond to also be good for "extended" matrices.
Using shared memory, the AMGCL code is much faster than the PETSc approach that I was using for large systems:
I see that AMGCL scales almost linearly with the number of dofs, whereas PETSc's time grows much faster than linearly.
But because PETSc has a non-linear solver that offers a good speedup when the system only changes a little, I thought that perhaps AMGCL could be used in a similar fashion.

NOTE: also, because in my case an L2 relative error of ~0.01 is enough, just 1 or 2 AMGCL iterations suffice most of the time, so I thought that reusing the precond could lead to a 4-5 fold gain.
I have sequences of ~20 matrices/rhs that I saved from PETSc (some sequences are small, ~10k dofs, others large, ~5 million dofs), together with a small code to read them and get results with AMGCL. I can send them to you if you would like to review them, in case you think there is an algorithm that could work for transient non-linear cases.

Please let me know if you can think of any other approach to handle matrix changes that only affect a very small fraction of the matrix fill/coeffs.
I thought perhaps moving the delta matrix to the RHS would work:
(A + dA) x = f  =>  A x(i) = f - dA x(i-1), and iterate; but it is unstable.
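
As a side note, this splitting converges only when the perturbation is small relative to A; a minimal sketch of the standard condition:

    % With exact solution (A + dA) x* = f, the error e(i) = x(i) - x* of the
    % iteration A x(i) = f - dA x(i-1) satisfies
    \[
      e^{(i)} = -\left(A^{-1}\,\delta A\right) e^{(i-1)},
      \qquad\text{so it converges iff}\qquad
      \rho\!\left(A^{-1}\,\delta A\right) < 1 .
    \]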

Thanks again for your time!


Denis Demidov

Jan 13, 2021, 11:27:55 PM
to amgcl
On Thursday, January 14, 2021 at 2:09:40 AM UTC+3 cebau...@gmail.com wrote:

NOTE: also, because in my case an L2 relative error of ~0.01 is enough, just 1 or 2 AMGCL iterations suffice most of the time, so I thought that reusing the precond could lead to a 4-5 fold gain.

Maybe you need to focus on reducing the preconditioner cost then. For example, using a simple single-level preconditioner like damped Jacobi or spai0?
 
I have sequences of ~20 matrices/rhs that I saved from PETSc (some sequences are small, ~10k dofs, others large, ~5 million dofs), together with a small code to read them and get results with AMGCL. I can send them to you if you would like to review them, in case you think there is an algorithm that could work for transient non-linear cases.

Please share those. Something like https://wetransfer.com/ works well for large files.
 

Please let me know if you can think of any other approach to handle matrix changes that only affect a very small fraction of the matrix fill/coeffs.
I thought perhaps moving the delta matrix to the RHS would work:
(A + dA) x = f  =>  A x(i) = f - dA x(i-1), and iterate; but it is unstable.

I don't have a lot of ideas. Maybe use a Schur pressure correction solver (FIELDSPLIT in PETSc terms) to separate the initial part of the matrix from the small growing subblock (initially diagonal).
If you look at https://amgcl.readthedocs.io/en/latest/tutorial/Stokes.html, then the initial matrix is "Pressure" and the small changing block is "Flow".

C B

Jan 14, 2021, 8:53:29 PM
to Denis Demidov, amgcl
Hi Denis,

Thank you very much for your tips/recommendations.
I am taking you up on your offer and sending you 6 sets of file sequences.

Each one of the links below points to a 7zip file with a sequence of N binary file sets exported from PETSc;
each set has a matrix, an rhs, and an approximate PETSc solution obtained with a large tolerance. The solution is just for your reference in case you want to compute the error.

For each of the system sets, the initial guess is always a zero vector, because I am solving A * dx = (f - A * x) = rhs, and then updating x_new = x + dx.

The goal is to solve these systems with prm.solver.tol = 0.01-0.05; right now I am using 0.01, but I think I may be able to work with 0.05.
Please see my note below about how I scale these systems.

The last number in each zip filename is the approximate size of the systems included in the file.
Each 7zip archive also contains a text file named neqns.txt which lists the sequence of system ranks,
although this file may have more numbers than the number of sets included in the 7zip.
In any case, you do not need to rely on those numbers for reading; I just included the file to give you an idea of the increasing matrix ranks.
Only in the smallest set did I notice that the ranks can remain constant for a few iterations, but this is rare.

7zip file: ranks ~14000, 39 sets of matrix mat_bin_0##.dat,
rhs_bin_0##.dat, and PETSc solution sol_bin_0##.dat obtained with a
large tolerance:
https://drive.google.com/file/d/1wPtrzCWuiZpWyra8SbVcxAlPp5Caz0AP

7zip file: ranks ~16000, 39 sets, 19 MB
https://drive.google.com/file/d/1QZItE0xI3aWttU0Wgu__Ru5Fc7JCwyBI

7zip file: ranks ~23000, 39 sets, 34 MB
https://drive.google.com/file/d/1yrTK6RAKvQAv99-XJtHyZ_AtcPF-mR9s

7zip file: ranks ~634000, 19 sets, 280 MB
https://drive.google.com/file/d/13kJLc3o5pVyYMiqBC4VvSl2KbCKjrnwc

7zip file: ranks ~758000, 19 sets, 340 MB
https://drive.google.com/file/d/1GNI-UePomUmGNI3rfl5rSx2W4-VJGaxL

7zip file: ranks ~5550000, 6 sets, 1200 MB
https://drive.google.com/file/d/1SJcdlibIJ-eduZti7FJbw597YKgPRct8

Because I am using a large penalty value to impose Dirichlet bcs, after reading the PETSc files, and before solving the system with AMGCL,
I am scaling each row and the corresponding rhs entry by the value of the diagonal, ending up with a unit diagonal in each row (BTW, each row has at most 7 entries).
This row scaling renders the originally symmetric matrix non-symmetric, but AMGCL does not assume symmetry, so it works like a charm.
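
A minimal sketch of this row scaling on the CRS arrays (types and names here are assumptions, not the actual code):

#include <vector>
#include <cstddef>

// Divide each row of a CRS matrix (ptr, col, val) and the matching rhs entry
// by that row's diagonal value, leaving a unit diagonal in every row.
void scale_rows_by_diagonal(std::ptrdiff_t n,
        std::vector<std::ptrdiff_t> const &ptr,
        std::vector<std::ptrdiff_t> const &col,
        std::vector<double> &val, std::vector<double> &rhs)
{
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        double d = 1.0;
        for (std::ptrdiff_t j = ptr[i]; j < ptr[i+1]; ++j)
            if (col[j] == i) { d = val[j]; break; }
        for (std::ptrdiff_t j = ptr[i]; j < ptr[i+1]; ++j)
            val[j] /= d;
        rhs[i] /= d;
    }
}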

I have the source code to read these PETSc binary files, and the associated code to solve the systems with AMGCL;
please let me know if such code snippets would be of any help to you, and I will write some documentation and send them to you.

I hope I have not missed any important details, please let me know if you have any questions.
Looking forward to hearing your expert advice on how to solve these systems minimizing wall time.

Thanks again for your advice!

Cheers!

Denis Demidov

Jan 15, 2021, 9:07:21 AM
to amgcl
Carl,

I don't have experience with PETSc matrices, so yes, please send the format parsing code.



C B

Jan 16, 2021, 3:54:20 AM
to Denis Demidov, amgcl
Hi Denis,
I created a binary matrix following your code mm2bin.cpp,
and for the rhs I just guessed it would be something like:
  precondition(io::write(fr, nrows), "File I/O error.");
  precondition(io::write(fr, rhs),  "File I/O error.");
 
The attached 7zip file has one mat/rhs set to verify if the format is correct.
Please, would you test:
./solver  --binary --matrix mat.bin --rhs rhs.bin  -p solver.tol=1e-2
The solution should be ~ like this:
x[0]=55.324976
x[1]=55.203986
x[2]=52.842242
x[3]=46.848367
.................................
x[2875]=-0.300335
x[2876]=-0.639552
x[2877]=-0.639473

I wanted to test it myself, but my compiled version of "solver" throws an exception whenever I try to use external matrices,
even with the example from the tutorial:
./solver -A poisson3Db.mtx -f poisson3Db_b.mtx precond.relax.type=chebyshev
The same happens with mm2bin, etc., but the funny part is that the other AMGCL code that I also compiled with VS works like a charm!
I am attaching the code in case you want to see the PETSc binary format, but it has many Windows dependencies.

Thanks in advance!


mat_rhs_binary.7z
petscbin2bin.7z

Denis Demidov

Jan 16, 2021, 4:59:33 AM
to amgcl
On Saturday, January 16, 2021 at 11:54:20 AM UTC+3 cebau...@gmail.com wrote:
Hi Denis,
I created a binary matrix following your code mm2bin.cpp,
and for the rhs I just guessed it would be something like:
  precondition(io::write(fr, nrows), "File I/O error.");

There should be
precondition(io::write(fr, ncols), "File I/O error.");
here, with ncols = 1. 
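
So the full RHS write would presumably look like this (a sketch continuing the snippet above, assuming rhs is a std::vector<double> of length nrows and the types follow the surrounding code):

ptrdiff_t nrows = rhs.size(), ncols = 1; // the RHS is stored as a dense n x 1 "matrix"
precondition(io::write(fr, nrows), "File I/O error.");
precondition(io::write(fr, ncols), "File I/O error.");
precondition(io::write(fr, rhs),   "File I/O error.");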

 
  precondition(io::write(fr, rhs),  "File I/O error.");
 
The attached 7zip file has one mat/rhs set to verify if the format is correct.
Please, would you test:
./solver  --binary --matrix mat.bin --rhs rhs.bin  -p solver.tol=1e-2

I am getting an unhandled exception from this as well. It looks like the ptr array is written out using 32-bit integers instead of 64-bit ones (the bin format used in amgcl is not platform-independent). I am correctly reading the size of the matrix to be 2878, but ptr[0], ptr[1], etc. contain garbage like 21474836480, 42949672960, ...
It looks like you are compiling your code in 32-bit mode.


 
The solution should be ~ like this:
x[0]=55.324976
x[1]=55.203986
x[2]=52.842242
x[3]=46.848367
.................................
x[2875]=-0.300335
x[2876]=-0.639552
x[2877]=-0.639473

I wanted to test it myself, but my compiled version of "solver" has an exception whenever I try to use external matrices, 
even with the example from the tutorial
./solver  -A  poisson3Db.mtx -f  poisson3Db_b.mtx   precond.relax.type=chebyshev

In this specific case, there should not be any problems with integer type size (the matrices are in text format), so I am not sure what could be happening here. By the way, if you have Docker installed on your system, you could also run the amgcl examples with

$ docker run -ti -v $PWD:/data amgcl/examples solver -A poisson3Db.mtx -f poisson3Db_b.mtx

 
Similarly mm2bin, etc, but the funny part is that the other AMGCL code that I also compiled with VS works like a charm !
I am attaching the code, in case you want to see the PETSc binary format, but it has many windows dependencies.

You could also write the matrices into Matrix Market format, which should be platform-independent. You can do it with

amgcl::io::mm_write("A.mtx", std::tie(n, ptr, col, val));

for the system matrix, and

amgcl::io::mm_write("b.mtx", rhs.data(), n, 1);

for the RHS. See the definition for these here:
https://github.com/ddemidov/amgcl/blob/master/amgcl/io/mm.hpp

C B

Jan 16, 2021, 7:56:34 PM
to Denis Demidov, amgcl
Denis,
Thank you very much for your suggestions and pointers!

I created a VM and was able to build everything on CentOS quite easily! :)
Then I went back and tried again on Windows: the "solver" tool works well with the fixed/generated AMGCL .bin files, but not with the MTX files... an exception occurs.
However, I am mainly interested in testing/working with AMGCL bin files, so I am happy!

With your info on how to write AMGCL's binary RHS, everything now works well.
I tested the generated AMGCL binary files with "solver", comparing the first/last few entries of the solution, and everything is fine :).
I did not create MTX files because on Windows VS refused to compile this line (the template could not be resolved):
amgcl::io::mm_write("b.mtx", rhs.data(), n, 1);  
however this line compiles fine:
amgcl::io::mm_write("b.mtx", rhs.data(), nrows, 1);

Here are the sets of matrices/rhs that I would like to solve with a tolerance of ~0.01 in the least wall time, always starting from an initial x = 0.

Rank ~14000, 39 sets, 7 MB

Rank ~16000, 39 sets, 17 MB

Rank ~23000, 39 sets, 33 MB

Rank ~634000, 19 sets, 210 MB

Rank ~758000, 19 sets, 260 MB

Rank ~5550000, 6 sets, 1100 MB

Please let me know if you find any problems with these.
Thanks again !
Cheers.

petscbin2amgclbin.7z

Denis Demidov

Jan 17, 2021, 1:42:56 AM
to amgcl
I tried to look at the performance gains of reusing the preconditioner on the example of the rank-634000 set.
The code may be found here (it should work with other sets without changes):
https://gist.github.com/ddemidov/c662da0409a89243bf245c93fb731cc5

From this plot, it is obvious there is a sweet spot at iter_limit = 8, but the speedup w.r.t. iter_limit = 0 is not high.
I also tried the flat preconditioner variant (single-level relaxation instead of the full AMG hierarchy), but the increased solve time negated any gains in setup speed.
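
For reference, a single-level ("flat") variant can be expressed with amgcl's relaxation-as-preconditioner wrapper; a minimal sketch (not the exact code used in the test above):

#include <amgcl/backend/builtin.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/solver/bicgstab.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/relaxation/as_preconditioner.hpp>

typedef amgcl::backend::builtin<double> Backend;

// The spai0 relaxation is used directly as the preconditioner,
// so there is no AMG hierarchy to build on each rebuild.
typedef amgcl::make_solver<
    amgcl::relaxation::as_preconditioner<Backend, amgcl::relaxation::spai0>,
    amgcl::solver::bicgstab<Backend>
    > FlatSolver;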

Denis Demidov

Jan 17, 2021, 1:58:15 AM
to amgcl
And of course, this would change if you switch to a GPGPU backend. The solve there is much cheaper than the setup, so one can afford to reuse a preconditioner for longer spans.

Here is a quick test comparing the OpenMP backend:

solver -B -A mat_bin_003.bin -f rhs_bin_003.bin solver.tol=1e-2
634526
0 3 ... 3770994 3770996
Solver
======
Type:             BiCGStab
Unknowns:         634526
Memory footprint: 33.89 M

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.45
Grid complexity:     1.15
Memory footprint:    184.26 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       634526        3770996    145.13 M (69.07%)
    1        87090        1539418     35.49 M (28.19%)
    2         6717         138171      2.97 M ( 2.53%)
    3          627          11371    678.85 K ( 0.21%)

Iterations: 3
Error:      0.00959578

[Profile:       0.574 s] (100.00%)
[ self:         0.004 s] (  0.61%)
[  reading:     0.043 s] (  7.58%)
[  setup:       0.295 s] ( 51.47%)
[  solve:       0.231 s] ( 40.33%)

with the CUDA one:

solver_cuda -B -A mat_bin_003.bin -f rhs_bin_003.bin solver.tol=1e-2
GeForce GTX 1050 Ti

Solver
======
Type:             BiCGStab
Unknowns:         634526
Memory footprint: 33.89 M

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.45
Grid complexity:     1.15
Memory footprint:    140.97 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       634526        3770996    111.10 M (69.07%)
    1        87090        1539418     26.94 M (28.19%)
    2         6717         138171      2.25 M ( 2.53%)
    3          627          11371    688.64 K ( 0.21%)

Iterations: 3
Error:      0.00959578

[Profile:       0.822 s] (100.00%)
[ self:         0.278 s] ( 33.89%)
[  reading:     0.042 s] (  5.17%)
[  setup:       0.434 s] ( 52.86%)
[  solve:       0.066 s] (  8.08%)
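
For reference, switching the example to the CUDA backend is mostly a change of the backend typedef plus a cusparse handle passed through the backend parameters; a hedged sketch along the lines of the amgcl CUDA examples (details here are assumptions):

#include <amgcl/backend/cuda.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/bicgstab.hpp>

typedef amgcl::backend::cuda<double> Backend;

typedef amgcl::make_solver<
    amgcl::amg<Backend, amgcl::coarsening::smoothed_aggregation, amgcl::relaxation::spai0>,
    amgcl::solver::bicgstab<Backend>
    > Solver;

// The setup still runs on the CPU; the per-iteration work runs on the GPU.
// Backend::params bprm;
// cusparseCreate(&bprm.cusparse_handle);
// Solver solve(std::tie(n, ptr, col, val), prm, bprm);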


Denis Demidov

Jan 17, 2021, 2:55:47 AM
to amgcl

I did not create MTX files because on Windows VS refused to compile this line, the template could not be resolved...:
amgcl::io::mm_write("b.mtx", rhs.data(), n, 1);  

That is strange, I've added a unit test for the mm io in amgcl (https://github.com/ddemidov/amgcl/blob/master/tests/test_io.cpp), and it seems to work on Windows with VS (search for 'test_io' on the page):
https://ci.appveyor.com/project/ddemidov/amgcl/builds/37303423#L118
 

Denis Demidov

Jan 17, 2021, 3:48:34 AM
to amgcl
Looks like enabling spectral radius estimation (these lines) in the smoothed aggregation coarsening allows the preconditioner to stay effective for longer periods. See the updated results here:
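
For reference, a sketch of the coarsening parameters this refers to (parameter names as in amgcl::coarsening::smoothed_aggregation::params; treat the exact values as assumptions):

Solver::params prm;
prm.precond.coarsening.estimate_spectral_radius = true; // estimate the spectral radius per level
prm.precond.coarsening.power_iters = 5;                 // power iterations used for the estimate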


C B

Jan 18, 2021, 3:33:16 AM
to Denis Demidov, amgcl
Denis,
Thank you so much for taking the time to analyze this case and provide us with very clear examples.

I took your code and ran the same set as you did, as well as the other two larger sets.
I compared the .log files between your run and mine on my computer, and they are similar... for example:
image.png

I was very excited because this set was giving a 35% speedup on my computer, but the larger sets do not show the same speedup, as we can see below:
image.png
image.png

I don't know if this is related to parallel performance (I did not set OMP_NUM_THREADS, and I can see that in all cases the CPU % is very high), or if it is just because the matrices have different properties.

I also followed your advice on testing different solvers and preconditioners using your solver application,
and for the Rank:634000 and Rank:5550000 sets gmres is faster than bicgstab...
$  sh  compare_solver_precond.sh

Testing Rank:634000  Tole:5e-2  Solver:bicgstab  Precond:spai0
Num mat-rhs 19: Total Wall 16.119 seconds, Ave Wall 0.848368 seconds
Maximum L2 relat error: 0.0495827
Solver Max memory used: 33.89M
Precon Max memory used: 184.28M
===================== 000 ===========================

Testing Rank:634000  Tole:5e-2  Solver:gmres  Precond:spai0
Num mat-rhs 19: Total Wall 13.634 seconds, Ave Wall 0.717579 seconds
Maximum L2 relat error: 0.0452648
Solver Max memory used: 154.92M
Precon Max memory used: 184.28M
===================== 000 ===========================

Testing Rank:758000  Tole:5e-2  Solver:bicgstab  Precond:spai0
Num mat-rhs 19: Total Wall 14.613 seconds, Ave Wall 0.769105 seconds
Maximum L2 relat error: 0.048981
Solver Max memory used: 40.48M
Precon Max memory used: 242.27M
===================== 000 ===========================

Testing Rank:758000  Tole:5e-2  Solver:gmres  Precond:spai0
Num mat-rhs 19: Total Wall 15.666 seconds, Ave Wall 0.824526 seconds
Maximum L2 relat error: 0.0479046
Solver Max memory used: 185.08M
Precon Max memory used: 242.27M
===================== 000 ===========================

Testing Rank:5550000  Tole:5e-2  Solver:bicgstab  Precond:spai0
Num mat-rhs 19: Total Wall 253.193 seconds, Ave Wall 13.3259 seconds
Maximum L2 relat error: 0.0493905
Solver Max memory used: 297.67M
Precon Max memory used: 1.83G
Program Peak Memory used: 3.2 GB
===================== 000 ===========================

Testing Rank:5550000  Tole:5e-2  Solver:gmres  Precond:spai0
Num mat-rhs 19: Total Wall 218.838 seconds, Ave Wall 11.5178 seconds
Maximum L2 relat error: 0.0496392
Solver Max memory used: 1.33G
Precon Max memory used: 1.83G
Program Peak Memory used: 4.0 GB
===================== 000 ===========================

Perhaps I am splitting hairs, and I should not expect that one will always be faster than the other for this class of matrices :).

The compilation error in mm_write seems to come from a typedef (value_type); high-level template knowledge is needed to figure this out. I am attaching the error in case you want to look at it.

This has been a very interesting exercise, and I must say your AMGCL is a great tool.

Cheers!


compare_solver_precond.7z
VS2017-compilation-error-of-mm_write.7z

Denis Demidov

Jan 18, 2021, 3:37:28 AM
to amgcl
Re the compilation error: it looks like you forgot to include <amgcl/adapter/crs_tuple.hpp>?

Denis Demidov

Jan 18, 2021, 3:40:14 AM
to amgcl
Re larger sets: the matrices I have in the largest (matrix-size-wise) set all have different sizes, so the preconditioner rebuild is triggered on every step for me. Could this be the problem in your example as well?

C B

Jan 18, 2021, 9:49:07 AM
to Denis Demidov, amgcl
Denis, you are absolutely right, the include fixed the problem :)
Thanks!

C B

Jan 18, 2021, 5:42:59 PM
to Denis Demidov, amgcl
Denis, yes, an increasing matrix size from iteration to iteration happens most of the time; actually, staying constant for a few iterations like in the 634k set is rare.
image.png

When I tried to reuse the precond, the matrix for the precond was extended with a diagonal of "1" and an rhs of "0" for approximately 0.1% of the actual size; this extended size was saved as, say, NEXT, and then each subsequent matrix with a size <= NEXT was extended to NEXT in the same way. If the new matrix size was > NEXT, this was another criterion to trigger a precond reset. However, in my implementation with regular bicgstab / spai0 I did not succeed in reducing the wall time. But now I am going back to implement some details, for example,
computing and resetting maxiter so that we never end up doing too many iters while reusing the precond.
If time_setup and time_solve are the timings from the last precond rebuild, we can limit the number of iters as:
    iters_max_local = (time_setup + time_solve) / time_solve_per_iter + 0.5;
    prm.solver.maxiter = iters_max_local; // in case the next iterations are done without a precond rebuild

But now, how can we update the values of prm in "solve", which was built with a call like this:
solve = std::make_shared<Solver>(std::tie(nsize, ptr, col, val), prm);
Or otherwise, how can I pass prm in the call to solve:
(*solve)(std::tie(nsolve, ptr, col, val), rhs, sol);

Also, I have another question: how can I call the destructor of *solve?
When convergence is not reached within iters_max_local, I want to delete *solve.

Thanks in advance.
Cheers

Denis Demidov

Jan 19, 2021, 12:48:19 AM
to amgcl
On Tuesday, January 19, 2021 at 1:42:59 AM UTC+3 cebau...@gmail.com wrote:

But now how can we update the values of prm in "solve" which had been built with a call like this:
solve = std::make_shared<Solver>(std::tie(nsize, ptr, col, val), prm);
Or otherwise how can I pass prm in the call to solve:
(*solve)(std::tie(nsolve, ptr, col, val),rhs, sol);

I think something like 

solve->prm.maxiter = 10;
 
should work.


Also, I have another question, how can I call the destructor of *solve ?
When convergence is not reach in  iters_max_local, I want to delete *solve.

solve.reset() should reset the pointer (call the destructor).
But the solver will be destroyed anyway, either on the next iteration when you assign a new value to it,
or when it goes out of scope (at the return from the function).
 



C B

Jan 19, 2021, 3:34:58 PM
to Denis Demidov, amgcl
Denis,
Thanks for these pointers.
I could not get the maxiter value to reset. I tried it in one of the tutorial examples, as shown below; it compiles fine, but the new value is not used.
    Solver::params prm;
    prm.solver.tol = 1e-6;
    prm.solver.maxiter = 50;

    Solver solve(std::tie(n, ptr, col, val), prm);
    
    //solve.prm.maxiter = 2;  // compile error
    solve.prm.solver.maxiter = 2;
    std::tie(iters, error) = solve(rhs, x);
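
A likely explanation (an assumption about amgcl internals, not confirmed here): the nested iterative solver copies prm.solver at construction, so writing to solve.prm afterwards only changes the stored copy, not the solver that actually runs. One hedged workaround is to keep the iteration-limit decision outside the solver:

// Sketch only: construct once with a generous maxiter, then decide externally
// whether the reused preconditioner is still worth keeping.
prm.solver.maxiter = 100; // upper bound, fixed before construction
auto solve = std::make_shared<Solver>(std::tie(n, ptr, col, val), prm);

std::tie(iters, error) = (*solve)(std::tie(n, ptr, col, val), rhs, x);
if (iters > iters_max_local) {
    // The old preconditioner is no longer effective: drop it and rebuild.
    solve = std::make_shared<Solver>(std::tie(n, ptr, col, val), prm);
}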
  
In the meantime, I did a few tests on the 5+ million size set, and I am starting to think that any gain I get from doing this is going to be marginal at best.
I guess my conclusion is that AMGCL is already such an optimized approach that it is hard to save much time by tweaking/customizing what is already coded!
Cheers,
