AMGCL's solvers tests - Preconditioner options mapped to coding options


C B

Jul 8, 2021, 11:12:48 AM
to amgcl
Hi Denis,

I am trying to mimic these two options:
    1) amgcl::relaxation::as_preconditioner<PrecBackend, amgcl::relaxation::spai0>
    2) amgcl::preconditioner::dummy<PrecBackend>

using the solvers example. However, when I try these options:
-1 -p precond.type=spai0
-1 -p precond=spai0
-1 -p precond=dummy

they all produce the same residual and number of iterations, so I suppose they are all interpreted the same way. I was thinking that if the matrix does not have a unit diagonal, spai0 should give different results than dummy, but I am not sure...

Thanks!

Denis Demidov

Jul 8, 2021, 12:25:48 PM
to amgcl
Hi Carl,

1) should be -1 -p precond.type=spai0 (or -p precond.class=relaxation precond.type=spai0)
2) should be -p precond.class=dummy

Denis Demidov

Jul 9, 2021, 1:13:06 AM
to amgcl


---------- Forwarded message ---------
From: C B <cebau...@gmail.com>
Date: Fri, Jul 9, 2021 at 2:03 AM
Subject: Re: AMGCL's solvers tests - Preconditioner options mapped to coding options
To: Denis Demidov <dennis....@gmail.com>


Hi Denis,
Thank you so much for this information, I am so fortunate to get your help all the time :).

I look into other libraries to find out if there is anything out there that can compete with AMGCL,
and I must say that from what I have seen, for general sparse systems on regular workstations, AMGCL is way better than anything else that I have tried.
So kudos to you, I am baffled because I can't understand why people continue to use other libraries that have much lower performance than AMGCL.

With that said, and because I am interested in just raw speed with the simplest solver to reduce the residual just by 0.1 or 0.05,
I tested ViennaCL and I got these values:
[attached image: image.png]

With my limited testing I could not get ViennaCL to do any AMG, etc. My first impression is that it is not robust, but perhaps that is because I did not spend enough time on it...
But when it comes to the simplest case that I want to solve,
I see that on my old laptop the performances are comparable,
but then on the newer GPU the wall time reduction is much more pronounced with ViennaCL,
and I think that I finally understood what you told me before about the 1st/2nd run,
because I see that for the second solve within the same executing process, with the same number of iterations and the same residual,
the measured time decreases considerably. I guess it is because the compilation is already done, the library is initialized, etc...

I was wondering if you know what is going on, and why ViennaCL gets so much better performance on this AMD card.
Of course this affects me because of my particular use case, I bet this is not an issue when AMG is used, but I just wanted to know your take on this.
Regards,







--
Cheers,
Denis

Denis Demidov

Jul 9, 2021, 1:25:11 AM
to amgcl
Carl,

Since you are not mentioning the preconditioner, I am assuming you are testing unpreconditioned solvers here. The ViennaCL manual says they use pipelined versions of the solvers when no preconditioner is used.

Also, the dummy preconditioner in amgcl is not completely cost-free: it models the identity matrix, and its cost is equivalent to a single vector copy. It is mostly there to test the convergence of the unpreconditioned solver. Maybe the unpreconditioned version of pipelined BiCGStab from ViennaCL simply works best for your problem.

Finally, there is a ViennaCL backend implemented in amgcl.
Below are some comparisons using the CUDA, VexCL, and ViennaCL backends.

$ solver_cuda -n 128
NVIDIA GeForce GTX 1050 Ti

Solver
======
Type:             BiCGStab
Unknowns:         2097152
Memory footprint: 112.00 M

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.62
Grid complexity:     1.13
Memory footprint:    567.74 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0      2097152       14581760    422.42 M (61.61%)
    1       263552        7918340    127.63 M (33.46%)
    2        16128        1114704     15.07 M ( 4.71%)
    3          789          53055      2.62 M ( 0.22%)

Iterations: 10
Error:      2.50965e-09

[Profile:          2.141 s] (100.00%)
[ self:            0.255 s] ( 11.90%)
[  assembling:     0.158 s] (  7.36%)
[  setup:          1.168 s] ( 54.56%)
[  solve:          0.561 s] ( 26.19%)

$ solver_vexcl_cl -n 128
1. NVIDIA GeForce GTX 1050 Ti (NVIDIA CUDA)

Solver
======
Type:             BiCGStab
Unknowns:         2097152
Memory footprint: 112.00 M

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.62
Grid complexity:     1.13
Memory footprint:    744.74 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0      2097152       14581760    553.22 M (61.61%)
    1       263552        7918340    168.88 M (33.46%)
    2        16128        1114704     20.01 M ( 4.71%)
    3          789          53055      2.62 M ( 0.22%)

Iterations: 10
Error:      2.50965e-09

[Profile:          2.198 s] (100.00%)
[ self:            0.097 s] (  4.42%)
[  assembling:     0.119 s] (  5.41%)
[  setup:          1.421 s] ( 64.62%)
[  solve:          0.562 s] ( 25.54%)

$ solver_viennacl -n 128
NVIDIA GeForce GTX 1050 Ti (NVIDIA Corporation)

Solver
======
Type:             BiCGStab
Unknowns:         2097152
Memory footprint: 0.00 B

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.62
Grid complexity:     1.13
Memory footprint:    2.60 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0      2097152       14581760      0.00 B (61.61%)
    1       263552        7918340      0.00 B (33.46%)
    2        16128        1114704      0.00 B ( 4.71%)
    3          789          53055      2.60 M ( 0.22%)

Iterations: 10
Error:      2.50965e-09

[Profile:          2.420 s] (100.00%)
[ self:            0.102 s] (  4.21%)
[  assembling:     0.122 s] (  5.04%)
[  setup:          1.321 s] ( 54.60%)
[  solve:          0.875 s] ( 36.15%)

C B

Jul 9, 2021, 5:40:18 AM
to Denis Demidov, amgcl
Denis,

Thank you so much for your insights.

Yes, as you guessed, I am comparing the unpreconditioned solvers, and now I understand what you meant when you mentioned pipelined solvers a few days ago :).
The pipelined approach seems to make a big difference for the atypical requirements that I am dealing with.

Thank you again for your help, it is very much appreciated.
Cheers,

