AMGCL's CUDA performance is great - can we get even more?


C B

May 16, 2021, 3:52:12 PM
to amgcl
Hello Denis and all!

I hope this email finds you well,
I am very happy with AMGCL's GPU performance, and I wonder whether we can get even more! :)

AMGCL's performance on GPUs is great: even with a low-end GPU (GTX 1060) I am getting much better performance than on the CPU, using BiCGStab without AMG, because I only need to reduce the residual by one order of magnitude.
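For reference, the setup described here (BiCGStab with the identity preconditioner, stopping once the residual has dropped by one order of magnitude) can be sketched in self-contained C++. This is a toy dense implementation for illustration only, not AMGCL code; the names `dot`, `matvec` and `bicgstab` are hypothetical. In AMGCL the stopping criterion corresponds to the relative tolerance of the iterative solver (`prm.solver.tol`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // dense row-major storage, fine for a toy example

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// Unpreconditioned BiCGStab (the preconditioner is the identity).
// Stops as soon as ||b - A x|| <= tol * ||b||; with x = 0 initially,
// tol = 0.1 means "reduce the residual by one order of magnitude".
// Returns the number of iterations performed (maxiter if not converged).
int bicgstab(const Mat& A, const Vec& b, Vec& x, double tol, int maxiter) {
    const std::size_t n = b.size();
    assert(A.size() == n && x.size() == n);
    Vec r = b;
    const Vec Ax = matvec(A, x);
    for (std::size_t i = 0; i < n; ++i) r[i] -= Ax[i];
    const Vec rhat = r;                       // fixed shadow residual
    const double stop = tol * std::sqrt(dot(b, b));
    Vec p(n, 0.0), v(n, 0.0), s(n, 0.0), t(n, 0.0);
    double rho = 1.0, alpha = 1.0, omega = 1.0;
    for (int it = 0; it < maxiter; ++it) {
        if (std::sqrt(dot(r, r)) <= stop) return it;
        const double rho_new = dot(rhat, r);
        const double beta = (rho_new / rho) * (alpha / omega);
        rho = rho_new;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * (p[i] - omega * v[i]);
        v = matvec(A, p);
        alpha = rho / dot(rhat, v);
        for (std::size_t i = 0; i < n; ++i) s[i] = r[i] - alpha * v[i];
        if (std::sqrt(dot(s, s)) <= stop) {   // converged after the half step
            for (std::size_t i = 0; i < n; ++i) x[i] += alpha * p[i];
            return it + 1;
        }
        t = matvec(A, s);
        omega = dot(t, s) / dot(t, t);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i] + omega * s[i];
            r[i] = s[i] - omega * t[i];
        }
    }
    return maxiter;
}
```

With a loose tolerance like 0.1, only a few Krylov iterations are typically needed, so the cost of building any preconditioner (AMG or otherwise) is harder to amortize.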

Case 1 (summed over a very large number of setup/solve cycles):
CPU setup + solve wall time = 1114 + 2476 s
GPU setup + solve wall time = 1639 + 660 s   => saves 1291 seconds!!!

Case 2 (summed over a very large number of setup/solve cycles):
CPU setup + solve wall time = 1653 + 4808 s
GPU setup + solve wall time = 1180 + 744 s   => saves 4537 seconds!!!
(The setup times are not flipped; I double-checked them. In any case, the GPU wall times are much lower, even though Windows shows less than 5% CUDA usage on the GPU. This may be because the code does other work besides the AMGCL solves. I always verify the results against the CPU version.)
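To double-check the arithmetic in the two cases above, here is a small C++ helper that computes the total wall time saved and the overall speedup from the reported setup and solve times. The `Timing` struct and the function names are illustrative only.

```cpp
#include <cassert>

// Wall-clock times in seconds, summed over many setup/solve cycles.
struct Timing {
    double setup;
    double solve;
    double total() const { return setup + solve; }
};

// Seconds saved by the GPU run relative to the CPU run.
double saved(const Timing& cpu, const Timing& gpu) {
    return cpu.total() - gpu.total();
}

// Overall speedup of the GPU run (> 1 means the GPU run is faster).
double speedup(const Timing& cpu, const Timing& gpu) {
    assert(gpu.total() > 0.0);
    return cpu.total() / gpu.total();
}
```

With the numbers above, the GPU saves 1291 s in Case 1 and 4537 s in Case 2, an overall speedup of about 1.56x and 3.36x; the solve phase alone is about 3.75x and 6.46x faster, while the setup phase is slower on the GPU in Case 1.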

I can send you AMGCL's profiler output if you would like to see it, but I also wonder whether there are any CUDA settings that end users can try.

It would be great if you (Denis) recorded some videos on AMGCL and posted them on YouTube; who knows, if AMGCL takes off you could make a good income :)).

Thanks in advance!
Regards,

Denis Demidov

May 17, 2021, 12:13:45 AM
to amgcl
Hi Carl,

I don't remember: did you try the preconditioner reuse strategy with the CUDA backend?
Also, a possibility was recently added to partially reuse an already constructed AMG hierarchy for another matrix of the same size:


It keeps the transfer operators P (prolongation) and R (restriction) and updates the hierarchy by recomputing the Galerkin coarse operator RAP with the new matrix A. In my experiments, this is about 40-50% faster than a full rebuild.
You can enable this by setting prm.precond.allow_rebuild = true and then calling
solver.precond().rebuild(A, bprm), where A is the new matrix (in the same format as the one passed to the initial constructor) and bprm holds the backend parameters.
Since your matrices sometimes change size, you will still need a full rebuild whenever that happens.
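To make the partial rebuild concrete, here is a toy two-level sketch in self-contained C++. It uses dense matrices and a hypothetical `TwoLevel` struct; it is not AMGCL's implementation, which is sparse and multi-level. The transfer operators are kept from the initial setup (with the common choice R = P^T), and a rebuild only recomputes the coarse operator Ac = R A P; a size mismatch signals that a full setup is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Mat = std::vector<std::vector<double>>; // dense, row-major (toy example)

Mat multiply(const Mat& X, const Mat& Y) {
    assert(!X.empty() && !Y.empty() && X[0].size() == Y.size());
    Mat Z(X.size(), std::vector<double>(Y[0].size(), 0.0));
    for (std::size_t i = 0; i < X.size(); ++i)
        for (std::size_t k = 0; k < Y.size(); ++k)
            for (std::size_t j = 0; j < Y[0].size(); ++j)
                Z[i][j] += X[i][k] * Y[k][j];
    return Z;
}

Mat transpose(const Mat& X) {
    Mat T(X[0].size(), std::vector<double>(X.size()));
    for (std::size_t i = 0; i < X.size(); ++i)
        for (std::size_t j = 0; j < X[0].size(); ++j) T[j][i] = X[i][j];
    return T;
}

// Hypothetical two-level hierarchy. In real AMG, P comes from the coarsening
// step of the full setup; here it is simply handed in.
struct TwoLevel {
    Mat P, R, Ac;
    explicit TwoLevel(Mat prolongation)
        : P(std::move(prolongation)), R(transpose(P)) {}

    // Partial rebuild: keep P and R, recompute only the Galerkin product
    // Ac = R * A * P. Returns false when the size of A no longer matches P,
    // i.e. when a full setup (new P and R) is required instead.
    bool rebuild(const Mat& A) {
        if (A.size() != P.size()) return false;
        Ac = multiply(R, multiply(A, P));
        return true;
    }
};
```

For example, with a 4-point 1D Laplacian and two aggregates of two points each (P = [[1,0],[1,0],[0,1],[0,1]]), `rebuild` gives Ac = [[2,-1],[-1,2]]. Skipping the construction of P and R is presumably where the savings of a partial rebuild come from.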

C B

May 22, 2021, 5:47:28 PM
to Denis Demidov, amgcl
Hi Denis,
Thank you very much for your guidance. Sorry it took me a while to get back to you on this thread.
I did try reusing the preconditioner early on, when you showed me how, but only with the OpenMP (CPU) backend.
On the CPU I did not get an advantage because my matrices keep growing in size. I tried working with a matrix padded with "additional" ones on the diagonal, so that it could be reused later when the system became larger, but I could not make this competitive with rebuilding the preconditioner.
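The padding idea described above can be sketched as follows: a hypothetical minimal CSR struct (not AMGCL's matrix type) and a helper `pad_with_identity` that embeds an n-by-n matrix into an N-by-N one by appending rows that contain only a 1 on the diagonal. With zero right-hand-side entries in those rows, the padded unknowns are decoupled from the real ones and the exact solution is zero there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR matrix (hypothetical struct for illustration).
struct CSR {
    std::size_t n = 0;                 // number of rows (square matrix)
    std::vector<std::size_t> ptr{0};   // row pointers, size n + 1
    std::vector<std::size_t> col;      // column indices
    std::vector<double> val;           // nonzero values
};

// Embed an n-by-n matrix into an N-by-N one (N >= n): each extra row gets
// a single 1 on the diagonal, so the padded unknowns decouple (x_i = b_i).
CSR pad_with_identity(const CSR& A, std::size_t N) {
    assert(N >= A.n && A.ptr.size() == A.n + 1);
    CSR B = A;
    B.n = N;
    for (std::size_t i = A.n; i < N; ++i) {
        B.col.push_back(i);
        B.val.push_back(1.0);
        B.ptr.push_back(B.col.size());
    }
    return B;
}
```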
You have a point: if I were using AMG for solving on the GPU, reusing the preconditioner would likely pay off because the solve phase is so fast :). I hope I am understanding your suggestion correctly. This would definitely make sense for typical problems where the residual has to be reduced by several orders of magnitude, but maybe not in my current case, where I only need a relative residual of 0.1 and therefore do not use AMG. And here comes the best part:

When you pointed me to the documentation, I asked myself why I was using spai0 at all; perhaps I could save time by not using any preconditioner (I take care to start from a symmetric positive definite matrix with all 1s on the diagonal).
Before, I was using:
    typedef amgcl::make_solver<
        amgcl::relaxation::as_preconditioner<PrecBackend, amgcl::relaxation::spai0>,
        amgcl::solver::bicgstab<SolvBackend>
        > Solver;

And I changed to:
    typedef amgcl::make_solver<
        amgcl::preconditioner::dummy<PrecBackend>,
        amgcl::solver::bicgstab<SolvBackend>
        > Solver;


and now I am saving ~20% of wall time in the solve :)), and that is just with OpenMP; I have not had a chance to try it on a GPU yet.
I will try on a GPU next weekend.
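A possible reason why dropping spai0 costs so little here: assuming AMGCL's spai0 follows the standard SPAI(0) definition, the preconditioner is a diagonal matrix M with m_i = a_ii / sum_j a_ij^2 (row-wise). For a matrix with unit diagonal this gives m_i = 1 / (1 + sum_{j != i} a_ij^2), so M is close to the identity when the off-diagonal entries are small, while computing it still needs a pass over the matrix at setup. The hypothetical helper `spai0_weights` computes these weights from plain CSR arrays (not AMGCL types).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// SPAI(0) weights: m_i = a_ii / sum_j a_ij^2, computed row by row from
// CSR arrays (ptr has n + 1 entries). Applying M is an elementwise scaling.
std::vector<double> spai0_weights(const std::vector<std::size_t>& ptr,
                                  const std::vector<std::size_t>& col,
                                  const std::vector<double>& val) {
    assert(!ptr.empty());
    const std::size_t n = ptr.size() - 1;
    std::vector<double> m(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double diag = 0.0, norm2 = 0.0;
        for (std::size_t k = ptr[i]; k < ptr[i + 1]; ++k) {
            if (col[k] == i) diag = val[k];  // diagonal entry a_ii
            norm2 += val[k] * val[k];        // squared row norm
        }
        assert(norm2 > 0.0);                 // empty rows are not allowed
        m[i] = diag / norm2;
    }
    return m;
}
```

For the 2-by-2 matrix [[1, -0.5], [-0.5, 1]], both weights are 1 / 1.25 = 0.8; for the identity matrix they are exactly 1, i.e. spai0 reduces to the dummy preconditioner.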
I have more questions about GPUs that I will send in a separate thread.
Thanks again,
Cheers
