Hi deal.II community and developers,
Thank you for all the help I have received from you so far.
I'm developing an MPI-parallel solver for vector-valued nonlinear PDEs. The approach I chose is a damped Newton iteration, using the AMG preconditioner support from Trilinos 12.18 for the linear solves. Currently the program is CPU-only code. It works, and it is fast enough for practical purposes.
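To make this concrete, the linear solve inside each Newton step currently looks roughly like the simplified sketch below (the function name solve_newton_update and the GMRES/AMG settings are placeholders rather than my actual code):

#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/solver_gmres.h>
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

using namespace dealii;

// Solve the linearized system for one Newton update, preconditioned with
// Trilinos ML/AMG. Everything here lives on the CPU.
void solve_newton_update(const TrilinosWrappers::SparseMatrix &system_matrix,
                         TrilinosWrappers::MPI::Vector        &newton_update,
                         const TrilinosWrappers::MPI::Vector  &residual)
{
  SolverControl solver_control(1000, 1e-8 * residual.l2_norm());
  SolverGMRES<TrilinosWrappers::MPI::Vector> solver(solver_control);

  TrilinosWrappers::PreconditionAMG                 preconditioner;
  TrilinosWrappers::PreconditionAMG::AdditionalData amg_data;
  amg_data.elliptic = true; // illustrative setting; the real ones depend on the PDE
  preconditioner.initialize(system_matrix, amg_data);

  solver.solve(system_matrix, newton_update, residual, preconditioner);
  // Afterwards the update is applied with a damping factor alpha:
  //   solution.add(alpha, newton_update);
}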
However, as the subject suggests, I keep wondering whether I could take advantage of GPUs to make my code faster or cheaper. The clusters I am using have NVIDIA Ampere A100 and AMD MI215 GPU nodes, and it would be a big waste not to use them.
I know deal.II has CUDA wrappers and Kokkos support. What I don't know is how to *properly* use them to speed up matrix-based Newton iteration code. Is there a *suggested pattern* for GPU programming in deal.II? A naive idea in my mind is to use the GPU for system-matrix assembly, but I didn't see this in the only CUDA tutorial, i.e. step-64, which is matrix-free (a rough sketch of the kind of offloading I can currently picture follows after this paragraph). As is often the case with a mature, well-established library, there are probably unwritten conventions I am not aware of, so I wonder: what are the suggested ways in deal.II for using GPUs in matrix-based code?
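For reference, here is the kind of "naive" offloading I could picture for a matrix-based code: keep the host assembly as it is, mirror the assembled matrix on the device, and run the Krylov iterations there. This assumes the CUDAWrappers interface of a CUDA-enabled deal.II build; the function name solve_on_device is a placeholder, the sketch is serial (it ignores the MPI distribution), and it has no real preconditioner, since as far as I can tell no AMG is available on the device through these wrappers. That gap is exactly what my question is about, so please treat this as an illustration rather than working code:

#include <deal.II/base/cuda.h>
#include <deal.II/lac/cuda_sparse_matrix.h>
#include <deal.II/lac/cuda_vector.h>
#include <deal.II/lac/precondition.h>
#include <deal.II/lac/read_write_vector.h>
#include <deal.II/lac/solver_cg.h>
#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

#include <algorithm>

using namespace dealii;

void solve_on_device(const SparseMatrix<double> &host_matrix,
                     const Vector<double>       &host_rhs,
                     Vector<double>             &host_solution)
{
  Utilities::CUDA::Handle cuda_handle;

  // Mirror the already-assembled host matrix on the GPU (cuSPARSE underneath).
  CUDAWrappers::SparseMatrix<double> matrix_dev(cuda_handle, host_matrix);

  // Move the right-hand side over via a ReadWriteVector, as in step-64.
  const unsigned int n = host_rhs.size();
  LinearAlgebra::ReadWriteVector<double>      rw(n);
  LinearAlgebra::CUDAWrappers::Vector<double> rhs_dev(n);
  LinearAlgebra::CUDAWrappers::Vector<double> solution_dev(n);
  std::copy(host_rhs.begin(), host_rhs.end(), rw.begin());
  rhs_dev.import(rw, VectorOperation::insert);

  // Identity preconditioner only: no AMG on the device, which is the part
  // I do not know how to do properly.
  SolverControl                                         control(1000, 1e-8);
  SolverCG<LinearAlgebra::CUDAWrappers::Vector<double>> solver(control);
  solver.solve(matrix_dev, solution_dev, rhs_dev, PreconditionIdentity());

  // Copy the Newton update back to the host.
  rw.import(solution_dev, VectorOperation::insert);
  std::copy(rw.begin(), rw.end(), host_solution.begin());
}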
Sincerely,
Tim