Status of hybrid shared/distributed parallelism

Lucas Myers

Mar 16, 2023, 2:40:24 PM3/16/23
to deal.II User Group
Hi folks,

I'm wondering if there's somewhere I can look to get a broad overview of the different parallelization schemes available in deal.II for various pieces of the library, and maybe what people's experiences have been. As I understand it, any matrix/vector assembly can be done with a shared-parallel scheme with the threads framework, but I'm less sure about solvers (which are typically the bottleneck in my applications). To relay my understanding so far (and ask some specific questions):

Matrix-based AMG methods:
  • PETSc and Trilinos Epetra only use MPI distributed parallelism
  • Trilinos Tpetra with MueLu is hybrid, but requires Kokkos. Do the Tpetra wrappers have associated solvers? And how do they work with Kokkos via CUDA?
Matrix-based GMG methods:
  • Use deal.II's own distributed vector and wrap a regular deal.II solver (though perhaps with a Trilinos smoother). Are any of the solvers hybrid-parallelized? And are there deal.II-specific smoothers that avoid copying to Epetra vectors?
Matrix-free GMG methods:
  • Use deal.II's distributed vector and wrap a regular solver. Additionally, the matrix multiplication is done via a matrix-free operator. Is it possible (or easy) to write a custom matrix-free operator which takes advantage of shared parallelism? And will the solver take advantage of shared-memory parallelism when (say) adding vectors?
Matrix-free GMG methods with hp-adaptivity:
  • The only example that I've seen is step-75, and that uses Trilinos AMG for the coarse solver, which I think is limited to MPI distributed parallelism. Could the coarse solver be adapted to use shared parallelism, and are the other aspects of this solver able to be parallelized?
As a final question: which of these aspects are eligible for handling by CUDA, MPI-3.0 shared memory, and Kokkos? And have folks found a significant speed-up by implementing any of these tools in their code?

Thanks so much for any insight,
Lucas

Bruno Turcksin

Mar 16, 2023, 5:25:47 PM3/16/23
to deal.II User Group
Lucas,

Why are you interested in hybrid parallelism? Are you hoping to improve the performance of your code, or is it simply something you want to try? If the solver is the bottleneck in your code, you should focus on finding a better preconditioner. With that being said, matrix-free methods tend to be faster than matrix-based methods if you can use them.

As to your questions: in general, you can assume that when you use Trilinos or PETSc, we only use distributed parallelism. However, when you use deal.II's own data structures (Solver, Vector, etc.), we use multithreading; we just don't advertise it. The matrix-free framework supports MPI-3.0 shared memory. We have CUDA matrix-free methods, but we are rewriting them using Kokkos. Hopefully, the refactor will be done by the end of next month.

Best,

Bruno