Lucas,
Why are you interested in hybrid parallelism? Are you hoping to improve the performance of your code, or is it simply something you want to try? If the solver is the bottleneck in your code, you should focus on finding a better preconditioner. That said, matrix-free methods tend to be faster than matrix-based methods if you can use them.

As to your question: in general, you can assume that when you use Trilinos or PETSc, we only use distributed parallelism. However, when you use deal.II's own data structures (Solver, Vector, etc.), we use multithreading. We just don't advertise it. The matrix-free framework supports MPI-3.0 shared memory. We have CUDA matrix-free methods, but we are rewriting them using Kokkos. Hopefully, the refactor will be done by the end of next month.
Best,
Bruno