Distributed Block-Gauss-Seidel (BGS) Preconditioning

Nik Markus Leuenberger

unread,
Sep 5, 2024, 5:43:18 PM
to deal.II User Group
Dear all,

It has been a very nice experience over the summer beginning to learn deal.II and *mis*using it for my finite volume calculations.

In the meantime, I have a distributed code for the sparse matrix assembly using MPI. 
So the next challenge :) is to have an efficient/scalable way of solving this large sparse system.

Since my system is really the same as the one in https://www.sciencedirect.com/science/article/pii/S004578251930146X (equation 34, i.e. the isothermal case), I would ideally like to use the same solution/preconditioning approach as they used there.

However, when I think about how to implement this using deal.II and Trilinos, I don't yet have a concrete plan for how to do this. Some of my questions are:
  • How can I globally re-order the DoFs cell-wise according to material ID (e = electrolyte, a = anode, c = cathode, with ne, na, nc being the number of cells carrying each material ID -- this is the block system in eqn. 34) and according to solution component (u_1 and u_2; this might not be necessary, but it could help to treat the different variables separately)? Since I use FE_DGQ(0), there is one value per component per cell, and the right-hand-side vector should then look, for example, as follows:
    u_{1,e,0} = first solution component of the first (arbitrary) cell in domain e
    u_{2,c,nc} = second solution component of the last (arbitrary) cell in domain c

    RHS = [u_{1,e,0}, ..., u_{1,e,ne}, u_{2,e,0}, ..., u_{2,e,ne}, u_{1,a,0}, ..., u_{1,a,na}, u_{2,a,0}, ..., u_{2,a,na}, u_{1,c,0}, ..., u_{1,c,nc}, u_{2,c,0}, ..., u_{2,c,nc}]

  • For a single processor, I was able to write a function to compute the new cell ordering and then pass it to DoFRenumbering::cell_wise(). However, in parallel, cell_wise() only works on the locally owned cells, so I don't know how I could produce the desired global reordering.

  • I do remember the comment by Prof. Bangerth (https://groups.google.com/g/dealii/c/Mseg4sKKylw/m/kwxiWAaxAQAJ) suggesting the use of " different vector conditions (-> step-46)." If you think this would be the best way to extract the individual blocks of the fully-coupled matrix, I can go back and try to re-write the assembly using different FESystems for the different domains.

  • Also, I have a conceptual question related to distributed preconditioning (as an example, the block Gauss-Seidel (BGS) preconditioner described in section 3.3 of the paper above). Of course, the even better option is its combination with AMG, but starting with the BGS preconditioner and a direct solver for the individual blocks, just to see how it works, would already be super nice:
    Now the question:
    Assuming I managed to extract all the relevant blocks for the BGS preconditioner (maybe with extract_submatrix_from() after reordering?), and the corresponding right-hand-side pieces, as a TrilinosWrappers::SparseMatrix and TrilinosWrappers::MPI::Vector respectively,
    could I then *just* use the TrilinosWrappers::SparseMatrix::vmult() function to perform the necessary matrix operations for the BGS routine, with the parallelism handled behind the scenes?
Again, I know this is a lot of questions, and I appreciate all help, feedback, and thoughts very much.

Thank you very much,
Nik

 

Nik Markus Leuenberger

unread,
Sep 6, 2024, 2:23:27 PM
to deal.II User Group
Quick update on the post above after some more reading of the documentation -- I apologize that I didn't wait to post until I had fully read up on these things.

On Thursday, September 5, 2024 at 11:43:18 PM UTC+2 Nik Markus Leuenberger wrote:
> Dear all,
>
> [...]
>
> However, when I think about how to implement this using deal.II and Trilinos, I don't yet have a concrete plan for how to do this. Some of my questions are:
>   • How can I globally re-order the DoFs cell-wise according to material ID (e = electrolyte, a = anode, c = cathode) and solution component (u_1 and u_2), such that the right-hand-side vector looks as follows (I use FE_DGQ(0), so one value per component per cell):
>
>     RHS = [u_{1,e,0}, ..., u_{1,e,ne}, u_{2,e,0}, ..., u_{2,e,ne}, u_{1,a,0}, ..., u_{1,a,na}, u_{2,a,0}, ..., u_{2,a,na}, u_{1,c,0}, ..., u_{1,c,nc}, u_{2,c,0}, ..., u_{2,c,nc}]
After some more reading, the current solution that I see is the one explained in a bit more detail below: essentially, using an hp::FECollection with different FESystems on each subdomain.

>   • For a single processor, I was able to write a function to compute the new cell ordering and then pass it to DoFRenumbering::cell_wise(). However, in parallel, cell_wise() only works on the locally owned cells, so I don't know how I could produce the desired global reordering.

I think neither DoFRenumbering::cell_wise() nor DoFRenumbering::sort_selected_dofs_back() will work here, since they are not implemented for global reordering on meshes distributed using MPI. Please correct me if I'm wrong on this.
>   • I do remember the comment by Prof. Bangerth (https://groups.google.com/g/dealii/c/Mseg4sKKylw/m/kwxiWAaxAQAJ) suggesting the use of "different vector conditions (-> step-46)." If you think this would be the best way to extract the individual blocks of the fully-coupled matrix, I can go back and try to re-write the assembly using different FESystems for the different domains.
I think this is actually the way to go: if I define different elements in the different material domains, I can just use DoFRenumbering::component_wise() and I will get the 6 different components that I would like to have. Once renumbered, I should then be able to construct the different blocks as in step-20 (https://github.com/dealii/dealii/blob/39134a2dfd1cb0dab7ab3e4894d818443ffab0cf/examples/step-20/step-20.cc#L384-L406).

Of course, I will have to set up the coupling terms to obtain the correct sparsity pattern, similar to step-46 (https://github.com/dealii/dealii/blob/39134a2dfd1cb0dab7ab3e4894d818443ffab0cf/examples/step-46/step-46.cc#L405C5-L431C6).

And even before that, use the material_id tags of the cells to choose the correct FESystem out of the collection to work with (as in step-46: https://github.com/dealii/dealii/blob/39134a2dfd1cb0dab7ab3e4894d818443ffab0cf/examples/step-46/step-46.cc#L297C3-L309C4).
>   • Also, I have a conceptual question related to distributed preconditioning (as an example, the block Gauss-Seidel (BGS) preconditioner described in section 3.3 of the paper above):
>     Assuming I managed to extract all the relevant blocks for the BGS preconditioner, and the corresponding right-hand-side pieces, as a TrilinosWrappers::SparseMatrix and TrilinosWrappers::MPI::Vector respectively, could I then *just* use the TrilinosWrappers::SparseMatrix::vmult() function to perform the necessary matrix operations for the BGS routine, with the parallelism handled behind the scenes?
I will get back to this part, once the above things are working.

Wolfgang Bangerth

unread,
Sep 8, 2024, 10:55:26 PM
to dea...@googlegroups.com

Nik,
it sounds like you've already figured out the big building blocks. Indeed, the
right way to deal with this is to use different finite elements on the
different parts of the domain (like in step-46), and then to sort DoFs by
vector component -- which then automatically produces the block structure.


> * For a single processor, I was able to write a function to compute the new
> cell ordering and then pass it to DoFRenumbering::cell_wise(). However, in
> parallel, cell_wise() only works on the locally owned cells, so I don't
> know how I could produce the desired global reordering.
>
> I think neither DoFRenumbering::cell_wise() nor
> DoFRenumbering::sort_selected_dofs_back() will work here, since they are not
> implemented for global reordering on meshes distributed using MPI. Please
> correct me if I'm wrong on this.

This is correct. You can sort the DoFs by cells *within a subdomain* according
to whatever criterion, but you probably don't want to do that on the entire
domain (and in any case, there is currently no function that would do that).
The reason is that if you did, the DoF indices stored by each process would
not form a contiguous range, and that would result in problems with vector and
matrix classes.

(I will note that when you sort by vector component, you *also* don't end up
with contiguous ranges -- but then you use block vectors and block matrices,
and *within these*, each process owns a contiguous range of DoFs.)


If you have any questions left, feel free to open a separate question!

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

