Overlapping decomposition using p4est and deal.II


prati...@gmail.com

Apr 2, 2018, 10:16:54 AM
to deal.II User Group
Hello all,

I wanted to know if it was possible to do overlapping decompositions using deal.II and p4est. These are my thoughts:

deal.II has the parallel::distributed::Triangulation class, which provides the functionality to partition and distribute meshes using p4est's space-filling-curve-based partitioning and adaptive refinement. According to what I have understood from step-40, using parallel::distributed::Triangulation, one can automatically partition the domain into equally sized (load-balanced) parts and distribute the mesh to the separate processes so that no single process needs to hold the entire mesh (except the initial coarse mesh). I would like to extend this to an overlapping decomposition, such that each process has a certain overlap with the neighboring processes, and perhaps impose boundary conditions on these overlap nodes.

I understand that in step-40 there are ghost cells, which are actually owned by the neighboring processes but serve a similar purpose. However, as discussed here ( https://groups.google.com/forum/#!searchin/dealii/overlap|sort:date/dealii/e-V2ZaPed1c/WMsZGtT2wWkJ ), it seems that you cannot control the width of the ghost layer, and I am not sure whether one can impose boundary conditions on it.

Using METIS, I could probably use partMesh_Kway and then do a breadth-first inclusion of the neighboring nodes based on the desired overlap, but I am not sure how I can accomplish this using p4est in deal.II.

In short, using p4est, I would like to have an overlapping decomposition in a parallel distributed setting, where I could impose boundary conditions on the overlapped boundary nodes. These overlapped nodes would be part of both the current process and the neighboring process.

Any suggestions or alternative recommendations would be really helpful.

Thanks,
Pratik.

Wolfgang Bangerth

Apr 2, 2018, 2:44:58 PM
to dea...@googlegroups.com

Pratik,
You are correct that deal.II currently only allows one layer of ghost
cells around the locally owned region. I believe that this could be
changed, however, given that p4est (to the best of my knowledge) supports
wider ghost layers. It would be a bit of work figuring out where in deal.II
one would need to call the corresponding functions of p4est, but I
imagine that is feasible if you wanted to dig around a bit. (We'd be
happy to provide you with the respective pointers.)

The bigger problem is that with the approach you suggest, you would have
to enumerate the degrees of freedom that live on each processor
independently of the global numbering, so that you can build the
linear system on each processor's subdomain plus its layers of ghost cells.
There is no functionality for this at the moment. I suspect that you
could build this as a simple map from global DoFs to local DoFs, though,
so that would likely also be feasible.
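
Just to illustrate the idea (this is only a sketch, not existing deal.II
functionality, and the function name is made up): on each process, such a
map could be built by walking over the locally owned and ghost cells and
handing out a new, process-local index the first time a global DoF index
is encountered:

  #include <deal.II/dofs/dof_accessor.h>
  #include <deal.II/dofs/dof_handler.h>
  #include <deal.II/fe/fe.h>

  #include <map>
  #include <vector>

  // Sketch: enumerate, on the current process, all DoFs that live on
  // locally owned or ghost cells, independently of the global numbering.
  template <int dim>
  std::map<dealii::types::global_dof_index, unsigned int>
  build_global_to_local_dof_map (const dealii::DoFHandler<dim> &dof_handler)
  {
    std::map<dealii::types::global_dof_index, unsigned int> global_to_local;
    std::vector<dealii::types::global_dof_index> dof_indices;

    for (const auto &cell : dof_handler.active_cell_iterators())
      if (cell->is_locally_owned() || cell->is_ghost())
        {
          dof_indices.resize (cell->get_fe().dofs_per_cell);
          cell->get_dof_indices (dof_indices);
          for (const auto global_index : dof_indices)
            // emplace() only assigns a new local index the first time we
            // encounter a given global index
            global_to_local.emplace (global_index, global_to_local.size());
        }

    return global_to_local;
  }

The inverse direction (local to global) is then just a vector built from
this map, and could be used to copy data between the local, overlapping
linear systems and any global vectors.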


I think the question I would ask you is why you want to do this? I know
that overlapping domain decomposition methods were popular in the 1990s
and early 2000s, primarily because they allowed one to re-use existing,
sequential codes: each processor simply has to solve its own, local
PDE, and all communication is restricted to exchanging boundary value
information. But we know today that (i) this does not scale very well to
large numbers of processors, and (ii) global methods in which you
solve one large linear system across many processors, as we do in
step-40 for example, are a much better approach. In other words, the
reason there is currently little support for overlapping DD methods in
deal.II is that, as a community, we have recognized that these methods
are not as good as others that have been developed over the last 20 years.

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

prati...@gmail.com

Apr 2, 2018, 3:27:23 PM
to deal.II User Group
Dear Wolfgang,

Thank you for your reply. I understand that DD methods are not quite that popular now, but I wanted to explore the avenue keeping the issue of scalability in mind as well. For scalability, as you mention, communication is probably the bottleneck. A few papers have worked towards the idea of asynchronous methods, in which there are no synchronization points, which could potentially have large benefits. There have been a few studies of asynchronous Schwarz methods (overlapping and non-overlapping) whose performance has been better than that of their synchronous counterparts. Granted, their applicability in terms of rate of convergence (for the linear system solution) or the physical problem at hand is currently limited, but I think they could be interesting at the exascale level, particularly with adaptive meshing.

It would be very helpful if you could provide me with some pointers for where and how to change the p4est functions in deal.II.

Thanks,
Pratik.

Wolfgang Bangerth

Apr 2, 2018, 4:01:54 PM
to dea...@googlegroups.com

> Thank you for your reply. I understand that DD methods are not quite
> that popular now. But I wanted to explore the avenue keeping in mind the
> issue of scalability as well. For scalability, as you mention
> communication is probably the bottleneck.

No, communication is not the issue. It's not a computational problem but
a mathematical one: in each DD iteration, you only transport information
from one subdomain to the next subdomain. If you split the domain into
too many subdomains, you will need a lot of iterations to transport
information across the entire domain. In other words, the reason DD
methods are not good for large processor counts is that the number of
outer iterations grows with the number of processors you have -- it just
doesn't scale.

A different perspective is that DD methods lack some kind of coarse grid
correction in which you exchange information globally. You could
presumably use a non-overlapping mortar method and get good parallel
scalability if you had a good preconditioner for the Schur complement
posed on the skeleton (i.e., the DoFs that are located on the interfaces
between subdomains). Of course, constructing such preconditioners has
also proven to be difficult.



> It would be very helpful if you could provide me with some pointers for
> where and how to change the p4est functions in deal.II.

All of the code that interfaces with p4est is in
source/distributed/tria.cc. We build the ghost layer in line 2532, where
we call p4est_ghost_new (through its dimension-independent alias). That
function is defined here:

http://p4est.github.io/api/p4est__ghost_8h.html#a34a0bfb7169437f6fc2382a67c47e89d

You'll probably have to call p4est_ghost_expand() one or more times:

http://p4est.github.io/api/p4est__ghost_8h.html#ab9750fa62cbc17285a0eb5cfe13a1e28

As a trial, I would just stick a call to this function right after the
one to ghost_new and see what happens in a small test program that, on
each processor, simply outputs which cells are locally owned and which
are ghosts.
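
If it helps, here is roughly what such a test program could look like
(just a sketch: it does not by itself create more than one ghost layer,
which would require patching the p4est_ghost_expand() call into tria.cc
as described above):

  #include <deal.II/base/mpi.h>
  #include <deal.II/distributed/tria.h>
  #include <deal.II/grid/grid_generator.h>

  #include <iostream>

  int main (int argc, char *argv[])
  {
    dealii::Utilities::MPI::MPI_InitFinalize mpi_initialization (argc, argv, 1);
    const unsigned int my_rank
      = dealii::Utilities::MPI::this_mpi_process (MPI_COMM_WORLD);

    dealii::parallel::distributed::Triangulation<2> triangulation (MPI_COMM_WORLD);
    dealii::GridGenerator::hyper_cube (triangulation);
    triangulation.refine_global (3);

    // For every active cell this process knows about, print whether it
    // is locally owned or a ghost cell (and, if the latter, who owns it).
    for (const auto &cell : triangulation.active_cell_iterators())
      if (cell->is_locally_owned())
        std::cout << "rank " << my_rank << ": owns  "
                  << cell->id() << std::endl;
      else if (cell->is_ghost())
        std::cout << "rank " << my_rank << ": ghost "
                  << cell->id() << " (owned by "
                  << cell->subdomain_id() << ")" << std::endl;
  }

Running this with two or three MPI processes before and after adding the
p4est_ghost_expand() call should show whether the ghost layer actually
becomes wider.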

prati...@gmail.com

Apr 2, 2018, 4:22:35 PM
to deal.II User Group
Yes, that is completely true. Information exchange is one of the main issues that plague domain decomposition methods. But this is probably improved by optimized Schwarz methods, which introduce artificial boundary conditions to enhance the information exchange. I am not sure how this scales to a large number of processes. It might very well be that you would need a coarse grid preconditioner to improve the global information exchange. I think this could be interesting.

Thank you for the information on the p4est functions. I will try those out and get back to you.

Regards,
Pratik.

Pratik

Apr 11, 2018, 10:25:21 AM
to deal.II User Group
Hello,

I had a few more questions and would be grateful for some clarification:

1. I tried out p4est_ghost_expand as Wolfgang suggested, but it gives me an error: after the calls to communicate_dof_indices_on_marked_cells() in dof_handler_policy.cc (line 3875), for some reason there are still cells for which cell->is_ghost() is true but cell->user_flag_set() is also still true. I believe the function communicate_dof_indices_on_marked_cells() in line 3492 of the same file exchanges the DoF indices (sending the indices of locally owned cells that are ghosts somewhere else, and receiving those of its own ghost cells from other processes), but the unpack function should identify all the cells marked as ghosts and receive the corresponding indices from the pack function, right? Or have I misunderstood something here?

2. Everywhere in step-40 (my reference) that the matrix and right-hand side are assembled, you use locally_relevant_dofs and locally_owned_dofs. As you say in the Glossary, for locally_active_dofs:

" Since degrees of freedom are owned by only one processor, degrees of freedom on interfaces between cells owned by different processors may be owned by one or the other, so not all degrees of freedom on a locally owned cell are also locally owned degrees of freedom."

Hence you assemble your local matrices with the locally_owned_dofs. But for the overlapping decomposition, I need to assemble matrices that include the "ghost cells". In my case, they wouldn't be ghost cells but locally owned cells that are duplicated between logically overlapping processes. I am not really sure how to go about doing this.

Thank you,

Best Regards,
Pratik.

Wolfgang Bangerth

Apr 12, 2018, 1:57:32 AM
to dea...@googlegroups.com, Pratik

> 1. I tried out p4est_ghost_expand as Wolfgang suggested, but it gives me an
> error: after the calls to communicate_dof_indices_on_marked_cells() in
> dof_handler_policy.cc (line 3875), for some reason there are still cells for
> which cell->is_ghost() is true but cell->user_flag_set() is also still true.
> I believe the function communicate_dof_indices_on_marked_cells() in line 3492
> of the same file exchanges the DoF indices (sending the indices of locally
> owned cells that are ghosts somewhere else, and receiving those of its own
> ghost cells from other processes), but the unpack function should identify
> all the cells marked as ghosts and receive the corresponding indices from
> the pack function, right? Or have I misunderstood something here?

Hm, good point. When we decide who each process needs to send the locally
owned DoF indices to (so that they can be set on ghost cells on the receiving
process), we determine the cells in question by asking whether one of their
vertices is adjacent to a ghost cell -- in other words, whether a cell is at
the boundary of the locally owned subdomain.

But if you have more than one layer of ghost cells, then this is not enough. I
think you need to extend the logic in a way where each process sends a list of
its ghost cells to the owning process of these cells, and then gets the
information back for exactly those cells. That may be a little bit of work,
but I don't think it's going to be terribly difficult to implement. Let us
know if you need help with this.
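
To make the communication pattern concrete (purely a sketch, not how
dof_handler_policy.cc is actually organized, and assuming a deal.II
version that has Utilities::MPI::some_to_some available): each process
could group the ghost cells it stores by their owner and send over their
CellIds, so that every owner learns exactly which of its locally owned
cells it has to report DoF indices for:

  #include <deal.II/base/mpi.h>
  #include <deal.II/distributed/tria.h>

  #include <map>
  #include <string>
  #include <vector>

  // Sketch: for each neighboring rank, collect the (textual) CellIds of
  // the ghost cells stored here that that rank owns.
  template <int dim>
  std::map<unsigned int, std::vector<std::string>>
  ghost_cells_by_owner
    (const dealii::parallel::distributed::Triangulation<dim> &triangulation)
  {
    std::map<unsigned int, std::vector<std::string>> requests;
    for (const auto &cell : triangulation.active_cell_iterators())
      if (cell->is_ghost())
        requests[cell->subdomain_id()].push_back (cell->id().to_string());
    return requests;
  }

  // Something like
  //   const auto cells_requested_from_me
  //     = dealii::Utilities::MPI::some_to_some (triangulation.get_communicator(),
  //                                             ghost_cells_by_owner (triangulation));
  // would then tell every process which of its own cells are ghosts
  // elsewhere, and on which ranks, so that it can send the corresponding
  // DoF indices back.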


> 2. Everywhere in step-40 (my reference) that the matrix and right-hand side
> are assembled, you use locally_relevant_dofs and locally_owned_dofs. As you
> say in the Glossary
> <http://dealii.org/developer/doxygen/deal.II/DEALGlossary.html#GlossParallelScaling>,
> for locally_active_dofs:
>
> " Since degrees of freedom are owned by only one processor, degrees of freedom
> on interfaces between cells owned by different processors may be owned by one
> or the other, so not all degrees of freedom on a locally owned cell are also
> locally owned degrees of freedom."
>
> Hence you assemble your local matrices with the locally_owned_dofs. But for
> the overlapping decomposition, I need to assemble matrices that include the
> "ghost cells". In my case, they wouldn't be ghost cells but locally owned
> cells that are duplicated between logically overlapping processes. I am not
> really sure how to go about doing this.

This is what I tried to explain in my previous email: on each process, you
need a separate enumeration of all degrees of freedom that live on that
process (either on locally owned or ghost cells). This enumeration must be
independent of the "global" enumeration we typically use in deal.II which
uniquely splits up all DoF indices among the processors.

Pratik

Apr 13, 2018, 5:53:46 PM
to deal.II User Group
Wolfgang,

Thank you for your reply. If I understand it correctly, in step 4 of the ParallelDistributed::distribute_dofs() function in the dof_handler.cc file, you compute and flag the cells that have ghost neighbours by looking only at the vertices, by calling the function compute_vertices_with_ghost_neighbors(), as you say above.

This uses the internal::p4est::p4est_iterate function, through the find_ghost_faces() and find_ghost_corners() callbacks for 2d (and additionally find_ghost_edges() for 3d), to fill the vertices_with_ghost_neighbors member of the FindGhosts struct (a map recording, for each vertex, the subdomain ids of the adjacent ghost cells), and thereby flags those cells which share a face, corner or edge with a ghost cell?

I should be able to use this p4est_iterate function to search through the cells (even ghosts) and add them to the map, right? I am not sure how to do this. Could you please give me some pointers on this?

If I understand correctly, the function p4est_iterate takes these function pointers (find_ghost_faces() ...) and user structs (FindGhosts) and iterates them over the parallel_forest. But I am not sure how to write a function that searches through the ghost cells and flags them, whether that is all that is needed for this, and how to make sure that the appropriate pairs are flagged for receiving from and sending to the appropriate domains.

Thank you very much,
Pratik.

Pratik

Apr 26, 2018, 5:04:24 PM
to deal.II User Group
Hello Wolfgang,

I went through the functions again and would be grateful if you could tell me whether I have misunderstood anything here:

In file p4est_wrappers.cc from line 46:
1. As before, I would like to iterate over all the ghost layers to identify the cells over which I need to exchange the indices. The function that I think I would need to modify is compute_vertices_with_ghost_neighbors(), which calls the p4est_iterate function. Consider one of the callbacks of this function, find_ghosts_corner(), which takes the sides variable (of type dealii::internal::p4est::iter<dim>::corner_side) and checks whether any of the sides of that corner belong to a ghost cell, using the cell_from_quad function.


In this function, there are two for loops (lines 64 to 74 and lines 84 to 97). The first loop retrieves the ghost cell and collects the subdomain_id of that cell into the sc_array subids, and the second loop adds this subdomain_id information to vertices_with_ghost_neighbors.

I was wondering whether it is possible to do something like this in the first loop as well:

                
for (j = 0; j < nsubs; j++)
  {
    (*vertices_with_ghost_neighbors)[cell->vertex_index(sides[i].corner)]
      .insert (subdomain_ids[j]);
  }


so that even the ghost neighbors of the first layer of ghost cells are added to vertices_with_ghost_neighbors. Would this work? Is there a better way?

Thank you,
Pratik.

Wolfgang Bangerth

Apr 27, 2018, 12:33:33 AM
to dea...@googlegroups.com, Pratik

Pratik,

> Thank you for your reply. If I understand it correctly, in step 4 of the
> ParallelDistributed::distribute_dofs() function in the dof_handler.cc file you
> compute and flag the cells that have ghost neighbours by looking at only the
> vertices by calling the function compute_vertices_with_ghost_neighbors() as
> you say above.

Yes. In essence, you have to find out which locally owned cell is a ghost cell
on some other processor. If you have only one layer of ghost cells around,
looking at which vertices are adjacent to ghost cells, and then checking which
locally owned cells are adjacent to these vertices is enough.

But if you have multiple ghost layers, this logic is not enough any more. You
will have to find a different algorithm to determine which of the locally
owned cells are ghosts somewhere else, and where. This will likely not be
trivial and I would guess involves communication in some way. It will require
some serious thinking.


> This uses the internal::p4est::p4est_iterate function, through the
> find_ghost_faces() and find_ghost_corners() callbacks for 2d (and
> additionally find_ghost_edges() for 3d), to fill the
> vertices_with_ghost_neighbors member of the FindGhosts struct (a map
> recording, for each vertex, the subdomain ids of the adjacent ghost cells),
> and thereby flags those cells which share a face, corner or edge with a
> ghost cell?
>
> I should be able to use this p4est_iterate function to search through the
> cells (even ghosts) and add them to the map, right? I am not sure how to do
> this. Could you please give me some pointers on this?

I don't know. I don't know the details of p4est_iterate, and I would have to
put some deep thought into how exactly I would want to write this algorithm.


> If I understand correctly, the function p4est_iterate takes these function
> pointers (find_ghost_faces() ...) and user structs (FindGhosts) and iterates
> them over the parallel_forest. But I am not sure how to write a function
> that searches through the ghost cells and flags them, whether that is all
> that is needed for this, and how to make sure that the appropriate pairs
> are flagged for receiving from and sending to the appropriate domains.

I don't know that either. As I said, I don't think that there is an easy
criterion for whether a locally owned cell is a ghost cell somewhere else if
you have more than one layer of cells.

You could, however, do it recursively. You know that if a cell is a ghost cell
on processor X, then all of its (locally owned) vertex neighbors are also
ghost cells on processor X if you have two layers of ghost cells. And so
forth. In other words, this would allow you to write the algorithm in a
recursive way.
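
To illustrate (written here as a layer-by-layer expansion rather than a
literal recursion; the function name, and the assumption that the
one-layer information has already been computed, are made up for this
sketch):

  #include <deal.II/base/geometry_info.h>
  #include <deal.II/grid/grid_tools.h>
  #include <deal.II/grid/tria.h>

  #include <map>
  #include <set>

  // Sketch: given, for each locally owned cell at the subdomain boundary,
  // the set of ranks on which it is a ghost cell (the information the
  // current one-layer logic computes), expand this to n_layers layers by
  // repeatedly adding locally owned vertex neighbors.
  template <int dim>
  void
  expand_ghost_owner_map
    (const dealii::Triangulation<dim> &triangulation,
     std::map<typename dealii::Triangulation<dim>::active_cell_iterator,
              std::set<dealii::types::subdomain_id>> &ghost_owners,
     const unsigned int n_layers)
  {
    // For every vertex, the set of active cells touching it.
    const auto vertex_to_cells
      = dealii::GridTools::vertex_to_cell_map (triangulation);

    for (unsigned int layer = 1; layer < n_layers; ++layer)
      {
        auto expanded = ghost_owners;
        for (const auto &cell_and_ranks : ghost_owners)
          for (unsigned int v = 0;
               v < dealii::GeometryInfo<dim>::vertices_per_cell; ++v)
            for (const auto &neighbor :
                 vertex_to_cells[cell_and_ranks.first->vertex_index (v)])
              if (neighbor->is_locally_owned())
                expanded[neighbor].insert (cell_and_ranks.second.begin(),
                                           cell_and_ranks.second.end());
        ghost_owners = expanded;
      }
  }

Whether one would want to implement this on the deal.II side or directly
on the p4est data structures is of course a separate question.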

Wolfgang Bangerth

Apr 27, 2018, 12:41:41 AM
to dea...@googlegroups.com
On 04/26/2018 03:04 PM, Pratik wrote:
>
> In this function, there are two for loops (lines 64 to 74 and lines 84 to
> 97). The first loop retrieves the ghost cell and collects the subdomain_id
> of that cell into the sc_array subids, and the second loop adds this
> subdomain_id information to vertices_with_ghost_neighbors.
>
> I was wondering whether it is possible to do something like this in the
> first loop as well:
>
> for (j = 0; j < nsubs; j++)
>   {
>     (*vertices_with_ghost_neighbors)[cell->vertex_index(sides[i].corner)]
>       .insert (subdomain_ids[j]);
>   }
>
> so that even the ghost neighbors of the first layer of ghost cells are added
> to vertices_with_ghost_neighbors. Would this work? Is there a better way?

I have to admit that I don't know what the function does, exactly, and that
it's a bit hard to read -- we definitely had a different programming style at
the time :-(

What happens if you try this? The way to test these sorts of things is to come
up with a simple case for which you can draw little pictures -- say, you start
with a unit square and then refine it 2 or 3 times. Run things with two
processors and print a lot of debug output (or run things in a debugger) --
you can follow each step this way, and with a piece of paper and a pencil, you
can draw little pictures that help you understand what is supposed to happen
and what actually happens.


I apologize for not being of more help. The code was written some ten years
ago by Timo Heister when he was visiting my lab. I have no recollection of how
this code works, and would have to spend significant time to read through it
to be better able to help -- time I unfortunately don't have. Timo might
remember more, but he's got other things going on in his life right now as well.

Pratik

Apr 28, 2018, 12:44:49 AM
to deal.II User Group
Thank you for your reply. The recursive idea looks interesting. I will try that for simple demonstrable cases. Also, as a side question, p4est is still relevant today, right? Are there any disadvantages to using it with deal.II at either very large or moderately large scales?

Thanks,
Pratik.

Wolfgang Bangerth

Apr 28, 2018, 9:06:52 AM
to dea...@googlegroups.com
On 04/27/2018 10:44 PM, Pratik wrote:
> Thank you for your reply. The recursive idea looks interesting. I will try
> that for simple demonstrable cases. Also, as a side question, p4est is still
> relevant today, right? Are there any disadvantages to using it with deal.II
> at either very large or moderately large scales?

Yes, it is still very relevant. We don't have any other option implemented
that could handle distributing meshes in parallel.