Removing cells in parallel distributed triangulation

119 views
Skip to first unread message

Mohammadreza ya'ghoobi

unread,
Apr 20, 2021, 2:14:10 PM4/20/21
to deal.II User Group
Hi,

I'm trying to subtract some elements based on a user-defined criteria. Ideally, I want to use each processor in a way that they are responsible only for their partition (using is_locally_owned()) to remove the cells from the triangulation using GridGenerator::create_triangulation_with_removed_cells. However, when I used  this, it will not proceed after GridGenerator::create_triangulation_with_removed_cells. 
Very inefficient solution is each processor will process the whole triangulation space and remove the elements. This works. However, as the simulation size increases, this becomes really a bottle neck.

Is there any way I can tackle this smarter?

Thanks
Reza

Below is the version of the code which works but very inefficiently.
I added the red lines so every processor be responsible just for their own parts. It stops at GridGenerator::create_triangulation_with_removed_cells.



GridGenerator::subdivided_hyper_rectangle (triangulation2, userInputs.subdivisions, Point<dim>(), Point<dim>(userInputs.span[0],userInputs.span[1],userInputs.span[2]), true);

typename Triangulation< dim, dim >::active_cell_iterator cell = triangulation2.begin_active(), endc = triangulation2.end();

for (; cell!=endc; ++cell) {
if (cell->is_locally_owned()){
double pnt3[3];
const Point<dim> pnt2=cell->center();
for (unsigned int i=0; i<dim; ++i){
pnt3[i]=pnt2[i];
}

gID=orientations_Mesh.getMaterialID(pnt3);
cellOrientationMap_Mesh.push_back(gID);
}
}

unsigned int materialID;
unsigned int cell2=0;
cell = triangulation2.begin_active(); endc = triangulation2.end();
std::set<typename Triangulation< dim, dim >::active_cell_iterator> cells_to_remove;
unsigned int NumberOwned=0;
for (; cell!=endc; ++cell) {
if (cell->is_locally_owned()){
materialID=cellOrientationMap_Mesh[cell2];
NumberOwned=NumberOwned+1;
if (materialID ==userInputs.deletionGrainID){
cells_to_remove.insert (cell);
}
cell2=cell2+1;
}
}

char buffer[200];

if (cells_to_remove.size()>0){
GridGenerator::create_triangulation_with_removed_cells(triangulation2,cells_to_remove,triangulation);
}
else{
triangulation.copy_triangulation(triangulation2);
}
}
else {
triangulation.copy_triangulation(triangulation2);
}

Marc Fehling

unread,
Apr 20, 2021, 3:47:13 PM4/20/21
to deal.II User Group
Hello Reza,

`parallel::distributed::Triangulation` objects store the entire coarse mesh, see also the module documentation. This allows for a quick execution of adaptive refinement. So from my understanding, you need to perform this operation on all active cells, including ghost and artificial ones. So the working code you have might be the only way to use `GridGenerator::create_triangulation_with_removed_cells()` in the parallel distributed context.

Can you quantify how inefficient your code is? For example did you time how long your mesh creation takes and compare it to the overall runtime?

Best,
Marc

mohammadreza yaghoobi

unread,
Apr 20, 2021, 3:59:08 PM4/20/21
to dea...@googlegroups.com
Thanks Marc. 

My biggest challenges are both time and memory. I'm trying to model a crystal plasticity sample with ~20 million elements. In the case of no element deletion, I can simply use global refinement and I won't face memory issues while generating mesh and running the code (and of course no time on element deletion).

However, when I'm trying to add the element deletion feature, then I cannot use global refinement,  and I'll face the memory issue, since all of the triangulation will be on the coarsest level. On top of that, for every processor, the code has to search the element deletion pattern for all of the triangulation (~20 million elements). As I increase the sample size, each processor still needs to handle all the elements which becomes a big bottle-neck, and also uses the resources very inefficiently. Actually, parallelization cannot help at all with this part of the code, which actually serves as the serial with large memory requirements. Just to give you an idea, it will take about 12 hrs in my example to bypass the element deletion search part running on 1500 processors, each has 38gb memory. I actually have to underutilize my node (instead of 24, I'm using 5) because of memory issues.

The ideal case would be each processor just handling their own part and some reduction happened throughout the processors.

I hope I was able to clarify the bottle neck.

Thanks
Reza



--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/07d82b49-8eae-4eed-84b8-bbbb1f534baen%40googlegroups.com.

Wolfgang Bangerth

unread,
Apr 20, 2021, 4:05:18 PM4/20/21
to dea...@googlegroups.com
On 4/20/21 1:58 PM, mohammadreza yaghoobi wrote:
>
> The ideal case would be each processor just handling their own part and some
> reduction happened throughout the processors.

Reza -- the parallel::distributed::Triangulation class was written under the
assumption that storing artificial cells is not expensive, which in practice
boils down to the assumption that the mesh is refined globally or adaptively
sufficiently many times and/or that the coarse mesh is relatively small. In
your case, your coarse mesh has 20M cells, and that is a case that the p::d::T
class simply was not designed for.

But, the good news is that that there is a different class,
parallel::fullydistributed::Triangulation, that is meant for this kind of
case. This class is meant to be used for meshes where the coarse mesh is so
large that it can not be replicated on all processes. (I don't recall right
now whether it allows mesh refinement in its current state, though.)

There is currently no tutorial program that illustrates its use, but maybe you
can gather enough information from what exists in the class documentation and
what you can find in the test suite. Feel free to also ask on the mailing list
if you have questions about this class!

Best
Wolfgang

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

mohammadreza yaghoobi

unread,
Apr 20, 2021, 4:12:51 PM4/20/21
to dea...@googlegroups.com
Thanks Wolfgang.

That's a really good news as this is our biggest current bottleneck. 
Just a quick question to warm up! does GridGenerator::create_triangulation_with_removed_cells or something similar is available for this class?

Thanks
Reza

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.

Wolfgang Bangerth

unread,
Apr 20, 2021, 4:39:00 PM4/20/21
to dea...@googlegroups.com
On 4/20/21 2:12 PM, mohammadreza yaghoobi wrote:
> Just a quick question to warm up! does
> GridGenerator::create_triangulation_with_removed_cells or something similar is
> available for this class?

I think the workflow would be different. The function above takes an existing
triangulation (which in your case is large!) and creates another triangulation
with some cells removed. p::f::T would probably be created differently, from
some intermediate data structure that describes the mesh but is not a
triangulation yet. You probably want to read up on the class -- its use case
is different and tailored for exactly the situation you have: where just
storing the whole triangulation on one process is not feasible.

Jean-Paul Pelteret

unread,
Apr 20, 2021, 5:19:13 PM4/20/21
to dea...@googlegroups.com
Hi Reza,

With no context as to how element deletion affects the geometry itself, may I ask if you’ve considered element deactivation using the FE_Nothing element? If you can afford to store the mesh but primarily want to ensure that no DoFs are assigned to a region outside of the physical domain boundaries, then this might be an alternate approach to adjusting the mesh directly.

Best,
Jean-Paul

Mohammadreza Yaghoobi

unread,
Apr 20, 2021, 9:16:32 PM4/20/21
to dea...@googlegroups.com
Thanks Jean-Paul and Wolfgang,

I'll give a try on parallel::fullydistributed::Triangulation as adding FE_nothing would still have that issue of memory. If I don't get success, I'll circle back with you on FE_nothing.

Thanks
Reza

Mohammadreza ya'ghoobi

unread,
Apr 22, 2021, 12:52:06 PM4/22/21
to deal.II User Group
Dear Jean-Paul,

I'm trying to solve a simpler problem of uniaxial response of a plate with hole using FE_Nothing as you recommended. I re-read steps 10, 27, and 46 to refresh my memory. I think I have the logic but I appreciate if you check some of the steps I have a little concern about. The two main step I took are:
1) I assign the cells inside the hole as FE_Nothing and the rest as regular FE.
2) During the stiffness matrix and residual force formation, I only go over the elements in regular FE part and exclude the FE_Nothing element from the calculation.

My only concern are the cells on the interface. Does performing assembly and residual formation just on the FE cells (and exclude FE_Nothing cells) guarantee the traction free surface of the cells on the surface of the hole?  Or I need to take extra action to guarantee the traction free surface of the hole?

Thanks
Reza

Wolfgang Bangerth

unread,
Apr 22, 2021, 12:55:47 PM4/22/21
to dea...@googlegroups.com
On 4/22/21 10:52 AM, Mohammadreza ya'ghoobi wrote:
>
> I'm trying to solve a simpler problem of uniaxial response of a plate with
> hole using FE_Nothing as you recommended. I re-read steps 10, 27, and 46 to
> refresh my memory. I think I have the logic but I appreciate if you check some
> of the steps I have a little concern about. The two main step I took are:
> 1) I assign the cells inside the hole as FE_Nothing and the rest as regular FE.
> 2) During the stiffness matrix and residual force formation, I only go over
> the elements in regular FE part and exclude the FE_Nothing element from the
> calculation.
>
> My only concern are the cells on the interface. Does performing assembly and
> residual formation just on the FE cells (and exclude FE_Nothing cells)
> guarantee the traction free surface of the cells on the surface of the hole?
> Or I need to take extra action to guarantee the traction free surface of the hole?

That depends on how you create the FE_Nothing element. Its constructor takes a
flag that determines what the solution should look like on the other side:
Should it be zero at the interface, or be unconstrained? You want it to be
unconstrained -- it's a boundary at which you intend to do nothing special
(which will then correspond to zero-traction Neumann-type boundary condition).

Best
W.

Mohammadreza ya'ghoobi

unread,
Apr 24, 2021, 5:31:45 PM4/24/21
to deal.II User Group
I successfully implemented  parallel::fullydistributed::Triangulation in my code. The way it works right now is it requires an initial triangulation to generate the TriangulationDescription::Description< dim, dim > using create_description_from_triangulation. It means that I should first generate the triangulation. This still imposes the memory constraint on the process as you already generated the large mesh for each processor. I tried it and still require the same memory.

Is there any way that while we use GridGenerator, it can directly generate the mesh and distribute it over nodes using p::f::T? Any recommendation is highly appreciated.

Thanks
Reza

Wolfgang Bangerth

unread,
Apr 25, 2021, 7:46:34 PM4/25/21
to dea...@googlegroups.com
On 4/24/21 3:31 PM, Mohammadreza ya'ghoobi wrote:
> I successfully implemented parallel::fullydistributed::Triangulation in my
> code. The way it works right now is it requires an initial triangulation to
> generate the TriangulationDescription::Description< dim, dim >
> using create_description_from_triangulation. It means that I should first
> generate the triangulation. This still imposes the memory constraint on the
> process as you already generated the large mesh for each processor. I tried it
> and still require the same memory.
>
> Is there any way that while we use GridGenerator, it can directly generate the
> mesh and distribute it over nodes using p::f::T? Any recommendation is highly
> appreciated.

Reza: I suspect that there is no example you can look up, but the idea is that
the p::f::T class is built based on the TriangulationDescription object. How
such an object is created is not actually important: You could generate it any
other way as well. So that means that you'll have to start reading through how
these objects look like and how you could generate one without create a
complete Triangulation first. Does this make sense?

A good point to start with is to look through the tests in
tests/fully_distributed_grids/ and see whether there is something that can
point you in the right direction.

Best
W>
Reply all
Reply to author
Forward
0 new messages