Hi all,
I have a project that requires reading a large coarse mesh (> 1M DoFs) from gmsh
into deal.II. Most cells have their own characteristics, which means I cannot
combine them to create a coarser mesh.
Currently I have implemented it using a shared-memory triangulation for
parallelization. Since I want to scale it to a cluster system and target a 100M
mesh (no mesh refinement needed), I have to use a distributed triangulation via
MPI (is there a better solution?). I found that the initial cost is large
because of the duplication of the triangulation and the p4est forest. I was
wondering whether there is any way to remove part of the triangulation or p4est
data, or whether I can build the initial distributed triangulation myself. Based
on my understanding, the vertices and cells need a cross-reference for their IDs
(the cell_id acts like the deal.II-p4est permutation vectors). However, these
changes require a comprehensive understanding of the whole triangulation system
of deal.II. I was wondering if anyone can give me some ideas about this. Thanks
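For reference, this is roughly the setup I am describing (a minimal sketch,
assuming a p4est-backed parallel::distributed::Triangulation; the file name
mesh.msh is just a placeholder). Every MPI rank reads and keeps the whole coarse
mesh here, which is where the duplication cost I mention comes from:

    #include <deal.II/distributed/tria.h>
    #include <deal.II/grid/grid_in.h>

    #include <fstream>

    using namespace dealii;

    // Read a gmsh coarse mesh into a p4est-backed distributed triangulation.
    // The coarse mesh is replicated on each MPI rank before p4est partitions
    // the active cells among the processes.
    void read_coarse_mesh(parallel::distributed::Triangulation<3> &triangulation)
    {
      GridIn<3> grid_in;
      grid_in.attach_triangulation(triangulation);

      std::ifstream input("mesh.msh"); // placeholder file name
      grid_in.read_msh(input);
    }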
Here I provide one simple trick in case you have a project similar to mine. In
order to read cells with different characteristics, the cell_id is a good way to
identify each cell and assign it its own parameters. However, since deal.II
stores this id in a char, the maximum number of distinct ids is 256. A simple
way to fix this is to change the type of the id in types.h to unsigned int,
together with the corresponding instantiations.
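To illustrate what I mean by giving each cell its own parameters, here is a
minimal sketch (the parameter struct and its values are made up; I use the
material id, which GridIn fills in from the gmsh physical ids):

    #include <deal.II/base/types.h>
    #include <deal.II/grid/tria.h>

    #include <map>

    using namespace dealii;

    // Hypothetical per-characteristic parameters, keyed by the id read from gmsh.
    struct CellParameters
    {
      double conductivity;
    };

    template <int dim>
    void assign_parameters(
      const Triangulation<dim>                            &triangulation,
      const std::map<types::material_id, CellParameters>  &table)
    {
      for (const auto &cell : triangulation.active_cell_iterators())
        {
          // Look up this cell's own parameters by its id.
          const CellParameters &params = table.at(cell->material_id());
          (void)params; // use them in the local assembly, material laws, etc.
        }
    }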
On the other hand, if you are using SolutionTransfer with a shared
triangulation, you can add "(all_out[0]).compress(VectorOperation::insert);" in
solution_transfer.cc, so that the interpolate function becomes:
    template <int dim, typename VectorType, typename DoFHandlerType>
    void
    SolutionTransfer<dim, VectorType, DoFHandlerType>::interpolate(
      const VectorType &in,
      VectorType       &out) const
    {
      Assert(in.size() == n_dofs_old,
             ExcDimensionMismatch(in.size(), n_dofs_old));
      Assert(out.size() == dof_handler->n_dofs(),
             ExcDimensionMismatch(out.size(), dof_handler->n_dofs()));

      std::vector<VectorType> all_in(1);
      all_in[0] = in;
      std::vector<VectorType> all_out(1);
      all_out[0] = out;

      interpolate(all_in, all_out);

      // this helps avoid an interpolation error with shared triangulations
      (all_out[0]).compress(VectorOperation::insert);

      out = all_out[0];
    }
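For completeness, here is roughly how I call this across one refinement cycle.
This is only a sketch, assuming a parallel::shared::Triangulation, a
TrilinosWrappers::MPI::Vector, and refinement flags that have already been set;
all names are illustrative:

    #include <deal.II/distributed/shared_tria.h>
    #include <deal.II/dofs/dof_handler.h>
    #include <deal.II/fe/fe.h>
    #include <deal.II/lac/trilinos_vector.h>
    #include <deal.II/numerics/solution_transfer.h>

    using namespace dealii;

    template <int dim>
    void refine_and_transfer(parallel::shared::Triangulation<dim> &triangulation,
                             DoFHandler<dim>                      &dof_handler,
                             const FiniteElement<dim>             &fe,
                             const TrilinosWrappers::MPI::Vector  &solution)
    {
      // Attach the old solution, refine, and redistribute the DoFs.
      SolutionTransfer<dim, TrilinosWrappers::MPI::Vector> solution_transfer(dof_handler);
      solution_transfer.prepare_for_coarsening_and_refinement(solution);

      triangulation.execute_coarsening_and_refinement();
      dof_handler.distribute_dofs(fe);

      // Interpolate onto the new mesh; this ends up in the patched function above.
      TrilinosWrappers::MPI::Vector interpolated(dof_handler.locally_owned_dofs(),
                                                 triangulation.get_communicator());
      solution_transfer.interpolate(solution, interpolated);

      // Continue working with 'interpolated' on the refined mesh.
    }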
Another simple trick to speed up the system: you can turn off the mesh
reordering function, order the mesh yourself, and supply a dummy permutation.
The idea of p4est is to limit communication, so you only need to make sure that
adjacent cells are grouped properly. This trick gives a huge saving in
computation time for meshes with > 1M DoFs.
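To show the idea, here is a minimal sketch with a toy two-cell mesh. In
practice the vertex and cell arrays come from your own msh reader, and skipping
the reordering is only safe if the mesh is already consistently oriented:

    #include <deal.II/base/point.h>
    #include <deal.II/grid/tria.h>

    #include <vector>

    using namespace dealii;

    // Build the triangulation directly from cell data that is already
    // consistently ordered, skipping the (expensive) reordering pass.
    void build_without_reordering(Triangulation<2> &triangulation)
    {
      const std::vector<Point<2>> vertices = {Point<2>(0, 0), Point<2>(1, 0),
                                              Point<2>(2, 0), Point<2>(0, 1),
                                              Point<2>(1, 1), Point<2>(2, 1)};

      std::vector<CellData<2>> cells(2);
      // deal.II vertex order per quad: bottom-left, bottom-right, top-left, top-right.
      cells[0].vertices[0] = 0; cells[0].vertices[1] = 1;
      cells[0].vertices[2] = 3; cells[0].vertices[3] = 4;
      cells[1].vertices[0] = 1; cells[1].vertices[1] = 2;
      cells[1].vertices[2] = 4; cells[1].vertices[3] = 5;

      // GridReordering<2>::reorder_cells(cells); // intentionally skipped:
      //                                          // the mesh is assumed consistent
      triangulation.create_triangulation(vertices, cells, SubCellData());
    }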
best
YC Chen
Yi-Chung,
> Thank you for your prompt reply. I was wondering if I can integrate
> other partitioning tools, such as metis or parmetis, to handle the fully
> distributed triangulation. I can develop that part by myself (or with
> some help from the community). Do you have any suggestions?
I believe that it would not be impossible to develop this, but it will
probably not be a small exercise. You would have to develop a fair share
of code, and gain an understanding of parts of the library you don't
usually get to see if you use deal.II.
If you're willing to do this, we can guide you along the process, but
it's going to be a bit of work for sure.
> My following
> project also relies on this, since I will try to manage the number of cells
> on each processor. With p4est, it is hard to manage the number of cells
> using an in-house algorithm.
That is actually not quite true. p4est (and the deal.II classes that use
it) allow you to describe a "weight" for each cell, and p4est will
partition in a way that the sum of weights is roughly equal among all
processors.
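Schematically, attaching such weights looks like the following sketch; I am
assuming the signal name cell_weight here, and the cost model is of course made
up:

    #include <deal.II/distributed/tria.h>

    using namespace dealii;

    // Attach a per-cell weight; p4est then balances the *sum of weights*
    // rather than the raw number of cells across MPI ranks.
    template <int dim>
    void attach_cell_weights(parallel::distributed::Triangulation<dim> &triangulation)
    {
      triangulation.signals.cell_weight.connect(
        [](const auto &cell, const auto /*status*/) -> unsigned int
          {
            // Made-up cost model: cells with material_id 1 are assumed to be
            // ten times as expensive as all others.
            return (cell->material_id() == 1) ? 1000 : 100;
          });

      // Redistribute the cells according to the weights just attached.
      triangulation.repartition();
    }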
> My application is about IC designs that
> may have a million to a billion cells. A fully distributed triangulation
> helps to reduce memory usage. The current shared-memory system can
> handle 20M cells (single core) on a system with 32 GB of main memory.
That's already quite impressive :-) What kind of meshes do you have that
require so many cells? Are they geometrically incredibly complicated to
require that many cells already at the coarse level?
> Any design with 1M cells on the distributed triangulation has a problem with
> computation time because of the reordering step. This is why I bypassed it
> and provided a sorted mesh to grid_in (read_msh). For a problem with 5M
> cells, I can save 200 seconds at the create_triangulation step.
Yes, I'm willing to believe this. The algorithm wasn't intended for
meshes of this size, though we did test it with ~300k cells in 2d and we
know that it scales like O(N). So 200 seconds seems like a long time. Is
this in debug mode?