collect_periodic_faces slow on large grids and in parallel

Praveen C

Dec 16, 2025, 5:10:41 AM
to Deal.II Googlegroup
Dear all

I am using parallel::distributed::Triangulation for a 3D problem.

Case 1. Suppose I create a 256^3 mesh like this

GridGenerator::subdivided_hyper_rectangle
Call for x,y,z
    GridTools::collect_periodic_faces
triangulation.add_periodicity

then the code takes a very long time for this step, and on some machines it crashes with an oom_kill error, i.e., it runs out of memory.
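
Roughly, the setup looks like this (a sketch, not my exact code; it assumes a unit cube with colorize=true, so the boundary ids are 0/1 in x, 2/3 in y and 4/5 in z):

// Sketch of the case 1 setup; assumes deal.II built with p4est.
#include <deal.II/base/mpi.h>
#include <deal.II/distributed/tria.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/grid_tools.h>

using namespace dealii;

int main(int argc, char **argv)
{
  Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv, 1);

  parallel::distributed::Triangulation<3> triangulation(MPI_COMM_WORLD);

  // 256^3 coarse cells on the unit cube; colorize=true assigns boundary
  // ids 0/1 (x), 2/3 (y), 4/5 (z).
  GridGenerator::subdivided_hyper_rectangle(triangulation,
                                            {256u, 256u, 256u},
                                            Point<3>(0, 0, 0),
                                            Point<3>(1, 1, 1),
                                            /*colorize=*/true);

  std::vector<GridTools::PeriodicFacePair<
    parallel::distributed::Triangulation<3>::cell_iterator>>
    matched_pairs;

  for (unsigned int d = 0; d < 3; ++d) // periodicity in x, y, z
    GridTools::collect_periodic_faces(triangulation,
                                      /*b_id1=*/2 * d,
                                      /*b_id2=*/2 * d + 1,
                                      /*direction=*/d,
                                      matched_pairs);

  triangulation.add_periodicity(matched_pairs);
}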

Case 2. It works fine if I do

GridGenerator::hyper_rectangle (initial mesh has single cell)
Call for x,y,z
    GridTools::collect_periodic_faces
triangulation.add_periodicity
triangulation.refine_global(8)
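
The only difference from case 1 is the coarse grid and the final refinement (again just a sketch):

// Sketch of case 2: a single coarse cell, periodicity, then global refinement.
GridGenerator::hyper_rectangle(triangulation,
                               Point<3>(0, 0, 0),
                               Point<3>(1, 1, 1),
                               /*colorize=*/true);

// collect_periodic_faces for d = 0, 1, 2 and add_periodicity as above, then:
triangulation.refine_global(8); // 256 cells per direction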

However, I have nonstandard grid sizes, e.g., 195x72x106, which cannot be obtained by refine_global.

What options are there to speed up the periodicity setup?

Will parallel::fullydistributed::Triangulation help?

thanks
praveen

Wolfgang Bangerth

Dec 17, 2025, 11:34:21 PM
to dea...@googlegroups.com
On 12/16/25 03:10, Praveen C wrote:
>
> *Case 1.* Suppose I create a 256^3 mesh like this
>
> GridGenerator::subdivided_hyper_rectangle
> Call for x,y,z
>     GridTools::collect_periodic_faces
> triangulation.add_periodicity
>
> then the code takes a very long time for this step, and on some machines it
> crashes with an oom_kill error, i.e., it runs out of memory.

Praveen:
Can you be more specific? Show us a minimal code example that demonstrates the
problem and that we can run. It should be possible to do this in not much more
than 20 lines of code.

Specifically, in which operation is the CPU time used, and in which do you run
out of memory?

(My best guess is that the algorithms use a double loop over all coarse mesh
cells. This works well if you have a few dozen or a few hundred coarse mesh
cells, as we often do. But your 195x72x106 mesh has 1.5M coarse mesh cells,
and I'm not surprised that that doesn't work. You won't fix that issue by
going to a different triangulation class. I think it's inherent in the current
design of the periodic face finding algorithm, but we can't know for sure
unless we know where exactly the problem appears.)
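
To illustrate what I mean by a double loop, the matching would, schematically, look like the following (this is not the actual deal.II code, just the generic shape of a brute-force face pairing; the container and helper names are made up):

// Schematic only -- not deal.II's implementation. A brute-force matching
// compares every candidate face on one boundary with every face on the
// opposite one, which is quadratic in the number of boundary faces:
for (const auto &face_1 : faces_on_first_boundary)    // hypothetical container
  for (const auto &face_2 : faces_on_second_boundary) // hypothetical container
    if (face_centers_match(face_1, face_2))           // hypothetical helper
      matched_pairs.emplace_back(face_1, face_2);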

Best
W.

Praveen C

Dec 20, 2025, 11:00:18 AM
to dea...@googlegroups.com
Hello Wolfgang

I made example code for both distributed and fullydistributed triangulations.
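
For the fullydistributed case, one way to set this up is sketched below (whether it matches my test in every detail it does not; the periodicity is added to the serial base mesh before partitioning):

// Sketch of a possible fullydistributed setup.
#include <deal.II/distributed/fully_distributed_tria.h>
#include <deal.II/grid/tria_description.h>

Triangulation<3> serial_tria; // full coarse mesh, present on every rank
GridGenerator::subdivided_hyper_rectangle(serial_tria,
                                          {128u, 128u, 128u},
                                          Point<3>(0, 0, 0),
                                          Point<3>(1, 1, 1),
                                          /*colorize=*/true);

// collect_periodic_faces for x, y, z on serial_tria as before, then:
serial_tria.add_periodicity(matched_pairs);

GridTools::partition_triangulation(
  Utilities::MPI::n_mpi_processes(MPI_COMM_WORLD), serial_tria);

const auto description =
  TriangulationDescription::Utilities::create_description_from_triangulation(
    serial_tria, MPI_COMM_WORLD);

parallel::fullydistributed::Triangulation<3> triangulation(MPI_COMM_WORLD);
triangulation.create_triangulation(description);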


On a 128^3 mesh, I get these timings:

mpirun -np 2 ./main

distributed

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |       463s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Add periodicity                 |         1 |       445s |        96% |
| Collect faces x                 |         1 |    0.0425s |         0% |
| Collect faces y                 |         1 |    0.0313s |         0% |
| Collect faces z                 |         1 |    0.0335s |         0% |
+---------------------------------+-----------+------------+------------+

fullydistributed

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      12.2s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Add periodicity                 |         1 |    0.0038s |         0% |
| Collect faces x                 |         1 |     0.368s |         3% |
| Collect faces y                 |         1 |    0.0153s |      0.13% |
| Collect faces z                 |         1 |    0.0133s |      0.11% |
+---------------------------------+-----------+------------+------------+

Most of the time is spent in the add_periodicity function. fullydistributed does this much faster; I have to see whether the other parts of my code work with this triangulation.
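
For reference, the sections in the tables were produced with TimerOutput, roughly like this (a sketch; the actual test differs in details):

// Sketch of the timing setup around the periodicity calls.
#include <deal.II/base/conditional_ostream.h>
#include <deal.II/base/timer.h>

ConditionalOStream pcout(std::cout,
                         Utilities::MPI::this_mpi_process(MPI_COMM_WORLD) == 0);
TimerOutput timer(MPI_COMM_WORLD, pcout,
                  TimerOutput::summary, TimerOutput::wall_times);

{
  TimerOutput::Scope t(timer, "Collect faces x");
  GridTools::collect_periodic_faces(triangulation, 0, 1, 0, matched_pairs);
}
// ... same for "Collect faces y" and "Collect faces z" ...
{
  TimerOutput::Scope t(timer, "Add periodicity");
  triangulation.add_periodicity(matched_pairs);
}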

best
praveen


Wolfgang Bangerth

Dec 20, 2025, 7:07:51 PM
to dea...@googlegroups.com
On 12/20/25 08:59, Praveen C wrote:
>
> On a 128^3, I get this timing
>
> mpirun -np 2 ./main
>
> *distributed*
>
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |       463s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Add periodicity                 |         1 |       445s |        96% |
> | Collect faces x                 |         1 |    0.0425s |         0% |
> | Collect faces y                 |         1 |    0.0313s |         0% |
> | Collect faces z                 |         1 |    0.0335s |         0% |
> +---------------------------------+-----------+------------+------------+

Ah yes, that's clearly bad :-) Are you in a position to put timers into the
implementation of that function for the p::d::T case to figure out which part
of the algorithm is so slow?

Short of that, I think it would be useful to try with, say, 32^3, 64^3, 128^3
to see whether the run time grows like N^2, N^3, etc. This helps narrow down
which parts of the code one would have to look at. (E.g., if it's N^2, you
know you have to look for double loops.)
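
For example, going from 64^3 to 128^3 multiplies the number of coarse cells N by 8, so a wall-time increase of roughly 8^2 = 64 would point at an N^2 algorithm, whereas an increase of roughly 8^3 = 512 would point at N^3.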

Best
W.

Daniel Arndt

Dec 21, 2025, 1:21:47 PM
to dea...@googlegroups.com
Running that program in a profiler shows that almost all time is spent
in dealii::internal::p4est::functions<dim>::connectivity_join_faces.

Best,
Daniel

Wolfgang Bangerth

Dec 26, 2025, 6:41:04 PM
to dea...@googlegroups.com
On 12/21/25 11:21, Daniel Arndt wrote:
> Running that program in a profiler shows that almost all time is spent
> in dealii::internal::p4est::functions<dim>::connectivity_join_faces.

That's just an alias for a p4est function, right? So the time is actually
spent in p4est?

Best
W.

Daniel Arndt

Dec 29, 2025, 11:59:18 AM
to dea...@googlegroups.com
Yes, exactly, that's what my profiler was showing. I didn't dive
deeper into p4est to understand why that is or if there is a better
interface to call.

Best,
Daniel