collect_periodic_faces slow on large grids and in parallel

Praveen C

Dec 16, 2025, 5:10:41 AM
to Deal.II Googlegroup
Dear all

I am using parallel::distributed::Triangulation for a 3D problem.

Case 1. Suppose I create a 256^3 mesh like this

GridGenerator::subdivided_hyper_rectangle
Call for x,y,z
    GridTools::collect_periodic_faces
triangulation.add_periodicity

then the code takes a very long time for this step, and on some machines it crashes with an oom_kill error, i.e., it runs out of memory.
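
Roughly, the setup looks like this (a sketch, not my exact code; it assumes a unit cube with colorize=true, so the boundary ids are 0/1 in x, 2/3 in y and 4/5 in z):

// Sketch of the case 1 setup; assumes deal.II built with p4est.
#include <deal.II/base/mpi.h>
#include <deal.II/distributed/tria.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/grid_tools.h>

using namespace dealii;

int main(int argc, char **argv)
{
  Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv, 1);

  parallel::distributed::Triangulation<3> triangulation(MPI_COMM_WORLD);

  // 256^3 coarse cells on the unit cube; colorize=true assigns boundary
  // ids 0/1 (x), 2/3 (y), 4/5 (z).
  GridGenerator::subdivided_hyper_rectangle(triangulation,
                                            {256u, 256u, 256u},
                                            Point<3>(0, 0, 0),
                                            Point<3>(1, 1, 1),
                                            /*colorize=*/true);

  std::vector<GridTools::PeriodicFacePair<
    parallel::distributed::Triangulation<3>::cell_iterator>>
    matched_pairs;

  for (unsigned int d = 0; d < 3; ++d) // periodicity in x, y, z
    GridTools::collect_periodic_faces(triangulation,
                                      /*b_id1=*/2 * d,
                                      /*b_id2=*/2 * d + 1,
                                      /*direction=*/d,
                                      matched_pairs);

  triangulation.add_periodicity(matched_pairs);
}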

Case 2. It works fine if I do

GridGenerator::hyper_rectangle (initial mesh has single cell)
Call for x,y,z
    GridTools::collect_periodic_faces
triangulation.add_periodicity
triangulation.refine_global(8)
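
The only difference from case 1 is the coarse grid and the final refinement (again just a sketch):

// Sketch of case 2: a single coarse cell, periodicity, then global refinement.
GridGenerator::hyper_rectangle(triangulation,
                               Point<3>(0, 0, 0),
                               Point<3>(1, 1, 1),
                               /*colorize=*/true);

// collect_periodic_faces for d = 0, 1, 2 and add_periodicity as above, then:
triangulation.refine_global(8); // 256 cells per direction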

However, I have nonstandard grid sizes, e.g., 195x72x106, which cannot be obtained by refine_global.

What options are there to speed up the periodicity setup?

Will parallel::fullydistributed::Triangulation help?

thanks
praveen

Wolfgang Bangerth

Dec 17, 2025, 11:34:21 PM
to dea...@googlegroups.com
On 12/16/25 03:10, Praveen C wrote:
>
> *Case 1.* Suppose I create a 256^3 mesh like this
>
> GridGenerator::subdivided_hyper_rectangle
> Call for x,y,z
>     GridTools::collect_periodic_faces
> triangulation.add_periodicity
>
> then the code takes a very long time for this step, and on some machines it
> crashes with an oom_kill error, i.e., it runs out of memory.

Praveen:
Can you be more specific? Show us a minimal code example that demonstrates the
problem and that we can run. It should be possible to do this in not much more
than 20 lines of code.

Specifically, in which operation is the CPU time used, and in which do you run
out of memory?

(My best guess is that the algorithms use a double loop over all coarse mesh
cells. This works well if you have a few dozen or a few hundred coarse mesh
cells, as we often do. But your 195x72x106 mesh has 1.5M coarse mesh cells,
and I'm not surprised that that doesn't work. You won't fix that issue by
going to a different triangulation class. I think it's inherent in the current
design of the periodic face finding algorithm, but we can't know for sure
unless we know where exactly the problem appears.)
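
To illustrate what I mean by a double loop, the matching would, schematically, look like the following (this is not the actual deal.II code, just the generic shape of a brute-force face pairing; the container and helper names are made up):

// Schematic only -- not deal.II's implementation. A brute-force matching
// compares every candidate face on one boundary with every face on the
// opposite one, which is quadratic in the number of boundary faces:
for (const auto &face_1 : faces_on_first_boundary)    // hypothetical container
  for (const auto &face_2 : faces_on_second_boundary) // hypothetical container
    if (face_centers_match(face_1, face_2))           // hypothetical helper
      matched_pairs.emplace_back(face_1, face_2);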

Best
W.

Praveen C

Dec 20, 2025, 11:00:18 AM
to dea...@googlegroups.com
Hello Wolfgang

I made example code for both distributed and fullydistributed triangulations.
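
For the fullydistributed case, one way to set this up is sketched below (whether it matches my test in every detail it does not; the periodicity is added to the serial base mesh before partitioning):

// Sketch of a possible fullydistributed setup.
#include <deal.II/distributed/fully_distributed_tria.h>
#include <deal.II/grid/tria_description.h>

Triangulation<3> serial_tria; // full coarse mesh, present on every rank
GridGenerator::subdivided_hyper_rectangle(serial_tria,
                                          {128u, 128u, 128u},
                                          Point<3>(0, 0, 0),
                                          Point<3>(1, 1, 1),
                                          /*colorize=*/true);

// collect_periodic_faces for x, y, z on serial_tria as before, then:
serial_tria.add_periodicity(matched_pairs);

GridTools::partition_triangulation(
  Utilities::MPI::n_mpi_processes(MPI_COMM_WORLD), serial_tria);

const auto description =
  TriangulationDescription::Utilities::create_description_from_triangulation(
    serial_tria, MPI_COMM_WORLD);

parallel::fullydistributed::Triangulation<3> triangulation(MPI_COMM_WORLD);
triangulation.create_triangulation(description);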


On a 128^3 mesh, I get these timings:

mpirun -np 2 ./main

distributed

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |       463s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Add periodicity                 |         1 |       445s |        96% |
| Collect faces x                 |         1 |    0.0425s |         0% |
| Collect faces y                 |         1 |    0.0313s |         0% |
| Collect faces z                 |         1 |    0.0335s |         0% |
+---------------------------------+-----------+------------+------------+

fullydistributed

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      12.2s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Add periodicity                 |         1 |    0.0038s |         0% |
| Collect faces x                 |         1 |     0.368s |         3% |
| Collect faces y                 |         1 |    0.0153s |      0.13% |
| Collect faces z                 |         1 |    0.0133s |      0.11% |
+---------------------------------+-----------+------------+------------+

Most of the time is spent in the add_periodicity function. fullydistributed does this much faster; I have to see whether the other parts of my code work with this triangulation.
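
For reference, the sections in the tables were produced with TimerOutput, roughly like this (a sketch; the actual test differs in details):

// Sketch of the timing setup around the periodicity calls.
#include <deal.II/base/conditional_ostream.h>
#include <deal.II/base/timer.h>

ConditionalOStream pcout(std::cout,
                         Utilities::MPI::this_mpi_process(MPI_COMM_WORLD) == 0);
TimerOutput timer(MPI_COMM_WORLD, pcout,
                  TimerOutput::summary, TimerOutput::wall_times);

{
  TimerOutput::Scope t(timer, "Collect faces x");
  GridTools::collect_periodic_faces(triangulation, 0, 1, 0, matched_pairs);
}
// ... same for "Collect faces y" and "Collect faces z" ...
{
  TimerOutput::Scope t(timer, "Add periodicity");
  triangulation.add_periodicity(matched_pairs);
}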

best
praveen


Wolfgang Bangerth

Dec 20, 2025, 7:07:51 PM
to dea...@googlegroups.com
On 12/20/25 08:59, Praveen C wrote:
>
> On a 128^3, I get this timing
>
> mpirun -np 2 ./main
>
> *distributed*
>
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |       463s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Add periodicity                 |         1 |       445s |        96% |
> | Collect faces x                 |         1 |    0.0425s |         0% |
> | Collect faces y                 |         1 |    0.0313s |         0% |
> | Collect faces z                 |         1 |    0.0335s |         0% |
> +---------------------------------+-----------+------------+------------+

Ah yes, that's clearly bad :-) Are you in a position to put timers into the
implementation of that function for the p::d::T case to figure out which part
of the algorithm is so slow?

Short of that, I think it would be useful to try with, say, 32^3, 64^3, 128^3
to see whether the run time grows like N^2, N^3, etc. This helps narrow down
which parts of the code one would have to look at. (E.g., if it's N^2, you
know you have to look for double loops.)
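
For example, going from 64^3 to 128^3 multiplies the number of coarse cells N by 8, so a wall-time increase of roughly 8^2 = 64 would point at an N^2 algorithm, whereas an increase of roughly 8^3 = 512 would point at N^3.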

Best
W.

Daniel Arndt

Dec 21, 2025, 1:21:47 PM
to dea...@googlegroups.com
Running that program in a profiler shows that almost all time is spent
in dealii::internal::p4est::functions<dim>::connectivity_join_faces.

Best,
Daniel

Wolfgang Bangerth

Dec 26, 2025, 6:41:04 PM
to dea...@googlegroups.com
On 12/21/25 11:21, Daniel Arndt wrote:
> Running that program in a profiler shows that almost all time is spent
> in dealii::internal::p4est::functions<dim>::connectivity_join_faces.

That's just an alias for a p4est function, right? So the time is actually
spent in p4est?

Best
W.

Daniel Arndt

Dec 29, 2025, 11:59:18 AM
to dea...@googlegroups.com
Yes, exactly, that's what my profiler was showing. I didn't dive
deeper into p4est to understand why that is or if there is a better
interface to call.

Best,
Daniel