Weak scaling running out of memory


Lucas Myers

Jun 7, 2023, 3:04:41 PM
to deal.II User Group
Hi everyone,

I'm trying to run a scaling analysis on my code, but when I make the system too large the job dies with a (Killed) error, which I strongly suspect means it ran out of memory. I'm unsure why this is happening, because I am increasing resources in proportion to the system size.

Details: I'm trying to get a sense for weak scaling in 3D, so for this I use a subdivided hyper-rectangle. Since the configuration is nearly constant in the third dimension, I use p1 = (-20, -20, -5) and p2 = (20, 20, 5) for my defining points. To try to keep the number of DoFs per core constant, I do runs with the following sets of parameters (so 5 runs total):

hyper-rectangle subdivisions: (4, 4, 1), (4, 4, 2), (4, 8, 2), (4, 4, 1), (4, 4, 2)
global refines: 5, 5, 5, 6, 6
# cores: 128, 256, 512, 1024, 2048
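As a quick check of the run parameters above, the following sketch counts active cells per core. It assumes hexahedral cells, so each global refinement multiplies the cell count by 8; the function names are illustrative and not part of deal.II.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Active cells of a 3D subdivided hyper-rectangle after uniform refinement.
// Assumption: every global refinement splits each hexahedron into 8 children.
std::uint64_t active_cells(const std::array<unsigned, 3> &subdivisions,
                           unsigned                       global_refinements)
{
  std::uint64_t cells =
    std::uint64_t{subdivisions[0]} * subdivisions[1] * subdivisions[2];
  for (unsigned r = 0; r < global_refinements; ++r)
    cells *= 8;
  return cells;
}

// Cells per core if the mesh is split evenly across n_cores (integer division).
std::uint64_t cells_per_core(const std::array<unsigned, 3> &subdivisions,
                             unsigned                       global_refinements,
                             std::uint64_t                  n_cores)
{
  assert(n_cores > 0);
  return active_cells(subdivisions, global_refinements) / n_cores;
}
```

For all five runs this gives 4096 cells per core, so the cell count per core is indeed held constant across the series.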

Each node has 128 cores, so doubling the number of cores also doubles the amount of available memory. However, the memory seems to run out even before the system is assembled (so the solver cannot be the cause). Are there any data structures in deal.II which might scale poorly (memory-wise) in this scenario? And are there any nice ways of figuring out what is eating all the memory?

- Lucas

Martin Kronbichler

Jun 7, 2023, 3:54:39 PM
to dea...@googlegroups.com

Dear Lucas,

Without seeing your code, it is difficult to nail down the issue. But by far the most common mistake that leads to this type of problem in my codes is that I forgot to initialize an AffineConstraints object with an index set for the locally relevant DoFs, i.e., I was missing this line: https://github.com/dealii/dealii/blob/ea23d6bb90739b6bd2f7af96a2b9b73bb10c7298/examples/step-40/step-40.cc#L294
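Why a missing index set defeats weak scaling: the AffineConstraints documentation notes that without an IndexSet, internal data structures are created for all possible indices, so memory per process is proportional to the global problem size. The sketch below is only a back-of-the-envelope model of that effect. It assumes one fixed-size entry per global DoF, which is not a description of deal.II's actual internal layout, and all names and numbers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes a single rank spends on a table with one entry per *global* DoF.
// Under weak scaling the global DoF count is dofs_per_rank * n_ranks.
std::uint64_t global_table_bytes_per_rank(std::uint64_t dofs_per_rank,
                                          std::uint64_t n_ranks,
                                          std::uint64_t bytes_per_entry)
{
  assert(n_ranks > 0);
  return dofs_per_rank * n_ranks * bytes_per_entry;
}

// The same table summed over all ranks that share one node.
std::uint64_t global_table_bytes_per_node(std::uint64_t dofs_per_rank,
                                          std::uint64_t n_ranks,
                                          std::uint64_t bytes_per_entry,
                                          std::uint64_t ranks_per_node)
{
  return ranks_per_node *
         global_table_bytes_per_rank(dofs_per_rank, n_ranks, bytes_per_entry);
}
```

In this model the per-rank cost grows linearly with the number of ranks, while the memory available per node stays fixed. Going from 128 to 2048 ranks therefore multiplies the footprint on every node by 16, even though the local problem size has not changed.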

In general, deal.II has been used for far larger problems than this, so most data structures should be scalable. But of course, there are many places that might be problematic, and as a library deal.II might have many things that have not been tested at scale. I found it useful to add lines like https://github.com/kronbichler/multigrid/blob/6b43f32b4758a169af5b4bb54546ad279d6fee9f/poisson_dg/program.cc#L245-L247 with the 'print_time' function given here https://github.com/kronbichler/multigrid/blob/6b43f32b4758a169af5b4bb54546ad279d6fee9f/common/laplace_operator.h#L38-L52 (the name is not really good, as it was only done on a side branch; the motivation is of course to print some statistics) at various places in the code in order to identify the problem. With measurements of the actual memory consumption you are often able to identify the problem on a smaller scale, even though it might take 2-3 attempts to have the timers in the right spots.
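For readers who want a similar memory probe without following the links, here is a minimal standalone sketch. It is not the linked 'print_time' helper: it is Linux-specific, reads the VmRSS/VmHWM fields from /proc/self/status, and uses illustrative function names. deal.II also provides Utilities::System::get_memory_stats for the same purpose.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Return the value in kB of a field such as "VmRSS" (resident set size) or
// "VmHWM" (its high-water mark) from text formatted like Linux's
// /proc/<pid>/status. Returns -1 if the field is missing or unparsable.
long parse_status_kb(std::istream &status, const std::string &field)
{
  assert(!field.empty());
  const std::string prefix = field + ":";
  std::string       line;
  while (std::getline(status, line))
    {
      if (line.compare(0, prefix.size(), prefix) == 0)
        {
          std::istringstream rest(line.substr(prefix.size()));
          long               kb = -1;
          if (rest >> kb)
            return kb;
          return -1;
        }
    }
  return -1;
}

// Query the current process; returns -1 where /proc is unavailable.
long current_memory_kb(const std::string &field)
{
  std::ifstream status("/proc/self/status");
  return parse_status_kb(status, field);
}
```

Calling current_memory_kb("VmHWM") on each rank after mesh creation, DoF distribution, and constraint setup, and printing the maximum over all ranks, should show which step makes the footprint jump.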

Best,
Martin

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/fee240c8-795f-432f-a1e2-c462a981c26fn%40googlegroups.com.

Lucas Myers

Jun 7, 2023, 4:49:20 PM
to dea...@googlegroups.com
Hi Martin,

The problem that you suggested was indeed what was happening. Thanks so much for the help! I imagine this saved me a long debugging process.

As an aside, is this something that would qualify for an error message? Something like putting an Assert at the top of the functions which operate on an AffineConstraints object using a DoFHandler. It would check whether the triangulation is distributed and whether the constraints had been initialized with locally_relevant_dofs, and stop the program if they had not.

If so, I can file a bug report and try to make it happen.

- Lucas
