Unable to match the performance in step-40


Wasim Niyaz Munshi ce21d400

Apr 5, 2023, 11:10:58 AM
to deal.II User Group
Hello everyone.
I am trying to run step-40 on my local system and match the performance against the results given in the tutorial problem. I am running the problem on a single processor. My total time is significantly less than that given in the tutorial problem. But my output takes up around 50% of the total time, which isn't the case in the tutorial problem.
I cannot figure out why my total time is significantly less than that in the tutorial and why my output takes 50% of the total wall clock time.
I am attaching both results for reference.

Thanking in anticipation
Regards
Wasim Niyaz
step-40_my_local_system.png
step-40_tutorial.png

Wolfgang Bangerth

Apr 5, 2023, 12:54:37 PM
to dea...@googlegroups.com

Wasim,

> I am trying to run step-40 on my local system and match the performance
> against the results given in the tutorial problem. I am running the
> problem on a single processor. My total time is significantly less than
> that given in the tutorial problem. But my output takes up around 50% of
> the total time, which isn't the case in the tutorial problem.
> I cannot figure out why my total time is significantly less than that in
> the tutorial and why my output takes 50% of the total wall clock time.
> I am attaching both results for reference.

Are you running in debug or release mode?

For reference, here is what I get in debug mode for cycle 3:

Cycle 3:
Number of active cells: 7096
Number of degrees of freedom: 31639
Solved in 11 iterations.


+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      5.84s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| assembly                        |         1 |      1.85s |        32% |
| output                          |         1 |     0.532s |       9.1% |
| refine                          |         1 |      2.01s |        34% |
| setup                           |         1 |     0.926s |        16% |
| solve                           |         1 |      0.52s |       8.9% |
+---------------------------------+-----------+------------+------------+



And here for release mode:

Cycle 3:
Number of active cells: 7096
Number of degrees of freedom: 31639
Solved in 11 iterations.


+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      1.07s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| assembly                        |         1 |    0.0412s |       3.9% |
| output                          |         1 |     0.204s |        19% |
| refine                          |         1 |     0.282s |        26% |
| setup                           |         1 |    0.0349s |       3.3% |
| solve                           |         1 |     0.506s |        47% |
+---------------------------------+-----------+------------+------------+


As a general rule, though, these tiny problems are not of great
interest. step-40 is written to be run on substantial numbers of
processes, on millions or billions of degrees of freedom.
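
Editorial aside: the two tables above can be compared section by section. The sketch below only divides the debug wall times by the release wall times quoted in this message; the struct and function names are made up for illustration and are not part of deal.II.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// One row of the timing tables: wall time in seconds in debug and release mode.
struct SectionTiming
{
  std::string name;
  double      debug_seconds;
  double      release_seconds;
};

// Cycle 3 wall times copied from the two tables in this message.
std::vector<SectionTiming> cycle3_timings()
{
  return {{"assembly", 1.85, 0.0412},
          {"output", 0.532, 0.204},
          {"refine", 2.01, 0.282},
          {"setup", 0.926, 0.0349},
          {"solve", 0.52, 0.506}};
}

// Ratio of debug to release wall time for one section.
double speedup(const SectionTiming &t)
{
  assert(t.release_seconds > 0.0);
  return t.debug_seconds / t.release_seconds;
}

// Print one line per section with its debug/release ratio.
void print_speedups()
{
  for (const SectionTiming &t : cycle3_timings())
    std::printf("%-10s %6.1fx\n", t.name.c_str(), speedup(t));
}
```

Calling `print_speedups()` from a `main` function shows that assembly gets roughly 45 times faster in release mode while the solve time is essentially unchanged. Because of this, output and solve take up a much larger share of the total time in release mode than in debug mode.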

Best
W.
--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Wasim Niyaz Munshi ce21d400

Apr 5, 2023, 1:27:31 PM
to deal.II User Group
I am running in release mode. I am attaching the results for cycle 3 for both debug and release modes. I will try to reproduce the plot of wall time vs the number of processors for 52M DOFs as given in the tutorial problem. That would be a better way to compare the performances!
debug_mode.png
release_mode.png

Wolfgang Bangerth

Apr 5, 2023, 1:37:25 PM
to dea...@googlegroups.com
On 4/5/23 11:27, Wasim Niyaz Munshi ce21d400 wrote:
> I am running in release mode. I am attaching the results for cycle 3 for
> both debug and release modes. I will try to reproduce the plot of wall
> time vs the number of processors for 52M DOFs as given in the tutorial
> problem. That would be a better way to compare the performances!

Yes!

As for why your output function is so slow, the only thing I can imagine
is that whatever disk you write to is rather slow -- but I don't know
for sure.
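
Editorial aside: one rough way to check the slow-disk hypothesis is to time a plain file write in the directory where step-40 writes its output. This is a minimal sketch using only the C++ standard library; the function name and the idea of writing a dummy buffer are assumptions for illustration, not something the thread or deal.II prescribes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write n_bytes of dummy data to `path` and return the elapsed wall time in
// seconds, or -1.0 if the file could not be written.
// Note: flush() hands the data to the operating system; it does not guarantee
// that the data has reached the physical disk, so this is only a rough probe.
double time_write_seconds(const std::string &path, std::size_t n_bytes)
{
  assert(n_bytes > 0);
  const std::vector<char> buffer(n_bytes, 'x');

  const auto start = std::chrono::steady_clock::now();
  {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
      return -1.0;
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    out.flush();
    if (!out)
      return -1.0;
  } // the file is closed here, before the clock is stopped
  const auto stop = std::chrono::steady_clock::now();

  return std::chrono::duration<double>(stop - start).count();
}
```

Running the same call once on the suspect file system and once on a local scratch directory gives a quick comparison; a large gap between the two would support the slow-disk explanation.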

Wasim Niyaz Munshi ce21d400

Apr 6, 2023, 3:31:36 AM
to deal.II User Group
I tried to run step-40 with 52M DOFs on 32 processors. I am using GridGenerator::subdivided_hyper_rectangle to create a mesh with 5000*5000 elements. I have a single cycle in my simulation. However, I am running into some memory issues.
I am getting the following error:

Running with PETSc on 32 MPI rank(s)...
Cycle 0:
--------------------------------------------------------------------------
mpirun noticed that process rank 5 with PID 214402 on node tattva exited on signal 9 (Killed).

I tried with 40 processors (125 GB RAM) but I am getting the same error.

Wolfgang Bangerth

Apr 6, 2023, 11:51:19 AM
to dea...@googlegroups.com
On 4/6/23 01:31, Wasim Niyaz Munshi ce21d400 wrote:
> I tried to run step-40 with 52M DOFs on 32 processors. I am using
> GridGenerator::subdivided_hyper_rectangle to create a mesh with
> 5000*5000 elements. I have a single cycle in my simulation. However, I
> am running into some memory issues.
> I am getting the following error:
>
> Running with PETSc on 32 MPI rank(s)...
> Cycle 0:
> --------------------------------------------------------------------------
> mpirun noticed that process rank 5 with PID 214402 on node tattva
> exited on signal 9 (Killed).
>
> I tried with 40 processors (125 GB RAM) but I am getting the same error.

I'm pretty sure you are running out of memory. You need a smaller problem, a
larger machine, or both.

Wasim Niyaz Munshi ce21d400

Apr 6, 2023, 12:07:10 PM
to dea...@googlegroups.com
Yes, I had the same feeling. But when I look at the plot in the step-40 tutorial for 52M DOFs, I see that the problem was also solved using just 32 processors. Could you kindly let me know how much memory is available when you run the problem on 32 processors? I get the memory error even when I use 80 processors (250 GB memory).

Thanks and regards 

Wasim Niyaz
Research scholar
CE Dept.
IITM
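
Editorial aside: the figures quoted so far can be turned into a per-process budget. The sketch below assumes that the stated 125 GB and 250 GB are the total memory shared by 40 and 80 MPI processes respectively, which the thread does not state explicitly; the function name is hypothetical.

```cpp
#include <cassert>

// Even share of a total quantity (memory in GB, or a DOF count) per MPI rank.
// Assumption: the total is split evenly across all ranks.
double per_rank(double total, int n_ranks)
{
  assert(n_ranks > 0);
  return total / n_ranks;
}
```

Under that assumption, `per_rank(125.0, 40)` and `per_rank(250.0, 80)` both give 3.125 GB per process, so moving from 40 to 80 processes did not change the memory available to each one. For comparison, `per_rank(52.0e6, 32)` is about 1.6 million DOFs per process. If some data is stored in full on every process (the deal.II documentation for parallel::distributed::Triangulation describes the coarse mesh this way), that part would not shrink as more processes are added; whether this is the cause here is not confirmed in the thread.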


Wolfgang Bangerth

Apr 6, 2023, 12:11:21 PM
to dea...@googlegroups.com
On 4/6/23 10:06, Wasim Niyaz Munshi ce21d400 wrote:
> Yes, I also had the same feeling. But, when I look at the plot in the tutorial
> of step-40 for 52M Dofs, I see that they have solved the problem using just 32
> processors also. Can you kindly let me know how much memory is available when
> you run the problem on 32 processors? I get the memory error even when I
> use 80 processors (250 GB memory).

Wasim: Why don't you try a problem of intermediate size?

Wasim Niyaz Munshi ce21d400

Apr 7, 2023, 6:06:47 AM
to deal.II User Group
I ran a problem with 2000*2000 cells (around 16M DOFs) on 16, 32 and 40 processors. It took 63 seconds on 16 processors and 74 seconds on 32 processors, and it fails with the same memory error on 40 processors. For 52M DOFs, I get the memory error in all three cases.

Regards
Wasim
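
Editorial aside: the mesh sizes mentioned in this thread can be checked against the DOF counts with a short calculation. The sketch assumes continuous biquadratic (degree 2) elements on a uniform 2D grid without hanging nodes, which is consistent with 2000*2000 cells giving around 16M DOFs as reported above; the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of DOFs of a continuous tensor-product element of the given degree
// on a uniform nx-by-ny grid of quadrilaterals without hanging nodes.
// Each direction contributes degree * n + 1 distinct nodes.
std::uint64_t q_dofs_2d(std::uint64_t nx, std::uint64_t ny, unsigned int degree)
{
  assert(degree >= 1);
  return (degree * nx + 1) * (degree * ny + 1);
}
```

With these assumptions, `q_dofs_2d(2000, 2000, 2)` is 16,008,001, matching the 16M figure, while `q_dofs_2d(5000, 5000, 2)` is 100,020,001, roughly twice the 52M DOFs being targeted. A grid of about 3600*3600 cells (`q_dofs_2d(3600, 3600, 2)` is 51,854,401) would be closer to 52M under the same assumptions.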