Memory usage using MPI/PETSc on single processor

Pete Griffin

Jul 25, 2016, 1:59:53 PM7/25/16
to deal.II User Group
I found that running step-8 and step-17 on a single processor (Intel® Core™ i7-3630QM CPU @ 2.40GHz × 8) used substantially more peak resident memory (> 5x) than I expected. This surprised me, since I understood from reading step-17 that the memory increase should be on the order of the solution vector, i.e. << 2x. I verified some of the larger memory usage numbers using top.

Is my assumption correct that 5x Peak resident memory is more than it should be?

Other simulations of a beam with body-force and traction loads, with and without hp, and with and without MPI/PETSc, all show the same memory behavior, and their results agree with beam theory.

The outputs of the modified step-8.cc and step-17.cc are attached, along with a plot of peak virtual memory and peak resident memory
vs. DOF. The changes between the original distributed step-8 and step-17, with comments and extra newlines excluded (modified file >), are as follows:

Thanks in advance

Pete Griffin

======================================================================================
diff ~/Documents/Zipstore2/dealii-8.4.1-PETSc/examples/step-8/step-8.cc step-8.cc
56c47,48
< // This again is C++:
---
> #include <deal.II/base/utilities.h>

767a394,402
>         
>         Utilities::System::MemoryStats stats;
>         Utilities::System::get_memory_stats(stats);
>         std::stringstream Str;
>         
>         Str.str("");
>         Str << "   Peak virtual memory: " << stats.VmSize/1024 << " MB, Peak resident memory: "
>                << stats.VmRSS/1024 << " MB" << std::endl;
>         std::cout << Str.str();

781c411
<       Step8::ElasticProblem<2> elastic_problem_2d;
---
>       Step8::ElasticProblem<3> elastic_problem_2d;

======================================================================================
diff ~/Documents/Zipstore2/dealii-8.4.1-PETSc/examples/step-17/step-17.cc ../step-17/step-17.cc
84a50
> #include <deal.II/base/utilities.h>
1015c355
<     for (unsigned int cycle=0; cycle<10; ++cycle)
---
>     for (unsigned int cycle=0; cycle<8; ++cycle)
1018d357
1022c361
<             triangulation.refine_global (3);
---
>             triangulation.refine_global (2);
1049a383,391
>         Utilities::System::MemoryStats stats;
>         Utilities::System::get_memory_stats(stats);
>         std::stringstream Str;
>         
>         Str.str("");
>         Str << "   Peak virtual memory: " << stats.VmSize/1024 << " MB, Peak resident memory: "
>                << stats.VmRSS/1024 << " MB" << std::endl;
>         std::cout << Str.str();
1073,1074c403
<       ElasticProblem<2> elastic_problem;
---
>       ElasticProblem<3> elastic_problem;

=============================================================================================

MPI-Mem.txt
PeakMemoryMPI.png

Bruno Turcksin

Jul 25, 2016, 2:31:09 PM7/25/16
to deal.II User Group
Pete,

You need to make sure that the codes do exactly the same thing. I haven't compared the codes too closely, but they use different preconditioners, which will impact how much memory you use.

Best,

Bruno

Pete Griffin

Jul 25, 2016, 6:52:02 PM7/25/16
to deal.II User Group
Bruno, thanks for your quick response.

I varied the preconditioner six times, and the resulting memory usage was essentially the same; see the results in the attached file. Does anyone have another idea about what might cause the large memory usage difference? Another thing I noticed: there was much less difference between peak virtual memory and peak resident memory in step-17 than in step-8.

I would like to repeat that simple beam tests produce the correct maximum deflection and angle for a number of loads and situations.

  template <int dim>
  unsigned int ElasticProblem<dim>::solve ()
  {
    SolverControl           solver_control (solution.size(),
                                            1e-8*system_rhs.l2_norm());
    PETScWrappers::SolverCG cg (solver_control,
                                mpi_communicator);
//#### Original    PETScWrappers::PreconditionBlockJacobi preconditioner(system_matrix);
//####    PETScWrappers::PreconditionJacobi preconditioner(system_matrix);
//####    PETScWrappers::PreconditionSSOR preconditioner(system_matrix);
//####    PETScWrappers::PreconditionSOR preconditioner(system_matrix);
//####    PETScWrappers::PreconditionNone preconditioner(system_matrix);
//####    PETScWrappers::PreconditionLU preconditioner(system_matrix);
//####    std::cerr << "PETScWrappers::PreconditionLU preconditioner(system_matrix);" << std::endl;
    PETScWrappers::PreconditionILU preconditioner(system_matrix);
    std::cerr << "PETScWrappers::PreconditionILU preconditioner(system_matrix);" << std::endl;

    cg.solve (system_matrix, solution, system_rhs,
              preconditioner);
    Vector<double> localized_solution (solution);
    hanging_node_constraints.distribute (localized_solution);
    solution = localized_solution;
    return solver_control.last_step();
  }

Thanks again.

Pete Griffin

MPI-Mem.txt

Pete Griffin

Jul 27, 2016, 9:38:00 AM7/27/16
to deal.II User Group
Bruno, I posted a reply to your comment, but it ended up at the top of the page rather than under your comment. If you did not receive a reply notification, please check the original thread.

Pete Griffin 

Bruno Turcksin

Jul 27, 2016, 9:45:53 AM7/27/16
to dea...@googlegroups.com
Pete,

I saw it. I am just busy :-) I will try to take another look later.

Best,

Bruno
> --
> The deal.II project is located at http://www.dealii.org/
> For mailing list/forum options, see
> https://groups.google.com/d/forum/dealii?hl=en
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "deal.II User Group" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/dealii/PRVVkvyfIao/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> dealii+un...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Pete Griffin

Jul 27, 2016, 1:23:30 PM7/27/16
to deal.II User Group
Thanks, Bruno.

I'm in no rush; I have other things I can work on too. The only other difference I could find, apart from the MPI/PETSc changes, had to do with solver_control(), which should only affect the number of steps to the solution.

I mainly brought it up because the results contrasted with what I read in the step-17 documentation.

Pete Griffin

Pete Griffin

Jul 29, 2016, 9:41:46 AM7/29/16
to deal.II User Group
I made another plot (see attached) of memory usage vs. #DOF with step-18, which, like the other two, is a 3d elasticity problem. Its memory usage was in line with step-8 and contrasted with step-17. I will try to understand step-18 well enough to transfer its MPI/PETSc approach to step-17, with the goal of bringing step-17's memory usage down to what I expect. Whether the changes will work on a multiprocessor system I have no way of knowing, since I run only on a single processor.

Pete Griffin


PeakMemoryMPI_with_step18.jpg

Timo Heister

Jul 30, 2016, 2:51:19 AM7/30/16
to dea...@googlegroups.com
I think the reason is that step-17 uses an inefficient constructor for
the system matrix:

system_matrix.reinit (mpi_communicator,
                      dof_handler.n_dofs(),
                      dof_handler.n_dofs(),
                      n_local_dofs,
                      n_local_dofs,
                      dof_handler.max_couplings_between_dofs());

This will lead to very slow setup and large memory requirements. You
can try something closer to what is used in step-40 (using
DynamicSparsityPattern).
--
Timo Heister
http://www.math.clemson.edu/~heister/

Pete Griffin

Jul 30, 2016, 7:08:46 AM7/30/16
to deal.II User Group
I used the methodology of step-18 in step-17, which showed memory usage improvements. I have attached a new version of step-17 that allows selecting between the NEW and OLD approaches, along with the results from both and another plot. I don't know whether this version will work with more than one processor, since I run on only one. It might be helpful if someone would try to run it and report success or failure.

One final thing: I left solution (previously incremental_displacement) as a Vector<double> rather than a PETScWrappers::MPI::Vector. This may add a little memory and/or slow things down slightly, but for my purposes modifying the program quickly was better and sufficient.

Thanks All

Pete Griffin

MPI_step17_NewOld.png
step-17OldNew.txt
step-17_OldNew.cc

Wolfgang Bangerth

Jul 31, 2016, 12:52:11 AM7/31/16
to dea...@googlegroups.com

Pete -- the two versions are quite different. Have you found out what
*exactly* it is that makes the difference in memory consumption? I would like
to add a comment to this effect to the program.

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@math.tamu.edu
www: http://www.math.tamu.edu/~bangerth/

Pete Griffin

Jul 31, 2016, 1:03:18 PM7/31/16
to deal.II User Group
Hello, Wolfgang.

I looked at the memory consumption of the system_matrix, the system_rhs and solution vectors. The difference appears, as Timo suggested, in the system_matrix.

The difference may be that the NEW version uses a DynamicSparsityPattern, while the OLD version only guesses at the size with dof_handler.max_couplings_between_dofs(). Apparently, for 3d problems this function overestimates considerably. Presently a DynamicSparsityPattern is used in step-8, but not in step-17.
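For reference, the step-40-style setup looks roughly like the following sketch. This is quoted from memory against the deal.II 8.x API and not compiled here; locally_owned_dofs, locally_relevant_dofs, and hanging_node_constraints are assumed to have been filled as in step-40:

```cpp
// Sketch of the step-40-style matrix setup (untested; names assumed to be
// set up as in step-40). The sparsity pattern is built dynamically, so each
// row gets exactly the entries it needs rather than a pessimistic fixed bound.
DynamicSparsityPattern dsp(locally_relevant_dofs);
DoFTools::make_sparsity_pattern(dof_handler, dsp,
                                hanging_node_constraints,
                                /*keep_constrained_dofs=*/false);
SparsityTools::distribute_sparsity_pattern(
    dsp,
    dof_handler.n_locally_owned_dofs_per_processor(),
    mpi_communicator,
    locally_relevant_dofs);
system_matrix.reinit(locally_owned_dofs,
                     locally_owned_dofs,
                     dsp,
                     mpi_communicator);
```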

Thanks

Pete Griffin

=====================================================================

NEW Code

Cycle 4:
   Number of active cells:       7484
dof_handler.n_dofs() 29277
solution.size() 29277
system_rhs.size() 29277
system_rhs.local_size() 29277
system_matrix.memory_consumption() 18320628 -> 18 MB
   Number of degrees of freedom: 29277 (by partition: 29277)
   Solver converged in 48 iterations.
   Peak virtual memory: 832 MB, Peak resident memory: 102 MB

=====================================================================

OLD Code

Cycle 4:
   Number of active cells:       7484
n_local_dofs 29277
solution.local_size() 29277
dof_handler.n_dofs() 29277
solution.size() 29277
system_rhs.size() 29277
system_rhs.local_size() 29277
system_matrix.memory_consumption() 361866548 -> 361 MB
   Number of degrees of freedom: 29277 (by partition: 29277)
   Solver converged in 48 iterations.
   Peak virtual memory: 1142 MB, Peak resident memory: 414 MB

Martin Kronbichler

Jul 31, 2016, 2:23:54 PM7/31/16
to dea...@googlegroups.com
Hi Pete,


> The difference may be that the NEW version uses
> a DynamicSparsityPattern while the OLD only guesses on the size
> with, dof_handler.max_couplings_between_dofs(). Apparently with 3d
> problems the function overestimates. Presently DynamicSparsityPattern
> is used in step-8, but, not in step-17.
I definitely agree that you should NEVER use
max_couplings_between_dofs() in 3d. As you have seen, it overestimates by a
factor of 20, which matches my experience. I actually think we should put
an assertion into this function that forbids its use in 3d. People
should use dynamic sparsity patterns.

Best,
Martin