Long duration of the setup of a step-40-like program


Marek Čapek

Jul 28, 2016, 5:26:48 PM
to deal.II User Group
Hello,
I am developing a step-40-like code for a coupled Cahn-Hilliard/
Navier-Stokes system. I am seeing very poor startup
performance for large problems.
For a system with
1458 DoFs for Cahn-Hilliard
47000 DoFs for Navier-Stokes

I got the following results (on 4 cores):

Setup CH, wall time: 0.1601s.
Setup NS, wall time: 29.9953s.
Setup, wall time: 30.1752s.

The times get worse as the problems get larger.
I suspect something is wrong with my setup function
for the Navier-Stokes part of the system.
Could you please help me find where the performance problem is?
I use the LinearAlgebraTrilinos namespace (through the LA namespace in the code below) as the linear algebra backend.


LA::MPI::Vector locally_relevant_solution_nse;
LA::MPI::Vector old_solution_nse;
LA::MPI::Vector solution_nse;
IndexSet locally_owned_dofs_nse;
IndexSet locally_relevant_dofs_nse;


{
        computing_timer.enter_subsection("Setup NS");
        dof_handler_nse.distribute_dofs(fe_nse);
        DoFRenumbering::Cuthill_McKee(dof_handler_nse);

        locally_owned_dofs_nse = dof_handler_nse.locally_owned_dofs();
        DoFTools::extract_locally_relevant_dofs(dof_handler_nse,
                locally_relevant_dofs_nse);

        locally_relevant_solution_nse.reinit(locally_owned_dofs_nse,
                locally_relevant_dofs_nse, mpi_communicator);

        old_solution_nse.reinit(locally_owned_dofs_nse,
                locally_relevant_dofs_nse, mpi_communicator);
        old_solution_nse = 0;

        solution_nse.reinit(locally_owned_dofs_nse, locally_relevant_dofs_nse,
                mpi_communicator);
 
        system_rhs_nse.reinit(locally_owned_dofs_nse, mpi_communicator);
        system_rhs_nse = 0;
      
        constraint_matrix_nse.clear();
        constraint_matrix_nse.reinit(locally_relevant_dofs_nse);
        DoFTools::make_hanging_node_constraints(dof_handler_nse,
                constraint_matrix_nse);
        std::vector<bool> component_mask(dim + 1, false);
        for (int i = 0; i < dim; ++i)
            component_mask[i] = true; // velocities

        VectorTools::interpolate_boundary_values(dof_handler_nse, 0,
                ConstantFunction<dim>(0., dim + 1), constraint_matrix_nse,
                component_mask);

        VectorTools::interpolate_boundary_values(dof_handler_nse, 1,
                InflowVelocityBoundaryValues<dim>(), constraint_matrix_nse,
                component_mask);  
        constraint_matrix_nse.close();

        LA::Vector<double> vec_old_solution(dof_handler_nse.n_dofs());

        VectorTools::interpolate(dof_handler_nse, ZeroFunction<3>(dim + 1),
                vec_old_solution);

        old_solution_nse = vec_old_solution;


        DynamicSparsityPattern csp(locally_relevant_dofs_nse);
     
        DoFTools::make_sparsity_pattern(dof_handler_nse, csp,
                constraint_matrix_nse, false);


        SparsityTools::distribute_sparsity_pattern(csp,
                dof_handler_nse.n_locally_owned_dofs_per_processor(),
                mpi_communicator, locally_relevant_dofs_nse);

        system_matrix_nse.reinit(locally_owned_dofs_nse, locally_owned_dofs_nse,
                csp, mpi_communicator);
             
        computing_timer.leave_subsection();
    }
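The setup above calls DoFRenumbering::Cuthill_McKee, which renumbers DoFs so that coupled DoFs receive nearby indices and the matrix bandwidth shrinks. Purely as an illustration of that idea, here is a minimal serial Cuthill-McKee ordering of a generic graph in standard C++. The function names cuthill_mckee and bandwidth are hypothetical; this is not deal.II's implementation, which works on the DoF coupling graph and differs in details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Minimal serial Cuthill-McKee ordering of an undirected graph given as
// adjacency lists. Returns `order`, where order[k] is the old index of the
// node that receives the new index k.
std::vector<std::size_t>
cuthill_mckee(const std::vector<std::vector<std::size_t>> &adj)
{
  const std::size_t n = adj.size();
  std::vector<bool> visited(n, false);
  std::vector<std::size_t> order;
  order.reserve(n);

  auto degree = [&](std::size_t v) { return adj[v].size(); };

  while (order.size() < n)
  {
    // Start each connected component at an unvisited node of minimum degree.
    std::size_t start = n;
    for (std::size_t v = 0; v < n; ++v)
      if (!visited[v] && (start == n || degree(v) < degree(start)))
        start = v;

    // Breadth-first search; nodes are numbered in the order they are dequeued.
    std::queue<std::size_t> queue;
    queue.push(start);
    visited[start] = true;
    while (!queue.empty())
    {
      const std::size_t v = queue.front();
      queue.pop();
      order.push_back(v);

      // Enqueue unvisited neighbours by increasing degree (ties by index).
      std::vector<std::size_t> next;
      for (std::size_t w : adj[v])
        if (!visited[w])
        {
          visited[w] = true;
          next.push_back(w);
        }
      std::sort(next.begin(), next.end(), [&](std::size_t a, std::size_t b) {
        return degree(a) < degree(b) || (degree(a) == degree(b) && a < b);
      });
      for (std::size_t w : next)
        queue.push(w);
    }
  }
  assert(order.size() == n);
  return order;
}

// Bandwidth under an ordering: max |new(i) - new(j)| over all edges (i, j).
std::size_t bandwidth(const std::vector<std::vector<std::size_t>> &adj,
                      const std::vector<std::size_t> &order)
{
  std::vector<std::size_t> new_index(adj.size());
  for (std::size_t k = 0; k < order.size(); ++k)
    new_index[order[k]] = k;

  std::size_t bw = 0;
  for (std::size_t i = 0; i < adj.size(); ++i)
    for (std::size_t j : adj[i])
    {
      const std::size_t d = new_index[i] > new_index[j]
                              ? new_index[i] - new_index[j]
                              : new_index[j] - new_index[i];
      bw = std::max(bw, d);
    }
  return bw;
}
```

For a path whose nodes are labelled out of order (edges 0-3, 3-1, 1-4, 4-2), the original labels give bandwidth 3, while the Cuthill-McKee order 0, 3, 1, 4, 2 gives bandwidth 1.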


Thank you

Marek C

Timo Heister

Jul 29, 2016, 2:05:35 AM
to dea...@googlegroups.com
> I am developing code in the Step-40 like setting - Cahn-
> Hilliard Navier Stokes system. I am witnessing very poor
> performance of startup for large problems.
> For the system with
> 1458 dof for Cahn Hilliard
> 47000 dofs for Navier Stoke
>
> I got the following results (4 computer cores)

Debug or release mode? How does that compare to the setup of step-40 or
step-32 with a similar problem size?

It would also be helpful to know which part of your setup is slow.
You can find out by adding more timing sections.
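To illustrate the idea of finer timing sections, here is a minimal standard-C++ sketch of the scoped (RAII) timing pattern. SectionTimes and ScopedSection are hypothetical names, not deal.II classes; in deal.II the bookkeeping is done by TimerOutput::enter_subsection/leave_subsection, as used in this thread, and recent releases also offer TimerOutput::Scope for the same RAII style.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not a deal.II class): accumulates wall-clock seconds
// per named section, so repeated sections add up like repeated
// enter_subsection()/leave_subsection() pairs.
class SectionTimes
{
public:
  void add(const std::string &name, double seconds)
  {
    assert(seconds >= 0.0);
    totals_[name] += seconds; // operator[] starts a new entry at 0.0
  }

  double get(const std::string &name) const
  {
    const auto it = totals_.find(name);
    return it == totals_.end() ? 0.0 : it->second;
  }

  std::size_t n_sections() const { return totals_.size(); }

private:
  std::map<std::string, double> totals_;
};

// RAII guard: starts timing on construction and records the elapsed time on
// destruction, so a section cannot be left open by an early return or a throw.
class ScopedSection
{
  using Clock = std::chrono::steady_clock;

public:
  ScopedSection(SectionTimes &times, std::string name)
    : times_(times), name_(std::move(name)), start_(Clock::now())
  {}

  ~ScopedSection()
  {
    const std::chrono::duration<double> elapsed = Clock::now() - start_;
    times_.add(name_, elapsed.count());
  }

  ScopedSection(const ScopedSection &) = delete;
  ScopedSection &operator=(const ScopedSection &) = delete;

private:
  SectionTimes &times_;
  std::string name_;
  Clock::time_point start_;
};
```

Nesting such guards, for example one around the whole Navier-Stokes setup and separate ones around the hanging-node constraints and each boundary-value call, would show which individual call dominates.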

> LA::Vector<double> vec_old_solution(dof_handler_nse.n_dofs());
>
> VectorTools::interpolate(dof_handler_nse, ZeroFunction<3>(dim + 1),
> vec_old_solution);
>
> old_solution_nse = vec_old_solution;

Are you using a serial vector here, interpolating a zero function, and
then copying it over? That will of course be slow. You could just write
"old_solution_nse = 0" instead.

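A back-of-envelope C++ sketch of why the serial-vector route scales badly: every rank allocates storage for every global DoF, whereas old_solution_nse = 0 only touches each rank's own entries. The function names are hypothetical; the sketch assumes 8-byte doubles and ignores ghost entries and Trilinos overhead.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");

// Total bytes if each of n_ranks MPI processes holds a full serial copy of a
// vector with n_global_dofs entries (the pattern of a serial vector sized
// with dof_handler.n_dofs()).
std::uint64_t serial_copies_bytes(std::uint64_t n_global_dofs,
                                  std::uint64_t n_ranks)
{
  assert(n_ranks > 0);
  return n_global_dofs * sizeof(double) * n_ranks;
}

// Total bytes for the locally owned parts of one distributed vector, summed
// over all ranks (ghost entries and library overhead are ignored).
std::uint64_t distributed_bytes(std::uint64_t n_global_dofs)
{
  return n_global_dofs * sizeof(double);
}
```

For 47000 DoFs on 4 ranks the serial copies need only about 1.5 MB in total, which fits the observation in this thread that this step is cheap at that size. At 21841796 DoFs on 96 ranks the same pattern would need about 16.8 GB in total, versus roughly 175 MB for the distributed vector.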

--
Timo Heister
http://www.math.clemson.edu/~heister/

Vinetou Incucuna

Jul 29, 2016, 4:17:43 AM
to dea...@googlegroups.com
Hello,
thank you for the response and the advice.


>>         LA::Vector<double> vec_old_solution(dof_handler_nse.n_dofs());
>>
>>         VectorTools::interpolate(dof_handler_nse, ZeroFunction<3>(dim + 1),
>>                 vec_old_solution);
>>
>>         old_solution_nse = vec_old_solution;
>
> Are you using a serial vector here, interpolate a zero function, and
> then copy it over? That will be slow of course. You could just write
> "old_solution_nse=0" instead.

It doesn't take that long, but I will try it; see below.



> It would also be helpful to know, which part of your setup is slow.
> You can find out by adding more timing sections.

I have added timing subsections:
 
  computing_timer.enter_subsection("Setup NS");

        computing_timer.enter_subsection("NS renumbering");
            dof_handler_nse.distribute_dofs(fe_nse);
            DoFRenumbering::Cuthill_McKee(dof_handler_nse);
        computing_timer.leave_subsection();


        locally_owned_dofs_nse = dof_handler_nse.locally_owned_dofs();
        DoFTools::extract_locally_relevant_dofs(dof_handler_nse,
                locally_relevant_dofs_nse);

        locally_relevant_solution_nse.reinit(locally_owned_dofs_nse,
                locally_relevant_dofs_nse, mpi_communicator);

        old_solution_nse.reinit(locally_owned_dofs_nse,
                locally_relevant_dofs_nse, mpi_communicator);
        old_solution_nse = 0;

        solution_nse.reinit(locally_owned_dofs_nse, locally_relevant_dofs_nse,
                mpi_communicator);

        system_rhs_nse.reinit(locally_owned_dofs_nse, mpi_communicator);
        system_rhs_nse = 0;


        computing_timer.enter_subsection("constraint matrix preparation");

            constraint_matrix_nse.clear();
            constraint_matrix_nse.reinit(locally_relevant_dofs_nse);
            DoFTools::make_hanging_node_constraints(dof_handler_nse,
                    constraint_matrix_nse);
            std::vector<bool> component_mask(dim + 1, false);
            for (int i = 0; i < dim; ++i)
                component_mask[i] = true; // velocities

            VectorTools::interpolate_boundary_values(dof_handler_nse, 0,
                    ConstantFunction<dim>(0., dim + 1), constraint_matrix_nse,
                    component_mask);

            VectorTools::interpolate_boundary_values(dof_handler_nse, 1,
                    InflowVelocityBoundaryValues<dim>(), constraint_matrix_nse,
                    component_mask);

            constraint_matrix_nse.close();
        computing_timer.leave_subsection();


        computing_timer.enter_subsection("interpolation zero nse solution");

            LA::Vector<double> vec_old_solution(dof_handler_nse.n_dofs());

            VectorTools::interpolate(dof_handler_nse, ZeroFunction<3>(dim + 1),
                    vec_old_solution);

            old_solution_nse = vec_old_solution;
        computing_timer.leave_subsection();


        computing_timer.enter_subsection("making sparsity pattern");

            DynamicSparsityPattern csp(locally_relevant_dofs_nse);
   
            DoFTools::make_sparsity_pattern(dof_handler_nse, csp,
                    constraint_matrix_nse, false);


            SparsityTools::distribute_sparsity_pattern(csp,
                    dof_handler_nse.n_locally_owned_dofs_per_processor(),
                    mpi_communicator, locally_relevant_dofs_nse);


        computing_timer.leave_subsection();

        computing_timer.enter_subsection("reinit of system matrix");

            system_matrix_nse.reinit(locally_owned_dofs_nse, locally_owned_dofs_nse,
                    csp, mpi_communicator);
        computing_timer.leave_subsection();



    computing_timer.leave_subsection();

The Navier-Stokes system has ~47000 DoFs.
I compiled it in release mode and ran it on 4 cores.
Results from the log:

constraint matrix preparation, wall time: 6.41976s.
interpolation zero nse solution, wall time: 0.00220919s.
making sparsity pattern, wall time: 0.0638468s.
reinit of system matrix, wall time: 0.222504s.
Setup NS, wall time: 7.57601s.
Setup, wall time: 7.72225s.  (including the Cahn-Hilliard system setup)

So the constraint matrix preparation apparently dominates.
Maybe I am doing something wrong.
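To put "dominates" in numbers, here is a small C++ sketch that computes a subsection's share of "Setup NS" and the part of that section not covered by any listed subsection, using the values from the log above. The names Section, share and untimed are illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One timed subsection as printed by the timer log.
struct Section
{
  std::string name;
  double      seconds;
};

// Fraction of an enclosing section's wall time spent in one subsection.
double share(double section_seconds, double enclosing_seconds)
{
  assert(enclosing_seconds > 0.0);
  return section_seconds / enclosing_seconds;
}

// Wall time of the enclosing section not covered by any listed subsection.
double untimed(const std::vector<Section> &sections, double enclosing_seconds)
{
  double sum = 0.0;
  for (const Section &s : sections)
    sum += s.seconds;
  return enclosing_seconds - sum;
}
```

With these numbers, constraint preparation takes about 85% of the 7.58 s, and about 0.87 s is untimed. That remainder includes the "NS renumbering" section, whose time does not appear in the excerpt, and the vector reinit calls.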

I will try to prepare a comparison with step-40 and step-32, as you proposed.

Thank you

Marek

Vinetou Incucuna

Jul 30, 2016, 4:28:13 AM
to dea...@googlegroups.com
Hello,
I believe I have partially solved the problem; please
don't bother answering the thread until I have posted
more details.

M

Jean-Paul Pelteret

Jul 30, 2016, 6:09:31 AM
to deal.II User Group
Hi Marek,

If you do manage to find an acceptable solution, perhaps you'd be willing to post it for others to reference later?

Thanks,
J-P

Vinetou Incucuna

Jul 30, 2016, 7:06:30 AM
to dea...@googlegroups.com
Hello,
the problems were:
  • I had some pointless calls:
/*    current_solution_phase.reinit(locally_owned_dofs_phase,
                locally_relevant_dofs_phase, mpi_communicator);
        current_solution_phase = 0;
*/
        /*solution_update.reinit(locally_owned_dofs_phase,
                locally_relevant_dofs_phase, mpi_communicator);
        solution_update = 0;
        */

These vectors were left over in my solver from earlier development.
  • I ran the setup in debug mode. After switching to release mode,
    I got the following results
(run on 4x24 cores):
549250 DoFs for the Cahn-Hilliard system

dof renumbering, wall time: 0.194882s.
reinit of system matrix, wall time: 0.025027s.
Setup CH, wall time: 0.464638s.

21841796 DoFs for the Navier-Stokes system
constraint matrix preparation, wall time: 92.1981s.
        computing_timer.enter_subsection("constraint matrix preparation");
        constraint_matrix_nse.clear();
        constraint_matrix_nse.reinit(locally_relevant_dofs_nse);
        DoFTools::make_hanging_node_constraints(dof_handler_nse,
                constraint_matrix_nse);
        std::vector<bool> component_mask(dim + 1, false);
        for (int i = 0; i < dim; ++i)
            component_mask[i] = true; // velocities

        VectorTools::interpolate_boundary_values(dof_handler_nse, 0,
                ConstantFunction<dim>(0., dim + 1), constraint_matrix_nse,
                component_mask);

        VectorTools::interpolate_boundary_values(dof_handler_nse, 1,
                InflowVelocityBoundaryValues<dim>(), constraint_matrix_nse,
                component_mask);

        /*    std::set<types::boundary_id> bound_set;
         bound_set.insert(1);

         VectorTools::compute_no_normal_flux_constraints(dof_handler_nse, 0,
         bound_set, constraint_matrix_nse);
         */
        constraint_matrix_nse.close();
computing_timer.leave_subsection();



making sparsity pattern, wall time: 3.96472s.
reinit of system matrix NS, wall time: 3.58887s.
Setup NS, wall time: 134.87s.
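A quick worked check of these numbers, assuming "4x24 cores" means 96 MPI processes:

```latex
\begin{aligned}
\text{share of constraint preparation} &= \frac{92.1981\,\text{s}}{134.87\,\text{s}} \approx 0.684,\\
\text{untimed part of Setup NS} &= 134.87 - (92.1981 + 3.96472 + 3.58887) \approx 35.1\,\text{s},\\
\text{DoFs per core} &= \frac{21\,841\,796}{4 \times 24} \approx 2.28 \times 10^{5}.
\end{aligned}
```

So constraint preparation is still the largest single item, and roughly 35 s of "Setup NS" is not covered by the subsections listed. The NS DoF renumbering time is not shown in this excerpt, so it presumably falls in that remainder.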

Vinetou Incucuna

Jul 30, 2016, 7:09:33 AM
to dea...@googlegroups.com
Ah, sorry for the interruption.
From the log it seems that the Navier-Stokes
setup times are reasonable and that the
constraint matrix preparation, including the boundary
conditions, dominates.
Is this assessment sound, or should I strive for better times?

Thank you


Marek

Vinetou Incucuna

Jul 30, 2016, 7:13:44 AM
to dea...@googlegroups.com
Sorry, I meant to say that imposing the boundary
conditions apparently dominates the setup.
Is this generally the case in your applications?

Thank you

Marek