Using LinearAlgebraTrilinos::MPI::Vector.l2_norm leads to an MPI error with multiple nodes


Maxi Miller

Sep 7, 2017, 5:01:10 PM
to deal.II User Group
I tried to implement the example from step-15, but using MPI, so that I can get a better grasp of how to write such programs. My residual calculation looks quite similar to the original:

const QGauss<dim> quadrature_formula(fe.degree+1);
FEValues<dim> fe_values (fe, quadrature_formula,
                         update_gradients |
                         update_quadrature_points |
                         update_JxW_values);

const unsigned int dofs_per_cell = fe.dofs_per_cell;
const unsigned int n_q_points    = quadrature_formula.size();

Vector<double>                       cell_residual (dofs_per_cell);
std::vector<Tensor<1, dim> >         gradients (n_q_points);
std::vector<types::global_dof_index> local_dof_indices (dofs_per_cell);

print_status_update (std::string("Starting looping over cells in residual\n"), false);
for (auto cell = dof_handler.begin_active(); cell != dof_handler.end(); ++cell)
  {
    if (cell->is_locally_owned())
      {
        cell_residual = 0;
        fe_values.reinit (cell);

        fe_values.get_function_gradients (evaluation_point, gradients);

        for (unsigned int q_point = 0; q_point < n_q_points; ++q_point)
          {
            const double coeff = 1 / std::sqrt(1 + gradients[q_point] * gradients[q_point]);

            for (unsigned int i = 0; i < dofs_per_cell; ++i)
              cell_residual(i) -= (fe_values.shape_grad(i, q_point)
                                   * coeff
                                   * gradients[q_point]
                                   * fe_values.JxW(q_point));
          }

        cell->get_dof_indices (local_dof_indices);
        hanging_node_constraints.distribute_local_to_global (cell_residual, local_dof_indices, residual);
      }
  }

residual.compress (VectorOperation::add); // Have to call it after "distribute_local_to_global"
print_status_update (std::string("Setting boundary dofs in residual calculation\n"), true);

std::vector<bool> boundary_dofs (dof_handler.n_locally_owned_dofs());
DoFTools::extract_boundary_dofs (dof_handler, ComponentMask(), boundary_dofs);
print_status_update (std::string("Boundary dofs extracted\n"), true);
print_status_update (std::string("Residual size is " + std::to_string(residual.size()) + " and boundary dofs size is " + std::to_string(boundary_dofs.size()) + "\n"), true);

for (unsigned int i = 0; i < dof_handler.n_locally_owned_dofs(); ++i)
  if (boundary_dofs[i] == true)
    residual(i) = 0;
residual.compress (VectorOperation::insert); // Have to call it after setting the boundary elements

print_status_update (std::string("Returning l2 norm: " + std::to_string(residual.l2_norm()) + "\n"), true); // Crash here

// At the end of the function, we return the norm of the residual:
return residual.l2_norm();

At the marked line (the call to l2_norm()) I get the error
ERROR: Uncaught exception in MPI_InitFinalize on proc 0. Skipping MPI_Finalize() to avoid a deadlock.

----------------------------------------------------
Exception on processing:

--------------------------------------------------------
An error occurred in line <1774> of file </opt/dealII/include/deal.II/lac/trilinos_vector.h> in function
    dealii::TrilinosWrappers::MPI::Vector::real_type dealii::TrilinosWrappers::MPI::Vector::l2_norm() const
The violated condition was:
    ierr == 0
Additional information:
    An error with error number -1 occurred while calling a Trilinos function
--------------------------------------------------------

Aborting!
----------------------------------------------------
(The same exception is reported on proc 1.)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[51822,1],0]
  Exit code:    1
--------------------------------------------------------------------------

This happens when running with more than one node; with a single node it works fine.
I already know that I have to call l2_norm() on all the nodes I am running on (as stated here: https://www.dealii.org/8.4.0/doxygen/deal.II/classTrilinosWrappers_1_1MPI_1_1Vector.html), and that I have to call compress() after setting individual elements in the vector and before calling l2_norm(), in order to prevent an MPI error (as described here: https://www.dealii.org/8.4.0/doxygen/deal.II/classTrilinosWrappers_1_1MPI_1_1Vector.html and here: https://www.dealii.org/8.4.0/doxygen/deal.II/classTrilinosWrappers_1_1VectorBase.html#af7b7a23734c0202578e8dd1421e8af5b). Nevertheless, I still get the error. What am I missing?
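For reference, this is the calling pattern I understand those pages to require (a minimal sketch with an illustrative function name, not my actual code); both calls are collective, so every rank has to reach them:

#include <deal.II/lac/trilinos_vector.h>
using namespace dealii;

double assembled_norm (TrilinosWrappers::MPI::Vector &residual)
{
  // Collective: every rank must execute this after adding local contributions.
  residual.compress (VectorOperation::add);
  // Also collective: must not be guarded by, e.g., "if (rank == 0)".
  return residual.l2_norm ();
}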

Thanks!

Maxi Miller

Sep 8, 2017, 8:00:04 AM
to deal.II User Group
In addition: it fails when declaring the vector residual as a global vector via
LinearAlgebraTrilinos::MPI::Vector residual;
IndexSet solution_relevant_partitioning(dof_handler.n_dofs());
DoFTools::extract_locally_relevant_dofs(dof_handler, solution_relevant_partitioning);
residual.reinit(solution_relevant_partitioning, MPI_COMM_WORLD);
but it works when declaring it locally in the function with
LinearAlgebraTrilinos::MPI::Vector local_residual(dof_handler.locally_owned_dofs(), MPI_COMM_WORLD);

What is the difference here?
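Condensed, the two variants look like this (just a sketch restating the code above; my guess is that the difference is whether the vector ends up with ghost entries, but I am not sure):

IndexSet locally_owned = dof_handler.locally_owned_dofs();
IndexSet locally_relevant;
DoFTools::extract_locally_relevant_dofs (dof_handler, locally_relevant);

// Variant 1 (member vector): initialized with the locally *relevant* index set,
// which overlaps between processes.
residual.reinit (locally_relevant, MPI_COMM_WORLD);

// Variant 2 (local vector): initialized with the locally *owned*,
// non-overlapping index set only.
LinearAlgebraTrilinos::MPI::Vector local_residual (locally_owned, MPI_COMM_WORLD);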

Bruno Turcksin

Sep 8, 2017, 8:23:27 AM
to deal.II User Group
Hi,


On Thursday, September 7, 2017 at 5:01:10 PM UTC-4, Maxi Miller wrote:

for (unsigned int i=0; i<dof_handler.n_locally_owned_dofs(); ++i)
  if (boundary_dofs[i] == true)
    residual(i) = 0;

This looks wrong. It shouldn't be residual(i) = 0;: here i runs over the local range 0..n_locally_owned_dofs on every process, but residual(i) addresses global entries, and there is only one residual(0). This code should throw in debug mode. Are you running in release mode? If you do, I strongly suggest that you run in debug mode; you will catch a lot of errors early.
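Something along these lines should be closer for a distributed vector (an untested sketch; it assumes the IndexSet overload of DoFTools::extract_boundary_dofs is available, so that you work with global indices):

IndexSet boundary_dofs;
DoFTools::extract_boundary_dofs (dof_handler, ComponentMask(), boundary_dofs);

const IndexSet &locally_owned = dof_handler.locally_owned_dofs();
for (unsigned int j = 0; j < boundary_dofs.n_elements(); ++j)
  {
    const types::global_dof_index gi = boundary_dofs.nth_index_in_set(j);
    if (locally_owned.is_element(gi))      // only touch entries this rank owns
      residual(gi) = 0;
  }
residual.compress (VectorOperation::insert);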

Best,

Bruno

Maxi Miller

Sep 8, 2017, 8:35:55 AM
to deal.II User Group
Running in debug mode -> do you mean "make debug", or using deal.II in the Debug configuration? If the former, it did not show anything. If the latter, I thought I had compiled deal.II in both Release and Debug (ReleaseDebug in CMake).
Nevertheless, it did not fix the problem...
Thanks!

Bruno Turcksin

Sep 8, 2017, 8:48:34 AM
to dea...@googlegroups.com
2017-09-08 8:35 GMT-04:00 'Maxi Miller' via deal.II User Group
<dea...@googlegroups.com>:
> Nevertheless, it did not fix the problem...
This won't fix anything. I am just surprised that your code gets as far as the l2_norm(); I think it should crash earlier. What happens if you comment out these lines?

for (unsigned int i=0; i<dof_handler.n_locally_owned_dofs(); ++i)
  if (boundary_dofs[i] == true)
    residual(i) = 0;
residual.compress(VectorOperation::insert); // Have to call it after setting the boundary elements


Best,

Bruno

Maxi Miller

Sep 8, 2017, 9:44:36 AM
to deal.II User Group
Now it fails at the line 
evaluation_point.add (alpha, newton_update);

with the message
--------------------------------------------------------
An error occurred in line <1927> of file </opt/dealII/include/deal.II/lac/trilinos_vector.h> in function
    void dealii::TrilinosWrappers::MPI::Vector::add(dealii::TrilinosScalar, const dealii::TrilinosWrappers::MPI::Vector&)
The violated condition was:
    !has_ghost_elements()
Additional information:
    You are trying an operation on a vector that is only allowed if the vector has no ghost elements, but the vector you are operating on does have ghost elements. Specifically, vectors with ghost elements are read-only and cannot appear in operations that write into these vectors.

    See the glossary entry on 'Ghosted vectors' for more information.

Stacktrace:
-----------
#0  main: dealii::TrilinosWrappers::MPI::Vector::add(double, dealii::TrilinosWrappers::MPI::Vector const&)
#1  main: Step15::MinimalSurfaceProblem<2>::compute_residual(double)
#2  main: Step15::MinimalSurfaceProblem<2>::run()
#3  main: main
--------------------------------------------------------

Calling MPI_Abort now.
To break execution in a GDB session, execute 'break MPI_Abort' before running. You can also put the following into your ~/.gdbinit:

  set breakpoint pending on
  break MPI_Abort
  set breakpoint pending auto
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 255.

Looks as if I have to check both vectors for ghost elements before...
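If so, the usual split (as in step-40, sketched here with illustrative names; untested) would be to keep a writable, non-ghosted vector for the vector arithmetic and a separate ghosted vector that is only read during assembly:

// Writable, no ghost entries: used for add(), compress(), l2_norm(), ...
LinearAlgebraTrilinos::MPI::Vector distributed_solution;
distributed_solution.reinit (dof_handler.locally_owned_dofs(), MPI_COMM_WORLD);

// Ghosted, read-only for element access: used for get_function_gradients()
// on locally owned cells.
IndexSet locally_relevant;
DoFTools::extract_locally_relevant_dofs (dof_handler, locally_relevant);
LinearAlgebraTrilinos::MPI::Vector ghosted_solution;
ghosted_solution.reinit (dof_handler.locally_owned_dofs(), locally_relevant, MPI_COMM_WORLD);

// newton_update must itself be non-ghosted for add() to be allowed.
distributed_solution.add (alpha, newton_update);
ghosted_solution = distributed_solution;   // the assignment also updates the ghost values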
Thanks!