step-77 in parallel - SUNDIALS::KINSOL


jose.a...@gmail.com

Jun 8, 2022, 1:03:06 PM
to deal.II User Group
Dear dealii community,

I am solving a nonlinear problem that involves a regularized sign function. The classic Newton-Raphson method converges too slowly near the limit of the regularization parameter. Before writing a line search algorithm myself, I wanted to try out the KINSOL solver.
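For illustration only (this is not necessarily the exact form used in my code), a typical regularization of this kind is

\operatorname{sgn}_\varepsilon(x) = \frac{x}{\sqrt{x^2 + \varepsilon^2}},
\qquad
\left.\frac{\mathrm{d}}{\mathrm{d}x}\operatorname{sgn}_\varepsilon(x)\right|_{x=0} = \frac{1}{\varepsilon},

so the Jacobian contributions grow like 1/epsilon as epsilon goes to zero, which is exactly the regime where full Newton steps stop working well.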

I modified step-77 using the parallel scheme I have always used, but when running in parallel I get an error, apparently at line 386 of kinsol.cc, caused by how I reinit my solution vector. I follow the reinit calls used in step-32:

solution.reinit(locally_relevant_dofs,
                MPI_COMM_WORLD);

distributed_vector.reinit(locally_owned_dofs,
                          locally_relevant_dofs,
                          MPI_COMM_WORLD,
                          true);

It is not clear to me which reinit I should use in this case. I tried the second reinit method with the flag set to false for the solution vector, as done in another tutorial step, but then I get an access error during the WorkStream::run call.

Attached is the aforementioned modification of step-77.

Cheers,
Jose
step-77_mod.cc

Bruno Turcksin

Jun 9, 2022, 8:53:56 AM
to deal.II User Group
Jose,

What's the error that you get? Is there a message? Is it a segfault?

Best,

Bruno

jose.a...@gmail.com

Jun 29, 2022, 5:03:33 AM
to deal.II User Group
Hi Bruno,

Sorry for the late response. I get the following error:

An error occurred in line <1469> of file </calculate/temp/iwtm009/spack-stage-dealii-9.3.3-h7vrbckbqhxjllhyiwpmxjdp3242wjvv/spack-src/include/deal.II/lac/trilinos_vector.h> in function
    dealii::IndexSet dealii::TrilinosWrappers::MPI::Vector::locally_owned_elements() const
The violated condition was:
    owned_elements.size() == size()
Additional information:
    The locally owned elements have not been properly initialized! This
    happens for example if this object has been initialized with exactly
    one overlapping IndexSet.

at the call to locally_owned_elements() in this code snippet from kinsol.cc:

template <typename VectorType>
unsigned int
KINSOL<VectorType>::solve(VectorType &initial_guess_and_solution)
{
  unsigned int system_size = initial_guess_and_solution.size();

  // The solution is stored in
  // solution. Here we take only a
  // view of it.
# ifdef DEAL_II_WITH_MPI
  {
    const IndexSet     is = initial_guess_and_solution.locally_owned_elements();
    const unsigned int local_system_size = is.n_elements();

    solution =
      N_VNew_Parallel(communicator, local_system_size, system_size);

    u_scale = N_VNew_Parallel(communicator, local_system_size, system_size);
    N_VConst_Parallel(1.e0, u_scale);

    f_scale = N_VNew_Parallel(communicator, local_system_size, system_size);
    N_VConst_Parallel(1.e0, f_scale);
  }
  else

which is expected, given how I reinit my solution vector and distributed vectors:

solution.reinit(locally_relevant_dofs,
                MPI_COMM_WORLD);

distributed_vector.reinit(locally_owned_dofs,
                          locally_relevant_dofs,
                          MPI_COMM_WORLD,
                          true);

Nonetheless, these are my standard reinit calls for parallel applications following the scheme of step-32. I thought it might have to do with the reinit_vector lambda passed to KINSOL,

nonlinear_solver.reinit_vector =
  [&](dealii::LinearAlgebraTrilinos::MPI::Vector &x)
  {
    x.reinit(locally_relevant_dofs,
             MPI_COMM_WORLD);
  };

but I get the above error even if I use the reinit call for distributed vectors.

So my question is: how should one reinit the vectors in parallel applications that use KINSOL? And, as a side question, how should the vector in the reinit_vector lambda be reinitialized: like the solution vector, or like a distributed vector?

Attached is the source code for the modified step-77.

Cheers,
Jose
step-77_mod.cc

Bruno Turcksin

Jun 29, 2022, 5:01:40 PM
to dea...@googlegroups.com
Jose,

First, I have never used KINSOL, so I don't know what it requires. With that being said, this

x.reinit(locally_relevant_dofs, MPI_COMM_WORLD);

is wrong. Either you only need the locally_owned_dofs and you write

x.reinit(locally_owned_dofs, MPI_COMM_WORLD);

or you need the locally_relevant_dofs and you need to write

x.reinit(locally_owned_dofs, locally_relevant_dofs, MPI_COMM_WORLD);

The problem with locally_relevant_dofs is that a degree of freedom can appear on multiple processors, so you need locally_owned_dofs to know which processor owns each degree of freedom. This is basically what the error message is telling you: there is an overlap of the locally owned indices, i.e., two processors own the same index.
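For reference, the usual pattern from the tutorials looks roughly like this (untested here; adapt the identifiers to your program):

IndexSet locally_owned_dofs = dof_handler.locally_owned_dofs();
IndexSet locally_relevant_dofs;
DoFTools::extract_locally_relevant_dofs(dof_handler, locally_relevant_dofs);

// Non-ghosted (writable) vector: one entry per locally owned DoF.
TrilinosWrappers::MPI::Vector distributed_vector;
distributed_vector.reinit(locally_owned_dofs, MPI_COMM_WORLD);

// Ghosted (read-only) vector: owned entries plus ghost entries,
// with locally_owned_dofs defining the ownership.
TrilinosWrappers::MPI::Vector ghosted_vector;
ghosted_vector.reinit(locally_owned_dofs, locally_relevant_dofs, MPI_COMM_WORLD);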

Best,

Bruno


jose.a...@gmail.com

Jun 30, 2022, 6:27:09 AM
to deal.II User Group
Hi Bruno,

I have been using those reinit calls for the solution vector and the right-hand side in MPI applications following the scheme of step-32, and I had not encountered problems until now. Is that then a typo in the source code of step-32 that just happens to work by luck?

I have tried the following for step-77

solution.reinit(locally_owned_dofs,
                locally_relevant_dofs,
                MPI_COMM_WORLD);

...

nonlinear_solver.reinit_vector =
  [&](dealii::LinearAlgebraTrilinos::MPI::Vector &x)
  {
    x.reinit(locally_owned_dofs,
             locally_relevant_dofs,
             MPI_COMM_WORLD);
  };

and I get

An error occurred in line <665> of file </calculate/temp/iwtm009/spack-stage-dealii-9.3.3-h7vrbckbqhxjllhyiwpmxjdp3242wjvv/spack-src/source/lac/trilinos_vector.cc> in function
    dealii::TrilinosScalar dealii::TrilinosWrappers::MPI::Vector::operator()(dealii::TrilinosWrappers::MPI::Vector::size_type) const
The violated condition was:
    false
Additional information:
    You are trying to access element 127 of a distributed vector, but this
    element is not stored on the current processor. Note: There are 152
    elements stored on the current processor from within the range
    [185,336] but Trilinos vectors need not store contiguous ranges on
    each processor, and not every element in this range may in fact be
    stored locally.
   
    A common source for this kind of problem is that you are passing a
    'fully distributed' vector into a function that needs read access to
    vector elements that correspond to degrees of freedom on ghost cells
    (or at least to 'locally active' degrees of freedom that are not also
    'locally owned'). You need to pass a vector that has these elements as
    ghost entries.

coming from a get_function_gradients() call inside the lambdas passed to KINSOL:

nonlinear_solver.residual =
  [&](const dealii::LinearAlgebraTrilinos::MPI::Vector &evaluation_point,
      dealii::LinearAlgebraTrilinos::MPI::Vector       &residual_vector)
  {
    assemble_rhs(evaluation_point, residual_vector);

    return 0;
  };

nonlinear_solver.setup_jacobian =
  [&](const dealii::LinearAlgebraTrilinos::MPI::Vector &current_u,
      const dealii::LinearAlgebraTrilinos::MPI::Vector &)
  {
    assemble_system_matrix(current_u);

    return 0;
  };

The vector being passed to KINSOL is solution (see the reinit() call above), so it should have ghost entries. KINSOL then creates an N_Vector for internal use, using the local and total sizes of solution. The same error occurs if I reinit x and solution with just one IndexSet (locally_owned_dofs).
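For reference, in my other applications I follow the step-32 pattern of keeping a separate ghosted copy purely for read access during assembly. Applied to the residual lambda, that would look roughly like this (only a sketch, not what the attached code currently does):

nonlinear_solver.residual =
  [&](const dealii::LinearAlgebraTrilinos::MPI::Vector &evaluation_point,
      dealii::LinearAlgebraTrilinos::MPI::Vector       &residual_vector)
  {
    // Ghosted copy so that get_function_gradients() can read values
    // belonging to degrees of freedom on ghost cells.
    dealii::LinearAlgebraTrilinos::MPI::Vector ghosted_evaluation_point(
      locally_owned_dofs, locally_relevant_dofs, MPI_COMM_WORLD);
    ghosted_evaluation_point = evaluation_point; // imports the ghost values

    assemble_rhs(ghosted_evaluation_point, residual_vector);

    return 0;
  };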

Cheers,
Jose

Bruno Turcksin

Jun 30, 2022, 9:20:47 AM
to deal.II User Group
Jose,

Sorry, you can give reinit a single argument to create a ghosted vector when you use a Trilinos vector. I usually use deal.II's own vectors, and they behave slightly differently. The problem when you give a single argument is that there is no way to know what is locally owned, so you cannot call locally_owned_elements(). So you will have to pass the locally_owned IndexSet yourself instead of getting it from the vector.
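Concretely, since the wrapper gets the owned elements from the vector itself, the simplest thing might be to hand KINSOL only vectors built from your locally_owned_dofs, so that locally_owned_elements() is well defined. An untested sketch, using the names from your snippets:

// Untested sketch: KINSOL's internal vectors are built from
// locally_owned_dofs only; ghosted vectors stay on the assembly side.
nonlinear_solver.reinit_vector =
  [&](dealii::LinearAlgebraTrilinos::MPI::Vector &x)
  {
    x.reinit(locally_owned_dofs, MPI_COMM_WORLD);
  };

// The vector handed to solve() should be partitioned the same way;
// a ghosted copy can still be kept for graphical output etc.
dealii::LinearAlgebraTrilinos::MPI::Vector distributed_solution(
  locally_owned_dofs, MPI_COMM_WORLD);
nonlinear_solver.solve(distributed_solution);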

Best,

Bruno