Integrating a deal.II-specific function into a NOX interface for MPI


Maxi Miller

Apr 5, 2019, 11:47:30 AM
to deal.II User Group
I tested the integration of NOX into my program, in preparation for future nonlinear solvers. For that, I wanted to re-use the existing function for assembling the right-hand side, and wrote the following code for the interface:
template <int dim>
class ProblemInterface : public NOX::Epetra::Interface::Required,
                         public NOX::Epetra::Interface::Preconditioner,
                         public NOX::Epetra::Interface::Jacobian
{
public:
    ProblemInterface(std::function<void(const TrilinosWrappers::MPI::Vector &,
                                        const TrilinosWrappers::MPI::Vector &,
                                        TrilinosWrappers::MPI::Vector &)> residual_function,
                     std::function<void(Epetra_CrsMatrix &)>              jacobian_function,
                     const MPI_Comm &mpi_communicator,
                     const IndexSet &locally_owned_dofs,
                     const IndexSet &locally_relevant_dofs)
        : residual_function(residual_function)
        , jacobian_function(jacobian_function)
        , mpi_communicator(mpi_communicator)
        , locally_owned_dofs(locally_owned_dofs)
        , locally_relevant_dofs(locally_relevant_dofs)
    {
        residual_vec.reinit(locally_owned_dofs, locally_relevant_dofs, mpi_communicator);
        temp_f_vec.reinit(locally_owned_dofs, locally_relevant_dofs, mpi_communicator);
        f_vec.reinit(locally_owned_dofs, locally_relevant_dofs, mpi_communicator);
    }

    ~ProblemInterface()
    {}

    bool computeF(const Epetra_Vector &x,
                  Epetra_Vector       &FVec,
                  NOX::Epetra::Interface::Required::FillType)
    {
        // Epetra views onto the deal.II-owned Trilinos vectors.
        Epetra_Vector temp_f_epetra_vec = Epetra_Vector(View, temp_f_vec.trilinos_vector(), 0);
        Epetra_Vector f_epetra_vec      = Epetra_Vector(View, f_vec.trilinos_vector(), 0);
        rhs = &FVec;

        int err = rhs->PutScalar(0.0);
        if (err != 0)
            throw;

        temp_f_vec   = 0.;
        residual_vec = 0.;

        // Copy the current iterate x into f_vec via its Epetra view.
        f_epetra_vec = x;

        // Fill the right-hand side by evaluating the deal.II residual function.
        residual_function(temp_f_vec, f_vec, residual_vec);
        FVec = Epetra_Vector(Copy, residual_vec.trilinos_vector(), 0);

        return true;
    }

private:
    std::function<void(const TrilinosWrappers::MPI::Vector &,
                       const TrilinosWrappers::MPI::Vector &,
                       TrilinosWrappers::MPI::Vector &)> residual_function;
    std::function<void(Epetra_CrsMatrix &)>              jacobian_function;

    Epetra_Vector *rhs;

    MPI_Comm mpi_communicator;
    IndexSet locally_owned_dofs, locally_relevant_dofs;

    TrilinosWrappers::MPI::Vector f_vec, temp_f_vec, residual_vec;
};

The function "residual_function" is designed as
template <int dim>
void Step5<dim>::calculate_residual(const TrilinosWrappers::MPI::Vector &u,
                                    TrilinosWrappers::MPI::Vector       &residual)
{
    QGauss<dim> quadrature_formula(fe.degree + 1);

    FEValues<dim> fe_values(fe,
                            quadrature_formula,
                            update_values | update_gradients |
                            update_quadrature_points | update_JxW_values);

    const unsigned int dofs_per_cell = fe.dofs_per_cell;
    const unsigned int n_q_points    = quadrature_formula.size();

    residual.reinit(locally_owned_dofs, locally_relevant_dofs, mpi_communicator);

    std::vector<Tensor<1, dim>> grad_u(n_q_points);
    std::vector<double>         val_u(n_q_points);

    TrilinosWrappers::MPI::Vector local_evaluation_point(u);

    Vector<double>                       cell_rhs(dofs_per_cell);
    std::vector<types::global_dof_index> local_dof_indices(dofs_per_cell);

    for (const auto &cell : dof_handler.active_cell_iterators())
    {
        if (cell->is_locally_owned())
        {
            cell_rhs = 0.;

            fe_values.reinit(cell);
            fe_values.get_function_values(local_evaluation_point, val_u);
            fe_values.get_function_gradients(local_evaluation_point, grad_u);

            for (unsigned int q = 0; q < n_q_points; ++q)
                for (unsigned int i = 0; i < dofs_per_cell; ++i)
                    assemble_rhs(cell_rhs(i),
                                 -1,
                                 val_u[q],
                                 grad_u[q],
                                 fe_values.quadrature_point(q),
                                 fe_values.shape_value(i, q),
                                 fe_values.shape_grad(i, q),
                                 fe_values.JxW(q));

            cell->get_dof_indices(local_dof_indices);
            hanging_node_constraints.distribute_local_to_global(cell_rhs,
                                                                local_dof_indices,
                                                                residual);
        }
    }
    residual.compress(VectorOperation::add);
}
with
template <int dim>
void Step5<dim>::assemble_rhs(double               &return_value,
                              const double          factor,
                              const double         &val_u,
                              const Tensor<1, dim> &grad_u,
                              const Point<dim>      p,
                              const double         &fe_values_value,
                              const Tensor<1, dim> &fe_gradient_value,
                              const double          JxW_value)
{
    (void)val_u;
    return_value += factor
                    * (coefficient(p) * fe_values_value
                       + grad_u * fe_gradient_value)
                    * JxW_value;
}

When running everything on a single MPI process, it works fine. Running it on several processes leads to a crash when distributing the cell_rhs values to the residual vector, because the sizes differ (the Trilinos vectors are shrunk to the locally owned part on each process, while the deal.II vectors still operate with their full length). Is there a way to solve this problem without having to create two residual functions?
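For reference, the usual deal.II pattern (as in step-40) is to assemble into a residual vector without ghost entries and to use a ghosted vector only for reading the evaluation point. A rough sketch, with made-up names rather than the code above:

    // Sketch only: assemble into a vector without ghost entries (writable in
    // parallel), and keep a separate ghosted vector purely for read access.
    TrilinosWrappers::MPI::Vector residual_owned;
    residual_owned.reinit(locally_owned_dofs, mpi_communicator);

    TrilinosWrappers::MPI::Vector evaluation_point;
    evaluation_point.reinit(locally_owned_dofs, locally_relevant_dofs, mpi_communicator);
    evaluation_point = u; // imports the ghost values from the non-ghosted input

    // ... loop over locally owned cells, then
    // hanging_node_constraints.distribute_local_to_global(cell_rhs,
    //                                                     local_dof_indices,
    //                                                     residual_owned);

    residual_owned.compress(VectorOperation::add);

The owned vector can afterwards be copied into a ghosted vector wherever ghost values are needed for reading.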
main.cpp

Maxi Miller

Apr 25, 2019, 12:13:37 PM
to deal.II User Group
After running some tests, I have narrowed the problem down to the transfer to and from the Epetra vectors. I have to write an interface to the model (according to the instructions), as shown in the code above. There I have Epetra_Vectors created by my interface, with a size of locally_relevant_dofs.size(), and TrilinosWrappers::MPI::Vectors. The transfer from the Epetra_Vectors to the MPI::Vectors works fine, but the way back does not (the MPI::Vectors are larger than the Epetra_Vectors). Are there any hints on how I could still fit the data from the MPI::Vector into the Epetra_Vector? Or should I rather ask this on the Trilinos mailing list?
Thanks!
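One way to fit only the locally owned part of the larger deal.II vector into the Epetra_Vector could look like the following sketch; the names are placeholders, and it assumes the Epetra_Vector is built on a map of the locally owned dofs in the same order as locally_owned_dofs:

    // Sketch: copy only the locally owned entries of the (larger, ghosted)
    // deal.II vector into the Epetra_Vector that NOX works with.
    int local_index = 0;
    for (const auto global_index : locally_owned_dofs)
      epetra_vec[local_index++] = ghosted_vec[global_index];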
main.cpp

Wolfgang Bangerth

Apr 25, 2019, 11:12:39 PM
to dea...@googlegroups.com
Good question. I think it would probably be very useful to have a small testcase
others could look at. The program you have attached is still 1,300 lines,
which is far too much for anyone to understand. But since you have an idea of
what the problem is, do you think you could come up with a small testcase that
illustrates exactly the issue you have? It doesn't have to do anything useful
at all -- in your case, I think all that's necessary is to create a vector,
convert it as you describe above, and then output the sizes to show that they
are not the same. Run this on two processors, and you should have something
that could be done in 50 or 100 lines, and I think that might be small enough
for someone who doesn't know your code to understand what is necessary.

In the past I've built these sorts of testcases either from scratch, or by
repeatedly removing things from a program: (i) everything that runs after the
point where the problem happens, and then (ii) everything else that is not necessary.
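A testcase along these lines might look roughly like the following sketch; the index sets are built by hand and the numbers are arbitrary, so this is only meant to illustrate the size comparison, not the attached program:

    #include <deal.II/base/mpi.h>
    #include <deal.II/base/index_set.h>
    #include <deal.II/lac/trilinos_vector.h>
    #include <Epetra_Vector.h>
    #include <iostream>

    int main(int argc, char **argv)
    {
      using namespace dealii;
      Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv, 1);

      const MPI_Comm     comm    = MPI_COMM_WORLD;
      const unsigned int n_procs = Utilities::MPI::n_mpi_processes(comm);
      const unsigned int rank    = Utilities::MPI::this_mpi_process(comm);

      // Hand-built index sets: 10 dofs per process, plus one ghost entry.
      const unsigned int n_dofs = 10 * n_procs;
      IndexSet owned(n_dofs);
      owned.add_range(10 * rank, 10 * (rank + 1));
      IndexSet relevant(owned);
      relevant.add_index((10 * (rank + 1)) % n_dofs);

      TrilinosWrappers::MPI::Vector ghosted;
      ghosted.reinit(owned, relevant, comm);

      // Wrap the underlying Trilinos vector as in the interface above and
      // compare the sizes on each process.
      Epetra_Vector epetra_view(View, ghosted.trilinos_vector(), 0);
      std::cout << "rank " << rank
                << ": deal.II locally owned size = " << owned.n_elements()
                << ", Epetra MyLength = " << epetra_view.MyLength() << std::endl;
    }

Run on two processors, this should already show whether the two layouts agree.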

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Maxi Miller

Apr 26, 2019, 3:59:39 AM
to deal.II User Group
I tried to reduce it as much as possible while still being able to run it; the result is attached. It still has ~600 lines, but I am not sure if I should remove things like the output function? Furthermore, the NOX logic takes quite a lot of lines, and I would like to keep the program as usable as possible while removing unnecessary lines. Is that example usable?
main.cpp

Wolfgang Bangerth

Apr 26, 2019, 11:00:58 AM
to dea...@googlegroups.com
On 4/26/19 1:59 AM, 'Maxi Miller' via deal.II User Group wrote:
> I tried to reduce it as much as possible while still being able to run,
> the result is attached. It still has ~600 lines, but I am not sure if I
> should remove things like the output-function? Furthermore the NOX-logic
> takes quite a lot of lines, and I would like to keep the program as
> usable as possible while removing unnecessary lines. Is that example
> usable?

Remove everything you can. If you get an error at one point, no output
will be created -- so remove the output function. If the error happens
before the linear system is solved, remove the code that computes the
entries of the matrix. Then remove the matrix object itself. If you are
currently solving a linear system before the error happens, try what
happens if you just comment out the solution step -- and if the error
still happens (because it probably doesn't depend on the actual values
of the vector), then remove the solution step and the assembly of the
matrix and the matrix. If you need a DoFHandler to set up the vector,
output the index sets and sizes you use for the processors involved and
build these index sets by hand instead -- then remove the DoFHandler and
the finite element and everything else related to this. Continue this
style of work until you really do have a small testcase :-)

Cheers

Maxi Miller

Apr 26, 2019, 4:28:10 PM
to deal.II User Group
I think the current version is the smallest version I can get. It works for a single MPI process, but not for two or more.
One observation I made is that if I copy not only the locally owned values but also the locally relevant values from the deal.II vector to the Epetra_Vector, the solver still does not converge, but the result is correct. When changing line 159 from locally_owned_elements to locally_relevant_elements, however, I get an access error, so there must be a different way to achieve that.
main.cpp

Wolfgang Bangerth

Apr 26, 2019, 11:43:47 PM
to dea...@googlegroups.com
On 4/26/19 2:28 PM, 'Maxi Miller' via deal.II User Group wrote:
> I think the current version is the smallest version I can get. It will work
> for a single thread, but not for two or more threads.

I tried to run this on my machine, but I don't have NOX as part of my Trilinos
installation. That might have to wait for next week.

But I think there is still room for making the program shorter. The point is
that the parts of the program that run before the error happens do not
actually have to make sense. For example, could you replace your own boundary
values class by ZeroFunction? Instead of assembling something real, could you
just use the identity matrix on each cell?

Other things you don't need: Timers, convergence tables, etc. Which of the
parameters you set in solve_for_NOX() are necessary? (Necessary to reproduce
the bug; I know they're necessary for your application, but that's irrelevant
here.) Could you replace building the residual by just putting random stuff in
the vector?

Similarly, the computeJacobian and following functions really just check
conditions, but when you run the program to illustrate the bug you're chasing,
do these checks trigger? If not, remove them, then remove the calls to these
functions (because they don't do anything any more), then remove the whole
function.

I think there's still room to make this program smaller by at least a factor
of 2 or 3 or 4 :-)

Best

Maxi Miller

Apr 29, 2019, 4:19:44 AM
to deal.II User Group
The functions computePreconditioner and computeJacobian are necessary, otherwise I get compilation errors. Thus I think the current version is the best MWE I can get. Running with one MPI process gives "Test passed"; using more results in "Convergence failed".

Maxi Miller

unread,
Apr 29, 2019, 4:20:13 AM4/29/19
to deal.II User Group
Now with file...
main.cpp

Maxi Miller

Apr 29, 2019, 9:35:11 AM
to deal.II User Group
I could get it to work; the code now works equally well for one or several MPI processes. Still, the next open question is how to apply non-zero Dirichlet boundary conditions (as in the file, for cos(pi*x)*cos(pi*y)). When applying them as I do in the file, I get the warning that the solver has not converged. When using Neumann conditions, the solver converges and I get the correct output. What would be the solution for that?
Thanks!
main.cpp

Wolfgang Bangerth

Apr 29, 2019, 4:36:55 PM
to dea...@googlegroups.com
On 4/29/19 7:35 AM, 'Maxi Miller' via deal.II User Group wrote:
> Could get it to work, now the code works equally fine for one or several
> MPI threads.

What did you have to do? I was going to reproduce your problem today by
installing a Trilinos version that has NOX enabled. Is this moot now,
i.e., was it a bug in your code or did you just work around the issue in
some way that doesn't expose it?


> Still, the next open question is how to apply non-zero
> Dirichlet boundary conditions (as in the file, for cos(pi*x)*cos(pi*y).
> When applying them as I do in the file, I get the warning that the
> solver is not converged. When using Neumann-Conditions, the solver
> converges and I get the correct output. What would be the solution for that?

The question to ask is who is right. When you get the message that the
solver is not converged, is this correct? That is, does it converge if
you allow more iterations? What if you use a case where you know the
exact solution -- is the computed solution actively wrong?

Maxi Miller

Apr 29, 2019, 4:48:59 PM
to deal.II User Group


On Monday, April 29, 2019 at 22:36:55 UTC+2, Wolfgang Bangerth wrote:
On 4/29/19 7:35 AM, 'Maxi Miller' via deal.II User Group wrote:
> Could get it to work, now the code works equally fine for one or several
> MPI threads.

What did you have to do? I was going to reproduce your problem today by
installing a Trilinos version that has NOX enabled. Is this moot now,
i.e., was it a bug in your code or did you just work around the issue in
some way that doesn't expose it?

NOX expects vectors containing only the locally owned elements, with no ghost elements. Thus I had to initialize all vectors going into or coming out of any NOX-related function with locally_owned_dofs only, and copy accordingly whenever external vectors contained ghost elements.
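In code, the pattern described here could look like this sketch (the vector names are placeholders):

    // Sketch: the vector handed to/from NOX holds only locally owned entries;
    // a separate ghosted vector is used wherever ghost values are needed.
    TrilinosWrappers::MPI::Vector nox_vec;       // seen by NOX, no ghost entries
    nox_vec.reinit(locally_owned_dofs, mpi_communicator);

    TrilinosWrappers::MPI::Vector ghosted_vec;   // used for assembly/evaluation
    ghosted_vec.reinit(locally_owned_dofs, locally_relevant_dofs, mpi_communicator);

    // deal.II -> NOX: copy only the locally owned entries.
    for (const auto i : locally_owned_dofs)
      nox_vec[i] = ghosted_vec[i];
    nox_vec.compress(VectorOperation::insert);

    // NOX -> deal.II: the assignment imports the ghost values.
    ghosted_vec = nox_vec;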

> Still, the next open question is how to apply non-zero
> Dirichlet boundary conditions (as in the file, for cos(pi*x)*cos(pi*y).
> When applying them as I do in the file, I get the warning that the
> solver is not converged. When using Neumann-Conditions, the solver
> converges and I get the correct output. What would be the solution for that?

The question to ask is who is right. When you get the message that the
solver is not converged, is this correct? That is, does it converge if
you allow more iterations? What if you use a case where you know the
exact solution -- is the computed solution actively wrong?
The solver does not converge, and the output looks as if it were using Dirichlet conditions with u = 0, regardless of the "real" boundary conditions.

Wolfgang Bangerth

Apr 29, 2019, 11:59:05 PM
to dea...@googlegroups.com

Maxi,

> What did you have to do? I was going to reproduce your problem today by
> installing a Trilinos version that has NOX enabled. Is this moot now,
> i.e., was it a bug in your code or did you just work around the issue in
> some way that doesn't expose it?
>
> NOX expects vectors containing only the local elements, but no ghost elements.
> Thus I had to initialize all vectors going in or out from any NOX-related
> function using locally_owned_dofs, and copy accordingly if external vectors
> contained ghost elements.

I see. So this is a NOX requirement, not something that we could have done
anything about on the deal.II side?


> The solver does not converge, and the output looks as if it is using
> Dirichlet-conditions with u = 0, independently of the "real" boundary conditions.

I don't know NOX, but is it using an update that it adds to the solution in
each step? If so, you need to have the correct boundary conditions in the
initial guess already. What happens if you only allow NOX to do zero or one
iterations?
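
For the initial-guess part, a sketch of how the boundary values could be written into a non-ghosted initial guess in deal.II; BoundaryValues<dim>, the boundary id 0, and initial_guess are placeholders, not names from the attached code:

    // Sketch: interpolate the Dirichlet values and write them into the
    // (non-ghosted) initial guess before handing it to the nonlinear solver.
    std::map<types::global_dof_index, double> boundary_values;
    VectorTools::interpolate_boundary_values(dof_handler,
                                             0,                      // boundary id (assumed)
                                             BoundaryValues<dim>(),  // cos(pi*x)*cos(pi*y)
                                             boundary_values);
    for (const auto &entry : boundary_values)
      if (locally_owned_dofs.is_element(entry.first))
        initial_guess[entry.first] = entry.second;
    initial_guess.compress(VectorOperation::insert);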

Maxi Miller

May 1, 2019, 7:58:26 AM
to deal.II User Group


On Tuesday, April 30, 2019 at 05:59:05 UTC+2, Wolfgang Bangerth wrote:

Maxi,

>     What did you have to do? I was going to reproduce your problem today by
>     installing a Trilinos version that has NOX enabled. Is this moot now,
>     i.e., was it a bug in your code or did you just work around the issue in
>     some way that doesn't expose it?
>
> NOX expects vectors containing only the local elements, but no ghost elements.
> Thus I had to initialize all vectors going in or out from any NOX-related
> function using locally_owned_dofs, and copy accordingly if external vectors
> contained ghost elements.

I see. So this is a NOX requirement, not something that we could have done
anything about on the deal.II side?

No, as far as I understand it. NOX needs those vectors laid out in that particular way, otherwise it will not work.

> The solver does not converge, and the output looks as if it is using
> Dirichlet-conditions with u = 0, independently of the "real" boundary conditions.

I don't know NOX, but is it using an update that it adds to the solution in
each step? If so, you need to have the correct boundary conditions in the
initial guess already. What happens if you only allow NOX to do zero or one
iterations?

Best
  W.

Zero iterations is not allowed; the first iteration already goes way off (expected highest value: 1, calculated value: 1e6), regardless of whether I apply the boundary conditions to the solution vector I feed to NOX beforehand, or set them during the assembly of the right-hand side.

Wolfgang Bangerth

May 1, 2019, 11:28:23 AM
to dea...@googlegroups.com
On 5/1/19 5:58 AM, 'Maxi Miller' via deal.II User Group wrote:
>
> I don't know NOX, but is it using an update that it adds to the
> solution in
> each step? If so, you need to have the correct boundary conditions
> in the
> initial guess already. What happens if you only allow NOX to do zero
> or one
> iterations?
>
> Zero iterations is not allowed, the first iteration already goes way off
> (expected highest value: 1, calculated value: 1e6), regardless if I set
> boundary conditions already to the solution vector I feed to NOX, or if
> I set the boundary conditions during the assembly of the right hand side.

I don't know NOX at all, so I'm afraid there is little else I can do to
help you with this. As always, make the problem small and easy (e.g.,
try to find a constant solution where the nonlinear solver only has to
find *which* constant value you are looking for).

Best
W.