Question about (parallel) linear solver


Konrad

Sep 11, 2019, 9:04:33 AM
to deal.II User Group
Dear deal.ii community,

Apologies for so many questions lately... but here is another one. This one is a bit technical and aims at finding a good solver. In simplified form it can be stated like this:

I implemented a PDE solver for the 3D vector Laplacian curl(curl u) - grad(div u) = f. I use a Nedelec-Raviart-Thomas pairing to solve the mixed form, which reads
sigma - curl u = 0
curl sigma - grad(div u) = f

You can think of this as a Helmholtz decomposition of the vector field f. I am using Nedelec elements for sigma and Raviart-Thomas elements for u. Fine so far. Besides the fact that this is technically not easy on certain domains with holes etc. (forget about non-trivial harmonic forms if that means something to you - I don't have them here), it seems quite challenging to come up with a good solver (and what counts as "good" may depend on f as well).

The system that needs to be solved formally reads
(P)   A sigma + B^T u = 0
        B sigma + C u = f

A is essentially a mass matrix, so it is positive definite (it corresponds to <tau,sigma> in H(curl)), and C is positive semi-definite (it corresponds to <div v, div u> in H(div)).

First thing that crossed my mind was a Schur complement solver. Now one a priori has two options:

1.) Solve first for u and then for sigma, i.e. invert A and use the Schur complement S = -B*A^{-1}*B^T + C
2.) The other way around: S = -B^T*C^{-1}*B + A. My gut feeling tells me this may not be an option, since C is either very ill-conditioned or not invertible at all. It is symmetric and has at least one zero eigenvalue. Come to think of it, C^{-1} formally corresponds to (grad div)^{-1} (and I am, of course, having convergence issues).
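For reference, the block elimination behind option 1.) written out: the first equation gives sigma = -A^{-1}*B^T*u, and substituting this into the second yields

  B*(-A^{-1}*B^T*u) + C*u = f,   i.e.   (C - B*A^{-1}*B^T)*u = f,

which is exactly S u = f with S from 1.). Once u is known, sigma is recovered by solving A sigma = -B^T u.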

Using approach 1.), I get timing results like the following
(using MPI with PETSc, AMG preconditioning for A on one machine with four cores, no preconditioning for the Schur complement S = -B*A^{-1}*B^T + C from 1.):

Running using PETSc.
Number of active cells: 32768
Total number of cells: 37449 (on 6 levels)
Number of degrees of freedom: 205920 (104544+101376)
  Iterative Schur complement solver converged in 177 iterations.
  Outer solver completed.


+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |        37s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Schur complement solver (for u) |         1 |      27.2s |        74% |
| assembly                        |         1 |      6.61s |        18% |
| mesh generation                 |         1 |     0.163s |      0.44% |
| outer CG solver (for sigma)     |         1 |     0.136s |      0.37% |
| system and constraint setup     |         1 |      1.15s |       3.1% |
| vtu output                      |         1 |      1.68s |       4.5% |
+---------------------------------+-----------+------------+------------+


You will notice that the time for the Schur complement solver dominates. So either I did something conceptually wrong in the implementation (see the Schur complement class below) or I need to do a better job solving/preconditioning the system. Note that the Schur complement is usually not definite, in contrast to Stokes or Darcy systems.

Does anyone have an idea of how to attack systems like (P)? Are there optimal preconditioners? Any hint would be appreciated. Are there relevant tutorials? Parallelization is an issue since I would like to solve very large problems in 3D.

Thanks in advance and best regards,
Konrad :-)

Here is the Schur complement code:

template <typename BlockMatrixType, typename VectorType, typename PreconditionerType>
class SchurComplementMPI : public Subscriptor
{
private:
  using BlockType = typename BlockMatrixType::BlockType;

public:
  SchurComplementMPI (const BlockMatrixType &system_matrix,
                      const InverseMatrix<BlockType, PreconditionerType> &relevant_inverse_matrix,
                      const std::vector<IndexSet> &owned_partitioning,
                      MPI_Comm mpi_communicator);

  void vmult (VectorType       &dst,
              const VectorType &src) const;

private:
  const SmartPointer<const BlockMatrixType> system_matrix;
  const SmartPointer<const InverseMatrix<BlockType, PreconditionerType>> relevant_inverse_matrix;

  const std::vector<IndexSet> &owned_partitioning;

  MPI_Comm mpi_communicator;

  // Scratch vectors: tmp1 and tmp2 live in the sigma space
  // (owned_partitioning[0]), tmp3 in the u space (owned_partitioning[1]).
  mutable VectorType tmp1, tmp2, tmp3;
};


template <typename BlockMatrixType, typename VectorType, typename PreconditionerType>
SchurComplementMPI<BlockMatrixType, VectorType, PreconditionerType>::SchurComplementMPI(
    const BlockMatrixType &system_matrix,
    const InverseMatrix<BlockType, PreconditionerType> &relevant_inverse_matrix,
    const std::vector<IndexSet> &owned_partitioning,
    MPI_Comm mpi_communicator)
  : system_matrix (&system_matrix)
  , relevant_inverse_matrix (&relevant_inverse_matrix)
  , owned_partitioning (owned_partitioning)
  , mpi_communicator (mpi_communicator)
  , tmp1 (owned_partitioning[0], mpi_communicator)
  , tmp2 (owned_partitioning[0], mpi_communicator)
  , tmp3 (owned_partitioning[1], mpi_communicator)
{}


template <typename BlockMatrixType, typename VectorType, typename PreconditionerType>
void
SchurComplementMPI<BlockMatrixType, VectorType, PreconditionerType>::vmult(
    VectorType       &dst,
    const VectorType &src) const
{
  // Applies dst = (B A^{-1} B^T - C) src, i.e. the negative of
  // S = -B*A^{-1}*B^T + C as written above; the sign must be
  // accounted for in the right-hand side.
  system_matrix->block(0,1).vmult (tmp1, src);  // tmp1 = B^T src
  relevant_inverse_matrix->vmult (tmp2, tmp1);  // tmp2 = A^{-1} tmp1
  system_matrix->block(1,0).vmult (dst, tmp2);  // dst  = B tmp2
  system_matrix->block(1,1).vmult (tmp3, src);  // tmp3 = C src
  dst -= tmp3;
}
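
For context, a minimal sketch of how a class like this can be driven (it assumes an InverseMatrix wrapper that applies A^{-1} through an inner CG solve, as in step-20/step-22; variable names such as amg_preconditioner_for_A, system_rhs, and solution are illustrative):

// Illustrative solve sequence, not the full program.
using Matrix = PETScWrappers::MPI::BlockSparseMatrix;
using Vector = PETScWrappers::MPI::Vector;

// A^{-1} through an inner CG solve; amg_preconditioner_for_A is an
// AMG initialized with the A block.
const InverseMatrix<Matrix::BlockType, PETScWrappers::PreconditionBoomerAMG>
  A_inverse (system_matrix.block(0,0), amg_preconditioner_for_A);

const SchurComplementMPI<Matrix, Vector, PETScWrappers::PreconditionBoomerAMG>
  schur_complement (system_matrix, A_inverse, owned_partitioning, mpi_communicator);

// CG on the Schur complement for u. Since vmult() above applies -S,
// negate the right-hand side of S u = f.
Vector schur_rhs (owned_partitioning[1], mpi_communicator);
schur_rhs = system_rhs.block(1);
schur_rhs *= -1.0;

SolverControl    schur_control (solution.block(1).size(), 1e-10 * schur_rhs.l2_norm());
SolverCG<Vector> schur_solver (schur_control);
schur_solver.solve (schur_complement, solution.block(1), schur_rhs,
                    PreconditionIdentity());  // no preconditioner yet

// Recover sigma from A sigma = -B^T u.
Vector minus_BT_u (owned_partitioning[0], mpi_communicator);
system_matrix.block(0,1).vmult (minus_BT_u, solution.block(1));
minus_BT_u *= -1.0;
A_inverse.vmult (solution.block(0), minus_BT_u);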


Wolfgang Bangerth

Sep 11, 2019, 11:46:21 AM
to dea...@googlegroups.com
On 9/11/19 7:04 AM, Konrad wrote:
>
> The system that needs to be solved formally reads
> (P)   A sigma + B^T u = 0
>         B sigma + C u = f
>
> A is essentially a mass matrix, so it is positive definite (it
> corresponds to <tau,sigma> in H(curl)), and C is positive semi-definite
> (it corresponds to <div v, div u> in H(div)).
>
> First thing that crossed my mind was a Schur complement solver. Now one
> a priori has two options:
>
> 1.) Solve first for u and then for sigma, i.e. invert A and use the
> Schur complement S = -B*A^{-1}*B^T + C
> 2.) The other way around: S = -B^T*C^{-1}*B + A. My gut feeling tells
> me this may not be an option, since C is either very ill-conditioned
> or not invertible at all. It is symmetric and has at least one zero
> eigenvalue. Come to think of it, C^{-1} formally corresponds to
> (grad div)^{-1} (and I am, of course, having convergence issues).

Correct, your approach 2 will not work, precisely because C has a kernel
(all divergence-free vector fields) and is consequently not invertible.


> Using approach 1.) I get timing results like that:
> (using MPI with Petsc, AMG preconditioning for *A* on one machine with
> four cores, no preconditioning for the Schur complement *S =
> -B*A^{-1}*B^T + C* from 1.)
>
> Running using PETSc.
> Number of active cells: 32768
> Total number of cells: 37449 (on 6 levels)
> Number of degrees of freedom: 205920 (104544+101376)
>   Iterative Schur complement solver converged in 177 iterations.
>   Outer solver completed.
>
>
> +---------------------------------------------+------------+------------+
> | Total wallclock time elapsed since start    |        37s |            |
> |                                             |            |            |
> | Section                         | no. calls |  wall time | % of total |
> +---------------------------------+-----------+------------+------------+
> | Schur complement solver (for u) |         1 |      27.2s |        74% |
> | assembly                        |         1 |      6.61s |        18% |

That's actually an acceptable time: My rule of thumb, maybe a bit
outdated, is that it takes a minute to solve for 100,000 unknowns for
non-trivial problems. You've got twice that many unknowns and solve it
in half the time.


> You will notice that the time for the Schur complement solver is
> dominating. So either I did something wrong in the implementation
> conceptually (see class for Schur complement below) or I need to do a
> better job in solving/preconditioning the system. Note that the Schur
> complement is usually not definite in contrast to Stokes or Darcy systems.

Really? Is this an issue related to boundary conditions? Your matrix
corresponds to the operator
curl I curl - grad div
I thought there are no vector fields that are both divergence-free and
curl-free.


> Does anyone have an idea of how to attack systems like *(P)*? Are there
> optimal preconditioners? Any hint would be appreciated.

Take a look at step-20 and step-22. The idea of preconditioning a Schur
complement is to use some sort of approximation of the Schur complement
that is easier to solve for. I suspect that in your case, a simple
Laplace matrix would make for a nice approximation of S, and so using
one SSOR step with the Laplace matrix might be a good preconditioner for
S. Alternatively, an algebraic multigrid initialized with the Laplace
matrix might work.
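
In deal.II/PETSc terms, the AMG variant might look roughly like this (a sketch only; it assumes a separately assembled matrix approx_schur_matrix, e.g. a Laplace matrix on the u space, and reuses the illustrative names from the sketch earlier in the thread):

// AMG built from an assembled approximation of S, used to
// precondition the Schur complement CG. approx_schur_matrix is
// hypothetical.
PETScWrappers::PreconditionBoomerAMG schur_preconditioner;
schur_preconditioner.initialize (
  approx_schur_matrix,
  PETScWrappers::PreconditionBoomerAMG::AdditionalData (/*symmetric_operator=*/true));

SolverControl                        control (solution.block(1).size(), 1e-10);
SolverCG<PETScWrappers::MPI::Vector> cg (control);
cg.solve (schur_complement, solution.block(1), schur_rhs, schur_preconditioner);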

step-20 plays with these ideas, solving for the Schur complement
directly. step-22 (in particular the code in the "Possibilities for
extensions" section) expands on this: If you have a preconditioner for
the Schur complement, then you can just use that as part of the
preconditioner for the overall system. It's the best approach we have
for the Stokes equation (used successfully on systems with billions of
unknowns) and worth understanding!
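
A bare-bones sketch of that whole-system idea (illustrative only, not step-22's actual code): wrap the two approximate inverses in a block-diagonal preconditioner and hand the full symmetric indefinite system to, e.g., MINRES:

// Block-diagonal preconditioner for the full system (P): an AMG
// V-cycle approximates A^{-1} in the sigma block, and the
// Laplace-based AMG approximates S^{-1} in the u block.
class BlockDiagonalPreconditioner : public Subscriptor
{
public:
  BlockDiagonalPreconditioner (const PETScWrappers::PreconditionBoomerAMG &amg_A,
                               const PETScWrappers::PreconditionBoomerAMG &amg_S)
    : amg_A (amg_A), amg_S (amg_S)
  {}

  void vmult (PETScWrappers::MPI::BlockVector       &dst,
              const PETScWrappers::MPI::BlockVector &src) const
  {
    amg_A.vmult (dst.block(0), src.block(0));  // ~ A^{-1}
    amg_S.vmult (dst.block(1), src.block(1));  // ~ S^{-1}
  }

private:
  const PETScWrappers::PreconditionBoomerAMG &amg_A;
  const PETScWrappers::PreconditionBoomerAMG &amg_S;
};

// Used, e.g., as
//   SolverMinRes<PETScWrappers::MPI::BlockVector> minres (control);
//   minres.solve (system_matrix, solution, system_rhs, preconditioner);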

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Konrad

Sep 11, 2019, 1:03:19 PM
to deal.II User Group
Hi Wolfgang,
 
>> Using approach 1.), I get timing results like the following:
>> (using MPI with PETSc, AMG preconditioning for A on one machine with
>> four cores, no preconditioning for the Schur complement S =
>> -B*A^{-1}*B^T + C from 1.)
>>
>> Running using PETSc.
>> Number of active cells: 32768
>> Total number of cells: 37449 (on 6 levels)
>> Number of degrees of freedom: 205920 (104544+101376)
>>    Iterative Schur complement solver converged in 177 iterations.
>>    Outer solver completed.
>>
>>
>> +---------------------------------------------+------------+------------+
>> | Total wallclock time elapsed since start    |        37s |            |
>> |                                             |            |            |
>> | Section                         | no. calls |  wall time | % of total |
>> +---------------------------------+-----------+------------+------------+
>> | Schur complement solver (for u) |         1 |      27.2s |        74% |
>> | assembly                        |         1 |      6.61s |        18% |
>
> That's actually an acceptable time: My rule of thumb, maybe a bit
> outdated, is that it takes a minute to solve for 100,000 unknowns for
> non-trivial problems. You've got twice that many unknowns and solve it
> in half the time.

Thank you, that calms me down.
 
>> You will notice that the time for the Schur complement solver
>> dominates. So either I did something conceptually wrong in the
>> implementation (see the Schur complement class below) or I need to do
>> a better job solving/preconditioning the system. Note that the Schur
>> complement is usually not definite, in contrast to Stokes or Darcy
>> systems.
>
> Really? Is this an issue related to boundary conditions? Your matrix
> corresponds to the operator
>    curl I curl - grad div
> I thought there are no vector fields that are both divergence-free and
> curl-free.

That indeed depends on the geometry of the domain, and it is not an obvious result. Example: think of a 3D hyper-shell with an inner and an outer radius. There is a one-dimensional space of non-trivial radial vector fields (with respect to the origin) that have neither a divergence nor a curl. Such fields are referred to as harmonic forms, and the dimension of this space depends on the Betti numbers of the domain (the buzzword is Hodge decomposition).

But this is actually not the case here, so you are right. If there are non-trivial harmonic forms, one must add an additional condition and a Lagrange multiplier to the weak form.
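
Spelled out (my sketch of the standard augmented formulation from the finite element exterior calculus literature): with H denoting the space of harmonic forms, one seeks (sigma, u, p) with p in H such that

  sigma - curl u = 0
  curl sigma - grad(div u) + p = f
  u orthogonal to H,

so the harmonic component of f is absorbed by the multiplier p and the problem is well-posed again.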
 
>> Does anyone have an idea of how to attack systems like (P)? Are there
>> optimal preconditioners? Any hint would be appreciated.
>
> Take a look at step-20 and step-22. The idea of preconditioning a Schur
> complement is to use some sort of approximation of the Schur complement
> that is easier to solve for. I suspect that in your case, a simple
> Laplace matrix would make for a nice approximation of S, and so using
> one SSOR step with the Laplace matrix might be a good preconditioner for
> S. Alternatively, an algebraic multigrid initialized with the Laplace
> matrix might work.
>
> step-20 plays with these ideas, solving for the Schur complement
> directly. step-22 (in particular the code in the "Possibilities for
> extensions" section) expands on this: If you have a preconditioner for
> the Schur complement, then you can just use that as part of the
> preconditioner for the overall system. It's the best approach we have
> for the Stokes equation (used successfully on systems with billions of
> unknowns) and worth understanding!

This is great to know, and I will go in this direction. Again, many thanks :-)

Konrad

Wolfgang Bangerth

Sep 11, 2019, 1:54:28 PM
to dea...@googlegroups.com
On 9/11/19 11:03 AM, Konrad wrote:
>
> That indeed depends on the geometry of the domain, and it is not an
> obvious result. Example: think of a 3D hyper-shell with an inner and an
> outer radius. There is a one-dimensional space of non-trivial radial
> vector fields (with respect to the origin) that have neither a
> divergence nor a curl. Such fields are referred to as harmonic forms,
> and the dimension of this space depends on the Betti numbers of the
> domain (the buzzword is Hodge decomposition).

Ah, yes -- fun example! Thanks for the education!