Hi Victor:
> I appreciate that they are tutorials, but given that I need a projection-step
> Navier-Stokes solver and that I lack the FEM background I thought I'd take
> that code.
That's not unreasonable, but perhaps not what we intend these tutorials to be
for if speed is a major consideration.
> Size of the problem: my main modification is reading in my own mesh. So I can
> have way more variables.
>
> ILU: I'm ashamed to admit that that obvious fact slipped by me. Ok, I've
> switched to a Block Jacobi. That does increase the thread activity. Yay.
> However, the pressure solve does not converge. Do I understand that
> BlockJacobi uses full inverses of the blocks? That ought to be pretty good.
You mean deal.II's BlockJacobi class? I had to look it up (it's a rarely used
class that is >25 years old). My understanding is that the class takes the
diagonal blocks, inverts them, and multiples the vector to be preconditioned
by these inverse blocks. If you take block_size=1, then that's just a regular
Jacobi preconditioner. If you took block_size=n_dofs, you'd get an exact
inverse, but that's of course impractical. Whether it's a good preconditioner
then depends on how big the block size is that you choose.
I looked into the implementation as well. It doesn't use threads, though at
least there's no *fundamental* obstacle in the algorithm to it: Everything
(computing the inverses, applying the inverses) should parallelize trivially.
It's just that the class was written before we started to think about
threading. I don't think it would be terribly difficult to make the class use
threads -- it's just that someone needs to feel strongly enough about it to
offer the time.
> "This costs some additional memory - for DG methods about 1/3 (for double
> inverses) or 1/6 (for float inverses) of that used for the matrix - but it
> makes the preconditioning much faster."
>
> I don't get that. Is DG so different from regular FEM? The diagonal block of a
> FEM matrix is itself a FEM matrix on a subdomain, so the inverse (even if you
> mean that you "invert" by doing a full LU) takes a ton more storage.
I think what it is trying to say (without trying to do the math about it) is
that if, for example, you had a Laplace problem and you choose 4x4 diagonal
blocks, then what this class needs to store are the inverses of these 4x4
blocks. That costs as much memory as the original 4x4 diagonal blocks. But the
matrix itself of course also has many entries *outside* these 4x4 blocks, and
so storing the matrix is much more expensive than only storing the (inverses
of the) diagonal blocks. The person who wrote that comment apparently did the
math and figured out that the matrix in totally has 3 times as many entries as
just the diagonal blocks.
For DG methods, the matrix really does consists of 4x4 blocks (in 2d) because
that's the number of unknowns per cell, and these are all coupled to each
other. They couple to neighboring 4x4 blocks via the DG flux terms. For CG,
the matrix does not decompose into 4x4 blocks because degrees of freedom live
on multiple cells.
(I'm dubious about the claim with the 1/3 because I would have expected the
exact value that depends on the block size and the dimension we're in, but the
comment doesn't say any thing about that. We *ought* to fix this, but again
that requires someone taking the time to think this through.)
> My problem doesn't get awfully large, and I have lots of memory. Should I use
> a direct solver for both pressure & velocity solve? Can I assume that UMFpack
> is suitably multi-threaded?
For small enough problems, say <100,000 unknowns, direct solvers are often
faster than iterative ones. They're also robust -- no need to play with many
different options, they just work.
As for UMFPACK: No, it's not multithreaded. No widely used open source direct
solver is, to the best of my knowledge. PARDISO is, but it costs money and we
don't have interfaces to it. We also have these two issues that may be of
interest in this context:
https://github.com/dealii/dealii/issues/10414
https://github.com/dealii/dealii/pull/16602
Best
W.