A_inverse in step-57


Jie Cheng

Oct 17, 2017, 4:08:39 PM
to deal.II User Group
Hi everyone

I'm playing around with step-57. A 3D grid that has been globally refined 4 times has 4096 cells and 112724 DoFs. Solving such a case is unbearably slow on my computer. I suspect this is due to the SparseDirectUMFPACK direct solver used for A_inverse in the BlockSchurPreconditioner, so I am trying to replace it with another solver. My question is: which solver and preconditioner should be used for the unsymmetric A (actually A tilde) when it becomes large?

I attempted to use the FGMRES solver without a preconditioner, replacing line 269 as follows:

SolverControl solver_control (stokes_matrix.block(0,0).m(), 1e-6*utmp.l2_norm(), true);
SolverFGMRES<Vector<double> > gmres(solver_control);
gmres.solve(stokes_matrix.block(0,0), dst.block(0), utmp, PreconditionIdentity());

But it couldn't converge even in 2D. I also tried SolverBicgstab without preconditioning; it did converge, but no improvement over the direct solver was observed in the cases that I ran.

Thank you
Jie

Wolfgang Bangerth

Oct 18, 2017, 10:06:16 AM
to dea...@googlegroups.com

Jie,

> I attempted to use the FGMRES solver without a preconditioner, replacing line 269
> as follows:
>
> SolverControl solver_control (stokes_matrix.block(0,0).m(),
> 1e-6*utmp.l2_norm(), true);
> SolverFGMRES<Vector<double> > gmres(solver_control);
> gmres.solve(stokes_matrix.block(0,0), dst.block(0), utmp, PreconditionIdentity());
>
> But it couldn't converge even in 2D. I also tried SolverBicgstab without
> preconditioning; it did converge, but no improvement over the direct solver
> was observed in the cases that I ran.

Yes, not using a preconditioner is not an option. You need to use *something*
or you will not be able to converge in any reasonable number of iterations.

If your problem is advection-dominated (i.e., a high Reynolds number) then I
would try to use downstream numbering of unknowns (via the corresponding
DoFRenumbering function) and then use SSOR as a preconditioner. There are many
better preconditioners for the N-S equations, but that's an easy start.
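
For illustration, a minimal sketch of what that could look like, re-using the names from your snippet (the flow direction along x is only an assumption, and the system has to be re-assembled after renumbering):

// Renumber the unknowns roughly along the flow direction, then rebuild the
// sparsity pattern and re-assemble the matrix for the new numbering.
Tensor<1,dim> flow_direction;
flow_direction[0] = 1.0;
DoFRenumbering::downstream(dof_handler, flow_direction);

// SSOR on the velocity block as a simple preconditioner for GMRES:
PreconditionSSOR<SparseMatrix<double> > preconditioner;
preconditioner.initialize(stokes_matrix.block(0,0), 1.2);

SolverControl solver_control(stokes_matrix.block(0,0).m(), 1e-6*utmp.l2_norm());
SolverGMRES<Vector<double> > gmres(solver_control);
gmres.solve(stokes_matrix.block(0,0), dst.block(0), utmp, preconditioner);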

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Timo Heister

Oct 18, 2017, 10:12:58 AM
to dea...@googlegroups.com
Jie,

did you see the "Possibilities for extensions" section in step-57? For
large problems (especially in 3d), one would need to parallelize and
switch to algebraic or geometric multigrid. See step-55 and step-56.



--
Timo Heister
http://www.math.clemson.edu/~heister/

Jie Cheng

Nov 5, 2017, 10:31:03 PM
to deal.II User Group
Dear Timo

It's been a month since I started this thread. During this time I managed to write a time-dependent Navier-Stokes solver with Grad-Div stabilization using an IMEX scheme. My major references are your PhD thesis (beautiful work, by the way), Dr. Kronbichler's paper, and steps 20, 22, and 57. I used SparseDirectUMFPACK to solve for A_inverse and it works well in 2D. I tried to use the same trick as in step-22, where an InnerPreconditioner is defined to switch between SparseDirectUMFPACK in 2D and SparseILU in 3D. However, I found that SparseILU won't work even in small 2D cases. To verify this I replaced the SparseDirectUMFPACK in step-57 with SparseILU, and the same thing happened. Without additional off-diagonals the GMRES solver won't converge; with additional off-diagonals, the program runs unbearably slowly. Why is the performance of SparseILU so much worse than SparseDirectUMFPACK? Am I using it the right way?
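
To be concrete, the step-22 trick I mean is roughly the following compile-time switch:

// Select the preconditioner type for the velocity block depending on the
// space dimension: a sparse direct factorization in 2D, an ILU in 3D.
template <int dim>
struct InnerPreconditioner;

template <>
struct InnerPreconditioner<2>
{
  typedef SparseDirectUMFPACK type;
};

template <>
struct InnerPreconditioner<3>
{
  typedef SparseILU<double> type;
};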

At this point I totally understand that parallelization is necessary for larger problems. But I still want to enable my toy program to run problems slightly outside the scope of SparseDirectUMFPACK. After that I will try to parallelize it.

Thanks
Jie

Timo Heister

Nov 6, 2017, 11:01:35 AM
to dea...@googlegroups.com
> I tried to use the same trick as in step-22, where an InnerPreconditioner is
> defined to switch between SparseDirectUMFPACK in 2D and SparseILU in 3D.
> However, I found that SparseILU won't work even in small 2D cases. To verify
> this I replaced the SparseDirectUMFPACK in step-57 with SparseILU, and the
> same thing happened. Without additional off-diagonals the GMRES solver won't
> converge; with additional off-diagonals, the program runs unbearably slowly.
> Why is the performance of SparseILU so much worse than SparseDirectUMFPACK?
> Am I using it the right way?

Do you have an inner inner solve or do you just use a single SparseILU
application?

ILU is not great in 3d. This is even true if you consider the velocity
block of the Stokes problem (which is Laplace-like). See the tables in
https://www.dealii.org/8.5.0/doxygen/deal.II/step_56.html#Results for
example. You can clearly see that ILU is h-dependent. So ILU should
"work" but I am not surprised that it degrades for larger problem
sizes. Your system is even harder to solve because of Grad-Div and the
non-symmetric convective term.

> At this point I totally understand that parallelization is necessary for larger
> problems. But I still want to enable my toy program to run problems slightly
> outside the scope of SparseDirectUMFPACK. After that I will try to parallelize it.

I am afraid this will be difficult. While not easy to do, step-56
allows you to solve larger problems using geometric multigrid.
Extending this to Navier-Stokes is not easy either (and we are getting
close to state-of-the-art research).

Best,
Timo

Jie Cheng

Nov 6, 2017, 2:30:15 PM
to deal.II User Group
Hi Timo

Thanks for your answer!

> Do you have an inner inner solve or do you just use a single SparseILU
> application?

I do not have an inner inner solve; I just use a single SparseILU<double> initialized with system_matrix.block(0,0) to approximate its inverse. It is the trick used in step-22.
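
In code it is essentially just (simplified from my program):

// One ILU decomposition of the velocity block, applied directly as the
// approximate inverse of A, without an inner Krylov solve around it.
SparseILU<double> A_preconditioner;
A_preconditioner.initialize(system_matrix.block(0,0),
                            SparseILU<double>::AdditionalData());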


> ILU is not great in 3d. This is even true if you consider the velocity
> block of the Stokes problem (which is Laplace-like). See the tables in
> https://www.dealii.org/8.5.0/doxygen/deal.II/step_56.html#Results for
> example. You can clearly see that ILU is h-dependent. So ILU should
> "work" but I am not surprised that it degrades for larger problem
> sizes. Your system is even harder to solve because of Grad-Div and the
> non-symmetric convective term.

Looks like the direct solver is the best choice except for multigrid. The size of the problem will certainly be limited on a single machine because of the memory usage. But what if I parallelize my code using distributed memory? Does deal.II have a parallel direct solver? My ultimate problem size will be less than 10 million DoFs. The cluster I have access to has 16 GB of memory on each node, and I can probably use 64 nodes or even more. Will I be able to solve such a problem using a direct solver for system_matrix.block(0,0)?


> I am afraid this will be difficult. While not easy to do, step-56
> allows you to solve larger problems using geometric multigrid.
> Extending this to Navier-Stokes is not easy either (and we are getting
> close to state-of-the-art research).

I have no experience in multigrid. Forgive me if I am asking the wrong question: is the algebraic multigrid method a black box compared to the geometric one? By black box I mean that it requires no additional information other than the matrix itself, and hence no additional coding.

Thank you
Jie

Timo Heister

Nov 6, 2017, 5:22:59 PM
to dea...@googlegroups.com
> I do not have an inner inner solve, I just use a single SparseILU<double>
> initialized with system_matrix.block(0,0) to approximate its inverse. It is
> the trick used in step-22.

You might get SparseILU(0) to work with an inner solver like in
step-56. It might not be fast, but it should at least work.
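
A rough sketch of that idea (the class and variable names here are only illustrative, not code from step-56): wrap an inner ILU-preconditioned GMRES solve on the A block in a small class whose vmult() the block preconditioner can call. Since such a preconditioner changes from one outer iteration to the next, the outer solver then has to be FGMRES.

// Approximate inverse of A = block(0,0): an inner GMRES solve with a
// relatively loose tolerance, preconditioned by a single ILU.
class ApproximateAInverse
{
public:
  ApproximateAInverse(const SparseMatrix<double> &m)
    : matrix(&m)
  {
    ilu.initialize(m, SparseILU<double>::AdditionalData());
  }

  void vmult(Vector<double> &dst, const Vector<double> &src) const
  {
    SolverControl control(matrix->m(), 1e-2 * src.l2_norm());
    SolverGMRES<Vector<double> > inner_solver(control);
    dst = 0;
    inner_solver.solve(*matrix, dst, src, ilu);
  }

private:
  const SmartPointer<const SparseMatrix<double> > matrix;
  SparseILU<double>                               ilu;
};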

> Looks like the direct solver is the best choice except for multigrid.

Really? For the medium mesh ILU is a lot faster (16s vs 237s) and uses
less memory. If you extrapolate from there, large problems will not
only require a lot of memory but also be slower. This is because a
full LU factorization takes O(n^3) operations in general.

> The size of the problem will certainly be limited on a single machine because of
> the memory usage. But what if I parallelize my code using distributed memory?

There are special direct solvers that work okay for medium-sized
problems in parallel.

> Does deal.II have a parallel direct solver? My ultimate problem
> size will be less than 10 million DoFs. The cluster I have access to has
> 16 GB of memory on each node, and I can probably use 64 nodes or even more. Will I
> be able to solve such a problem using a direct solver for
> system_matrix.block(0,0)?

You can try parallel direct solvers like MUMPS, SuperLU, or KLU with
interfaces exposed through PETSc or Trilinos. I assume they will work
reasonably well up to that size.
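
For what it's worth, the MUMPS route through the deal.II PETSc wrappers looks roughly like this (assuming the matrix and vectors are already PETScWrappers::MPI objects; A, x, b, and mpi_communicator are placeholders):

#include <deal.II/lac/petsc_solver.h>

// Parallel direct solve of A x = b with MUMPS through PETSc.
SolverControl solver_control;
PETScWrappers::SparseDirectMUMPS direct_solver(solver_control, mpi_communicator);
direct_solver.solve(A, x, b);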

> I have no experience in multigrid. Forgive me if I am asking the wrong
> question: is the algebraic multigrid method a black box compared to the
> geometric one? By black box I mean that it requires no additional information
> other than the matrix itself, and hence no additional coding.

Correct. See step-55 or step-40, for example. The problem is tuning the AMG
parameters to get good performance, especially if your problem is not
Laplace-like.

Wolfgang Bangerth

Nov 6, 2017, 10:34:17 PM
to dea...@googlegroups.com, Jie Cheng

Jie,

> Looks like the direct solver is the best choice except for multigrid. The size
> of the problem will certainly be limited on a single machine because of the memory
> usage. But what if I parallelize my code using distributed memory? Does deal.II
> have a parallel direct solver?

There is an interface to MUMPS via PETSc, which is a parallel direct solver.

But that's not a solution: if you can apply a direct solver to the top-left
block, you may as well apply it to the global system, which is not that much
larger. And parallel direct solvers still need far more memory than iterative
solvers.

I'm no expert on the N-S equations, so I don't know what people use as
preconditioners. You'll have to read the literature. Maybe Timo also has
suggestions.


> My ultimate problem size will be
> less than 10 million DoFs. The cluster I have access to has 16 GB of memory on each
> node, and I can probably use 64 nodes or even more. Will I be able to solve such a
> problem using a direct solver for system_matrix.block(0,0)?

I suspect that that might work. You'll have ~150,000 unknowns per node,
leaving about 100 kB per unknown on each node. That should suffice, though I
have to admit that I don't know how much memory parallel direct solvers need
in 3d. I'd expect a small multiple of 10 kB per unknown.


> I have no experience in multigrid. Forgive me if I am asking the wrong
> question: is the algebraic multigrid method a black box compared to the geometric
> one? By black box I mean that it requires no additional information other than
> the matrix itself, and hence no additional coding.

That is generally (almost) true if the matrix you apply it to is symmetric and
positive definite. But that's not the case for you.

Best
W.


--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Jie Cheng

Nov 6, 2017, 11:46:35 PM
to Wolfgang Bangerth, Timo Heister, dea...@googlegroups.com
Hi Timo and Wolfgang

I wrapped up my A preconditioner with an inner CG solver (I am using an IMEX scheme, which moves the convection term to the RHS, so A is symmetric), but the SparseILU still seems not to work. I attached my code here; if you could take a look, it is very easy to reproduce the issue:
1. compile and run it, you will see everything works as expected
2. comment out line 221 and uncomment line 222, the program just hangs.

Wolfgang does have a point: if I were able to use a direct solver on A then I might as well apply it on the whole system. I think I should get SparseILU working at least.

Thank you again for your helpful comments.

Jie



 
block_preconditioner.cc

Wolfgang Bangerth

Nov 7, 2017, 8:56:48 AM
to Jie Cheng, Timo Heister, dea...@googlegroups.com

Jie,

> I wrapped up my A preconditioner with an inner CG solver (I am using IMEX
> scheme which moves the convection term to the RHS so A is symmetric) but the
> SparseILU still seems not to work.

But if A is symmetric, then you can use AMG again, and there is likely no
better preconditioner for that other than a geometric multigrid.


> I attached my code here, if you could take
> a look, it is very easy to reproduce the issue:
> 1. compile and run it, you will see everything works as expected
> 2. comment out line 221 and uncomment line 222, the program just hangs.
>
> Wolfgang does have a point: if I were able to use a direct solver on A then I
> might as well apply it on the whole system. I think I should get SparseILU
> working at least.

The problem with the SparseILU class is that (i) it is only an approximation
of the inverse matrix and, as a preconditioner, far worse than the direct
solver, and (ii) our implementation is neither very good nor optimized for
performance. There is also the issue that *any* ILU implementation that is
good will have *many* parameters, such as how to deal with fill-in,
strengthening the diagonal, pivoting, etc.

The deal.II SparseILU class is almost certainly not good enough to deal with
complicated problems such as yours. You may have better luck with the
implementations in PETSc or Trilinos, but in the end as Timo already said: the
question of what preconditioners to use for the N-S equations (at least if you
move the advection term to the left hand side) is an active research question
where the answer is neither obvious nor settled.

This is also part of why projection schemes are so attractive: They don't
require solving the coupled problem.

Timo Heister

Nov 7, 2017, 3:22:35 PM
to Jie Cheng, Wolfgang Bangerth, dea...@googlegroups.com
> SparseILU still seems not to work. I attached my code here, if you could
> take a look, it is very easy to reproduce the issue:
> 1. compile and run it, you will see everything works as expected
> 2. comment out line 221 and uncomment line 222, the program just hangs.

Please try to debug the problem some more. Is the inner or outer
solver not converging? How are the residuals changing over the
iterations? In which step does it hang? Can you open it in a debugger?
What happens with a simpler and smaller test problem, etc.? While there
might be a bug in deal.II that you are running into, I doubt that this
is the case here.

Jie Cheng

Nov 8, 2017, 11:59:48 PM
to Timo Heister, Wolfgang Bangerth, dea...@googlegroups.com
Hi Timo and Wolfgang

It turns out that SparseILU needs to be tuned very carefully. I set the AdditionalData to (0.03, 0) and the solver now works fine, although it is still slower than UMFPACK. Maybe using just the right parameters in AdditionalData can improve it further. I observed a similar thing in step-57 too: using ILU + GMRES for A_inverse won't even converge in reasonable time with the default AdditionalData, but playing with the AdditionalData helps. In short, SparseILU seems to be sensitive to the AdditionalData, and how to choose the right parameters is unclear.
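
For reference, the change amounts to something like this (the first argument is strengthen_diagonal, the second the number of extra off-diagonals):

SparseILU<double> A_preconditioner;
A_preconditioner.initialize(system_matrix.block(0,0),
                            SparseILU<double>::AdditionalData(0.03, 0));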

Wolfgang, would you like me to contribute this code to the code gallery? Is there anything in this code that I should polish?

Thank you
Jie

Wolfgang Bangerth

Nov 9, 2017, 8:28:19 AM
to Jie Cheng, Timo Heister, dea...@googlegroups.com

Jie,

> It turns out that SparseILU needs to be tuned very carefully. I set the
> AdditionalData to (0.03, 0) and the solver now works fine, although it is still
> slower than UMFPACK. Maybe using just the right parameters in AdditionalData
> can improve it further. I observed a similar thing in step-57 too: using ILU +
> GMRES for A_inverse won't even converge in reasonable time with the default
> AdditionalData, but playing with the AdditionalData helps. In short, SparseILU
> seems to be sensitive to the AdditionalData, and how to choose the right
> parameters is unclear.

Yes, that's what I meant to say in my last email: the ILU is quite sensitive
to choosing parameters.

Do you have a set of parameters that works significantly better than what we
currently use in step-57? Would you like to propose we make changes to step-57
in this direction?


> Wolfgang, would you like me to contribute this code to the code gallery? Is
> there anything in this code that I should polish?

You mean your time-dependent N-S solver? Yes, we would absolutely love to have
it as part of the code gallery! If you contribute it as a pull request as
explained on the website, we can take a look and see what should/needs to be
polished, though we're typically quite relaxed about code gallery programs!

Best
Wolfgang