> * Check how the run times of both the direct and iterative solvers
> change as you increase the number of unknowns. (E.g., start with a
> 10x10x10 mesh, then try a 20x20x20, ... mesh.)
>
> As suggested, I have added more tests, which are all executed with MPI in
> parallel:
> (*in the following, I only show the timing for solution*)
>
> *Case 1*. 10x10x10 cells:
>
> Case 1.1 mpirun -np 2 ./xxxx
> cg: 0.00621s
> MUMPS (with symmetric setting): 0.0932s
> MUMPS (without symmetric setting): 0.122s
>
> Case 1.2 mpirun -np 6 ./xxxx
> cg: 0.00479s
> MUMPS (with symmetric setting): 0.0774s
> MUMPS (without symmetric setting): 0.169s
>
> *Case 2*. 20x20x20 cells:
>
> Case 2.1 mpirun -np 2 ./xxxx
> cg: 0.0884s
> MUMPS (with symmetric setting): 3.71s
> MUMPS (without symmetric setting): 6.44s
>
> Case 2.2 mpirun -np 6 ./xxxx
> cg: 0.087s
> MUMPS (with symmetric setting): 2.16s
> MUMPS (without symmetric setting): 4.29s
>
> *Case 3*. 30x30x30 cells:
>
> Case 3.1 mpirun -np 2 ./xxxx
> cg: 0.39s
> MUMPS (with symmetric setting): 26.2s
> MUMPS (without symmetric setting): 50.8s
>
> Case 3.2 mpirun -np 6 ./xxxx
> cg: 0.372s
> MUMPS (with symmetric setting): 23.4s
> MUMPS (without symmetric setting): 43.8s

I have to admit that I find the CG times too small to be credible. The last
case should have about 200,000 unknowns. It seems implausible to me that you
can solve that in 0.4 seconds on 2 processors. What preconditioner do you use,
and do you include the time to build the preconditioner in this time?
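
To make the question about timing concrete, here is a minimal sketch of how one might report preconditioner setup and solve time separately. It is not the code under discussion: it is a pure-Python toy, assuming a 1D Laplacian tridiag(-1, 2, -1) as the matrix and a (trivial) Jacobi preconditioner, and all function names are invented for this example.

```python
import time


def apply_laplacian_1d(x):
    """Return A x for the tridiagonal matrix A = tridiag(-1, 2, -1)."""
    y = [2.0 * xi for xi in x]
    for i in range(len(x) - 1):
        y[i] -= x[i + 1]
        y[i + 1] -= x[i]
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def build_jacobi(n):
    """Setup phase: inverse of the diagonal of A (every entry of diag(A) is 2)."""
    return [0.5] * n


def pcg(b, inv_diag, tol=1e-10, max_iter=10_000):
    """Preconditioned CG; returns the solution and the iteration count."""
    x = [0.0] * len(b)
    r = list(b)
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    stop = tol * dot(b, b) ** 0.5
    for iteration in range(1, max_iter + 1):
        ap = apply_laplacian_1d(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= stop:
            return x, iteration
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


n = 200
b = [1.0] * n

t0 = time.perf_counter()
inv_diag = build_jacobi(n)          # timed as "setup"
t1 = time.perf_counter()
x, iterations = pcg(b, inv_diag)    # timed as "solve"
t2 = time.perf_counter()

setup_seconds = t1 - t0
solve_seconds = t2 - t1
print(f"setup: {setup_seconds:.6f}s  solve: {solve_seconds:.6f}s  "
      f"iterations: {iterations}")
```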

My rule of thumb has always been that solving a problem with 100,000 unknowns
on one processor takes about a minute. If you have a fast processor, then
maybe you can get that done in 20 or 30 seconds, so the times you quote for
MUMPS do not seem out of the ordinary to me.
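
For reference, the number of unknowns on such meshes can be tabulated with a small helper. This is a sketch under an assumption: the element type is not stated in this thread, so the code simply uses the standard count for continuous tensor-product Lagrange elements of degree p on a uniform mesh with n cells per direction, (n*p + 1)^dim per solution component. The function name `lagrange_dofs` is made up for this example.

```python
def lagrange_dofs(cells_per_direction, degree, components=1, dim=3):
    """Unknowns for continuous Lagrange elements on a uniform hypercube mesh.

    Assumption: tensor-product elements, no hanging nodes, no constraints
    removed; the element actually used in the thread is not known.
    """
    nodes_per_direction = cells_per_direction * degree + 1
    return components * nodes_per_direction**dim


for n in (10, 20, 30):
    print(f"{n}x{n}x{n} cells: degree 1 -> {lagrange_dofs(n, 1)}, "
          f"degree 2 -> {lagrange_dofs(n, 2)}")
```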

> *I have another question:*
> *For a problem with millions of unknowns, the same Dirichlet boundary
> conditions, and different right-hand sides (e.g. rhs1, rhs2, ..., rhs8), how
> can I speed up the solution process with (maybe) an iterative solver?*
>
> I think for a small number of unknowns, maybe I can use a parallel direct
> solver, which can reuse the factorization of the system matrix for rhs2-rhs8
> after I compute the solution for rhs1.
>
> But for a problem with millions of unknowns, maybe I have to use an iterative
> solver for efficiency. So what solver or what technique should I use to speed
> up the solution of such a multiple-load-case problem?

Bruno already gave the correct answer: build an expensive preconditioner,
because you only need to build it once. Of course, the best preconditioner is
an LU decomposition of the matrix, which is what a direct solver computes.

But you should expect that, fundamentally, solving N problems with an
iterative solver requires N times as many operations as solving one (once you
have built the preconditioner). There are "block" variants of solvers such as
GMRES or CG that can be more efficient because they group these operations in
a more efficient way through vectorization or grouping communication, but they
fundamentally still have to do N times as many operations.
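
The "build once, reuse for every right-hand side" pattern can be sketched as follows. This is a toy illustration, not MUMPS or any particular library: it assumes a small symmetric positive definite tridiagonal matrix as a stand-in for the system matrix, factors it once as L D L^T, and then performs one pair of triangular solves per right-hand side. All names are made up for this example.

```python
def factor_tridiagonal(diag, off):
    """L D L^T factorization of an SPD tridiagonal matrix (done once).

    diag: main diagonal; off: sub/super-diagonal.
    Returns (d, l) with D = diag(d) and L unit lower bidiagonal with
    subdiagonal l.
    """
    n = len(diag)
    d = [0.0] * n
    l = [0.0] * (n - 1)
    d[0] = diag[0]
    for i in range(1, n):
        l[i - 1] = off[i - 1] / d[i - 1]
        d[i] = diag[i] - l[i - 1] * off[i - 1]
    return d, l


def solve_with_factors(d, l, b):
    """Solve A x = b by reusing the factors: L z = b, D w = z, L^T x = w."""
    n = len(b)
    x = list(b)
    for i in range(1, n):            # forward substitution
        x[i] -= l[i - 1] * x[i - 1]
    for i in range(n):               # diagonal scaling
        x[i] /= d[i]
    for i in range(n - 2, -1, -1):   # backward substitution
        x[i] -= l[i] * x[i + 1]
    return x


def matvec(diag, off, x):
    """Return A x, used here only to check the residuals."""
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y


n = 50
diag = [2.0] * n
off = [-1.0] * (n - 1)

# Eight hypothetical load cases rhs1..rhs8.
right_hand_sides = [[float(k + 1)] * n for k in range(8)]

d, l = factor_tridiagonal(diag, off)                  # one factorization
solutions = [solve_with_factors(d, l, b) for b in right_hand_sides]

residuals = [
    max(abs(bi - yi) for bi, yi in zip(b, matvec(diag, off, x)))
    for b, x in zip(right_hand_sides, solutions)
]
print("largest residual over all load cases:", max(residuals))
```

The factorization is paid for once, but the loop over the right-hand sides still does eight times the work of a single solve. The same holds if the factors are replaced by an approximate preconditioner inside an iterative solver.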