global dof renumbering with parallel triangulation and trilinos direct solver (bug?)


Daniel Jodlbauer

Aug 8, 2016, 10:24:12 AM
to deal.II User Group
Hi all !

I am trying to renumber the degrees of freedom globally using code like this:

   std::vector<types::global_dof_index> new_number(dof_handler.n_dofs());
   for (types::global_dof_index i = 0; i < dof_handler.n_dofs(); ++i)
      new_number[i] = dof_handler.n_dofs() - i - 1; // simple example: reverse the global numbering

   // renumber_dofs() expects only the new indices of the locally owned DoFs
   std::vector<types::global_dof_index> local_new_number;
   local_new_number.reserve(info.locally_owned.n_elements());
   for (const types::global_dof_index dof : info.locally_owned)
      local_new_number.push_back(new_number[dof]);

   dof_handler.renumber_dofs(local_new_number);

   // refresh the index sets, which change after renumbering
   info.locally_owned = dof_handler.locally_owned_dofs();
   DoFTools::extract_locally_relevant_dofs(dof_handler, info.locally_relevant);

with a dofhandler built upon a parallel::shared::Triangulation.

However, this seems to break the solution of TrilinosWrappers::SolverDirect:

   LA::MPI::Vector tmp_newton, tmp_rhs;

   tmp_newton.reinit(info.locally_owned, MPI_COMM_WORLD);
   tmp_rhs.reinit(info.locally_owned, MPI_COMM_WORLD);

   tmp_newton = newton_update;
   tmp_rhs = system_rhs;

   solver.solve(system_matrix, tmp_newton, tmp_rhs);

   std::cout << fmt::format("[{:d}] mat = {:e}", rank, system_matrix.l1_norm()) << std::endl;
   std::cout << fmt::format("[{:d}] rhs = {:e}", rank, tmp_rhs.l2_norm()) << std::endl;
   std::cout << fmt::format("[{:d}] sol = {:e}", rank, tmp_newton.l2_norm()) << std::endl;

which returns 0 for the solution of the linear system (the other two values are the same as without the renumbering step).

Also, I am not sure if I set up the vectors and matrices correctly:
   solution.reinit(info.locally_owned, info.locally_relevant, MPI_COMM_WORLD); // ghosted for fe_values
   old_timestep_solution.reinit(info.locally_owned, info.locally_relevant, MPI_COMM_WORLD); // same as solution
   newton_update.reinit(info.locally_owned, info.locally_relevant, MPI_COMM_WORLD); // ghosted, bc of solution += newton_update

   system_rhs.reinit(info.locally_owned, MPI_COMM_WORLD); // ghosted / non-ghosted ?

   DynamicSparsityPattern dsp(info.locally_relevant);
   DoFTools::make_flux_sparsity_pattern(dof_handler, dsp, constraints, false); 

   SparsityTools::distribute_sparsity_pattern(dsp, dof_handler.n_locally_owned_dofs_per_processor(), MPI_COMM_WORLD, info.locally_relevant);

   system_matrix.reinit(info.locally_owned, info.locally_owned, dsp, MPI_COMM_WORLD);


I am a bit clueless on where to look for the error, so any suggestions are welcome.


Best regards

Daniel Jodlbauer

Denis Davydov

Aug 8, 2016, 3:09:17 PM
to deal.II User Group
Hi Daniel,

First, did you try running in debug mode? If not, please do so, as more checks are triggered and it will be easier to debug.
Second, can you please try it with the usual Triangulation, not parallel::shared? If the problem persists, then at least we
know that it's not related to the triangulation class.

P.S. The RHS and solution vectors are certainly non-ghosted. As a rule of thumb: if you use a vector in FEValues,
you need it ghosted; the other way around, if it's part of the linear system (SLAE), it's non-ghosted.

Regards,
Denis.

Daniel Jodlbauer

Aug 8, 2016, 3:38:20 PM
to deal.II User Group
Hi,

- Debug is enabled (at least for deal.II; I will have to rebuild Trilinos with debug later).
- I am not sure if I understood you correctly, but if I use a regular Triangulation, then every rank owns all DoFs, and the initialization of the distributed vectors eventually fails (as expected).

What I additionally tried is: (with 2 ranks)

1) assemble the rhs / matrix in serial,
2) create a partition by hand: [0, n/2), [n/2, n)
3) copy/distribute
4) solve in parallel

which works.

However, when I changed the partition into something like
 { 0, 2, 4, ... } , { 1, 3, 5, ... }
it fails, which makes me believe that non-contiguous partitions are not (completely) supported by deal.II or Trilinos.

Denis Davydov

Aug 8, 2016, 4:18:21 PM
to deal.II User Group


On Monday, August 8, 2016 at 9:38:20 PM UTC+2, Daniel Jodlbauer wrote:
- Debug is enabled (at least for dealii, I will have to rebuild trilinos with debug later)
I meant deal.II; no need to recompile Trilinos.
 
- I am not sure if I got you correctly, but If I use a regular Triangulation, then every rank owns all dofs and finally the initialization of the distributed vectors fails (as expected)
 
You just need to use it carefully. For both parallel::shared::Triangulation and a plain Triangulation, all MPI ranks own the complete triangulation. The same holds for DoFs: each cell on each rank
has information about its local DoFs. The difference between the two is that p::s::Triangulation internally calls METIS to partition the shared triangulation among the processes and
then computes the locally owned and locally relevant DoFs. But you can also do that manually with a normal Triangulation.


 

What I additionally tried is: (with 2 ranks)

1) assemble the rhs / matrix in serial,
2) create a partition by hand: [0, n/2), [n/2, n)
3) copy/distribute
4) solve in parallel

which works.

However, when I changed the partition into something like 
 { 0, 2, 4, ... } , { 1, 3, 5, ... } 
it fails, which makes me believe that non-contiguous partitions are not (completely) supported by deal.II or Trilinos.

should be fine for Trilinos and deal.II. 

Regards,
Denis.

Wolfgang Bangerth

Aug 8, 2016, 4:38:50 PM
to dea...@googlegroups.com
On 08/08/2016 02:18 PM, Denis Davydov wrote:
>
> However, when I changed the partition into something like
> { 0, 2, 4, ... } , { 1, 3, 5, ... }
> it fails, which makes me believe that non-contiguous partitions are
> not (completely) supported by dealii or trilinos
>
>
> should be fine for Trilinos and deal.II.

That shouldn't be a problem for storing matrices and vectors, and I
suspect not for the iterative solvers either. I'm not sure the direct
solvers in Trilinos can handle that, though. It should be relatively
simple to find out with a little test program.

Best
W.

Denis Davydov

Aug 10, 2016, 5:08:25 AM
to dea...@googlegroups.com
Good point!
I did not notice that direct solvers are being used.
For starters, one could try an iterative solver with the non-contiguous range. If that works,
then the issue is in Trilinos's direct solvers.

Regards,
Denis.

Daniel Jodlbauer

Aug 10, 2016, 7:47:19 AM
to deal.II User Group
OK, if I use SolverGMRES<>, it reports the error "Column map of matrix does not fit with vector map!"; however, TrilinosWrappers::SolverGMRES seems to work.

Denis Davydov

Aug 10, 2016, 8:06:21 AM
to dea...@googlegroups.com
Are you sure that you reinitialise all involved vectors and matrices after renumbering?

Regards,
Denis.

On 10 Aug 2016, at 13:47, Daniel Jodlbauer <jdsc...@gmx.at> wrote:

Ok, if I use SolverGMRES<>, it reports the error "Column map of matrix does not fit with vector map!", however, TrilinosWrappers::SolverGMRES seems to work. 

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to a topic in the Google Groups "deal.II User Group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dealii/ncEIt6y7EHg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dealii+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Jodlbauer

Aug 10, 2016, 8:37:43 AM
to deal.II User Group
Yes, the initialization is done as above and after the renumbering.

Denis Davydov

Aug 10, 2016, 8:46:40 AM
to dea...@googlegroups.com
Could you post a Minimal Working Example (MWE)? (Try to make it as small as possible.)

Daniel Jodlbauer

Aug 10, 2016, 10:19:36 AM
to deal.II User Group
Here it is.

The first run is without renumbering, the second one with DoF renumbering.

In serial mode (mpirun -np 1) both tests complete (apart from the error from the Subscriptor class).
With mpirun -np 2 the first one finishes, while the second one fails.

Now it returns an error message that did not occur previously, but I can't extract any useful information from it (the Trilinos solver returns error code -1 in file "trilinos_solver.h", line 485).
test_renumbering.cc

Denis Davydov

Aug 10, 2016, 11:59:53 AM
to dea...@googlegroups.com
Your test worked for me with

DoFRenumbering::Cuthill_McKee(dof_handler);

I will see if I can spot any issues with the renumber() part...


Daniel Jodlbauer

Aug 10, 2016, 12:09:30 PM
to deal.II User Group
I think DoFRenumbering::Cuthill_McKee(dof_handler) only renumbers within the locally owned DoFs, therefore these index sets won't change.

Denis Davydov

Aug 10, 2016, 12:41:38 PM
to dea...@googlegroups.com
I inspected the locally owned DoFs for 2 cores and 1 global refinement, and I see:

core=1:

DEAL::n_dofs = 9
DEAL::before renumbering:
DEAL::{[0,2]}
DEAL::after renumbering:
DEAL::{[6,8]}

core=2:

DEAL::n_dofs = 9
DEAL::before renumbering:
DEAL::{[3,8]}
DEAL::after renumbering:
DEAL::{[0,5]}

which looks reasonable to me. It does run with CG, but the solution is different.

Regards,
Denis.

On 10 Aug 2016, at 18:09, Daniel Jodlbauer <jdsc...@gmx.at> wrote:

I think DoFRenumbering::Cuthill_McKee(dof_handler) only renumbers within the locally owned DoFs, therefore these index sets won't change.

Denis Davydov

Aug 15, 2016, 4:14:07 AM
to deal.II User Group
FYI, I created a GitHub issue to further track this problem: https://github.com/dealii/dealii/issues/2966

Regards,
Denis.