MPI processes: use only a part of the available processes for dealii


Daniel

Jan 19, 2019, 3:09:53 PM1/19/19
to deal.II User Group
I would like to limit the number of processes that deal.II uses to a subset of the ones initialized/managed by

Utilities::MPI::MPI_InitFinalize

To do so, I created a new group which does not include the process with rank 3 of the world group
(started with mpirun -np 4 ./step-18):

      MPI_Group world_group, fem_group;
      MPI_Comm  FEM_Comm;
      int       ranks[1] = {3};     // the rank to exclude
      int       size_fem, size_fem_comm;

      // group containing all processes of MPI_COMM_WORLD
      MPI_Comm_group(MPI_COMM_WORLD, &world_group);

      // new group without rank 3
      MPI_Group_excl(world_group, 1, ranks, &fem_group);
      MPI_Group_size(fem_group, &size_fem);

      // communicator containing only the processes in fem_group
      MPI_Comm_create(MPI_COMM_WORLD, fem_group, &FEM_Comm);
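The group and communicator sizes in the output below are then printed on every process, roughly along these lines (a reconstructed sketch based on the output; the actual code is in the attached step-18.cc):

      // query the size of the new communicator on every process
      MPI_Comm_size(FEM_Comm, &size_fem_comm);
      std::cout << "fem group size: " << size_fem
                << " comm:" << size_fem_comm << std::endl;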

I replaced MPI_COMM_WORLD with FEM_Comm in the triangulation and in mpi_communicator (see the attached source code).

After compiling and running, I get the following behavior:

bash-3.2$ mpirun -np 4 ./step-18
fem group size: 3 comm:3
fem group size: 3 comm:3
fem group size: 3 comm:3
fem group size: 3 comm:32766
rank and size 0,4
rank and size 1,4
rank and size 2,4
rank and size 3,4
ERROR: Uncaught exception in MPI_InitFinalize on proc 3. Skipping MPI_Finalize() to avoid a deadlock.

----------------------------------------------------
Exception on processing:

--------------------------------------------------------
An error occurred in line <79> of file </Applications/deal.II-9.0.0.app/Contents/Resources/spack/src/deal.II-9.0.0/source/base/mpi.cc> in function
    unsigned int dealii::Utilities::MPI::this_mpi_process(const MPI_Comm &)
The violated condition was:
    ierr == MPI_SUCCESS
Additional information:
deal.II encountered an error while calling an MPI function.
The description of the error provided by MPI is "MPI_ERR_COMM: invalid communicator".
The numerical value of the original error code is 5.
--------------------------------------------------------

Aborting!
----------------------------------------------------

Timestep 1 at time 1
  Cycle 0:
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
    Number of active cells:       3712 (by partition: 1360+1286+1066)
    Number of degrees of freedom: 17226 (by partition: 6651+5922+4653)
    Assembling system...--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[53719,1],3]
  Exit code:    1

======END


The meshing uses three processes.


I am not sure how to ensure that all other involved parts (e.g. PETSc) work with FEM_Comm instead of the MPI_COMM_WORLD used by default.



Thanks for your help.

DW.





step-18.cc

Wolfgang Bangerth

Jan 20, 2019, 11:41:42 PM1/20/19
to dea...@googlegroups.com
On 1/19/19 1:09 PM, Daniel wrote:
>
> After compiling I get the following behavior:
>
> bash-3.2$ mpirun -np 4 ./step-18
>
> fem group size: 3 comm:3
>
> fem group size: 3 comm:3
>
> fem group size: 3 comm:3
>
> fem group size: 3 comm:32766

The problem here is that only 3 of the processes participate in the
communicator, but you ask for its size on all processes:
    MPI_Comm_size(FEM_Comm, &size_fem_comm);
Here, FEM_Comm is a communicator object that is only initialized on the
three participating processes, whereas it is uninitialized on the fourth, and
consequently you are getting meaningless answers there.

You have the same problem again at the end of the same block, where every
process creates a TopLevel object using FEM_Comm, but really only on three of
the processes is this object valid.

After creating the subset communicator, you need to have an if statement of
the form
if (this process is participating in the subset communicator)
{
do everything else
}
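
A minimal sketch of such a guard (MPI_Comm_create returns MPI_COMM_NULL on
processes that are not part of the group, so that is the natural test; the
TopLevel object is assumed to have been modified to use FEM_Comm internally,
as in your attached step-18.cc):

    // Only processes that are part of fem_group obtained a valid
    // communicator from MPI_Comm_create; the excluded rank got
    // MPI_COMM_NULL and must skip all work that uses FEM_Comm.
    if (FEM_Comm != MPI_COMM_NULL)
      {
        TopLevel<3> elastic_problem;
        elastic_problem.run();
      }

    // Everyone, including the excluded rank, can still synchronize
    // on the world communicator afterwards.
    MPI_Barrier(MPI_COMM_WORLD);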

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Jean-Paul Pelteret

Jan 21, 2019, 1:08:55 AM1/21/19
to dea...@googlegroups.com
Dear Daniel,

If appropriate, you could also consider using MPI_Comm_split() to create a new communicator from a subset of the world/master communicator, and pass that along to all classes/functions in your FEM code.
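
For example, a minimal sketch (variable names are illustrative; here the last
rank is split off into its own small communicator instead of being excluded):

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Processes with the same color end up in the same new communicator;
    // the key (here the world rank) determines their order within it.
    const int color = (world_rank < 3) ? 0 : 1;

    MPI_Comm FEM_Comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &FEM_Comm);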

Best,
Jean-Paul


Daniel

Jan 21, 2019, 3:26:30 AM1/21/19
to deal.II User Group
Dear Wolfgang,
  thanks for pointing this out; I completely missed that aspect.

  Assuming that all processes with rank smaller than 4 in the FEM_Comm communicator participate works:
 
      if (rank_fem < 4)
        {
          TopLevel<3> elastic_problem;
          elastic_problem.run();
        }
      else
        {
          std::cout << " I am not allowed to work on this problem ;-) " << std::endl;
        }
      MPI_Barrier(MPI_COMM_WORLD);

Thanks,
Daniel