MPI.COMM_SELF.Spawn does not wait for the spawned executable to finish


RITE Zhang

Sep 11, 2020, 5:17:49 AM
to mpi4py
Hi,

I am trying to use MPI.COMM_SELF.Spawn to run another compiled program. The program's command line looks like "executor -i a_text_file". How should I use MPI.COMM_SELF.Spawn to run it with these arguments?
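My first guess was to pass the arguments as a list of strings, but I am not sure whether this is the right way:

comm = MPI.COMM_SELF.Spawn("executor", args=["-i", "a_text_file"], maxprocs=2)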

I have also tried putting the command in a bash script, "run.bash", and spawning that as follows:

comm = MPI.COMM_SELF.Spawn("./run.bash",args=None, maxprocs=2)

However, the Spawn call just returns immediately and does not wait for "executor" to finish.
Could you please tell me how to deal with this problem?

Thank you!
Best regards, 
YZ



RITE Zhang

Sep 13, 2020, 9:26:56 PM
to mpi4py
If I add a comm.barrier() call, the run never finishes, even though the spawned program has actually completed.

RITE Zhang

Sep 14, 2020, 3:17:39 AM
to mpi4py
I put together a simple hello-world example.
The problem is that the program does not end when I use barrier().
Could anyone point out the cause of the error? I launch it with mpiexec -n 1 python master.py.

master.py:

from mpi4py import MPI
import os

mpi_comm = MPI.COMM_WORLD
mpi_rank = mpi_comm.Get_rank()  
executable = "./mpi_hello_world"

intercomm = MPI.COMM_SELF.Spawn(executable, args=[""], maxprocs=2)
intercomm.barrier()
intercomm.Disconnect()
mpi_comm.Disconnect()

mpi_hello_world.c code:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {

  MPI_Comm intercomm;
  MPI_Init(&argc, &argv);
  MPI_Comm_get_parent(&intercomm);  /* intercommunicator to the spawning parent */
  int world_size;
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  char processor_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(processor_name, &name_len);

  printf("Hello world from processor %s, rank %d out of %d processors\n",
         processor_name, world_rank, world_size);

  if (intercomm != MPI_COMM_NULL) MPI_Comm_disconnect(&intercomm);
  MPI_Finalize();
  return 0;
}

Lisandro Dalcin

Sep 15, 2020, 2:48:51 PM
to mpi...@googlegroups.com
Two problems in your code:

1) In the Python code, remove mpi_comm.Disconnect(). You do not have to disconnect from COMM_WORLD (actually, you cannot; it is an error).

2) In the C code, you are missing an `MPI_Barrier(intercomm)` call that matches the `intercomm.barrier()` you have on the Python side.

The program does not end because you are in a textbook deadlock scenario:
a) The master process blocks in the barrier() call, waiting for the worker processes to enter a matching barrier, and that barrier call never happens on the worker side.
b) The worker processes run up to MPI_Finalize() and block there, waiting for the master to enter the (collective) MPI_Finalize() call [with mpi4py, that happens at Python process termination]. But the master never reaches finalization, as it is blocked in the barrier call.
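An untested sketch of the corrected endings of both programs, changing only the relevant lines:

master.py:

intercomm = MPI.COMM_SELF.Spawn(executable, maxprocs=2)
intercomm.barrier()      # matched by MPI_Barrier(intercomm) in the workers
intercomm.Disconnect()   # no Disconnect() on COMM_WORLD

mpi_hello_world.c:

  if (intercomm != MPI_COMM_NULL) {
    MPI_Barrier(intercomm);   /* matches intercomm.barrier() in the master */
    MPI_Comm_disconnect(&intercomm);
  }
  MPI_Finalize();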


--
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/