mpi4py and subprocess.Popen


Pavel Ponomarev

Mar 20, 2015, 3:48:31 PM
to mpi...@googlegroups.com
Hello,

I can't execute external programs from an MPI process. I am trying to write a script that runs an embarrassingly parallel optimization routine on a cluster, where a million independent computations are executed by several hundred MPI workers.

The construction
    args = ['FEMSolver', 'parameters.cfg']
    pro3 = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, cwd=path)
    pro3.wait()
does not work. It just captures the first line of my FEMSolver's stdout, terminates the FEMSolver, and continues with the rest of the script, so I can't get the results of my simulation.

Do you know how to reliably execute an external program within an MPI-script? Any solutions or examples?

Yury V. Zaytsev

Mar 21, 2015, 4:56:41 AM
to mpi...@googlegroups.com
On Fri, 2015-03-20 at 12:47 -0700, Pavel Ponomarev wrote:
>
> Do you know how to reliably execute an external program within an
> MPI-script? Any solutions or examples?

You can't use fork() to reliably execute an external process from within
an MPI program. Instead, use MPI_Comm_spawn() if your MPI implementation
supports it; look for "spawn" in mpi4py demo codes.

However, when this question comes up, there is usually some misunderstanding about how MPI is supposed to be used. Why do you need an MPI-enabled wrapper around your solver in the first place?

If it's just some convenience orchestration code, which you decided to
write in Python instead of shell, then simply don't run it through MPI,
and start the solver as you would have started it normally from your job
script, e.g.:

args = ['mpiexec', '-np', '64', 'FEMSolver', 'parameters.cfg']

--
Sincerely yours,
Yury V. Zaytsev


zhang...@gmail.com

Jun 21, 2020, 6:20:21 AM
to mpi4py
Hi Pavel

Has the problem been solved?
If so, could you share the solution?

Thank you!
Yi

Eric Irrgang

Jun 21, 2020, 8:18:54 AM
to mpi...@googlegroups.com
In general, I find that this sort of thing works fine under mpi4py, but may be affected by the configuration of mpiexec or the resource scheduler on an HPC cluster. Also, the example script assigns stdout and stderr to pipes, but does not read the pipes, which seems like an error (at least use communicate() instead of wait()).

mpiexec may produce warnings or errors when processes attempt to fork, and the resource manager on a computing cluster may automatically terminate processes that are launched beyond the number of processes allocated for the job.

Is there error output from the subprocess, mpiexec, or the job manager? What does the exit code of the subprocess indicate?

zhang...@gmail.com

Jun 21, 2020, 10:29:33 AM
to mpi4py
Hi Eric, 

Thank you for the prompt response. I may have just found a solution through trial and error (though I don't know the reason it works).

I tried to use the "map" of "MPIPoolExecutor" to distribute jobs (the FEM executor) to CPUs in parallel.
Each CPU is expected to run only one job (no parallelism within a job), executed via subprocess, e.g., "p = subprocess.run(["./executor"], shell=True)".

The errors are as follows.

[cli_7]: write_line error; fd=33 buf=:cmd=init pmi_version=1 pmi_subversion=1
system msg for write_line failure : Bad file descriptor
[cli_7]: Unable to write to PMI_fd
[cli_7]: write_line error; fd=33 buf=:cmd=get_appnum

What now works is calling the executor with "p = subprocess.run(["mpiexec -n 1 ./executor"], shell=True)", even when using only one CPU for each job.
The program now seems to run successfully in parallel.
------------------------

I run the program with "mpiexec -n 312 python -u -m mpi4py.futures program.py" on 6 nodes (each has 58 CPUs, of which 52 are used here).
Now there is another problem. While the embarrassingly parallel part described above works, another function within "program.py", e.g., "p = subprocess.run(["mpiexec -n 20 ./executor"], shell=True)", has become very slow.

How to solve this? Thank you.





Eric Irrgang

Jun 21, 2020, 11:14:55 AM
to mpi...@googlegroups.com
I don't believe that mpiexec allows recursive calls or subdivision of resources. It sounds like you have 312 processes that are each launching 20 more, so you are running 6240 processes in a 312-core allocation.
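The arithmetic behind that estimate, using the node counts from earlier in the thread:

```python
# Back-of-the-envelope check of the oversubscription: every pool worker
# that shells out to `mpiexec -n 20` turns one allocated process into twenty.
pool_workers = 312          # processes started by the outer mpiexec
procs_per_task = 20         # inner `mpiexec -n 20 ./executor`
allocated_cores = 6 * 52    # 6 nodes, 52 cores used per node

total_procs = pool_workers * procs_per_task
oversubscription = total_procs / allocated_cores
print(total_procs, oversubscription)  # prints: 6240 20.0
```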

You might accomplish what you want by launching program.py in a 3-node allocation while specifying to the outer mpiexec that it should use only ncores/task_size processes per node. https://www.open-mpi.org/doc/v3.0/man1/mpiexec.1.php
But, again, I don't think mpiexec supports nested calls, so it's probably not a good idea.
It doesn't appear to me that mpi4py.futures.MPIPoolExecutor is intended to support tasks that are themselves multi-process, but I'm not well acquainted with it or the MPI-2 dynamic process management features it relies on. If it were supported, I would think you would provide the number of task processes as an argument to `map` and not use `mpiexec` in the subprocess call. (Does MPI_Comm_spawn() allow a fixed number of child processes to share a new MPI_COMM_WORLD?)

You might also consider using a package that doesn't use MPI to coordinate MPI-based workloads, such as Parsl, or a workload management system based on pilot jobs.

zhang...@gmail.com

Jun 21, 2020, 11:38:30 AM
to mpi4py

I still don't understand why the program runs OK when calling the executor with "p = subprocess.run(["mpiexec -n 1 ./executor"], shell=True)".
Though it looks OK, execution becomes quite slow even for a single-CPU "mpiexec -n 1" run through the "map" of "MPIPoolExecutor" (1000 seconds), compared with executing the same "mpiexec -n 1" command from the shell (250 seconds).

I will give Parsl a try. It looks useful.

Thanks again.






Eric Irrgang

Jun 21, 2020, 1:37:37 PM
to mpi...@googlegroups.com

> I still don't understand why the program runs OK when calling the executor with "p = subprocess.run(["mpiexec -n 1 ./executor"], shell=True)".
> Though it looks OK, execution becomes quite slow even for a single-CPU "mpiexec -n 1" run through the "map" of "MPIPoolExecutor" (1000 seconds), compared with executing the same command from the shell (250 seconds).

I don't know, but it definitely sounds wrong to call mpiexec from MPIPoolExecutor.map(), even with `-n 1`. I would think that the individual programs should run in about the same time (maybe a second or two more for large jobs) if you did not put `mpiexec -n 1` in the mapped subprocess call.

Maybe there is other acceleration available to a single standalone process but not to a batch of processes that fills up the node, such as vector instructions / SIMD, hyperthreading, or internal multithreading. Tough to conjecture.
