Intranode communication with mpi4py

Zeki Zeybek

Dec 11, 2020, 2:34:07 PM
to mpi4py
Hi all! I am trying to configure a piece of code so that it can run on multiple nodes on a cluster. On my personal computer I have been utilizing multiprocessing module with Pool class and the performance increase was immense thus I believe that the problem itself has a parallel character, so that's set. Then I completely discard the multiprocessing module of python and rewrite the code in order to use scatter/gather methods in mpi4py to perform collective communication between nodes (I first used MPIPoolExecutor because it seemed easier since it is very much like that pool process of python and did not lead to major changes in the code, but it did not perform well compare to Pool class of multiprocessing module of python).  After trial and error, I realized that when there are fewer tasks per node the code performs better. How can I overcome this issue? 

Zeki Zeybek

Dec 11, 2020, 4:12:24 PM
to mpi4py
Basically, how can I keep the speed improvement provided by the multiprocessing module of Python?

Lisandro Dalcin

Dec 15, 2020, 2:54:38 AM
to mpi...@googlegroups.com
On Sat, 12 Dec 2020 at 00:12, Zeki Zeybek <zekii...@gmail.com> wrote:
Basically, how can I keep the speed improvement provided by the multiprocessing module of python?

mpi4py.futures does indeed have an executor setup cost (spawning the worker processes with MPI), while multiprocessing uses fork(), which on Linux and macOS is way faster than launching brand-new processes from scratch, even more so if the process involves a Python runtime.
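
One consequence is that you should create the executor once and reuse it for all your map() calls, so the spawn cost is paid a single time. Roughly like this (the task function and the sizes are made up):

    from mpi4py.futures import MPIPoolExecutor

    def task(x):
        return x * x   # stand-in for the real work

    if __name__ == '__main__':
        # workers are spawned once, here, and reused for every map() call
        with MPIPoolExecutor(max_workers=8) as executor:
            for batch in range(10):
                results = list(executor.map(task, range(1000)))

Run it with something like "mpiexec -n 1 python script.py", so that the worker processes are spawned dynamically.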

How much time does each of your individual tasks require to run? If your individual tasks run very quickly, and you have just a few of them, then the setup cost of mpi4py.futures will hit you. Otherwise, mpi4py.futures should have negligible cost compared to multiprocessing. If that's not the case, my guess is that something is wrong with MPI on your side. But without more information, I can only guess. You should really provide reproducing code, defining tasks that take approximately the same time to run as in your real application, but with a simple time.sleep() call to introduce an artificial delay. That way I can test the mpi4py.futures and multiprocessing versions of your code on my side and try to figure out how things behave.
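
Something along these lines, with the sleep duration and the task count adjusted to match your real workload (the numbers below are arbitrary; you may also want to run the two timings as separate scripts, since some MPI implementations do not like fork() after MPI is initialized):

    import time

    def task(x):
        time.sleep(0.1)    # artificial delay standing in for the real work
        return x

    if __name__ == '__main__':
        from multiprocessing import Pool

        t0 = time.time()
        with Pool(processes=8) as pool:              # workers created via fork()
            pool.map(task, range(64))
        print('multiprocessing:', time.time() - t0)

        # imported here so MPI is initialized only after the fork()s above
        from mpi4py.futures import MPIPoolExecutor

        t0 = time.time()
        with MPIPoolExecutor(max_workers=8) as ex:   # workers spawned with MPI
            list(ex.map(task, range(64)))
        print('mpi4py.futures:', time.time() - t0)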



--
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/