combining mpi4py and multiprocessing

afylot

May 3, 2021, 10:11:28 AM

to mpi4py

I have a project that needs to be parallelized, and I need an opinion on the parallelization strategy and the choice of Python modules.

The project is data-intensive, and I am working on a cluster.

Let's say that

each node has a memory of 128 GB and 16 CPUs
I would like to read 500 GB of data
each node will read a fraction of the data
each node will tell the other nodes which pieces of data it needs, and will collect the data that the other nodes request from it.

I would implement this part with mpi4py. After that,

I would like to run a computation on all the CPUs of each node using the multiprocessing module.

I was wondering if this strategy is viable, or if multiprocessing may interfere with mpi4py. I would also like to know if there are other parallelization modules that I should consider for this problem.

Lisandro Dalcin

May 4, 2021, 3:43:14 AM

to mpi...@googlegroups.com

On Mon, 3 May 2021 at 17:11, afylot <afy...@gmail.com> wrote:

I was wondering if this strategy is viable, or if multiprocessing may interfere with mpi4py.

Multiprocessing may interfere with mpi4py. This is not because of mpi4py itself, but because some backend MPI implementations do not tolerate the fork() system calls that multiprocessing uses to create worker processes. I would say: give it a quick try, but if it fails, you were warned. Don't come back to us asking for a fix; the solution is not in mpi4py's hands. If things fail, switching to a different MPI implementation may fix the problem.

I would like to know if there are other parallelization modules that I may consider for this problem.

Have you ever looked at `concurrent.futures` from the Python 3 stdlib? I find it much nicer than multiprocessing, although it does not support advanced multiprocessing features like shared memory (something your application could surely benefit from). If the approach of concurrent.futures fits your needs, then DO NOT use it directly, but look at the mpi4py.futures package, which is a drop-in MPI-based implementation: https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html. Communication with worker processes will involve pickle serialization and may be a bit slow (memory copies within the pickle module). I still have to find some time to implement a more efficient copy-free communication approach for large data applications.

Or perhaps use a pure-MPI approach. Split COMM_WORLD into subcommunicators whose processes can share memory [ node_comm = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED) ], and then use MPI.Win.Allocate_shared() within each node-local subcommunicator to easily access shared memory. If your data involves large NumPy arrays, my guess is that this approach will give you the most performant solution. Of course, the implementation requires some knowledge of MPI.

Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
