Performance of Ray plasma vs MPI communication


Iaroslav Igoshev

Jan 30, 2023, 7:17:47 PM
to mpi...@googlegroups.com
Hi mpi4py team,

My team and I have been measuring the performance of Ray plasma (its shared object store) against MPI point-to-point communication. We found that MPI is faster than Ray at transferring data between two processes for sizes up to ~1 MB. However, for sizes above ~1 MB, MPI becomes slower than Ray.

Ray example (python script1.py)

```python
import numpy as np
import ray
import time
import sys


ray.init(num_cpus=2)

sizes = [
        1024 * 128,  # ~128 KB
        1024 * 256,  # ~256 KB
        1024**2,  # ~1 MB
        1024**2 * 128,  # ~128 MB
        1024**2 * 256,  # ~256 MB
    ]
type_size = 8  # np.zeros returns float64 by default (8 bytes per element)
sizes = [size // type_size for size in sizes]

input_obj_refs = []
for size in sizes:
    arr = np.zeros(size)
    t0 = time.time()
    ref = ray.put(arr) # data copy to be put in Ray Plasma object store
    t1 = time.time()
    print(f"time of ray.put for data size {sys.getsizeof(arr)} =", t1 - t0)
    input_obj_refs.append(ref)

@ray.remote
def foo(o_refs):
    t0 = time.time()
    arr = ray.get(o_refs[0]) # data retrieving with zero-copy
    t1 = time.time()
    print(f"time of ray.get for data size {sys.getsizeof(arr)} =", t1 - t0)
    return arr

output_obj_refs = [foo.remote([ref]) for ref in input_obj_refs]

ray.get(output_obj_refs)  # block until the remote tasks have finished

```

Output:

```
time of ray.put for data size 131184 = 0.0011148452758789062
time of ray.put for data size 262256 = 0.0005316734313964844
time of ray.put for data size 1048688 = 0.0011167526245117188
time of ray.put for data size 134217840 = 0.02989053726196289
time of ray.put for data size 268435568 = 0.0584867000579834
time of ray.get for data size 112 = 0.0009331703186035156
time of ray.get for data size 112 = 0.00015425682067871094
time of ray.get for data size 112 = 0.0001499652862548828
time of ray.get for data size 112 = 0.0009522438049316406
time of ray.get for data size 112 = 0.00015735626220703125
```

As can be seen from the output above, the time of ray.put grows with the data size, since Ray copies the data into the plasma store. The time of ray.get is roughly constant because Ray retrieves the data from the shared object store with zero copy. (All the ray.get lines report 112 bytes because sys.getsizeof counts only the ndarray wrapper when the array does not own its buffer, which is the case for a zero-copy array backed by plasma memory.)
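The 112-byte readings can be reproduced without Ray at all: sys.getsizeof includes a NumPy array's data buffer only when the array owns it, while a zero-copy view reports just the ndarray object itself. A pure-NumPy illustration:

```python
import sys
import numpy as np

owner = np.zeros(1024**2 // 8)  # owns its ~1 MB float64 buffer
view = owner[:]                 # zero-copy view; does not own the buffer

print(sys.getsizeof(owner))  # ~1 MB plus the small ndarray object overhead
print(sys.getsizeof(view))   # only the ndarray object itself (~112 bytes)
```

So the constant 112 in the ray.get output measures the wrapper object, not the payload; arr.nbytes would report the actual buffer size in both cases.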

MPI example (mpiexec -n 2 python script2.py)

```python
import time
import numpy as np
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

sizes = [
    1024 * 128,  # ~128 KB
    1024 * 256,  # ~256 KB
    1024**2,  # ~1 MB
    1024**2 * 128,  # ~128 MB
    1024**2 * 256,  # ~256 MB
]
type_size = 8  # np.zeros returns float64 by default (8 bytes per element)
sizes = [size // type_size for size in sizes]
arrays = [np.zeros(size) for size in sizes]

if rank == 0:
    for i in range(len(arrays)):
        t0 = time.time()
        comm.Send(arrays[i], dest=1)
        t1 = time.time()
        print(f"Send time for data size {sys.getsizeof(arrays[i])} =", t1 - t0)
elif rank == 1:
    for i in range(len(arrays)):
        t0 = time.time()
        comm.Recv(arrays[i], source=0)
        t1 = time.time()
        print(f"Recv time for data size {sys.getsizeof(arrays[i])} =", t1 - t0)

```

Output:

```
Send time for data size 131184 = 0.00021719932556152344
Send time for data size 262256 = 0.00012946128845214844
Send time for data size 1048688 = 0.00046062469482421875
Send time for data size 134217840 = 0.07068681716918945
Send time for data size 268435568 = 0.14120078086853027
Recv time for data size 131184 = 0.00021886825561523438
Recv time for data size 262256 = 0.00014495849609375
Recv time for data size 1048688 = 0.0005681514739990234
Recv time for data size 134217840 = 0.07068228721618652
Recv time for data size 268435568 = 0.14120888710021973
```

As can be seen from the output above, MPI becomes slower than Ray once the data size exceeds roughly 1 MB.

My questions are the following.

1. Why does MPI get slower as the data size increases? Is it related to interprocess communication, i.e. the two processes having to synchronize with each other to exchange the data?
2. Is it possible to speed up MPI communication so that it is faster than Ray's shared object store (or any other shared object store)?

I would really appreciate your responses, and perhaps some thoughts on how things can be sped up for MPI.

Kind regards,
Iaroslav

Iaroslav Igoshev

Feb 15, 2023, 7:10:04 AM
to mpi4py
Hi mpi4py team,

This is a friendly reminder regarding my email above.

Kind regards,
Iaroslav
