I have seen this statement on the net a few times:
"… mpi4py now provides (experimental) support for passing CuPy arrays to MPI calls, provided that mpi4py is built against a CUDA-aware MPI implementation"
Are there any examples or instructions for how to build mpi4py against a CUDA-aware MPI? I am working on a GPU node that does not have direct access to the net, so I downloaded an mpi4py release and built it there. The hello-world example works (and 53/56 of the unit tests pass), but it always throws a segfault when Allgatherv() is called on an mpi4py.MPI.Intracomm. My guess is that it is not built correctly.
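For reference, a minimal script along these lines reproduces the crash for me (this is a sketch, not my exact code; the chunk sizes and the rank-based device selection are just placeholders), run with e.g. `mpiexec -n 4 python test_allgatherv.py`:

```python
# Sketch of the pattern that segfaults for me: variable-length CuPy
# buffers gathered with Allgatherv. Sizes and device selection are
# placeholders, not my actual workload.
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Pin each rank to a device (assumes a single multi-GPU node).
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()

# Variable-length contributions, so Allgatherv (not Allgather) is needed.
counts = [r + 1 for r in range(size)]
displs = [sum(counts[:r]) for r in range(size)]

sendbuf = cp.full(counts[rank], rank, dtype=cp.float64)
recvbuf = cp.empty(sum(counts), dtype=cp.float64)

# mpi4py does not synchronize the CUDA stream for you, so make sure the
# device buffers are ready before MPI touches them.
cp.cuda.get_current_stream().synchronize()

comm.Allgatherv(sendbuf, [recvbuf, counts, displs, MPI.DOUBLE])

if rank == 0:
    print(recvbuf)  # expect [0., 1., 1., 2., 2., 2., ...]
```

If something like this runs cleanly on a known-good CUDA-aware build but segfaults on mine, that would at least confirm the build is the problem.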