Thanks for that data point -- I went ahead and installed Open MPI 1.8.4, but I'm still getting the same problem. For reference, I'm attaching the test code that I'm trying to run.
As written, the program runs fine and gives the expected output, but if I change lines 40 and 46 from Send and Recv to Isend and Irecv, respectively, I get the following output (the change itself is sketched at the end of this message):
[ghosthost:02640] CUDA: cuCtxGetDevice failed: res=201
[ghosthost:02640] *** Process received signal ***
[ghosthost:02640] Signal: Aborted (6)
[ghosthost:02640] Signal code: (-6)
[ghosthost:02640] CUDA: Error in cuMemcpy: res=-1, dest=0x706d40800, src=0x7fc9fbd9f7a6, size=40
[ghosthost:02640] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fca10685340]
[ghosthost:02640] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7fca102e6cc9]
[ghosthost:02640] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fca102ea0d8]
[ghosthost:02640] [ 3] /usr/local/lib/libopen-pal.so.6(+0x45ad9)[0x7fca0eb8ead9]
[ghosthost:02640] [ 4] /usr/local/lib/libopen-pal.so.6(opal_convertor_unpack+0x10a)[0x7fca0eb871aa]
[ghosthost:02640] [ 5] /usr/local/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x450)[0x7fca05040f40]
[ghosthost:02640] [ 6] /usr/local/lib/openmpi/mca_btl_smcuda.so(mca_btl_smcuda_component_progress+0x4b5)[0x7fca0699c9c5]
[ghosthost:02640] [ 7] /usr/local/lib/libopen-pal.so.6(opal_progress+0x4a)[0x7fca0eb7221a]
[ghosthost:02640] [ 8] /usr/local/lib/libmpi.so.1(ompi_mpi_finalize+0x24d)[0x7fca0f0f1d5d]
[ghosthost:02640] [ 9] /usr/local/lib/python2.7/dist-packages/mpi4py/MPI.so(+0x2f694)[0x7fca0f3b7694]
[ghosthost:02640] [10] python(Py_Finalize+0x1a6)[0x42fb0f]
[ghosthost:02640] [11] python(Py_Main+0xbed)[0x46ac10]
[ghosthost:02640] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fca102d1ec5]
[ghosthost:02640] [13] python[0x57497e]
[ghosthost:02640] *** End of error message ***
--------------------------------------------------------------------------
The call to cuMemcpy failed. This is highly unusual and should
not happen. Please report this error to the Open MPI developers.
Hostname: ghosthost
cuMemcpy return value: 201
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 2640 on node ghosthost exited on signal 6 (Aborted).
--------------------------------------------------------------------------
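For clarity, the change that triggers this is just swapping the blocking calls for their non-blocking counterparts plus a Wait. The snippet below is only a minimal sketch of that swap, not the attached file: the buffer setup here uses host-side NumPy arrays for brevity, whereas the attached test exchanges GPU buffers (which is presumably why the smcuda/cuMemcpy path shows up in the trace).

# Minimal sketch of the Send/Recv -> Isend/Irecv swap (illustrative only,
# not the attached test code; buffers here are host NumPy arrays).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

buf = np.arange(10, dtype=np.float64) if rank == 0 else np.empty(10, dtype=np.float64)

if rank == 0:
    # blocking version (works):
    # comm.Send([buf, MPI.DOUBLE], dest=1, tag=0)
    req = comm.Isend([buf, MPI.DOUBLE], dest=1, tag=0)   # non-blocking version (crashes for me)
    req.Wait()
elif rank == 1:
    # blocking version (works):
    # comm.Recv([buf, MPI.DOUBLE], source=0, tag=0)
    req = comm.Irecv([buf, MPI.DOUBLE], source=0, tag=0)  # non-blocking version (crashes for me)
    req.Wait()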