Thanks for your help. I am running a CUDA-aware MPI example. It works fine on one node, but when run across a couple of nodes it fails with the following pytools.prefork.ExecError:
Traceback (most recent call last):
File "test_mpi_pycuda.py", line 64, in <module>
x_gpu_part.fill(1)
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 525, in fill
func = elementwise.get_fill_kernel(self.dtype)
File "<string>", line 2, in get_fill_kernel
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/tools.py", line 423, in context_dependent_memoize
result = func(*args)
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/elementwise.py", line 488, in get_fill_kernel
"fill")
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/elementwise.py", line 157, in get_elwise_kernel
arguments, operation, name, keep, options, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/elementwise.py", line 143, in get_elwise_kernel_and_types
keep, options, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/elementwise.py", line 71, in get_elwise_module
options=options, keep=keep)
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 251, in __init__
arch, code, cache_dir, include_dirs)
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 241, in compile
return compile_plain(source, options, keep, nvcc, cache_dir)
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 73, in compile_plain
checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
File "/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 47, in preprocess_source
result, stdout, stderr = call_capture_output(cmdline, error_on_nonzero=False)
File "/usr/lib/python2.7/dist-packages/pytools/prefork.py", line 196, in call_capture_output
return forker[0].call_capture_output(cmdline, cwd, error_on_nonzero)
File "/usr/lib/python2.7/dist-packages/pytools/prefork.py", line 53, in call_capture_output
% ( " ".join(cmdline), e))
pytools.prefork.ExecError: error invoking 'nvcc --preprocess -arch sm_52 -I/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/cuda /tmp/tmpl0WyOY.cu --compiler-options -P': [Errno 2] No such file or directory
[max1:06760] *** Process received signal ***
[max1:06760] Signal: Segmentation fault (11)
[max1:06760] Signal code: Address not mapped (1)
[max1:06760] Failing at address: (nil)
[max1:06760] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fa4ba69a340]
[max1:06760] [ 1] /usr/lib/libcuda.so.1(+0x1fb0e5)[0x7fa4ae3f30e5]
[max1:06760] [ 2] /usr/lib/libcuda.so.1(+0x1727d6)[0x7fa4ae36a7d6]
[max1:06760] [ 3] /usr/lib/libcuda.so.1(cuEventDestroy_v2+0x52)[0x7fa4ae346f42]
[max1:06760] [ 4] /usr/local/lib/libmca_common_cuda.so.1(mca_common_cuda_fini+0xa3)[0x7fa4b60e6993]
[max1:06760] [ 5] /usr/local/lib/openmpi/mca_btl_tcp.so(+0x4f06)[0x7fa4b4e30f06]
[max1:06760] [ 6] /usr/local/lib/libopen-pal.so.6(mca_base_component_close+0x19)[0x7fa4b8bf9709]
[max1:06760] [ 7] /usr/local/lib/libopen-pal.so.6(mca_base_components_close+0x42)[0x7fa4b8bf9782]
[max1:06760] [ 8] /usr/local/lib/libmpi.so.1(+0x7d365)[0x7fa4b9186365]
[max1:06760] [ 9] /usr/local/lib/libopen-pal.so.6(mca_base_framework_close+0x63)[0x7fa4b8c02a23]
[max1:06760] [10] /usr/local/lib/libopen-pal.so.6(mca_base_framework_close+0x63)[0x7fa4b8c02a23]
[max1:06760] [11] /usr/local/lib/libmpi.so.1(ompi_mpi_finalize+0x56d)[0x7fa4b914c9cd]
[max1:06760] [12] /usr/local/lib/python2.7/dist-packages/mpi4py/MPI.so(+0x28e04)[0x7fa4b9407e04]
[max1:06760] [13] python(Py_Finalize+0x1a6)[0x42fb0f]
[max1:06760] [14] python(Py_Main+0xbed)[0x46ac10]
[max1:06760] [15] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fa4ba2e5ec5]
[max1:06760] [16] python[0x57497e]
[max1:06760] *** End of error message ***
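
The ExecError above reports [Errno 2] while invoking `nvcc`, which most likely means the `nvcc` executable was not found on PATH in the environment of the failing process. As a minimal sketch (not a confirmed diagnosis), the following standard-library check could be launched through the same mpirun command as the failing script to see what each node's environment resolves. The helper names `find_executable_on_path` and `report` are hypothetical and are not part of PyCUDA, pytools, or mpi4py.

```python
import os
import socket


def find_executable_on_path(name, path=None):
    """Return the full path of `name` if it is an executable file in one
    of the PATH directories, otherwise None. (Hypothetical helper.)"""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def report(name="nvcc"):
    """Build a one-line summary: hostname, tool name, resolved location."""
    location = find_executable_on_path(name)
    return "%s: %s -> %s" % (
        socket.gethostname(), name, location or "NOT FOUND on PATH")


if __name__ == "__main__":
    # Each MPI rank prints its own line, so a node with a missing or
    # differently configured CUDA toolkit stands out in the output.
    print(report())
```

If some node prints NOT FOUND, the PATH seen by processes started remotely on that node presumably differs from the one in an interactive shell there, which would be consistent with the example working on a single node.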