Hello everyone,
first of all, I have to thank you for the amazing job you all are doing.
I'm trying to implement a simple multiplication operation using CUDA. I'm running into an odd problem in Numba, and I hope you can help me in some way.
These are the lines of code I'm using:
import numpy as np
from timeit import default_timer as timer
from numba import jit, guvectorize, int32, int64, float64
from numba import cuda


@guvectorize(['void(int32[:,:], int32[:,:])'], '(m,n)->(m,n)', target='cuda', nopython=True)
def f_vec_loops(x, ret):
    nx = len(ret)
    ny = len(ret[0])
    # Add x to ret element by element, 1000 times over
    for k in range(1000):
        for i in range(nx):
            for j in range(ny):
                ret[i, j] += x[i, j]


x = 300
y = 400
a = np.ones([x, y], dtype='int32')
ret = np.zeros([x, y], dtype='int32')

# Copy the input and output arrays to the GPU
a_cuda = cuda.to_device(a)
ret_cuda = cuda.to_device(ret)

s = timer()
f_vec_loops(a_cuda, ret_cuda)
e = timer()
print(e-s)

# Copy the result back to the host
hary = ret_cuda.copy_to_host()
print(hary)
It works well for small values of x and y, e.g. 30 and 40. However, when I increase them, for example to 300 and 400 as in the code above, I get this error:
Traceback (most recent call last):
  File "test.py", line 29, in <module>
    hary = ret_cuda.copy_to_host()
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
    return fn(*args, **kws)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 237, in copy_to_host
    _driver.device_to_host(hostary, self, self.alloc_size, stream=stream)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1606, in device_to_host
    fn(host_pointer(dst), device_pointer(src), size, *varargs)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 288, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "C:\ProgramData\Anaconda3\envs\ncuda\lib\site-packages\numba\cuda\cudadrv\driver.py", line 323, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemcpyDtoH results in UNKNOWN_CUDA_ERROR
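For a sense of scale between the working and the failing case, here is a small plain-Python sketch that counts how many `ret[i, j] += x[i, j]` updates the three nested loops perform for one array. The helper name `total_iterations` is hypothetical and only used for illustration; it does not touch the GPU.

```python
def total_iterations(nx, ny, repeats=1000):
    """Number of `ret[i, j] += x[i, j]` updates for one nx-by-ny array."""
    return repeats * nx * ny


small = total_iterations(30, 40)    # the sizes that work
large = total_iterations(300, 400)  # the sizes that fail

print(small)           # 1200000
print(large)           # 120000000
print(large // small)  # 100
```

So the failing case does 100 times as much work inside the loops as the working one.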
The problem seems to be in copy_to_host(), but I have to admit I don't understand what is going wrong.
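One thing that might help narrow this down: GPU work is generally queued asynchronously, so an error raised while the kernel runs can surface at a later API call such as the copy back to the host. The sketch below waits for the GPU right after the call, so that a kernel failure would show up there rather than in the copy. `call_and_sync` is a hypothetical helper, not part of Numba, and whether this is what happens here is an assumption that would need checking.

```python
# Sketch: force pending GPU work to finish so errors surface at a known point.
try:
    from numba import cuda
    HAVE_CUDA = cuda.is_available()
except Exception:  # Numba missing or CUDA driver not usable
    cuda = None
    HAVE_CUDA = False


def call_and_sync(kernel, *args):
    """Call `kernel(*args)`, then wait for the GPU before returning.

    Hypothetical helper for debugging; not part of Numba.
    """
    result = kernel(*args)
    if HAVE_CUDA:
        # Blocks until all queued work on the current device has completed;
        # an error raised by the kernel should show up here.
        cuda.synchronize()
    return result
```

With the script above, this would be used as `call_and_sync(f_vec_loops, a_cuda, ret_cuda)` in place of the direct call. If the exception then moves from copy_to_host() to the synchronize step, the copy itself is probably not the culprit.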
I'm working on a Windows 10 PC with an Nvidia GeForce GTX 970.
Thank you in advance for your help.
Andrea