numpy within a cuda kernel


floria...@t-online.de

Jan 13, 2017, 9:57:36 AM
to Numba Public Discussion - Public
Hi,

I wondered whether I can use NumPy within a CUDA kernel?

import math
import numpy as np
from numba import cuda

@cuda.jit
def matmul(A, B, C):
    """Perform square matrix multiplication of C = A * B
    """
    i, j = cuda.grid(2)
    if i < C.shape[0] and j < C.shape[1]:
        tmp = 0.
        for k in range(A.shape[1]):
            tmp += A[i, k] * B[k, j]
        C[i, j] = tmp
        a = math.sin(1.23456)
        b = np.sin(1.23456)

I added the last two lines; it works fine with "a = math.sin(1.23456)", but I get an error for "b = np.sin(1.23456)":

numba.errors.UntypedAttributeError: Failed at nopython (nopython frontend)
Unknown attribute 'sin' of type Module(<module 'numpy' from 'C:\\Python\\WinPython-64bit-3.4.4.1\\python-3.4.4.amd64\\lib\\site-packages\\numpy\\__init__.py'>)
File "test numba.py", line 80
[1] During: typing of get attribute at C:\...\test numba.py (80)

I want to work with arrays within a kernel or within a device function. It should look like this:

@cuda.jit(device=True)
def cross(a, b):
    c = np.empty((3,))
    c[0] = a[1]*b[2] - a[2]*b[1]
    c[1] = a[2]*b[0] - a[0]*b[2]
    c[2] = a[0]*b[1] - a[1]*b[0]
    return c

This gives me an error for "c = np.empty((3,))".
As far as I understood, all the scalar variables (like tmp) will be kept in 32-bit registers.
Is it possible to put (NumPy) arrays into the registers, like this?

@cuda.jit
def test(A, B, C):
    a = np.ones((3, 3)) #array in the register



Many thanks!

Siu Kwan Lam

Jan 13, 2017, 12:01:41 PM
to Numba Public Discussion - Public
Numba's support for CUDA is rather low-level and maps closely to CUDA-C usage.  Because of this mapping, the high-level NumPy-style API would be very inefficient on CUDA hardware, so it is disabled.  There are special functions to allocate fixed-size arrays on the CUDA hardware without dynamic memory allocation; see http://numba.pydata.org/numba-doc/latest/cuda/memory.html#cuda-shared-memory and http://numba.pydata.org/numba-doc/latest/cuda/memory.html#local-memory.
 

--
Siu Kwan Lam
Software Engineer
Continuum Analytics

florian arbes

Jan 23, 2017, 9:38:10 AM
to Numba Public Discussion - Public
Thank you!

If I want to force the compiler to allocate the variables in registers (as long as there is no register spilling), do I have to do it like this?

@cuda.jit(device=True)
def cross(ax, ay, az, bx, by, bz):
    cx = ay*bz - az*by
    cy = az*bx - ax*bz
    cz = ax*by - ay*bx
    return cx, cy, cz

As NumPy is not supported, is there a way to use existing C++ functions as device functions?

Thanks!

Siu Kwan Lam

Jan 24, 2017, 4:17:52 PM
to Numba Public Discussion - Public


On Monday, January 23, 2017 at 8:38:10 AM UTC-6, florian arbes wrote:
Thank you!

If I want to force the compiler to allocate the variables in registers (as long as there is no register spilling), do I have to do it like this?

@cuda.jit(device=True)
def cross(ax, ay, az, bx, by, bz):
    cx = ay*bz - az*by
    cy = az*bx - ax*bz
    cz = ax*by - ay*bx
    return cx, cy, cz

This will certainly work.  Alternatively, you can operate on array slices if the result array can be passed in as an argument; basic array slicing works.
 

As NumPy is not supported, is there a way to use existing C++ functions as device functions?

Technically, yes, but the feature is not well developed.  See this post: https://groups.google.com/a/continuum.io/d/msg/numba-users/KMdYnTYx_GM/oLFvnHo0AgAJ