Weird Behaviour with CUDA and Shared Memory

Simeon Verzijl

Aug 21, 2016, 8:09:14 AM
to Numba Public Discussion - Public
Hi All,

I'm seeing some strange behaviour with a CUDA kernel that uses shared memory. This is the kernel:

import numpy as np
from numba import cuda, jit, float32

@cuda.jit
def kurt_gpu(weight, var_np, out):
    # Shared buffer for the per-block reduction
    sA = cuda.shared.array(shape=1024, dtype=float32)

    tid = cuda.threadIdx.x
    inc = tid // weight.shape[0]
    mul = 1024 // weight.shape[0]

    # Each thread handles one (w, x, y, z) index combination
    w = int(cuda.blockIdx.x * mul + inc)
    x = cuda.blockIdx.y
    y = cuda.blockIdx.z
    z = tid % weight.shape[0]

    cuda.syncthreads()

    if w < weight.shape[0]:
        m = float32(0)
        for n in range(var_np.shape[1]):
            m += weight[w]*weight[x]*weight[y]*weight[z]*var_np[w,n]*var_np[x,n]*var_np[y,n]*var_np[z,n]
        m /= var_np.shape[1]
        sA[tid] = m
    else:
        sA[tid] = 0

    cuda.syncthreads()

    # Tree reduction over sA within the block
    s = int(cuda.blockDim.x / 2)

    while s > 0:
        if tid < s:
            sA[tid] += sA[tid + s]
        cuda.syncthreads()
        s >>= 1

    if tid == 0:
        out[cuda.blockIdx.x, x, y] = sA[0]


This is the function that calls it:
@jit(nopython=False)
def wrap2():
    mul = int(1024 / b.shape[0])
    threadsperblock = mul * b.shape[0]
    bX = (int(b.shape[0] / mul) + 1, b.shape[0], b.shape[0])
    out = cuda.device_array(bX)
    kurt_gpu[bX, threadsperblock](b, c, out)
    out_host = out.copy_to_host()
    return out_host.sum()

And this is the data:
a = np.random.rand(60, 40)
b = np.random.rand(60)
c = a - np.expand_dims(a.mean(axis=-1), axis=-1)
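
For reference, with this data the launch arithmetic in wrap2 works out as follows (just the numbers implied by the code above, written out):

# Launch arithmetic from wrap2 for b.shape[0] == 60 (implied by the code
# above, shown here only for reference):
mul = int(1024 / 60)               # 17
threadsperblock = mul * 60         # 1020 threads per block
bX = (int(60 / mul) + 1, 60, 60)   # a (4, 60, 60) grid of blocks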


The first problem I have is this:
    if w < weight.shape[0]:
        m = float32(0)
        for n in range(var_np.shape[1]):
            m += weight[w]*weight[x]*weight[y]*weight[z]*var_np[w,n]*var_np[x,n]*var_np[y,n]*var_np[z,n]
        m /= var_np.shape[1]
        sA[tid] = m
    else:
        sA[tid] = 0

The program seems to ignore the else branch about half of the time it's supposed to take it: sA[tid] ends up with no value written to it, and what I see is garbage data from previous blocks.

The only way I could fix it is as follows, though I'm not sure why:
    if w < weight.shape[0]:
        m = float32(0)
        for n in range(var_np.shape[1]):
            m += weight[w]*weight[x]*weight[y]*weight[z]*var_np[w,n]*var_np[x,n]*var_np[y,n]*var_np[z,n]
        m /= var_np.shape[1]
        sA[tid] = m

    if w > weight.shape[0]:
        sA[tid] = 0
        
    if w == weight.shape[0]:
        sA[tid] = 0

I am now having problems with this:
    s = int(cuda.blockDim.x / 2)

    while s > 0:
        if tid < s:
            sA[tid] += sA[tid + s]
        cuda.syncthreads()
        s >>= 1

    if tid == 0:
        out[cuda.blockIdx.x, x, y] = sA[0]

All this is supposed to do is sum all the values in sA. All the values of 'out' are correct except for out[-1,0,0], which is wrong and appears to be garbage data again, since it changes every time I run the code. I'm very confused.

Any assistance would be much appreciated. I am using Numba 0.27 with Miniconda.

Siu Kwan Lam

Aug 22, 2016, 12:04:37 PM
to Numba Public Discussion - Public
I suspect that the threadsperblock calculation may not be what you intended. You will be using 1020 threads per block for your sample data size, but I think you are intending to have 1024 threads per block.
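
To see why a non-power-of-two block size can bite the reduction loop: whenever s becomes odd, halving it means the partial sum sitting at index s-1 is never added back in. Here is a rough host-side sketch of the same loop (plain Python, not CUDA, just to illustrate the index coverage):

import numpy as np

def reduce_like_kernel(values):
    # Mimics the while-loop in kurt_gpu on the host: each pass does
    # sA[tid] += sA[tid + s] for tid < s, then halves s.
    sA = values.copy()
    s = len(sA) // 2
    while s > 0:
        for tid in range(s):
            sA[tid] += sA[tid + s]
        s >>= 1
    return sA[0]

print(reduce_like_kernel(np.ones(1020)))  # falls well short of 1020
print(reduce_like_kernel(np.ones(1024)))  # power of two: exactly 1024.0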

I suggest using the CUDA simulator feature to debug. It is enabled by setting the environment variable NUMBA_ENABLE_CUDASIM=1. This simulates the CUDA kernel execution in the Python interpreter, so you will be able to print and debug it like regular Python code. I'd suggest reducing the data size and grid/block size first so that the kernel does not run too long in the simulator. See http://numba.pydata.org/numba-doc/latest/cuda/simulator.html for more info.
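
For instance, something along these lines (the tiny sizes here are just for illustration; the environment variable has to be set before numba.cuda is imported):

import os
os.environ['NUMBA_ENABLE_CUDASIM'] = '1'   # must happen before importing numba.cuda

import numpy as np
from numba import cuda, float32

# ... define kurt_gpu exactly as above ...

# Much smaller inputs so the simulated kernel finishes quickly:
b = np.random.rand(4)
a = np.random.rand(4, 5)
c = a - np.expand_dims(a.mean(axis=-1), axis=-1)

# Under the simulator, ordinary print() calls inside kurt_gpu work, so you
# can dump tid, w, sA[tid], etc. for each simulated thread.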

--
Siu Kwan Lam
Software Engineer
Continuum Analytics