CUDA_ERROR_OUT_OF_MEMORY


Chris Uchytil

Jun 18, 2017, 9:01:28 PM
to Numba Public Discussion - Public
I am trying to transfer a 700x700x700x3 float32-filled matrix to the device. It is 1029000000 in size. Using nbcuda.current_context().get_memory_info() indicates that I have 7328563200 bytes of memory available before I attempt the transfer (I have a GTX 1080). When I try to transfer the array I get the error below indicating I am out of memory. I am on CUDA toolkit 8.0, numba 0.33.0, and NVIDIA driver 375.66.

File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/devices.py", line 212, in _require_cuda_context
    return fn(*args, **kws)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/api.py", line 58, in to_device
    to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 459, in auto_device
    devobj = from_array_like(obj, stream=stream)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 422, in from_array_like
    writeback=ary, stream=stream, gpu_data=gpu_data)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/devicearray.py", line 96, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 674, in memalloc
    self._attempt_allocation(allocator)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 664, in _attempt_allocation
    allocator()
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 672, in allocator
    driver.cuMemAlloc(byref(ptr), bytesize)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 288, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/home/uchytilc/anaconda2/lib/python2.7/site-packages/numba/cuda/cudadrv/driver.py", line 323, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

Stanley Seibert

Jun 21, 2017, 2:50:31 PM
to Numba Public Discussion - Public
Wouldn't a 700x700x700x3 float32 array be 4116000000 bytes?  That should still fit, but it is now greater than 50% of the free space, which might help us understand if there is some temporary double allocation causing problems.
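The arithmetic behind that correction can be checked without allocating anything, using NumPy's dtype itemsize (the shape and figures below are the ones quoted in this thread):

```python
import numpy as np

# Size of a 700x700x700x3 float32 array, computed without allocating it.
shape = (700, 700, 700, 3)
n_elements = int(np.prod(shape))
nbytes = n_elements * np.dtype(np.float32).itemsize  # 4 bytes per float32

print(n_elements)  # 1029000000 -- the element count from the first message
print(nbytes)      # 4116000000 bytes, i.e. about 3.83 GiB
```

So the original 1029000000 figure is the element count, and the actual transfer is four times that in bytes.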

There are old rumors of 2 GB per allocation limits on Windows with devices also running the display, but that doesn't explain what you are seeing.  How much smaller do you have to go for it to work?



Chris Uchytil

Jun 21, 2017, 3:36:05 PM
to numba...@continuum.io
You're correct, I forgot to multiply by the 4 bytes per value. I am away for the week, but I will respond with what I find when I get back.

Chris Uchytil

Jul 1, 2017, 9:26:33 PM
to Numba Public Discussion - Public
I looked into the issue more. The CUDA_ERROR_OUT_OF_MEMORY error appears once the array totals about 3 GB; I got to 630x630x630x3 before it stopped working. I also wrote some test code that transfers an array to the GPU, and that wasn't limited by the 3 GB cap: it got to 7 GB (the total free memory on the GPU at the time) before raising an error. This is on Ubuntu 16.04, by the way.
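For reference, the byte sizes at the reported boundary can be computed the same way as before (assuming 630x630x630x3 is roughly the largest working shape, as described above):

```python
import numpy as np

# Byte sizes at the reported working/failing shapes, computed without allocating.
itemsize = np.dtype(np.float32).itemsize  # 4 bytes

largest_working = int(np.prod((630, 630, 630, 3))) * itemsize
failing = int(np.prod((700, 700, 700, 3))) * itemsize

print(largest_working)  # 3000564000 bytes -- just over 3 GB (decimal)
print(failing)          # 4116000000 bytes
```

That puts the observed ceiling right at the "about 3 GB" mark mentioned here, which is well below the 7 GB the standalone test code reached.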

stuart

Jul 6, 2017, 1:29:02 PM
to Numba Public Discussion - Public
Hi,

A few questions:

1. Is this a dedicated card for GPU compute?
2. If you run the code under nvprof, does the call count of cuMemAlloc and cuMemcpyHtoD tally with what is expected for a copy to the device?
3. Do you have a reproducer?

Thanks,

-- 
stuart

Chris Uchytil

Jul 7, 2017, 5:48:56 PM
to Numba Public Discussion - Public
I took the code that was causing the issue and built a tester around it. The tester code isn't limited to 630x630x630x3 like the main code is. The main code runs in interop with OpenGL inside PySide, so I don't think I can run nvprof on it live. The 1080 running the code also drives my monitor. It seems like it might be an interop issue, as I can't reproduce it in the tester code using just the function that causes the issue. I'll limit the size of the objects I am dealing with for the time being.
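One way to work under a per-allocation ceiling like this is to split the array along its first axis and transfer it in slices. The sketch below is a hypothetical helper, not anything from the thread: `transfer_fn` stands in for whatever does the copy (in real use you would pass numba.cuda.to_device), and the demo uses an identity function plus a tiny array so it runs anywhere, GPU or not.

```python
import numpy as np

def transfer_in_chunks(ary, max_bytes, transfer_fn):
    """Split `ary` along axis 0 so each slice stays under `max_bytes`,
    then pass each slice to `transfer_fn` (e.g. numba.cuda.to_device)."""
    row_bytes = ary[0].nbytes                      # bytes per axis-0 slice
    rows_per_chunk = max(1, max_bytes // row_bytes)
    return [transfer_fn(ary[i:i + rows_per_chunk])
            for i in range(0, ary.shape[0], rows_per_chunk)]

# Demo with a small array and an identity "transfer" so it runs without a GPU.
a = np.zeros((10, 4), dtype=np.float32)            # 16 bytes per row
chunks = transfer_in_chunks(a, 64, lambda x: x)    # at most 4 rows per chunk
print(len(chunks))  # 3 chunks: 4 + 4 + 2 rows
```

The per-chunk device arrays would then be processed individually (or copied into one pre-allocated device buffer, if a single large cuMemAlloc does succeed on the machine in question).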