Multiprocessing - to_device and to_host launched by different processes


Brunno Goldstein

Aug 7, 2016, 3:53:24 PM
to Numba Public Discussion - Public

Hi!


I'm working with Numba and multiprocessing, where one process starts the device computation and sends the dA pointer (the result of a cuda.to_device call) to another process. The second process will then call copy_to_host() and work with the resulting data.


The problem I am facing is that gpu_data holds some ctypes pointers, and they cannot be pickled by the queue that I am using. Do you have any idea how I can handle that?
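
Roughly, the pattern I'm attempting looks like this (a minimal sketch; the worker/main names are just illustrative):

import numpy as np
from multiprocessing import Process, Queue
from numba import cuda

def worker(q):
    dA = q.get()                     # intended: receive the device array
    print(dA.copy_to_host())         # and copy the result back to the host

def main():
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    arr = np.arange(10, dtype=np.float32)
    dA = cuda.to_device(arr)         # DeviceNDArray wrapping ctypes device pointers
    q.put(dA)                        # fails: the Queue pickles its items, and the
                                     # ctypes pointers inside the DeviceNDArray
                                     # cannot be pickled
    p.join()

if __name__ == '__main__':
    main()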


Thanks!


Best Regards,


Brunno Goldstein

Stanley Seibert

Aug 8, 2016, 10:15:25 AM
to Numba Public Discussion - Public
You're bumping into a bigger issue, which is that CUDA device allocations are not portable between processes.  (In fact, pointers generally are not portable between processes, which is why ctypes pointers can't be serialized.)

There is a mechanism in CUDA (but not exposed in Numba) for interprocess communication of device allocations (cudaIpc*):

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g8a37f7dfafaca652391d0758b3667539

We've been talking with the Dask developers about how to handle this more broadly, but don't have an ETA at the moment.  Siu might be able to propose a workaround in the meantime.  (I'll ping him to take a look at this thread.)


Brunno Goldstein

Aug 8, 2016, 5:17:48 PM
to Numba Public Discussion - Public
Hi Stanley,
thank you for your quick response.

I'll take a look at the cudaIpc API and wait to see if Siu has a workaround.

Best Regards,
Brunno

Siu Kwan Lam

Aug 9, 2016, 3:02:27 PM
to Numba Public Discussion - Public

--
Siu Kwan Lam
Software Engineer
Continuum Analytics
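
A minimal sketch of the IPC-handle flow, assuming Numba's DeviceNDArray.get_ipc_handle() API; the parent/worker structure and names here are illustrative rather than a reproduction of the original cuda_ipc.py:

import numpy as np
from multiprocessing import Process, Queue
from numba import cuda

def worker(q):
    ipch = q.get()                   # the IPC handle itself is picklable
    with ipch as darr:               # opens the parent's allocation in this process
        print(darr.copy_to_host())

def parent():
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()                        # fork before any CUDA call in the parent
    arr = np.arange(10, dtype=np.float32)
    darr = cuda.to_device(arr)       # device allocation owned by the parent
    ipch = darr.get_ipc_handle()     # export it as a CUDA IPC handle
    q.put(ipch)
    p.join()                         # keep darr alive while the child uses it

if __name__ == '__main__':
    parent()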

Brunno Goldstein

Aug 9, 2016, 11:10:42 PM
to Numba Public Discussion - Public
Siu,

thanks for the code!

Unfortunately, I got the following error while running the cuda_ipc.py example:

Traceback (most recent call last):
  File "cuda_ipc.py", line 60, in <module>
    main()
  File "cuda_ipc.py", line 56, in main
    parent()
  File "cuda_ipc.py", line 12, in parent
    darr = cuda.to_device(arr)
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 257, in _require_cuda_context
    get_context()
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 240, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 202, in get_or_create_context
    return self.push_context(self.gpus[devnum])
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 40, in __getitem__
    return self.lst[devnum]
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 26, in __getattr__
    numdev = driver.get_device_count()
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/driver.py", line 292, in get_device_count
    self.cuDeviceGetCount(byref(count))
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/driver.py", line 234, in __getattr__
    self.initialize()
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/driver.py", line 199, in initialize
    self._initialize_extras()
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/driver.py", line 213, in _initialize_extras
    call_cuIpcOpenMemHandle)
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+81.g0a9d560-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/driver.py", line 250, in _wrap_api_call
    @functools.wraps(libfn)
  File "/home/goldstein/anaconda2/lib/python2.7/functools.py", line 33, in update_wrapper
    setattr(wrapper, attr, getattr(wrapped, attr))
AttributeError: 'CFunctionType' object has no attribute '__name__'

I don't know if I missed something when compiling the source code or if it's something else.

Best Regards,
Brunno
 

Siu Kwan Lam

Aug 10, 2016, 9:55:58 AM
to Numba Public Discussion - Public
Oh no, that's a Python 2.7-specific bug.
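
A minimal illustration of the failure, assuming a bare ctypes CFunctionType like the one the driver wraps: Python 2.7's functools.update_wrapper copies __name__ unconditionally, while Python 3.2+ skips attributes the wrapped object does not have.

import ctypes
import functools

PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)
cfunc = PROTO(lambda x: x + 1)       # a CFunctionType instance; it has no __name__

@functools.wraps(cfunc)              # AttributeError on Python 2.7, works on Python 3.2+
def checked_call(*args):
    return cfunc(*args)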


Siu Kwan Lam

Aug 10, 2016, 11:28:23 AM
to Numba Public Discussion - Public
Should be fixed now.

Brunno Goldstein

Aug 14, 2016, 12:04:03 AM
to Numba Public Discussion - Public
Siu,
sorry for the delay and thanks for the commit!


Since I'm using Python 2.7, I've made some changes to your cuda_ipc example code. Now the problem is that I'm getting a CudaDriverError exception when the child process tries to work with the darr pointer.

Here is the traceback:


Traceback (most recent call last):
  File "/home/goldstein/anaconda2/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/goldstein/anaconda2/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "tst.py", line 14, in worker
    with ipch as darr:
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+82.g0d6fe0c-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devicearray.py", line 402, in __enter__
    return self.open()
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+82.g0d6fe0c-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devicearray.py", line 392, in open
    dptr = self._ipc_handle.open(devices.get_context())
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+82.g0d6fe0c-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 240, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+82.g0d6fe0c-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 199, in get_or_create_context
    return self.current_context
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+82.g0d6fe0c-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/devices.py", line 125, in current_context
    assert driver.get_context().value == top.handle.value, (
  File "/home/goldstein/anaconda2/lib/python2.7/site-packages/numba-0.28.0.dev0+82.g0d6fe0c-py2.7-linux-x86_64.egg/numba/cuda/cudadrv/driver.py", line 313, in get_context
    raise CudaDriverError("CUDA initialized before forking")
CudaDriverError: CUDA initialized before forking

Best Regards,

Brunno

Siu Kwan Lam

Aug 15, 2016, 11:08:48 AM
to Numba Public Discussion - Public
CUDA does not support forking after it has been initialized. You will need to delay CUDA initialization (by avoiding any CUDA features) until after you have forked the process. Alternatively, spawn a new process instead of using fork().
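
A minimal sketch of the spawn approach, assuming Python 3.4+'s multiprocessing.get_context('spawn'); on Python 2.7 the equivalent is to start the child processes before touching any CUDA API. Names here are illustrative:

import multiprocessing as mp
import numpy as np
from numba import cuda

def worker(q):
    ipch = q.get()
    with ipch as darr:               # open the parent's allocation via CUDA IPC
        print(darr.copy_to_host())

def main():
    ctx = mp.get_context('spawn')    # child starts a fresh interpreter, no inherited CUDA state
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    darr = cuda.to_device(np.arange(10, dtype=np.float32))
    q.put(darr.get_ipc_handle())     # the handle is picklable; the device array is not
    p.join()

if __name__ == '__main__':
    main()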
