numba/cuda driver error on GeForce GTX 1080 Ti

Rémi Lehe

Jul 18, 2017, 4:33:57 PM

to Numba Public Discussion - Public

Hi,

I am getting a strange error with numba/cuda on a GeForce GTX 1080 Ti GPU (driver version 381.22).

Most of the time, my code compiles and runs fine. But sometimes (roughly 1 run out of 6, apparently at random) it fails with the following error:

Traceback (most recent call last):
  File "/global/scratch/rlehe/miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 1278, in add_ptx
    ptxbuf, len(ptx), namebuf, 0, None, None)
  File "/global/scratch/rlehe/miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 259, in safe_cuda_api_call
    self._check_error(fname, retcode)
  File "/global/scratch/rlehe/miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 296, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [999] Call to cuLinkAddData results in CUDA_ERROR_UNKNOWN

(The error is then caught by some of the numba code and finally re-raised as a numba.cuda.cudadrv.driver.LinkerError.)
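Since the failure is intermittent, one stopgap while the cause is unknown is to retry the call that triggers compilation. The following is only a minimal sketch: call_with_retries and its parameters are hypothetical names and not part of numba, and whether a second attempt actually succeeds after this driver error has not been verified.

```python
import time


def call_with_retries(func, retry_on, attempts=3, delay_s=0.5):
    """Call func() and retry it when one of the given exceptions is raised.

    func      -- zero-argument callable, e.g. a lambda that launches a kernel
    retry_on  -- tuple of exception types considered transient
    attempts  -- total number of tries, including the first one
    delay_s   -- pause between tries, in seconds
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            # Give up and propagate the error once the last try has failed.
            if attempt == attempts:
                raise
            time.sleep(delay_s)
```

In this setting, retry_on would be (numba.cuda.cudadrv.driver.LinkerError,) and func would wrap the first kernel launch, which is where the PTX gets linked. This hides the symptom rather than addressing the cause.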

Note that I also have access to K20 and K80 GPUs, and that the above error never occurred on these types of GPU. Any clue as to what is happening here?

Thanks for your help!

PS: For what it's worth, the above error occurs both with cudatoolkit 8.0-0 and cudatoolkit 7.5-1 (from the numba channel).

Stanley Seibert

Jul 18, 2017, 5:12:23 PM

to Numba Public Discussion - Public

This seems pretty strange. We've been testing Numba with a GTX 1080 (not Ti) and haven't run into something like this.

Just to check: Are you using multiple CPU threads?

Rémi Lehe

Jul 18, 2017, 6:36:56 PM

to Numba Public Discussion - Public

Thanks for the quick answer.

I am using only one CPU thread. Sometimes I use several MPI processes (one per GPU on a 4-GPU node; each MPI process selects a different GPU with cuda.select_device), but the error seems to happen whether or not MPI is used.

By the way, what is the driver version on the GTX 1080 that you have been using? Could the problem come from the particular driver version that I am using?
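For reference, the one-GPU-per-MPI-process setup can be sketched roughly as follows. This is not the actual code from my application: the function names are made up for illustration, and it assumes that ranks are numbered consecutively within a node and that mpi4py is the MPI binding in use.

```python
def gpu_index_for_rank(rank, gpus_per_node):
    """Return the local GPU index for an MPI rank (round-robin within a node)."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    return rank % gpus_per_node


def select_gpu_for_this_process(gpus_per_node=4):
    """Bind the calling MPI process to one GPU.

    Requires mpi4py, numba and a CUDA-capable device; the imports are done
    lazily so that gpu_index_for_rank can be used on a machine without a GPU.
    """
    from mpi4py import MPI
    from numba import cuda

    rank = MPI.COMM_WORLD.Get_rank()
    device_id = gpu_index_for_rank(rank, gpus_per_node)
    # Make this GPU the current device for the calling process.
    cuda.select_device(device_id)
    return device_id
```

With 4 GPUs per node, ranks 0 to 3 map to devices 0 to 3, rank 4 maps back to device 0 on the next node, and so on.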

Thanks for your help!

Remi

Rémi Lehe

Jul 23, 2017, 4:07:17 PM

to Numba Public Discussion - Public

It seems that this is indeed due to the driver version. I have opened a new post that reflects this and gives more details.
