Numba-cuda randomly fails to compile with the Nvidia driver version 381.22


Rémi Lehe

Jul 23, 2017, 4:04:53 PM
to Numba Public Discussion - Public
Note: this is an update of a previous post ("numba/cuda driver error on GeForce GTX 1080 Ti"), upon further investigation of the issue.

When using the Nvidia driver version 381.22, I occasionally get the following error:
"""
  File "/global/scratch/rlehe/miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py", line 296, in _check_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [999] Call to cuLinkAddData results in CUDA_ERROR_UNKNOWN
"""
I am now fairly confident that this is due to the driver version, since I happen to have access to a K80 GPU with driver version 381.22 and to another K80 GPU with driver version 375.26. The above error never occurs on the GPU with version 375.26, but it does occur on the GPU with version 381.22. I also have access to a GeForce GTX 1080 Ti with driver version 381.22, and the same bug happens there (hence the original title of the post, which erroneously attributed this bug to the GPU model).

Unfortunately, the error is difficult to reproduce because it occurs intermittently. It typically happens when compiling a function, but (for our code, which contains roughly 20 to 40 cuda-jitted functions) it does not always crash at the same function, and quite often it does not crash at all. Still, I thought I would report it here in case someone has seen the same error, or knows what causes it and how to fix it.

Thanks in advance for your help,

Abhinav Singh

Nov 6, 2017, 8:41:10 AM
to Numba Public Discussion - Public

I am facing the same issue. I cannot get my code to compile even once. The same code worked fine a few weeks ago.

Stanley Seibert

Nov 6, 2017, 10:19:14 AM
to Numba Public Discussion - Public
What CUDA driver version do you have?  Was it updated recently?
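
For anyone unsure how to answer this, here is a minimal sketch of one way to look up the installed driver version. It assumes a Linux system where the NVIDIA kernel module exposes `/proc/driver/nvidia/version`, and otherwise falls back to `nvidia-smi`. The helper names `parse_driver_version` and `find_driver_version` are made up for this example and are not part of Numba.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# Matches driver versions such as "381.22" or "384.90" (optionally "x.y.z").
_VERSION_RE = re.compile(r"\b(\d{3}\.\d{2,3}(?:\.\d+)?)\b")


def parse_driver_version(text: str) -> Optional[str]:
    """Return the first driver-version-like token in `text`, or None."""
    match = _VERSION_RE.search(text)
    return match.group(1) if match else None


def find_driver_version() -> Optional[str]:
    """Try the Linux proc file first, then fall back to nvidia-smi."""
    proc_file = Path("/proc/driver/nvidia/version")
    if proc_file.is_file():
        version = parse_driver_version(proc_file.read_text())
        if version:
            return version
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi is missing, failed, or timed out.
        return None
    return parse_driver_version(result.stdout)


if __name__ == "__main__":
    print(find_driver_version() or "NVIDIA driver version not found")
```

The regular expression is a heuristic tuned to three-digit major versions like the ones quoted in this thread, so it may need adjusting for other version formats.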

--
You received this message because you are subscribed to the Google Groups "Numba Public Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numba-users+unsubscribe@continuum.io.
To post to this group, send email to numba...@continuum.io.
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/dfe87a8e-85c0-4f4d-b14e-8ba65d428948%40continuum.io.

lom...@southkentschool.org

Nov 6, 2017, 3:15:12 PM
to Numba Public Discussion - Public
I have the same issue. I tried both 382.33 and 388.13, and both give the [999] error.



Stanley Seibert

Nov 6, 2017, 3:27:17 PM
to Numba Public Discussion - Public
OK, I just checked, and our Linux development systems have NVIDIA driver versions 375.66 and 367.57. We'll try upgrading one of our machines to drivers released both before and after CUDA 9 to see if we can reproduce it.



Stanley Seibert

Nov 6, 2017, 4:05:04 PM
to Numba Public Discussion - Public
I've updated our Linux test system to NVIDIA driver 384.90 (most recent packaged for Ubuntu 16.04) and verified the Numba CUDA unit tests pass with both cudatoolkit 7.5 and 8.0.

Can anyone who has seen the problem reported in this thread check if the unit tests pass for you?  You can run the CUDA tests with the following command if you have numba installed:

python -m numba.runtests numba.cuda.tests

I'm also checking our Windows system to see which driver versions it is running, but upgrading that system is not something I can do immediately.
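
If you want to attach the test output to a reply, the command above can be wrapped so that everything it prints ends up in a file. This is only a sketch: the helper names and the log file name `numba_test.txt` are arbitrary choices for illustration, and the only Numba-specific part is the `python -m numba.runtests numba.cuda.tests` command quoted above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def numba_cuda_test_command() -> List[str]:
    """The command quoted in this thread, using the current interpreter."""
    return [sys.executable, "-m", "numba.runtests", "numba.cuda.tests"]


def run_and_log(command: List[str], log_path: Path) -> int:
    """Run `command`, write stdout and stderr to `log_path`, return the exit code."""
    with log_path.open("w") as log:
        completed = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


# Example usage (the test run can take a few minutes):
#   code = run_and_log(numba_cuda_test_command(), Path("numba_test.txt"))
#   print("exit code:", code)
```

A non-zero exit code together with the log file should be enough to show which tests failed.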

Stanley Seibert

Nov 6, 2017, 5:03:50 PM
to Numba Public Discussion - Public
After updating our Windows 7 test system to NVIDIA driver 385.08, it also seems to pass all the Numba CUDA tests. (Note that we can only run the tests on a Tesla K40 card on Windows, which uses a different driver subsystem than GeForce cards on Windows.)

I'm curious if anyone can trigger this failure with the Numba CUDA unit tests.

Abhinav Singh

Nov 7, 2017, 5:59:11 AM
to Numba Public Discussion - Public
Hi,
The test failed for me. I am attaching the results of the test.

numba_test.txt

Rémi Lehe

Nov 7, 2017, 10:59:04 AM
to Numba Public Discussion - Public
In my case, the driver was recently upgraded from version 381.22 to 384.59.
With this new version (i.e. 384.59), the tests do pass. (I get: "Ran 372 tests in 94.104s OK (skipped=2)")

Abhinav Singh

Nov 7, 2017, 12:05:41 PM
to Numba Public Discussion - Public
I have tried everything: all available drivers (378, 381, 384, 387) with CUDA 7.5, 8, and 9.
I have also tried reinstalling CUDA and Anaconda.
I thought Nvidia was no longer supporting my GPU (GTX 870M, Kepler).
I then installed Anaconda and Numba on Windows, and everything works as expected there.

Rémi Lehe

Oct 30, 2018, 7:58:45 PM
to Numba Public Discussion - Public
I recently encountered the same problem again ("Call to cuLinkAddData results in CUDA_ERROR_UNKNOWN").
Thanks to the help of the staff of the GPU cluster on which I was running, I was able to fix this by simply removing the ".nv" directory in my $HOME folder.
I wonder if this solution also works for other people.
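
For anyone who wants to try this without deleting anything, here is a small sketch that renames the directory instead of removing it, so it can be restored afterwards. Renaming rather than deleting is my own variation on the fix described above. As far as I know, on Linux the CUDA driver keeps its JIT compilation cache under `~/.nv/ComputeCache` by default, which would explain why clearing it helps, but I have not verified that this is the cause here. The function name `set_aside_nv_dir` is made up for this example.

```python
import time
from pathlib import Path
from typing import Optional


def set_aside_nv_dir(home: Optional[Path] = None) -> Optional[Path]:
    """Rename <home>/.nv to a timestamped backup.

    Returns the backup path, or None if there was no .nv directory.
    """
    home = home if home is not None else Path.home()
    nv_dir = home / ".nv"
    if not nv_dir.is_dir():
        return None
    # Hypothetical backup naming scheme, chosen only for this sketch.
    backup = home / f".nv.bak-{time.strftime('%Y%m%d-%H%M%S')}"
    nv_dir.rename(backup)
    return backup


if __name__ == "__main__":
    moved_to = set_aside_nv_dir()
    print(f"Moved ~/.nv to {moved_to}" if moved_to else "No ~/.nv directory found")
```

If the error goes away, the backup directory can be deleted. If not, it can be renamed back to `.nv`.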

Stanley Seibert

Oct 31, 2018, 11:44:09 AM
to numba...@continuum.io
Huh, this is not something I was aware of. I've opened a GitHub issue to put this in the FAQ:

