Captain?? Cuda Error "(8 vs. 0) invalid device function" when using GPU (GeForce GTX 970)?!

Ben Jones

Oct 21, 2015, 11:10:12 AM
to Caffe Users

I activated GPU mode using

caffe.set_device( 0 )

but when I ran the scripts, I got the following error:

F1020 14:54:03.374338 10546] Check failed: error == cudaSuccess (8 vs. 0)  invalid device function 
*** Check failure stack trace: ***

The line that failed is simply "CUDA_POST_KERNEL_CHECK", so the error seems to occur in the lines above it.

Does anyone have an idea what the problem is?

CPU mode works fine. I use a python script to get the stuff running.

CUDA DeviceQuery gives:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 970"
  CUDA Driver Version / Runtime Version          7.0 / 6.5
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 4095 MBytes (4294246400 bytes)
  (13) Multiprocessors, (128) CUDA Cores/MP:     1664 CUDA Cores
  GPU Max Clock rate:                            1253 MHz (1.25 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 1835008 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
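
For reference, the two values in this output that matter most for an "invalid device function" error are the runtime version (6.5) and the compute capability (5.2): error 8 generally means the binary contains no kernel compiled for the device's architecture. Below is a minimal Python sketch, assuming that is the cause here (which is not confirmed in this thread). It pulls those values out of the deviceQuery text and builds the matching nvcc -gencode flag. The helper names parse_device_query and gencode_flag are hypothetical, made up for illustration only.

```python
import re


def parse_device_query(text):
    """Extract driver/runtime versions and compute capability from deviceQuery text."""
    versions = re.search(
        r"CUDA Driver Version / Runtime Version\s+(\d+\.\d+)\s*/\s*(\d+\.\d+)", text)
    capability = re.search(
        r"CUDA Capability Major/Minor version number:\s+(\d+)\.(\d+)", text)
    if versions is None or capability is None:
        raise ValueError("could not find the version or capability lines")
    return {
        "driver": versions.group(1),
        "runtime": versions.group(2),
        # "5.2" becomes "52", the form nvcc expects in compute_XX / sm_XX
        "sm": capability.group(1) + capability.group(2),
    }


def gencode_flag(sm):
    """Build the nvcc flag that compiles kernels for one architecture (hypothetical helper)."""
    return "-gencode arch=compute_{0},code=sm_{0}".format(sm)


# Excerpt of the deviceQuery output posted above.
sample = """\
Device 0: "GeForce GTX 970"
  CUDA Driver Version / Runtime Version          7.0 / 6.5
  CUDA Capability Major/Minor version number:    5.2
"""

info = parse_device_query(sample)
print(info["driver"], info["runtime"], info["sm"])
print(gencode_flag(info["sm"]))
```

If that flag is missing from the CUDA_ARCH list in Caffe's Makefile.config, adding it and rebuilding would be the first thing to try. Whether the 6.5 toolkit installed here accepts compute_52 is something I have not verified.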

I installed cuBLAS, OpenCV, etc. via Ubuntu's Synaptic Package Manager, so that should be fine.
If you need more information, just ask!

Thanks for your help!

Ben Jones

Oct 26, 2015, 6:04:12 AM
to Caffe Users
I posted the same problem on Stack Overflow, where I am updating the steps that (hopefully) solve it:
