Caffe Windows: pooling_layer.cu:212 Check failed: error == cudaSuccess (8 vs. 0) invalid device function


luis.lop...@gmail.com

Apr 20, 2017, 4:22:37 AM
to Caffe Users
Dear all,
I get this error when I run the Caffe prebuilt binaries ("Visual Studio 2015, CUDA 8.0, Python 2.7") from the Windows branch. I have an NVIDIA GTX 1060 GPU updated to the latest available driver, with CUDA 8.0 and cuDNN 5.1 installed. I have read a lot about this issue on Linux but found nothing that solves it on Windows. Both the CUDA and Caffe device queries run OK. Here are the outputs:


caffe device_query -gpu 0

I0420 10:20:35.530796  1116 caffe.cpp:138] Querying GPUs 0
I0420 10:20:37.060575  1116 common.cpp:187] Device id:                     0
I0420 10:20:37.061079  1116 common.cpp:188] Major revision number:         6
I0420 10:20:37.061578  1116 common.cpp:189] Minor revision number:         1
I0420 10:20:37.061578  1116 common.cpp:190] Name:                          GeForce GTX 1060
I0420 10:20:37.061578  1116 common.cpp:191] Total global memory:           3221225472
I0420 10:20:37.061578  1116 common.cpp:192] Total shared memory per block: 49152
I0420 10:20:37.062592  1116 common.cpp:193] Total registers per block:     65536
I0420 10:20:37.062592  1116 common.cpp:194] Warp size:                     32
I0420 10:20:37.063583  1116 common.cpp:195] Maximum memory pitch:          2147483647
I0420 10:20:37.063583  1116 common.cpp:196] Maximum threads per block:     1024
I0420 10:20:37.064587  1116 common.cpp:197] Maximum dimension of block:    1024, 1024, 64
I0420 10:20:37.064587  1116 common.cpp:200] Maximum dimension of grid:     2147483647, 65535, 65535
I0420 10:20:37.065589  1116 common.cpp:203] Clock rate:                    1670500
I0420 10:20:37.065589  1116 common.cpp:204] Total constant memory:         65536
I0420 10:20:37.066090  1116 common.cpp:205] Texture alignment:             512
I0420 10:20:37.066090  1116 common.cpp:206] Concurrent copy and execution: Yes
I0420 10:20:37.066090  1116 common.cpp:208] Number of multiprocessors:     10
I0420 10:20:37.066591  1116 common.cpp:209] Kernel execution timeout:      Yes

NVIDIA deviceQuery:

Device 0: "GeForce GTX 1060"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 3072 MBytes (3221225472 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1671 MHz (1.67 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 1060
Result = PASS

Can someone help me with this issue? Thanks for your support.
BR

Paul Delamusica

Apr 20, 2017, 10:19:05 AM
to Caffe Users
I saw the same thing on Windows and fixed it by rebuilding Caffe. My setup is Visual Studio 2013, CUDA 7.5, Python 2.7.
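
A note on why rebuilding can help: in the CUDA 8.0 runtime, error 8 is cudaErrorInvalidDeviceFunction, which usually means the binary contains no kernel compiled for the GPU's compute capability. The device query above reports 6.1 for the GTX 1060. Whether the prebuilt binaries actually lack that architecture is an assumption, not something confirmed in this thread. As a rough sketch, the Python below reads the compute capability from `caffe device_query` output and prints the matching nvcc `-gencode` flag to check against a build configuration. The helper names `compute_capability` and `gencode_flag` are made up for illustration and are not part of Caffe.

```python
import re


def compute_capability(log_text):
    """Extract (major, minor) from `caffe device_query` log output."""
    major = re.search(r"Major revision number:\s*(\d+)", log_text)
    minor = re.search(r"Minor revision number:\s*(\d+)", log_text)
    if not (major and minor):
        raise ValueError("compute capability not found in log")
    return int(major.group(1)), int(minor.group(1))


def gencode_flag(major, minor):
    """Build the nvcc -gencode flag for one compute capability."""
    arch = f"{major}{minor}"
    return f"-gencode arch=compute_{arch},code=sm_{arch}"


# Two lines copied from the device_query output in the first post.
log = """\
I0420 10:20:37.061079  1116 common.cpp:188] Major revision number:         6
I0420 10:20:37.061578  1116 common.cpp:189] Minor revision number:         1
"""

major, minor = compute_capability(log)
flag = gencode_flag(major, minor)
print(flag)  # -gencode arch=compute_61,code=sm_61
```

If the architecture list used for the build does not cover this value, rebuilding with it included is the usual thing to try.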