Hi,
My GPU is a Tesla K20C, and when I call cudaGetDeviceProperties(), "major" is 3 and "minor" is 5. So it should be the sm_35 architecture (am I correct?).
But in "device_alternate.hpp" I see:
// CUDA: thread number configuration.
// Use 1024 threads per block, which requires cuda sm_2x or above,
// or fall back to attempt compatibility (best of luck to you).
#if __CUDA_ARCH__ >= 200
const int CAFFE_CUDA_NUM_THREADS = 1024;
#else
const int CAFFE_CUDA_NUM_THREADS = 512;
#endif
My "__CUDA_ARCH__" is less than 200, so my "CAFFE_CUDA_NUM_THREADS" becomes 512. What is wrong here? Why does it think my architecture is below sm_2x?
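One possible explanation, which I have not confirmed for this build: nvcc defines "__CUDA_ARCH__" only while it compiles device code, and the preprocessor treats an undefined identifier inside "#if" as 0. If that is what happens, any host-side code that includes the header would take the 512 branch regardless of the GPU. Below is a minimal sketch of that behavior in plain C++. The names kNumThreads, cuda_arch_is_defined and host_threads_per_block are made up for illustration and are not part of Caffe.

```cpp
// Sketch only: mirrors the pattern from the header under hypothetical names.
// Assumption: this file is built as host code (e.g. with g++), so
// __CUDA_ARCH__ is not defined here.

// An undefined identifier in #if evaluates to 0, so 0 >= 200 is false
// and the #else branch is selected.
#if __CUDA_ARCH__ >= 200
const int kNumThreads = 1024;
#else
const int kNumThreads = 512;
#endif

// Reports whether the macro is visible in this compilation pass.
bool cuda_arch_is_defined() {
#ifdef __CUDA_ARCH__
  return true;
#else
  return false;
#endif
}

// Value of the constant as seen by host code.
int host_threads_per_block() {
  return kNumThreads;
}
```

Built with an ordinary host compiler, cuda_arch_is_defined() returns false and host_threads_per_block() returns 512, which would match what I am seeing. It does not by itself show which value the device-side pass picks.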
Thanks,
Cui