Why is my __CUDA_ARCH__ less than 200 on my sm_35 architecture GPU?


Henggang Cui

Oct 8, 2015, 3:02:18 PM
to Caffe Users
Hi,

My GPU is a Tesla K20C, and when I call cudaGetDeviceProperties(), "major" is 3 and "minor" is 5. So it is the sm_35 architecture (am I correct?).

But "device_alternate.hpp" contains:

// CUDA: thread number configuration.
// Use 1024 threads per block, which requires cuda sm_2x or above,
// or fall back to attempt compatibility (best of luck to you).
#if __CUDA_ARCH__ >= 200
    const int CAFFE_CUDA_NUM_THREADS = 1024;
#else
    const int CAFFE_CUDA_NUM_THREADS = 512;
#endif

My "__CUDA_ARCH__" is less than 200, so my "CAFFE_CUDA_NUM_THREADS" becomes 512. What is wrong here? Why does it think my architecture is below sm_2x?

Thanks,
Cui

Felix Abecassis

Oct 10, 2015, 6:54:41 PM
to Caffe Users
Hello,

It's a known bug: this macro should not be used in host code.
https://github.com/BVLC/caffe/issues/418
But AFAIK, this does not impact performance.
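
For readers following along, here is a minimal sketch of why the check falls through to 512. It assumes the header is being compiled in a host pass (or by a plain host compiler), where `__CUDA_ARCH__` is not defined. The preprocessor evaluates an undefined identifier inside `#if` as 0, so `0 >= 200` is false and the `#else` branch is taken. The names `kNumThreads`, `cuda_arch_is_defined`, and `report_thread_config` are hypothetical and are not part of Caffe.

```cpp
#include <cstdio>

// Same shape as the check in the question. __CUDA_ARCH__ is only defined
// while nvcc compiles device code; in a host pass it is undefined, and the
// preprocessor evaluates an undefined identifier inside #if as 0.
#if __CUDA_ARCH__ >= 200
const int kNumThreads = 1024;  // hypothetical name, not Caffe's constant
#else
const int kNumThreads = 512;
#endif

// Hypothetical helper: reports whether the macro is visible in this pass.
bool cuda_arch_is_defined() {
#ifdef __CUDA_ARCH__
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: prints what this compilation pass selected.
void report_thread_config() {
  std::printf("__CUDA_ARCH__ defined: %s, threads per block: %d\n",
              cuda_arch_is_defined() ? "yes" : "no", kNumThreads);
}
```

Built with a plain host compiler, calling `report_thread_config()` reports the macro as undefined and 512 threads per block, regardless of which GPU is installed.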

Henggang Cui

Oct 10, 2015, 7:08:55 PM
to Caffe Users
I see. So will I get better performance if I use 1024 threads per block?

Thanks,
Cui

Felix Abecassis

Oct 10, 2015, 7:14:42 PM
to Caffe Users
I don't think so. But feel free to try :)

Jonathan L Long

Nov 5, 2015, 12:09:39 AM
to Felix Abecassis, Caffe Users
