Copying data from GPU is slow


Cooper Benson

Jan 11, 2017, 4:43:33 PM
to Caffe Users
I'm benchmarking a slightly modified version of the C++ classification example, and I'm getting some odd results. Converting and loading a 1920x1200 image onto the GPU and running the network forward take 0.06s and 0.02s respectively. However, copying the data off the GPU takes around 0.10s. For reference, the output layer copied is a 12x36x58 array, so about 100 KB of data.

According to the little bandwidth tester included in the CUDA samples, my 970 is consistently doing over 12 GB/s in Device to Host transfers. So why is Caffe only moving data at about 1 MB/s (100 KB in 0.10s)? It's a huge performance constraint, and I can't figure out where it's coming from. If anyone can offer some clarity on this issue I'd appreciate it.

Cooper Benson

Jan 11, 2017, 6:55:27 PM
to Caffe Users
I just noticed two typos. Preprocessing and loading takes 0.006s and running the net forward takes 0.002s. I also just tested with NVIDIA's fork and there was no improvement.

Daniel Moodie

Jan 12, 2017, 3:48:40 PM
to Caffe Users
How are you measuring these values?
It's possible that the GPU -> CPU copy is the first blocking call, which means you won't get accurate timings for the calls before it.