some performance questions about lab 2b gpu part

Junjie Wu

Aug 6, 2010, 3:12:55 PM
to VSCSE Many-core Processors 2010
Hello everyone,

I just got some results from my lab 2b 3.1:

Number of extra atoms: 424
Length of neighborhood bins list: 343
IO: 0.181133
GPU: 0.022072
Copy: 0.123798
Driver: 0.022071
Compute: 1.181536
CPU/GPU Overlap: 0.022072

My first question is how I can determine the overall runtime
from these results. Is it max(Compute, GPU) + IO + Copy +

I observed that the CPU compute time is much larger than the GPU
time. In such a situation, in a real application, it wouldn't make
sense to further optimize the GPU code by tiling, sharing the bins,

Heeseok Kim

Aug 6, 2010, 6:18:26 PM
Congratulations on successfully completing lab 2b.

Binning the atoms and building the neighborhood offset list are both done on the CPU. Computing the contribution of the atoms that are not binned is also part of the CPU execution time (part 2 of the pseudocode in step 3). The "Compute" time you reported includes all of these. In this lab the CPU/GPU overlap time is not very meaningful, because the code is serialized. So for the overall execution time, IO + GPU + Copy + Compute is a very close approximation.
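To make the arithmetic concrete, here is a minimal sketch that applies the IO + GPU + Copy + Compute approximation to the timings posted above. The dictionary layout and the function name `serialized_total` are only illustrative and are not part of the lab code; the numbers are copied from the original message and are assumed to be in seconds.

```python
# Timings copied from the lab output quoted earlier in this thread
# (assumed to be seconds; the lab output does not state the unit).
timings = {
    "IO": 0.181133,
    "GPU": 0.022072,
    "Copy": 0.123798,
    "Compute": 1.181536,
}


def serialized_total(t):
    """Approximate overall runtime when CPU and GPU work run back to back.

    Hypothetical helper: it simply sums the four phases named in the reply,
    which is only valid because the lab code is serialized (no overlap).
    """
    return t["IO"] + t["GPU"] + t["Copy"] + t["Compute"]


total = serialized_total(timings)
print(f"approximate overall runtime: {total:.6f}")
```

With these numbers the estimate comes out to roughly 1.51, of which the CPU-side "Compute" phase is by far the largest part.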

Your comment on optimization is generally correct according to Amdahl's law. However, the lab is deliberately designed not to be a huge burden on the system. You might still need to optimize the kernel further once you consider how large realistic data could be. An example is lab 3, where even with a few gigabytes of memory, it failed to work.
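As a rough illustration of the Amdahl's law point, the sketch below estimates how much the overall runtime could improve if only the GPU phase were made faster. It reuses the timings posted in this thread and assumes the serialized total IO + GPU + Copy + Compute; the function name `amdahl_speedup` is just an illustrative helper, not something from the lab.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`.

    Standard Amdahl's law: 1 / ((1 - p) + p / s).
    """
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Timings from the posted lab output (assumed to be seconds).
gpu_time = 0.022072
total_time = 0.181133 + 0.022072 + 0.123798 + 1.181536

# Share of the serialized runtime spent in the GPU phase.
gpu_fraction = gpu_time / total_time

# Upper bound: even an infinitely fast kernel only removes the GPU share.
upper_bound = 1.0 / (1.0 - gpu_fraction)

print(f"GPU share of runtime: {gpu_fraction:.2%}")
print(f"speedup if the kernel were 2x faster: {amdahl_speedup(gpu_fraction, 2.0):.4f}")
print(f"upper bound on overall speedup: {upper_bound:.4f}")
```

For this small input the GPU phase is only about 1.5% of the total, so kernel tuning alone cannot improve the overall runtime by more than about 1.5%. With realistic, much larger data sets the balance may look quite different, which is why further kernel optimization can still matter.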

Further optimization is an open question in lab 2b. Once you understand the algorithm, it is up to you to experiment with it. In that sense, I believe you are on the right track.