Nasser Anssari
Aug 7, 2010, 3:37:13 AM
to VSCSE Many-core Processors 2010
I hope these comments help clarify the big picture of Lab 3.
The first part of the lab sheds some light on the limitations of the
binning approach from Lab 2B when it is applied to non-uniform data
distributions. By sweeping the bin capacity from 16 upward in
increments of 32 (16, 48, 80, 112), this part explores the trade-off
between the memory that binning approach requires and the portion of
the data that can be processed on the GPU, and hence the execution
time. The memory overhead of this approach turns out to be high enough
to limit its performance, or even to prevent it from running at all,
as is the case with a bin capacity of 112 in Table 2.
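To make the trade-off concrete, here is a minimal Python sketch. It
assumes (my assumption, not something taken from the lab handout) that
each bin reserves a fixed number of slots up front and that elements
which do not fit in their bin are left for the CPU to process. The
function name `fixed_capacity_binning` and the 8-bin input with one
heavily loaded bin are hypothetical.

```python
from collections import Counter


def fixed_capacity_binning(bin_ids, num_bins, capacity):
    """Hypothetical model of fixed-capacity binning.

    Every bin reserves `capacity` slots up front; elements beyond a bin's
    capacity are assumed to be handed to the CPU.
    Returns (slots_allocated, gpu_elements, cpu_elements).
    """
    slots_allocated = num_bins * capacity  # reserved whether or not it is used
    counts = Counter(bin_ids)
    # Each bin contributes at most `capacity` elements to the GPU side.
    gpu_elements = sum(min(c, capacity) for c in counts.values())
    cpu_elements = len(bin_ids) - gpu_elements
    return slots_allocated, gpu_elements, cpu_elements


# Hypothetical skewed input: bin 0 holds 200 elements, bins 1-7 hold 10 each.
bin_ids = [0] * 200 + [b for b in range(1, 8) for _ in range(10)]
for cap in range(16, 113, 32):  # 16, 48, 80, 112
    print(cap, fixed_capacity_binning(bin_ids, 8, cap))
# 16 (128, 86, 184)
# 48 (384, 118, 152)
# 80 (640, 150, 120)
# 112 (896, 182, 88)
```

Reserved slots grow linearly with the capacity no matter how many are
used: at capacity 112, 896 slots hold only 182 elements, and 88
elements still fall back to the CPU. With a real grid of many bins the
reserved memory multiplies accordingly, which is how a large capacity
can exhaust GPU memory.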
The second part of the lab shows that a different binning approach,
one that amortizes this memory overhead, can offer significantly
better performance. Its smaller memory requirements allow bin
capacities beyond the limits of the first part, and Table 4 lists a
subset of these values to demonstrate the concept.
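One common way to amortize that overhead, sketched here in Python
under my own assumptions rather than as the lab's actual method, is to
pack all bins into a single array indexed by an exclusive prefix sum
of the per-bin counts, so each bin occupies only as many slots as it
actually fills. `compact_capped_binning` and the skewed 8-bin input
are hypothetical; the sketch keeps a capacity parameter so the effect
of raising it can be seen.

```python
from collections import Counter


def compact_capped_binning(bin_ids, num_bins, capacity):
    """Hypothetical packed binning: each bin keeps at most `capacity` elements,
    but all bins share one packed array, so no empty slots are reserved.

    Returns (offsets, packed, cpu_list): bin b occupies
    packed[offsets[b]:offsets[b + 1]], and cpu_list holds the indices of
    elements that exceeded their bin's capacity.
    """
    counts = Counter(bin_ids)
    # Exclusive prefix sum over the capped per-bin counts gives each bin's start.
    offsets = [0] * (num_bins + 1)
    for b in range(num_bins):
        offsets[b + 1] = offsets[b] + min(counts[b], capacity)

    packed = [0] * offsets[num_bins]  # storage scales with occupancy
    cpu_list = []
    cursor = offsets[:-1]  # slicing copies the list: next free slot of each bin
    for i, b in enumerate(bin_ids):
        if cursor[b] < offsets[b + 1]:
            packed[cursor[b]] = i
            cursor[b] += 1
        else:
            cpu_list.append(i)  # bin already full: leave this element for the CPU
    return offsets, packed, cpu_list


# Hypothetical skewed input: bin 0 holds 200 elements, bins 1-7 hold 10 each.
bin_ids = [0] * 200 + [b for b in range(1, 8) for _ in range(10)]
for cap in (112, 240):
    offsets, packed, cpu_list = compact_capped_binning(bin_ids, 8, cap)
    # Memory in entries: packed array plus num_bins + 1 offsets.
    print(cap, len(packed) + len(offsets), len(cpu_list))
# 112 191 88
# 240 279 0
```

At capacity 112 this uses 182 + 9 = 191 entries instead of the
8 × 112 = 896 slots that fixed-size bins would reserve, and raising the
capacity to 240 moves every element to the GPU side while still
allocating only 270 + 9 entries.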
The execution information to be recorded in Table 2 and Table 4 is the
same for both parts, so the two sets of results can be compared
directly.