efficient compaction algorithm

Lung-Sheng Chien

Mar 4, 2011, 8:09:30 AM

to pfacForum

PFAC r1.0 provides a function, PFAC_matchFromDeviceReduce(), which
compacts the matched results and their corresponding positions. The
first version of PFAC_matchFromDeviceReduce() is a combination of
PFAC_matchFromDevice(), thrust::inclusive_scan, and
thrust::reduce_by_key.
However, the memory usage of Thrust is quite high: we cannot run a
128 MB input on a GTX 480.
The second version of PFAC_matchFromDeviceReduce() uses a different
PFAC_matchFromDevice() that compacts the matched results locally
(inside each thread block), and then performs a global compaction with
thrust::inclusive_scan and a kernel that does in-place compaction.
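
To make the two-stage idea concrete, here is a rough host-side sketch in
plain C++. It is not the PFAC kernel and it does not call Thrust: it is a
sequential model in which std::partial_sum stands in for
thrust::inclusive_scan, and a plain loop stands in for the in-place
compaction kernel. The function name compact_in_place, the sentinel
kNoMatch (0 meaning "no match"), and the block_size parameter are all
assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed sentinel: 0 means "no pattern matched at this position".
constexpr int kNoMatch = 0;

// Hypothetical helper. Compacts `ids` in place and fills `pos` with the
// original position of each surviving match. Returns the number of matches.
std::size_t compact_in_place(std::vector<int>& ids,
                             std::vector<std::size_t>& pos,
                             std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t n = ids.size();
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    pos.assign(n, 0);
    std::vector<std::size_t> counts(num_blocks, 0);

    // Stage 1: local compaction. Each block (a stand-in for a thread block)
    // moves its matches to the front of its own slice and records its count.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        std::size_t out = begin;
        for (std::size_t i = begin; i < end; ++i) {
            if (ids[i] != kNoMatch) {
                ids[out] = ids[i];
                pos[out] = i;
                ++out;
            }
        }
        counts[b] = out - begin;
    }

    // Stage 2a: inclusive scan of the per-block counts.
    std::vector<std::size_t> scan(num_blocks);
    std::partial_sum(counts.begin(), counts.end(), scan.begin());

    // Stage 2b: global in-place compaction. Block b's matches move to offset
    // scan[b] - counts[b], which never exceeds the block's own start, so a
    // left-to-right pass never overwrites data that has not been read yet.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t src = b * block_size;
        const std::size_t dst = scan[b] - counts[b];
        for (std::size_t k = 0; k < counts[b]; ++k) {
            ids[dst + k] = ids[src + k];
            pos[dst + k] = pos[src + k];
        }
    }
    return num_blocks == 0 ? 0 : scan.back();
}
```

After the call, the first "return value" entries of ids and pos hold the
compacted matches and their positions; the rest of both arrays is
leftover data. The left-to-right ordering argument only holds for this
sequential model. A parallel kernel would need its own guarantee that a
block does not overwrite a region another block is still reading.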

Question: does Thrust provide an in-place compaction routine? If not,
could our kernel be added to Thrust?
