efficient compaction algorithm


Lung-Sheng Chien

Mar 4, 2011, 8:09:30 AM
to pfacForum
PFAC r1.0 provides a function, PFAC_matchFromDeviceReduce(), which
compresses the matched results and their corresponding positions. The
first version of PFAC_matchFromDeviceReduce() was a combination of
PFAC_matchFromDevice(), thrust::inclusive_scan and
thrust::reduce_by_key.
However, Thrust's memory usage is quite high, so we cannot process a
128 MB input on a GTX480.
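The post does not show exactly how the scan and reduce_by_key calls are wired together, so the following is only a generic sketch of scan-based stream compaction, written as host-side C++17 with std::inclusive_scan standing in for thrust::inclusive_scan. It assumes (hypothetically) that match[i] != 0 means a pattern matched at input position i; compact_by_scan and Compacted are illustrative names, not PFAC API.

```cpp
// Host-side C++17 sketch of scan-based stream compaction (not PFAC code).
// Assumption: match[i] != 0 means a pattern was matched at input position i.
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: compacted match values and their input positions.
struct Compacted {
    std::vector<int> matches;    // nonzero match results, in input order
    std::vector<int> positions;  // input positions of those matches
};

Compacted compact_by_scan(const std::vector<int>& match) {
    const std::size_t n = match.size();

    // Flag every position that holds a match.
    std::vector<int> flag(n);
    for (std::size_t i = 0; i < n; ++i) flag[i] = (match[i] != 0) ? 1 : 0;

    // Inclusive scan of the flags: for a match at i, pos[i] - 1 is its output slot.
    std::vector<int> pos(n);
    std::inclusive_scan(flag.begin(), flag.end(), pos.begin());

    const int total = n ? pos[n - 1] : 0;
    Compacted out{std::vector<int>(total), std::vector<int>(total)};

    // Scatter: iterations are independent, so on a GPU this is one thread per i.
    for (std::size_t i = 0; i < n; ++i) {
        if (flag[i]) {
            out.matches[pos[i] - 1] = match[i];
            out.positions[pos[i] - 1] = static_cast<int>(i);
        }
    }
    return out;
}
```

Note that this pattern allocates a flag array, a scan array and separate output arrays, each proportional to the input size, which is one way a scan-based approach can become memory-hungry on large inputs.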
The second version of PFAC_matchFromDeviceReduce() uses a different
variant of PFAC_matchFromDevice(), which compresses the matched results
locally (inside each thread block), and then compresses them globally
with thrust::inclusive_scan followed by a kernel that performs the
compaction in place.
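A two-level, in-place compaction of this kind can be sketched sequentially in host-side C++17. This is not PFAC's kernel: the tile size `block` is a hypothetical stand-in for a CUDA thread block's share of the input, data[i] != 0 is assumed to mark a match, and compact_in_place is an illustrative name. Each tile first compacts its matches to the front of its own tile, an inclusive scan over per-tile counts gives each tile's global offset, and each tile's prefix is then moved to that offset.

```cpp
// Host-side C++17 sketch of two-level, in-place compaction (not PFAC code).
// Assumptions: 'block' stands in for one thread block's tile size, and
// data[i] != 0 means position i holds a match.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Moves the nonzero entries of data to its front, preserving order, and
// returns how many there are. Entries past the returned count are unspecified.
std::size_t compact_in_place(std::vector<int>& data, std::size_t block) {
    assert(block > 0);
    const std::size_t n = data.size();
    const std::size_t nblocks = (n + block - 1) / block;
    std::vector<std::size_t> count(nblocks, 0);

    // Phase 1 (local): each tile compacts its matches to the front of the tile.
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t begin = b * block;
        const std::size_t end = std::min(begin + block, n);
        std::size_t k = begin;
        for (std::size_t i = begin; i < end; ++i)
            if (data[i] != 0) data[k++] = data[i];
        count[b] = k - begin;
    }

    // Phase 2 (global): inclusive scan of per-tile counts gives each tile's end offset.
    std::vector<std::size_t> offset(nblocks);
    std::inclusive_scan(count.begin(), count.end(), offset.begin());

    // Phase 3: move each tile's prefix to its global slot. Tile b's destination
    // offset[b] - count[b] is never past b * block, so a forward copy done tile
    // by tile in increasing order never overwrites data that is still unmoved.
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t dst = offset[b] - count[b];
        const std::size_t src = b * block;
        for (std::size_t j = 0; j < count[b]; ++j) data[dst + j] = data[src + j];
    }
    return nblocks ? offset[nblocks - 1] : 0;
}
```

The only extra memory here is one count per tile, rather than per-element flag and scan arrays. Because this sketch runs the tiles in order, it sidesteps the question of how a parallel kernel keeps concurrently running blocks from overwriting each other's unmoved data; the post does not say how PFAC's kernel handles that.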

Question: does Thrust provide an in-place compaction routine? If not,
could our kernel be added to Thrust as an extension?