to pfacForum
PFAC r1.0 provides a function, PFAC_matchFromDeviceReduce(), which
compresses the matched results and their corresponding positions. The
first version of PFAC_matchFromDeviceReduce() is a combination of
PFAC_matchFromDevice(), Thrust::inclusive_scan and
Thrust::reduce_by_key.
However, the memory usage of Thrust is quite high: we cannot run a
128 MB input on a GTX480.
The second version of PFAC_matchFromDeviceReduce() uses a different
PFAC_matchFromDevice(), which compresses matched results locally
(inside each thread block). It then does a global compaction with
Thrust::inclusive_scan and a kernel that performs in-place compaction.
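
To make the scan-then-compact step concrete, here is a minimal host-side sketch in plain C++. It is not the PFAC kernel. The helper name compact_in_place is hypothetical, the convention that 0 means "no match" is an assumption, and std::partial_sum stands in for the inclusive scan as its sequential counterpart.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration; not part of PFAC or Thrust.
// matched[i] is the pattern ID matched at input position i, or 0 for
// "no match" (this convention is an assumption of the sketch).
// On return, matched holds only the nonzero IDs in their original order,
// pos holds the input position of each one, and the count is returned.
std::size_t compact_in_place(std::vector<int>& matched, std::vector<int>& pos)
{
    const std::size_t size = matched.size();

    // Step 1: one flag per element, 1 where there is a match.
    std::vector<int> scan(size);
    for (std::size_t i = 0; i < size; ++i) {
        scan[i] = (matched[i] != 0) ? 1 : 0;
    }

    // Step 2: inclusive scan of the flags. For a matched element i,
    // scan[i] - 1 is its index in the compacted output.
    std::partial_sum(scan.begin(), scan.end(), scan.begin());
    const std::size_t count = (size == 0) ? 0 : static_cast<std::size_t>(scan[size - 1]);

    // Step 3: scatter. The destination index never exceeds the source
    // index, so a left-to-right sequential pass can safely write into
    // the same array it reads from.
    pos.resize(count);
    for (std::size_t i = 0; i < size; ++i) {
        if (matched[i] != 0) {
            const std::size_t dst = static_cast<std::size_t>(scan[i] - 1);
            pos[dst] = static_cast<int>(i);
            matched[dst] = matched[i];
        }
    }

    matched.resize(count);
    return count;
}
```

The sequential loop is safe only because elements are visited in order. In a parallel kernel, one thread could overwrite a slot before another thread has read it, so a GPU version needs some extra staging or synchronization for the scatter step.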
Question: does Thrust provide an in-place compaction routine? If not,
could our kernel be added to Thrust as an extension?