Large operator in scan

Shiva Kaul

Dec 21, 2020, 4:26:07 PM
to cub-users
In all the CUB examples, the ScanOps are elementary operations on scalar types T. In my problem, T is a matrix type. The operation is not quite as expensive as matrix multiplication, but it would still be preferable to use multiple threads (in the same block, or perhaps even the same warp) to compute each application of it in parallel. Is there an elegant way to accomplish this while still using CUB for the overall scan?

Thanks,
Shiva
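
For readers following along, here is a minimal host-side sketch of the setup described above: a scan whose element type is a small matrix and whose operator is a functor over that type. The post does not specify the matrix type or the operation, so `Mat2`, `MatMulOp`, and `inclusive_scan_mats` are hypothetical names, and a 2x2 matrix product is assumed purely as a stand-in for an associative operator. It uses `std::partial_sum` on the CPU so it runs without a GPU. It shows only the baseline case where a single thread evaluates each operator application; it does not show how to spread one application across several threads, which is the open question here.

```cpp
#include <array>
#include <numeric>
#include <vector>

// Hypothetical 2x2 matrix type standing in for the poster's T.
struct Mat2 {
    std::array<long long, 4> m;  // row-major: [m0 m1; m2 m3]
};

// Hypothetical scan operator (assumed for illustration): matrix product.
// A scan needs the operator to be associative, not commutative.
// To pass a functor like this to CUB, operator() would also need the
// __host__ __device__ qualifiers.
struct MatMulOp {
    Mat2 operator()(const Mat2& x, const Mat2& y) const {
        return Mat2{{x.m[0] * y.m[0] + x.m[1] * y.m[2],
                     x.m[0] * y.m[1] + x.m[1] * y.m[3],
                     x.m[2] * y.m[0] + x.m[3] * y.m[2],
                     x.m[2] * y.m[1] + x.m[3] * y.m[3]}};
    }
};

// Sequential inclusive scan: out[i] = in[0] * in[1] * ... * in[i],
// applied strictly left to right.
std::vector<Mat2> inclusive_scan_mats(const std::vector<Mat2>& in) {
    std::vector<Mat2> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin(), MatMulOp{});
    return out;
}
```

In this form every call to `MatMulOp` is a serial piece of work, which is what becomes costly once the matrices are large.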