to cub-users
In all the CUB examples, the ScanOps are elementary operations on scalar types T. In my problem, T is a matrix type. The operation is not quite as expensive as matrix multiplication, but it's nonetheless preferable to utilize multiple threads (in the same block or perhaps even warp) to compute it in parallel. Is there an elegant way to accomplish this while utilizing CUB for the overall scan?
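To make the setup concrete, here is a minimal host-side sketch of the kind of operator in question. Everything in it is an assumption for illustration: `Mat2` is a hypothetical 2x2 stand-in for the matrix type T, `MatMulOp` is a hypothetical scan operator (a plain matrix product, which is associative but not commutative), and `serial_inclusive_scan` is only a serial reference, not CUB code. A functor of this shape is what would be passed as the ScanOp, and each call to it is evaluated by a single thread, which is the limitation described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2x2 matrix type standing in for T (row-major storage).
struct Mat2 {
    float m[4];
};

// Hypothetical scan operator: ordinary 2x2 matrix product.
// Associative (required for a scan) but not commutative.
struct MatMulOp {
    Mat2 operator()(const Mat2& a, const Mat2& b) const {
        Mat2 r;
        r.m[0] = a.m[0] * b.m[0] + a.m[1] * b.m[2];
        r.m[1] = a.m[0] * b.m[1] + a.m[1] * b.m[3];
        r.m[2] = a.m[2] * b.m[0] + a.m[3] * b.m[2];
        r.m[3] = a.m[2] * b.m[1] + a.m[3] * b.m[3];
        return r;
    }
};

// Serial reference for the inclusive scan: out[i] = in[0] op in[1] op ... op in[i].
// Operand order is preserved because the operator is not commutative.
template <typename T, typename ScanOp>
std::vector<T> serial_inclusive_scan(const std::vector<T>& in, ScanOp op) {
    std::vector<T> out(in.size());
    if (in.empty()) {
        return out;
    }
    out[0] = in[0];
    for (std::size_t i = 1; i < in.size(); ++i) {
        out[i] = op(out[i - 1], in[i]);
    }
    return out;
}
```

For a real matrix type the body of `operator()` is the expensive part, and it is that body I would like to spread across several threads.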