cuFloatComplex cub::BlockReduce from within CUDA kernel


rhaney

Apr 15, 2019, 9:12:15 AM
to cub-users
Hello all,

I know that you can pass a custom functor when calling a CUB device-wide reduce from a host function, but I am trying to compute a sum from within a CUDA kernel and have not been able to do something similar. Can anyone tell me whether this is possible and, if so, how? The following snippet is from a CUDA kernel in which I would like to replace the float-based calls with cuFloatComplex-based calls:

__global__ void myKernel(...) {
    // would like a single cuFloatComplex 'mag' variable rather than float 'mag_x' and 'mag_y'
    cuFloatComplex mag;
    float mag_x = 0.0f;
    float mag_y = 0.0f;
    ...
    typedef cub::BlockReduce<float, 256> BlockReduceT;  // would like to make this 'cuFloatComplex' rather than 'float'
    __shared__ typename BlockReduceT::TempStorage temp_storage;
    ...
    // The following 'Sum' is where I would like to sum a single cuFloatComplex variable 'mag'
    float aggregatex = BlockReduceT(temp_storage).Sum(mag_x);
    __syncthreads();  // barrier needed before temp_storage is reused
    float aggregatey = BlockReduceT(temp_storage).Sum(mag_y);
    ...
}


Thank you for any help or hints.


rhaney

Apr 15, 2019, 12:16:22 PM
to cub-users
I figured it out - nothing special but I will post the code below for anyone who may be looking for something similar.

// Binary functor that adds two complex values component-wise
struct customSum {
    __device__ __forceinline__
    cuFloatComplex operator()(const cuFloatComplex &a, const cuFloatComplex &b) const {
        return make_cuFloatComplex(cuCrealf(a) + cuCrealf(b), cuCimagf(a) + cuCimagf(b));
    }
};
...
__global__ void myKernel(...) {
    customSum sum_op;
    cuFloatComplex mag;
    mag.x = mag.y = 0.0f;
    ...
    typedef cub::BlockReduce<cuFloatComplex, 256> BlockReduceT;
    __shared__ typename BlockReduceT::TempStorage temp_storage;
    ...
    // The returned aggregate is only valid in thread 0 of the block
    cuFloatComplex aggregate = BlockReduceT(temp_storage).Reduce(mag, sum_op);
    ...
}
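
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same approach. The names ComplexSum, blockSumKernel, runBlockSum, and kBlockThreads are made up for this example, and it assumes a single block of 256 threads with one input element per thread. It uses cuCaddf from cuComplex.h in place of the hand-written addition, which should be equivalent.

```cuda
#include <cstdio>
#include <cuComplex.h>
#include <cub/cub.cuh>

// Assumed block size; must match the launch configuration below.
constexpr int kBlockThreads = 256;

// Hypothetical functor: complex addition via cuCaddf.
struct ComplexSum {
    __host__ __device__ __forceinline__
    cuFloatComplex operator()(const cuFloatComplex &a, const cuFloatComplex &b) const {
        return cuCaddf(a, b);
    }
};

// Hypothetical kernel: one block reduces kBlockThreads complex values into *out.
__global__ void blockSumKernel(const cuFloatComplex *in, cuFloatComplex *out) {
    typedef cub::BlockReduce<cuFloatComplex, kBlockThreads> BlockReduceT;
    __shared__ typename BlockReduceT::TempStorage temp_storage;

    cuFloatComplex mag = in[threadIdx.x];
    cuFloatComplex aggregate = BlockReduceT(temp_storage).Reduce(mag, ComplexSum());

    // Only thread 0 holds the valid block-wide result.
    if (threadIdx.x == 0) *out = aggregate;
}

// Host helper: fills the input with (1, 2), runs the kernel, returns the sum.
cuFloatComplex runBlockSum() {
    cuFloatComplex h_in[kBlockThreads];
    for (int i = 0; i < kBlockThreads; ++i) h_in[i] = make_cuFloatComplex(1.0f, 2.0f);

    cuFloatComplex *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(cuFloatComplex));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    blockSumKernel<<<1, kBlockThreads>>>(d_in, d_out);

    cuFloatComplex h_out = make_cuFloatComplex(0.0f, 0.0f);
    cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);  // also synchronizes
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    cuFloatComplex sum = runBlockSum();
    printf("sum = (%.1f, %.1f)\n", cuCrealf(sum), cuCimagf(sum));
    // 256 copies of (1, 2) should sum to (256, 512); nonzero exit on mismatch.
    return (cuCrealf(sum) == 256.0f && cuCimagf(sum) == 512.0f) ? 0 : 1;
}
```

If the same temp_storage is reused for a second reduction in the kernel, a __syncthreads() is needed between the two calls.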