cuFloatComplex cub::BlockReduce from within CUDA kernel

rhaney

Apr 15, 2019, 9:12:15 AM

to cub-users

Hello all,

I know that you can pass a custom functor when calling CUB's device-wide reduce from a host function, but I am trying to compute a sum from within a CUDA kernel and can't find a way to do something similar. Can anyone tell me whether and how this can be done? The following is a code snippet from a CUDA kernel where I would like to replace all float-based calls with cuFloatComplex-based calls, if possible:

__global__ void myKernel(...){
  // would like a single cuFloatComplex 'mag' variable rather than float 'mag_x' and 'mag_y'
  cuFloatComplex mag;
  float mag_x = 0.0f;
  float mag_y = 0.0f;
  ...
  typedef cub::BlockReduce<float, 256> BlockReduceT;  // would like to make this 'cuFloatComplex' rather than 'float'

  __shared__ typename BlockReduceT::TempStorage temp_storage;
  ...
  // The following 'Sum' is where I would like to sum a single cuFloatComplex variable 'mag'
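  // (each aggregate is only valid in thread 0 of the block)
  float aggregatex = BlockReduceT(temp_storage).Sum(mag_x);
  __syncthreads();  // temp_storage must not be reused until every thread has finished the first reduction
  float aggregatey = BlockReduceT(temp_storage).Sum(mag_y);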
  ...
}

Thank you for any help or hints.

rhaney

Apr 15, 2019, 12:16:22 PM

to cub-users

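I figured it out. It's nothing special: instead of Sum(), call Reduce() with a custom reduction functor that adds the real and imaginary parts. (Sum() uses cub::Sum, which relies on operator+, and cuFloatComplex does not define one.) I'll post the code below for anyone who may be looking for something similar.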

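#include <cub/cub.cuh>
#include <cuComplex.h>

// Reduction functor: adds the real and imaginary parts separately
struct customSum{
    __device__ __forceinline__
    cuFloatComplex operator()(const cuFloatComplex &a, const cuFloatComplex &b) const {
      return make_cuFloatComplex(cuCrealf(a)+cuCrealf(b),cuCimagf(a)+cuCimagf(b));
    }
};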
...
__global__ void myKernel(...){
  customSum sum_op;
  cuFloatComplex mag = make_cuFloatComplex(0.0f, 0.0f);
  ...
  typedef cub::BlockReduce<cuFloatComplex, 256> BlockReduceT;

  __shared__ typename BlockReduceT::TempStorage temp_storage;
  ...
  // Reduce() with the custom functor instead of Sum(); the result is only valid in thread 0
  cuFloatComplex aggregate = BlockReduceT(temp_storage).Reduce(mag, sum_op);
  ...
}
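For reference, here is a minimal self-contained sketch that puts the pieces above into a complete program. The names blockComplexSum, sumComplex and ComplexSum, the 256-thread block size, and padding out-of-range threads with 0 + 0i are illustrative assumptions, not part of the original post. The functor body uses cuCaddf from cuComplex.h, which performs the same component-wise addition as the hand-written version. CUDA error checking is omitted for brevity, and running it requires a CUDA-capable GPU.

```cuda
#include <cub/cub.cuh>      // cub::BlockReduce
#include <cuComplex.h>      // cuFloatComplex, make_cuFloatComplex, cuCaddf
#include <cuda_runtime.h>
#include <vector>

// Hypothetical functor name; same component-wise addition as customSum.
// __host__ __device__ so the host can reuse it for the final pass.
struct ComplexSum {
  __host__ __device__ __forceinline__
  cuFloatComplex operator()(const cuFloatComplex& a, const cuFloatComplex& b) const {
    return cuCaddf(a, b);
  }
};

constexpr int kBlockThreads = 256;  // assumed block size; must match the launch

// Each block sums up to kBlockThreads elements; thread 0 writes the block total.
__global__ void blockComplexSum(const cuFloatComplex* in, int n, cuFloatComplex* out) {
  typedef cub::BlockReduce<cuFloatComplex, kBlockThreads> BlockReduceT;
  __shared__ typename BlockReduceT::TempStorage temp_storage;

  int i = blockIdx.x * kBlockThreads + threadIdx.x;
  // Threads past the end contribute the additive identity (0 + 0i).
  cuFloatComplex mag = (i < n) ? in[i] : make_cuFloatComplex(0.0f, 0.0f);

  cuFloatComplex aggregate = BlockReduceT(temp_storage).Reduce(mag, ComplexSum());

  // The aggregate is only valid in thread 0 of each block.
  if (threadIdx.x == 0) out[blockIdx.x] = aggregate;
}

// Hypothetical host helper: one block per kBlockThreads elements,
// then the per-block partial sums are added on the host.
cuFloatComplex sumComplex(const std::vector<cuFloatComplex>& h_in) {
  int n = static_cast<int>(h_in.size());
  if (n == 0) return make_cuFloatComplex(0.0f, 0.0f);
  int blocks = (n + kBlockThreads - 1) / kBlockThreads;

  cuFloatComplex* d_in = nullptr;
  cuFloatComplex* d_out = nullptr;
  cudaMalloc(&d_in, n * sizeof(cuFloatComplex));
  cudaMalloc(&d_out, blocks * sizeof(cuFloatComplex));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(cuFloatComplex), cudaMemcpyHostToDevice);

  blockComplexSum<<<blocks, kBlockThreads>>>(d_in, n, d_out);

  std::vector<cuFloatComplex> partial(blocks);
  cudaMemcpy(partial.data(), d_out, blocks * sizeof(cuFloatComplex), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);

  cuFloatComplex total = make_cuFloatComplex(0.0f, 0.0f);
  for (const cuFloatComplex& p : partial) total = ComplexSum()(total, p);
  return total;
}
```

Calling sumComplex on 1000 copies of 1 + 2i should give 1000 + 2000i. Finishing the partial sums on the host keeps the sketch short; for large inputs, cub::DeviceReduce::Reduce with the same functor could do the whole reduction on the device instead.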