CUB with cuComplex


rhaney

Apr 5, 2019, 10:35:36 AM
to cub-users
Hello all,

I am new to CUDA CUB and have a quick question regarding DeviceReduce. Does CUB have a definition for using DeviceReduce with cuComplex data type(s)? If so, can anyone point me to a link example/help on how this is accomplished?

Thank you for any help.

Robert Crovella

Apr 5, 2019, 11:03:33 AM
to cub-users, rhaney
If I were trying this, I would try it with thrust::complex<T> type.

This may get you started:

#include <cub/cub.cuh>  
#include <thrust/complex.h>

// reduction functor
struct CustomSum
{
    template <typename T>
    __device__ __forceinline__
    T operator()(const T &a, const T &b) const {
        return a+b;
    }
};
int main(){

  // Declare and allocate device-accessible input and output
  int num_items = 1024;
  thrust::complex<float> *d_in = NULL;
  thrust::complex<float> *d_out = NULL;
  cudaMalloc(&d_in, num_items * sizeof(thrust::complex<float>));
  cudaMalloc(&d_out, sizeof(thrust::complex<float>));
  // ... fill d_in with your data here ...
  CustomSum sum_op;
  thrust::complex<float> init(0.0f, 0.0f);  // identity element for summation
  // Determine temporary device storage requirements
  void   *d_temp_storage = NULL;
  size_t temp_storage_bytes = 0;
  cub::DeviceReduce::Reduce(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items, sum_op, init);
  // Allocate temporary storage
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  // Run reduction
  cub::DeviceReduce::Reduce(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items, sum_op, init);
}



Note that certain types of "reductions" won't be possible on complex types, e.g. max/min reductions.  Complex types have no concept of ordering like real types do.

--
http://nvlabs.github.com/cub
---
You received this message because you are subscribed to the Google Groups "cub-users" group.
To post to this group, send email to cub-...@googlegroups.com.
Visit this group at https://groups.google.com/group/cub-users.

rhaney

Apr 5, 2019, 2:59:40 PM
to cub-users
Thank you for the quick response. You are correct that Thrust would be a good way to go. Unfortunately, the design constraints I have been given rule out Thrust, citing its "large overhead."

Would the way to go be to run a reduction on the x portion of cuComplex and then on the y portion? I was hoping for a more direct method already defined in CUB.

Thanks again.

Robert Crovella

Apr 5, 2019, 6:24:16 PM
to cub-users, rhaney
On general principle, it's better to use well-engineered libraries than to write your own code; that is a basic software-engineering mantra, and presumably why you are interested in CUB. So I would question a vague claim of "large overhead": the concern ought to be quantified. This use of thrust::complex essentially just imports a data type (a class definition) into your code. It does not pull in any Thrust algorithms or containers, which are the bulk of what Thrust is.


Anyway, it's not hard to do this using cuComplex; it just requires an appropriate reduction functor definition:

#include <cub/cub.cuh>   
#include <cuComplex.h>


// CustomSum functor
struct CustomSum
{
    __device__ __forceinline__
    cuFloatComplex operator()(const cuFloatComplex &a, const cuFloatComplex &b) const {
        return make_cuFloatComplex(cuCrealf(a)+cuCrealf(b), cuCimagf(a)+cuCimagf(b));
    }
};

int main(){

  // Declare and allocate device-accessible input and output
  int num_items = 1024;
  cuFloatComplex *d_in = NULL;
  cuFloatComplex *d_out = NULL;
  cudaMalloc(&d_in, num_items * sizeof(cuFloatComplex));
  cudaMalloc(&d_out, sizeof(cuFloatComplex));
  // ... fill d_in with your data here ...
  CustomSum sum_op;
  cuFloatComplex init = make_cuFloatComplex(0.0f, 0.0f);  // identity element for summation
  // Determine temporary device storage requirements
  void   *d_temp_storage = NULL;
  size_t temp_storage_bytes = 0;
  cub::DeviceReduce::Reduce(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items, sum_op, init);
  // Allocate temporary storage
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  // Run reduction
  cub::DeviceReduce::Reduce(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items, sum_op, init);
}

I don't know what you mean by "a more direct method already defined in CUB."  If this isn't what you're looking for, I probably won't be able to help you.




rhaney

Apr 5, 2019, 6:38:40 PM
to cub-users
Thank you for the help. The code sample you posted looks exactly like what I was looking for.

I agree in principle as well: you should use available resources when you can rather than "reinvent the wheel." However, I am unable to use Thrust per the program lead.

Thank you again.