Pitched memory support


Martin Dušek

Jan 4, 2015, 4:44:49 PM
to cub-...@googlegroups.com
Hi, 

What about pitched memory support for device-level functions like reduction, sum, histogram, etc.?
I'm currently doing a memory copy (pitched to contiguous)...
:)

Could this simply be implemented as an input/output iterator (without modifying the algorithms)?

Thanks,
Martin

Apostolis Glenis

Jan 5, 2015, 8:29:51 AM
to Martin Dušek, cub-...@googlegroups.com
Wouldn't an input/output iterator be the same as doing a memory copy?


Martin Dušek

Jan 5, 2015, 8:49:00 AM
to cub-...@googlegroups.com, dus...@gmail.com
I don't understand exactly what you mean...
I think an input/output iterator would need no additional memory, so it would not be the same as a memory copy.

This is a simple sum example from the docs:

#include <cub/cub.cuh>   // or equivalently <cub/device/device_reduce.cuh>
// Declare, allocate, and initialize device pointers for input and output
int  num_items;      // e.g., 7
int  *d_in;          // e.g., [8, 6, 7, 5, 3, 0, 9]
int  *d_out;         // e.g., [ ]
...
// Determine temporary device storage requirements
void     *d_temp_storage = NULL;
size_t   temp_storage_bytes = 0;
cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run sum-reduction
cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);


I would like to allocate d_in as 2D pitched memory, and I think it would be possible to implement an iterator that only "translates the address" (skipping the invalid padding bytes introduced by the pitch alignment) for cub::DeviceReduce::Sum.
Then there is no need for a copy and no need for an additional memory allocation!
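
Something along these lines (an untested sketch) is what I have in mind, built only from CUB's existing CountingInputIterator and TransformInputIterator so the algorithms stay unchanged. The names PitchedLoad, d_pitched, pitch, width and height are just mine here; d_pitched and pitch would come from cudaMallocPitch, and d_out is as in the example above:

#include <cub/cub.cuh>
#include <cub/iterator/counting_input_iterator.cuh>
#include <cub/iterator/transform_input_iterator.cuh>

// Functor mapping a dense element index to the value stored at that position
// in a pitched 2D allocation (base pointer and pitch from cudaMallocPitch)
struct PitchedLoad
{
    const char *base;   // start of the pitched allocation
    size_t      pitch;  // row pitch in bytes
    size_t      width;  // valid elements per row

    __host__ __device__ __forceinline__
    int operator()(size_t i) const
    {
        size_t row = i / width;
        size_t col = i % width;
        return *reinterpret_cast<const int*>(base + row * pitch + col * sizeof(int));
    }
};

// d_pitched, pitch, width, height come from cudaMallocPitch; d_out as above
int num_items = (int)(width * height);          // number of valid elements
cub::CountingInputIterator<size_t> counting(0); // produces dense indices 0, 1, 2, ...
PitchedLoad load_op = { reinterpret_cast<const char*>(d_pitched), pitch, width };
cub::TransformInputIterator<int, PitchedLoad, cub::CountingInputIterator<size_t> >
    d_in_pitched(counting, load_op);

void     *d_temp_storage = NULL;
size_t   temp_storage_bytes = 0;
cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in_pitched, d_out, num_items);
// Allocate temporary storage, then run the sum-reduction for real
cudaMalloc(&d_temp_storage, temp_storage_bytes);
cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in_pitched, d_out, num_items);

The per-element integer division and modulo are not free, but there is no copy and no extra allocation.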


On Monday, January 5, 2015 at 2:29:51 PM UTC+1, Apostolis Glenis wrote:

Jack Morrison

Aug 19, 2015, 11:02:26 AM
to cub-users
I'd like to put in a second vote for pitched memory support. It's a feature that's missing in most CUDA-supporting libraries (including Thrust), but using pitched memory is crucial to image processing applications.

I assembled a custom iterator wrapper for Thrust, but it requires a transform_iterator that computes an integer mod and an integer division from a counting_iterator's input on every increment, which considerably hurts performance; it does, however, allow rapid prototyping with Thrust.
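
Roughly, the idea boils down to something like the following sketch (names are mine; d_ptr, pitch, width and height are assumed to come from cudaMallocPitch, and the element type here is float):

#include <thrust/device_ptr.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/reduce.h>

// Maps a dense element index to its element offset inside the pitched buffer
struct pitched_index
{
    typedef size_t result_type;  // lets transform_iterator deduce its value type

    size_t width;        // valid elements per row
    size_t pitch_elems;  // row pitch in elements (pitch bytes / sizeof(T))

    __host__ __device__
    size_t operator()(size_t i) const
    {
        return (i / width) * pitch_elems + (i % width);
    }
};

// d_ptr, pitch, width, height come from cudaMallocPitch
pitched_index idx = { width, pitch / sizeof(float) };
thrust::device_ptr<float> base = thrust::device_pointer_cast(d_ptr);

// Sum the valid elements, gathering them through the index-remapping iterator
float sum = thrust::reduce(
    thrust::make_permutation_iterator(base,
        thrust::make_transform_iterator(thrust::counting_iterator<size_t>(0), idx)),
    thrust::make_permutation_iterator(base,
        thrust::make_transform_iterator(thrust::counting_iterator<size_t>(width * height), idx)),
    0.0f);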

Cheers,
Jack

Eyal Hirsch

Aug 13, 2017, 11:32:43 AM
to cub-users
Hi,
Was this feature ever implemented?
I need a pitched input for the cub::DeviceReduce::ArgMin function.
Also, it would be nice if there were a simple example of how to use cub::DeviceReduce::ArgMin...

Thanks,
Eyal
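
For reference, a minimal ArgMin sketch in the style of the Sum example earlier in the thread (the output is assumed to be a single cub::KeyValuePair<int, float>, as declared in the CUB headers):

// ArgMin follows the same call pattern as Sum; the result is one key-value pair
// whose key is the index of the minimum and whose value is the minimum itself
int    num_items;                      // e.g., 7
float *d_in;                           // e.g., [8., 6., 7., 5., 3., 0., 9.]
cub::KeyValuePair<int, float> *d_out;  // e.g., [{ , }]
...
void     *d_temp_storage = NULL;
size_t   temp_storage_bytes = 0;
cub::DeviceReduce::ArgMin(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
cudaMalloc(&d_temp_storage, temp_storage_bytes);
cub::DeviceReduce::ArgMin(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
// d_out now holds {5, 0.}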