Pitched memory support

Martin Dušek

Jan 4, 2015, 4:44:49 PM

to cub-...@googlegroups.com

Hi,

What about pitched memory support for device-level functions such as reduction, sum, and histogram?

I'm currently doing a memory copy (pitched to contiguous)...

:)

Could this be implemented simply as an input/output iterator, without modifying the algorithms?

Thanks,

Martin

Apostolis Glenis

Jan 5, 2015, 8:29:51 AM

to Martin Dušek, cub-...@googlegroups.com

Wouldn't the input/output iterator be the same as doing a memory copy?


Martin Dušek

Jan 5, 2015, 8:49:00 AM

to cub-...@googlegroups.com, dus...@gmail.com

I don't understand exactly what you mean...

I think an input/output iterator would not need to allocate any additional memory, so it should not be the same as a memory copy.

This is a simple sum example from the docs:

#include <cub/cub.cuh> // or equivalently <cub/device/device_reduce.cuh>

// Declare, allocate, and initialize device pointers for input and output
int num_items; // e.g., 7
int *d_in; // e.g., [8, 6, 7, 5, 3, 0, 9]
int *d_out; // e.g., [ ]
...

// Determine temporary device storage requirements
// (with d_temp_storage == NULL, the call only writes the required size to temp_storage_bytes)
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);

// Run sum-reduction
cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

I would like to allocate d_in as 2D pitched memory, and I think it should be possible to implement an iterator for cub::DeviceReduce::Sum that only "translates an address" (skipping the padding at the end of each row).

So there is no need for a copy and no need for additional memory allocation!
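For illustration, the address-translating iterator described above could look roughly like the sketch below. It is not part of CUB: PitchedInputIterator, PITCHED_HD and sum_pitched_demo are hypothetical names, and the layout (row pitch in bytes, as returned by cudaMallocPitch) is an assumption. The demo runs on the host with std::accumulate. Since CUB's device-wide functions are templated on the input iterator type, an iterator of this shape would presumably be passed in place of d_in, but that has not been verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Expands to CUDA qualifiers under nvcc and to nothing in a host-only build.
#ifdef __CUDACC__
#define PITCHED_HD __host__ __device__
#else
#define PITCHED_HD
#endif

// Hypothetical read-only iterator (not part of CUB): presents a pitched 2D
// allocation as one linear sequence of width * height valid elements.
template <typename T>
class PitchedInputIterator {
public:
    using value_type        = T;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const T*;
    using reference         = T;  // elements are returned by value
    using iterator_category = std::random_access_iterator_tag;

    PITCHED_HD PitchedInputIterator(const void* base, std::size_t pitch_bytes,
                                    std::size_t width, difference_type index = 0)
        : base_(static_cast<const char*>(base)),
          pitch_(static_cast<difference_type>(pitch_bytes)),
          width_(static_cast<difference_type>(width)),
          index_(index) {
        assert(pitch_bytes >= width * sizeof(T));  // a row must fit in its pitch
    }

    // Translate a linear index into an address, skipping the row padding.
    PITCHED_HD reference operator[](difference_type n) const {
        const difference_type i   = index_ + n;
        const difference_type row = i / width_;
        const difference_type col = i % width_;
        const T* row_ptr = reinterpret_cast<const T*>(base_ + row * pitch_);
        return row_ptr[col];
    }
    PITCHED_HD reference operator*() const { return (*this)[0]; }

    PITCHED_HD PitchedInputIterator& operator++() { ++index_; return *this; }
    PITCHED_HD PitchedInputIterator operator+(difference_type n) const {
        PitchedInputIterator t = *this;
        t.index_ += n;
        return t;
    }
    PITCHED_HD difference_type operator-(const PitchedInputIterator& o) const {
        return index_ - o.index_;
    }
    PITCHED_HD bool operator==(const PitchedInputIterator& o) const { return index_ == o.index_; }
    PITCHED_HD bool operator!=(const PitchedInputIterator& o) const { return index_ != o.index_; }

private:
    const char*     base_;   // start of the pitched allocation
    difference_type pitch_;  // row stride in bytes
    difference_type width_;  // valid elements per row
    difference_type index_;  // current linear position
};

// Host-side check: 2 rows of 3 ints, each row padded to 4 ints (pitch = 16 bytes).
inline int sum_pitched_demo() {
    const std::size_t width = 3, height = 2, pitch = 4 * sizeof(int);
    const std::vector<int> buf = {1, 2, 3, -99,   // -99 marks padding
                                  4, 5, 6, -99};
    PitchedInputIterator<int> first(buf.data(), pitch, width);
    return std::accumulate(first, first + static_cast<std::ptrdiff_t>(width * height), 0);
}
```

No copy and no extra allocation is involved; the cost is one integer division and one modulo per element access.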

On Monday, January 5, 2015 at 14:29:51 UTC+1, Apostolis Glenis wrote:

Jack Morrison

Aug 19, 2015, 11:02:26 AM

to cub-users

I'd like to put in a second vote for pitched memory support. It's a feature that's missing in most CUDA-supporting libraries (including Thrust), but using pitched memory is crucial to image processing applications.

I assembled a custom iterator wrapper for Thrust, but it requires a transform_iterator that computes an integer modulo and an integer division from a counting_iterator's value on every increment, which hurts performance considerably. It does allow rapid prototyping with Thrust, though.
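The index arithmetic behind such a wrapper can be sketched as follows. This is a reconstruction under assumptions, not the code from the post: LinearToPitchedOffset and sum_via_offsets are hypothetical names, and the pitch is assumed to be a whole multiple of sizeof(T) so that it can be expressed in elements. With Thrust, a functor like this would presumably be fed to thrust::make_transform_iterator over thrust::make_counting_iterator, with the resulting offsets used to index the buffer; the loop below only emulates that composition on the host.

```cpp
#include <cassert>
#include <cstddef>

// Expands to CUDA qualifiers under nvcc and to nothing in a host-only build.
#ifdef __CUDACC__
#define PITCHED_HD __host__ __device__
#else
#define PITCHED_HD
#endif

// Hypothetical functor: maps a linear element index to the element offset
// inside a pitched buffer whose rows start pitch_elems elements apart.
struct LinearToPitchedOffset {
    std::ptrdiff_t width;        // valid elements per row
    std::ptrdiff_t pitch_elems;  // row stride in elements (pitch bytes / sizeof(T))

    PITCHED_HD std::ptrdiff_t operator()(std::ptrdiff_t i) const {
        const std::ptrdiff_t row = i / width;  // one integer division per element
        const std::ptrdiff_t col = i % width;  // one integer modulo per element
        return row * pitch_elems + col;
    }
};

// Host-side emulation of the composition: walk the counting indices 0..n-1,
// transform each into a pitched offset, and read the element found there.
template <typename T>
T sum_via_offsets(const T* base, LinearToPitchedOffset op, std::ptrdiff_t n) {
    assert(op.width > 0 && op.pitch_elems >= op.width);
    T total = T();
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        total += base[op(i)];
    }
    return total;
}
```

The division and modulo run once per element read, which is the per-increment overhead described above.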

Cheers,

Jack

Eyal Hirsch

Aug 13, 2017, 11:32:43 AM

to cub-users

Hi,

Was this feature ever implemented?

I need a pitched input for the cub::DeviceReduce::ArgMin function.

Also, it would be nice to have a simple example of how to use the cub::DeviceReduce::ArgMin function...