prefix scan on large input arrays

dibid

Apr 21, 2015, 9:39:44 PM

to cub-...@googlegroups.com

Hi all,

I want to run a prefix scan on an array of 35,000,000 elements, but I keep getting a core dump. I ran the prefix scan on an array of 1 million elements, but when I changed my data type to long I got a core dump.

Does anyone have the same problem?

Thanks,

Robert Crovella

Apr 21, 2015, 10:42:17 PM

to dibid, cub-...@googlegroups.com

A core dump usually indicates a problem in host code.

Perhaps you are not allocating the host data storage correctly?

If you are trying to use a stack-based array, you will have trouble with large data sizes.

The following modification of the code here:

https://devtalk.nvidia.com/default/topic/826914/cuda-programming-and-performance/cub-library/

seems to work correctly for me (CUDA 7, latest CUB 1.4.1, Quadro 5000 GPU, Fedora 20):

$ cat t736.cu
#include <cub/cub.cuh>
#include <stdio.h>
#include <stdlib.h>

typedef int mytype;

int main(){

  // Declare host and device pointers for input and output
  size_t num_items = 35000000;
  mytype *d_in;
  mytype *h_in;
  mytype *d_out;
  size_t sz = num_items*sizeof(mytype);
  // Allocate the host buffer on the heap, not the stack
  h_in = (mytype *)malloc(sz);
  if (!h_in) {printf("malloc fail\n"); return -1;}
  cudaMalloc(&d_in, sz);
  cudaMalloc(&d_out, sz);
  // Initialize the input to all ones and copy it to the device
  for (size_t i = 0; i < num_items; i++) h_in[i] = 1;
  cudaMemcpy(d_in, h_in, sz, cudaMemcpyHostToDevice);
  printf("\nInput:\n");
  for (int i = 0; i < 10; i++) printf("%d ", h_in[i]);
  // Determine temporary device storage requirements
  void *d_temp_storage = NULL;
  size_t temp_storage_bytes = 0;
  cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
  // Allocate temporary storage
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  // Run inclusive prefix sum
  cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
  // Copy the result back to the host (reusing h_in) and print the first 10 values
  cudaMemcpy(h_in, d_out, sz, cudaMemcpyDeviceToHost);
  printf("\nOutput:\n");
  for (int i = 0; i < 10; i++) printf("%d ", h_in[i]);
  printf("\n");
  // Release device and host memory
  cudaFree(d_temp_storage);
  cudaFree(d_out);
  cudaFree(d_in);
  free(h_in);
  return 0;
}
$ nvcc -o t736 t736.cu
$ cuda-memcheck ./t736
========= CUDA-MEMCHECK

Input:
1 1 1 1 1 1 1 1 1 1
Output:
1 2 3 4 5 6 7 8 9 10
========= ERROR SUMMARY: 0 errors
$

I get exactly the same result with t736.cu if I change the typedef from int to unsigned long long (arguably the printf format specifiers should be changed from %d in that case).
