prefix scan on large input arrays

dibid

Apr 21, 2015, 9:39:44 PM
to cub-...@googlegroups.com
Hi all,

I want to run a prefix scan on an array of 35,000,000 elements, but I keep getting a core dump. The prefix scan worked on an array of 1 million elements, but when I changed my data type to long I got a core dump.

Has anyone else run into the same problem?

Thanks,

Robert Crovella

Apr 21, 2015, 10:42:17 PM
to dibid, cub-...@googlegroups.com
A core dump usually indicates a problem in host code.
 
Perhaps you are not allocating the host data storage correctly?
 
If you are trying to use a stack-based array (a local array declared inside a function), you will have trouble with large data sizes, because the stack is far smaller than the heap.
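
To make the stack-versus-heap point concrete, here is a minimal host-only sketch. The helper names array_bytes and make_host_input are hypothetical (they are not part of CUB or of the original post), and the 8 MB figure in the comments is only the common Linux default stack limit, assumed here rather than measured on the poster's system.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): number of bytes occupied by
// an array of n elements of type T.
template <typename T>
std::size_t array_bytes(std::size_t n) {
  return n * sizeof(T);
}

// Hypothetical helper: allocates n elements on the heap, each set to 1.
// std::vector reports failure by throwing std::bad_alloc instead of
// overflowing the stack.
template <typename T>
std::vector<T> make_host_input(std::size_t n) {
  return std::vector<T>(n, T(1));
}

// Problematic pattern for large sizes (shown only as a comment):
//   int h_in[35000000];  // 140,000,000 bytes as a local (stack) array
// With an assumed 8 MB stack limit this cannot fit. With 8-byte elements
// (long on 64-bit Linux) the same array needs 280,000,000 bytes.
```

For example, array_bytes<int>(35000000) evaluates to 140,000,000, and make_host_input<int>(35000000) gives a heap buffer whose data() pointer can be passed to cudaMemcpy in place of a malloc'd pointer.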
 
The following modification of the code here:
 
 
seems to work correctly for me (CUDA 7, latest CUB 1.4.1, Quadro 5000 GPU, Fedora 20):
 
$ cat t736.cu
#include <cub/cub.cuh>
#include <stdio.h>
#include <stdlib.h>
typedef int mytype;
int main(){
  // Declare, allocate, and initialize device pointers for input and output
  size_t num_items = 35000000;
  mytype *d_in;
  mytype *h_in;
  mytype *d_out;
  size_t sz = num_items*sizeof(mytype);
  h_in = (mytype *)malloc(sz);
  if (!h_in) {printf("malloc fail\n"); return -1;}
  cudaMalloc(&d_in,  sz);
  cudaMalloc(&d_out, sz);
  for (size_t i = 0; i < num_items; i++) h_in[i] = 1;
  cudaMemcpy(d_in, h_in, sz, cudaMemcpyHostToDevice);
  printf("\nInput:\n");
  for (int i = 0; i < 10; i++) printf("%d ", h_in[i]);
  // Determine temporary device storage requirements
  void *d_temp_storage = NULL;
  size_t temp_storage_bytes = 0;
  cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
  // Allocate temporary storage
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  // Run inclusive prefix sum
  cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
  // Copy the result back to the host (h_in is reused as the output buffer)
  cudaMemcpy(h_in, d_out, sz, cudaMemcpyDeviceToHost);
  printf("\nOutput:\n");
  for (int i = 0; i < 10; i++) printf("%d ", h_in[i]);
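  printf("\n");
  // Release device and host allocations
  cudaFree(d_temp_storage);
  cudaFree(d_out);
  cudaFree(d_in);
  free(h_in);
  return 0;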
}
$ nvcc -o t736 t736.cu
$ cuda-memcheck ./t736
========= CUDA-MEMCHECK
Input:
1 1 1 1 1 1 1 1 1 1
Output:
1 2 3 4 5 6 7 8 9 10
========= ERROR SUMMARY: 0 errors
$

 
I get exactly the same result if I change the typedef from int to unsigned long long. (Arguably the printf format specifiers should be changed in this case, e.g. from %d to %llu.)
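
As a rough sketch of what the wider typedef implies on the host side, the code below prints with %llu and builds a CPU reference result to compare against the scan output. The helpers reference_inclusive_sum and print_head are hypothetical names introduced for illustration, and std::partial_sum is used here as a stand-in reference, not as what CUB does internally.

```cpp
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

typedef unsigned long long mytype;  // the wider element type mentioned above

// Hypothetical helper: CPU reference inclusive prefix sum, useful for
// checking the values copied back from the device.
std::vector<mytype> reference_inclusive_sum(const std::vector<mytype>& in) {
  std::vector<mytype> out(in.size());
  std::partial_sum(in.begin(), in.end(), out.begin());
  return out;
}

// Hypothetical helper: prints the first `count` elements using %llu,
// the matching format specifier for unsigned long long.
void print_head(const std::vector<mytype>& v, std::size_t count) {
  for (std::size_t i = 0; i < count && i < v.size(); i++) {
    printf("%llu ", v[i]);
  }
  printf("\n");
}
```

With an input of all ones, the reference result at index i is i + 1, so the last element should equal num_items. That gives a quick sanity check on the full 35,000,000-element output, not just the first ten values.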
