weird results with cudaMemcpy


David Pham

Apr 6, 2020, 1:46:41 AM
to Parallel and Distributed Systems at SFU (431)
Hi everyone, 

I've been trying for the last hour to figure out what I'm doing wrong with the way I'm copying host memory to device and vice versa, but I can't seem to figure it out. I've rewritten my code below to share only the memory copying section.

int numBlocks = (img_size + threadsPerBlock - 1) / threadsPerBlock;
gpu_create_histogram<<<numBlocks, threadsPerBlock>>>(d_hist, d_img, result.w, result.h);
cudaDeviceSynchronize();

cudaMemcpy(h_hist, d_hist, sizeof(int) * 256, cudaMemcpyDeviceToHost);
for (int i = 0; i < 256; i++) {
  printf("hist[%d] : %d\n", i, h_hist[i]);
}

// Generate a histogram
__global__ void gpu_create_histogram(int *hist_out, unsigned char *img_in,
                                     int imgWidth, int imgHeight) {
  // some calculation for index

  if (index < imgWidth * imgHeight) {
    atomicAdd(&hist_out[img_in[index]], 1);
  }
}


I'm expecting my hist array to be filled with the tallies of pixel values in the image, but it is still at its initial values of all zeros. If anyone could point me in the right direction, that would be awesome.

Nicholas Fung

Apr 6, 2020, 3:28:38 AM
to Parallel and Distributed Systems at SFU (431)
Have you been retrieving the CUDA status returned by your cudaMemcpy calls? Those can help you make sure that you're copying data correctly. Also, am I correct in assuming that the first snippet is a section of your gpu_perform_equalization? If so, you don't need to be calculating numBlocks: you're already passing in dim3s for your grid and block sizes. In my implementation, gpu_perform_equalization is the only function that needs to call gpu_create_histogram.
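
For example, something like this (a minimal sketch reusing the names from David's snippet) would surface a failed copy or a bad kernel launch:

cudaError_t err = cudaDeviceSynchronize();   // also reports kernel launch/execution errors
if (err != cudaSuccess) {
  printf("kernel failed: %s\n", cudaGetErrorString(err));
}

err = cudaMemcpy(h_hist, d_hist, sizeof(int) * 256, cudaMemcpyDeviceToHost);
if (err != cudaSuccess) {
  printf("cudaMemcpy failed: %s\n", cudaGetErrorString(err));
}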

Bruno Da Silva

Apr 6, 2020, 3:34:21 AM
to Parallel and Distributed Systems at SFU (431)
Just tried following the implementation you gave, @David. Assuming you're calculating the index correctly, and setting up / mallocing / memcpying d_hist and d_img correctly, this implementation works.

I have a feeling your problem is likely not in the copying of data, but rather in your condition (i.e. index/width/height have bad values), OR your img_in isn't set up correctly, so it's not indexing into the hist array properly.

Also, does the CUDA API _require_ that your block size/thread size be dim3s? Because you're currently not using dim3s here.

Once you figure this out (I'm assuming this is for testing), Nicholas is right: you should probably stick to the equalize function if you can.

Khang Bùi

Apr 8, 2020, 4:13:39 AM
to Parallel and Distributed Systems at SFU (431)
@David, did you end up figuring out what happened here? I'm encountering the same problem: my histogram was filled inside gpu_create_histogram, but after copying it back from device into host memory the values are empty.

dj olay

Apr 8, 2020, 5:25:28 AM
to Parallel and Distributed Systems at SFU (431)
I had the same problem. Are you calling gpu_create_histogram properly? Make sure that you specify gpu_create_histogram<<<dimGrid, dimBlock>>> in that order; doing it the other way around was my issue. Also, make sure that you're using the right dimensions.

If you're going to index through your GPU in 1D, then you want to set the y and z dimensions to 1, for example dim3 threadsPerBlk(32, 1, 1) and dim3 dimGrid(Nx * Ny / 32, 1, 1). If you're going to index using more than one dimension, then account for that in your index.
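
For example, a minimal 1D launch along those lines might look like this (Nx and Ny here stand for the image width and height, which is an assumption about how your code is set up):

dim3 threadsPerBlk(32, 1, 1);                    // 1D block: y and z set to 1
dim3 dimGrid((Nx * Ny + 31) / 32, 1, 1);         // rounded up so every pixel gets a thread
gpu_create_histogram<<<dimGrid, threadsPerBlk>>>(d_hist, d_img, Nx, Ny);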

Also, remember to copy from device back to host.

Hopefully that helps

A

Apr 8, 2020, 11:17:42 AM
to parallel-s...@googlegroups.com
It turns out that my problem was that I was setting up dimGrid and dimBlock incorrectly, and that I had an error with my memcpy. After adding error checks, it was much easier to track down my issue. If cudaMemcpy fails, it fails silently unless you check the status it returns. In my case, my image wasn't actually being copied over from host to device.
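
For anyone else hitting this: wrapping the calls in a small checking macro (CUDA_CHECK and h_img below are just placeholder names, not from the starter code) makes the silent failure visible right away:

#define CUDA_CHECK(call)                                        \
  do {                                                          \
    cudaError_t e = (call);                                     \
    if (e != cudaSuccess) {                                     \
      printf("CUDA error %s at %s:%d\n",                        \
             cudaGetErrorString(e), __FILE__, __LINE__);        \
    }                                                           \
  } while (0)

// The host-to-device copy of the image; a bad d_img allocation or size
// shows up here instead of silently leaving the device buffer empty.
CUDA_CHECK(cudaMemcpy(d_img, h_img, img_size * sizeof(unsigned char),
                      cudaMemcpyHostToDevice));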

Best,
Dave

Khang Bùi

Apr 9, 2020, 5:29:53 AM
to Parallel and Distributed Systems at SFU (431)
Thanks guys for the comments.