Hey Shin Yee,
This looks great. I don't have much experience with GPGPU. How are big, real-world CUDA programs generally written? Do they involve writing multiple CUDA kernels and running them one after another?
I remember from a project I did back at university (there was no CUDA back then, I think; we had to write pixel shader kernels to get work done) that the most expensive part was getting data into and out of GPU memory.
What kinds of CUDA programs have you built?
Cheers,
--
Harish Mallipeddi
http://kodekabuki.com