GPU/CUDA programming is easy if we ignore performance, or even the correctness of the program. It becomes tough when performance is critical and one has to optimize very hard for the specific hardware. Fortunately, GPU hardware performance improves drastically every two years. Unfortunately, that performance is not portable across GPU generations: code tuned for one generation may not run as fast on the next.
Prof. Chen from Tsinghua University is proposing MapCG, a MapReduce framework, as a solution to the portability problem.
Check out the details of the seminar at the following link: