I am pleased to announce that a CUDA-enabled, dual-mode solver hierarchy has been developed and integrated into SOLVCON under the solvcon.kerpak namespace. I have written a note on the port:
http://solvcon.net/yyc/writing/2011/cuda_port.html . Although there are still things to be done before a meaningful benchmark is possible, in double precision the non-memory-optimized CUDA code on a Tesla M2050 is about 10 times faster than one core of a Xeon 5506. That is not too bad for a two-/three-dimensional, mixed-shape CESE code.
Development will continue immediately after this milestone.