CUDA 4.0 is scheduled for release on March 4.
NVIDIA announced its features at
http://gpgpu.org/2011/03/01/cuda-4-0-release#more-3309.
I think some of them are relevant to the PFAC library.
■Thrust C++ Template Performance Primitives Libraries
We use Thrust for the prefix sum, so we may be able to link against the
Thrust bundled with the upcoming CUDA 4.0.
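For reference, a prefix sum with Thrust can look like the sketch below. The helper name exclusive_prefix_sum and the std::vector interface are only for illustration and are not PFAC's actual code; the Thrust calls themselves (thrust::device_vector, thrust::exclusive_scan, thrust::copy) are the standard ones.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <vector>

// Hypothetical helper (not part of PFAC): exclusive prefix sum computed on the GPU.
// Element i of the result is the sum of counts[0..i-1].
std::vector<int> exclusive_prefix_sum(const std::vector<int>& counts)
{
    // Copy the input to device memory.
    thrust::device_vector<int> d(counts.begin(), counts.end());

    // In-place exclusive scan on the device (initial value 0).
    thrust::exclusive_scan(d.begin(), d.end(), d.begin());

    // Copy the result back to the host.
    std::vector<int> out(counts.size());
    thrust::copy(d.begin(), d.end(), out.begin());
    return out;
}
```

For the input {3, 0, 2, 5} this gives {0, 3, 3, 5}, which is the usual way to turn per-item counts into output offsets.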
■Multi-GPU Sharing by Single CPU Thread - A single CPU host thread can
access all GPUs in a system. Developers can easily coordinate work
across multiple GPUs for tasks such as "halo" exchange in applications.
This is a good feature: we no longer need OpenMP to drive several GPUs,
or equivalently, one PFAC context can bind to several GPUs.
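A minimal sketch of what "one host thread, several GPUs" could look like with the CUDA runtime API, assuming the CUDA 4.0 behaviour where cudaSetDevice can be called repeatedly from the same thread. The kernel fill and the helper run_on_all_gpus are made up for illustration and have nothing to do with the PFAC API.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Trivial kernel: set every element of `data` to `value`.
__global__ void fill(int* data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: launch `fill` on every GPU from the calling thread
// and return how many GPUs produced the expected result. Assumes n > 0.
int run_on_all_gpus(int n)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;

    std::vector<int*> bufs(count, nullptr);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Pass 1: queue work on each GPU. Kernel launches are asynchronous,
    // so the host thread moves on to the next device without waiting.
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);
        if (cudaMalloc(&bufs[dev], n * sizeof(int)) != cudaSuccess) {
            bufs[dev] = nullptr;
            continue;
        }
        fill<<<blocks, threads>>>(bufs[dev], n, dev + 1);
    }

    // Pass 2: collect one element per GPU. The blocking cudaMemcpy also
    // waits for the kernel queued on that device.
    int ok = 0;
    for (int dev = 0; dev < count; ++dev) {
        if (bufs[dev] == nullptr) continue;
        cudaSetDevice(dev);
        int last = 0;
        cudaError_t err = cudaMemcpy(&last, bufs[dev] + (n - 1), sizeof(int),
                                     cudaMemcpyDeviceToHost);
        if (err == cudaSuccess && last == dev + 1) ++ok;
        cudaFree(bufs[dev]);
    }
    return ok;
}
```

Splitting the loop into a launch pass and a collect pass is what lets the GPUs run concurrently without one OpenMP thread per device.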
■NVIDIA GPUDirect(tm) 2.0 Technology - Offers support for peer-to-peer
communication among GPUs within a single server or workstation. This
enables easier and faster multi-GPU programming and application
performance.
This is useful when one PFAC context wants to use more than one
GPU.
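As a rough sketch of the peer-to-peer path, the helper below copies a buffer from one GPU to another with the runtime calls cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess and cudaMemcpyPeer. The helper name copy_gpu_to_gpu and the test pattern are hypothetical. Whether the copy really goes directly between the GPUs depends on the hardware; when peer access is not available, cudaMemcpyPeer still works but is staged through the host.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: copy `bytes` from a buffer on GPU `src` to a buffer on
// GPU `dst` and return true if the data arrived intact. Assumes bytes > 0.
bool copy_gpu_to_gpu(int src, int dst, std::size_t bytes)
{
    // Ask whether `dst` can address memory on `src` directly.
    int can_access = 0;
    if (src != dst) cudaDeviceCanAccessPeer(&can_access, dst, src);

    // Source buffer on `src`, filled with a known byte pattern.
    void* src_buf = nullptr;
    cudaSetDevice(src);
    if (cudaMalloc(&src_buf, bytes) != cudaSuccess) return false;
    cudaMemset(src_buf, 0x5A, bytes);

    // Destination buffer on `dst`.
    void* dst_buf = nullptr;
    cudaSetDevice(dst);
    if (cudaMalloc(&dst_buf, bytes) != cudaSuccess) {
        cudaSetDevice(src);
        cudaFree(src_buf);
        return false;
    }

    // Enable direct access from the current device (`dst`) to `src` if possible.
    if (can_access) cudaDeviceEnablePeerAccess(src, 0);

    // Device-to-device copy; no explicit host staging buffer in user code.
    cudaError_t err = cudaMemcpyPeer(dst_buf, dst, src_buf, src, bytes);

    // Read back the last byte on the host to check the pattern.
    unsigned char last = 0;
    cudaMemcpy(&last, static_cast<unsigned char*>(dst_buf) + (bytes - 1), 1,
               cudaMemcpyDeviceToHost);

    cudaFree(dst_buf);
    cudaSetDevice(src);
    cudaFree(src_buf);
    return err == cudaSuccess && last == 0x5A;
}
```

On a machine with two GPUs, copy_gpu_to_gpu(0, 1, 1 << 20) would move 1 MB from device 0 to device 1.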
■MPI Integration with CUDA Applications - Modified MPI implementations
automatically move data from and to the GPU memory over Infiniband
when an application does an MPI send or receive call.
We don't provide an MPI + PFAC library so far. We could consider a HUGE
application, for example DNA analysis, and then use MPI to handle a large
pattern set.