There has been a lot of research on this over the years, though mostly regarding analytical workloads -- search, currently, isn't quite a fit. That said, the points you bring up -- moving data from system memory to GPU memory, etc. -- are a significant part of what people in academia and industry alike are working to address, both in software and in hardware.
For the purposes of my reply, I'm going to ignore the benchmark numbers; what you quoted is pretty much apples to oranges, and what it doesn't say is the important part (were the data structures optimized for, and was the code written to benefit from, the monster $925k setup?). Instead, I'll address the point about moving data around being relatively slow and provide a handful of resources for exploring on your own. I find the use of GPUs in databases compelling (not to mention other configurations, e.g., arrays of lower-power, lower-cost processors with solid-state storage, FPGA configurations, and, perhaps most of all, hybrids). The more I follow the developments, the more I find the subject is both many-faceted and not without nuance.
I also want to say up front that I do not have in-depth experience applying any of this, and certainly not in a production setting -- yet. I've been reading everything I can get my hands on and following a handful of companies and related subjects. In that light, I'll share some of the resources I've gotten the most from.
So, to address the bandwidth issues: NVIDIA has been working on this for some time on both the hardware and software fronts, mostly by way of what they call "GPUDirect":
NVIDIA GPUDirect Technology
Regarding 1 GB of VRAM: that's pretty outdated. The latest cards (e.g., Tesla K10, M2090, M2075, ...) come with up to 6 GB of GDDR5 on a single GPU (the Fermi-based cards), or 8 GB across two GPUs (Kepler), and up to 512 cores with a Fermi or 3,072 with a Kepler -- these things are monsters.
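To put the transfer-cost point in rough numbers (these are theoretical peak figures and the 4 GB table size is just an illustrative assumption): PCIe 2.0 x16 tops out around 8 GB/s host-to-device, while the M2090's GDDR5 is rated at roughly 177 GB/s on-board. A back-of-envelope sketch:

```cpp
// Rough, hedged figures (theoretical peaks, not measured throughput):
// PCIe 2.0 x16 ~= 8 GB/s host-to-device; Tesla M2090 GDDR5 ~= 177 GB/s.
constexpr double kPcieGBs  = 8.0;
constexpr double kGddr5GBs = 177.0;

// Seconds to copy a table of `gb` gigabytes over the bus, vs. scan it on-card.
constexpr double copy_seconds(double gb) { return gb / kPcieGBs; }
constexpr double scan_seconds(double gb) { return gb / kGddr5GBs; }

// For an assumed 4 GB partition: ~0.5 s just to ship it across the bus,
// but only ~0.023 s per full scan once it's resident -- the one-time copy
// costs as much as roughly 22 scans.
```

That ratio is why keeping data resident on the card (or streaming it, which is what GPUDirect helps with) matters so much more than raw core counts.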
Another project you will probably want to take a look at is the Virginian Database, written in the summer of 2010 at NEC Labs:
The Virginian Database
"an experimental heterogeneous SQL database written to compare data processing on the CPU and NVIDIA GPUs"
directly related papers:
Efficient Data Management for GPU Databases
Accelerating SQL Database Operations on a GPU with CUDA
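The core trick in that line of work is recasting SQL operators as data-parallel primitives. As a concrete (hedged) illustration -- a plain C++17 sketch of the map/scan/scatter pattern, not code from either paper -- here is how a `SELECT ... WHERE col > t` turns into three bulk steps, each of which becomes one kernel launch over all rows on a GPU:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// A relational selection as three data-parallel primitives:
// 1) map: evaluate the predicate into a 0/1 flag per row,
// 2) exclusive scan: turn the flags into dense output offsets,
// 3) scatter: compact qualifying rows into the result.
std::vector<int> gpu_style_filter(const std::vector<int>& col, int threshold) {
    const std::size_t n = col.size();
    std::vector<int> flags(n), offsets(n);
    for (std::size_t i = 0; i < n; ++i)       // map (independent per row)
        flags[i] = col[i] > threshold ? 1 : 0;
    std::exclusive_scan(flags.begin(), flags.end(), offsets.begin(), 0);
    const std::size_t total = n ? offsets[n - 1] + flags[n - 1] : 0;
    std::vector<int> out(total);
    for (std::size_t i = 0; i < n; ++i)       // scatter (independent per row)
        if (flags[i]) out[offsets[i]] = col[i];
    return out;
}
```

The two loops have no cross-row dependencies, and the scan in the middle is the only coordination point -- which is exactly why these operators map so well to thousands of GPU threads.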
ParStream is one GPU-specific database that I've been following for some time (though undoubtedly there are others -- let me know if you find any!):
Here are a handful of publications on GPUs in databases that I keep going back to:
Oncilla - Optimizing Accelerator Clouds for Data Warehousing Applications
Relational Query Co-Processing on Graphics Processors
High-Throughput Transaction Executions on Graphics Processors
Self-Tuning Distribution of DB-Operations on Hybrid CPU/GPU Platforms
Scaling PostgreSQL Using CUDA
Comparing CPU and GPU in OLAP Cube Creation
GPU Processors in Databases
MOLAP based on parallel scan
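On the scan-based MOLAP idea in particular: the gist is that once you precompute a prefix sum over a measure column (a scan being one of the best-parallelizing primitives on a GPU), any range aggregate collapses to a constant-time subtraction. A minimal one-dimensional sketch in plain C++17 (my own illustration of the technique, not code from the paper):

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Precompute an inclusive prefix sum over a measure column; a range
// aggregate over [lo, hi] then costs one subtraction instead of a scan.
struct PrefixCube {
    std::vector<long long> prefix;  // prefix[i] = sum of measure[0..i]
    explicit PrefixCube(const std::vector<int>& measure)
        : prefix(measure.size()) {
        std::inclusive_scan(measure.begin(), measure.end(), prefix.begin(),
                            std::plus<long long>{}, 0LL);
    }
    // Sum of measure[lo..hi], both bounds inclusive.
    long long range_sum(std::size_t lo, std::size_t hi) const {
        return prefix[hi] - (lo ? prefix[lo - 1] : 0);
    }
};
```

The multidimensional version applies the same scan along each cube dimension; the precompute parallelizes, and the queries become pure lookups.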
I would like to add that exploring other, similar types of systems -- FPGAs in particular -- provides, if nothing else, a useful contrast in the fundamental data structures, processing mechanics, and underlying operations that are the typical targets for hardware-accelerated database systems; to that end, here are a few I've found particularly interesting:
The “Chimera”: An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform
With the above paper in particular, check out section 5, and especially section 6, "Berkeley's Thirteen Dwarves". The decomposition of database operations into their fundamentals is a fun exercise in itself, and seeing how each is applied in specialized hardware will quickly give you a picture of the strengths and weaknesses of each type of acceleration (i.e., CPU vs. FPGA vs. GPU).
Other FPGA pubs (two of which directly address search):
FPGA: What’s in it for a Database?
FPGAs: A New Point in the Database Design Space
An FPGA-based Search Engine for Unstructured Database
FPGA based hardware implementation and parallel processing of database operations on streaming projections in C-Store (a column oriented database)
If you're interested in doing some hacking, here are some libraries that I've played with to varying degrees that are good candidates for tinkering:
cudpp: CUDA Data Parallel Primitives Library
thrust: a parallel algorithms library which resembles the C++ Standard Template Library (STL)
thrust graph library
Rootbeer
The Rootbeer GPU Compiler makes it easy to use Graphics Processing Units from within Java.
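One thing that makes thrust a good starting point is that it deliberately mirrors the STL, so you can prototype a pipeline against std:: algorithms on the CPU and then port it almost name-for-name by swapping std::vector for thrust::device_vector. A toy sketch (the function name and pipeline are my own illustration, not from any of the projects above), with the thrust counterparts noted in comments:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Toy pipeline: total of squared values above a cutoff.
long long sum_of_big_squares(std::vector<int> v, int cutoff) {
    // std::remove_if + erase  ->  thrust::remove_if on a device_vector
    v.erase(std::remove_if(v.begin(), v.end(),
                           [cutoff](int x) { return x <= cutoff; }),
            v.end());
    // std::transform          ->  thrust::transform
    std::transform(v.begin(), v.end(), v.begin(),
                   [](int x) { return x * x; });
    // std::accumulate         ->  thrust::reduce
    return std::accumulate(v.begin(), v.end(), 0LL);
}
```

Getting a pipeline correct on the host first, then moving it to the device, is a much gentler path than writing raw CUDA kernels from day one.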
Lastly, if you find yourself itching to try some of this stuff out, there are several "cloud GPU" on-demand HPC commercial offerings; you're undoubtedly aware of EC2's stuff, but there are a few that are lesser known:
In short, I think it's a great idea. Given the economies of scale that have come into effect, and will likely continue, for both GPUs and FPGAs -- not to mention ARM and ARM-like processors (check out CUDA for ARM: http://www.nvidia.com/object/carma-devkit.html) -- the related hardware should remain, or become an even better, value proposition. Combined with the dropping prices of SSDs, especially eMLC-backed drives, and the more or less stable cost of the typical server hardware you'll put all this fancy gear in, it's probably a good bet for the future too -- in my opinion. The research seems to agree.
Hopefully this is helpful information! I'm interested in any other companies, open-source or commercial systems, libraries, etc. that you might come across.
Cheers,
John