The version of TriTree checked into G3D is actually about 2x-3x faster than what we used for the paper, so the CPU steps should improve a lot. If you used an even better ray tracer (e.g., Intel's IPP), you should see another 4x on top of that, which makes the CPU cost minimal. The problem then is the CPU<->GPU data transfers, which are much more expensive than one would expect from reading the OpenGL specification. ATI and NVIDIA engineers have been very helpful with optimizing these routines in G3D, and I hope to improve the data transfer performance in later releases.
OpenGL is in an awkward state right now. Khronos has done a very good job of taking over the APIs but has not yet jettisoned the legacy code paths. We expect that they will do that soon, merging GL and GL/ES. Corey and I are currently designing G3D 9.0 to match the GL/ES specification very closely, with all fixed-function rendering removed. This will look a lot like DirectX 10 as well. If you want to be cross-platform (which increasingly means iPhone and console, not just PC/Mac/Linux), OpenGL is the only game in town. If you're writing Windows/Xbox-only code, there's a lot to be said for DirectX. We can't effectively support both, so we've stuck with OpenGL.
There's also been substantial work on OpenCL lately, and that will greatly impact the design of G3D 9.0.
-m