When we started the project, we used the STL sort as a baseline to compare the sort rate.
Our current CPU version is much faster, which is why we removed the STL sort.
If you are interested in a pure CPU version, you can check the AMD samples.
You can also check Duane Merrill's website; his method can also work on the CPU, but
it requires some work. (I think Intel has implemented it in one of their libraries.)
Another interesting point would be to run it on the Fusion and Sandy Bridge APUs.
Regards
Krys