We have an implementation of a subset of Cantera (ideal gas thermodynamics, mixture-averaged transport, and homogeneous kinetics) running on GPUs (CUDA at the moment) as well as on multicore CPUs. It is tuned for performance in PDE solvers (for example, it is vectorized). We plan to port it to Xeon Phi in the next one to two years.
We are currently seeing speedups of at least 10x (as high as 100x in some cases) on K20 cards relative to CPU speeds, and our vectorized CPU implementation is slightly faster than stock Cantera.
We have not yet published these results (this work is very new) and are not yet ready to release the code publicly, but it will be open source (MIT license) and should be available within the coming year. It is strictly a C++ implementation, geared toward HPC applications involving PDEs.
Best wishes,
James