GPU acceleration in Cantera

Seamus Kane

Mar 11, 2016, 6:18:18 PM

to Cantera Users' Group

Hello Everyone,

I have been using Cantera to model fuel reforming processes for several months and my work could benefit greatly from some form of GPU acceleration.
Does Cantera currently support CUDA-capable GPUs? I understand that the CUDA dev kit includes a Python wrapper/compiler, but I'm not sure where to start in implementing Cantera on a GPU.

Does anyone have experience with this that could weigh in? Any help would be greatly appreciated.

-Seamus

Nick Curtis

Mar 14, 2016, 2:33:29 PM

to Cantera Users' Group

Seamus,
As someone who does have quite a bit of experience along these lines, I can tell you it is not an easy problem to solve.
The state of the art shows substantial speedups for explicit solvers on problems with little to moderate chemical stiffness, and anything from modest speedups to substantial slowdowns when using implicit solvers (needed for high stiffness) -- in fact, my current research is looking at how to solve or work around that second problem.
To my knowledge there is no current CUDA support in Cantera, nor is there an easy way to integrate such support.

To do so I would suggest the following approach:

1. Select a solver. CVODE would be the standard choice, but this may limit you to a per-block approach, as thread divergence is a huge issue for implicit GPU solvers. Alternatively, the hybrid MTS solver or a fixed-order Richardson extrapolation method may be a better choice.
2. Port to CUDA. This is a huge step and not easy to accomplish. If you choose a per-block approach, you will likely have all sorts of fun trying to generalize your code to be reasonably efficient for a generic mechanism. A per-thread approach is easier to make generic, but it is harder to use the GPU caches and shared memory effectively, and it has much more potential for thread divergence.
3. Write a wrapper for the CUDA integrator for Cantera; look at the CVODES wrapper for inspiration (http://www.cantera.org/docs/doxygen/html/CVodesIntegrator_8cpp_source.html).
4. Add some support for batch processing (or alternatively a different base reactor type which integrates multiple types).
5. Write several papers and pat yourself on the back, you just accomplished a huge task. Seriously, this is not at all straightforward or easy to achieve. Feel free to use pyJac to generate Jacobian / rate evaluation subroutines for CUDA. AFAIK it's the only tool out there available for this task (per-thread).
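
The thread-divergence concern mentioned in the steps above can be illustrated with a toy cost model. This is only a sketch under simplifying assumptions, not a description of any real solver: it assumes each GPU thread integrates one reactor, that the 32 threads of a warp advance in lockstep, and that a warp is busy until its slowest thread finishes. The function name and the step counts are hypothetical.

```python
WARP_SIZE = 32  # number of threads that execute in lockstep on NVIDIA GPUs


def warp_utilization(steps_per_thread, warp_size=WARP_SIZE):
    """Return the fraction of issued lockstep work that is useful.

    steps_per_thread holds the number of integrator steps each thread
    needs (hypothetical values). Every warp is charged for its slowest
    thread, which is the simplifying assumption of this model.
    """
    useful = sum(steps_per_thread)
    issued = 0
    for start in range(0, len(steps_per_thread), warp_size):
        warp = steps_per_thread[start:start + warp_size]
        issued += max(warp) * len(warp)
    return useful / issued


# All reactors equally stiff: every thread takes the same number of steps.
uniform = [100] * 64
# One stiff reactor needs 20x more steps and stalls its whole warp.
mixed = [100] * 63 + [2000]

print("uniform:", warp_utilization(uniform))
print("mixed:  ", warp_utilization(mixed))
```

In this model a single stiff reactor drags down the utilization of its entire warp, which is one way to see why adaptive implicit solvers, whose step counts vary strongly between states, are harder to run efficiently per-thread than fixed-step explicit ones.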

For further reading, seek out Stone & Davis' CVODE implementation paper ("Techniques for solving stiff chemical kinetics on GPUs"), Sewerin et al.'s implicit GPU solver paper ("A methodology for the integration of stiff chemical kinetics on GPUs"), or one of several papers from my own research group (Niemeyer's moderately stiff paper, http://kyleniemeyer.com/pubs/paper-moderately-stiff-GPU/, or my own upcoming manuscript).

Best,

Nick

Ray Speth

Mar 15, 2016, 1:40:29 PM

to Cantera Users' Group

Seamus and Nick,

There are some cases where you can use the GPU to get some significant performance improvements for stiff problems without a huge amount of implementation effort. A while back, Yu Shi and I did a proof-of-concept implementation for using CUDA and MAGMA to do just the LU factorization and linear solves in Sundials for the Cantera reactor network model. This works out pretty well for very large mechanisms (i.e. 1000+ species) since almost all of the work normally is the LU factorization (which scales as number of species cubed) compared to the evaluation of the governing equations (which is roughly number of species squared, given the finite difference approach to constructing the Jacobian).
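
The scaling argument above can be sketched as a rough operation count. This is an illustrative estimate only: it uses the standard ~(2/3)N^3 flop count for a dense LU factorization, and assumes that one evaluation of the governing equations costs an amount proportional to the number of species N, so that a finite-difference Jacobian (N evaluations) costs about N^2. The constant `flops_per_species_per_rhs` is a hypothetical placeholder, not a measured Cantera value.

```python
def lu_flops(n):
    """Approximate flop count for a dense LU factorization of an n x n matrix."""
    return 2.0 * n**3 / 3.0


def fd_jacobian_flops(n, flops_per_species_per_rhs=100.0):
    """Approximate cost of a finite-difference Jacobian.

    Assumes one right-hand-side evaluation costs
    flops_per_species_per_rhs * n (hypothetical constant), and that the
    Jacobian needs n such evaluations, one per perturbed state variable.
    """
    rhs_cost = flops_per_species_per_rhs * n
    return n * rhs_cost


def lu_to_jacobian_ratio(n, flops_per_species_per_rhs=100.0):
    """Ratio of LU cost to Jacobian-construction cost; grows linearly with n."""
    return lu_flops(n) / fd_jacobian_flops(n, flops_per_species_per_rhs)


for n in (50, 500, 5000):
    print(f"{n:5d} species: LU / Jacobian cost ratio = {lu_to_jacobian_ratio(n):.2f}")
```

Whatever the actual constant is, the ratio grows in proportion to N under these assumptions, so the factorization eventually dominates for large enough mechanisms, and that is the part being offloaded to the GPU.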

The modifications to Sundials can be seen at https://github.com/athlonshi/Sundials-MAGMA, and the very minor modifications to Cantera are posted at https://github.com/athlonshi/cantera/commits/reactor-work. I think there may be better ways of doing this without having to modify Sundials, in which case it might be feasible to bring this capability into Cantera.

Regards,

Ray

Nick Curtis

Mar 15, 2016, 4:55:25 PM

to Cantera Users' Group

Ray,

Now that I think of it, another use case (which would potentially be more useful than 1000+ species mechanisms, as these are limited to essentially 0-D only) may be to utilize cuBLAS for linear algebra in the 1-D solvers (LU factorization, matrix operations, etc.), as those Jacobians can get large in a hurry.

Nick

usama meraj

Mar 6, 2024, 5:22:52 PM

to Cantera Users' Group

Hello all, I am trying to simulate recirculation flow and a reactor network within classified combustion, using an n-dodecane PAH mechanism. Is there any guidance on how I could push this onto a GPU? Are there any capabilities for this?

Ray Speth

Mar 15, 2024, 2:16:07 PM

to Cantera Users' Group

Hi,

GPU acceleration for the reactor network solver is still a “wishlist” item, which we’re keeping track of as Enhancement Proposal #33. Feel free to give that proposal a “thumbs up” on GitHub to express your interest.

You might also want to try using the sparse, preconditioned solver for reactor networks that was introduced in Cantera 3.0. To use it, you just need to use the IdealGasMoleReactor and IdealGasConstPressureMoleReactor reactor types instead of their mass-fraction based counterparts, and assign an AdaptivePreconditioner object to the reactor network’s preconditioner property. See the example preconditioned_integration.py. Depending on the size of your mechanism and the reactor network, this can speed things up by more than an order of magnitude.
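
A minimal sketch of that setup, assuming Cantera 3.0 or newer is installed. The helper name `build_preconditioned_network`, the `gri30.yaml` mechanism, and the initial conditions are placeholders chosen for illustration; substitute your own mechanism and state, and see preconditioned_integration.py for the full example.

```python
def build_preconditioned_network(mechanism="gri30.yaml"):
    """Build a single-reactor network that uses the sparse, preconditioned solver.

    The helper name, mechanism file, and initial state are illustrative
    assumptions. Cantera is imported inside the function so that merely
    defining it does not require Cantera to be installed.
    """
    import cantera as ct  # requires Cantera 3.0 or newer

    gas = ct.Solution(mechanism)
    gas.TP = 1000.0, ct.one_atm
    gas.set_equivalence_ratio(1.0, "CH4", "O2:1.0, N2:3.76")

    # Mole-based reactor type, used instead of IdealGasConstPressureReactor
    reactor = ct.IdealGasConstPressureMoleReactor(gas)
    net = ct.ReactorNet([reactor])

    # Assigning a preconditioner switches the network to the sparse,
    # preconditioned solver
    net.preconditioner = ct.AdaptivePreconditioner()
    return net, reactor
```

To use it, call `net, reactor = build_preconditioned_network()` and then advance in time with, for example, `net.advance(1e-3)`. The benefit is expected to show up mainly for larger mechanisms; for a small mechanism like the placeholder used here, the difference may be minor.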

Regards,
Ray
