[sundials-users] CVode on multiple CUDA-streams


Nurdinova, Aizada

Jun 21, 2022, 11:29:37 AM6/21/22
to SUNDIAL...@listserv.llnl.gov

Hi, my name is Aizada and I'm trying to use CVODE solvers on GPU for MRI simulations. I had a question about using CVode on multiple CUDA streams. 

I wanted to have several CUDA non-default streams to run in parallel and solve the differential equations for different parts of the grid independently. For that, I defined several models and assigned a certain stream for each, which the host launches in a loop. 

However, I see in the Nsight profiler that some CVODE kernels (linearSum, scale, wL2NormSquare, etc.) call cudaStreamSynchronize, which blocks my host on one stream and prevents launching the next model on the other stream.

So my question is: is creating several pthreads the only way to run CVODE solvers on different CUDA streams in parallel, or is there another workaround?

Best regards,
Aizada.



To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV

Balos, Cody Joe

Jun 21, 2022, 12:52:18 PM6/21/22
to SUNDIAL...@listserv.llnl.gov

Hello Aizada,


Yes, you will need to use some sort of CPU multithreading (e.g., Pthreads, OpenMP, std::thread) to launch the solvers to avoid blocking on the host. When using CVODE with GPUs, the integrator logic still lives on the CPU and the data operations (e.g., vector operations) are done on the GPU. The CUDA streams only apply to execution on the device, so naturally if you want fully asynchronous behavior on the CPU and GPU you will need separate CPU threads of execution.


Regards,

Cody
