Questions on GPU Acceleration in ParFlow

zitong jia

Sep 26, 2025, 1:50:07 PM

to ParFlow

Dear everyone,

I have a question about GPU acceleration in ParFlow.

I am currently working on a study region with a grid size of 538 × 904 × 10. I have compiled ParFlow 3.14.1 using CUDA 12.8, OpenMPI 4.0.3, UCX 1.17.0, Umpire, and Hypre (FoundHypre with CUDA backend). I am running simulations with the MGSemi preconditioner and FullJacobian.

When using two NVIDIA H800 GPUs, I observe a speedup of only about 4x compared to a 144-core CPU run. Increasing to four H800 GPUs does not improve this further; the speedup remains around 4x.
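
For scale, here is a minimal sketch of the workload per device implied by the numbers above. It assumes the cells are split evenly across workers, which is an assumption: the actual process topology is not given here.

```python
# Grid dimensions and worker counts are taken from the description above.
nx, ny, nz = 538, 904, 10
total_cells = nx * ny * nz


def cells_per_worker(total, workers):
    """Average number of cells per worker, assuming an even split (an assumption)."""
    return total / workers


print(f"total cells: {total_cells:,}")
for label, n in [("144 CPU cores", 144), ("2 GPUs", 2), ("4 GPUs", 4)]:
    print(f"{label}: {cells_per_worker(total_cells, n):,.0f} cells each")
```

This gives 4,863,520 cells in total, or roughly 2.4 million cells per GPU with two GPUs and 1.2 million with four.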

Could you please advise if this performance is expected for this configuration? Also, are there any recommended strategies or settings to further improve GPU acceleration in this scenario?

Thank you very much for your time and guidance.

Georgios Artavanis

Jan 10, 2026, 9:27:19 AM

to ParFlow

Hi Zitong,

The CUDA backend is best used for large domains (>500K cells). That's when significant speedups can be achieved. For smaller domains, the speedup that you observe is reasonable. You can read more about it in this paper, which presents the development of the ParFlow GPU backend:

https://link.springer.com/article/10.1007/s10596-021-10051-4

Best,

George

zitong jia

Jan 16, 2026, 10:25:41 AM

to ParFlow

Thank you very much for your explanation.

However, I have a further question about behavior I observe specifically during spin-up simulations on my study domain.

When I run the spin-up using the CUDA backend, I notice that the simulation becomes progressively slower as the spin-up proceeds, meaning that the time per solver iteration increases with runtime. In contrast, when I run the same spin-up using MPI on CPUs, I do not observe this behavior. The MPI-based runs show relatively stable performance.

Below is the solver configuration I am using for the CUDA runs. I have tried several configurations and checks on my side, but I have not been able to resolve this issue.

I would greatly appreciate any insight or suggestions you may have on these points.

Thank you again for your time and help.

pfset OverlandFlowSpinUp  1
pfset OverlandSpinupDampP1 10
pfset OverlandSpinupDampP2 0.1
#-----------------------------------------------------------------------------
# Set solver parameters
#-----------------------------------------------------------------------------
# ParFlow Solution

pfset Solver.Spinup                                   True
                                                          
pfset Solver                                          Richards
pfset Solver.TerrainFollowingGrid                     True
pfset Solver.TerrainFollowingGrid.SlopeUpwindFormulation  Upwind

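
# Solver.MaxIter is set twice; the second value (87600) is the one that takes effect
pfset Solver.MaxIter                                  500000
pfset Solver.MaxIter                                  87600
# Solver.Drop and Solver.AbsTol are set again at the end of this block
pfset Solver.Drop                                     1E-20
pfset Solver.AbsTol                                   1E-8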
pfset Solver.MaxConvergenceFailures                   8
pfset Solver.Nonlinear.MaxIter                          100
pfset Solver.Nonlinear.ResidualTol                    1e-4

## new solver settings for Terrain Following Grid

pfset Solver.Nonlinear.EtaChoice                         EtaConstant

pfset Solver.Nonlinear.EtaValue                           0.01
pfset Solver.Nonlinear.UseJacobian                       True
pfset Solver.Nonlinear.DerivativeEpsilon                 1e-15
pfset Solver.Nonlinear.StepTol                           1e-20
pfset Solver.Nonlinear.Globalization                     LineSearch
pfset Solver.Linear.KrylovDimension                      50
pfset Solver.Linear.MaxRestarts                           2
pfset Solver.OverlandKinematic.Epsilon                1e-5



pfset Solver.Linear.Preconditioner                       MGSemi
pfset Solver.Linear.Preconditioner.SymmetricMat          Symmetric
pfset Solver.Linear.Preconditioner.MGSemi.MaxIter        1
pfset Solver.Linear.Preconditioner.MGSemi.MaxLevels      10




# Second assignment of Solver.Drop and Solver.AbsTol; these values override the earlier ones
pfset Solver.Drop                                       1E-20
pfset Solver.AbsTol                                     1E-10

Best regards,
Jia

Georgios Artavanis

Jan 30, 2026, 11:22:50 AM

to ParFlow

Hello,

When you say your simulation becomes slower, can you explain more? Are the kinsol files from the GPU and the CPU versions showing similar solver performance for every timestep? Or is the time it takes to do the same work increasing over the course of the run? In the second case, can you show a plot of the performance over time (e.g. from the timestamps of the ParFlow output files, such as the pressure files)?
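
As a rough sketch of how the wall-clock time between outputs could be extracted from file timestamps: the helper below sorts the files matching a pattern by name and reports the gap in modification time between consecutive files. The function name `output_intervals`, the run name `spinup`, and the file pattern are hypothetical and should be adapted to the actual run. It also assumes the files have zero-padded timestep numbers in their names and that their modification times were preserved (for example, not reset by copying).

```python
import glob
import os


def output_intervals(pattern):
    """Return (filename, seconds since previous file) for files matching pattern.

    Files are ordered by name, which matches write order when the
    timestep number in the filename is zero-padded.
    """
    files = sorted(glob.glob(pattern))
    mtimes = [os.path.getmtime(f) for f in files]
    return [
        (os.path.basename(f), t1 - t0)
        for f, t0, t1 in zip(files[1:], mtimes, mtimes[1:])
    ]


if __name__ == "__main__":
    # Hypothetical run name and location; adjust to the actual run.
    for name, dt in output_intervals("./spinup.out.press.*.pfb"):
        print(f"{name}\t{dt:.1f} s")
```

If the interval grows steadily on the GPU run while the nonlinear and linear iteration counts per timestep stay flat, the same work is taking longer; if the iteration counts grow too, the solver is doing more work per timestep.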