[sundials-users] FW: Unable to solve using SUNLinSol_cuSolverSp_batchQR( )

Balos, Cody

Jul 22, 2024, 11:55:41 AM
to SUNDIAL...@listserv.llnl.gov

Forwarding this to the mailing list as it never went through due to the IT outage on Friday.

 

 

From: utpal kiran <utpal...@gmail.com>
Date: Thursday, July 18, 2024 at 8:25 AM
To: sundial...@llnl.gov <sundial...@llnl.gov>
Subject: Unable to solve using SUNLinSol_cuSolverSp_batchQR( )

Hi, 

 

I have an application (part of a bigger problem) consisting of many independent ODE systems that share the same sparsity pattern. I can solve this problem on the CPU by calling CVODE functions inside a loop, using a dense matrix (SUNDenseMatrix) and the dense linear solver (SUNLinSol_Dense). Now I want to solve it on a GPU using CUDA, for which I have identified the cuSOLVER sparse batched QR linear solver. I have set up the problem exactly as in the 'cvRoberts_block_cusolversp_batchqr.cu' example, with my own RHS and Jacobian functions. However, CVODE throws the following error.

 

[ERROR][rank 0][/src/cvode/cvode.c:3698][cvHandleFailure] At t = 0 and h = 6.90801e-310, the corrector convergence test failed repeatedly or with |h| = hmin
SUNDIALS_ERROR: CVode() failed with retval = -4

 

What could be wrong? 

Note that the Jacobian (size: 53x53) for my problem is singular (one whole row consists of zero entries). Is the problem arising because my Jacobian is singular? If yes, how is SUNLinSol_Dense able to solve it? And in that case, how can I solve this problem on a GPU?

 

Thank you

Utpal Kiran

 

 


To unsubscribe from the SUNDIALS-USERS list: write to: mailto:SUNDIALS-USERS-...@LISTSERV.LLNL.GOV




Balos, Cody

Jul 22, 2024, 12:12:52 PM
to SUNDIAL...@listserv.llnl.gov

Hi Utpal,

 

The cuSolver sparse QR requires that your matrices have full rank (see https://docs.nvidia.com/cuda/cusolver/index.html#cusolverspxcsrqrbatched). Since your Jacobian matrix is singular (and therefore not full rank) this solver will not work for you. I suggest trying the MAGMA linear solver interface we have in SUNDIALS.

 

Cody

Reynolds, Daniel

Jul 22, 2024, 1:30:37 PM
to SUNDIAL...@listserv.llnl.gov

Hi Utpal,

 

Your Jacobian is only one part of the linear system that CVODE solves within its Newton iteration.  In fact, CVODE solves linear systems of the form (I – gamma*J(t,y))*x = b, where J(t,y) is your singular Jacobian, gamma is a scalar that is proportional to the time step size, and I is the identity matrix.  Thus although your Jacobian is singular, the linear system is not. 
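
A toy illustration of this point, as a minimal sketch: the 3x3 matrix and the value of gamma below are made up for illustration (they are not the 53x53 Jacobian from the question), and `newton_matrix` and `det` are hypothetical helper names. A zero row in J becomes a row of the identity in I - gamma*J, so the shifted matrix can be nonsingular even though J is not. This holds as long as 1/gamma is not an eigenvalue of J.

```python
def newton_matrix(J, gamma):
    """Return I - gamma*J for a square matrix J given as a list of rows."""
    n = len(J)
    return [[(1.0 if i == j else 0.0) - gamma * J[i][j] for j in range(n)]
            for i in range(n)]


def det(A):
    """Determinant via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in A]
    n = len(M)
    d = 1.0
    for k in range(n):
        # Pick the largest-magnitude entry in column k as the pivot.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0.0:
            return 0.0  # no nonzero pivot available: matrix is singular
        if p != k:
            M[k], M[p] = M[p], M[k]
            d = -d  # a row swap flips the sign of the determinant
        d *= M[k][k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n):
                M[r][c] -= f * M[k][c]
    return d


# Hypothetical Jacobian whose last row is entirely zero, so J is singular.
J = [[-2.0, 1.0, 0.0],
     [1.0, -3.0, 1.0],
     [0.0, 0.0, 0.0]]
gamma = 0.1  # assumed value; in CVODE gamma scales with the step size

A = newton_matrix(J, gamma)
print("det(J)           =", det(J))
print("det(I - gamma*J) =", det(A))
print("last row of I - gamma*J:", A[2])
```

Here det(J) is exactly zero, while I - gamma*J has a determinant of about 1.55 and its last row is simply the last row of the identity.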

 

That said, the SUNDIALS dense linear solver uses partial pivoting (so does MAGMA), while the cuSolver sparse QR does not. In our experience, partial pivoting can be critical for numerical stability with ill-conditioned linear systems, and we have seen the same failure you describe when using cuSolver.
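
To show why pivoting can matter, here is a small sketch using plain Gaussian elimination on a made-up 2x2 system. It is only an analogy: it does not reproduce cuSolver's QR factorization or the SUNDIALS and MAGMA implementations, and the function name `solve` and the matrix entries are assumptions chosen for illustration. The tiny leading entry forces a huge multiplier when rows are not swapped, which wipes out the information in the second equation.

```python
def solve(A, b, pivot):
    """Solve A x = b by Gaussian elimination.

    If pivot is True, swap rows so the largest-magnitude entry in the
    current column is used as the pivot (partial pivoting).
    """
    n = len(A)
    # Build the augmented matrix [A | b] so row operations act on both.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        if pivot:
            p = max(range(k, n), key=lambda r: abs(M[r][k]))
            M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


# Made-up system whose exact solution is very close to x = [1, 1].
A = [[1e-20, 1.0],
     [1.0, 1.0]]
b = [1.0, 2.0]

x_nopivot = solve(A, b, pivot=False)
x_pivot = solve(A, b, pivot=True)
print("without pivoting:", x_nopivot)
print("with pivoting:   ", x_pivot)
```

In double precision the unpivoted solve returns [0.0, 1.0], which is wrong in the first component, while the pivoted solve returns [1.0, 1.0].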

 

I will thus echo Cody’s recommendation to try MAGMA.

 

Daniel R. Reynolds (he/him)

Professor, SMU Mathematics

214-768-4339

https://people.smu.edu/dreynolds/

utpal kiran

Aug 26, 2024, 12:50:40 PM
to SUNDIAL...@listserv.llnl.gov
Hi Cody and Daniel,

Thank you for your suggestions. I have got the MAGMA-based linear solver interface working for my problem on the GPU. I have a few further questions related to this.

1). For the MAGMA-based solver, I had to decrease the absolute tolerance to 1e-14 to get convergence, whereas on the CPU my solution converged at abstol=1e-11. What could be the reason behind this? Does it indicate an error in my implementation, or is it the nature of the solver? I am using the MagmaDenseBlock matrix type and the MagmaDense linear solver to solve the ODE systems in a batch on a V100 GPU.

2). I have found that my code works fine if I provide the Jacobian as a dense matrix to cuSolver for batched QR decomposition. However, sending the Jacobian as a sparse matrix leads to failure. Can you comment on this behaviour? I also observed that the performance of cuSolver with a dense Jacobian is very poor compared with the MAGMA-based solver.

Thanks again,
Utpal 

Brorson, Stuart

Aug 26, 2024, 2:04:16 PM
to SUNDIAL...@listserv.llnl.gov
This is just a shot in the dark, and I am not an expert, so take it with a grain of salt.

I see MAGMA is meant to facilitate running jobs on GPGPU systems.  Many GPUs are 32 bits wide, not 64 like most contemporary CPUs.  Are you seeing differences because you are running on 32 bit hardware?

Stuart



From: sundial...@llnl.gov <sundial...@llnl.gov> on behalf of utpal kiran <utpal...@GMAIL.COM>
Sent: Saturday, August 24, 2024 2:25 AM
To: SUNDIAL...@LISTSERV.LLNL.GOV <SUNDIAL...@LISTSERV.LLNL.GOV>
Subject: Re: [sundials-users] Unable to solve using SUNLinSol_cuSolverSp_batchQR( )
 

utpal kiran

Aug 28, 2024, 12:32:19 PM
to SUNDIAL...@listserv.llnl.gov
Hi Stuart,

I am already using double precision arithmetic for my work on both CPU and GPU. I believe numerical accuracy depends on the capability of the machine to perform high precision arithmetic computation. I am using the NVIDIA Volta V100 GPU that has double precision support.
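
For context on why precision matters at the tolerances discussed in this thread, here is a rough sketch in plain Python. The helper `to_float32` is a hypothetical name, and comparing against 1.0 is only a stand-in for solution components of order one. It checks whether perturbations of size 1e-11 and 1e-14 survive in IEEE double and single precision.

```python
import struct
import sys


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


eps64 = sys.float_info.epsilon  # unit roundoff scale for double, 2**-52
eps32 = 2.0 ** -23              # corresponding value for single precision

print(f"double epsilon = {eps64:.3e}, single epsilon = {eps32:.3e}")
for tol in (1e-11, 1e-14):
    kept_in_double = (1.0 + tol) != 1.0
    kept_in_single = to_float32(1.0 + tol) != 1.0
    print(f"tol={tol:g}: resolved in double={kept_in_double}, "
          f"in single={kept_in_single}")
```

Both tolerances are resolvable in double precision but are lost entirely in single precision, so tolerances this tight only make sense when the whole computation runs in double precision, as it does here.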

Utpal

Balos, Cody

Sep 4, 2024, 6:38:46 PM
to SUNDIAL...@listserv.llnl.gov

Hi Utpal,

 

Try turning on the advanced CMake options SUNDIALS_DEBUG, SUNDIALS_DEBUG_ASSERT, and SUNDIALS_DEBUG_CUDA_LASTERROR when you build. This may help debug the second case (with cuSolver) in particular.
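
As a sketch, a configure line with these options might look like the following. The source and build paths are placeholders, ENABLE_CUDA=ON is assumed because the build targets CUDA, and whether each debug option is available may depend on the SUNDIALS version, so check the options list for your release.

```shell
# Placeholder paths: adjust the source and build directories to your setup.
cmake -S /path/to/sundials -B /path/to/sundials-build \
  -DENABLE_CUDA=ON \
  -DSUNDIALS_DEBUG=ON \
  -DSUNDIALS_DEBUG_ASSERT=ON \
  -DSUNDIALS_DEBUG_CUDA_LASTERROR=ON

# Rebuild so the debug checks are compiled into the libraries.
cmake --build /path/to/sundials-build
```

The application must then be relinked against the rebuilt libraries for the extra checks to take effect.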

 

Cody
